Unprototype Drydock (v1)
Closed, Resolved (Public)

Assigned To
Authored By
epriestley
Aug 24 2015, 4:43 PM

Description

Drydock is a resource allocation system for hardware and software. It is mostly an infrastructure component which supports other applications, not an application that normal users are expected to interact with much.

The primary use case for Drydock is creating, managing, and destroying repository working copies for build systems. In particular, these are the short-term use cases:

  • (T9123) Harbormaster should be able to ask Drydock to give it a working copy containing an arbitrary commit, then run build processes in that working copy.
  • (T182) Differential should be able to ask Drydock to give it a working copy so it can commit a revision.

In the long term, Drydock will be able to build resources incrementally: you tell it how to allocate hosts and other hardware resources, and it manages pools of hardware and software to satisfy these requests.

For v1, the focus is on enabling T9123 + T182 by allocating working copies, not incremental resource construction or hardware resource management. Roughly, this means:

  • Hardware is in static, pre-allocated pools in Almanac.
  • Push as much dynamic/incremental allocation to later versions as possible.

Revisions and Commits

rP Phabricator
D14349
D14334
D14274
D14272
D14237
D14236
D14235
D14234
D14224
D14215
D14214
D14213
D14212
D14211
D14210
D14202
D14201
D14198
D14197
D14196
D14194
D14180
D14178
D14177
D14161
D14160
D14158
D14157
D14156
D14155
D14154
D14153
D14151
D14150
D14147
D14144
D14143

Event Timeline

There are a very large number of changes, so older changes are hidden.

These things are now complete:

  • Resources and leases have a real destruction phase.
  • Resources now have sensible policies.
  • Leases now have formal expiration behaviors.
  • Resource and lease statuses are now consistent (and are now strings).
  • Leases now have a resourcePHID.
  • Blueprints can be disabled, preventing them from allocating new resources or acquiring new leases.
  • Various UI improvements.

These things remain:

  • Logging is still mostly untouched and way under the level it should be at.
  • Yields / temporary failures / permanent failures are still very coarse.
  • Lease policies are still a bit odd.
  • Resources still do not have expiration behaviors.
  • Security landscape isn't documented yet.
  • All the direct writes for phase changes are still non-transactional.
  • Recovery/retry behavior is pretty good if failures happen right away, but not as good if something allocates and then breaks later.

Broadly, T7399 has progressed far enough to let code run on the sbuild tier without fear that we're potentially leaking live cluster key material. I configured blueprints and a build plan on this host and we can now successfully execute "builds":

https://secure.phabricator.com/harbormaster/build/9161/

These "builds" have a lot of weird stuff going on still: for example, we queue a working copy lease, it acquires about 500ms later, then we spend 14500ms waiting around for no reason and 500ms doing the "build". However, this is easy to fix by tweaking yield heuristics or letting the allocator awaken the Harbormaster worker after allocation (which might be trivial).
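The idle time in that trace is what a fixed yield interval would produce: if the worker checks the lease, yields for a fixed period, and the lease acquires shortly after the check, nearly the whole period is spent waiting. A minimal Python model of that arithmetic follows. It assumes a 15000ms yield (inferred from 500ms + 14500ms, not stated above), and `idle_after_acquire` is a hypothetical name, not part of Drydock or Harbormaster.

```python
def idle_after_acquire(acquire_ms, yield_ms):
    """Time a lease sits acquired before a polling worker notices it.

    Assumes the worker polls at t = 0, yield_ms, 2 * yield_ms, ... and
    proceeds on the first poll at or after acquire_ms.
    """
    if acquire_ms <= 0:
        return 0
    polls_needed = -(-acquire_ms // yield_ms)  # ceiling division
    first_poll_after = polls_needed * yield_ms
    return first_poll_after - acquire_ms


# Lease acquires after 500ms, worker yields for an assumed 15000ms.
print(idle_after_acquire(500, 15000))  # 14500

# A shorter yield shrinks the wasted time.
print(idle_after_acquire(500, 1000))  # 500
```

Under this model, a shorter yield reduces the waste but never removes it. Having the allocator wake the worker on acquisition would make the idle time independent of the yield interval.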


The major blockers for T9123 (building Phabricator in Phabricator) on the Drydock side are:

  • Blueprint/resource selection, per above. We don't want normal builds running on the saux (higher-trust) tier, but there's currently no way to prevent it. This doesn't block T9123 but does block T182.
  • Running arc is harder for us than for other projects: we can't just put arc on the host in $PATH when building libphutil or arcanist, since it needs to run the version of arc being tested. We also want to use libphutil and arcanist at HEAD of master, even when running tests on phabricator -- not just whatever was last deployed to the box. I don't have a concrete plan for this yet. I think it probably takes the form of letting WorkingCopy blueprints be collections of working copies instead of single working copies.

> Resources still do not have expiration behaviors.

This is fixed, with some caveats about the range of capabilities; see D14176.

> we queue a working copy lease, it acquires about 500ms later, then we spend 14500ms waiting around for no reason and 500ms doing the "build".

This is fixed, and we now do acquire + activate + "build" + release in ~1-2 seconds for libphutil/ on this host.

> I think it probably takes the form of letting WorkingCopy blueprints be collections of working copies instead of single working copies.

This part is implemented now, although I haven't figured out how users are going to configure it.

> Blueprint/resource selection, per above.

I have some ideas on this but nothing concrete yet.

> Logging is still mostly untouched and way under the level it should be at.
> Yields / temporary failures / permanent failures are still very coarse.
> Recovery/retry behavior is pretty good if failures happen right away, but not as good if something allocates and then breaks later.

This stuff is still highly sketchy and probably up next.

@epriestley I just tried to start reimplementing some of my patches on top of HEAD and I've run into a problem:

I need to yield within allocateResource, because the blueprint needs to wait for the IP address to be assigned to the host, but we can't call sleep. However, the allocateResource method doesn't have a resource because it creates one, and presumably if I yield, there's no guarantee that the allocator will continue in the same place?

Allocate, but don't setActivateWhenAllocated(). You'll get a callback to activateResource() later. Check for an IP. If you have one, set it on the resource and call activateResource() to finish activation. If you don't have one yet, throw a yield and you'll get another call later. Repeat until you get an IP. Does that sound approximately reasonable?

e.g.

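public function activateResource(
  DrydockBlueprint $blueprint,
  DrydockResource $resource) {

  // Identifier stored on the resource during allocateResource().
  $ec2_key = $resource->getAttribute('key-in-ec2');

  // Placeholder: stands in for whatever call checks EC2 for an assigned IP.
  $ip = hey_ec2_is_there_an_ip_yet($ec2_key);
  if (!$ip) {
    // No IP yet: yield so the worker calls activateResource() again later.
    // The 15 second delay is an arbitrary example value.
    throw new PhabricatorWorkerYieldException(15);
  }

  $resource
    ->setAttribute('ip', $ip)
    ->activateResource();
}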
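To make the "repeat until you get an IP" control flow concrete, here is a small standalone Python model of the allocate / yield / activate cycle. It is a toy, not Drydock code: `YieldException`, `FakeCloud`, `run_until_active`, the dictionary-based resource, and the IP address are all hypothetical stand-ins, and a real task queue would reschedule the task after a delay rather than loop immediately.

```python
class YieldException(Exception):
    """Stand-in for a worker yield: 'not ready, call me again later'."""


class FakeCloud:
    """Hypothetical provider that assigns an IP after a few polls."""

    def __init__(self, polls_until_ready):
        self.polls_remaining = polls_until_ready

    def lookup_ip(self, key):
        if self.polls_remaining > 0:
            self.polls_remaining -= 1
            return None
        return "10.0.0.7"


def allocate_resource(key):
    # Allocate without activating: the resource starts out pending.
    return {"key": key, "status": "pending", "ip": None}


def activate_resource(resource, cloud):
    ip = cloud.lookup_ip(resource["key"])
    if ip is None:
        raise YieldException()  # no IP yet, ask to be called again
    resource["ip"] = ip
    resource["status"] = "active"


def run_until_active(resource, cloud, max_attempts=10):
    """Toy worker loop: re-invoke activation after each yield."""
    for attempt in range(1, max_attempts + 1):
        try:
            activate_resource(resource, cloud)
            return attempt
        except YieldException:
            continue  # a real queue would reschedule the task for later
    raise RuntimeError("resource never activated")


resource = allocate_resource("i-example")
attempts = run_until_active(resource, FakeCloud(polls_until_ready=2))
print(attempts, resource["status"], resource["ip"])  # 3 active 10.0.0.7
```

The point of the split is that all state lives on the resource, so each activation attempt can start from scratch and nothing depends on the allocator resuming "in the same place".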

Couple of issues I've seen so far:

  • If a WorkingCopy build step is restarted while getting a working copy, it doesn't clean up the lease. This is because we don't emit an artifact until the very end. We either need to emit the artifact sooner or have a separate cleanup step for other target resources. I'm inclined to just emit the artifact sooner. The build won't move forward until the build step completes, anyway, so it's OK that there's no formal "incomplete artifact" state.
  • If we try to run two concurrent builds, the WorkingCopy blueprint is currently fine with bringing up an unlimited number of resources, but hosts are currently limited to one lease. This can give us resources which will never activate, since they're waiting for a host indefinitely. These limits don't make sense as-is anyway, but this interaction is sort of subtle and may need some finesse to resolve.
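The second issue can be illustrated with a small counting model in Python. It assumes each working-copy resource holds one host lease for as long as the resource exists (an assumption made for illustration, not something stated above), and `bring_up_working_copies` and its parameters are hypothetical names.

```python
def bring_up_working_copies(requests, host_count, leases_per_host=1,
                            working_copy_limit=None):
    """Return (active, stuck) working-copy resource counts.

    Assumption: each working-copy resource holds one host lease for its whole
    lifetime, so a resource that cannot get a host lease stays pending.
    """
    host_slots = host_count * leases_per_host
    if working_copy_limit is None:
        created = requests  # blueprint brings up one resource per request
    else:
        created = min(requests, working_copy_limit)
    active = min(created, host_slots)
    stuck = created - active
    return active, stuck


# Two concurrent builds, one host, one lease per host, no working-copy limit:
print(bring_up_working_copies(2, host_count=1))  # (1, 1)

# Capping working copies at the host capacity avoids the stuck resource.
print(bring_up_working_copies(2, host_count=1, working_copy_limit=1))  # (1, 0)
```

In the capped case the second build would wait for a lease on the existing working copy instead of creating a resource that can never activate, which is one way the two limits could be made consistent.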

We are now building all of the repositories, and all revisions submitted by members of Community.

This stuff is now fixed:

  • Harbormaster and Drydock now guarantee destruction of leases despite aborts/releases.
  • We no longer degrade if there is a burst of requests, but see substantial discussion in D14236 about refining this in the future.
  • Logging is better, although it still needs some work.
  • I'm probably not going to make state-change writes transactional in v1, since logging does a pretty reasonable job of covering that now.
  • Error handling and distinguishing between temporary and permanent failures is greatly improved. It will still take some time to stabilize, but recent issues have been about cleaning up edge cases, not fundamental mishandling of error states.
  • We're better about dealing with some kinds of resource breaks after activation. These breaks are hard to encounter in the upstream today (all reasonable breaks require operator intervention to resolve anyway) so I don't expect to make this too much more robust in the short term.
  • There's a tiny bit of documentation.

This stuff still needs work:

  • Per above, log observability is better but still isn't great.
  • Documentation is still mostly nonexistent.
  • Blueprint/resource selection stuff still doesn't meaningfully exist.
  • A bunch of limits (mostly, see D14236) are hard-coded and set to nonsense values (usually "1").
  • Lease policies are still a bit odd (although maybe they're just always going to be a bit odd?)

Progress here:

  • There's now a little bit more documentation.
  • Blueprint selection feels reasonable for v1 generally (see T9519 for discussion).
  • I think after D14272 + D14274 the allocator behaves correctly in production (on this one install, in a very limited role, etc). It's a little early to say that I actually fixed all the bugs, but the behavior appeared nearly correct before and the effects of the bugs those changes fixed were pretty straightforward.

Overall, except for the stuff fixed above, things have been working well for a while. Drydock now handles multiple task types (revision builds, commit builds, lands) across multiple pools (saux, sbuild) and seems to be functioning as designed. By all appearances, we could dump as much hardware into these pools as we wanted and scale until MySQL eventually falls over as a coordination server.

Stuff I'm still looking at:

  • Logs will probably get a little more work, although they've been not-terrible for the last few issues I've hit.
  • I'll continue fleshing out the documentation; it's about halfway to where it probably should be for an unprototype.
  • Not really concerned about lease policies for now, probably a v2+ thing if we deal with it.
  • I need to move the hard-coded limits of 1 to config. That's easy, but I want to think about what it will look like in v2/v3 and try to make sure we're moving in that direction rather than somewhere we'll need to migrate away from later.
  • Resource cleanup/release has a big manual component for now but that actually feels fine in practice today. This isn't scalable in the long term but I'm not overly concerned about solving it completely for v1.

epriestley claimed this task.

I'm going to close this out: I think we're generally in good shape here, we're now hosting builds and doing server-side lands in the upstream, and I don't anticipate much more Drydock-specific work in this iteration.

I'm not actually unprototyping Drydock yet (and may not for a while), since we also have to unprototype Almanac for it to be useful and they both interact with Phacility. I want to let things stabilize for a while before we try to do that integration.