
Unprototype Drydock (v1)
Closed, Resolved, Public

Description

Drydock is a resource allocation system for hardware and software. It is mostly an infrastructure component which supports other applications, not an application that normal users are expected to interact with much.

The primary use case for Drydock is creating, managing, and destroying repository working copies for build systems. In particular, these are the short-term use cases:

  • (T9123) Harbormaster should be able to ask Drydock to give it a working copy containing an arbitrary commit, then run build processes in that working copy.
  • (T182) Differential should be able to ask Drydock to give it a working copy so it can commit a revision.

In the long term, Drydock will be able to build resources incrementally: you tell it how to allocate hosts and other hardware resources, and it manages pools of hardware and software to satisfy these requests.

For v1, the focus is on enabling T9123 + T182 by allocating working copies, not incremental resource construction or hardware resource management. Roughly, this means:

  • Hardware is in static, pre-allocated pools in Almanac.
  • Push as much dynamic/incremental allocation to later versions as possible.
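Here is a minimal sketch of what a static, pre-allocated pool means for v1 allocation. It assumes a fixed set of hosts, each with a lease limit. The allocator only hands out leases on hosts that already exist and never builds new hardware. `pickHost()`, `acquireLease()` and the host names are hypothetical, not Drydock or Almanac APIs.

```php
<?php

// Hypothetical static pool: hosts are registered ahead of time (for example,
// in Almanac). Each host has a fixed lease limit and a count of held leases.
function pickHost(array $pool) {
  foreach ($pool as $name => $host) {
    if ($host['leases'] < $host['limit']) {
      return $name;
    }
  }
  // No free capacity: in this v1-style model the caller waits instead of
  // building new hardware.
  return null;
}

function acquireLease(array &$pool) {
  $name = pickHost($pool);
  if ($name !== null) {
    $pool[$name]['leases']++;
  }
  return $name;
}

$pool = array(
  'build001' => array('limit' => 1, 'leases' => 0),
  'build002' => array('limit' => 1, 'leases' => 0),
);

var_dump(acquireLease($pool)); // string(8) "build001"
var_dump(acquireLease($pool)); // string(8) "build002"
var_dump(acquireLease($pool)); // NULL: pool exhausted, caller must wait
```

Returning `null` when the pool is full reflects the v1 choice to wait rather than allocate incrementally. A later version could create a host at that point instead.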

Details

Commits
D14349 / rPa763f9510e76: Add some Drydock documentation plus "Test Configuration" for repository…
D14334 / rPc059149eb98e: Remove Drydock host resource limits and give working copies simple limits
D14274 / rPac7edf54afe4: Fix bad counting in SQL when enforcing Drydock allocator soft limits
D14272 / rP083a321dad1b: Fix an issue where newly created Drydock resources could be improperly acquired
D14237 / rP4d5278af1148: Put Drydock build steps into their own group in Harbormaster
D14236 / rPee937e99fb9a: Fix unbounded expansion of allocating resource pool
D14235 / rPde2bbfef7d14: Allow PhabricatorWorker->queueTask() to take full $options
D14234 / rP4cf1270ecdd8: In Harbormaster, make sure artifacts are destroyed even if a build is aborted
D14224 / rPbb4667cb8490: Fix WorkingCopy step to read correct commit variables
D14215 / rPc95fcb8970ca: Add a little Drydock documentation
D14214 / rP449617692489: Add staging area support to Harbormaster/Drydock + various fixes
D14213 / rPd4a0b1c8709b: Remove names from Drydock resources
D14212 / rPb219bcfb3d70: Improve error and exception handling for Drydock leases
D14211 / rPe589d152310a: Improve error and exception handling for Drydock resources
D14210 / rP6b775e609053: Add more Drydock log types and some additional logging
D14202 / rP4ac82be5ed22: Merge the DrydockLease workers into a single worker
D14201 / rP91e5ca0ee28c: Merge the DrydockResource workers into a single worker
D14198 / rP8bf59050247d: Add Drydock log types and more logging
D14197 / rP06f927250290: Garbage collect Drydock logs after 30 days
D14196 / rP2ef5b5321d1f: Move Drydock logs to PHIDs and increased structure
D14194 / rP9d997df9643b: Reset Drydock git working copies better
D14180 / rP33be8f719ff3: Allow WorkingCopy resources to have multiple working copies
D14178 / rP9b29d46e60f3: Make Drydock lease infrastructure more nimble
D14177 / rPcd2dd2a08f81: Give visual feedback when a Drydock resource or lease is releasing
D14161 / rPd735c7adf2d5: Allow Harbormaster to run commands on Drydock working copies
D14160 / rP284fe0fe51ce: Allow Harbormaster to lease working copies from Drydock
D14158 / rP64ed97103993: Show recent active leases on Drydock resource detail
D14157 / rP3b2f4c258f1b: Show recent active resources on Drydock blueprint detail, with link to all
D14156 / rPb441e8b81e31: Allow Drydock blueprints to be disabled
D14155 / rP1491269b72e4: Modernize Drydock SearchEngine implementations
D14154 / rPb71ce90b9cc1: Straighten out Drydock policies for Resources
D14153 / rPe117ace8c7fb: Convert Drydock lease and resource constants to strings
D14151 / rPc6aade439283: Give Drydock leases a resourcePHID instead of a resourceID
D14150 / rP309aadc595a1: Rename Drydock Lease STATUS_EXPIRED to STATUS_DESTROYED
D14147 / rPfcb6d1e2faa5: Strip some obsolete code out of Drydock
D14144 / rP1f311d64c608: Give Drydock resources and leases a real "destroy" lifecycle phase
D14143 / rP789df89c84b5: Add a command queue to Drydock to manage lease/resource release


Event Timeline


These things are now complete:

  • Resources and leases have a real destruction phase.
  • Resources now have sensible policies.
  • Leases now have formal expiration behaviors.
  • Resource and lease statuses are now consistent (and are now strings).
  • Leases now have a resourcePHID.
  • Blueprints can be disabled, preventing them from allocating new resources or acquiring new leases.
  • Various UI improvements.
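The sketch below models a lease as a small state machine. It illustrates what a lifecycle with string statuses and an explicit destruction phase might look like. The status names and the transition table are assumptions for illustration, not Drydock's actual definitions.

```php
<?php

// Illustrative lease lifecycle using string statuses. The names and allowed
// transitions are assumptions for this sketch, not Drydock's exact tables.
final class LeaseLifecycle {
  const PENDING   = 'pending';
  const ACQUIRED  = 'acquired';
  const ACTIVE    = 'active';
  const RELEASED  = 'released';
  const BROKEN    = 'broken';
  const DESTROYED = 'destroyed';

  // Every path ends in an explicit "destroyed" phase.
  private static $transitions = array(
    self::PENDING   => array(self::ACQUIRED, self::BROKEN),
    self::ACQUIRED  => array(self::ACTIVE, self::RELEASED, self::BROKEN),
    self::ACTIVE    => array(self::RELEASED, self::BROKEN),
    self::RELEASED  => array(self::DESTROYED),
    self::BROKEN    => array(self::DESTROYED),
    self::DESTROYED => array(),
  );

  private $status = self::PENDING;

  public function getStatus() {
    return $this->status;
  }

  public function moveTo($next) {
    $allowed = self::$transitions[$this->status];
    if (!in_array($next, $allowed, true)) {
      throw new LogicException(
        "Illegal transition: {$this->status} -> {$next}");
    }
    $this->status = $next;
    return $this;
  }
}

$lease = new LeaseLifecycle();
$lease->moveTo(LeaseLifecycle::ACQUIRED)
  ->moveTo(LeaseLifecycle::ACTIVE)
  ->moveTo(LeaseLifecycle::RELEASED)
  ->moveTo(LeaseLifecycle::DESTROYED);
echo $lease->getStatus(), "\n"; // destroyed
```

String values are readable directly in the database and in logs, which is one plausible benefit of converting the constants (D14153).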

These things remain:

  • Logging is still mostly untouched and way under the level it should be at.
  • Yields / temporary failures / permanent failures are still very coarse.
  • Lease policies are still a bit odd.
  • Resources still do not have expiration behaviors.
  • Security landscape isn't documented yet.
  • All the direct writes for phase changes are still non-transactional.
  • Recovery/retry behavior is pretty good if failures happen right away, but not as good if something allocates and then breaks later.

Broadly, T7399 has progressed far enough to let code run on the sbuild tier without fear that we're potentially leaking live cluster key material. I configured blueprints and a build plan on this host and we can now successfully execute "builds":

https://secure.phabricator.com/harbormaster/build/9161/

These "builds" have a lot of weird stuff going on still: for example, we queue a working copy lease, it acquires about 500ms later, then we spend 14500ms waiting around for no reason and 500ms doing the "build". However, this is easy to fix by tweaking yield heuristics or letting the allocator awaken the Harbormaster worker after allocation (which might be trivial).
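The 500ms / 14500ms split suggests the build step yields for a fixed 15 seconds and only notices the acquired lease when that interval expires. Under that assumption (the 15-second interval is inferred from the numbers, not confirmed from the code), the arithmetic looks like this:

```php
<?php

// Rough latency model for a build step waiting on a lease. Assumes the
// worker sleeps in fixed yield intervals and re-checks only when one expires.
function latencyWithPolling($acquire_ms, $yield_ms, $build_ms) {
  // The worker notices the acquired lease only at the next yield boundary.
  $wait_ms = (int)(ceil($acquire_ms / $yield_ms) * $yield_ms);
  return $wait_ms + $build_ms;
}

// If the allocator wakes the worker as soon as the lease is acquired, the
// idle remainder of the yield interval disappears.
function latencyWithWakeup($acquire_ms, $build_ms) {
  return $acquire_ms + $build_ms;
}

echo latencyWithPolling(500, 15000, 500), "\n"; // 15500
echo latencyWithWakeup(500, 500), "\n";         // 1000
```

Shortening the yield interval reduces the idle time only in proportion to the interval. A wakeup removes it, at the cost of coupling the allocator to the waiting worker.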


The major blockers for T9123 (building Phabricator in Phabricator) on the Drydock side are:

  • Blueprint/resource selection, per above. We don't want normal builds running on the saux (higher-trust) tier, but there's currently no way to prevent it. This doesn't block T9123 but does block T182.
  • Running arc is harder for us than for other projects. We can't just put arc on the host in $PATH when building libphutil or arcanist, since the build needs to run the version of arc being tested. We also want to use libphutil and arcanist at HEAD of master even when running tests on phabricator, not just whatever was last deployed to the box. I don't have a concrete plan for this yet. I think it probably takes the form of letting WorkingCopy blueprints be collections of working copies instead of single working copies.
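One way to express the selection constraint is to combine an explicit allow-list per request with the existing "disabled" flag. This is a hypothetical sketch. The blueprint names, the `tier` field and `selectBlueprints()` are illustrative, and the eventual mechanism may differ.

```php
<?php

// Hypothetical blueprint selection: a lease request may only be satisfied by
// blueprints that are enabled and explicitly allowed for that request.
// The 'tier' field is descriptive only and is not used by the filter.
function selectBlueprints(array $blueprints, array $allowed_names) {
  $result = array();
  foreach ($blueprints as $name => $blueprint) {
    if (!$blueprint['enabled']) {
      continue; // Disabled blueprints take on no new work.
    }
    if (!in_array($name, $allowed_names, true)) {
      continue; // Not permitted for this kind of request.
    }
    $result[] = $name;
  }
  return $result;
}

$blueprints = array(
  'sbuild-working-copies' => array('enabled' => true,  'tier' => 'sbuild'),
  'saux-working-copies'   => array('enabled' => true,  'tier' => 'saux'),
  'old-build-hosts'       => array('enabled' => false, 'tier' => 'sbuild'),
);

// A normal build step lists only the lower-trust blueprint, so it can never
// land on the higher-trust saux tier.
print_r(selectBlueprints($blueprints, array('sbuild-working-copies')));
```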

> Resources still do not have expiration behaviors.

This is fixed, with some caveats about the range of capabilities in D14176.

> we queue a working copy lease, it acquires about 500ms later, then we spend 14500ms waiting around for no reason and 500ms doing the "build".

This is fixed, and we now do acquire + activate + "build" + release in ~1-2 seconds for libphutil/ on this host.

> I think it probably takes the form of letting WorkingCopy blueprints be collections of working copies instead of single working copies.

This part is implemented now, although I haven't figured out how users are going to configure it.

> Blueprint/resource selection, per above.

I have some ideas on this but nothing concrete yet.

> Logging is still mostly untouched and way under the level it should be at.
> Yields / temporary failures / permanent failures are still very coarse.
> Recovery/retry behavior is pretty good if failures happen right away, but not as good if something allocates and then breaks later.

This stuff is still highly sketchy and probably up next.

J5lx added a subscriber: J5lx. Sep 29 2015, 9:07 PM

@epriestley I just tried to start reimplementing some of my patches on top of HEAD and I've run into a problem:

I need to yield within allocateResource, because the blueprint needs to wait for the IP address to be assigned to the host, but we can't call sleep. However, the allocateResource method doesn't have a resource because it creates one, and presumably if I yield, there's no guarantee that the allocator will continue in the same place?

Allocate, but don't setActivateWhenAllocated(). You'll get a callback to activateResource() later. Check for an IP. If you have one, set it on the resource and call activateResource() to finish activation. If you don't have one yet, throw a yield and you'll get another call later. Repeat until you get an IP. Does that sound approximately reasonable?

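For example:

public function activateResource(
  DrydockBlueprint $blueprint,
  DrydockResource $resource) {

  $ec2_key = $resource->getAttribute('key-in-ec2');

  // Placeholder for whatever call asks EC2 whether the host has an IP yet.
  $ip = hey_ec2_is_there_an_ip_yet($ec2_key);
  if (!$ip) {
    // Not ready: yield, and the daemon will call activateResource() again
    // after roughly this many seconds.
    throw new PhabricatorWorkerYieldException(15);
  }

  $resource
    ->setAttribute('ip', $ip)
    ->activateResource();
}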
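The same control flow can be run without Phabricator. This standalone simulation replaces the yield exception, the EC2 lookup and the task queue with hypothetical stand-ins (`FakeYield`, `FakeCloud`, `runWorker()`). It shows activation being retried until an IP appears:

```php
<?php

// Hypothetical stand-ins for the daemon's yield exception, the EC2 API and
// the task queue.
class FakeYield extends Exception {}

class FakeCloud {
  private $checks_until_ip;

  public function __construct($checks_until_ip) {
    $this->checks_until_ip = $checks_until_ip;
  }

  // Returns null until the host has been assigned an address.
  public function getIP() {
    if ($this->checks_until_ip-- > 0) {
      return null;
    }
    return '10.0.0.5';
  }
}

function activateFakeResource(FakeCloud $cloud, array &$resource) {
  $ip = $cloud->getIP();
  if ($ip === null) {
    throw new FakeYield(); // Not ready: ask to be called again later.
  }
  $resource['ip'] = $ip;
  $resource['status'] = 'active';
}

// The queue keeps re-running the task until it stops yielding.
function runWorker(FakeCloud $cloud, array &$resource) {
  $attempts = 0;
  while (true) {
    $attempts++;
    try {
      activateFakeResource($cloud, $resource);
      return $attempts;
    } catch (FakeYield $ex) {
      // A real queue would wait for the yield duration here.
    }
  }
}

$resource = array('status' => 'pending');
$attempts = runWorker(new FakeCloud(2), $resource);
echo "{$resource['status']} {$resource['ip']} after {$attempts} attempts\n";
// active 10.0.0.5 after 3 attempts
```

The resource never needs to remember where it left off. Each call re-checks external state from scratch, which is why yielding from activation is safe even though allocation itself cannot yield.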

A couple of issues I've seen so far:

  • If a WorkingCopy build step is restarted while getting a working copy, it doesn't clean up the lease. This is because we don't emit an artifact until the very end. We either need to emit the artifact sooner or have a separate cleanup step for other target resources. I'm inclined to just emit the artifact sooner. The build won't move forward until the build step completes, anyway, so it's OK that there's no formal "incomplete artifact" state.
  • If we try to run two concurrent builds, the WorkingCopy blueprint is currently fine with bringing up an unlimited number of resources, but hosts are currently limited to one lease. This can give us resources which will never activate, since they're waiting for a host indefinitely. These limits don't make sense as-is anyway, but this interaction is sort of subtle and may need some finesse to resolve.
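The second interaction can be reproduced in a toy model. Builds each request a working copy, working copies need a host lease, and hosts cap leases at one. The function below is illustrative only. Capping working copies at host capacity is one possible resolution, not necessarily the one Drydock uses, although D14334 ("give working copies simple limits") appears to move in this direction.

```php
<?php

// Toy model: each build wants a working copy, and each working copy needs a
// lease on a host. All names and numbers here are illustrative.
function simulate($builds, $hosts, $leases_per_host, $max_working_copies) {
  $capacity = $hosts * $leases_per_host;
  // Without a cap, the blueprint brings up one working copy per build.
  $working_copies = ($max_working_copies === null)
    ? $builds
    : min($builds, $max_working_copies);

  $activated = min($working_copies, $capacity);
  return array(
    'activated' => $activated,
    'stuck' => $working_copies - $activated, // waiting on a host forever
    'queued' => $builds - $working_copies,   // waiting for an existing copy
  );
}

// Two builds, one host with a one-lease limit, no working-copy limit:
print_r(simulate(2, 1, 1, null)); // activated 1, stuck 1, queued 0

// Capping working copies at host capacity turns "stuck" into "queued":
print_r(simulate(2, 1, 1, 1));    // activated 1, stuck 0, queued 1
```

A "queued" build eventually runs once the existing working copy is free. A "stuck" resource never activates, which is why the limits need to agree across layers.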
gabe added a subscriber: gabe. Oct 4 2015, 12:17 AM

We are now building all of the repositories, and all revisions submitted by members of Community.

This stuff is now fixed:

  • Harbormaster and Drydock now guarantee destruction of leases despite aborts/releases.
  • We no longer degrade if there is a burst of requests, but see substantial discussion in D14236 about refining this in the future.
  • Logging is better, although still needs some work.
  • I'm probably not going to make state-change writes transactional in v1, since logging does a pretty reasonable job of covering that now.
  • Error handling and distinguishing between temporary and permanent failures is greatly improved. It will still take some time to stabilize, but recent issues have been about cleaning up edge cases, not fundamental mishandling of error states.
  • We're better about dealing with some kinds of resource breaks after activation. These breaks are hard to encounter in the upstream today (all reasonable breaks require operator intervention to resolve anyway) so I don't expect to make this too much more robust in the short term.
  • There's a tiny bit of documentation.
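The burst behavior can be sketched as a soft limit on resources that are still allocating. Once a blueprint has an allocation in flight, further requests wait for it instead of each starting another. This is a hypothetical model in the spirit of D14236, with the limit hard-coded to 1. The real allocator logic may differ.

```php
<?php

// Hypothetical soft limit: at most MAX_ALLOCATING resources may be in the
// "pending" (still allocating) state at once.
const MAX_ALLOCATING = 1;

function handleLeaseRequest(array &$resources) {
  // Reuse an active resource with free capacity if one exists.
  foreach ($resources as &$resource) {
    if ($resource['status'] === 'active' && $resource['free'] > 0) {
      $resource['free']--;
      return 'acquired';
    }
  }
  unset($resource); // Break the reference before reusing the variable.

  $allocating = 0;
  foreach ($resources as $resource) {
    if ($resource['status'] === 'pending') {
      $allocating++;
    }
  }

  if ($allocating >= MAX_ALLOCATING) {
    return 'yield'; // Wait for an in-flight allocation instead of adding more.
  }

  $resources[] = array('status' => 'pending', 'free' => 0);
  return 'allocating';
}

// A burst of five requests against an empty pool starts only one allocation.
$resources = array();
$results = array();
for ($i = 0; $i < 5; $i++) {
  $results[] = handleLeaseRequest($resources);
}
echo implode(', ', $results), "\n"; // allocating, yield, yield, yield, yield
```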

This stuff still needs work:

  • Per above, log observability is better but still isn't great.
  • Documentation is still mostly nonexistent.
  • Blueprint/resource selection stuff still doesn't meaningfully exist.
  • A bunch of limits (mostly, see D14236) are hard-coded and set to nonsense values (usually "1").
  • Lease policies are still a bit odd (although maybe they're just always going to be a bit odd?)
vhbit added a subscriber: vhbit. Oct 9 2015, 8:04 AM
epriestley moved this task from Preflight to Paused on the Prioritized board. Oct 10 2015, 1:30 PM

Progress here:

  • There's now a little bit more documentation.
  • Blueprint selection feels reasonable for v1 generally (see T9519 for discussion).
  • I think after D14272 + D14274 the allocator behaves correctly in production (on this one install, in a very limited role, etc). It's a little early to say that I actually fixed all the bugs, but the behavior appeared nearly correct before and the effects of the bugs those changes fixed were pretty straightforward.

Overall, except for the stuff fixed above, things have been working well for a while. Drydock now handles multiple task types (revision builds, commit builds, lands) across multiple pools (saux, sbuild) and seems to be functioning as designed. By all appearances, we could dump as much hardware into these pools as we wanted and scale until MySQL eventually falls over as a coordination server.

Stuff I'm still looking at:

  • Logs will probably get a little more work, although they've been not-terrible for the last few issues I've hit.
  • I'll continue fleshing out the documentation; it's about halfway to where it probably should be for an unprototype.
  • Not really concerned about lease policies for now, probably a v2+ thing if we deal with it.
  • I need to move the hard-coded limits of 1 to config. That's easy, but I want to think about what it will look like in v2/v3 and try to make sure we're moving in that direction rather than somewhere we'll need to migrate away from later.
  • Resource cleanup/release has a big manual component for now but that actually feels fine in practice today. This isn't scalable in the long term but I'm not overly concerned about solving it completely for v1.
jra3 added a subscriber: jra3. Oct 14 2015, 8:50 PM
epriestley closed this task as Resolved. Oct 27 2015, 9:54 PM
epriestley claimed this task.

I'm going to close this out. I think we're generally in good shape here: we're now hosting builds and doing server-side lands in the upstream, and I don't anticipate much more Drydock-specific work in this iteration.

I'm not actually unprototyping Drydock yet (and may not for a while) since we also have to unprototype Almanac for it to be useful and they both interact with Phacility. I want to let it stabilize for a while first before we try to do that integration.

eadler added a subscriber: eadler. Dec 16 2015, 8:16 PM
urzds added a subscriber: urzds. Jul 12 2017, 11:12 AM