
Implement a rough AlmanacService blueprint in Drydock
Closed, Public

Authored by epriestley on Sep 16 2015, 1:03 PM.

Details

Summary

Ref T9253. Broadly, this realigns Allocator behavior to be more consistent, more straightforward, and more amenable to intended future changes.

This attempts to make language more consistent: resources are "allocated" and leases are "acquired".

This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will eventually need to perform other kinds of locking, this does feel like a good fit for most of the locking that blueprints need to do.

In particular, I've made the blueprint operations on $resource and $lease objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like:

$lease
  ->setActivateWhenAcquired(true)
  ->needSlotLock('x')
  ->needSlotLock('y')
  ->acquireOnResource($resource);

In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward.
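
As a rough illustration of the idea (not the Drydock implementation, which this diff does not include), optimistic slot locking can be modeled as "take every named slot or take none, without waiting". The sketch below is in Python rather than PHP so it stays self-contained. `SlotLockTable`, `SlotLockError`, and the `Lease` class are hypothetical names, and the method names only mirror the proposed API above. It is single-threaded; a real implementation would need the check-and-take step to be atomic, for example via a unique key in a database table.

```python
class SlotLockError(Exception):
    """Raised when a requested slot is already held by another owner."""


class SlotLockTable:
    """Hypothetical in-memory stand-in for a table keyed uniquely by slot name."""

    def __init__(self):
        self._owners = {}  # slot name -> owner id

    def acquire_all(self, owner, slots):
        # Optimistic: never wait. Either every slot is free and we take
        # them all, or we take none and report the conflict.
        taken = [s for s in slots if s in self._owners]
        if taken:
            raise SlotLockError(f"slots already held: {sorted(taken)}")
        for s in slots:
            self._owners[s] = owner

    def release_all(self, owner):
        self._owners = {s: o for s, o in self._owners.items() if o != owner}


class Lease:
    """Hypothetical lease that is configured first, then acquired."""

    def __init__(self, lease_id, locks):
        self.lease_id = lease_id
        self.locks = locks
        self.slots = []
        self.activate_when_acquired = False
        self.status = "pending"
        self.resource = None

    def set_activate_when_acquired(self, flag):
        self.activate_when_acquired = flag
        return self

    def need_slot_lock(self, slot):
        self.slots.append(slot)
        return self

    def acquire_on_resource(self, resource):
        # Fails (and changes nothing) if any needed slot is already held.
        self.locks.acquire_all(self.lease_id, self.slots)
        self.resource = resource
        self.status = "active" if self.activate_when_acquired else "acquired"
        return self


locks = SlotLockTable()
first = (
    Lease("lease-1", locks)
    .set_activate_when_acquired(True)
    .need_slot_lock("x")
    .need_slot_lock("y")
    .acquire_on_resource("host-a")
)

# A second lease that needs slot "y" loses the race and stays pending.
second = Lease("lease-2", locks).need_slot_lock("y")
conflict = False
try:
    second.acquire_on_resource("host-a")
except SlotLockError:
    conflict = True
```

In this model the blueprint only declares which slots it needs; the conflict check lives in one place, which is the property that would make correct blueprints easy to write.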

This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes:

  • The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode.
  • The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster.

This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a sleep(15 * 60) in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks.
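
The difference between the two modes can be sketched with a toy allocator. This is a Python model under stated assumptions, not Drydock code: the `Allocator` and `Resource` classes, the status strings, and `run_setup_workers` are all hypothetical, and the "queue" is just an in-process deque standing in for a real worker queue.

```python
from collections import deque


class Resource:
    def __init__(self, name):
        self.name = name
        self.status = "pending"


class Allocator:
    """Hypothetical allocator showing both activation modes."""

    def __init__(self):
        self.setup_queue = deque()

    def allocate(self, name, activate_immediately):
        resource = Resource(name)
        if activate_immediately:
            # "Activate immediately": little or no setup, open right away.
            resource.status = "open"
        else:
            # "Allocate now, activate later": leave the resource pending
            # and queue the setup work for after the allocator exits.
            self.setup_queue.append(resource)
        return resource

    def run_setup_workers(self):
        # Stand-in for queued workers finishing their setup steps.
        while self.setup_queue:
            resource = self.setup_queue.popleft()
            resource.status = "open"


allocator = Allocator()
quick = allocator.allocate("almanac-host", activate_immediately=True)
slow = allocator.allocate("ec2-host", activate_immediately=False)
status_before_workers = slow.status
allocator.run_setup_workers()
```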

Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem).

The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier.

Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to.
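
To make the single-writer idea concrete, here is a minimal Python sketch of a command queue. It is an assumption-laden model, not Harbormaster's or Drydock's actual design: `ManagedObject`, `send_command`, `apply_commands`, and the `"release"` command are hypothetical names. Any actor may enqueue a message, but only the designated writer ever mutates the object.

```python
from collections import deque


class ManagedObject:
    """Hypothetical object with exactly one writer and a command queue."""

    def __init__(self, writer):
        self.writer = writer
        self.status = "active"
        self.commands = deque()

    def send_command(self, command):
        # Anyone may enqueue a command; this never changes object state.
        self.commands.append(command)

    def apply_commands(self, actor):
        # Only the single writer drains the queue and mutates state.
        if actor != self.writer:
            raise PermissionError(f"{actor} is not the writer")
        while self.commands:
            command = self.commands.popleft()
            if command == "release":
                self.status = "released"


lease = ManagedObject(writer="lease-worker")
lease.send_command("release")        # sent by some other process
status_after_send = lease.status     # unchanged until the writer runs

rejected = False
try:
    lease.apply_commands("web-request")
except PermissionError:
    rejected = True

lease.apply_commands("lease-worker")
```

Because every state change goes through one actor, two processes can never race to write the same object; they can only race to enqueue messages, which is harmless.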

This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself:

  • A bunch of code from that revision (e.g., interfaces) doesn't exist yet.
  • I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing.

This also removes bin/drydock create-resource, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and cannot be made correct.

NOTE: This technically works but doesn't do anything useful yet.

Test Plan

Used bin/drydock lease --type host to acquire leases against these blueprints.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision to Implement a rough AlmanacService blueprint in Drydock.
epriestley updated this object.
epriestley edited the test plan for this revision.
epriestley added reviewers: chad, hach-que.
hach-que edited edge metadata.
This revision is now accepted and ready to land. (Sep 16 2015, 3:47 PM)

One other thing I'm considering here is that we may eventually have resources whose capacity we can only learn by bringing them up. I think these will be rare, but you could imagine EC2 might have an API like "give me a really cheap machine -- I don't care what the specs are, just the cheapest source of compute resources currently available".

Allocating will always be easier if we know what's available ahead of time, so I think it's reasonable to expect the pipeline to degrade when the available pool of potential resources is a total mystery.

I'm not specifically provisioning for this "mystery box" case, but we can accommodate it as-written. One approach might be:

  • Blueprints allocate new resources only if there are no PENDING resources (or fewer than 2, or 3, or whatever makes sense).
  • Blueprints refuse to acquire leases on PENDING resources.

This is nonstandard, but will work fine in practice. The workflow would go something like this:

  1. Check for resources (see none or all full or whatever).
  2. Allocate a new resource (succeeds).
  3. Lease on a free resource (fails, resource not OPEN yet).

This is unusual, but can happen normally when allocators race (see T6074). It's fine for it to happen routinely with unusual blueprints. The allocation would fail and retry:

  1. Check for resources (see one).
  2. Lease on free resource (fails, resource not OPEN yet).

Again, unusual but fine. Also fine if we try to allocate again, since the blueprint would see that there's already a PENDING resource and deny the allocation. On the third retry:

  1. Check for resources (see one).
  2. Lease on free resource (succeeds, mystery box got opened up and we know about its properties now).

This all seems perfectly reasonable to me. It's even OK if the lease will sometimes fail on the open resource (e.g., the mystery box may have a too-small resource inside it) since we'll just go through the retry process again until we get it right.

This will take longer and involve more steps than if we knew about the resources ahead of time, but that's fine: the price you pay for allocating mystery boxes is more complexity in the allocation process.
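
The three-pass workflow above can be simulated in a few lines. This is only a sketch in Python under stated assumptions: `MysteryBlueprint`, `try_once`, the dictionary-based resources, and the `max_pending` limit are hypothetical, and "opening" the resource is done by hand where a real system would have a setup worker do it.

```python
class MysteryBlueprint:
    """Hypothetical blueprint following the two rules described above."""

    def __init__(self, max_pending=1):
        self.max_pending = max_pending
        self.resources = []

    def can_allocate(self):
        # Rule 1: allocate only if there are few enough PENDING resources.
        pending = [r for r in self.resources if r["status"] == "PENDING"]
        return len(pending) < self.max_pending

    def allocate(self):
        resource = {"status": "PENDING", "leased": False}
        self.resources.append(resource)
        return resource

    def acquire_lease(self, resource):
        # Rule 2: refuse to acquire leases on resources that are not OPEN.
        if resource["status"] != "OPEN" or resource["leased"]:
            return False
        resource["leased"] = True
        return True


def try_once(blueprint):
    """One allocator pass: check, maybe allocate, then try to lease."""
    free = [r for r in blueprint.resources if not r["leased"]]
    if not free and blueprint.can_allocate():
        free = [blueprint.allocate()]
    return any(blueprint.acquire_lease(r) for r in free)


blueprint = MysteryBlueprint()
history = [try_once(blueprint)]          # allocates, lease fails
denied = not blueprint.can_allocate()    # a second allocation is refused
history.append(try_once(blueprint))      # sees one resource, still PENDING

blueprint.resources[0]["status"] = "OPEN"  # the mystery box gets opened
history.append(try_once(blueprint))        # lease succeeds
```

Only one resource is ever allocated across the three passes, which is the point of refusing new allocations while one is still PENDING.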

This revision was automatically updated to reflect the committed changes.