
Determine the best way to prevent the "Lost an allocation race" exception in Drydock
Closed, Resolved, Public

Description

Because of the new locking mechanisms around Drydock allocation, this exception occurs much more frequently than it previously did.

I thought about just adding the "bespoke" status like the comment said, but then I think we run into this scenario:

  • Build A and build B both start requesting leases.
  • Build A obtains the allocator lock when determining how to grab the lease.
  • Build B is waiting.
  • Build A exits the lock having determined that it needs to allocate a resource, and now has one sitting in STATUS_ALLOCATING.
  • Build A moves the resource into STATUS_PENDING, which now allows other things to take a lease on it.
  • Build B takes a lease on the pending resource while Build A is still finishing setup (e.g., waiting for an EC2 instance to fully start).
  • Build A moves the resource into STATUS_OPEN.
  • Build A attempts to acquire a lease on the resource, but the blueprint only allows one lease per resource, so it loses the allocation race.
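
To make the race concrete, here is a minimal sketch of the check-then-act pattern that loses the race. This is not the actual Drydock code; the lease table, its schema, and the function are hypothetical.

<?php

// Both builds run this concurrently. The lease count is read before
// the other build's lease row exists, so on a one-lease resource both
// builds can pass the capacity check; the loser later fails with
// "Lost an allocation race".
function try_take_lease(PDO $db, $resource_id, $max_leases) {
  $stmt = $db->prepare(
    'SELECT COUNT(*) FROM lease WHERE resourceID = ?');
  $stmt->execute(array($resource_id));
  $count = (int)$stmt->fetchColumn();

  if ($count >= $max_leases) {
    return false;
  }

  // Window: another build can insert its lease right here, after our
  // read but before our write.
  $stmt = $db->prepare('INSERT INTO lease (resourceID) VALUES (?)');
  $stmt->execute(array($resource_id));
  return true;
}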

The "obvious" solution here is to disable taking leases on pending resources, but this results in a less desirable situation where if the blueprint allows more than 1 lease per resource, it will allocate more resources instead of leasing against pending resources (and resources generally equate to actual expenditure, whereas leases do not).

I wonder if the right solution here is just to do:

// Bind the lease to the resource while the allocator lock is still held.
$lease->setStatus(DrydockLeaseStatus::STATUS_ACQUIRING);
$lease->setResourceID($resource->getID());
$lease->save();

right after we move the resource into STATUS_ALLOCATING, and before the allocator exits the lock? This should prevent Build B from taking a lease against the pending resource, since there will already be a lease against it when Build B checks the lease count.
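
For illustration, here is roughly where that write would sit. PhabricatorGlobalLock is Phabricator's real lock primitive, but the lock name and the surrounding structure here are sketched, not the actual allocator code.

<?php

// Sketched allocator excerpt; lock name and flow are illustrative.
$lock = PhabricatorGlobalLock::newLock('drydock.allocator');
$lock->lock();

// Create the new resource in STATUS_ALLOCATING, as today.
$resource->setStatus(DrydockResourceStatus::STATUS_ALLOCATING);
$resource->save();

// Proposed addition: bind the lease to the resource before releasing
// the lock, so any other allocator counting leases already sees it.
$lease->setStatus(DrydockLeaseStatus::STATUS_ACQUIRING);
$lease->setResourceID($resource->getID());
$lease->save();

$lock->unlock();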

Event Timeline

hach-que renamed this task from Use "bespoke" status to prevent "Lost an allocation race" exception to Determine the best way to prevent the "Lost an allocation race" exception in Drydock.
hach-que claimed this task.
hach-que raised the priority of this task to Normal.
hach-que updated the task description.
hach-que added subscribers: epriestley, hach-que.

This block has been removed in D14114 and we no longer distinguish between "lost an allocation race" and "all resources are currently full". The logic goes:

  1. Find the pool of open resources.
  2. If there aren't any, allocate a new resource and add it to the pool.
  3. Try to acquire a lease on a resource in the pool.

The acquisition in (3) may fail if we're racing with other allocators: another allocator can eat up all the free space in the pool between the time we do the read in (1) or the write in (2) and the time we actually get around to acquiring the lease in (3).

This is perfectly fine, and occasionally expected if resource allocations are occurring at a high rate. We don't own anything in the pool, so it's OK if another allocator "steals" a resource that was free a moment ago. We just fail temporarily and retry. If blueprints are implemented properly, allocation still makes progress overall, because another allocator performed work.

Currently we fail fairly abruptly, but there's a TODO to smooth this out and make the temporary nature of the failure more explicit (it will likely end up being a yield, which just smooths the queue out). We could do other things, like seeing if we can add another resource to the pool immediately, but I think we'd get the same end result and this is simpler.
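
In sketch form, that flow (including the yield-style retry mentioned above) might look like this. The helper functions are hypothetical; PhabricatorWorkerYieldException is the real mechanism for putting a task back in the queue for a later retry.

<?php

// Sketch of the post-D14114 control flow; only the shape matters.
function allocate_or_yield(DrydockLease $lease, $blueprint) {
  // (1) Find the pool of open resources.
  $pool = load_open_resources($blueprint, $lease);

  // (2) If there aren't any, allocate a new resource into the pool.
  if (!$pool) {
    $pool = array(allocate_new_resource($blueprint, $lease));
  }

  // (3) Try to acquire the lease on some resource in the pool.
  foreach ($pool as $resource) {
    if (try_acquire_lease($resource, $lease)) {
      return $resource;
    }
  }

  // Another allocator consumed the pool between (1)/(2) and (3).
  // The failure is temporary: yield and let the queue retry shortly
  // instead of failing abruptly.
  throw new PhabricatorWorkerYieldException(15);
}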

There are also no locks in the allocator after D14118.

In the specific scenario above, Build A would just retry shortly and bring up another host. Put another way, if we have this scenario:

  • Similar builds A and B each need a unique resource.

...these cases of allocation are no longer distinguishable, which I think is correct:

  1. Scenario 1
    • Build A creates the resource.
    • Build A acquires the resource.
    • Build B fails and retries later.
  2. Scenario 2
    • Build A creates the resource.
    • Build B sneaks in and acquires the resource.
    • Build A fails and retries later.
  3. Scenario 3
    • Build B creates the resource.
    • Build B acquires the resource.
    • Build A fails and retries later.

In all cases, one build acquires the unique resource and the other one fails and retries. In some cases, the lease which triggered the resource allocation is not the same as the lease which acquires the resource, but that's fine.

It's possible to write a blueprint where this doesn't work (e.g., one which builds resources that are highly specific and hard-coded to particular leases, but then acquires any similar-looking lease on them), but such a blueprint is implemented incorrectly: if resource A and lease B aren't completely compatible, it shouldn't let lease B acquire resource A.

It's also possible to write a blueprint where this works but the allocation is sub-optimal. For example, lease A might ask for a huge resource while lease B asks for a tiny one. If lease B "steals" the resource generated by lease A, it may "waste" much of that resource's capacity (memory, CPU, disk, or whatever). However, allocation/leasing rules are so flexible that I think this is almost certainly straightforward to mitigate in most cases (e.g., refuse to acquire unique leases on grossly outsized resources), and we can generally cross this bridge when we come to it.
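
As a sketch, such a mitigation could live in the blueprint's lease-acquisition check. The function below is hypothetical (the attribute keys and the 4x threshold are made up), written in the style of a blueprint deciding whether a given lease may acquire a given resource:

<?php

// Hypothetical compatibility predicate; attribute keys and the 4x
// threshold are illustrative, not a real Drydock policy.
function can_acquire_on_resource(
  DrydockResource $resource,
  DrydockLease $lease) {

  // A blueprint which builds lease-specific resources must refuse
  // mismatched leases outright, rather than acquiring any
  // similar-looking lease.
  if ($resource->getAttribute('platform') !==
      $lease->getAttribute('platform')) {
    return false;
  }

  // It can also refuse grossly outsized matches, so a tiny lease
  // does not waste most of a huge resource.
  $have_mb = (int)$resource->getAttribute('memory-mb');
  $want_mb = (int)$lease->getAttribute('memory-mb');
  if ($want_mb && ($have_mb > ($want_mb * 4))) {
    return false;
  }

  return true;
}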

D14117 discusses a vaguely related case where a blueprint might intentionally deny lease acquisition on a resource that it needs to process further. This pattern seems fine conceptually if/when we find real resources in the wild which fit into it.