Page MenuHomePhabricator

Drydock may grow resource pools too cautiously
Closed, ResolvedPublic


See PHI2177. An install reports excessive throttling of Drydock pool growth. A likely reproduction case is:

  • Add a sleep 60 && ... before git clone in "DrydockWorkingCopyBlueprintImplementation" to make this more observable.
  • With a valid "Working Copy" blueprint with a large limit (say, 32).
  • Start 8 taksmasters (bin/phd launch 8 taskmaster).
  • Destroy all the working copy resources.
  • Lease 32 resources (bin/drydock lease --type working-copy --attributes ...).

Reproduction behavior:

  • Drydock takes ~32 minutes to build ~32 resources serially.

Expected behavior (i.e., what the code attempts to implement):

  • Drydock builds a few resources initially, then expands the resource pool faster once more resources exist.

Desired behavior:

  • Drydock builds no more than 32 resources.
  • Drydock builds the resources reasonably quickly (i.e., much faster than 32 minutes).
  • But maybe it doesn't immediately build 32 resources in parallel -- there's room to realize the ramp-up rate as a tuning parameter.

I'm going to try to reproduce this as described first.

Event Timeline

epriestley triaged this task as Normal priority.May 4 2022, 9:37 PM
epriestley created this task.

This issue partially reproduces (consistent with the original report, not immediately consistent with my theorizing about a root cause in PHI2177 -- actually, looks like both parts are right, see below): Drydock builds ~1 working copy per minute serially until it reaches a pool size of 5 resources. Then, it begins allocating 2 simultaneous resources.

DrydockBlueprintImplementation->shouldLimitAllocatingPoolSize() has an old mechanism that prevents a pool from expanding by more than 25% of current, active resources at once. This is arbitrary, and it means that pools with between 0 and 4 resources can only launch 1 additional resource at once, even if the pool limit is arbitrarily large.

When this mechanism is removed (by commenting out the logic that cares about the 25% limit), we'd expect Drydock to build 8 resources at a time (limited by number of taskmasters). It actually builds ~1-4. This is because executeAllocator() does this:

  • Select all "ACTIVE" or "PENDING" resources which the lease could acquire.
  • If any of those resources don't have any leases, try to lease them instead of building new resources.

This means that as soon as workers see a "PENDING" resource, they'll all keep trying to lease it until it activates and one succeeds. Only once there are no more pending resources can workers begin allocating again.

The goal in the original change (D19762) was to prevent leases from creating too many resources while waiting for allocations to occur. I think the desired behavior is for leases to do this instead:

  • When a lease allocates a resource, save the resource PHID on the lease.
  • Ignore "PENDING" resources in loadResourcesForAllocatingLease().
  • When preparing to allocate, check if we have a saved allocating resource that's still pending. If we do, yield.

This limits allocating resources to the number of active leases requesting resources, which should reasonably prevent the "try to allocate infinite resources" issue but remove the implicit throttling of allocation rate.

The outline above isn't quite sufficient because when the active resource list is nonempty, we don't actually reach the "new allocation" logic. Broadly, executeAllocator() is kind of wonky and needs some additional restructuring to cover both the D19762 case ("allocate up to the resource limit before reusing resources") and the normal set of cases. The proper logic is something like:

  • Get the list of active resources.
  • If any have no leases, try to lease them. Return if successful.
  • If we want supplemental allocations, try to allocate. Yield if successful.
  • Try to lease any that aren't overallocated. Return if successful.
  • If we've already allocated a pending resource, yield. (New logic per above.)
  • Try to allocate. Yield if successful.
  • Try to reclaim. Yield if successful.

When this mechanism is removed (by commenting out the logic that cares about the 25% limit), we'd expect Drydock to build 8 resources at a time (limited by number of taskmasters). It actually builds ~1-4...

I commented out the wrong thing and didn't disable the mechanism properly. Disabling this mechanism is sufficient to remove the throttle on the allocator. Pending Working Copy resources remove themselves from the list of candidates in canAcquireLeaseOnResource(), so they don't see the allocating resources and don't get throttled.

Then, because of how the worker queue works, resources tend to build before leases trigger again, at least if they take less than 15 seconds to build.

But this isn't a good state of affairs. We can force the allocator into bad behavior by waking a lease update worker repeatedly (this is artificial, but nothing guarantees it won't happen in real life): in this case, it maxes out the available pool of resources (the issue that the throttling mechanism was originally intended to prevent).

So rewriting this logic remains desirable, even though the throttles don't work quite like I described above.

The proper logic is something like...

This is what I've implemented:

  • Get a list of active resources.
    • Divide them into "free" resources (which are better than creating a new resource) and "used" resources (which are worse than creating a new resource).
  • Try to lease a "free" resource. Return if successful.
  • If this lease has allocated a resource and that resource is still pending, yield.
  • Try to create a new resource. If successful:
    • Try to lease it. Return if successful, yield otherwise.
  • Try to lease a "used" resource. Return if successful.
  • If this lease has reclaimed a resource and that resource is still active, yield.
  • Try to reclaim a resource.
  • Yield and wait.

There may be additional work here, but presuming this is more or less resolved until evidence to the contrary arises.