Drydock may grow resource pools too cautiously
Closed, Resolved, Public

Authored By

	epriestley
	May 4 2022, 9:37 PM

Description

See PHI2177. An install reports excessive throttling of Drydock pool growth. A likely reproduction case is:

Add a sleep 60 && ... before git clone in "DrydockWorkingCopyBlueprintImplementation" to make this more observable.
Use a valid "Working Copy" blueprint with a large limit (say, 32).
Start 8 taskmasters (bin/phd launch 8 taskmaster).
Destroy all the working copy resources.
Lease 32 resources (bin/drydock lease --type working-copy --attributes ...).

Reproduction behavior:

Drydock takes ~32 minutes to build ~32 resources serially.

Expected behavior (i.e., what the code attempts to implement):

Drydock builds a few resources initially, then expands the resource pool faster once more resources exist.

Desired behavior:

Drydock builds no more than 32 resources.
Drydock builds the resources reasonably quickly (i.e., much faster than 32 minutes).
But maybe it doesn't immediately build 32 resources in parallel -- there's room to realize the ramp-up rate as a tuning parameter.

I'm going to try to reproduce this as described first.

Revisions and Commits

rP Phabricator
	D21809	rPa640a4a4930f In Drydock, yield for reclaiming resources in the "released" state
	D21808	rP1835ca1918a4 Remove the "25% of active pool" growth rate throttle from Drydock
	D21806	rP13a37282bcb3 Formalize some more Drydock conditions and bookkeeping
	D21807	rP62360ea40689 Adjust the Drydock allocator to limit each pending lease to one allocating…
	D21805	rP1b6727ac3d6a Add "--all" flags to "release-lease" and "release-resource" workflows in…

Related Objects

Mentioned Here: D19762: Allow Drydock Blueprints to control "supplemental allocation" behavior so all hosts in an Almanac pool get used

Event Timeline

epriestley triaged this task as Normal priority. May 4 2022, 9:37 PM

epriestley created this task.

This issue partially reproduces. (This is consistent with the original report, but not immediately consistent with my theorizing about a root cause in PHI2177. Actually, it looks like both parts are right; see below.) Drydock builds ~1 working copy per minute serially until it reaches a pool size of 5 resources. Then it begins allocating 2 resources simultaneously.

DrydockBlueprintImplementation->shouldLimitAllocatingPoolSize() has an old mechanism that prevents a pool from expanding by more than 25% of current, active resources at once. This is arbitrary, and it means that pools with between 0 and 4 resources can only launch 1 additional resource at once, even if the pool limit is arbitrarily large.
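To make the arithmetic concrete, here is a rough Python model of a throttle like this one. It is a sketch, not the code in shouldLimitAllocatingPoolSize(): the function names are hypothetical, and the rounding (a quarter of the active pool, rounded up, with a floor of 1) is an assumption chosen to match the behavior described in this task (one allocation at a time up to 4 resources, two at 5). The round-based simulation also assumes every build started in a round finishes within that round, which real builds need not do.

```python
def max_simultaneous_allocations(active: int) -> int:
    """Hypothetical model of the 25% growth throttle: at most a quarter of
    the active pool (rounded up) may be allocating, with a floor of 1."""
    quarter_rounded_up = (active + 3) // 4
    return max(1, quarter_rounded_up)


def rounds_to_fill(limit: int, workers: int, throttled: bool) -> int:
    """Count build rounds needed to grow an empty pool to `limit`, assuming
    every build started in a round finishes by the end of that round."""
    active = 0
    rounds = 0
    while active < limit:
        # Never start more builds than there are workers or remaining slots.
        batch = min(workers, limit - active)
        if throttled:
            batch = min(batch, max_simultaneous_allocations(active))
        active += batch
        rounds += 1
    return rounds


if __name__ == "__main__":
    for size in (0, 4, 5, 8, 9):
        print(size, max_simultaneous_allocations(size))
    print("throttled rounds:", rounds_to_fill(32, 8, throttled=True))
    print("unthrottled rounds:", rounds_to_fill(32, 8, throttled=False))
```

Under these assumptions, the throttled pool spends its first several rounds building one resource at a time, while the unthrottled pool is limited only by the number of taskmasters.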

When this mechanism is removed (by commenting out the logic that cares about the 25% limit), we'd expect Drydock to build 8 resources at a time (limited by number of taskmasters). It actually builds ~1-4. This is because executeAllocator() does this:

Select all "ACTIVE" or "PENDING" resources which the lease could acquire.
If any of those resources don't have any leases, try to lease them instead of building new resources.

This means that as soon as workers see a "PENDING" resource, they'll all keep trying to lease it until it activates and one succeeds. Only once there are no more pending resources can workers begin allocating again.

The goal in the original change (D19762) was to prevent leases from creating too many resources while waiting for allocations to occur. I think the desired behavior is for leases to do this instead:

When a lease allocates a resource, save the resource PHID on the lease.
Ignore "PENDING" resources in loadResourcesForAllocatingLease().
When preparing to allocate, check if we have a saved allocating resource that's still pending. If we do, yield.

This limits allocating resources to the number of active leases requesting resources, which should reasonably prevent the "try to allocate infinite resources" issue but remove the implicit throttling of allocation rate.
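The three steps above can be sketched roughly as follows. This is illustrative Python, not Drydock's PHP: `Lease`, `Pool`, `allocating_resource_id`, and `update_lease` are hypothetical names, and the sketch models only the allocation path (it does not model acquiring leases on existing free resources).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

PENDING = "pending"
ACTIVE = "active"


@dataclass
class Lease:
    # Hypothetical field: the resource this lease started allocating, if any.
    allocating_resource_id: Optional[int] = None


@dataclass
class Pool:
    limit: int
    statuses: Dict[int, str] = field(default_factory=dict)  # id -> status

    def allocate(self) -> Optional[int]:
        """Start building a new resource, unless the pool is at its limit."""
        if len(self.statuses) >= self.limit:
            return None
        resource_id = len(self.statuses) + 1
        self.statuses[resource_id] = PENDING
        return resource_id


def update_lease(lease: Lease, pool: Pool) -> str:
    """One wakeup of a lease update worker (simplified)."""
    saved = lease.allocating_resource_id
    if saved is not None:
        status = pool.statuses.get(saved)
        if status == PENDING:
            # Our own allocation is still building: wait instead of
            # allocating again.
            return "yield"
        if status == ACTIVE:
            # Simplification: a real allocator would try to acquire it here.
            return "lease"
    resource_id = pool.allocate()
    if resource_id is None:
        return "yield"
    lease.allocating_resource_id = resource_id
    return "allocated"


if __name__ == "__main__":
    pool = Pool(limit=32)
    leases = [Lease() for _ in range(8)]
    # Wake every lease repeatedly; each one still allocates only once.
    for _ in range(5):
        for lease in leases:
            update_lease(lease, pool)
    print(len(pool.statuses), "resources allocated for", len(leases), "leases")
```

In this model, waking a lease repeatedly cannot push the pool past one allocating resource per lease, which is the bound described above.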

The outline above isn't quite sufficient because when the active resource list is nonempty, we don't actually reach the "new allocation" logic. Broadly, executeAllocator() is kind of wonky and needs some additional restructuring to cover both the D19762 case ("allocate up to the resource limit before reusing resources") and the normal set of cases. The proper logic is something like:

Get the list of active resources.
If any have no leases, try to lease them. Return if successful.
If we want supplemental allocations, try to allocate. Yield if successful.
Try to lease any that aren't overallocated. Return if successful.
If we've already allocated a pending resource, yield. (New logic per above.)
Try to allocate. Yield if successful.
Try to reclaim. Yield if successful.

epriestley added a revision: D21805: Add "--all" flags to "release-lease" and "release-resource" workflows in "bin/drydock". May 5 2022, 11:01 PM

When this mechanism is removed (by commenting out the logic that cares about the 25% limit), we'd expect Drydock to build 8 resources at a time (limited by number of taskmasters). It actually builds ~1-4...

I commented out the wrong thing and didn't disable the mechanism properly. Disabling this mechanism is sufficient to remove the throttle on the allocator. Pending Working Copy resources remove themselves from the list of candidates in canAcquireLeaseOnResource(), so leases don't see the allocating resources and don't get throttled.

Then, because of how the worker queue works, resources tend to build before leases trigger again, at least if they take less than 15 seconds to build.

But this isn't a good state of affairs. We can force the allocator into bad behavior by waking a lease update worker repeatedly (this is artificial, but nothing guarantees it won't happen in real life): in this case, it maxes out the available pool of resources (the issue that the throttling mechanism was originally intended to prevent).

So rewriting this logic remains desirable, even though the throttles don't work quite like I described above.

The proper logic is something like...

This is what I've implemented:

Get a list of active resources.
- Divide them into "free" resources (which are better than creating a new resource) and "used" resources (which are worse than creating a new resource).
Try to lease a "free" resource. Return if successful.
If this lease has allocated a resource and that resource is still pending, yield.
Try to create a new resource. If successful:
- Try to lease it. Return if successful, yield otherwise.
Try to lease a "used" resource. Return if successful.
If this lease has reclaimed a resource and that resource is still active, yield.
Try to reclaim a resource.
Yield and wait.
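The ordering above can be read as a single decision function. The following Python sketch is only an illustration of that ordering, not the code in executeAllocator(): the `AllocatorState` fields and the action strings are hypothetical, and each "try to ..." step is reduced to a boolean saying whether the attempt would succeed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AllocatorState:
    # Each flag is a hypothetical stand-in for the outcome of one step.
    free_lease_succeeds: bool = False     # a "free" resource can be leased
    own_allocation_pending: bool = False  # our allocated resource is pending
    can_create_resource: bool = False     # a new resource can be created
    new_resource_leasable: bool = False   # the new resource leases right away
    used_lease_succeeds: bool = False     # a "used" resource can be leased
    own_reclaim_active: bool = False      # our reclaimed resource is active
    can_reclaim: bool = False             # some resource can be reclaimed


def next_action(state: AllocatorState) -> Tuple[str, str]:
    """Return (action, outcome) where outcome is "return" or "yield"."""
    if state.free_lease_succeeds:
        return ("lease free resource", "return")
    if state.own_allocation_pending:
        return ("wait for own pending resource", "yield")
    if state.can_create_resource:
        if state.new_resource_leasable:
            return ("create and lease new resource", "return")
        return ("create new resource", "yield")
    if state.used_lease_succeeds:
        return ("lease used resource", "return")
    if state.own_reclaim_active:
        return ("wait for own reclaimed resource", "yield")
    if state.can_reclaim:
        return ("reclaim a resource", "yield")
    return ("wait", "yield")


if __name__ == "__main__":
    # A lease that already has a pending allocation waits, even though the
    # blueprint could create another resource.
    state = AllocatorState(own_allocation_pending=True, can_create_resource=True)
    print(next_action(state))
```

The point of the ordering is visible in the example: free resources beat new resources, new resources beat used ones, and a lease with an outstanding allocation or reclaim yields rather than starting another.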

epriestley added a revision: D21806: Formalize some more Drydock conditions and bookkeeping. May 9 2022, 5:09 PM

epriestley added a revision: D21807: Adjust the Drydock allocator to limit each pending lease to one allocating resource. May 9 2022, 5:24 PM

epriestley added a revision: D21808: Remove the "25% of active pool" growth rate throttle from Drydock. May 9 2022, 5:30 PM

epriestley added a commit: rP1b6727ac3d6a: Add "--all" flags to "release-lease" and "release-resource" workflows in…. May 9 2022, 5:40 PM

epriestley added a commit: rP13a37282bcb3: Formalize some more Drydock conditions and bookkeeping.

epriestley added a commit: rP62360ea40689: Adjust the Drydock allocator to limit each pending lease to one allocating….

epriestley added a commit: rP1835ca1918a4: Remove the "25% of active pool" growth rate throttle from Drydock.

epriestley added a revision: D21809: In Drydock, yield for reclaiming resources in the "released" state. May 9 2022, 5:45 PM

epriestley added a commit: rPa640a4a4930f: In Drydock, yield for reclaiming resources in the "released" state. May 9 2022, 5:45 PM

There may be additional work here, but presuming this is more or less resolved until evidence to the contrary arises.
