Paths

Table of Contentst

Differential D10470

Prevent allocation race by allocating lease before leaving lock
AbandonedPublic
Actions

Authored by hach-que on Sep 11 2014, 1:57 AM.

Details

Reviewers

epriestley

Group Reviewers

Blessed Reviewers

Maniphest Tasks

T6074: Determine the best way to prevent the "Lost an allocation race" exception in Drydock

Summary

Resolves T6074. This attempts to prevent an allocation race from occurring by saving the lease against the resource before we exit the lock.

Test Plan

Tested that allocations still work inside Harbormaster locally, but can't be entirely sure that the issue is resolved until we leave this fix in production for a little while.

Diff Detail

Repository

rP Phabricator

Branch

prevent-allocation-race

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 2505
Build 2509: [Placeholder Plan] Wait for 30 Seconds

Event Timeline

hach-que updated this revision to Diff 25181.Sep 11 2014, 1:57 AM

hach-que retitled this revision from to Prevent allocation race by allocating lease before leaving lock.

hach-que updated this object.

hach-que edited the test plan for this revision. (Show Details)

hach-que added a reviewer: epriestley.

hach-que added a task: T6074: Determine the best way to prevent the "Lost an allocation race" exception in Drydock.

Herald added a reviewer: Blessed Reviewers. · View Herald TranscriptSep 11 2014, 1:57 AM

Herald added subscribers: Korvin, epriestley. · View Herald Transcript

Harbormaster completed remote builds in B2505: Diff 25181.Sep 11 2014, 1:58 AM

joshuaspence added a subscriber: joshuaspence.Feb 27 2015, 5:38 AM

This seems bad: we hold the lock for an arbitrarily long period of time, and may cycle and deadlock, if allocating requires leasing another resource of the same type. This is legal and permissible (the second resource of the same type can have different parameters, and this is fine; cycling on resource types does not mean that the allocation plan cycles as a whole).

I'd expect to just retry or yield if we lose a race.

I don't even really understand the root problem here. STATUS_ALLOCATING does not exist at HEAD and I think was added and then removed elsewhere? Does this actually repro against HEAD, or is it plausible that other changes intorduced this problem?

This revision now requires changes to proceed.Aug 8 2015, 6:06 PM

More broadly:

There isn't even a lock at head so I don't really know what this code is doing.
It's OK for us to lease as soon as other processes are allowed to lease. If they can lease earlier, it's OK for us to lease earlier too. In this version of the code, leasing before allocateResource() might be OK, for example. At HEAD, there's nowhere earlier to lease.
It's potentially OK to not let anything lease until later, but I think this is undesirable (in bursty workloads, we can do a better job of satisfying resource demand with the current approach: preventing allocations will spin up more slowly or overallocate resources).
It's not OK to hold a lock while waiting for a resource to come up.
- This can take arbitrarily long.
- This can cycle.
If we're allowed to lease immediately after releasing the lock, it's probably OK to lease immediately before releasing the lock, but I'd be very hesitant to do this.
I think letting the code race is basically desirable, and that we can just yield if we lose a race.

When I originally made this fix, I was as confused as you are, but it turns out allocateLease doesn't actually do any work. The only function it calls that's controllable by the derived implementation is shouldAllocateLease, and this function already has an explicit warning that checks in this function should be quick because the function may be called from a lock.

The lease acquisition work is always done inside acquireLease, which always resides outside of a lock.

Anyway, I'm going to merge this down into D10304 since it's related to ensuring Drydock has race-free allocation behaviour.

hach-que mentioned this in D10304: [drydock/core] Implement resource constraints and correct locking mechanisms with yielding for Drydock.Aug 24 2015, 8:48 AM

hach-que abandoned this revision.Aug 24 2015, 8:51 AM

Revision Contents
Changeset List

Path

Size

src/

applications/

drydock/

worker/

DrydockAllocatorWorker.php

61 lines

Diff 25181

View Options

src/applications/drydock/worker/DrydockAllocatorWorker.php

Prevent allocation race by allocating lease before leaving lockAbandonedPublicActions