Page MenuHomePhabricator

Fix an issue where newly created Drydock resources could be improperly acquired
ClosedPublic

Authored by epriestley on Oct 14 2015, 12:02 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Dec 17, 6:20 AM
Unknown Object (File)
Tue, Dec 17, 5:49 AM
Unknown Object (File)
Fri, Dec 13, 9:42 AM
Unknown Object (File)
Fri, Nov 29, 5:22 PM
Unknown Object (File)
Fri, Nov 29, 4:38 PM
Unknown Object (File)
Thu, Nov 28, 3:16 AM
Unknown Object (File)
Sun, Nov 24, 2:40 PM
Unknown Object (File)
Wed, Nov 20, 9:34 AM
Subscribers
None

Details

Summary

Ref T9252. This is mostly a fix for an edge case from D14236. Here's the setup:

  • There are no resources.
  • A request for a new resource arrives.
  • We build a new resource.

Now, if we were leasing an existing resource, we'd call canAcquireLeaseOnResource() before acquiring a lease on the new resource.

However, for new resources we don't do that: we just acquire a lease immediately. This is wrong, because we now allow and expect some resources to be unleasable when created.

In a more complex workflow, this can also produce the wrong result and leave the lease acquired sub-optimally (and, today, deadlocked).

Make the "can we acquire?" pathway consistent for new and existing resources, so we always do the same set of checks.

Test Plan
  • Started daemons.
  • Deleted all working copy resources.
  • Ran two working-copy-using build plans at the same time.
  • Before this change, one would often [1] acquire a lease on a pending resource which never allocated, then deadlock.
  • After this change, the same thing happens except that the lease remains pending and the work completes.

[1] Although the race this implies is allowed (resource pool limits are soft/advisory, and it is expected that we may occasionally run over them), it's MUCH easier to hit right now than I would expect it to be, so I think there's probably at least one more small bug here somewhere. I'll see if I can root it out after this change.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision from to Fix an issue where newly created Drydock resources could be improperly acquired.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: chad.

Although the race this implies is allowed (resource pool limits are soft/advisory, and it is expected that we may occasionally run over them), it's MUCH easier to hit right now than I would expect it to be, so I think there's probably at least one more small bug here somewhere. I'll see if I can root it out after this change.

D14274 has the fix for this, it turned out to be a bad query.

This revision is now accepted and ready to land.Oct 14 2015, 1:12 PM