Page MenuHomePhabricator

Implement optimistic "slot locks" in Drydock
ClosedPublic

Authored by epriestley on Sep 16 2015, 2:12 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Apr 11, 8:37 AM
Unknown Object (File)
Tue, Apr 2, 9:09 PM
Unknown Object (File)
Mar 20 2024, 8:01 PM
Unknown Object (File)
Mar 12 2024, 9:44 AM
Unknown Object (File)
Mar 6 2024, 4:44 AM
Unknown Object (File)
Mar 5 2024, 5:31 AM
Unknown Object (File)
Mar 4 2024, 6:08 PM
Unknown Object (File)
Mar 3 2024, 7:37 PM
Subscribers
None

Details

Summary

See discussion in D10304. There's a lot of context there, but the general idea is:

  • Blueprints should manage locks in a granular way during the actual allocation/acquisition phase.
  • Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases.

The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work:

...
->needSlotLock("mylock(PHID-XYZQ-...)")
...

When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do.

If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this:

mylock(whatever).slot(2)

Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks.

This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing.

(The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.)

Test Plan

To create a race where the same binding is allocated as a resource twice:

  • Add sleep(10) near the beginning of allocateResource(), after the free bindings are loaded but before resources are allocated.
  • (Comment out slot lock acquisition if you have this patch.)
  • Run bin/drydock lease ... in two windows, within 10 seconds of one another.

This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free.

To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision from to Implement optimistic "slot locks" in Drydock.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added reviewers: chad, hach-que.
hach-que edited edge metadata.
This revision is now accepted and ready to land.Sep 16 2015, 3:04 PM
chad edited edge metadata.
This revision was automatically updated to reflect the committed changes.