
Make repository daemon locks more granular and forgiving
ClosedPublic

Authored by epriestley on May 12 2016, 11:24 PM.

Details

Summary

Ref T4292. Currently, we hold one big lock around the whole bin/repository update workflow.

When running multiple daemons on different hosts, this lock can end up being contentious. In particular, we'll hold it during git fetch on every host globally, even though it's only useful to hold it locally per-device (that is, it's fine/good/expected if repo001 and repo002 happen to be fetching from a repository they are observing at the same time).

Instead, split it into two locks:

  • One lock is scoped to the current device, and held during pull (usually git fetch). This just keeps multiple daemons accidentally running on the same host from making a mess when trying to initialize or update a working copy.
  • One lock is scoped globally, and held during discovery. This makes sure daemons on different hosts don't step on each other when updating the database.
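The difference in scope can be sketched as a difference in how the lock keys are built. This is a minimal illustration, not the code in this diff: the key formats, the function names, and the idea of keying each lock by a repository ID are all assumptions made for the example.

```python
# Hypothetical lock-key scheme; names and key formats are illustrative only.

def pull_lock_key(repository_id: int, device: str) -> str:
    """Device-scoped key: only contends with daemons on the same host."""
    return f"repo.pull:{repository_id}:{device}"


def discovery_lock_key(repository_id: int) -> str:
    """Global key: contends with every daemon, regardless of host."""
    return f"repo.discover:{repository_id}"


# repo001 and repo002 may fetch the same repository at the same time...
assert pull_lock_key(7, "repo001") != pull_lock_key(7, "repo002")
# ...but two daemons on the same host contend for the same pull lock.
assert pull_lock_key(7, "repo001") == pull_lock_key(7, "repo001")
# Discovery uses one key across all hosts, so database updates are serialized.
assert discovery_lock_key(7) == discovery_lock_key(7)
```

Including the device name in the pull key is what lets hosts fetch in parallel, while leaving it out of the discovery key is what keeps database updates serialized across hosts.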

If we fail to acquire either lock, assume some other process is legitimately doing the work and bail more quietly instead of fataling. In approximately 100% of cases where users have hit this lock contention, that was the case: some other daemon was running somewhere doing the work and the error didn't actually represent an issue.

If there's an actual problem, we still raise a diagnostically useful message if you run bin/repository update manually, so there are still tools to figure out whether something is actually hung.
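The two failure modes described above (quiet bail for daemons, diagnostic message for manual runs) can be sketched roughly as follows. This is a simplified model under stated assumptions: an in-process `threading.Lock` stands in for the real cross-process locks, and `run_locked`, `LockContention`, and the `manual` flag are hypothetical names introduced for the example.

```python
import threading


class LockContention(Exception):
    """Raised when a lock cannot be acquired within the wait period."""


def run_locked(lock: threading.Lock, work, *, wait: float, manual: bool):
    """Run work() while holding lock.

    If the lock cannot be acquired within `wait` seconds, a daemon caller
    (manual=False) gets None back and is expected to move on quietly; a
    manual caller gets an exception carrying a diagnostic message.
    """
    if not lock.acquire(timeout=wait):
        if manual:
            raise LockContention(
                f"Could not acquire the lock within {wait}s; another process "
                "may be doing this work already, or it may be hung."
            )
        return None  # Assume another process is legitimately doing the work.
    try:
        return work()
    finally:
        lock.release()


lock = threading.Lock()
lock.acquire()  # Simulate another daemon holding the lock.
assert run_locked(lock, lambda: "updated", wait=0.1, manual=False) is None
lock.release()
assert run_locked(lock, lambda: "updated", wait=0.1, manual=False) == "updated"
```

Returning a sentinel in the daemon path keeps routine contention out of the error logs, while the manual path still surfaces enough detail to tell a busy lock from a stuck one.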

Test Plan
  • Ran bin/repository update, pull, discover.
  • Added sleep(5), forced processes to contend, got lock exceptions and graceful exit with diagnostic message.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision to "Make repository daemon locks more granular and forgiving".
epriestley updated this object.
epriestley edited the test plan for this revision.
epriestley added a reviewer: chad.
chad edited edge metadata.
This revision is now accepted and ready to land. (May 13 2016, 3:05 AM)
This revision was automatically updated to reflect the committed changes.

This diff seems to cause an issue on my instance. Could you check Q401? Thanks.