Make repository daemon locks more granular and forgiving
Closed (Public)

Authored by epriestley on May 12 2016, 11:24 PM.

Details

Reviewers

chad

Maniphest Tasks

T4292: Implement repository replication

Commits

rP1c73ad6a1bb0: Make repository daemon locks more granular and forgiving

Summary

Ref T4292. Currently, we hold one big lock around the whole bin/repository update workflow.

When running multiple daemons on different hosts, this lock can become contentious. In particular, we hold it globally during git fetch on every host, even though it's only useful to hold it locally per device (that is, it's fine and expected if repo001 and repo002 happen to be fetching a repository they are both observing at the same time).

Instead, split it into two locks:

One lock is scoped to the current device, and held during pull (usually git fetch). This just keeps multiple daemons accidentally running on the same host from making a mess when trying to initialize or update a working copy.
One lock is scoped globally, and held during discovery. This makes sure daemons on different hosts don't step on each other when updating the database.
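To illustrate the split, here is a minimal sketch assuming a simple named-lock registry. The lock names, the helpers pull_lock_name(), discover_lock_name() and named_lock(), and the in-process registry are all hypothetical and do not model how Phabricator actually implements its locks. The point is only that the pull lock name includes the device, so different hosts never collide on it, while the discovery lock name does not, so every host contends for the same lock.

```python
import threading

# Hypothetical in-process stand-in for a named lock service.
_registry_guard = threading.Lock()
_locks: dict[str, threading.Lock] = {}


def named_lock(name: str) -> threading.Lock:
    """Return the single lock object associated with `name`."""
    with _registry_guard:
        return _locks.setdefault(name, threading.Lock())


def pull_lock_name(repo_id: int, device: str) -> str:
    # Scoped to one device: repo001 and repo002 get different names,
    # so they can fetch the same repository concurrently.
    return f"repo.pull(repo={repo_id}, device={device})"


def discover_lock_name(repo_id: int) -> str:
    # Global: every host builds the same name, so only one daemon at a
    # time writes discovery results to the database.
    return f"repo.discover(repo={repo_id})"
```

With this naming, two daemons on repo001 and repo002 can both hold their pull locks for repository 12 at once, but only one of them can hold the discovery lock for it.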

If we fail to acquire either lock, assume some other process is legitimately doing the work and exit quietly instead of raising a fatal error. In essentially every case where users have hit this lock contention, that was what was happening: another daemon was running somewhere doing the work, and the error didn't represent an actual problem.
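A minimal sketch of the "bail quietly" behavior, assuming a plain threading.Lock and a short acquire timeout. The helper name run_locked() and the timeout value are hypothetical; the sketch only shows the control flow of treating a busy lock as "someone else is already doing this" rather than as an error.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger("repository.update")


def run_locked(lock: threading.Lock, work: Callable[[], None],
               timeout: float = 1.0) -> bool:
    """Run `work()` while holding `lock`.

    Returns True if the work ran, or False if the lock stayed busy for
    `timeout` seconds. A busy lock is assumed to mean another process is
    already doing the work, so we log at info level and return instead
    of raising.
    """
    if not lock.acquire(timeout=timeout):
        log.info("Lock is held elsewhere; assuming another process "
                 "is handling this update.")
        return False
    try:
        work()
        return True
    finally:
        lock.release()
```

The caller can treat a False return as a normal outcome for this iteration and try again on its next scheduled run.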

If there's an actual problem, running bin/repository update manually still raises a diagnostically useful message, so there are still tools to figure out that something is hung or otherwise stuck.
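One way to get both behaviors from the same code path is to pass a flag saying whether a human is running the command. This is a sketch under that assumption: the interactive parameter, acquire_or_report() and LockContentionError are hypothetical names, not Phabricator APIs. Daemons get a quiet False; a manual run gets an exception whose message points at the likely causes.

```python
import threading


class LockContentionError(RuntimeError):
    """Hypothetical error raised when a manual run cannot get a lock."""


def acquire_or_report(lock: threading.Lock, lock_name: str,
                      interactive: bool, timeout: float = 1.0) -> bool:
    """Try to acquire `lock`; on success the caller must release it.

    Non-interactive callers (daemons) get False on contention and are
    expected to exit quietly. Interactive callers get a descriptive
    exception so a human can diagnose a hung process.
    """
    if lock.acquire(timeout=timeout):
        return True
    if interactive:
        raise LockContentionError(
            f"Failed to acquire lock '{lock_name}' within {timeout}s. "
            "Another process may be updating this repository, or a "
            "previous process may be hung while holding the lock."
        )
    return False
```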

Test Plan

Ran bin/repository update, pull, discover.
Added a sleep(5), forced processes to contend, and got lock exceptions and a graceful exit with a diagnostic message.