Make repository synchronization safer when leaders are ambiguous
ClosedPublic

Authored by epriestley on Apr 19 2016, 7:37 PM.
Tags
None
Subscribers
None

Details

Summary

Ref T4292. Right now, repository versions only get marked when a write happens.

This potentially creates a problem: if I pushed all the sync code to secure and enabled secure002 as a repository host, the daemons would create empty copies of all the repositories on that host.

Usually, this would be fine. Most repositories have already received a write on secure001, so that working copy has a version and is a leader.

However, if a write arrived for a rarely-used repository (say, rKEYSTORE) that hadn't received any writes recently, it might be routed to secure002 at random. We'd then try to figure out whether secure002 has the most up-to-date copy of the repository.

We wouldn't be able to: no write has ever marked a version for this repository, so we have no information about which node actually holds the data. The old code could guess wrong, decide that secure002 is a leader, and accept the write. This would bump the version on secure002 and make it the authoritative leader. secure001 would then synchronize from it (passively, or on the next read or write), which would potentially destroy data.

Instead:

  • Refuse to continue when we can't tell which device is a leader, as in the situation above.
  • When a repository is on exactly one device, mark it as a leader with version "0".
  • When a repository is created into a cluster service, mark its version as "0" on all devices (they're all leaders, since the repository is empty).
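The three rules above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this revision: the names `resolve_leaders`, `initialize_new_repository`, and `AmbiguousLeaderError`, and the use of a plain dict for recorded versions, are all assumptions made for the example.

```python
from typing import Dict, List


class AmbiguousLeaderError(Exception):
    """Raised when no device can safely be treated as a leader (hypothetical name)."""


def initialize_new_repository(devices: List[str]) -> Dict[str, int]:
    # A newly created repository is empty on every device, so every device
    # can safely start out as a leader at version 0.
    return {device: 0 for device in devices}


def resolve_leaders(devices: List[str], versions: Dict[str, int]) -> List[str]:
    """Return the devices that may act as leaders.

    `versions` maps device name -> recorded version. A device with no entry
    has never had a version marked. The dict is updated in place when a
    version of 0 is marked for a lone device.
    """
    known = {d: versions[d] for d in devices if d in versions}

    if known:
        # At least one device has a recorded version: the newest ones lead.
        newest = max(known.values())
        return sorted(d for d, v in known.items() if v == newest)

    if len(devices) == 1:
        # Exactly one device and no recorded version: it must hold the data,
        # so mark it as a leader at version 0.
        versions[devices[0]] = 0
        return [devices[0]]

    # Several devices and no recorded versions: guessing could destroy data,
    # so refuse to continue.
    raise AmbiguousLeaderError(
        "no device has a recorded version; refusing to guess a leader"
    )
```

In this sketch, the rKEYSTORE scenario corresponds to calling `resolve_leaders(["secure001", "secure002"], {})`, which raises instead of picking secure002.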

This should mean that we won't lose data no matter how much weird stuff we run into.

Test Plan
  • In single-node mode, used repository update to verify that 0 was written properly.
  • With multiple nodes, used repository update to verify that we refuse to continue.
  • Created a new repository, verified versions were initialized correctly.

Diff Detail

Repository
rP Phabricator
Branch
rlock9
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 11843
Build 14865: Run Core Tests
Build 14864: arc lint + arc unit

Event Timeline

epriestley retitled this revision from to Make repository synchronization safer when leaders are ambiguous.
epriestley updated this object.
epriestley edited the test plan for this revision.
epriestley added a reviewer: chad.
chad edited edge metadata.
This revision is now accepted and ready to land. Apr 19 2016, 7:53 PM
This revision was automatically updated to reflect the committed changes.