
Clusterizing a repository onto a service with multiple machines results in ambiguous leader
Closed, Resolved, Public

Description

We have a cluster repository service bound to two machines. bin/repository clusterize ran successfully, but subsequent attempts to browse the repository in Diffusion result in errors like this:

ERR-CONDUIT-CORE: Repository "rREPO" exists on more than one device, but no device has any repository version information. Phabricator can not guess which copy of the existing data is authoritative. Remove all but one device from service to mark the remaining device as the authority.

/config/cluster/repositories indicates that the repository "has an ambiguous leader."

Following the instructions in the error (disabling the bindings to all but one device, then re-enabling them) resolved the error.
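The rule the error message describes can be modeled as a small decision function. This is a minimal sketch, not Phabricator's actual implementation. The names `Binding`, `choose_leader`, and `AmbiguousLeaderError` and the device names are hypothetical. Choosing the highest recorded version when any version information exists is also an assumption that goes beyond what the error text says.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Binding:
    """Hypothetical model of one device bound to the repository's cluster service."""
    device: str
    version: Optional[int]  # None: no repository version information recorded


class AmbiguousLeaderError(Exception):
    """Raised when no copy of the repository can be identified as authoritative."""


def choose_leader(bindings: List[Binding]) -> str:
    """Return the device treated as authoritative, following the error message's rule."""
    if not bindings:
        raise ValueError("repository is not bound to any device")
    versioned = [b for b in bindings if b.version is not None]
    if versioned:
        # Assumption: the device with the highest recorded version leads.
        return max(versioned, key=lambda b: b.version).device
    if len(bindings) == 1:
        # Only one device remains in service, so it is marked as the authority.
        return bindings[0].device
    # Several devices and no version information: the leader is ambiguous.
    raise AmbiguousLeaderError(
        "repository exists on more than one device, "
        "but no device has any repository version information"
    )


if __name__ == "__main__":
    # State right after clusterizing onto two devices (hypothetical names).
    try:
        choose_leader([Binding("repo001", None), Binding("repo002", None)])
    except AmbiguousLeaderError as exc:
        print("ambiguous:", exc)
    # The workaround: with only one binding enabled, that device becomes the leader.
    print(choose_leader([Binding("repo001", None)]))
```

With two unversioned bindings, the function refuses to guess. Once all but one binding is disabled, the remaining device is chosen, which mirrors the workaround described above.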

From the Diviner documentation and the command's help text, it's not clear to me whether clusterizing is supposed to work cleanly with multiple devices. If it isn't, I'd expect the clusterize command to fail with an error, or at least warn that further steps are necessary.

Event Timeline

jboning renamed this task from "Clusterizing a repository onto a service with multiple machines results in non-working state" to "Clusterizing a repository onto a service with multiple machines results in ambiguous leader". (Jan 10 2017, 12:51 AM)

The "Ambiguous Leaders" section in the docs should more or less cover this, and using bin/repository thaw to promote your favorite node to leadership is probably the easiest way to resolve it. This pathway to ambiguity is fairly mundane compared to the scenarios the documentation imagines, though, and both clusterize and that UI warning could point toward bin/repository thaw --promote. I'll make both clearer.

After D17169, the clusterize command, the error message, and the documentation should be more helpful in anticipating and resolving this situation. (The documentation may take ~24 hours to actually regenerate.)