Split RefCursor into a "name" table and a "ref" table so ref names have stable PHIDs
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	avivey
	Nov 5 2016, 7:09 PM

Description

This task originally reported a fairly mild, self-healing issue with clustered Git repositories. However, it's adjacent to a more practical issue.

In Differential's "Land Revision" workflow, we currently give the user a tokenizer and ask them to select a target branch, often, say, "master". This kind of workflow (where users select refs using a tokenizer) is likely to become more prevalent in the future as we do more repository operations.

This tokenizer is backed by the RefCursor table, and uses PHIDs from that table, but the rows in that table are currently transient and the PHIDs are not stable or persistent. If a repository is moving quickly enough, the PHIDs may shift between the time the user opens the dialog and submits the form, or even while they're typing.

The format of the table is currently <type, name, commit>, e.g. <"branch", "master", "abcd1234">. There is no unique key on <type, name> because Mercurial repositories may have multiple active heads for a single branch.

This table format does not lend itself well to identifying ref names with a single, stable, persistent PHID. A better format would be to use two tables: one for names (<type, name>) and one for the zero or more things the ref is actually pointing at (<refID, objectHash>).

The way this table is updated also currently has at least two other issues:

The mild probable-race originally reported here, where Git repositories may end up with duplicate pointers to the same ref. We could make this state impossible by putting a unique key on the ref table.
Incredibly high churn rate on row IDs (see comment below).

Taking slightly more care in performing writes to this table can likely clear up both of these issues.

Original Report

Browsing to rP (here) right now shows "The ref "master" is ambiguous in this repository. View Alternatives" warning; There's only one value under available from that link.

Revisions and Commits

rP Phabricator
	D18615	rP49b718178066 Update utility "bin/repository parents" workflow to work with RefPosition
	D18614	rP8982e3e52dd8 Update major RefCursor callsites to work properly with RefPosition
	D18613	rP5cf62f86d728 Remove obsolete columns from RefCursor table
	D18612	rP782b18e7e2ed Migrate RefCursor data to RefPosition table
	D18602	rP9d5a2b3b4f68 Add a RefPosition table to hold branch/tag positions once the RefCursor table…
	D18599	rPbc995048c913 (stable) Work around workflow blocking error with duplicate "master" refs in…
	D18599	rPc310f08b7a5c Work around workflow blocking error with duplicate "master" refs in "Land…

Related Objects

Mentioned In: rP5240cffd9c20: Fix an issue where Diffusion could fatal if the default branch was deleted
D18811: Fix an issue where Diffusion could fatal if the default branch was deleted
T12987: `key_position` on CursorPosition needs to be guaranteed by migrations
Mentioned Here: T6522: "Commit's branches contains" condition is not re-evaluated for commits in multiple branches
T11953: Herald rule for commits on autoclose branch didn't trigger when a commit was first pushed to a branch, then to master (fast-forward)
T12392: Instance reporting that synchronizeWorkingCopyBeforeRead() effectively fails

Event Timeline

avivey created this task.Nov 5 2016, 7:09 PM

Hmm...

mysql> select * from repository_refcursor where repositoryPHID = 'PHID-REPO-9a38b7db1b795ae3e348'\G
*************************** 1. row ***************************
              id: 16859
            phid: PHID-RREF-2v6ff7fmx4p62ddr3x7w
  repositoryPHID: PHID-REPO-9a38b7db1b795ae3e348
         refType: branch
     refNameHash: fCKLrzDyaVYn
      refNameRaw: master
 refNameEncoding: utf8
commitIdentifier: e1566bef63c273404d3e8a3e8b4b4e23e387d772
        isClosed: 0
*************************** 2. row ***************************
              id: 16860
            phid: PHID-RREF-ev7y6ezkb4iqbbhpm3jp
  repositoryPHID: PHID-REPO-9a38b7db1b795ae3e348
         refType: branch
     refNameHash: fCKLrzDyaVYn
      refNameRaw: master
 refNameEncoding: utf8
commitIdentifier: e1566bef63c273404d3e8a3e8b4b4e23e387d772
        isClosed: 0
*************************** 3. row ***************************
              id: 16854
            phid: PHID-RREF-bu25xmjsnzlb5lvdux2g
  repositoryPHID: PHID-REPO-9a38b7db1b795ae3e348
         refType: branch
     refNameHash: fWcLw_rNrNJF
      refNameRaw: stable
 refNameEncoding: utf8
commitIdentifier: 111c63955155046bbd57c3836c78d460ebe13dfb
        isClosed: 0
3 rows in set (0.00 sec)

This self-healed (probably after another commit was pushed) so I don't expect to look at it for a while unless it happens again.

joshuaspence added a subscriber: joshuaspence.Nov 28 2016, 7:36 PM

It's doing it again now...

frowncat

hmm, cat no show in feed

ok now cat is in feed

I see cat :)

cat is slow to appear

hskiba added a subscriber: hskiba.May 15 2017, 1:35 PM

See PHI11. This has material impact when using PhabricatorRepositoryRefCursorQuery to query landing targets, and likely impacts the logic in DifferentialRevisionOperationController.

In the refcursor table, our current autoincrement is at 312,036,016. This is only 7% of the way to int unsigned, but the fact that we've made it to 7% in 18 months might suggest that column should be BIGINT, unless there are performance impacts of that.

eadler added a subscriber: eadler.Jul 25 2017, 6:42 PM

epriestley renamed this task from "The ref "master" is ambiguous in this repository" in rP to Split RefCursor into a "name" table and a "ref" table so ref names have stable PHIDs.Jul 28 2017, 6:37 PM

epriestley raised the priority of this task from Wishlist to Normal.

epriestley updated the task description. (Show Details)

The simplest way to approach this is probably to do double writes: start writing the new tables, switch readers over, stop writing the old tables.

This is likely straightforward. However, this is also a long path with a whole lot of steps since we have to cycle the PHIDs and a pile of readers. This table is basically just a cache, and theoretically we should be able to cheat.

Current readers are:

The "Land Revision" workflow, which only cares about ref names.
The repository browse page, which only cares about ref names (just testing for existence of the default branch).
DiffusionRefDatasource, which is powering the "Land Revision" typeahead, and only cares about ref names.
PhabricatorRepositoryRefCursorPHIDType, which loads refs based on PHIDs, and only cares about ref names.
PhabricatorRepositoryRefCursorQuery, which loads refs but doesn't actually care about the commit identifiers.

None of those really care about this change.

That just leaves:

PhabricatorRepositoryRefEngine, which actually does logic.
PhabricatorRepositoryManagementParentsWorkflow, which is a support workflow for doing a cache rebuild, easy to update, and not critical.

So I think we can do this in one shot, primarily with changes to PhabricatorRepositoryRefEngine.

One open questions:

In Mercurial, can a branch have some open heads and some closed heads? I don't remember. This can be tested in a couple minutes from the CLI, but affects which table the isClosed flag ends up in.

And then one other concern: I'd like to restructure how pulls are handled and represented, so we treat a pull (say, which updates master and stable) as though it were an equivalent push, and we generate a new type of "ref content" event, to attack T6522 and T11953. This won't completely resolve those issues but should mitigate them to some degree. I believe this also impacts T12392, in that I'd like to use these synthetic pushes as a means to version observed repositories. Although I believe none of that stuff will actually change the design here, it would be good to lay out the plans first and be more confident that we're truly in the clear.

epriestley added a revision: D18599: Work around workflow blocking error with duplicate "master" refs in "Land Revision".Sep 13 2017, 7:21 PM

epriestley added a commit: rPbc995048c913: (stable) Work around workflow blocking error with duplicate "master" refs in….Sep 13 2017, 11:05 PM

In Mercurial, can a branch have some open heads and some closed heads?

Yes, of course it can.

epriestley@orbital ~/dev/scratch/hg-test2 $ hg log --template '{node} {branch}\n' --rev 'head()'
b7be5c96129711154d369dcb15bc37308ad67b8f default
e038759934be33fdf73872ae27de5b62b011eacc default
epriestley@orbital ~/dev/scratch/hg-test2 $ hg log --template '{node} {branch}\n' --rev 'head() and not closed()'
b7be5c96129711154d369dcb15bc37308ad67b8f default

So the isClosed flag needs to go on the table with the hash, not the table with the ref name.

epriestley added a revision: D18602: Add a RefPosition table to hold branch/tag positions once the RefCursor table is split.Sep 14 2017, 12:13 PM

epriestley added a revision: D18612: Migrate RefCursor data to RefPosition table.Sep 15 2017, 3:28 PM

epriestley added a revision: D18613: Remove obsolete columns from RefCursor table.Sep 15 2017, 3:36 PM

epriestley added a revision: D18614: Update major RefCursor callsites to work properly with RefPosition.Sep 15 2017, 4:20 PM

epriestley added a revision: D18615: Update utility "bin/repository parents" workflow to work with RefPosition.Sep 15 2017, 4:24 PM

epriestley added a commit: rP9d5a2b3b4f68: Add a RefPosition table to hold branch/tag positions once the RefCursor table….Sep 15 2017, 5:19 PM

epriestley added a commit: rP782b18e7e2ed: Migrate RefCursor data to RefPosition table.

epriestley added a commit: rP5cf62f86d728: Remove obsolete columns from RefCursor table.

epriestley added a commit: rP8982e3e52dd8: Update major RefCursor callsites to work properly with RefPosition.Sep 15 2017, 5:21 PM

epriestley added a commit: rP49b718178066: Update utility "bin/repository parents" workflow to work with RefPosition.

This is now deployed on this server, we'll see how it holds up. It won't promote to stable until next week, and I'd discourage anyone from picking it up right away.

(If you do upgrade into it, you should really, truly actually stop all the daemons before running migrations. We had a bit of an issue with runaway re-parsing here which I currently believe was because we cheat a bit and do upgrades with daemons up most of the time, since it usually works.)

epriestley mentioned this in T12987: `key_position` on CursorPosition needs to be guaranteed by migrations.Sep 18 2017, 4:26 PM

This is now in stable, and deployed here and to the Phacility production cluster.

epriestley mentioned this in D18811: Fix an issue where Diffusion could fatal if the default branch was deleted.Nov 30 2017, 8:35 PM

epriestley mentioned this in rP5240cffd9c20: Fix an issue where Diffusion could fatal if the default branch was deleted.Nov 30 2017, 10:06 PM

Split RefCursor into a "name" table and a "ref" table so ref names have stable PHIDsClosed, ResolvedPublicActions

Description

Revisions and Commits

Related Objects

Event Timeline

Split RefCursor into a "name" table and a "ref" table so ref names have stable PHIDs
Closed, ResolvedPublic
Actions