Page MenuHomePhabricator

Split RefCursor into a "name" table and a "ref" table so ref names have stable PHIDs
Closed, ResolvedPublic

Description

This task originally reported a fairly mild, self-healing issue with clustered Git repositories. However, it's adjacent to a more practical issue.

In Differential's "Land Revision" workflow, we currently give the user a tokenizer and ask them to select a target branch, often, say, "master". This kind of workflow (where users select refs using a tokenizer) is likely to become more prevalent in the future as we do more repository operations.

This tokenizer is backed by the RefCursor table, and uses PHIDs from that table, but the rows in that table are currently transient and the PHIDs are not stable or persistent. If a repository is moving quickly enough, the PHIDs may shift between the time the user opens the dialog and submits the form, or even while they're typing.

The format of the table is currently <type, name, commit>, e.g. <"branch", "master", "abcd1234">. There is no unique key on <type, name> because Mercurial repositories may have multiple active heads for a single branch.

This table format does not lend itself well to identifying ref names with a single, stable, persistent PHID. A better format would be to use two tables: one for names (<type, name>) and one for the zero or more things the ref is actually pointing at (<refID, objectHash>).

The way this table is updated also currently has at least two other issues:

  • The mild probable-race originally reported here, where Git repositories may end up with duplicate pointers to the same ref. We could make this state impossible by putting a unique key on the ref table.
  • Incredibly high churn rate on row IDs (see comment below).

Taking slightly more care in performing writes to this table can likely clear up both of these issues.


Original Report

Browsing to rP (here) right now shows "The ref "master" is ambiguous in this repository. View Alternatives" warning; There's only one value under available from that link.

Event Timeline

Hmm...

mysql> select * from repository_refcursor where repositoryPHID = 'PHID-REPO-9a38b7db1b795ae3e348'\G
*************************** 1. row ***************************
              id: 16859
            phid: PHID-RREF-2v6ff7fmx4p62ddr3x7w
  repositoryPHID: PHID-REPO-9a38b7db1b795ae3e348
         refType: branch
     refNameHash: fCKLrzDyaVYn
      refNameRaw: master
 refNameEncoding: utf8
commitIdentifier: e1566bef63c273404d3e8a3e8b4b4e23e387d772
        isClosed: 0
*************************** 2. row ***************************
              id: 16860
            phid: PHID-RREF-ev7y6ezkb4iqbbhpm3jp
  repositoryPHID: PHID-REPO-9a38b7db1b795ae3e348
         refType: branch
     refNameHash: fCKLrzDyaVYn
      refNameRaw: master
 refNameEncoding: utf8
commitIdentifier: e1566bef63c273404d3e8a3e8b4b4e23e387d772
        isClosed: 0
*************************** 3. row ***************************
              id: 16854
            phid: PHID-RREF-bu25xmjsnzlb5lvdux2g
  repositoryPHID: PHID-REPO-9a38b7db1b795ae3e348
         refType: branch
     refNameHash: fWcLw_rNrNJF
      refNameRaw: stable
 refNameEncoding: utf8
commitIdentifier: 111c63955155046bbd57c3836c78d460ebe13dfb
        isClosed: 0
3 rows in set (0.00 sec)
epriestley triaged this task as Wishlist priority.Nov 7 2016, 11:45 PM

This self-healed (probably after another commit was pushed) so I don't expect to look at it for a while unless it happens again.

See PHI11. This has material impact when using PhabricatorRepositoryRefCursorQuery to query landing targets, and likely impacts the logic in DifferentialRevisionOperationController.

In the refcursor table, our current autoincrement is at 312,036,016. This is only 7% of the way to int unsigned, but the fact that we've made it to 7% in 18 months might suggest that column should be BIGINT, unless there are performance impacts of that.

epriestley renamed this task from "The ref "master" is ambiguous in this repository" in rP to Split RefCursor into a "name" table and a "ref" table so ref names have stable PHIDs.Jul 28 2017, 6:37 PM
epriestley raised the priority of this task from Wishlist to Normal.
epriestley updated the task description. (Show Details)

The simplest way to approach this is probably to do double writes: start writing the new tables, switch readers over, stop writing the old tables.

This is likely straightforward. However, this is also a long path with a whole lot of steps since we have to cycle the PHIDs and a pile of readers. This table is basically just a cache, and theoretically we should be able to cheat.

Current readers are:

  • The "Land Revision" workflow, which only cares about ref names.
  • The repository browse page, which only cares about ref names (just testing for existence of the default branch).
  • DiffusionRefDatasource, which is powering the "Land Revision" typeahead, and only cares about ref names.
  • PhabricatorRepositoryRefCursorPHIDType, which loads refs based on PHIDs, and only cares about ref names.
  • PhabricatorRepositoryRefCursorQuery, which loads refs but doesn't actually care about the commit identifiers.

None of those really care about this change.

That just leaves:

  • PhabricatorRepositoryRefEngine, which actually does logic.
  • PhabricatorRepositoryManagementParentsWorkflow, which is a support workflow for doing a cache rebuild, easy to update, and not critical.

So I think we can do this in one shot, primarily with changes to PhabricatorRepositoryRefEngine.

One open questions:

  • In Mercurial, can a branch have some open heads and some closed heads? I don't remember. This can be tested in a couple minutes from the CLI, but affects which table the isClosed flag ends up in.

And then one other concern: I'd like to restructure how pulls are handled and represented, so we treat a pull (say, which updates master and stable) as though it were an equivalent push, and we generate a new type of "ref content" event, to attack T6522 and T11953. This won't completely resolve those issues but should mitigate them to some degree. I believe this also impacts T12392, in that I'd like to use these synthetic pushes as a means to version observed repositories. Although I believe none of that stuff will actually change the design here, it would be good to lay out the plans first and be more confident that we're truly in the clear.

In Mercurial, can a branch have some open heads and some closed heads?

Yes, of course it can.

epriestley@orbital ~/dev/scratch/hg-test2 $ hg log --template '{node} {branch}\n' --rev 'head()'
b7be5c96129711154d369dcb15bc37308ad67b8f default
e038759934be33fdf73872ae27de5b62b011eacc default
epriestley@orbital ~/dev/scratch/hg-test2 $ hg log --template '{node} {branch}\n' --rev 'head() and not closed()'
b7be5c96129711154d369dcb15bc37308ad67b8f default

So the isClosed flag needs to go on the table with the hash, not the table with the ref name.

This is now deployed on this server, we'll see how it holds up. It won't promote to stable until next week, and I'd discourage anyone from picking it up right away.

(If you do upgrade into it, you should really, truly actually stop all the daemons before running migrations. We had a bit of an issue with runaway re-parsing here which I currently believe was because we cheat a bit and do upgrades with daemons up most of the time, since it usually works.)

epriestley claimed this task.

This is now in stable, and deployed here and to the Phacility production cluster.