Page MenuHomePhabricator

When a previously non-permanent ref is marked permanent, unpublished commits reachable from that ref are not published
Closed, ResolvedPublic

Description

See PHI1225. I haven't actually walked through these repro steps so this is slightly hypothetical, but:

  • Create a repository R with only branch X as a permanent ref.
  • Push commit C to branch Y.
  • Mark branch Y as permanent.

Expected behavior: C is published (it is now reachable from permanent ref Y).

Actual behavior: C remains unpublished.

Workarounds: Probably, push another commit to Y, or push C to another permanent ref.

In PhabricatorRepositoryRefEngine, we should close every reachable commit on Y if it becomes a permanent ref. We'll currently skip this step because we won't create a new cursor for it.


Likely adjacent is that the RefPosition table should likely store (a) is this ref the repository default ref and (b) is this ref a permanent ref. We probably need to do (b) to fix this (automatically publish commits when a ref becomes permanent). Doing both (a) and (b) will let us query a list of branches with a more human-readable ordering (see D20493).

Event Timeline

epriestley created this task.

Probably, push another commit to Y

This usually won't work. In the RefEngine, we identify new commits reachable from permanent refs like this:

  • Store a list of the old heads of all the permanent refs we know about (A, B, C, ...) ($all_closing_heads).
  • Pull a list of the new heads of all the permanent refs from disk.
  • For each new ref X which does not match our old list, git log X --not A B C to find commits which are newly ancestors of a permanent ref.

When branch aardvark is not a permanent ref and points at commit Q, but then you change the configuration to make it a permanent ref, the next RefEngine update sees it as one of the old positions of a permanent ref, so it is never selected as a new ancestor. It remains unpublished.

See PHI1447. In this case, the install reports a sequence of events roughly like this:

  • Disable publishing.
  • Import a repository.
  • Enable publishing.
  • A bunch of commits discovered during import never publish.

See PHI1458. This one looks simpler:

  • Import a repository.
  • Some commits never publish.

I think these are all the same root cause and the fix is to add a "was this ref a permanent ref the last time we discovered commits?" flag to ref cursors.

The algorithm in PhabricatorRepositoryRefEngine->updateRefs() is approximately:

  • Get a list of where every ref currently points with git for-each-ref.
  • Load the "ref cursors" (a saved list of ref positions -- where the refs were the last time we did this) from the database.
    • Discard all the cursors that are not permanent refs.
      • [Problem A] We should instead discard cursors which were not permanent refs when we wrote them to the database, i.e. discard refs which were not permanent and are still not permanent.
    • Discard all the cursors that point at commits which no longer exist (this is just housekeeping).
  • For each ref which is now permanent and has changed position, run git log <ref> --not <every ref we've already closed ancestors of>. Due to (A) above, the "every ref" list may incorrectly include refs which are now ready to close but which we haven't yet closed ancestors of.
    • Close/publish all commits in that list.

This explains the original case, where a ref switches from non-permanent to permanent. However, I'm not yet satisfied it explains the other two cases.

The culprit there might be this:

if ($this->hasNoCursors) {
  // If we don't have any cursors, don't close things. Particularly, this
  // corresponds to the case where you've just updated to this code on an
  // existing repository: we don't want to requeue message steps for every
  // commit on a closeable ref.
  return false;
}

We'll hit this during initial import, too. This could probably cause us to write master as a permanent ref but not close its ancestors, which could explain the two import cases.

If this theory holds up, the "swap a branch" case can be reproduced like this:

  • Create a repository with branch X as temporary.
  • Fully import the repository.
  • Mark branch X as permanent.
  • Run bin/repository refs.
    • Expect: no-op because we git log ... --not X when finding new commits.
    • Desire: commits on X are now published and git log does not specify the head of X in the --not subclause.

The "import a repo" case can be reproduced like this:

  • Create a repository with daemons off. Continue manually:
  • Fetch and discover the repository.
  • Run the refs step on the repository.
    • Expect: no-op because we skip closing when there are no refs yet.
    • Desire: close those refs.
  • Run the refs step on the repository again.
    • Expect: no-op because we already marked everything permanent.
    • Desire: no-op because we already closed everything.

Assuming that all lines up, the remaining question is "what happens to affected commits in existing repositories"?

I think they'll currently skip the message step (which is somewhat bad) and end up in the publish step, but now they'll actually publish because the repository is not importing.

So I think this probably needs to be coupled with a migration to put every repository in initial import mode, or we need to go set the flag properly on every ref cursor and then have separate "fix broken stuff" command.

If this theory holds up, the "swap a branch" case can be reproduced like this:

In spellbook, I set "Permanent Refs" to "master", then ran bin/repository update. (This is sort of a bad test case since the repo is full of garbage, but whatever.)

With daemons off, I pushed a commit ("e19be7a7") to branch "turpentine" in the origin. This should create the branch. I ran bin/repository update to fully sync things. This created the cursor:

| 752 | PHID-RREF-omaoeqnstwxuwxo4s4wt | PHID-REPO-nx62hdj3yynd6nwbgxwy | branch  | hl4outERB5_r | turpentine         | utf8            |

...at the right position:

mysql> select * from repository_refposition where cursorID = 752;
+------+----------+------------------------------------------+----------+
| id   | cursorID | commitIdentifier                         | isClosed |
+------+----------+------------------------------------------+----------+
| 3115 |      752 | e19be7a7da8b1229bef9716ebcfbaa2a3841c319 |        0 |
+------+----------+------------------------------------------+----------+

Actually, I deleted all the garbage branches in the remote and ran bin/repository update twice. So now things are clean:

mysql> select * from repository_refcursor c join repository_refposition p on c.id = p.cursorID  where repositoryPHID = 'PHID-REPO-nx62hdj3yynd6nwbgxwy' and refType = 'branch' order by refNameRaw\G
*************************** 1. row ***************************
              id: 532
            phid: PHID-RREF-257esghrnso2ukjlv46p
  repositoryPHID: PHID-REPO-nx62hdj3yynd6nwbgxwy
         refType: branch
     refNameHash: fCKLrzDyaVYn
      refNameRaw: master
 refNameEncoding: utf8
              id: 3088
        cursorID: 532
commitIdentifier: aca95b69aec339bcb9a6c6f70e70bf12d122ae6f
        isClosed: 0
*************************** 2. row ***************************
              id: 752
            phid: PHID-RREF-omaoeqnstwxuwxo4s4wt
  repositoryPHID: PHID-REPO-nx62hdj3yynd6nwbgxwy
         refType: branch
     refNameHash: hl4outERB5_r
      refNameRaw: turpentine
 refNameEncoding: utf8
              id: 3115
        cursorID: 752
commitIdentifier: e19be7a7da8b1229bef9716ebcfbaa2a3841c319
        isClosed: 0
2 rows in set (0.00 sec)

And the commit is unpublished:

Screen Shot 2019-09-24 at 4.40.32 PM.png (410×552 px, 38 KB)

Okay, so now we mark "turpentine" as a permanent ref and re-run refs. There are actually two different problems here:

  • We won't see this as a ref position change, since the ref position didn't change.
  • If we change the ref position, we expect to see git log <new position> --not e19be7a7da8b1229bef9716ebcfbaa2a3841c319.

Ugh. Let me deal with the first case first.

I marked "turpentine" permanent. Then I ran update instead of refs, and ran it without trace, destroying my test state.

I put a new commit on "turpentine2" and pushed it, updated, marked it permanent, and ran bin/repository refs spellbook --trace.

No surprises: marking the commit was skipped because the ref hadn't moved, and the code doesn't realize that "became permanent" is a valid change.

So here's the other test: I put a new commit on "turpentine3" and pushed it, then updated. I pushed another commit, ran bin/repository discover spellbook, then ran bin/repository refs spellbook --trace. My new commit is "1ae199d" and the old head was "cf2f726".

$ ./bin/repository refs spellbook --trace
...
>>> [20] (+60) <exec> $ git log --format='%H' '1ae199d0415d237a3fecc2c223bcca4ce1aaea61' --not 'cf2f72684e269cd99a2160dd3e20d3f4f871117c' ...
...

So that aligns with predictions, and is a consistent explanation of the observed behavior and the bug.

The "import a repo" case can be reproduced like this:

This one is theoretically easier to test. I created a new "spellbook2" repository and pushed a branch to it. I ran discover.

I ran refs --trace and saw nothing happen, which is consistent with the theory above.


So I think this likely describes the state of the world, and the remedy is:

  • Remove the hasNoCursors code.
  • Add an isPermanent marker to RefCursor.
  • Only include refs which are marked as isPermanent in the list of --not refs.
  • Include refs which changed from not isPermanent to isPermanent in the list of refs to update.
  • At the end, update the isPermanent flag on refs which have changed permanence.

Migration-wise, my plan is:

  • Mark all existing refs as previously-permanent.
  • Add a bin/repository refs --rebuild or similar flag to repair repositories affected by this bug.

I think this issue is not terribly important/widespread and the alternative (mark all existing refs as non-permanent and all repositories as importing, then trust them to get into the right state) creates a little too much peril.

Retrying this on the patch:

The "import a repo" case can be reproduced like this:

I created a new "spellbook3" repository and pushed a branch to it. I ran discover and refs --trace.

This time, all the commits got published:

>>> [20] (+63) <query> UPDATE `repository_commit` SET importStatus = (importStatus | 1024) WHERE id = 18019

Post-patch:

If this theory holds up, the "swap a branch" case can be reproduced like this:

I pushed commits to (nonpermanent branch) "turpentine4". I updated. This hit a minor issue where a deleted tag was marked permanent by the migration, which normally can't happen. I corrected it, then repeated the update, which worked.

I marked "turpentine4" as permanent and ran refs.

This time, the previously unpublished commit on "turpentine4" which was now reachable was published:

>>> [23] (+82) <query> UPDATE `repository_commit` SET importStatus = (importStatus | 1024) WHERE id = 18020

I did the other test, pushing changes to "turpentine5", then updating, then pushing more changes, then marking it as permanent, then running discover + refs.

This updated both commits correctly.

Add a bin/repository refs --rebuild or similar flag to repair repositories affected by this bug.

I identified bugged commits on "turpentine2", then ran bin/repository refs --rebuild --trace --verbose.

This didn't work out of the gate because I was ignoring the cache when building the list of refs to search, but still using the cache when building the list of refs to exclude.

I modified the code, then re-ran it. This time, the bugged commits were correctly identified and their state was fixed.

This fix is not retroactive (and, I think, isn't easy to make retroactive in an obviously safe way) so you'll need to manually repair affected repositories if you want to fix existing commits affected by this bug. This bug doesn't likely doesn't have any far-reaching implications so there's no need to go on a grand adventure to hunt these down, but if you've run into some you can fix them like this:

  • Stop the daemons.
  • Put the repository in importing mode so it won't send email. If you don't do this step, all the affected commits will fully publish when you repair them, which may send a lot of email.
phabricator/ $ ./bin/repository mark-imported --mark-not-imported <repository>
  • Rebuild refs using the new --rebuild flag, and optionally the --verbose flag (to generate human-readable progress information) and the --trace flag (to generate detailed debugging information).
phabricator/ $ ./bin/repository refs --rebuild <repository> [--verbose] [--trace]
  • Start the daemons.

(I'll include these instructions in the changelog, too.)