Page MenuHomePhabricator

arc land might identify wrong revision if branch is missing revision
Closed, InvalidPublic

Description

Repro steps:

Set up repo with
master -> A -> B
Where A and B are dependent branches.
A doesn't have a revision associated (say we wanted to skip review)
B has a revision associated with both commits A and B.

git checkout master
git checkout -b A
# do edit A
git commit -a -m "Commit A"
git checkout -b B
# do edit B
git commit -a -m "Commit B (requires review)"
arc diff master

Now, if you go to branch A and try to land, it will block on the revision from commit B

git checkout A
arc land
nipunn-mbp:server nipunn$ arc land
On branch A.
Updating branch master...
On branch A. Identifying and merging...


    Revision 'D137659: Test Commit A Test Commit B (requires review)' has
    not been accepted. Continue anyway? [y/N] N

Although I understand why this happened (Commit A is technically inside the revision), I was still thrown for a loop when this happened. I expected either:

  1. Land branch A successfully
  2. Fail to land because there's no revision in the tree for A

I believe this is the code that does the lookup.
https://github.com/phacility/arcanist/blob/master/src/repository/api/ArcanistGitAPI.php#L1038

Event Timeline

nipunn raised the priority of this task from to Needs Triage.
nipunn updated the task description. (Show Details)
nipunn added projects: Arcanist, Restricted Project.
nipunn added subscribers: nipunn, gregprice.

I do this routinely (roughly every day) without encountering this problem, so I'm surprised by this behavior.

I'd expect this only if:

  • The message for Revision A incorrectly includes the text Depends on D123, where D123 is Revision B.
  • (Maybe commit B is empty, so that Revision A and Revision B have the same tree hash?)

I do this routinely without errors and am not sure how to reproduce it.

I went back to repro this and realized that my repro steps weren't perfect. On second thought, this seems like an expected behavior, but it still threw me for a bit of a loop. I had created one commit and split it up into two (the first of which was trivial and could skip review). I think the behavior is reasonable - I just didn't realize why it had associated branch A with the revision on branch B.

Why is "Commit A" in "Revision B"?

That is, if you had run arc diff A instead of arc diff master, I believe this workflow would do what you expect (specifically: refuse to arc land on A). arc diff --base git:branch-unique(origin/master) would also do what you want.

Did this workflow use arc diff master intentionally, or just out of habit? If you type this out of habit, is there a particular reason you don't use base rules like git:branch-unique()? (e.g., don't know they exist, don't understand them, didn't realize they could solve this problem, etc?)

If we refused to arc land in this case it would break this other weird workflow, where you stack multiple changes in a branch with immutable history configured, then amend one of them:

$ arc set-config history.immutable true
$ git checkout -b A
$ nano ...
$ git commit -m checkpoint1
$ nano ...
$ git commit -m checkpoint2
$ arc diff
$ sleep 600
$ nano ...
$ git commit --amend -m 'checkpoint2fix'
$ arc land

This workflow is probably more bizarre than the workflow above by a significant margin, but we can currently get it right (use the stable tree/commit hash of "checkpoint1" to identify the revision, even though the tree and commit hashes of "checkpoint2" have changed).

We could add more heuristics or prompts here (e.g., "this branch contains commit X, which has tree hash Y; tree hash Y is also part of revision A but not at the head of revision A, continue with revision A anyway?") but I think the net effect would be about the same.

You can already use arc which to figure out revisions arc believes are present on a branch. The output for the branch matching stuff isn't quite as detailed as it could be (we don't show which commit/tree hashes match), but I believe cases of confusion here are generally very rare.

I honestly am not sure how I ended up in the state where there were two commits against one revision. I was splitting one revision into two and unfortunately, I'm not exactly sure what I did at the time. I believe it was a git rebase -i HEAD^ followed by git add -p followed by git commit and git rebase --continue, but there may have been some more nuance to it.

It was most likely a mistake that I had two git revisions attached to one differential revision.

Upon discussion, I think that the existing behavior is reasonable.

A small amount of text making it clear that we picked up the differential revision not from the commit message, but from phabricator would have avoided this confusion for me, but I don't think it's necessary as the confusion was probably rooted in an error I made.

Yeah, I'd generally like to make workflows more clear about how they're arriving at results. arc diff is the major culprit here but all of the automatic resolution we do in selecting commit ranges, revisions and repositories can likely be made more clear.

angie moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.