Page MenuHomePhabricator

Arc diff is not creating a new revision when creating a new topic branch off a topic branch with already a revision associated
Open, Needs TriagePublic

Description

NOTE: Observed on Phacility and with very latest Arcanist on OS X.

Circumstances

  • I have a topic branch test branched off master
  • It has 4 commits including a merge one from master
  • There's an associated revision which is up-to-date (arc diff was called while on tip of topic branch)
  • I then created a new topic branch off the tip of this topic branch and made a commit on it

Bug

arc which says it will update the revision instead of creating a new one.

Repo topology

pasted_file (599×990 px, 44 KB)

Output of arc branch

$ arc branch
  test   Needs Review D129: Testing things again
  master No Revision  Fix
* retest No Revision  Update

Output of arc which

$ arc which
REPOSITORY
To identify the repository associated with this working copy, arc followed this process:

    Configuration value "repository.callsign" is empty.

    This repository has no VCS UUID (this is normal for git/hg).

    The remote URI for this working copy is
    "git@github.com:lifeonairteam/test.git".

    Found a unique matching repository.

This working copy is associated with the Test repository.

COMMIT RANGE
If you run 'arc diff', changes between the commit:

    3ab00423b0068413  Fix

...and the current working copy state will be sent to Differential, because
it is the merge-base of 'origin/master' and HEAD, as specified in
'.git/arc/default-relative-commit'.

You can see the exact changes that will be sent by running this command:

    $ git diff 3ab00423b0068413..HEAD

These commits will be included in the diff:

    4def279945a7a890  Update
    e8eaafa0a61e27ed  Fix
    be1bc08e863d9135  Merge branch 'master' into test
    02e66df95f56950a  Update
    b1867a16194702d9  Testing things again


MATCHING REVISIONS
These Differential revisions match the changes in this working copy:

    D129 Testing things again
        Reason: Commit message for 'b1867a16194702d9' has explicit 'Differential Revision'.

Since exactly one revision in Differential matches this working copy, it will
be updated if you run 'arc diff'.

Event Timeline

Interestingly, using arc diff --create to force create a new revision generates one with a patch that still contains the diff of the original topic branch. The solution is to force arc diff to use the tip of the original topic branch as the base whenever calling arc diff while on this secondary topic branch.

Based on the output, what do you believe arc is doing? What do you think it should do instead? Which part of the behavior do you believe to be a bug?

I believe everything here is working exactly as intended and described completely and exhaustively in the output, but that you want arc to make some assumptions which it can not: likely, they are valid assumptions in your workflow but are not valid in other workflows. I don't think arc is misrepresenting what it's doing in way, or suggesting that it makes any assumptions it doesn't.

We can walk through your expectations, identify those assumptions, I can point out why arc can't make (and why it isn't saying it's making them). Once I understand exactly what you want arc to do, I can tell you which base rule you should use to encode that behavior. There is a strong likelihood that the base rule you want is impossible (i.e., not computable without external knowledge that only humans have).

Note that you can avoid this completely by setting base to arc:prompt. This will make arc ask you which commit to diff against every time, and you won't have to configure or teach anyone how base rules work.

Or maybe a simpler question is: when you run arc diff without arguments, what is the complete algorithm you want arc to follow to figure out which commit range you intend to send for review?

Expected flow:

  • have a topic branch and call arc diff
  • create a new topic off the first one then call arc diff again
  • Arcanist understands this is a new topic branch which no associated revision yet, so it does arc diff --create <first_topic_branch>

Per our email discussion, I was under the impression that's how things work since you said IIRC that using --create should not be needed?

Arcanist understands this is a new topic branch which no associated revision yet,

How?

Also, how do you expect to arc to distinguish between these two use cases, each of which some users prefer?

  • You want all commits on the topic branch (for some definition of that).
  • You want only the most recent commit (or a subset of "all the commits"), because you stack different changes on the same named topic branch.

Maybe a better path forward is this:

Arcanist understands this is a new topic branch which no associated revision yet

arc tries to avoid magic like this because it is so open to interpretation. Instead, what arc does is blindly execute a very narrow "base" rule exactly as you've configured it to. The default rule is an extremely simple rule that tries to make the behavior of arc as simple to understand and predict as possible.

arc which tells you which "base" rule it's executing, and what result it got from that:

...because it is the merge-base of 'origin/master' and HEAD, as specified in
'.git/arc/default-relative-commit'.

Specifically, you have configured arc so that when you run arc diff with no arguments, it means arc diff origin/master, which boils down to git diff `git merge-base origin/master HEAD`.

If you want arc to use some other rule, like "identify unique commits on only this branch" or "go through all ancestors of HEAD until you identify a commit which is already associated with a revision", you need to configure base to do that instead. These are much more complex behaviors which can do much less predictable things. And, while everyone wants arc to "do what they mean", no one agrees on what arc diff with no arguments means. Because of this, we try to do something reasonably sensible but simple and predictable by default.

The documentation on commit ranges describes some of the other builtin "base" rule components you can use:

https://secure.phabricator.com/book/phabricator/article/arcanist_commit_ranges/

In particular, git:branch-unique(...) can be used to select commits unique to the current branch (but note that this does complex, confusing things if the user rebases test, since all commits on retest will become unique to that branch!).

arc:exec(*) can be used to have arc use some completely custom behavior which you have total control over. If you believe arc should work in a particular way, you can use arc:exec(*) to make it work in exactly that way. But I suspect that if you start writing a script to make "arc diff" always do what everyone means, you'll soon have a very large, complicated script which everyone complains is too large and complicated and does the wrong thing. So you'll add a way to configure it with a "base" rule, and add support to have your script execute some other script so users can truly control what arc diff does.

The description of the "git:branch-unique(...)" rule in the "Arcanist Commit Ranges" document may be helpful in understanding why this is hard, or at least hard enough for me that I was unsuccessful in producing a "do the right thing" or "do what users expect" behavior when I tried and we ended up with "base" instead:

https://secure.phabricator.com/book/phabricator/article/arcanist_commit_ranges/#git-branch-unique

Note that all of those cases only talk about two branches with amends and normal commits. They don't even touch on rebases, merges, cases with more than two branches, or cases where users stack multiple revisions on the same branch.

Arcanist understands this is a new topic branch which no associated revision yet,

How?

Because it's a branch with no associated revision? I mean doesn't Arcanist track this?

Also, how do you expect to arc to distinguish between these two use cases, each of which some users prefer?

You don't. You prompt 😄

The default rule is an extremely simple rule that tries to make the behavior of arc as simple to understand and predict as possible.

I understand why things work this way. The whole "confusion" came from this statement in an email from you in our email discussion about branching off branches:

You should never normally need to use "--create". The primary purpose of "--create" is to allow users who are in a state they don't understand to force "arc" to do what they want without actually understanding why it's doing what it's doing.

Which I read as "Arcanist will figure a new revision is needed since this is a new topic branch AND by definition will use the branch point of that secondary topic branch as the base"...

I mean doesn't Arcanist track this?

No. Users rename branches, HEAD may be the tip of multiple branches, some users work directly on master, moving changes around with .diff files doesn't preserve branch names, arc patch uses arcpatch-* branch names, etc. See also some discussion in T3875.

You prompt

We did, the first time you ran arc:

$ arc diff
 Select a Default Commit Range 

You're running a command which operates on a range of revisions (usually,
from some revision to HEAD) but have not specified the revision that should
determine the start of the range.

Previously, arc assumed you meant 'HEAD^' when you did not specify a start
revision, but this behavior does not make much sense in most workflows
outside of Facebook's historic git-svn workflow.

arc no longer assumes 'HEAD^'. You must specify a relative commit explicitly
when you invoke a command (e.g., `arc diff HEAD^`, not just `arc diff`) or
select a default for this working copy.

In most cases, the best default is 'origin/master'. You can also select
'HEAD^' to preserve the old behavior, or some other remote or branch. But you
almost certainly want to select 'origin/master'.

(Technically: the merge-base of the selected revision and HEAD is used to
determine the start of the commit range.)

    What default do you want to use? [origin/master]

When you hit "return" at this prompt, you configured arc diff with no arguments to mean arc diff origin/master, as the huge wall of text describes.

email discussion

What I meant here is that if you configure a base rule which encodes what you expect arc diff with no arguments to mean, you should not need to use --create or --update, not that arc can figure it out without configuration.