Page MenuHomePhabricator

Diff dependency should cause shared lines to be hidden
Closed, ResolvedPublic

Description

Say diff B depends on diff A.

Diff A adds

  old
+ hello

Diff B adds

  old
+ hello
+ world

Then ideally when viewing diff B, the only line that would be shown to have changed is

  old
  hello
+ world

Event Timeline

Assuming your local tree looks like this:

[master] - last commit from master, has "old"
abc4321 - Add Hello
25ag344 - Add World

You should create the diff for B using arc diff abc4321 (Or, more generally, arc diff <base commit>.

That will create diff B against A, and will only show + world.

epriestley claimed this task.
epriestley added a subscriber: epriestley.

@avivey's answer is correct.

Broadly, if you have commits A and B, it sounds like you're making one revision with "A" and one revision with "A + B".

Instead, make one revision with "A" and one revision with "B".

I believe there are cases where it is advantageous to have one revision with "A" and one revision with "A+B". I'm not sure what everyone calls it, but I've heard it being called the "git rebase workflow" before (versus the git merge workflow).

It's really useful for submitting multiple small commits as opposed to one giant commit. Because the small commits are all related, it is useful to have the 2nd revision be A+B, the 3rd be A+B+C, etc. In fact, in many cases you have B which is only compilable with +A, so you cannot just have one revision with just "B" as suggested. I can probably come up with a realistic example if you would like to see one.

I think it'd be really useful if Differential supported the git rebase workflow. It doesn't take too much to support it. Only what is described in the OP. Diff lines that belong to the <base commit> should just be hidden.

Thanks!

I use the same workflow. Manually running arc diff ${ other diff's commit } is a bit painful, looking up the other diff's HEAD commit and being super careful to pick the right SHA.

I didn't see a +1 button, so I commented here.

For additional context, the reason we use arc diff <other commit> in our workflow is because we've configured git.default-relative-commit to be origin/develop. We're discussing removing this so that we 1) use the default behavior again and 2) encourage people to use arc feature so that branch tracking is set up in such a way that arc diff works as expected.

Sorry to complicate: My original assumption is that if I

  1. Create new "stuff" branch with arc feature <stuff> <dep>
  2. Export "stuff" branch with arc diff <dep>

Then the diff for "stuff" will be automatically have a dependency on the diff for "dep". Now I see my assumption was wrong. When I add the dependency in the webui, I *do* get the diff-display behavior I want.

When I add the dependency in the webui, I *do* get the diff-display behavior I want.

This is the most surprising part here, because we don't have the code to show what you describe.

As an example, D16278 has a dependency on D16276, and the diff shown in D16278 does not reflect this[1].

D16278 includes all the commits from both D16276 and D16277.


...looking up the other diff's HEAD commit...

This is hard to do automagically: Say you have this stack of commits:

ab5  last change                 # this is where you run `arc diff` now
ab4  a minor change
ab3  Add Something great         #  This is where you did `arc diff` last time
ab2  start change 1
ab1  [origin/master]  real base of work

Revision A includes commits ab2 and ab3; Your HEAD is ab5 and you run arc diff --create. Is ab4 part of the new revision B, or part of the revision A?


My normal flow when working with stacked changes is like this:

  • I put all the changes in a single branch.
  • When starting, I just write code any-which-way, and make commits based on chronological order (i.e. git commit -a -m 'made some progress').
  • When I start to get close to a review-process, I rebase heavily in order to have a single commit for every revision I want to have.
  • Then using a series of arc diff HEAD~, I make a bunch of revisions (I do this inside an interactive rebase session, because the flow changes the commit messages).
  • Every round of changes first goes on top of the stack, and then being rebased again into the appropriate commits, so I end up again with 1 commit per revision.

The primary reason for me to squash these commits is to later make it easier to find the right base for each revision (It's always HEAD~ for me). I've tries some other approaches (Using branches and/or tags to mark the boundaries), but this method worked out best for me.

When working on a non-stacked revision, I don't bother squashing the commits.

Using this approach, the I'd end up with 3 revisions; Each will have one commit (A, B or C), and each will Depend On the previous one.
Since each revision explicitly excludes the previous changes, they each look like they only hold their additions - see D16277 and D16276.


[1] It's possible that I got the dependency wrong; feel free to play around.

Yes, writing "Depends On" in a commit message or editing "Edit Parent Revisions" or "Edit Child Revisions" in the web UI should never affect what is shown in Differential. If you can reproduce this, this is a bug which we'll fix.

If you have a workflow where you sometimes need to type arc diff <something>, and you would prefer to be able to type arc diff without arguments instead, configure a base rule to automatically select <something> correctly. See:

https://secure.phabricator.com/book/phabricator/article/arcanist_commit_ranges/

Note that this is exceptionally difficult in the general case, which is why base rules are complex and why arc does not just do what you mean automatically.

To remove your need to type <something>, do this:

  • Think carefully about what information you use to pick the correct <something>.
  • Describe an algorithm which arc could use to pick the same <something> given only mechanical processes and information available to it.

From this point, there are several ways forward, including these:

  1. Your algorithm works in the general case can already be implemented with builtin base rules.
  2. Your algorithm relies on assumptions about your workflow (see below), and is not suitable to bring upstream. You can implement it in a script and use arc:exec, or we'll modularize these rules after T5055.
  3. Your algorithm requires human intelligence and information which arc can not have, and is impossible for a machine to implement.

Most requests in this vein fall under (2) or (3) -- users are frustrated by the behavior of arc, but want it to have behavior that does not work in the general case or which machines can not implement.

In this case, the proposed algorithm appears to be something like:

  • Ask the user which revisions their change depends on.
  • Select ancestors of HEAD which are not part of that revision.

This won't work in the general case. Here's an example of a situation where it will not work:

$ git checkout -b feature1
$ ...
$ git commit -am feature1-commit
$ arc diff # Create revision A
$ git checkout -b feature2
$ ...
$ git commit -am feature2-commit
$ git rebase master # This rebase will change all the "A" commits.
$ arc diff

The marked rebase will change all the commits that are part of "A", so the proposed rule will fail to identify them. The proposed rule (at least, as I've interpreted it) assumes commit hashes are stable, but they are not.

(More inline with the original suggestion we could try to do "diff calculus" and just remove "A" from "B" visually in the web UI, but in this workflow we'd end up with conflicts in attempting to do this, because the state "A" would not exist anywhere in the history of "B", so this is dramatically more complex and does not produce a better result.)

Additionally, this rule requires users to write Depends on <something> instead of <something>: the proposed rule is actually worse and requires more typing.

If you never rebase in your environment you are free to implement this rule anyway, using arc:exec. It may work well under assumptions that hold true in your environment but do not hold true in the general case.

Most rule suggestions make one of these assumptions:

  • Users never rebase, so commit hashes or tree hashes are stable.
  • Users always amend commits, so each revision is exactly one commit.
  • Users never change commit messages, so messages are stable.
  • Users never use history.immutable, so messages are Phabricator-controlled.
  • Users never merge.
  • Users never squash.
  • Users never rename branches.
  • Users never move code between working copies.

Although some of the assumptions may be acceptable in your workflow, all of these are untrue for many workflows.

See also T3875 for a similar discussion.

I do think there is significant room to improve the behavior of arc diff (roughly captured in T6072) but I'm ultimately not sure how to improve this situation:

  • Users run arc diff with a strong expectation about what it should do, and believe the behavior they imagine is simple and intuitive.
  • Users obviously don't consider every possible git workflow since they only use one workflow, and the desired behavior really is simple and intuitive with the simplifying assumptions of their workflow. But arc can not know what these assumptions are.
  • In the general case, the behavior they imagine is actually complex, unintuitive for many other users, ambiguous, impossible for machines to resolve, etc.
  • Consequently, arc diff does something other than what they expect. Since the right behavior seems simple and obvious, this is really frustrating. The tool seems shoddy and broken since it can't do a simple, obvious thing correctly.

The approach arc currently takes is to allow you to tell it which simplifying assumptions are acceptable using base rules. But the imagined pathway here doesn't seem to be holding up in practice. The idea is:

  1. You run arc diff <something> for a little while.
  2. You get tired of that, so you configure a base rule which always works correctly in your environment/workflow.
  3. Now, you don't have to type as much. Great!

A lot of users seem to skip step (1), possibly because administrators configure a base rule that does not work in many cases without explaining what they've done or realizing that when you get base wrong for some of your users' workflows, you change arc from "requires annoying extra typing" to "deeply frustrating broken shoddy garbage mess".

A lot of users don't seem to make it through step (2), possibly because their rules are actually impossible for machines to execute.

Thanks for the explanations. I can appreciate the complexity and the multitudes of workflows which I am certainly not thinking of.

When I add the dependency in the webui, I *do* get the diff-display behavior I want.

I now believe this statement I made earlier is incorrect.

using a series of arc diff HEAD~, I make a bunch of revisions

I am now trying this workflow. But I do have a question: If I run arc diff HEAD~, could Phabricator be smart enough that it could set the new revision to Depend on the revision at HEAD~? Right now, I have to manually go into the webui to add a parent dependency.

See T11343 for discussion of automatic "Depends On" detection. Briefly: desirable, but also complicated in the general case.

Got it. Another quick comment: arc branch unfortunately does not work well for the HEAD~ workflow.

"work well" means "show all active diffs"