Page MenuHomePhabricator

When "arc land" selects a commit range with merge commits, it may fail to slice them from history
Open, NormalPublic

Description

See PHI1860.

See https://discourse.phabricator-community.org/t/arcanist-says-the-revision-does-not-merge-cleanly-into-master-even-after-cleanly-merging-master/4184/.

These reports describe issues where arc land fails to merge changes which should merge. (PHI1860 may or may not describe the same issue as the Discourse report.)

See also PHI1957.

See also PHI2006.

Related Objects

Event Timeline

epriestley triaged this task as Normal priority.Sep 4 2020, 5:46 PM
epriestley created this task.

Broadly, I believe these "certain changes" are approximately changes which include a merge commit.

When arc land acts on a sequence of changes, it attempts to rebase each group of commits before it squash-merges them. The general intent here is to perform work incrementally, in smaller steps.

If you resolve conflicts by rebasing, this generally works fine.

If you resolve conflicts by merging, git rebase becomes sort of hopelessly lost. It has a --rebase-merges option but this doesn't seem suitable for non-interactive use.

I think we can do this:

  • If the rebase fails, or we detect that there's a merge commit in the range so we know it won't work:
    • Try to "rebase" in one shot with "git diff min_commit..max_commit | git apply && git commit". This is crude, but I think it should always work if the user resolved conflicts with "git merge".
    • If that fails, just continue and try the merge, since it may work anyway?

git diff ...

A smarter way to do this is probably to checkout min^, then git merge --squash max, then rebase.

In the general case, the problem is that some invocations of arc, including arc land --pick A, may select an arbitrary range of commits to land which have some ancestors that we do NOT want to land.

That is, if you're in a local branch with local commits (newest to oldest) A, B, C, D, etc., you can invoke arc land to land commits C..A, with the intent that D, E, etc., should not land.

This condition also occurs inside arc normally, because arc land A may mean "land F..D, then land C..A", and we're in the same position when we start the second land operation. And even if there are no commits we don't want to land, we still act as though there are because this makes the code simpler (it can always do one thing, instead of doing two different things depending on whether history does or does not contain commits we want to exclude).

In any case, generally, we can't just merge A because that would bring in all changes applied by ancestors of A, and that is often a different set of changes than the changes we actually want, C..A.

If the branch is nice and linear, we can usually just rebase C..A onto the "into" state (usually master), then merge normally. So we move the C..A slice of commits to be based on master, then squash-merge A, and we're done.

If the branch has merge commits, as here, this generally won't work. How do we get an arbitrary range of commits, C..A, which includes merge commits, into a state where we can merge it into the "into" state without merging ancestors of C we want to exclude?

The best approach I can come up with is this:

  1. Squash-merge C..A into C^, creating M.
  2. Rebase M onto master, creating R.
  3. Squash-merge R onto master.

This generally feels like it makes sense. However, it doesn't actually work, because "M" will include changes also present in "master", and Git may produce conflicts when merging.

The best approach I can come up with to resolve that problem is:

  1. Merge master into A normally, creating M on top of A. (This will be a no-op if the user has merged master themselves and M will be empty.)
  2. Squash-merge C..M into C^, creating S.
  3. Rebase S onto master using --strategy recursive --strategy-option theirs (which really means "always use the version on the branch", since "ours/theirs" are reversed during a rebase) creating T.
  4. Squash-merge T onto master.

The invocation in (3) automatically resolves all conflicts in favor of the branch we're landing. The preparatory merge in (1) probably guarantees these resolutions are correct.

This feels flimsy and dangerous, and I'm not convinced step (1) actually guarantees the use of "theirs" in (3) is safe.

The alternative approach is probably:

  1. If C^ is an ancestor of "master", don't bother trying to rebase, since it isn't necessary. We can just squash-merge and get the right result. This will fix all simple cases of "arc land" where users merge instead of rebasing.
  2. If C^ is not an ancestor of "master", and C..A includes merge commits with parents that are not in C..A, fail (and tell the user to rebase instead of merging when working with multiple dependent changes).

This isn't satisfying and I suspect some users won't really find any guidance on rebasing issued in (2) convincing.

A particular example is:

+-------- master
|
V

o   o <-- Branch A, Commit C2, Revision D2
|   |
|   |
|   |
o   o <-- Branch B, Commit C1, Revision D1
|  /
| /
|/ 
o

Now, the user runs arc land --pick A. This legitimately encounters merge conflicts and fails. The user chooses to resolve conflicts with git merge master and commits a merge which resolves conflicts:

+-------- master
|
V
    o <-- Branch A, Commit C3 
   /|
  / |
 /  |
o   o <------------ Commit C2, Revision D2
|   |
|   |
|   |
o   o <-- Branch B, Commit C1, Revision D1
|  /
| /
|/ 
o

Now, the user runs arc land --pick A again. This means "land the changes in C2 and C3, but NOT the changes in C1".

  • We can't rebase C2..C3 onto master (the problem in this task) because rebases and merges are, uh, mutually allergic.
  • If we squash C2..C3 onto C1 creating S, then rebase it onto master, the merge will often fail because S and ancestors of master apply conflicting changes (both the "original" and "resolved" versions of any legitimate conflicts).
  • If we squash C2..C3 onto C1 creating S, then rebase it onto master and resolve in favor of the branch, the resolutions may be incorrect because the merge point may be somewhere in history and there may be new conflicts above the merge point.
  • If we merge master onto C3 creating S, squash C2..S onto C1 creating T, rebase T onto master creating R with forced conflict resolution, then squash-merge R onto master (this is effectively a no-op, it just simplifies the code), everything appears to work. This is the same as the previous approach, but forces us to have a conflict-free merge point as the maximum commit.

Although this feels sort of flimsy and fragile, I can't find any changes locally which actually break it. In particular, I tried:

  1. In C1, modifying files unrelated to C2. The unrelated changes are correctly excluded.
  2. In C1, making non-conflicting modifications to files modified in C2. These changes are also correctly excluded.
  3. In C2, modifying changes made in C1.

Case (3), it's possible to get modifications to merge that were initially part of C1, then further modified in C2.

In natural cases (were we're internally landing C..A after F..D) this is moot, since all the dependent changes have already merged and this behavior is unambiguously correct.

In unnatural cases (where you used --pick), I think this behavior is reasonable-ish and we should expect --pick is a potentially dangerous operation. Regardless of what the merge does, it is always possible for --pick to take working code and turn it into non-working code because you're removing part of history and not testing the state that results. For example, if C1 adds a method m() and C2 adds a call to method m(), no behavior of --pick can possibly produce working code, and no behavior of --pick can possibly identify the problem by considering merge behavior (there is no world in which these changes have edit conflicts of any kind). When you --pick, you must be confident the changes you're landing are truly independent of the changes you're skipping. The UI could warn about this, perhaps, but I think this is an advanced operation anyway.

We might be able to detect that the --strategy-option theirs merge has done something suspicious by trying to merge C^ into the final state (or vice versa). If it merges cleanly, I think we can be fairly confident that the changes are actually independent. However, I'm not sure offhand what we could conclude if it doesn't merge cleanly.

epriestley renamed this task from Certain "arc land" merges fail when they should succeed to When "arc land" selects a commit range with merge commits, it may fail to slice them from history.Mar 3 2021, 10:13 PM
epriestley claimed this task.