Page MenuHomePhabricator

Allow arc to perform cascading rebases, maybe during or after `arc land`
Closed, ResolvedPublic

Assigned To
Authored By
hach-que
Sep 23 2013, 10:39 PM
Referenced Files
None
Tokens
"Like" token, awarded by leoluk."Mountain of Wealth" token, awarded by scp."Cup of Joe" token, awarded by yelirekim."Like" token, awarded by hidden.

Description

This kind of annoys me a little. Basically I'll be working on a feature branch and I've submitted a diff, but I'm still waiting for it to be accepted. I want to start the next piece of work, so I arc feature off the feature branch (thus it's master -> first_feature_being_reviewed -> second_feature).

When the review is accepted and landed, I then have to do git rebase --onto to move the second feature branch on top of master. Ideally arc land would detect any derived branches and rebase them automatically, but I'm guessing this could be quite difficult to do properly (especially if there's 3 or 4 feature branches).

Related Objects

Event Timeline

epriestley triaged this task as Wishlist priority.Apr 24 2014, 12:32 PM

I think this arc land is maybe the wrong place to do this (or, at least, not the best place) but do think this would be useful to have eventually, maybe as something like arc cascade.

An advantage of doing this from arc land is that we know the hashes on either side of the rebase, but we could fish this out of git reflog or make arc cascade be a pre-rebase operation (e.g., arc cascade master would act roughly like git rebase master).

If this worked properly, switching arc land to use it (sometimes? With a flag/config?) would probably make sense, but maybe it should happen after the push? Like, you finish the land and then the cascade triggers, so if you end up with a pile of messy rebases you can just ^C / git rebase --abort a lot to get out of it and not have to deal with the land again.

Although I'd find this useful, I suspect it would take years to pesonally regain the time spent developing it, and I don't think these workflows are enormously prevalent in the wild. Similar requests have come up every so often, but not with much real frequency.

epriestley renamed this task from `arc land` should rebase any derived branches to Allow arc to perform cascading rebases, maybe during or after `arc land`.Apr 24 2014, 12:33 PM

When the review is accepted and landed, I then have to do git rebase --onto to move the second feature branch on top of master

But, in this specific case, you shouldn't need to do this? That is, arc land should just rebase onto master for you when you go to land the second-degree branch?

epriestley changed the visibility from "All Users" to "Public (No Login Required)".Jul 7 2015, 10:39 PM
eadler added a project: Restricted Project.Apr 10 2016, 6:20 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Apr 17 2016, 6:24 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jun 1 2016, 10:40 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jun 6 2016, 4:11 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jun 16 2016, 6:28 PM

I think the immediate issue here tends to manifest as arc land requiring users to manually rebase before it will work.

I think the problem scenario is that if you have MasterAB and arc land A, then later arc land B, I think there are three issues:

First, arc will sometimes (always?) show you that more commits will land than will actually land. Particularly, arc land will report that it will land A and B, when the actual effect of the operation is only to land B. arc does not know/determine that the A on your local branch is the same as the A' which landed onto master already. This is not easy to determine automatically because A' may have a different commit hash, message, and tree hash than A -- there is no unambiguous mechanical way to examine a repository state and know that A and A' are equivalent.

Second, arc will sometimes (when revisions haven't closed yet?) warn you that multiple revisions will land when this warning isn't useful. In the scenario above, it warns that revision A will land when revision A is already present on master. I believe we often avoid this being an actual problem by ignoring revision A if it's closed, but if you land stuff quickly you can hit this. Resolving the first issue would resolve this issue.

Third, arc will sometimes (when tree hashes changed?) fail to rebase. Resolving the first issue, then rebasing more selectively would probably resolve this issue.

In my personal workflow, I worked around these issues with this gm (originally short for git rebase -i master) script:

#!/bin/sh
set -x
set -e

TARGET=${1-origin/master}

exec sh -c "git rebase -i $TARGET && arc liberate src/ && ./bin/celerity map && git status"

In the greater scheme of things, I intend to give arc a hook for artifact regeneration steps (T10329) so that I can get rid of this, but I needed something here anyway until then.

In practice, I gm and then arc land when stacking local branches. Since I only land about 3,000 commits a year this is a net savings of perhaps a couple hours annually, so it hasn't been high on the list of itches to scratch.

One way we could possibly fix this is to store a map of every commit we generate with arc land, then use that to identify that {nav A} and {nav A'} are, in fact, the same commit. That should let us fix the first problem, and I think the other two should get fixed for free once we can pick the correct commit range. This has some minor technical issues (the list may become unboundedly large and probably needs some kind of GC step) and will be local to the working copy so it may eventually run into some issues with "share code via forks" workflows, but I think it's pretty reasonable conceptually.

I'm also not sure how to do what git rebase -i does non-interactively (i.e., move arbitrary commits X..Y on top of master) but git presumably has some way to do this. In the worst case, we can use git rebase -i with an EDITOR that literally does the edit, although this is exceptionally silly.


There is also a related issue, which are more in line with the idea of arc cascade:

First, you may update A in response to feedback, and want to cascade the change to B, C, etc. Currently, you must do this manually:

git checkout B
git rebase -i A
git checkout C
git rebase -i B
git checkout D
git rebase -i C

Sometimes you can just ignore this if your changes are minor, but sometimes there's, say, an API change that you legitimately want to cascade down the chain. It would be nice for arc to have an arc cascade function that does this for you, possibly also updating revisions along the way.

I think the major issue here is that you've already rewritten A into A', so it's not clear to me how we can identify what A was to find which branches need to be cascaded. The rewrite is also usually a git operation (git commit --amend) so we don't have an opportunity to leave a note to ourselves somewhere.

Here are some ways we could try to figure out A after you've amended it:

  • Parse git reflog to find it.
  • Identify the revision in the working copy, then query Phabricator for all old hashes of that revision (may be wrong if the revision changed a lot or got split up).
  • Assume you didn't change the message and look for other commits made recently with the same message?

Of these, reflog seems most promising.

Once we identify A, we can find all descendants of it and rebase them onto A', then repeat the process recursively. For this step, we can store notes locally about which commits we're rewriting, so we can do things reliably from this point forward.

Two related cases here:

  • You've landed A but B isn't accepted yet. You want to cascade the changes.
  • You haven't landed anything, but there's relevant code in master and you want to rebase/cascade everything on top of it.

In both cases, I think you'd run arc cascade master which would operate like git super-powerful-mega-rebase master.

I'm not sure if arc cascade should try to do anything with immutable history. For now, I'd probably just have it abort.


A related issue which is probably overripe is that unit testing arc workflows is currently more difficult than it should be. arc land and these proposed arc cascade operations are theoretically fairly amenable to unit testing (unlike most commands, they have little-to-no need to interact with the server, and their effects can be easily observed with git operations), but we don't have a harness yet for "run arc commands from the CLI". It would be nice to at least get something basic in place. We do already have infrastructure for "create a mutable test copy of a reference repository", so this is mostly about making it reasonable to write a test script.


My general plan here would be:

  • Write a reflog parser and make sure that's viable.
  • Figure out how to git rebase -i without actually using an EDITOR.
  • Take a stab at making top-level arc operations more testable.
  • Build the working copy "notes" store so we can keep track of which commits we've rewritten.
  • Do the arc land fixes, so it does the right thing more often (automatically rebasing the correct commit ranges).
  • Implement arc cascade for the other cases.

I wouldn't plan to have arc land automatically run arc cascade (I don't think it needs to if it just works better), but a --cascade flag might be reasonable.

Partly, arc cascade will necessarily be an interactive workflow (rebasing may encounter conflicts) so I don't want to automatically put "now you have to resolve 50 branches worth of conflicts" on the end of arc land.

  • Does that description sound approximately like what you're hitting?
  • Are both smarter arc land and arc cascade interesting, or are you only running into pain in one of the cases?
  • Do you anticipate cases where git reflog and local notes could fail as sources (lots of pushing/pulling to different working copies, I guess)?
  • Anyone know offhand how to git rebase -i master; delete the first few lines; save and exit; non-interactively?

I'll comment on the remainder later.

  • Anyone know offhand how to git rebase -i master; delete the first few lines; save and exit; non-interactively?

set EDITOR or pass -ccore.editor as a script.

I think the root of this is largely building a commit mapping from "draft" commits to their equivalent "published" commits. I believe all of the approaches available to us are:

Commit Hashes: Identify commits as the same because they are identical. In most cases (but maybe not all cases) git should do this for us. This is always correct, but has many false negatives after merges/rebases/amends/etc where commits are equivalent in the sense we care about but not identical.

Tree Hashes: Identify commits as the same because their contents are identical. This won't survive merges/rebases if other changes have happened, and creates false positives with empty commits and reverts (which take the tree state from A -> B -> A).

Commit Messages (or other metadata): We can use messages only, or messages with other message metadata (author, date). These aren't unique and aren't stable.

git reflog: Use the reflog to identify rewrites made locally with git commands. As of yet, it's unclear how powerful this is. This won't work with remote changes or older changes which have been GC'd.

Local Notes: When arc commands perform rewrites, we can write the old and new hashes down somewhere in a local cache. This won't work with remote changes and users will have no obvious way to know that it exists or observe it.

Remote Notes: We can put the local notes on the server to deal with local/remote problems, but this makes observing/understanding them harder (arc may do something surprising because someone else made a mistake and told the server lies about commit equivalency).

Use Branch Names: We can assume branch names don't change, and that commits on a branch are equivalent at all points in time. These assumptions are often incorrect and can lead to dramatically mistaken false positives, but are somewhat true in some cases provided the branch is a local feature branch. This won't work across publishes which change branch names, branch renames, etc. It is currently also inefficient to query which branches contain a commit in large repositories.

Use "Differential Revision: ..." in Messages: We can assume commits are equivalent if they are marked as being related to the same revision. This is error-prone and inefficient to search in the general case.

Use something like "Change-Id": We can do something like Gerrit and require a pre-commit hook which mutates messages. I don't think these can survive squash-merges.

Query a Service / have a service write data to commit messages: Ask some external service for a list of equivalent commits, or have it embed this information in commits (for example, if using an external merge queue).

Phabricator's own "Land Revision" should probably provide such a service if we do pursue something here.


Each strategy has advantages and disadvantages, and many imply workflow limitations. For example, if we use branch names, users can not rename branches without breaking the relationship. Local notes are useless if the project uses a submit queue. Commit hashes are mostly useless if you rebase. "Differential Revision" is useless with immutable history. "Change-Id" is useless if you don't want to install a pre-commit hook and probably not particularly useful in the presence of squash merges.

We can use a combination of these approaches and may be able to achieve good results in many cases, but I'm generally unhappy with this approach in other cases where we use it. For example, arc diff with base rules approximately uses a similar approach to figure out which commit range you mean, but users are frequently confused about what arc diff will do and how to configure it to do what they want.

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:04 PM
scp added a subscriber: scp.
epriestley claimed this task.

This is somewhat implemented by D21315, which is headed to master soon. See also T13546 and T12876.

This is not enough on its own to make arc land always "just work", but in the simplest case where you have a stack or sequence of changes, arc land now lands any prefix of them in one command and leaves the remaining set in a state which will succeed when any prefix is landed later. It can also recover automatically from some sequencing ambiguities, particularly when you arc land X but commits related to already-closed Y are ancestors.

There are some remaining issues here, but this task is ancient and long and meandering so I'm going to close it in favor of whatever new issues ripen in the wake of these changes.