Page MenuHomePhabricator

Modernize the "arc land" workflow
Open, NormalPublic

Description

See T13490.

See T12876. arc land currently can not build a completely empty commit to squash-merge onto. This sequence should be possible, just difficult.

See PHI1013, which would like a way to specify that the branch you want to merge into might be different from the name of the branch you want to push into the remote.

See PHI1727, where a user found the prompt "Revision X has not been accepted by reviewers. Land revision in the wrong state?" insufficient to indicate that the revision was in the wrong state when the revision state was "Closed".

See T13547 for guidance on changes.

See PHI1646, which was a request for lower Git upstream precedence.

See PHI1384, via T13380.

Revisions and Commits

rARC Arcanist
D21392
D21392
D21390
D21390
D21380
D21379
D21367
D21366
D21365
D21364
D21363
D21378
D21377
D21376
D21375
D21374
D21372
D21371
D21360
D21355
D21354
D21353
D21352
D21350
D21349
D21351
D21348
D21347
D21343
D21342
D21345
D21344
D21341
D21340
D21339
D21337
D21336
D21335
D21333
D21338
D21334
D21332
D21331
D21330
D21329
D21328
D21325
D21326
D21324
D21319
D21318
D21322
D21320
D21321
D21315
D21316
D21317
D21313
D21314
D21312
D21311
D21310
D21308
D21309
D21307
rP Phabricator
D21373
D21346

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

The major issue I'm trying to deal with now is that if you have a tree like this:

master > A > B > C

...and you arc land C, we'll land A, B, and C. However, if A, B, or C are not the most up-to-date version of those changes, we might land older code than the user expects.

In the case where the user explicitly typed arc land C and older code lands, I think this is not terribly bad: they at least very clearly told us to do something and we did exactly what they asked. Still, in all cases we can compare the set of changes we are planning to land to the last diffed set of changes and raise some basic warnings (notably, if the revision has a different set of changes and was updated more recently than the local changes were committed). This is straightforward.

The less straightforward case is when A or B is out of date, but a newer version exists somewhere else in the working copy. We'd like to find this A[2] or B[2] and land that instead. Often, this looks like starting here:

o master
 \
  \
   o A
    \
     \
      o B

...then you git checkout A and commit a fix:

o master
 \
  \
   o --- * A
    \
     \
      o B

..or git checkout A and amend a fix:

o master
 \---\
  \   \
   x   * A
    \
     \
      o B

In either case, you arc land B. In the diagrams above, the desired behavior is to include the commits marked * and exclude the commits marked x.

Finding these commits is challenging in the general case. We can only really expect that:

Reachable: They're reachable from some repository marker.

Although it's possible to update A on a detached head and save the hash on a sticky note, I think it's unreasonable for users to expect we'll find it. We can reasonably say we'll only look for commits that are ancestors of some marker (branch or bookmark).

Authored after A: Any commits we find should have an author date later than the earliest author date of any commit attached to the revision.

I don't think we can make any stronger claims than this in the general case, because users may have mutated or rebased their working copy; may push anything to any remote at any time, etc. Signals like commit date, author date, presence in remotes, Mercurial draft phase, git upstreams, etc., can not be used to exclude markers or commits in the general case.

But this means we have to search back from every marker in the repository, and can't stop our search until we (a) find a single GCA or (b) all active cursors have author dates older than the oldest author date of A.

In the general case, we can't even cache this work. If we examine the history of commit X and find it has no revision at time T, then the user runs arc land again later, there is no guarantee that X isn't mapped to a revision at time T + 1: the user may run arc diff between the two invocations of arc land.

Very modern Mercurial can trace the evolution of change A -> A', but I believe evolve is not widely deployed. Git can trace this evolution only through the reflog, which is unreliable and transient. Arc could trace this evolution, but only by wrapping every VCS command and forbidding users from using git or hg.

After we find A in history, we can ask Phabricator if it knows of a newer A, then look for that newer A. But this is not reliable in the general case because the user may have local updates to A that they haven't sent for review yet.

We can let users configure marker patterns for markers which never contain newer changes. I dislike this because it adds more configuration, the configuration is complex, there's a "right" and "wrong" value for it, and I suspect users may believe that arc should be able to figure it out even though I think it can not.

So I think the logic looks like this, mostly similar to today:

  • Log from the symbols to the GCA with the into commit.
  • Partition the graph into sets.
  • Identify "collateral damage" revisions which we're also landing.

The new piece is:

  • For each collateral revision:
    • Log backward from every head.
    • Stop if we reach a single GCA.
    • Stop if all cursors have an author date before the first known date of the revision.
    • Map commits in this graph to revisions (this is potentially very slow, since the graph may be very large).
    • If the graph contains commits associated with A which are not the original commits:
      • Partition the graph.
      • For each "A" partition:
        • Find all heads within the partition.
      • If there are multiple heads across all the "A" partitions:
        • If exactly one head has a marker, ask the user to confirm that?
        • Ask the user to confirm the newest head?
        • Ask the user to confirm the other head, if there are only two heads?
        • Give up and suggest --pick?

This is fairly awful, but probably not impossible.

A related issue is that it is difficult to identify the set of "published" hashes in the general case. We would like arc branches to show all unpublished history, but stop at published history.

This is difficult because users routinely push temporary changes into remotes. Git upstreams are not reliable (many users set upstream branches to temporary branches to make git push easier to use) and hg outgoing is likewise not reliable.

When we recognize a remote as a Phabricator repository, we can safely use any refs in that repository which are configured as permanent refs: the install has told us they are permanent. But things become difficult beyond this. Without a way to do this, it's hard to make the arc branches tree work even plausibly well in many cases, although perhaps I can just degrade into skipping revision resolution beyond some depth.

See PHI1807. At time of writing, arc land can delete the local master if you land it onto itself. This isn't a big deal (it gives you the command to get it back), but not intended and undesirable. Although it isn't recommended, arc land is supposed to support working in master and landing master into itself.

arc has three phases of local branch updates. The first two may run incrementally (with --incremental); the third is a finalization step which runs at the end.

  • (incremental) "cascade" updates, rebasing branches which descend from the range which was just pushed;
  • (incremental) "prune" updates, deleting local branches which are no longer relevant; and
  • (final) a "reconcile" update, where we figure out what state we should leave the user.

Currently, the "cascade" step skips branches which point directly at the heads of ranges which landed, with the expectation that we will "update or delete them later".

The "prune" step then deletes all branches which point at landed heads.

The "prune" behavior is likely fine if the "cascade" step updates these branches.

The "reconcile" behavior is likely fine too, although it may need a messaging change and a short-circuit in this case.


So when do we retain branches?

I think we can't know for sure in all cases. When a user runs arc land X, they sometimes would like X to be deleted and sometimes would like X to be retained, and I think we can not deduce intent purely from repository state in the general case.

A small subset of users use git checkout -b feature1 origin/release-1.2.3 (so their local feature branch may have no local ancestors) and then possibly set the branch upstream to point somewhere else. This looks like a "master" and quacks like a "master", but is not a "master".

A practical rule for retaining a local branch is probably just:

  • retain local branch X (which would otherwise be obsoleted) if it has the same name as the into branch.

This is straightforward, at least.