Page MenuHomePhabricator

Modernize "arc land" for Mercurial: bookmark-to-branch, branch-to-self, multiple heads
Open, LowPublic

Description

This is a followup to T3855, which discussed an array of different issues in Git and Mercurial.

The issues in Git were resolved in work related to T9657. Remaining issues are specific to Mercurial, and likely best resolved with a similar update.

arc land in Mercurial should support these operations (or have some other reasonable behavior when users attempt them):

  • landing from a bookmark to a branch.
  • landing from a branch to a bookmark.
  • landing from a bookmark to itself.
  • landing from a branch to itself.
  • handling situations where a branch has multiple heads.

Also, like the modern Git workflow, the error behavior should be to attempt to restore the original working copy state, even if that means discarding some work.

Event Timeline

epriestley raised the priority of this task from to Low.
epriestley updated the task description. (Show Details)
epriestley changed the edit policy from "All Users" to "Community (Project)".
epriestley added a project: Arcanist.
epriestley added a subscriber: epriestley.
efkan awarded a token.Jan 19 2016, 8:46 AM
efkan added a subscriber: efkan.
eadler removed a subscriber: eadler.Apr 30 2016, 2:12 PM
mwwade added a subscriber: mwwade.May 12 2016, 9:13 PM

For reference, I use this small script to allow myself to easily use arc land with bookmarks:

#!/bin/bash
hg book land --force
hg book master --rev default/default --force
arc land land --onto master
bar350 added a subscriber: bar350.Jul 21 2016, 3:44 PM
alexmv added a subscriber: alexmv.Feb 14 2017, 5:04 AM
gabe awarded a token.Feb 26 2017, 4:55 AM
gabe added a subscriber: gabe.
franjesus added a subscriber: franjesus.
epriestley moved this task from Backlog to arc land on the Mercurial board.Jan 5 2018, 6:55 PM
alexmv removed a subscriber: alexmv.Mar 31 2019, 10:17 PM
epriestley moved this task from Backlog to arc land on the Arcanist board.Jun 4 2020, 3:09 AM

After changes connected to T13546, the git workflow now looks like this:

$ arc land X Y Z --into-remote M --into N --onto-remote U --onto V --onto W

...which means:

  • Find all ancestors of X, Y, and Z which are not ancestors of N in remote M.
  • Group them by associated revision.
  • Order the groups topographically.
  • Merge each group into N, then push to V and W in remote U.
  • (Then, do some complicated followup steps.)

I'd like to generally make Mercurial follow this same template but I'm generally less familiar with it, and particularly with some of the recent changes like the evolve extension vs the mq extension, and which operations are provided by extensions vs which operations are core.

I think this generally looks like:


Fetch

  • Fetch N from M with hg pull -b <branch> <remote>. This assumes N is always a branch. If it is a bookmark, the -B flag must be used. You can't trick it with hg pull -b N -B N to mean "get N no matter what type of thing it is".

But:

  • In Mercurial, I think there is no equivalent of hg log <particular-remote>/ref. That is, when we hg pull -b N M, that doesn't just put N in the repository somewhere. We may end up with two N heads.
  • After this operation, how do we determine which head is the "remote" head and which head is the "local" head? Possibly hg log --rev 'head() and branch(N) and !public()'. But what if we have several paths, and there are several different branch heads across the paths? I don't see a way to identify which heads are in a particular remote except by testing each head with remote(..., path), which is an expensive network operation. However, this is rare and we can conceivably just abort if we know about multiple public heads of N after pulling it from M, or eat the network cost and slowly test each head.
  • What if N is a bookmark? When we hg pull -B <bookmark> <remote>, our local bookmark of the same name is destructively overwritten.

The Mercurial documentation says this:

If there is a divergence between shared remote and local bookmarks, Mercurial will mark the incoming bookmark either with the path alias (e.g. feature@alice), or with a number if the remote isn't listed in [paths] (e.g. feature@1, feature@2, etc.). This is new as of Mercurial 2.1.
https://www.mercurial-scm.org/wiki/Bookmarks

I can't immediately reproduce the circumstances for this under Mercurial 4.7: both hg pull -B N M and hg pull destroy my local bookmark N and replace it with the remote bookmark.

(Actually, I did get it to work. It doesn't seem to always work -- it may depend on whether the two positions are ancestors of one another or not? That is, "a divergence" is strict, not just "a difference".)

So this starts to look like:

  • Have the user tell us if N is a bookmark or branch, or try pulling it as each one and see which ones work.
  • To pull a branch: hg pull -b N M, then hg log --rev 'head() and branch(N) and public()' to find a superset of things we might have pulled, then if that list has more than one element run hg log --rev remote(...) on each one (but how do we determine that the revision is a head in the remote, not just present in the remote)? This is such a mess that we probably just give up instead.
  • To pull a bookmark: save the positions of bookmarks N and N@M, and N@<any number>. Run hg pull -B N M. Whichever saved bookmark is now in a different position is the rev we pulled. Use hg bookmark --force --rev <old> to put the bookmarks back.

So now we know the hash of the commit we're landing into and we have only mutated the working copy by possibly adding new heads of N, which I think we can't do much about.


Identify Commits / Expand Revisions

Happily, this is mostly the same in Mercurial and Git, I think. We might need to track whether each symbol is a branch, bookmark, or rev.


Merge

This is a bit spicy in Mercurial, particularly this:

$input = phutil_console_prompt(pht(
  "%s '%s' has %s %s(s) forking off of it that would be deleted ".
  "during a squash. Would you like to keep a non-squashed copy, rebase ".
  "them on top of '%s', or abort and deal with them yourself? ".
  "(k)eep, (r)ebase, (a)bort:",
  ucfirst($this->branchType),
  $this->branch,
  $alt_count,
  $this->branchType,
  $this->branch));

Uhh, I'll cross that bridge when we come to it. Seems like it's not a fundamental problem.

And this:

// When merging a bookmark branch to a master branch that
// hasn't changed since the fork, mercurial fails to merge.

Push

Hopefully kinda straightforward?


Cleanup

PHI45 mostly deals with cleanup issues. We currently hg strip the whole commit set out, but it's not clear we need to (users can GC later with hg strip --rev extinct(), maybe?). This apparently interacts poorly with hg evolve operations.

I don't have hg evolve locally but have some of the commands (hg amend, hg strip) from other extensions.

A possible fix here is to just stop running hg strip, perhaps.

I am dropping support for versions of Mercurial older than 2.1.1 (released in May, 2012). This is the first version that: (a) has phases and (b) does not exit 1 when hg pull succeeds but fetches no changes.

Today, arc land in Mercurial does not support the "merge" land strategy.

This strategy seems nontrivial to support, because Mercurial can not readily produce an empty merge commit as Git does with --no-ff. I suspect it can be supported in some form with some work.

But this means that only the history-mutating "squash" land strategy currently works under Mercurial. This is awkward because the other arc defaults under Mercurial adhere to "history.immutable = true" and do not mutate history.

Given this, I'm going to revisit how "history.immutable" works. Today, it generally controls four things:

  1. whether arc amend is willing to run;
  2. what we do with lint changes;
  3. the land strategy;
  4. policy when you run various commands in a dirty working copy.

In (1), if a user specifically invokes arc amend, I think amending history is a reasonable outcome.

In (2), I plan to review how lint application works anyway, and this could be controlled narrowly.

In (3), the land strategy can move to a separate piece of configuration reasonably (arc.land.strategy) which should also make it easier to understand.

In (4), I am generally tailoring dirty working copy behavior, and arc land no longer offers to amend dirty changes under any combination of options since this behavior seems obviously bad (it is very unlikely that these changes have been reviewed).

I am also going to make the default strategy "squash" under both Git and Mercurial. This was already the default strategy in Git, and already the only strategy in Mercurial.

I may actually make the minimum version whichever version has this fix:

https://bz.mercurial-scm.org/show_bug.cgi?id=3716

...which landed in November 2013.

Just extracting a few blocks of old code for reference:

Diverged Bookmarks
$divergedbookmark = $this->onto.'@'.$api->getBranchName();
if (strpos($err, $divergedbookmark) !== false) {
  throw new ArcanistUsageException(phutil_console_format(pht(
    "Local bookmark **%s** has diverged from the server's **%s** ".
    "(now labeled **%s**). Please resolve this divergence and run ".
    "'%s' again.",
    $this->onto,
    $this->onto,
    $divergedbookmark,
    'arc land')));

I've removed this check, but haven't put bookmark fetching back. My plan is to continue in this case and do what the user asked for (merge onto the divergent remote bookmark), under the expectation that this represents a case where you and someone else are sharing a remote bookmark and they've pushed some changes since your last update.

Local Ahead
if ($local_ahead_of_remote) {
  throw new ArcanistUsageException(pht(
    "Local %s '%s' is ahead of remote %s '%s', so landing a feature ".
    "%s would push additional changes. Push or reset the changes in '%s' ".
    "before running '%s'.",
    $this->ontoType,
    $this->onto,
    $this->ontoType,
    $this->ontoRemoteBranch,
    $this->ontoType,
    $this->onto,
    'arc land'));
}

I've removed this check because landing a branch onto itself is a valid operation and the local will be ahead of the remote. (This operation now works.)

This may need to be restored when you're landing branch or bookmark B onto A, and your local A has a divergent head from the remote A. However, in Git we can ignore this case, and I'd like to try to ignore it in Mercurial too.

This sequence creates a second local head of A, but that faithfully reflects the user's request when A is dirty and they ask us to land onto it.

Delete Bookmarks
if ($api->isBookmark($this->branch)) {
  $api->execxLocal('bookmark -d %s', $this->branch);
}

if ($this->getArgument('delete-remote')) {
  // named branches were closed as part of the earlier commit
  // so only worry about bookmarks
  if ($api->isBookmark($this->branch)) {
    $api->execxLocal(
      'push -B %s %s',
      $this->branch,
      $this->remote);
  }
}

I haven't actually tested this, but on the normal pathway where we prune/strip every change, I think the local part of this is unnecessary?

I'd like to get rid of the --delete-remote flag, but I think there's maybe an argument for making it the default behavior unless --keep-branches is specified, because local and remote bookmarks of the same name are closely related in Mercurial.

hg-svn
} else if ($this->isHgSvn) {
  // hg-svn doesn't support 'push -r', so we do a normal push
  // which hg-svn modifies to only push the current branch and
  // ancestors.
  $err = $api->execPassthru('push %s', $this->remote);
  $cmd = 'hg push';

I'm not sure anyone uses this anymore, and am planning to just drop support until someone complains. This stuff landed in 2013 and was quite possibly only ever used by Facebook.

git-svn
// See T13293, this depends on the options passed when cloning.
// On any error we return `trunk`, which was the previous default.

$repository_api = $this->getRepositoryAPI();
list($err, $refspec) = $repository_api->execManualLocal(
  'config svn-remote.svn.fetch');

if ($err) {
  return 'trunk';
}

$refspec = rtrim(substr($refspec, strrpos($refspec, ':') + 1));

$prefix = 'refs/remotes/';
if (substr($refspec, 0, strlen($prefix)) !== $prefix) {
  return 'trunk';
}

$refspec = substr($refspec, strlen($prefix));
return $refspec;

Uhh...

(I'm tentatively keeping git-svn support since I think it sees some use, but am not thrilled about it.)

When the user specifies arc land --onto marker, I'd like to identify whether marker is a bookmark or branch.

This seems difficult to do in a robust/efficient way. I can't find any command which means any of these things:

  • "Fetch marker from the remote, no matter what type of object that name identifies."
  • "List all markers in the remote."
  • "Test if a marker exists in the remote."

The actual primitives we seem to have are:

  • "Fetch marker of type T from the remote, failing if it does not exist, but also failing if there's some kind of issue with the remote." (hg pull [-b|-B])
  • "List all markers of type T in the remote that are ahead of the local state." (hg incoming [-B] remote)
  • "Test if a commit exists in the remote." (revset with remote(...)).

Since Mercurial has a significant per-command startup time, issuing extra commands slows things down a bit. I don't think I'm feeling ambitious enough to resurrect the arc command server. Still, I think I'm going to do this:

  • Save bookmark state.
  • Run the branch pull. If that works, the marker is a branch. This should be overwhelmingly the most common workflow.
  • Run the bookmark pull. If that works, the marker is a bookmark.
  • If both pulls failed, the remote is either bad or the marker does not exist.

When the marker does not exist, I'm just going to fail for now rather than creating a branch/bookmark (in Git, we create a branch). We can't tell which one the user wants, we might have just hit two network issues in a row, creating a bookmark in the remote isn't really recommended behavior.

(I haven't made it as far as hg topic so that might also throw another wrench in the gears.)

This is also made more complicated because Mercurial branches and bookmarks are all in a single shared namespace to some degree. I'm not sure what happens if you try to fetch a bookmark that conflicts with a local branch name (presumably: it renames it as "divergent"?) or try to fetch a branch which conflicts with a local bookmark name (presumably: fails in a way that's hard to detect?).

I'm wrong: it works, creating a bookmark and branch with the same name.

epriestley@orbital ~/scratch/hg-test-1 $ hg bookmarks
...
   vvv                       22:c23cabff5fcb
epriestley@orbital ~/scratch/hg-test-1 $ hg branches
...
vvv                           20:160c2c987682

Uh, cool? This makes some things easier and some things harder, I suppose.

As a tool developer, this kind of thing is infuriating:

$ hg bookmark mark@default
$

(Mercurial succeeds, creating a bookmark with the same name as the bookmark it would create during "divergence resolution".)

Better yet, pushing this bookmark to the remote appears to break the bookmark in the remote. Mercurial allows creation of "x@y" bookmarks but does not appear to be able to transfer them:

$ hg push -B mark@default default
...
exporting bookmark mark@default
$ hg pull -B mark@default default
...
abort: remote bookmark mark@default not found!
$ hg push -B mark@default default
...
abort: push failed:
'repository changed while pushing - please try again (bookmark "mark@default" set on a19a3fb9115b, expected missing)'

In fairness, my local version of Mercurial is a bit older, but it is still from many years after the introduction of bookmarks.


A simpler, clearer example of this kind of behavior is:

$ hg bookmark 'a ' # Note trailing space.
$

This works. It creates a bookmark with a different name (a) than the name which was provided (a ): the trailing space is discarded. Mercurial explicitly calls mark.strip() in bookmarks.checkformat() to discard to strip the whitespace.

Perhaps this makes hg slightly easier to use for normal users (but: how do they accidentally pass spaces through the shell?), but this kind of behavior is poisonous as a component in reliable systems.


Also relevant is that hg pull -B X may mutate multiple local bookmarks:

https://bz.mercurial-scm.org/show_bug.cgi?id=6221

I believe this is roughly the form we end up with, which is completely ridiculous and extremely difficult to test or have any confidence in:

$err = $this->newPassthru(
  'pull -b %s -- %s',
  $target->getRef(),
  $target->getRemote());
if (!$err) {
  // If this pull worked, the symbol is a bookmark.
  //
  // (1) If the local bookmark of the same name did not exist, or was
  // behind the remote bookmark: the local bookmark was updated.
  //
  // (2) If the local bookmark existed and was ahead of the remote
  // bookmark: the local bookmark was not updated.
  //
  // (3) If the local bookmark existed and was neither an ancestor
  // of the remote bookmark, the local bookmark named "markname@remotename"
  // was updated instead.
  //
  // (4) Note that the update in (3) may have been a no-op update if
  // the bookmark already pointed at the correct position.
  //
  // Get the new bookmark positions and compare them to the old bookmark
  // positions.
  //
  // If any bookmark position has changed, we're in cases (1) or (3).
  // In either case, our target is the position of the new bookmark.
  // Save it and reset the bookmark.
  //
  // If no bookmark position has changed, we're in case (2) or (4). Our
  // target is the last non-outgoing ancestor of the relevant bookmark.

  $new_bookmarks = $api->newMarkerRefQuery()
    ->withMarkerTypes(array(ArcanistMarkerRef::TYPE_BOOKMARK))
    ->execute();

  $old_bookmarks = mpull($old_bookmarks, null, 'getMarkerName');
  $new_bookmarks = mpull($new_bookmarks, null, 'getMarkerName');

  // NOTE: "hg pull -B X" may mutate an arbitrary number of local
  // bookmarks, so we always want to go through the entire list and put
  // all of the bookmarks back.

  $into_hash = null;
  foreach ($new_bookmarks as $marker_name => $new_bookmark) {
    $old_bookmark = idx($old_bookmarks, $marker_name);

    if (!$old_bookmark) {
      // No old bookmark of the same name exists. This is either a newly
      // created local "marker" or a newly created local "marker@remote".
      // Either way, use the marker position and delete the marker.

      $into_hash = $new_bookmark->getCommitHash();

      $api->execxLocal(
        'bookmark --delete %s --',
        $marker_name);

      break;
    }

    $old_hash = $old_bookmark->getCommitHash();
    $new_hash = $new_bookmark->getCommitHash();

    if ($old_hash !== $new_hash) {
      // This bookmark position has changed. This is a local bookmark
      // which was moved forward, or a local "marker@remote" which was
      // moved forward.

      // Either way, use the marker position and reset the marker.

      $into_hash = $new_hash;

      $api->execxLocal(
        'bookmark --force --rev %s %s --',
        $old_hash,
        $marker_name);

      break;
    }
  }

  if ($into_hash) {
    return $into_hash;
  }

  // We didn't find any bookmarks, we're ahead of the remote.

  $try_marks = array(
    $target->getRef().'@'.$target->getRemote(),
    $target->getRef(),
  );

  foreach ($try_marks as $try_mark) {
    if (!isset($new_bookmarks[$marker_name])) {
      continue;
    }

    $new_bookmark = $new_bookmarks[$marker_name];

    list($stdout) = $api->execxLocal(
      'log --rev %s --template %s --',
      hgsprintf(
        'last(ancestors(%s) and !outgoing(%s))',
        $new_bookmark->getCommitHash(),
        $target->getRemote()),
      '{node}');

    return trim($stdout);
  }

  throw new Exception(
    pht(
      'Pull of "%s" as a bookmark succeeded, but Arcanist failed to '.
      'identify the position of the bookmark after pulling it.',
      $target->getRef());
}

I'm going to try writing a Mercurial extension instead.

Also, this entire workflow is garbage because pushing into a remote bookmark may rewind changes in that bookmark:

$ hg incoming -B default
...
   shared-bookmark           ee5ee676b55c
$ hg push -B shared-bookmark default
...
updating bookmark shared-bookmark

That "destroyed a change" pushed by a colleague without confirmation or --force, etc. Phabricator currently can not flag these changes as "dangerous" because the Mercurial bookmark store is a completely separate system from the Mercurial branch store and they use different protocol commands.

Mercurial itself cautions against this outcome in the bookmarks documentation:

/!\ Be aware that there is only one namespace for bookmarks - it is advised to prefix your local-only bookmarks to avoid conflicts with other users.

I'm going to try writing a Mercurial extension instead.

This seems pretty convincing as a more promising path forward:

$ hg --config extensions.krangler=/Users/epriestley/dev/phabricator/support/hg/extensions/krangler.py krangle-remote default
[
    {
        "name": "default", 
        "node": "ee5ee676b55c8f8a9696f78c4c3fc20404425d02", 
        "type": "branch"
    }, 
    {
        "name": "default", 
        "node": "8b929ae6f1b1970ad0ec51cfcc5ba0500de164ea", 
        "type": "branch"
    }, 
    {
        "name": "vvv", 
        "node": "a19a3fb9115bb6ff1170f76d0c26d0b58d05c8ac", 
        "type": "bookmark"
    }, 
    {
        "name": "mark-G", 
        "node": "8b929ae6f1b1970ad0ec51cfcc5ba0500de164ea", 
        "type": "bookmark"
    }, 
    {
        "name": "mark1", 
        "node": "ef6d3363a96496362febde55eaa4295528d80542", 
        "type": "bookmark"
    }, 
    {
        "name": "shared-bookmark", 
        "node": "8b929ae6f1b1970ad0ec51cfcc5ba0500de164ea", 
        "type": "bookmark"
    }
]
$

I am considering reviving the command server since I don't remember it having any critical failures, but offhand:

  • we need to launch it with every extension we might need to run, and since a lot of modern Mercurial behavior is shipped in extensions and there's now an arc extension, this is a bit complex;
  • when we run a remote command (pull, push, arc-ls-remote) the user may be prompted for credentials; this seems like it's probably (?) difficult or impossible to implement via the command server and limits the number of things we could do with the command server.

Would the use of an extension make additional setup necessary for observed repositories vs. hosted repositories, or are observed repositories out of scope for these set of changes?

epriestley added a comment.EditedWed, Jun 10, 3:42 PM

I put the extension in phabricator/ earlier by accident, but that was just me goofing my local paths. The extension is purely client side: hg can do ls-remote just fine (it's just "do hg pull, but stop half way through"), there's just no UI command for it.

The actual extension is a 100-line Python script (visible in D21343) that just lives in arcanist/support/, does half an hg pull and stops before doing side effects, then dumps the result out in JSON. We run this monstrosity of a command to execute it:

$   hg --config 'extensions.arc-hg=/Users/epriestley/dev/core/lib/arcanist/support/hg/arc-hg.py' arc-ls-remote --output /private/var/folders/p3/9wkc1nw50292mwz0zsw3g7j40000gn/T/1snvhbkkp7hcgsoc/19111-lTNoV0 -- default

This works transparently and doesn't require anything to be installed anywhere, so it shouldn't impact hosted/observed/etc.

After D21344, the user only sees this in the actual arc output:

...so no one should be the wiser unless they go digging.

This presumes there are no Python or Mecurial versioning issues lurking on the horizon, but I've written as much as 300 lines of Python in my life so I can't see how anything I can't tackle could arise.

The real Pandora's box this opens is that now that arc is exposed to whatever issues arise with the raw Python/Mercurial API, there's a much weaker argument against not writing more extensions.

For instance, we currently find all local bookmarks and branch heads like this:

$ hg log --rev 'head()'
$ hg log --rev 'bookmark()'

This is inefficient (we pay for an extra python), but I think we can't search for --rev 'head() or bookmark()' because I can't figure out how to tell if a bookmarked commit is a head or not when it appears on the result list.

This could be more efficient with hg <... arc extension ...> log-heads-and-bookmarks.

I haven't made it to topics yet, but they probably strengthen the arguments for doing this work in extensions by adding a third kind of marker which I assume is as different from bookmarks as bookmarks are from branches, but this is speculative for now.

The actual extension is a 100-line Python script...
The real Pandora's box this opens...

The actual extension is now a 200-line Python script that lists remote state, local state, the current branch state, and the current working copy position.

The current ls-remote script as written does not actually work; I think it only returns branches in the remote that:

  • exist in the local repository; and
  • exist in the remote repository; and
  • where the remote is ahead of the local.

This is probably a terribly deep hole that will lead to arc being 50% Python.