Page MenuHomePhabricator

Update "arc diff" to amend non-head commits with Mercurial
ClosedPublic

Authored by cspeckmim on Jul 15 2021, 4:30 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 23, 3:33 AM
Unknown Object (File)
Tue, Jan 21, 7:24 PM
Unknown Object (File)
Tue, Jan 21, 3:23 PM
Unknown Object (File)
Tue, Jan 21, 11:54 AM
Unknown Object (File)
Tue, Jan 21, 9:35 AM
Unknown Object (File)
Sat, Jan 18, 7:48 AM
Unknown Object (File)
Sat, Jan 18, 12:41 AM
Unknown Object (File)
Sun, Jan 12, 11:44 PM
Subscribers

Details

Summary

After arc diff creates a revision in Phabricator it amends the commit to include a link to the revision in the commit message. For Mercurial this is done with hg commit --amend --logfile however this will fail when trying to create a diff for a non-head commit.

This updates ArcanistMercurialAPI::amendCommit() to allow amending a non-head commit in two ways, depending on whether evolve is in use:

No evolve:

  1. Rebasing the current commit onto the current commit's parent, using the new commit message
  2. Rebasing all children + descendants of the current commit onto the new resulting commit
  3. Stripping the original commit

With evolve:

  1. Amend the commit with hg amend --logfile
  2. Run hg evolve to tidy up all commits
Test Plan

I created 6 commits in a row placing a bookmark at commits 2 bookmark1, 4 bookmark2, and 6 bookmark3, and ensured I had arc:bookmark in my base ruleset.

No evolve, non-head changeset:

  1. I verified I did not have evolve enabled by running hg debugextensions and did not see evolve in the listed active extensions.
  2. I updated to bookmark1 and modified a file to leave a dirty working state.
  3. I ran arc diff and when prompted to amend my changes I said "yes", and verified a phab revision was created properly.
  4. I checked the status of my repository and verified it was still linear and the bookmarks pointed to the proper commits.
  5. I ran hg log -r bookmark1 --template {desc} to view the full commit message and verified it contained both Summary: ... and Differential Revision: https://....
  6. I ran hg diff -c bookmark1 and verified the changes for that commit included the changes I made in step 2.

No evolve, head changeset:

  1. I updated to bookmark3 which is the head commit and modified a file to leave a dirty working state.
  2. I ran arc diff and when prompted to amend my changes I said "yes", and verified a phab revision was created properly.
  3. I checked the status of my repository and verified it was still linear and all the bookmarks pointed to the proper commits.
  4. I ran hg log -r bookmark3 --template {desc} to view the full commit message and verified it contained both Summary: ... and Differential Revision: https://....
  5. I ran hg diff -c bookmar3 and verified the changes for that commit included the changes I made in step 1.

With evolve:

  1. I enabled evolve and verified it was enabled by running hg debugextensions and saw evolve in the listed active extensions.
  2. I updated to bookmark2 and modified a file to leave a dirty working state.
  3. I ran arc diff and when prompted to amend my changes I said "yes", and verified a phab revision was created properly.
  4. I checked the status of my repository and verified it was still linear and all the bookmarks pointed to the proper commits.
  5. I ran hg log -r bookmark2 --template {desc} to view the full commit message and verfieid it contained both Summary: ... and Differential Revision: https://....
  6. I ran hg diff -c bookmark2 and verified the changes for that commit included the changes I made in step 2.

Diff Detail

Repository
rARC Arcanist
Branch
hgdiff
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 25467
Build 35202: Run Core Tests
Build 35201: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
src/repository/api/ArcanistMercurialAPI.php
685–688

If there are uncommitted changes then this will pick them up and amend, though amending the uncommitted changes does introduce the possibility of conflicts occurring during the evolve. Reading ArcanistGitAPI:amendCommit() it looks like if there will be conflicts then an error just throws up.

Clean up the logic/nesting by returning early where possible. Add TODOs for work to be done still.

cspeckmim added inline comments.
src/land/engine/ArcanistMercurialLandEngine.php
807

This bugged me after realizing I recorded the definitions different from the other places that do this

After arc diff creates a revision in Phabricator it amends the commit to include a link to the revision in the commit message.

Just for completeness for future readers: this behavior depends on history.immutable today, but arc would generally like to do this and Mercurial users broadly seem more open to treating local history as mutable today than they did in ~2011 (as does Mercurial itself, with changes like evolve).


If it's helpful, I think it's reasonable to treat "add working copy changes to an existing commit" and "update the message of a commit" as two different operations, even if they both run mostly the same code. I believe we only care about "add working copy changes" after we make them ourselves (e.g., by applying lint patches) and maybe there's an upfront prompt at the beginning.

I think we should never reach this method with a "bad" working copy state (that is, with any changes in the working copy which we don't want to combine into the commit we're amending), except that there may (?) be untracked files. If we can today, we will no longer be able to after arc diff updates to use ArcanistRepositoryLocalState. So the stashing should be unnecessary, I think -- but maybe you found a route to get here with a dirty working copy.

I also imagine updating arc diff to work like arc land does (update the revisions for a list of one or more groups of commits, i.e. possibly update multiple revisions if you arc diff change4 and have made local changes to local commits it depends on that are associated with other revisions). I'm not sure if that will actually happen given my actual velocity, but Git will likely need a similar update if it does (to reach into history and amend commits not in the working copy). But this seems like a straightforward step toward that, regardless of where things go.

If the non-evolve workflow is a pain, I think it's also reasonable for arc to exit when you try to do this and tell you to enable evolve or use a simpler workflow -- that's no worse than what it does today, right?

src/repository/api/ArcanistMercurialAPI.php
701

I think this is always safe today since $child can never be a branch name, etc, but should use hgsprintf() for robustness against possible future values of $child.

Just for completeness for future readers: this behavior depends on history.immutable today, but arc would generally like to do this and Mercurial users broadly seem more open to treating local history as mutable today than they did in ~2011 (as does Mercurial itself, with changes like evolve).

Would this be good to comment on ArcanistMercurialAPI::amendCommit()?

If it's helpful, I think it's reasonable to treat "add working copy changes to an existing commit" and "update the message of a commit" as two different operations, even if they both run mostly the same code. I believe we only care about "add working copy changes" after we make them ourselves (e.g., by applying lint patches) and maybe there's an upfront prompt at the beginning.

I think we should never reach this method with a "bad" working copy state (that is, with any changes in the working copy which we don't want to combine into the commit we're amending), except that there may (?) be untracked files. If we can today, we will no longer be able to after arc diff updates to use ArcanistRepositoryLocalState. So the stashing should be unnecessary, I think -- but maybe you found a route to get here with a dirty working copy.

I didn't come across this through testing, but after I put up these changes I remembered I needed to look at other callers to amendCommit() aside from just the one I was aiming to address with arc diff changing the message after creating the revision. When finding other call sites that don't change the message I realized that the only other reason to call that method (at least for Mercurial) is to amend with changes, e.g. arc lint. Even in the case where it's being called intentionally to put in changes that are intentional, if it's not a head revision there are still potential conflicts that could occur with descendant revisions.

If we can today, we will no longer be able to after arc diff updates to use ArcanistRepositoryLocalState.

I'm not catching on to what you're referring to. Are you saying there are other changes which will have already stashed unsaved state away before calling amendCommit()?

I also imagine updating arc diff to work like arc land does (update the revisions for a list of one or more groups of commits, i.e. possibly update multiple revisions if you arc diff change4 and have made local changes to local commits it depends on that are associated with other revisions). I'm not sure if that will actually happen given my actual velocity, but Git will likely need a similar update if it does (to reach into history and amend commits not in the working copy). But this seems like a straightforward step toward that, regardless of where things go.

I've been thinking about this too. It actually came up in discussions with colleagues today. We had someone who lost work because their process is: create diff, strip, patch down, make changes, update diff, strip, etc. When working with a set of dependent diffs they forgot to update every single revisions and were surprised to find out that when they stripped everything and re-patched that it was gone (and had conflicts applying all revisions because of that). To be clear this is not a process I have ever recommended or condoned, though apparently it's being recommended by several developers.

I think it's something I'd be interested in trying to implement though it might be ambitious for myself.

If the non-evolve workflow is a pain, I think it's also reasonable for arc to exit when you try to do this and tell you to enable evolve or use a simpler workflow -- that's no worse than what it does today, right?

Correct the fallback here would be the same existing behavior. If this becomes too tedious I will probably take this route, though I think one beneficial path that can be added is to still allow the non-head amend if it's only to change the commit message.

I'm not catching on to what you're referring to. Are you saying there are other changes which will have already stashed unsaved state away before calling amendCommit()?

Sort of: the current behavior of arc diff is to refuse to run with dirty/unsaved state. If you try it, you'll be told to revert/commit your changes or prompted to amend them into the current commit before continuing. This happens immediately, before we do any interesting work.

We must do this (guarantee the working copy isn't dirty) because even if arc diff could technically run in a dirty working copy, the output of arc unit isn't meaningful if you run tests against some state which isn't the state you're sending for review. If we continue with a dirty working copy and then produce some kind of result out of arc unit like "tests passed", we can't really claim that the tests actually pass against the code we're uploading, since we can't tell the unit test suite to use the state in commit abcd1234 instead of the actual working copy state. Running with a dirty working copy voids every kind of assertion or guarantee we're trying to make in operations that work against the working copy (arc lint, arc unit today; maybe arc rebuild-artifacts-after-rebase later) even if we're able to do it in a technical sense.

And it's super hard to lint and amend lint changes in a working copy with random other dirty change, since a lot of linters only accept working copy state as input rather than, e.g., data on stdin.

The possible exception to to this is that arc diff may let you run with untracked files. I don't remember what the current behavior is: historically, users want to run with untracked files. I think this is easily in the top 5 list of the worst things users want to do. Different versions of arc have permitted or prevented this to varying degrees and I don't remember what the current behavior is, but I think it's no weaker than "prompt to continue".

The most recent code for "shove all the working copy garbage into the corner before we do anything", ArcanistRepositoryLocalState, used in arc land, can completely shove everything into the corner, including untracked files (git stash push --include-untracked).

arc diff doesn't use this most-powerful effect (ArcnaistRepositoryLocalState) to sweep the working copy under the rug yet; it uses older code in requireCleanWorkingCopy(). It looks like the behavior may be to ignore untracked files with a dumb garbage flag, --allow-untracked, which dates from the long-ago before-times. That's definitely getting nuked if arc diff gets any work.

The desired behavior is that you can not enter any interesting workflow in arc diff if your working copy is dirty, so amendCommit() "should" be able to safely assume that any working-copy dirtiness has been made by arc, likely in arc lint, and is appropriate to amend into repository state. The actual behavior should pretty much be this today, just with some asterisks around how we actually make that guarantee and what happens with untracked files.

To be clear this is not a process I have ever recommended or condoned, though apparently it's being recommended by several developers.

At first glance, this workflow seems bad-with-no-advantages to me, but who knows. A popular Git flow is "stacked diffs" which I also think is bad-with-no-advantages. My workflow has been essentially unchanged for 10 years, except that arc now gets a more of it right automatically more often.

A lot of the forces around workflow selection are wholly opaque to me and I don't think I've ever really convinced anyone that my workflow is an improvement over whatever they're doing in many years of discussing things.

I don't think I've ever convinced anyone who likes keeping untracked files in their working copy that they shouldn't do that, except maybe one guy at Facebook who completely broke production by making an untracked-file error about 20 minutes after I tried to convince him that his workflow was bad and prone to those specific kinds of preventable error.

I don't think I've ever convinced anyone who likes using "git push" to "save changes" into the remote that it isn't a great idea.

I don't think I've ever convinced anyone who insists on running git rebase on every local change after every commit, and running arc diff after every git rebase, that there's no benefit to doing that.

I think one of the most popular extensions for Phabricator causes Jenkins to post hundreds of spam comments on every revision.

Companies routinely deploy stateful services on autoscaling containers and apparently have no capacity to operate even a minimally stateful service.

Happily, I no longer have to try to deal with balancing these concerns.

I think all arc diff flows where you're affecting more than one revision and that revision isn't exactly the current working copy state are bad, but imagine aligning behavior with arc land just for purity/consistency.

The desired behavior is that you can not enter any interesting workflow in arc diff if your working copy is dirty, so amendCommit() "should" be able to safely assume that any working-copy dirtiness has been made by arc, likely in arc lint, and is appropriate to amend into repository state.

Ah the stashing I'm thinking needs done is due to how (I think) amending a non-head changeset without evolve has to happen.

C
|
B
|
A* (changes from lint, want to amend)
|
master

Create a copy of A with the new commit message

C
|
B
|
A
|
A*  A'
|  /
o  master

Rebase B:: onto A'

    C'
    |
    B'
    |
A*  A'
|  /
o  master

And then strip A*

    C'
    |
    B'
    |
    A'
   /
o  master

Assuming any dirty changes are produced by arc and are cleared for amending, those changes will need to be amended to A' however the first step, rebasing to create a copy of A, will fail if the working directory isn't clean. I think I have to tuck away the lint changes until after the first rebase so they can be untucked and amended to A' while it's still a head.

Though drawing all that out I realize my current approach has the edge case where A is the first changeset in a repository.

At first glance, this workflow seems bad-with-no-advantages to me, but who knows. A popular Git flow is "stacked diffs" which I also think is bad-with-no-advantages. My workflow has been essentially unchanged for 10 years, except that arc now gets a more of it right automatically more often.

I think my workflow is very similar to yours - single continually-amended commit per conceptual/effectual change - and I'm guessing that may be why I haven't run into many issues. Despite writing up a KB outlining my entire process, configuration, etc. and trying to share it and refer to it when people run into issues, I still come across interesting workflows. I think the constant strip thing is either, developers who don't like having multiple local heads of development, or developers who only want to use arc for the nice commit messages (tbf really nice).

Are "stacked diffs" effectively dependent revisions? I usually avoid that scenario where possible but it does come up on occasions, usually to make code review easier by splitting a refactoring out from feature changes.

Ohhhhh, sorry, I didn't connect the dots and understand that you're only sometimes running the amend (since you can't amend for non-heads).

In Git, we'd always run amend first (creating A', with a new message that includes any dirty working copy changes), then rebase from there. You can't (or can't easily?) change messages with a git rebase -- and git is fine with you amending any commit anywhere in history, it just creates a new commit floating out in space -- so the "create A'" and "rebase children" steps would have to be separate.

You can probably stash/restore by using ArcanistMercurialLocalState, although I'm not 100% sure it's reentrancy-safe and I'd imagine arc diff may already be holding a LocalState object somewhere above this callsite in the future.

Are "stacked diffs" effectively dependent revisions?

They're the "Facebook" workflow described in the bottom half of the document: the good workflow (local dependent revisions), except you put everything on one branch named feature (a big stack of commits under a single label, corresponding to multiple revisions) and use git rebase -i (walk through history, stopping the working copy at arbitrary points) any time you need to update things. In the abstract, they're the same workflow, but the "stacked diffs" workflow is weirdly cumbersome and obtuse for no reason that's clear to me.

Another possible approach might be to copy the amend extension to create hg arc-amend which just lets you amend anything, assuming the "head only" rule is a safety guard rail rather than a fundamental issue with Mercurial (which it seems like it should be).

This is probably not a great path to walk down, but there are secret flags that disable the "cannot amend changeset with children" check:

$ hg commit --amend
abort: cannot amend changeset with children
$ hg --config experimental.evolution.allowunstable=True --config experimental.evolution.createmarkers=True commit --amend
1 new orphan changesets

It otherwise seems somewhat possible to call cmdutil.amend() and skip the check in an extension, although that also skips a bunch of other code and I'm not sure if the other code does anything.

Holy moly I'm not sure I would've found that out lol. My only guess is you've delved into the Mercurial source base (I've poked around a little)

You can probably stash/restore by using ArcanistMercurialLocalState, although I'm not 100% sure it's reentrancy-safe and I'd imagine arc diff may already be holding a LocalState object somewhere above this callsite in the future.

Might that manifest in ways like this?

Screen Shot 2021-07-15 at 10.59.31 PM.png (816×1 px, 135 KB)

I'll diff up the changes because it still works for the other cases

Attempt to stash unsaved changes before doing the rebase dance. Not currently in a working state.

src/repository/state/ArcanistMercurialLocalState.php
155–161 ↗(On Diff #51654)

The changes in this file are only here momentarily so I could test out the approach. I haven't thought through what a well-designed solution might be. I didn't want to pass through the workflow into the amendCommit() function but that is one way that it can create the state and give it a valid workflow. As for the public/protected I'm not sure what a good solution is -- aside from making new functions in ArcanistRepositoryLocalState to directly expose these ones.

I don't think LocalState should z̵̰̲͚͖̊͗́̓̓a̷̛̱̯̠͐̿̅̅͛̓͗ĺ̵̨̻̠͓̼̗̥̥̻̲̳̎͜ͅg̴̛͙͗̀̄̈́̃̂̓̄͝o̶̥̘̻͙̬͇͚̥̪͌̔̃͛̆̀̂̚ͅ your terminal!

Given the mess of dealing with the non-evolve case, what do you think about splitting this into a smaller "make this work under evolve" change and landing that now (which should solve everything for your org, since everyone has evolve?) since all that code looks good to me, and then maybe revisiting the non-evolve case if it other users hit it and/or diff gets some work and it's easier, and/or we figure out a reasonable way to hg force-amend or do some other less-tortured operation here.

I also think aiming toward "arc requires evolve to work with Mercurial" is a reasonable approach to consider.

src/repository/api/ArcanistMercurialAPI.php
670

(Missing word?)

My only guess is you've delved into the Mercurial source base (I've poked around a little)

(Yeah, I just went spelunking in the source code.)

Given the mess of dealing with the non-evolve case, what do you think about splitting this into a smaller "make this work under evolve" change and landing that now (which should solve everything for your org, since everyone has evolve?) since all that code looks good to me, and then maybe revisiting the non-evolve case if it other users hit it and/or diff gets some work and it's easier, and/or we figure out a reasonable way to hg force-amend or do some other less-tortured operation here.

Unfortunately not everyone at my org uses evolve, in fact most don't. I've never gotten a solid reason for why but I suspect it's just "new" and frightening. I started looking at this change initially because someone ran into an error when trying to arc diff a non-head revision. I'll try to investigate what's happening - it totally hoses my console and I can't scroll up to view the history when using --trace so I'll have to pipe the output to a file or something. If I can't figure it out I might just throw an error stating that evolve is required here when trying to amend changes to a non-head commit. I'll extract out the non-evolve-local-changes amend to a separate function.

I also think aiming toward "arc requires evolve to work with Mercurial" is a reasonable approach to consider.

I'm on board lol. It's probably a little ways off -- evolve still does not officially ship with Mercurial.

Sounds good. I spent 15 seconds on trying to write hg arc-amend, and this does something and does appear to produce an amended commit that looks ballpark-correct (and can amend non-heads), but emits some warnings and I have no clue if it functions across Mercurial versions:

diff --git a/support/hg/arc-hg.py b/support/hg/arc-hg.py
index 01ac3907..82202767 100644
--- a/support/hg/arc-hg.py
+++ b/support/hg/arc-hg.py
@@ -28,6 +28,12 @@ _ = i18n._
 cmdtable = {}
 command = registrar.command(cmdtable)
 
+@command(
+  b'arc-amend')
+def amend(ui, repo, source=None, **opts):
+  cmdutil.amend(ui, repo, repo[b'.'], {}, [], opts)
+  return 0
+
 @command(
   b'arc-ls-markers',
   [(b'', b'output', b'',

Still, it seems possible that this is the least-messy pathway for non-evolve support since we need the extension anyway and all the "can't amend non-head" stuff seems advisory rather than technical, given that flags exist to disable the checks and the low-level call seems to produce the desired repository state.

Added a workflow argument to the amendCommit() path so it can be set on the local state for logging purposes.
Fixed the issue with the terminal, Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!

cspeckmim added inline comments.
src/repository/api/ArcanistMercurialAPI.php
729–733

This is what was blowing up before -- I wasn't specifying a commit message so this was launching my text editor then happily moving along to the next part of the dance.

771–775

There's an issue here and I'm trying to investigate the problem but I got a little lost. The saveStash() call ends up throwing an exception due to a null workflow.

EXCEPTION: (Error) Call to a member function getLogEngine() on null at [<arcanist>/src/workflow/ArcanistWorkflow.php:248]
  #0 ArcanistWorkflow::getLogEngine() called at [<arcanist>/src/repository/state/ArcanistMercurialLocalState.php:157]
  #1 ArcanistMercurialLocalState::saveStash() called at [<arcanist>/src/repository/api/ArcanistMercurialAPI.php:777]
  #2 ArcanistMercurialAPI::amendNonHeadCommit(ArcanistDiffWorkflow, array, array, TempFile) called at [<arcanist>/src/repository/api/ArcanistMercurialAPI.php:724]

However I've confirmed that $workflow isn't null and is an instance of ArcanistWorkflow, so I'm not really sure why this is a problem. Any ideas? I confirmed this rebase dance works back before making more appropriate changes to the local state objects, so this only cropped up after trying to set a proper workflow.

cspeckmim marked an inline comment as done.

Fix some typos and comments before @epriestley spots them

Sounds good. I spent 15 seconds on trying to write hg arc-amend, and this does something and does appear to produce an amended commit that looks ballpark-correct (and can amend non-heads), but emits some warnings and I have no clue if it functions across Mercurial versions:

diff --git a/support/hg/arc-hg.py b/support/hg/arc-hg.py
index 01ac3907..82202767 100644
--- a/support/hg/arc-hg.py
+++ b/support/hg/arc-hg.py
@@ -28,6 +28,12 @@ _ = i18n._
 cmdtable = {}
 command = registrar.command(cmdtable)
 
+@command(
+  b'arc-amend')
+def amend(ui, repo, source=None, **opts):
+  cmdutil.amend(ui, repo, repo[b'.'], {}, [], opts)
+  return 0
+
 @command(
   b'arc-ls-markers',
   [(b'', b'output', b'',

Still, it seems possible that this is the least-messy pathway for non-evolve support since we need the extension anyway and all the "can't amend non-head" stuff seems advisory rather than technical, given that flags exist to disable the checks and the low-level call seems to produce the desired repository state.

I’ll do some experiments with this

Use arc-amend instead of a long Mercurial dance

I went back to test this with Mercurial 4.0 just as a test and the extension file fails to load properly. I traced the issue down to this issue
https://stackoverflow.com/questions/52117289/simple-mercurial-extension-fails-to-import

I added some modification to the extension to get this working

cmdtable = {}
if "command" in registrar:
  command = registrar.command(cmdtable)
else:
  command = cmdutil.command(cmdtable)

But then ran into the same issue with remoteopts used by arc-ls-markers which I think was changed around the same time so I made a similar modification

if "remoteopts" in cmdutil:
  remoteopts = cmdutil.remoteopts
else:
  remoteopts = commands.remoteopts

But then I got another issue which I wasn't sure was worth continuing down this path, argument of type 'module' is not iterable -- nothing immediately stood out.

Since modern Arcanist usage with Mercurial has required the arc-ls-markers for a while now and nobody else has reported issues, I think it's fine to not try to address these issues for older versions.

support/hg/arc-hg.py
24

Oops this isn't necessary, was doing some testing

132–137

I wonder if this should inversely check for b'' keys and insert them as string keys as well, for the Python 2 v 3 situation.

Remove unused import from arc-hg.py

src/repository/api/ArcanistMercurialAPI.php
876–885

Just noting that I've been seeing this take ~2.75s on my system and realized it's because of outgoing() which has to exchange info with the server to determine what changesets are outgoing. Changing this to draft() results in this finishing in ~200ms however draft() will not return the same results in some edge cases (pulling public commits from a secondary remote and wanting to include them in a diff?).

I'm doing testing from my home against a server running in the office which is why the delay is more noticeable. Probably the best workaround is using a custom rule

.hgrc
[revsetalias]
arcbase = limit(sort( (ancestors(.) and bookmark() - .) or (ancestors(.) - draft()), -rev), 1)
$ arc set-config base "hg:arcbase"

Updating arc-amend to support some older versions of Mercurial

support/hg/arc-hg.py
29–42

This will allow the extension module to load as far back as Mercurial 3.5, possibly earlier. I tested out a basic arc-ls-markers invocation and that seemed to work properly in 3.5 with this change

90–91

Comment slightly outdated, we're not removing entries with empty values -- older Mercurial seems to depend on the entries being present despite being empty but newer Mercurial doesn't care

105–120

I took this from here
https://phab.mercurial-scm.org/rHGa39dce4a76b87b4621eeca11265e5c9b445a8405#change-6QwjPg8DnkEX

However it ended up squashing in addition to the amending which will cause problems

src/repository/api/ArcanistMercurialAPI.php
876–885

Oof this doesn't actually work. Using HGPLAIN=1 seems to disable configured aliases and the revset is too complex to make it through arc's parsing and mercurial's.

epriestley added inline comments.
src/repository/api/ArcanistMercurialAPI.php
705–715

This may not be worth dealing with just to get away from looking for English-language text in stderr, but it looks like we can trick Mercurial by passing --date now, provided the commit was made at least one second ago. But, obviously, this changes the commit date, which is not faithful to the amend semantics of Mercurial.

My local Mercurial is smart enough to consider it an error ("nothing changed") if we pass --date <the current commit date>, i.e. passing the flag doesn't skip the no-op check, so we can't cheat that way.

I think this is at best a step sideways (rather than forwards) and not worth pursuing, but I would eventually like to find ways around all stderr-English-text-matching.

758–759

Can we descendants($current) - $current or similar?

876–885

I imagine "base" rules getting a rewrite in the future. Ideally, I'd like to get rid of them completely, use sensible default (I think draft() is a perfectly sensible default in Mercurial) and expect anyone with a bizarre edge case (like Mercurial with multiple remotes) to just put that logic in a helper script and have the helper script provide the base commit to arc diff using something outside the scope of arc, like shell aliases.

src/repository/state/ArcanistRepositoryLocalState.php
199–215

This is neither here nor there, but my imagined "correct" use of this API is to treat the "$ref" as opaque and always return the "$ref" you were handed under every VCS. Git happens to ignore it today but may not in some future version of the API, i.e. this use of the API would be "incorrect" but is arguably somewhat-supported by the comments:

$ref = $git->saveStash();
unset($ref); // Just throw the ref away, Git always returns true.

// ...

$git->restoreStash(true); // Just pass true, Git doesn't actually use this.

I think this is really a bigger issue ("who is the audience for comments?") and I don't feel strongly about this, but I suspect if I was convinced that comments were worth getting right and tried to develop a guide for them, I'd probably end up with guidance like "don't put implementation details in a doc comment far away from the implementation?" and maybe "instead, put them inline next to the implementation" perhaps with a side of "annotate them in such-and-such a way and the documentation generator will extract them".

This revision is now accepted and ready to land.Jul 19 2021, 5:12 PM

Since modern Arcanist usage with Mercurial has required the arc-ls-markers for a while now and nobody else has reported issues, I think it's fine to not try to address these issues for older versions.

(Agreed on this.)

src/repository/api/ArcanistMercurialAPI.php
705–715

Yea this unexpected error when it's not really an error is kinda messy. Though I was wondering if maybe we wouldn't be likely to hit this now that above it's checking that we're amending no file changes and the message hasn't changed. I hadn't looked into whether the amend has the same behavior for different message vs. same message.

758–759

Ooh I forgot we could do something like that. I'll try it out, though as I wrote this comment it was optimistic that Mercurial could rebase multiple branches at once.

src/repository/state/ArcanistRepositoryLocalState.php
199–215

I agree with $ref being opaque and will update the comments here. At the time I think was more keen to validate my own understanding by writing it out without second-guessing whether it made sense to put the implementation-level details at this abstract level.

src/repository/api/ArcanistMercurialAPI.php
758–759

From a quick test (I think I set this up the same) Mercurial seems to complain

abort: source and destination form a cycle

I checked using the same revset with log and it's not including $current, so maybe Mercurial just can't handle this? Seems a little odd though. I'll update the comment to mention this though.

src/repository/state/ArcanistRepositoryLocalState.php
199–215

Ah though one point about this is that it does impact how Git repositories interract with this API. Someone following this API currently might assume that the following is valid, however it turns out to work properly with Mercurial but not in Git.

save -> ref1
save -> ref2
restore(ref1)
restore(ref2)

It might not cause an error but the behavior would be unexpected. I think that was my original intention behind the first note on saveStash(). I'll clean up some of the details about what specifically is being returned but I think it would be useful to retain some notice about Git's behavior at least until (or if) the implementation supports that.

We could conceivably prevent that error by making GitLocalState keep a trivial stack (just a stack of true) and return its current depth (0, 1, 2, ...), then throw if the passed $ref was not the current stack depth.

This behavior is ultimately caused by a limitation of Git 2.21.1 -- we'd prefer to do what Mercurial does, but can't in older Git:

// NOTE: We'd prefer to "git stash create" here, because using "push"
// and "pop" means we're affecting the stash list as a side effect.

// However, under Git 2.21.1, "git stash create" exits with no output,
// no error, and no effect if the working copy contains only untracked
// files. For now, accept mutations to the stash list.

So this could be prevented more robustly by using git stash create in recent git (and returning a meaningful $ref like in Mercurial) and a synthetic stack-based reference in older Git.

I think I figured the chance anyone would save and restore local states in a non-stacked way was smaller than the cost of all this complexity.

(I'm fine with whatever level of guard rail you want to put on it, though.)

Update comments, include an internal stack in ArcanistGitLocalState for tracking the order of save/restore

I really like that idea. I generalized the comments on the methods and then added a stack to GitLocalState. I ran two basic tests for this, running arc diff with uncommited changes and running arc land with uncommitted changes. For arc diff it properly amended the uncommitted changes into my diff (unsure if stash is used here), and for arc land it properly stashed my change during the land and restored it afterwards.

If the git changes are too much I can pull those out- I think a longer-term solution will be to do version-checking for whether git stash create is supported and use that instead but I can do that in a separate change.

Revert changes to GitLocalState

cspeckmim added inline comments.
src/repository/api/ArcanistMercurialAPI.php
705–715

From a quick test mercurial does not report this error when only the message changes. With the new check above to return early if neither files nor message changed then this is probably fine to remove, from this site at least.

758–759

Ah there is a way to do this in a single command utilizing --source multiple times as that argument specifically means the specified changeset and its descendants

hg rebase --dest $new_commit --source $child1 --source $child2 --source $child3 --
src/repository/state/ArcanistRepositoryLocalState.php
199–215

I think I'll make a new diff which updates GitBinaryAnalyzer to check for this better stash usage, and if not uses an internal stack to ensure that we don't accidentally run a restore out of order

Remove the stdout text matching, I confirmed that using arc diff with uncommitted changes on a head changeset properly amended the change into the changeset.

Updated the rebasing of child branches to happen in a single Mercurial command, I confirmed by having multiple separate branches based on a commit belonging to a diff then using arc diff to update the parent commit with modified file changes. The resulting working state had all the child branches on the new amended commit.

Adding some feedback regarding these changes (I should’ve made a task for this first). Last week I had two developers separately approach me with issues they ran into with arc diff. I had them upgrade to the master branch and this resolved their issues.