Page MenuHomePhabricator

An assortment of fixes and updates to using arc-land with mercurial
ClosedPublic

Authored by cspeckmim on Jul 7 2021, 8:53 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Mar 17, 4:30 AM
Unknown Object (File)
Sun, Mar 17, 4:30 AM
Unknown Object (File)
Sun, Mar 17, 4:26 AM
Unknown Object (File)
Sun, Mar 17, 4:26 AM
Unknown Object (File)
Sun, Mar 17, 4:26 AM
Unknown Object (File)
Sun, Mar 17, 4:26 AM
Unknown Object (File)
Sun, Mar 17, 4:26 AM
Unknown Object (File)
Sun, Mar 17, 4:26 AM
Subscribers

Details

Summary

Refs T13546

Behavior Changes

  1. Currently, after landing the --onto bookmark will not be advanced to the newly pushed commit(s)
    • This updates so that after pushing commits upstream, onto-marker bookmarks are pulled back from the server to retrieve their updated state
  2. Currently, after landing the working directory will typically be the old --onto commit
    • This updates the behavior in the following ways
      1. If the starting working state is a commit that is part of a revision being landed, the post-land state will be the newly published commit for that revision
      2. If the starting working state is a commit for a tip revision being landed and there is only a single --onto target, the post-land state will be the newly published commit and the --onto bookmark will be activated, if applicable. If there are multiple --onto targets defined (uncommon) then the resulting working state will be the same behavior as before, as the desired behavior for this case is ambiguous.
      3. If the starting working state is a commit that is not part of any revision being landed, then the post-land state will be that same commit, activating the same previous bookmark if applicable.

Bugs Fixed

  1. When landing a diff that includes multiple commits, where the non-tip commit adds a file and later commit modifies that file, the land process would fail during rebasing.
  2. When landing, the display of what commits are going to be landed only includes the commit hash but should include part of the commit message for easy identification.
  3. If errors occur during a rebase while landing, the resulting state is an in-progress rebase. Users will typically not be able to resolve this in-progress rebase and possibly confuse them further as they have to first abort the rebase before trying to clean up. These rebases will now be aborted by arcanist before exiting.
  4. If using evolve, landing a non-tip revision would leave behind orphaned commits.
Test Plan

Bookmark test

  • I created a diff with bookmark test, as a branch from before the master commit
  • I updated working state to activate test
  • I landed that diff
  • I verified that the resulting working state was on the newly published master commit

Non-bookmark test

  • I created a diff with bookmark test, as a branch from before the master commit
  • I updated working state to be the new commit, but not activate test even though it points to that same commit
  • I landed that diff
  • I verified that the resulting working state was on the newly published master commit

Not-working state test

  • I created a diff with bookmark test, as a branch from before the master commit
  • I updated working state to an older commit not related to the one being landed
  • I landed that diff
  • I verified the ending resulting working state was on the older commit I was on prior to landing

Multiple commits test

  • Using test bookmark I created a commit that added a new file, made a diff, made another new commit on top of it which modified that file, then updated the diff
  • I updated my working state to the first commit for the diff (non-head)
  • I ran arc land test to land the diff
  • I verified that the resulting working state was on the newly published master commit

Landing while on master

  • I created a diff with a bookmark test, as a branch from before the master commit
  • I updated working state to the master bookmark
  • I landed that diff
  • I verified that the resulting working state was on the newly published master commit

Landing with no local master bookmark

  • I created a diff consisting of two commits with bookmark test, as a branch from before the master commit
  • I left working state on test and deleted my local master bookmark
  • I landed that diff
  • I verified that the resulting working state was on the newly published master commit and the master bookmark was pulled and active

Cascading diffs while on an earlier diff changeset

  • I setup two diffs, each consisting of two commits, one depending on the other
  • I put the working state onto a commit from the earlier diff
  • I landed the later diff
  • I verified that the resulting working state was on the newly published commit associated with the earlier diff, which is not the new master commit so there was no active bookmark

Cascading diffs while on the later diff changeset

  • I setup two diffs, each consisting of two commits, one depending on the other
  • I put the working state into a commit from the later diff
  • I landed the later diff
  • I verified that the resulting working state was on the newly published commit for the later diff which was master and master was active

Landing a non-tip revision, with evolve

  • I setup three diffs, each consisting of two commits, creating a dependency chain
  • I verified I had the evolve extension enabled via hg debugextensions
  • I put the working state on a commit in the first revision
  • I landed the second revision
  • I verified that the resulting working state was clean, that revisions 1 and 2 were properly landed and published, and the resulting working state was on the published commit for the first revision.

Landing a non-tip revision, without evolve

  • I setup three diffs, each consisting of two commits, creating a dependency chain
  • I verified I did not have the evolve extension enabled via hg debugextensions
  • I put the working state on a commit in the first revision
  • I landed the second revision
  • I verified that the resulting working state was clean, that revisions 1 and 2 were properly landed and published, and the resulting working state was on the published commit for the first revision.

Landing a non-tip revision, while the non-landing revision is active

  • I setup three diffs, each consisting of two commits, creating a dependency chain
  • I put the working state on the latest tip revision commit and its bookmark
  • I landed the second revision
  • I verified that revisions 1 and 2 were properly landed and published, and the resulting working state was still on the latest tip revision commit and its bookmark was active.

Diff Detail

Repository
rARC Arcanist
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

reconcileLocalState() is another possible place to put the post-push update logic -- that runs exactly once, and a little later.

Thank you for the tips.

A possible approach would be to try to put the "failure" commands into ArcanistMercurialLocalState. I think that would look like snapshotting every bookmark (not just the active one), and then resetting all bookmarks when restoring state.

I had a vague notion of doing something like this but the details are still forming.

I'm not really sure if this would be better -- your approach seems fairly reasonable to me, and has the advantage of doing all the bookmark management pretty narrowly in one place, and this should be the only place that this work needs to happen. Offhand, I think I favor the approach you've taken here.

I think I'll continue with this approach for now but I'm wondering if I'll have to capture the initial state of bookmarks to restore the working directory, which might end up doing this partially.

Minor style quibble, but the logic might end up a little cleaner if you sort the bookmarks by outcome first (delete or pull), then build commands to do those things separately.

I'll give that bit a little more love. Originally I had two separate loops but then I realized I'd need to split pass/fail commands separately and at the same time tried to batch everything to avoid multiple hg pull commands. Also I think it was past my bedtime.

As for updating the working directory I think that change is going to be a bit more involved but I'm not sure if that's something appropriate to be done in pushChange() or that should happen higher up in ArcanistLandEngine::exeute() by adding some other functions to be overridden by the different engines or similar.

I think cascadeState() or something similar is possibly (?) the right place to do this. The distinction is that if users run arc land --incremental <many changes>, arc will do this internally:

foreach ($change) {
  push($change);
}
cascadeState(); // <-- this is a bit simplified

If you need to do your updates after each push, putting them inside push() is appropriate. If you can do them after doing all the pushes (and I think you can?), they should likely go in cascadeState() or some other new called-from-the-parent step like reconcileAllOtherNonCascadingStateNowPlease() that is a no-op in Git and does this bookmark reconciliation in Mercurial.

I was looking through cascadeState() but I wasn't entirely sure what that is doing. Does that roughly mean "sync the local state to match the server" or is that what reconcileLocalState() is doing?

Hmm my thinking now is that the current working directory behavior probably is better if things come up during --incremental iterations that require manual intervention, so I'll look at updating the state after all pushes occur. I think this makes sense that I'd have to acquire bookmark/working-directory state at the beginning of execute() (maybe in one of the function overrides that captures the local state, I don't have the code in front of me right now) and then updates the working directory at the end (in one of the cascade/reconcile) only if everything succeeds. I'm not 100% sure about this though. In the case of failure there's probably a different state that should be put into based on which push failed.

As another possible simplification, the only reason for newPushCommands() is to make it easy for --hold to tell users what to do. You could perhaps rewrite --hold to give the user some simplified set of commands to use instead, or just some general guidance ("you may push your changes using the correct Mercurial push commands, good luck") to avoid needing to have newPushCommands() return an enormous amount of state in an increasingly long list of commands. Phabricator has very few Mercurial users and I'd be comfortable with very minimal guidance if that makes the rest of your implementation easier. I think almost no one is running --hold with a super complex argument list and it's fine if --hold doesn't tell you how to reconcile 9 bookmarks or whatever.

Ah I'll think about that. We currently use --hold primarily for an oddball process we have for ensuring changes go into multiple clones of a repository in a specific order, but it's never used with other options. We've regularly discussed changing this to be a more-standard process that doesn't require this but our current one has served us pretty well and has scaled reasonably.

"Cascading" is updating all local refs which depend on refs that were selected-and-modified by the land operation. So if you have dependent features in several dependent branches named change1 -> change2 -> change3 locally, and arc land change2, we want to (in the general case) rebase your local change3 onto the landed version of change2 so that it no longer has out-of-date ancestors.

This is called "cascading" because if you have change1 -> change2 -> ... -> changeN and arc land change1, we update change2 on the new change1, then update change3 on the new change2, etc., all the way down the tree, "cascading" the change down commit-by-commit.

"Reconciling" is updating local refs for other reasons than because they depend on landed changes, generally because they are "equivalent" in some way to the "onto" or "into" branches. I think the operation you're doing is the best conceptual match for the "reconcile" step, but there might be technical reasons that it doesn't fit there.

Determine whether the working directory was rebased, and if so update to it during reconcileLocalState()

Doing a basic test this behavior looks good

$ hg wip
@   19:823c7457b7ef cspeck (30 seconds ago) tip test             <- draft commit to be landed, working state when running `arc land`
|     test commit
| o   17:33c3ae4d1277 cspeck (8 minutes ago) master
|/      test commit
o   14:82892d507c28 cspeck (24 minutes ago)
|     test commit
~

$ arc land test
...

$ hg wip
@   20:0b01052fc32e cspeck (80 seconds ago) tip master           <- master is active
|     test commit
~

I'll test out a couple more scenarios but I think this is right. It felt correct while writing it anyways~

cspeckmim edited the test plan for this revision. (Show Details)

Not-working state test

  • I created a diff with an inactive bookmark test but working directory on an older commit not related to the one being landed
  • I landed that diff
  • I verified the ending resulting working state was on the older commit I was on prior to landing

In this scenario the resulting working set ends up being something else -> the previous master commit (but bookmark inactive). I switch arcanist back to master branch to check without these changes and the behavior is the same, so even if this state seems unexpected it's not a regression introduced with these changes

$ hg wip
o   28:a33f5df2c48a cspeck (32 seconds ago) tip test
|     test commit
| o   26:74de4ab3c972 cspeck (2 minutes ago) master
|/      test commit
o   23:999d8d1f780f cspeck (4 minutes ago)
|     test commit
@   20:0b01052fc32e cspeck (25 minutes ago)            <- this commit is active working state
|     test commit
~

$ arc land test
...

$ hg wip
o   29:8b15ac1bbdcd cspeck (61 seconds ago) tip
|     test commit
@   26:74de4ab3c972 cspeck (3 minutes ago) master       <- ended up on this commit, with master not active
|     test commit
~
src/land/engine/ArcanistMercurialLandEngine.php
1176–1177

Actually I could probably add a hg update $this->getLocalState()->getLocalCommit() here so that landing from an unrelated commit would leave you back at that commit

cspeckmim edited the summary of this revision. (Show Details)

When landing from a working directory that isn't in the set of changes being landed, restore the working directory to that commit after the land.

Also ran into a bug in the following scenario:

  1. Create a commit which adds a file, make this commit not on the master head (results in 2 heads, though I'm not entirely sure this matters)
  2. Create a diff from this commit
  3. Create another commit on top of that which modifies that file
  4. Update the diff with the new commit
  5. Try to land the commit

The result is that arc throws an exception because when it's attempting to merge/rebase there is a merge conflict, even though there is no conflict between the changes in the diff branch and master.

I believe there were two things happening

  1. The ArcanistMercurialLandEngine::selectCommits() was failing to select more than the head commit in a range of commits. From the way the code attempts to parse the results of an hg log --template I believe the format of the template was incorrect and not picking up the multiple commits to be on separate lines.
  2. The ArcanistMercurialLandEngine::executeMerge() seems to use a range of commits in a reverse order than expected. Once I fixed #1 I got another failure where the merge/rebase's --rev argument was selecting 0 commits to be rebased.

I can put this in a separate diff if desired, but I came across this issue while testing landing from a non-head commit to see that it will detect that it was part of a rebase and still update the working state just as if you were on the head commit (which does now work). Coincidentally a coworker of mine has been running into this issue a few times in the past few months and I had no luck trying to figure out what was wrong or how to reproduce it.

src/land/engine/ArcanistMercurialLandEngine.php
702–703

With this change it seems to now select the proper commits when there are multiple local commits as part of a diff. I got to use that fancy template format you taught me! Here is what arcanist displays this to the user as

INTO COMMIT  Preparing merge into "master" from remote "default", at commit "e84517035151".
 LANDING  These changes will land:

  *   D22361 test commit
          8c1f1f2610ee  test commit
          fba13f95233f  test commit

 >>>  Land these changes? [y/N/?] y

(this sample output would be better if I had any more creativity for my test commit messages and diff titles)

This is what it previously displayed

INTO COMMIT  Preparing merge into "master" from remote "default", at commit "e84517035151".
LANDING  These changes will land:

 *   D22361 test commit
         fba13f95233f  8c1f1f2610ee4d90d871df01e5383387f86efa58-23:999d8d1f780f -

>>>  Land these changes? [y/N/?] y

My workflow I use most of the time involves locally amending commits so I only ever have a single commit per diff which is why I probably never ran into this. Arcanist's display when selecting a single commit was also a little less suspect

INTO COMMIT  Preparing merge into "master" from remote "default", at commit "74de4ab3c972".
LANDING  These changes will land:

 *   D22359 test commit
         a33f5df2c48a

>>>  Land these changes? [y/N/?] y
829

I'm not totally positive this is correct, but I tested landing when there's a single commit for the diff and two commits for the diff and the resulting behavior was correct.

841–844

I originally uncommented this, largely because of the issue my coworker was running into and the problem manifests as the rebase operation being in a partial state, so aborting it seems appropriate. However when I ran into the min..max vs. max..min issue the result was that the rebase just exited because there was nothing to do so this --abort resulted in a failure~

At the moment I'm leaning towards still running rebase --abort here (and the other location) as when these uncommon scenarios hit the working directory is in a slightly cleaner state. If there was a way to run the rebase --abort and discard the result of that it would probably be best.

cspeckmim edited the test plan for this revision. (Show Details)

Add back in the rebase --abort for now

src/land/engine/ArcanistMercurialLandEngine.php
1052–1055

I was just reviewing this in passing, should the backslash here be double-escaped? For a test case I think this would involve setting up multiple dependent revisions and landing the top most one, seeing them all get landed?

src/land/engine/ArcanistMercurialLandEngine.php
1052–1055

I was just reviewing this in passing, should the backslash here be double-escaped?

I only have a minute between baby-events, but \n means "literal backslash, literal n" when the string uses single quotes. The syntax highlighter is arguably misleading about this.

It would perhaps be more correct in some purist sense to write this as \\n, which has the same value, but the argument would look something like "there are two totally equivalent ways to specify this; from a purity standpoint we'd prefer to use only one way for consistency; \\n is the less-ambiguous specification (and has the same behavior in both single and double quotes) and thus is reasonable to prefer".

epriestley@orbital ~/dev/phabricator $ cat quotes.php 
<?php

echo 'node\n';
echo "\n";
echo 'node\\n';
echo "\n";

epriestley@orbital ~/dev/phabricator $ php -f quotes.php 
node\n
node\n

I'm going to do a few more tests -- I'm guessing the --base argument would affect what commits get snagged up during landing (or is that captured during diff phase only?). I normally use arc:bookmark however I recently switched it to arc:this so it's also compatible with git. All of my tests I've run currently were using arc:this.

src/land/engine/ArcanistMercurialLandEngine.php
1052–1055

\n means "literal backslash, literal n" when the string uses single quotes

I think I was starting to catch on to the single vs. double quotes though I wasn't aware \\n is the same as \n in single quotes. I think I'm also getting tripped up with what value we need passed to the template - a literal newline or the individual \ and n characters, which when I say it aloud it's pretty clear we don't want to be passing a literal newline. Using the appropriate syntax when testing with bash also adds some fun to interpreting everything.

baby-events

awww.jpg (119×132 px, 4 KB)

I'm guessing the --base argument would affect what commits get snagged up during landing (or is that captured during diff phase only?)

Ah it looks like the --base flag isn't part of diff workflow

$ arc land test --base arc:bookmark
Usage Exception: Argument "--base" is unrecognized. Use "--" to indicate the end of flags.

Just in case I arc set-config base arc:bookmark and re-ran the multi-commit test and things worked fine.

I normally use arc:bookmark however I recently switched it to arc:this so it's also compatible with git. All of my tests I've run currently were using arc:this.

Went back through https://secure.phabricator.com/book/phabricator/article/arcanist_commit_ranges/ and realized I can specify a ruleset of arc:bookmark, arc:this to get the behavior I want in both mercurial and git. Huzzah!

To confirm the min/max change I reviewed the revset x..y documentation along with the expected ordering of commits in the ArcanistLandCommitSet, then renamed + added some comments to clarify this behavior.

Whoops - that last change results in the same original behavior. It turns out confirmCommits() reverses the order that selectCommits() creates.

Just did another pass-through of all these changes to do some cleanup.

  • Removed an old line of code I was testing out trying to handle multiple onto-markers when restoring the working state
  • Updated some comments and formatting of marker ref queries
  • Found some logic which didn't quite carry over from refactoring of newPushCommands() in handling of cleaning up markers to be deleted
src/land/engine/ArcanistMercurialLandEngine.php
666–674

Aha

1006–1011

I tested out this case of landing onto master while not having that bookmark locally. This doesn't end up deleting the markers and I'm pretty sure it's because of other newMarkerRefQuery() or possibly as part of fetchTarget(). Assuming I'm correctly interpreting what the intent of this then it's probably okay to just remove this code which will simplify things a bit.

Don't actually unset the found rebased active commit. Not sure why I thought that was a good idea.

Uncovered an issue when using the evolve extension, in which landing a non-tip revision would leave behind a set of commits in a hideous state.

Ex, setting up 2 revisions each with 2 commits

$ hg wip
o   115:c11862f2e1f6 cspeck (12 seconds ago) tip test2
|     second diff, commit 2
o   114:46b7f768905a cspeck (20 seconds ago)
|     second diff
o   112:f3320e9ebdcb cspeck (28 seconds ago) test1
|     first diff, commit 2
@   111:c98c9efed5f1 cspeck (37 seconds ago)
|     first diff
o   109:bfd57cd29d55 cspeck (67 minutes ago) master
|     second diff
~

When running arc land test2, the land will succeed but leave behind this bit of mess:

$ hg wip
o   118:0c6bcc1ba699 cspeck (84 seconds ago) tip test2
|     second diff, commit 2
o   117:c82dd8d37be6 cspeck (92 seconds ago)
|     second diff
@   116:e17987796f54 cspeck (100 seconds ago) master
|     first diff
| *   115:c11862f2e1f6 cspeck (84 seconds ago)
| |     second diff, commit 2
| *   114:46b7f768905a cspeck (92 seconds ago)
| |     second diff
| x   112:f3320e9ebdcb cspeck (100 seconds ago)              <- This changeset was pruned but its descendants were not, leave them as orphans
|/      first diff, commit 2
o   109:bfd57cd29d55 cspeck (68 minutes ago)
|     second diff
~

The fix is to prune the commit and all descendants, and not just the commits that are part of the revision.

Sorry for the churn on this, it sorta became a whackamole for an assortment of issues I ran into. I think most of the test cases I've tried out now indicate that the behavior is pretty solid at this point. For now I don't have any remaining test cases to cover and I think this is in a state for review.

src/query/ArcanistMercurialWorkingCopyRevisionHardpointQuery.php
25–31 ↗(On Diff #51633)

I also ran into this odd behavior which confused me at first. After landing a non-head revision I tried to land the last remaining head revision and it failed to find an associated revision. Locally the revision contains 2 changesets, which only the first's commit message was updated to include a Differential Revision: line upon creation of the original diff, but the tip commit does not have this line. However when using arc which it will indicate that it's associated to a revision. I dropped a TODO here because I'm not sure what the solution would be (utilizing base revset is a guess) and it's probably a larger change to try and resolve.

cspeckmim edited the test plan for this revision. (Show Details)
cspeckmim retitled this revision from Put repository in a more-expected state when using `arc land` with mercurial to An assortment of fixes and updates to using arc-land with mercurial.Jul 10 2021, 11:44 PM
cspeckmim edited the summary of this revision. (Show Details)

If there was a way to run the rebase --abort and discard the result of that it would probably be best.

This might be resolved by the time I read as far as the actual code, but you can manually handle error codes with functions named "manual". e.g.:

list($stdout, $stderr) = $this->execxLocal(...); // Throws an e"x"ception on error.
list($err, $stdout, $stderr) = $this->execManualLocal(...); // You're saying you'll manually handle errors.

I'm guessing the --base argument would affect what commits get snagged up during landing (or is that captured during diff phase only?).

I think you sorted this out later, but generally: diff and land should broadly affect the same changes, and should use the same rules to select commits. However, I think the "base" rules generally shouldn't matter for arc land: the expectation is that arc will identify which commits are landing purely through ancestry, then map them to revisions because they've already been submitted with arc diff (that is, Phabricator already knows that commit abcd1234 was attached to D12345, so that's definitely the right mapping, and so arc land doesn't have to worry about base rules).

This can become ambiguous when users do local rebases or make local changes, but arc land doesn't have too many choices here since it has fewer possible things it can do: if a new commit is on top of some commits associated with D12345, it can only become part of D12345: arc land can't make a new revision for it, or combine it with some non-contiguous commits elsewhere in history. The only options are "treat it as part of D12345" or "abort", so it just says "you're landing some stuff you haven't actually diffed, probably minor local fixes, really do this?".

Users who want to do crazy stuff can still do it explicitly with arc land <commit C> --revision X to force arcanist to map C and its unpublished ancestors to revision X.

which when I say it aloud it's pretty clear we don't want to be passing a literal newline

Well technically I suppose either one works:

$ hg log --limit 5 --template 'commit
'
commit
commit
commit
commit
commit

But literal newline is (or was?) impossible to escape on Windows, and escaping the newline seems a lot more expected and easier to debug (e.g., --trace will dump the whole command out on one line rather than giving you a secret semantic newline in the middle) so I agree that it's preferable to pass \n rather than a literal newline.

Sorry for the churn on this, it sorta became a whackamole for an assortment of issues I ran into.

This has been my experience in this code, too. I think this is kind of just inevitable given the enormous range of different input and output states, plus trying to make Git and Mercurial behave in a spiritually similar way.

src/land/engine/ArcanistMercurialLandEngine.php
800–808

I think this is totally fine, but I was using min and max in a topological sense (i.e., the max commit has no descendants in the range, and the min commit has no ancestors in the range). In Mercurial this is a bit ambiguous (the local changeset IDs could be in whatever order, I think, and is probably the thing you'd think of first as far as max and min go).

In both version control systems, the "max" commit may not be the "newest" commit because all commit dates are user-controlled and a commit's commit/author/etc dates may be before those of its ancestors. This is rare (and always indicates some kind of anomalous user behavior) but comes up often enough (e.g., see T11537) that I've tried to put myself in the frame of mind that none of the dates on commits are guaranteed to tell you anything about commit ordering.

If you ever use commit dates to impose ordering, you can end up with some nonsense buts like commits someone accidentally "authored" in 2099 being pinned to the top of a feed forever.

I don't know of any concise, well-understood term for "topological maximum". "Newest" is probably less opaque than "max".

841–844

(I expect this should do that, since it's using the "manual" call. Maybe there's another callsite not using the "manual" flavor.)

1006–1011

I'm possibly a little bit lost here, but assuming I'm getting this right:

I think it's reasonable to expect that landing onto a bookmark which you don't have locally will create that bookmark locally. The equivalent behavior doesn't happen under git (you can land onto a remote ref without creating a corresponding local ref) but we can't really do the thing Git does in Mercurial, and "give you the bookmark" seems reasonable/expected/consistent with what hg pull does by default.

Although we could perhaps imagine cases where this matters, I think these workflows are all silly (in Git, too) and they're only worth "getting right" (i.e., executing with the smallest possible set of side effects) when we don't have to work unreasonably hard to minimize sensible, VCS-default side effects. It seems like way more work than it's worth to try to keep the local working copy clean of bookmarks you land onto.

1177

Incredibly minor quibble, but this test should probably be an explicit if ($this->rebasedActiveCommit === null). This particular test can't really get the wrong result, but string logic operators in PHP have some "gotchas" where they're uncomfortably permissive (see also T2312) and it's better to never let your guard down.

In particular, if you have the obviously-a-real-value commit hash "0e482192" and perform this comparison:

if ($commit == 0) ...

...PHP will cast the string "0e482192" to an integer, interpret it as "0x10^482192" (i.e., "e" means exponentiation in engineering notation), which has value 0, and it will pass the conditional.

src/query/ArcanistMercurialWorkingCopyRevisionHardpointQuery.php
25–31 ↗(On Diff #51633)

Yeah, I think this is probably out of scope here, although I think (?) I'd expect arc to get this right anyway.

A possible solution in general (to the problem of arc losing commit mappings across rebases) is arc rebase, which would let arc retain a mapping of "old commit X was rebased into new commit Y, so they're the same when figuring out revisions".

Another "solution" is to run arc diff and update the diff with a silly message like "rebased lol", although I dislike this for various reasons.

Still, I'd expect arc land <head2> to look at the ancestors between head2 and master, find the "Differential Revision" commit, and then figure out that "head2" can only be part of that revision, possibly prompting you ("Local range includes commits which don't match the revision, are you sure you want to land them?").

But almost certainly not really an issue with the behavior here, at least.

This revision is now accepted and ready to land.Jul 11 2021, 7:17 PM

I'll make these last few minor updates and do a quick few tests to make sure everything is still in working order

src/land/engine/ArcanistMercurialLandEngine.php
800–808

I wasn't entirely sure about the renaming I was doing for the exact reasons you mentioned. I've been looking at commenting other parts of ArcanistLandEngine with some of my learnings here and clarifying in what order things are processed and describing things concisely is difficult. Yesterday I was thinking about with using the term ancestor and descendant instead.

the max commit has no descendants in the range, and the min commit has no ancestors in the range

I think this description is very useful. I think I'm going to make some updates to naming and/or comments, possibly changing back to min and max since I am also hesitant about including chronological terms which might be more confusing in the future.

841–844

the result was that the rebase just exited because there was nothing to do so this --abort resulted in a failure~ [...] If there was a way to run the rebase --abort and discard the result of that it would probably be best.

I expect this should do that, since it's using the "manual" call. Maybe there's another callsite not using the "manual" flavor.

Ah I probably saw the mercurial error output and assumed an exception was thrown when one actually wasn't. My comment about "discard the result" was probably focused around hiding that output. It looks like ExecFuture might be able to do something like that with setStderrSizeLimit(0) though I'm not too worried about it.

1006–1011

Yea that makes sense, thank you for adding clarifying details. I'll remove this extra code for deleting the new bookmarks to simplify this code some.

1177

I will update this.

...PHP will cast the string "0e482192" to an integer, interpret it as "0x10^482192" (i.e., "e" means exponentiation in engineering notation), which has value 0, and it will pass the conditional.

o_o This definitely sounds like something I would only uncover through a terrible bug hunt

src/query/ArcanistMercurialWorkingCopyRevisionHardpointQuery.php
25–31 ↗(On Diff #51633)

Still, I'd expect arc land <head2> to look at the ancestors between head2 and master, find the "Differential Revision" commit, and then figure out that "head2" can only be part of that revision, possibly prompting you ("Local range includes commits which don't match the revision, are you sure you want to land them?").

But almost certainly not really an issue with the behavior here, at least.

If I want to go spelunking later to see about improving/adding this behavior is this the right place to do this or is there another place this TODO should go?

cspeckmim edited the summary of this revision. (Show Details)
cspeckmim edited the test plan for this revision. (Show Details)

Made updates from comments. Re-ran the "Landing a non-tip revision" test and everything still looks good.

If I want to go spelunking later to see about improving/adding this behavior is this the right place to do this or is there another place this TODO should go?

Oh, I think I expect ArcanistMercurialWorkingCopyRevisionHardpointQuery to have no idea which revisions the commits should go with in the "rebase" or "local commit on top of other commits" cases. The "there's only one logical thing we can actually do" logic, where arc land decides they must either go with the contiguous ancestor revision or the workflow must abort, happens later on. The major parts of related control flow are:

  • ArcanistLandEngine->loadRevisionRefs(), where arc land processes the --revision flag and commits which have obvious (or obviously-ambiguous) revision mappings. Commits that we aren't sure about (usually, including touch-up commits and rebased commits) fall through:
// NOTE: We may exit this method with commits that are still unassociated.
// These will be handled later by the "implicit commits" mechanism.
  • ...to ArcanistLandEngine->execute(). Here, commits are sorted into sets of contiguous commits (ArcanistLandCommitSet), and this implicitly attaches the "touch-up" commits to the revisions of their ancestors (see also ArcanistLandCommit->newRevisionRef(), which looks at a commit's ancestors to find a related revision if it doesn't have an explicit revision from the HardpointQuery).
  • Finally, if we did locate implicit commits, we ask the user to confirm that they're expected touch-up / rebase commits a little later:
final protected function confirmImplicitCommits(array $sets, array $symbols) {
  assert_instances_of($sets, 'ArcanistLandCommitSet');
  assert_instances_of($symbols, 'ArcanistLandSymbol');

  $implicit = array();
  foreach ($sets as $set) {
    if ($set->hasImplicitCommits()) {
      $implicit[] = $set;
    }
  }

  if (!$implicit) {
    return;
  }

  echo tsprintf(
    "\n%!\n%W\n",
    pht('IMPLICIT COMMITS'),
    pht(
      'Some commits reachable from the specified sources (%s) are not '.
      'associated with revisions, and may not have been reviewed. These '.
      'commits will be landed as though they belong to the nearest '.
      'ancestor revision:',
      $this->getDisplaySymbols($symbols)));

  foreach ($implicit as $set) {
    $this->printCommitSet($set);
  }

  $query = pht(
    'Continue with this mapping between commits and revisions?');

  $this->getWorkflow()
    ->getPrompt('arc.land.implicit')
    ->setQuery($query)
    ->execute();
}

Generally, the flow for arc land A B C --onto master is to: take all ancestors of A, B, and C that are not ancestors of master; group them by revision; then try to land each group. The "touch-up commits implicitly go with the ancestor revision" stuff is inside the "group the commits by revision" part.

Remove TODO from ArcanistMercurialWorkingCopyRevisionHardpointQuery. Add some comments to functions in ArcanistLandEngine.

Still, I'd expect arc land <head2> to look at the ancestors between head2 and master, find the "Differential Revision" commit, and then figure out that "head2" can only be part of that revision, possibly prompting you ("Local range includes commits which don't match the revision, are you sure you want to land them?").

Ah I think the behavior I'm seeing where arc land fails to identify the proper commit is due to the order of commits and revisions that are being created.

If you create commit A then use arc diff, then create commit B the result is

B (bookmark)
|
A - "Differential Revision: https://..."
|
o master

Running arc land bookmark does appear to look at commit B, then A, figure out that A is associated to a revision, then implicitly ties B into that same revision. However if you first create both commits then create the diff from`B` the result is

B (bookmark) - "Differential Revision: https://..."
|
A
|
o master

When running arc land bookmark it identifies B with the revision but then fails to group A with a commit set and gives an error that it can't identify which revision A belongs to, even though arc diff lumped the commit in with the revision due to base configuration. Looking at the revision in phab it only shows in the "Commits" tab that the changeset for B is included but not for A. In the first scenario, after running arc diff to update and include changes for B the phab revision includes that commit. This makes me think that in this second scenario where running arc diff initially includes multiple commits that the revision should be updated to include the commit hashes for each, and then might result in having better detect this scenario if it's searching for revisions with associated commit hashes.

This makes me think that in this second scenario where running arc diff initially includes multiple commits that the revision should be updated to include the commit hashes for each, and then might result in having better detect this scenario if it's searching for revisions with associated commit hashes.

I believe this is the expected behavior. Specifically:

However if you first create both commits then create the diff from B the result is...

If you created a diff from only B explicitly (arc diff B^, i.e. force arc diff to omit A by selecting only B), I think arc land is correct to abort and force you to sort things out: you've put an unpublished, unreviewed commit in history and it definitely isn't part of the revision you're landing because you explicitly excluded it. arc could continue faithfully only by discarding it, and this seems sort of dangerous/arbitrary: it seems more likely your expected state and actual state differ than that that you've intentionally put a junk local commit in history which isn't related to any revision and which you intend to discard.

If you created a diff from B normally (with something like arc diff master in Git, or most sensible base rules), arc diff will select both A and B and create a revision with the changes in both commits, and both commits associated with the revision.

For example, here's arc diff --base arc:outgoing against the state you describe, which includes both commits:

Screen Shot 2021-07-19 at 9.14.17 AM.png (482×1 px, 63 KB)

arc land then correctly associates both commits with the revision:

$ arc land --hold --conduit-uri=http://local.phacility.com/

...

 INTO COMMIT  Preparing merge into "default" from remote "default", at commit "162a9c1e8e44".
 LANDING  These changes will land:

  *   D18 A
          409349be71c5  A
          ade642d0c045  B

 >>>  Land these changes? <y>

Oh, maybe I'm mixing up running arc diff B against running plain arc diff while being on B

I’ve used this a few times in the past week and the behavior is overall better for my own case. I had one other developer upgrade to master branch and also reported better behavior, though his setup is very similar to my own.

src/land/engine/ArcanistLandEngine.php
1296–1299

I'm poking around some and noticed this doesn't make use of $cascade_set but passes $set for the cascade. Should this be

$this->cascadeState($cascade_set, $into_commit);

I haven't noticed any issues in using Arcanist but I don't think I regularly hit the cascade path either.

src/land/engine/ArcanistLandEngine.php
1296–1299

Ah, yes. I think this likely requires the use of --incremental and/or arc land a b ... to actually hit (and might sometimes/often be moot -- we'll just cascade "too much" stuff, which is generally a no-op, maybe?) which is perhaps why it has escaped notice.