Page MenuHomePhabricator

Support Perforce/Git in "arc land"
Closed, ResolvedPublic

Description

See PHI1506. An install which mostly uses Git but uses some Perforce-bridged-to-Git would like arc land to work better in Perforce/Git repositories.

Particularly, it seems like minor changes will allow things to work:

  • Skip git fetch, which is meaningless in Perforce.
  • Use git p4 submit instead of git push.

Event Timeline

epriestley created this task.

I got Git/Perforce working after a large amount of bashing my head into errors like these:

test.txt - file(s) not in client view.
. - must refer to client 'perforce_client'.

I probably could not retrace my steps.

git p4 itself appears to be a 4,100 line python script. As far as I can tell, a Git/Perforce repository is not stateful in any meaningful way, and the only way we can detect it is by the presence of a p4 remote.

Also, git p4 submit is interactive.

git p4 sync is the probable replacement for git fetch, I think? But I don't think we can figure out which arguments to run this with automatically, a Git/Perforce repo doesn't seem to save the depot location.

--disable-rebase and --disable-p4sync are likely relevant.

You don’t need to pass any arguments to git p4 sync: the config with the dépôt path to “fetch” latest changelists from is saved inside .git directory. Did you follow the exact instructions I provided?

Git p4 script is rather elegant: it creates a “fake” remote which it populates with commits created from perforce changelists. Then the local master branch tracks this remote branch.

You can probably detect that a repo is a Perforce one by poking around the .git directory.

the config with the dépôt path to “fetch” latest changelists from is saved inside .git directory

I can't find any kind of depot/fetch path information in my .git/, are you seeing it somewhere specific?

I couldn't follow your instructions exactly since I had to set up my own depot, but mine looks like this:

$ git p4 clone //depot turtle-git
$ cd turtle-git
$ grep -Ri depot .
$

That is, the string depot appears nowhere in the actual working copy. I have no special files in .git/:

$ ls -alh .git
total 48
drwxr-xr-x  14 epriestley  staff   448B Oct 24 19:18 .
drwxr-xr-x   4 epriestley  staff   128B Oct 24 19:11 ..
-rw-r--r--   1 epriestley  staff     5B Oct 24 19:18 COMMIT_EDITMSG
-rw-r--r--   1 epriestley  staff     0B Oct 24 19:11 FETCH_HEAD
-rw-r--r--   1 epriestley  staff    23B Oct 24 19:17 HEAD
-rw-r--r--   1 epriestley  staff    41B Oct 24 19:17 ORIG_HEAD
-rw-r--r--   1 epriestley  staff   137B Oct 24 19:11 config
-rw-r--r--   1 epriestley  staff    73B Oct 24 19:11 description
drwxr-xr-x  13 epriestley  staff   416B Oct 24 19:11 hooks
-rw-r--r--   1 epriestley  staff   137B Oct 24 19:18 index
drwxr-xr-x   3 epriestley  staff    96B Oct 24 19:11 info
drwxr-xr-x   4 epriestley  staff   128B Oct 24 19:11 logs
drwxr-xr-x  14 epriestley  staff   448B Oct 24 19:18 objects
drwxr-xr-x   5 epriestley  staff   160B Oct 24 19:11 refs

... and no p4 stuff in .git/config:

$ cat .git/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true

I also can't immediately find any code in the git-p4 python script where it writes config into .git/ during git p4 clone, or reads depot config during git p4 sync, although my readthrough may not be exhaustive and there is some branch mapping stuff I don't have a clear understanding of.


Oh, I think I found it:

                logMsg =  extractLogMessageFromGitCommit(self.refPrefix + branch)

                settings = extractSettingsGitLog(logMsg)

                self.readOptions(settings)
                if ('depot-paths' in settings

...
def extractSettingsGitLog(log):
    values = {}
    for line in log.split("\n"):
        line = line.strip()
        m = re.search (r"^ *\[git-p4: (.*)\]$", line)
...

And then:

$ git log
commit 2e37bd42e1923696c3e6f12ebe690cc9a732f5eb (HEAD -> master, p4/master, p4/HEAD)
Author: git perforce import user <a@b>
Date:   Fri Oct 25 01:41:57 2019 -0800

    Initial import of //depot/pond/lily/ from the state at revision #head
    
    [git-p4: depot-paths = "//depot/pond/lily/": change = 5]

So it's in .git/ strictly speaking, but in the actual commit message that git p4 sync creates, not in .git/p4config or something like that.

I believe this suggests:

  • git p4 sync --branch refs/remotes/p4/X is probably the replacement for git fetch --quiet -- <remote> X.

We can probably choose a remote by doing this:

  1. (Current Behavior) If the current Git branch has a remote as an {upstream}, use that.
  2. (Current Behavior) If a remote named origin exists, use that.
  3. (New Behavior) If a remote named p4 exists, use that.
  4. (New Behavior) Probably, we should fail explicitly here with guidance to pass "--remote". I think we'll fail implicitly today, by trying to execute git commands which reference "origin" even though it does not exist.

I believe this will let theoretical hybrid "git + p4" repositories work in either mode and allow users to arc land --remote p4 to explicitly select p4 behavior if either git push or git p4 submit are sometimes valid land operations. This is probably very rare, but the git p4 documentation suggests it's intended as a use case, at least. For example:

If a Git repository includes branches refs/remotes/origin/p4, these will be fetched and consulted first during a git p4 sync. Since importing directly from p4 is considerably slower than pulling changes from a Git remote, this can be useful in a multi-developer environment.

It's possible I'm misreading/misunderstanding this, so maybe the right algorithm is actually to select the "p4" remote ahead of the "origin" remote, i.e. swap steps (2) and (3). That might be safer / more reasonable anyway; users could arc land --remote origin in the weird hybrid case.

Then we can detect that we're in p4 mode with this test:

  • (Simple Test) Is the remote named "p4"?
  • (Advanced Test) Does the commit that refs/remotes/p4/<onto> points at contain [git-p4: ...] metadata?

The simple test is probably good enough in all practical cases, since it only fails if users independently name their normal Git remotes p4. There's no reason you can't do this, but it's likely no one does, and as long as arc prints out a "Selecting p4 mode because your remote is named p4" we should likely be okay.

From there, I think we can merge/rebase normally.

When submitting in the squash-merge case, we can git p4 submit --commit <detached_head>. This might be trickier in the merge-commit case.

Even with git p4 submit --commit <detached_head>, the submit flow implies sync + rebase. I think we do want sync, but not rebase. We should likely git p4 submit --commit <commit> --disable-rebase, then git p4 rebase only if we would normally pull. I think we can git reset or bail out normally in the other cases (these are cases where you land a branch onto an identically named branch, etc).

We can make git p4 submit non-interactive with these (git config) settings:

git-p4.skipSubmitEdit # Disable edit prompt.
git-p4.skipSubmitEditCheck # Disable "message unchanged, continue anyway?" prompt.

Since we expect arc to produce a reasonable submit/commit message, I'm inclined to run with git-p4.skipSubmitEditCheck = true, i.e. saving and exiting the git p4 submit prompt screen should make the submit go through.

I'm also inclined to run with git-p4.skipSubmitEdit = true (i.e., "arc land" performs fully non-interactive submit) since this is more similar to the default arc land flow, but I'm not sure if we can get away with this. If there are niche cases ("I need to edit the 'StreamAtChange' property in such-and-such obscure workflow"), I'd perhaps favor arc land --prompt-for-p4-submit-messages. If there are workflow/expectation issues but this is usually a reasonable default, I'd perhaps prefer "test if you set it to false explicitly, respect that if you did, default it to true otherwise".

It's possible I'm misreading/misunderstanding this, so maybe the right algorithm is actually to select the "p4" remote ahead of the "origin" remote, i.e. swap steps (2) and (3).

I don't full understand your entire approach here. It seems complicated. Why not just implement the basic path for arc land and fail with an error if the user has some esoteric setup?

  1. Check if it's a perforce fake repo
  2. Execute git p4 sync and let it do its thing with its defaults and everything
  3. Squash commit, rebase on master as usual
  4. Call git-p4 submit disabling the UI

Why not just implement the basic path for arc land and fail with an error if the user has some esoteric setup?

Roughly: I think a less complicated approach will lead to "fail with wrong behavior", not "fail with an error" in some cases; and arc land is already very complicated and the changes I'm outlining are simpler to implement than the more straightforward bare commands because they fit into what arc land is already doing more cleanly.

In (1), if we follow the algorithm you outline exactly as you suggest, arc land --remote esoteric-git-origin will successfully push changes to the p4 remote when run in a p4 repository, even though this is very clearly not the user's intent (and hard for them to undo). I'm making the "are we in p4 mode?" test slightly more complicated to avoid successful-but-unintended behavior.

I'm perfectly fine with failing with an error, but I don't want arc land to do something super weird that defies intent if we can avoid it. Likewise, git checkout --set-upstream -b topic-branch esoteric-git-origin/master feels like a fairly strong indicator of intent (but, see also T11535).

For (2), git p4 sync is fine, but git p4 sync --branch ... is "free" (we don't need to do any work to provide the flag value) and should be strictly better (since it asks git p4 to do less work, so git p4 may be able to execute it faster or successfully in more cases).

For (3), squashing-merging is only normal if history.immutable is false. I do think it's reasonable to just fail here ("p4 bridging is not supported with workflows that generate merge commits") but if it can trivially be made to work properly that's preferable. More importantly, we do not actually rebase onto master (that is, we don't update master) in the normal workflow. We work in a detached head instead. As above, the goal here is to avoid surprising side effects, where your master is mutated into some confusing state and then an error leaves you in a state you're unsure how to recover from. Using git p4 submit --commit ... instead of git p4 submit allows us to continue performing the merge/rebase sequence in a detached head, so failures in arc leave all the refs in the working copy in an undisturbed state.

In particular, if we just run git p4 submit, we need to point master at the desired state before we run it. This means we can't run the workflow non-destructively if master is dirty. arc land currently works in Git if master is dirty, since we do all our scratch work in a detached HEAD and just skip the pull/sync step at the end:

    // We aren't going to pull anything if anything upstream from us is ahead
    // of the remote, or the local is ahead of the remote and we didn't land
    // it onto itself.
    $skip_pull = ($ahead_of_remote || ($local_ahead && !$land_self));

    if ($skip_pull) {
      $this->writeInfo(
        pht('UPDATE'),
        pht(
          'Local "%s" is ahead of remote "%s". Checking out "%s" but '.
          'not pulling changes.',
          nonempty(head($ahead_of_remote), $local_branch),
          $this->getTargetFullRef(),
          $local_branch));

...

The --disable-rebase stuff is also to fit the git p4 commands into the existing workflow more cleanly. We never run git rebase in the normal workflow, and don't approximate it if the local target branch is dirty. This behavior is generally better -- users with a dirty master are probably already in over their heads, and not equipped to understand what's going on when arc land dumps them into a conflict state during a rebase.

The more closely we can replace the behavior of git push with git p4 submit, the smaller the p4 change is. If the minimal git p4 submit was push + sync + rebase, a much larger amount of code would need to be different in the git and p4 cases, and the p4 behavior would be necessarily worse.

For (4), yes, but the p4 submit prompt has additional elements and I'm not sure if they ever need to be edited, so I'm just trying to make sure I have some sort of plan to deal with possible feedback around this.

Upshot: arc land is very complicated because many users have difficulty recovering from abnormal git states (aborting rebases, recovering refs from the reflog, resetting branch heads to a particular place, etc), so we try very very hard to never put users in an abnormal state, even if they ^C arc in the middle of the workflow. The workflow templating achieves this by running a series of very surgical git commands. The simplest/smallest change to add p4 support replaces those surgical git commands with surgical git p4 commands. The default git p4 flows are not surgical, just like the default git flows are not surgical, so I believe the simplest change here counterintuitively involves a lot of git p4 --flag-x --flag-y stuff.

This makes a lot of sense, thanks or the time to explain.

If you haven't logged in to Perforce recently (I think?), the sync can fail like this:

$ git p4 sync --silent --branch refs/remotes/'p4'/'master' --
failure accessing depot: perforce ticket expires in -267733 seconds

As far as I can tell, there doesn't seem to be any way to get git p4 sync to prompt the user here, or attempt a login, or do anything else more graceful, by passing flags.

I also don't immediately see reasonable any way to detect that we're going to hit this before we hit it , or detect what we hit other than by testing stderr against an English-language string pattern. Even if we assume p4 is installed, I don't see any way to test the session:

$ p4 login -s
User epriestley was authenticated by password not ticket.
$ echo $?
0

Maybe this? But we have to parse English-language output:

$ p4 login -s -a
User epriestley session has expired.
User epriestley on host 127.0.0.1: Ticket: Set

The documentation suggests that bare p4 login isn't universal (e.g., the manual prominently suggests p4 login -a) so I'm hesitant to try to recover by effecting a login.

For now, I'm just going to assume that "perforce ticket expires in -N seconds" is universally beloved and well-understood by all Perforce users to mean they should log in, which is why the message is opaque and provides no next steps.

I'm passing these additional flags to git p4 submit:

  • -M, detect moves and submit them to Perforce as moves. Barring any reason not to do this, it seems like better beahvior.
  • --conflict=quit, abort the submit operation if any commit has conflicts. The default behavior asks the user, and letting users "skip" (continue applying commits, skipping one) seems highly bad. But I think (?) that even with "quit" we may end up applying some of the commits -- that is, my read is that "git p4 submit" is not atomic when submitting multiple commits. Not much to be done here. Running arc land again will probably be able to resume even in the face of a partial submit.

Under the --merge workflow (which does a --no-ff merge and always generates a merge commit; as opposed to the --squash strategy, which does a rebase-like squash merge) the git p4 submit --commit ... fails:

$ arc land --merge --revision 39
Landing current branch 'merge1'.
 TARGET  Landing onto "master", the default target under git.
 REMOTE  Using Perforce remote "p4". The existence of this remote implies this working copy was synchronized from a Perforce repository.
 P4 MODE  Operating in Git/Perforce mode after selecting a Perforce remote.
 P4 SYNC  Synchronizing "p4/master" from Perforce...

...

 SUBMITTING  Submitting changes to "p4/master".
Perforce checkout for depot path //depot/ located at /Users/epriestley/scratch/frog/
Synchronizing p4 checkout...
... - file(s) up-to-date.
('Applying', 'bb3274c adsf')
//depot/ribbit.txt#1 - opened for edit
error: unrecognized input
Unfortunately applying the change failed!
//depot/ribbit.txt#1 - was edit, reverted
No commits applied.
Usage Exception: Submit failed! Fix the error and run "arc land" again.

In what is apparently typical Perforce error fashion I'm not sure if "error: unrecognized input" means "you can't do this with merge commits" or "one of the merge commits you submitted references a parent commit which you haven't submitted yet" or something else. I'm going to make an effort to sort this out, but probably just give up on --merge pretty quickly here unless it yields very easily.


I got this to "work" by running git p4 submit --commit X...Y, where X..Y is the merge commit and all of its local ancestors. However, the actual merge commit vanishes in the Perforce remote (presumably because it's "empty" once we've applied all the actual merges) and we just get the individual "checkpoint" commits, with no nice "arc land" message. For now, I'm going to turn this into an explicit error message since it generally seems like there's a high degree of impedance between the Git and Perforce concepts of merges and/or fast forwards and/or empty commits, and it seems philosophically questionable to "--merge" when bridging to Perforce.

This appears to be working. T13481 fixed one straggling issue with changes to remote handling in very old versions of Git but nothing else has cropped up.