Using Differential with plain Git, without requiring Arc
Open, Needs TriagePublic

Tokens
"Cup of Joe" token, awarded by yelirekim."Party Time" token, awarded by tomekj2ee."Like" token, awarded by stevex."Love" token, awarded by ishanarora."Like" token, awarded by rencris."Love" token, awarded by MGChecker."Like" token, awarded by Treri."Love" token, awarded by tycho.tatitscheff."Like" token, awarded by michaeloa."Like" token, awarded by noisy."Like" token, awarded by cscott.
Assigned To
None
Authored By
qgil, May 8 2014

Description

Wikimedia's code review process relies on Gerrit, but we are starting to do steps to move it to Phabricator. Currently our official process goes through git-review, but contributors can use plain Git as well.

Is it technically possible to mirror this situation in Phabricator, having an official process that goes through Arc, but allowing users to use plain Git to contribute patches? If so, are you interested in developing this feature or to support it if someone else develops it? We would like to offer to our contributors the features that Gerrit supports and allows today.

In fact, there is a way to bypass Arc, but you're expected to 'git diff' and copy/paste that in the web UI. This is really inconvenient. And then you'll loose commit metadata that way. Like committer, author, and most important: parent of a commit.

This request was originally filed at https://phabricator.wikimedia.org/T127, where you can also find an entertaining discussion, :)

Related Objects

There are a very large number of changes, so older changes are hidden. Show Older Changes

Major unknown here is the complexity of decoding the Git wire protocol. The D9599 workflow goes like this:

  1. User runs git push.
  2. Commit hook intercepts the change, causes side effects, and then rejects the push.
  3. git sends "sorry, error" over the wire.

Today, this happens next:

  1. User gets an error.

The flow should actually be:

  1. Phabricator intercepts the "sorry, error" message.
  2. Phabricator rewrites it into a "ok, no problem" message.
  3. User gets a success state.

Beyond this basic response rewriting, we also need to rewrite requests to support T10691 + T8092, and rewrite responses more exhaustively (those tasks largely center around silently redirecting git push origin staging into git push origin some/internal/ref, and git fetch epriestley/master into git fetch refs/forks/epriestley/master). These aren't required for this to work, but are adjacent, and I believe ref handoff is legitimately valuable (I am far less convinced of the value of forking or git push-to-review, except as onboarding tools).

Broadly, the traditional (GitHub, Gerrit) way to do all of this is to implement git. It is quite possible that that's a much better idea and that protocol rewriting is a pathway to ruin, but I don't want to implement git if we can possibly avoid it and we've gotten a lot of milage out of protocol mangling so far.

fooishbar added a comment.EditedMay 20 2016, 4:23 PM
(I am far less convinced of the value of forking or `git push`-to-review, except as onboarding tools).

If it implemented one-revision-per-commit behaviour, it would be a good way to avoid having to deal with arc diff's insistence on squashing coherently-rebased patch series into a single, possibly enormous and unreviewable, revision. (Though, looking at D9599, I see now it carefully squashes the entire branch into a single revision.)

If this is the outcome of you using arc land then you are doing it wrong. Just flick the history to immutable if you don't want it squashed.

@benjumanji history.immutable does not actually cover it, but this task is probably the wrong place to begin that entire discussion again.

@fooishbar, this task will not address your use case. We will, as closely as possible, implement arc semantics without requiring arc. The primary upstream motivation for this workflow is to provide a stepping stone toward arc which is as similar as possible to the arc workfow, not to provide a new, entirely different workflow which would make adopting arc more confusing and difficult.

For your use case, perhaps see T5636, T3875, T1508, etc. But I think we are unlikely to ever really support the workflow you are interested in, where each commit becomes a revision. We expect you to run arc diff and arc land once per revision, and to size revisions so that this is not onerous. I believe these constraints are reasonable and still accommodate a very wide variety of workflows. If your project faces unique forces which make these constraints unreasonable, you may have better luck evaluating other review software which targets a use case better aligned with the problems you seek to solve.

The Git wire format for PACK data is very special, here's how you read the length of an entry:

	p = fill(1);
	c = *p;
	use(1);
	obj->type = (c >> 4) & 7;
	size = (c & 15);
	shift = 4;
	while (c & 0x80) {
		p = fill(1);
		c = *p;
		use(1);
		size += (c & 0x7f) << shift;
		shift += 7;
	}
	obj->size = size;

I believe this saves as much as five bits per pack object over a more conventional representation.

avivey removed a subscriber: avivey.May 20 2016, 6:18 PM
scode added a subscriber: scode.May 20 2016, 6:41 PM

Git protocol rewriting appears practical.

I worry that the driving users here may not have clearly defined their needs or expectations for this feature. In particular, it is difficult for me to imagine that supporting this command:

$ git push HEAD:review

...as a strict replacement for copy-pasting changes into /differential/diff/create/ is substantially valuable. I am concerned that users are actually imagining that this "does what Gerrit does", which it won't, probably can't (because of fundamental differences in workflow models), and we don't want to write or maintain a complex Gerrit compatibility layer.

In the past, we've often seen users object when something is new or unfamiliar, or something they're used to changes. At Facebook, every time we changed literally anything, users would create groups lamenting the change, imploring us to "bring back the old Facebook", etc.

It's hard for me to distinguish between "this is new and different" and "this is a real problem" feedback here and on linked tasks. I'm worried about moving forward in a direction that's solving the wrong problem ("copy/pasting is really, really inconvenient") when the real root problem is different ("Phabricator and Gerrit are different").

If we are to support git push HEAD:review, I'd like users who are interested in this to explain exactly what they expect it to do, and why that's dramatically better than copy-and-paste.


Some feedback here is "make Phabricator work like GitHub". This is adjacent, although covered more closely in T10691. I think we can "do what GitHub does" much more closely, that this feature has a real future in the upstream, and that this solves a wider range of problems (particularly, letting "git push = save changes").

Under a strict reading of most feedback this would be satisfactory to everyone (i.e., no one is actually saying "Phabricator should work exactly like Gerrit" that I can see, except maybe @cscott in T5000#58192), although I worry that many users objecting to copy/paste are really objecting to the workflow not being like Gerrit's flow.

scfc added a comment.May 20 2016, 9:38 PM

My problem with copying & pasting are the media breaks: I have to format the patch, make sure it doesn't get mangled by some auto-wordwrap, trailing whitespace deletion, etc., and then paste it in or upload to a web form. git push ensures that I don't have to worry about any of that because the patch is an opaque "blob".

The nicety of Gerrit (or GitHub) is that I can use git, git-review and ssh for that, all provided, reviewed and supported by Debian, Fedora, etc. If T4200 was resolved (or Arcanist was stable and general enough so that external packagers could do that), I think a lot of interest in this task would move away.

IIRC, we copy/pasted at Facebook for quite a while -- maybe 6-12 months, circa 2007, before arc and related tools really got built out -- without ever running into wordwrap/whitespace issues.

On my system, at least, I can git diff | pbcopy and I'm not aware of any way that can mangle the output. Can you reproduce any of these issues against modern Phabricator?

So in the case of Haskell.org, specifically the Glasgow Haskell Compiler (which I'll go ahead and put out there, since I believe we're the one who got this put on 'The Queue' in question), our root problem that we think this fixes, I believe, is not "Phabricator should act like GitHub" or "Phabricator should act like Gerrit", which aren't reasonable or actionable as requests. They're sometimes brought up, but not our real problem. Rather, it is "Contributing smaller changes carries lots of unnecessary friction, perhaps even psychologically, because of arcanist".


So. We often hit this complaint where people often say the activation energy required to submit "trivial" things (typos, doc fixes, minor obvious 'wrong' things which pop up often), which are good to have, is too high. This in turn makes people move away.

Now, while there are *many* places to improve this process in our workflow (I'll spare you details), one of them sticks out particularly like a sore thumb: arc. In fact people are almost unreasonably suspicious of arc to me, I think -- likening it to some kind of uncontrollable voodoo curse which was set upon you purely due to God determining you are a child of misfortune.

I do not think it is important that arcanist is a needed tool, because we need many tools to for working on GHC. But rather that it sits directly in the developer workflow, and most brand new developers cannot immediately determine exactly what arc 'is' and it 'is not', and exactly how it will interact with git, and what does it even do that's important anyway? Most of these people are also completely new to Phabricator so the design differences which explain these points aren't immediately obvious (other tools have no concept of linting or autofixing for example, so the fact arc can do this and git cannot and why that's good isn't immediately obvious)

The expectation is basically, "I want to contribute a typofix, and maybe some more later, but why I do I have to spend 20 extra minutes up front on my first patch, a typofix? Can't this be easier up front?" Note that GHC is a compiler so it inherently has a somewhat high activation energy; this is 20 minutes spent on top of the 20-40 minutes you spent already getting it to build, etc.

I think it's important to note that many people do things like suggest GHC move to GitHub to make things 'easier', but note GH isn't really very good for us, and the core issue isn't "GitHub", it's "easier" for the specific case of "I want to get my feet wet very easily".

(I realize this is somewhat against the general Phabricator design ethos, which is to favor long-term productivity at the minor hitch of some short term discomfort, and to emphasize core, scalable workflows.)

For the most part, signing in with OAuth and whatever is like 30 seconds of work so that's not a very big turn off. People almost always seem to be inherently suspicious of needing arcanist, however. This ruins their concept of a "simple, easy workflow" to match their "simple, easy bugfix" and they get confused or scared.


OK, so how do *I* envision it to work? To be very handwavy: approximately how Arcanist behaves. The fact it behaves the same way isn't as important as the fact it isn't arc, it's their much more familiar git. It's easier to say "Oh, Phabricator emphasizes a different way of using git. So that's what you should follow". If you say that but also say "use arc to submit diffs", people immediately think this is some kind of weird 'deficiency' that Phabricator "needs a tool".

Let's say I have a repository and I want to submit a diff. I do this:

$ git checkout -b bugfix
$ emacs # fix the bug
$ git commit -asm "fix a bug"
$ git push HEAD:review
...
Wow, good job! Check out your review at https://secure.example.com/D1234

Bob submits this, and Alice rejects it because Bob totally forgot to use the oxford comma in the process of writing something down. Bob fixes that by making a new commit:

$ git branch
* bugfix
  master
$ emacs # fix comma usage
$ git commit -asm "submit to the will of the english"
$ git push HEAD:review
...
Bold move, there. Your updated diff is at https://secure.example.com/D1234

Alice now accepts this revision. There are a few things that could happen here:

  1. Bob merges it himself, using arc land. Because arc land only works if the submitter is running it anyway, it can easily do a merge --squash + arc amend
  2. Alice uses arc patch to merge it herself which basically works as expected (merge --squash + arc amend)
  3. Bob (these days) can also hit Land Revision in Differential directly, which can also work as expected (squash + amend, server-side)

Some other points:

  1. In most of our cases for this feature, Bob is overwhelmingly going to be a new contributor, so option 1 for him is off the table in 99.99% of scenarios. Option 3 may be available.
  2. There is the possibility that if Bob is not a new contributor, but in fact has push access, and can commit himself. In this case, he could refuse option 1) and 3) and just git merge bugfix; git push origin master if he really desired, avoiding the whole process. But Bob will then be scolded as contributors have much stricter guidelines for committing to the tree, and told to not do that. This occurs rarely in GHC as it is (people forget to close out diffs with their commits), but is rare enough I think it doesn't need any mechanical enforcements.

What about the dependent-on case?

$ git branch
* bugfix
  master
$ git checkout -b dependent-bugfix
$ git branch
* dependent-bugfix
  bugfix
  master
$ emacs # fix something else
$ git commit -asm "fix another bug"
...
Wow, good job! Check out your review at https://secure.example.com/D1235

This would be a new review, as Phabricator will presumably be able to see what the local incoming reference was (so it can see dependent-bugfix is different from bugfix).

The population of depends on metadata is unclear here. In the event you can get the information about the base commit from git push metadata, I believe you should be able to automatically assign Depends On metadata if you key differential revisions (in the whole) by what base commits the diffs in the revision applied to (which I'm almost certain you do track, already), because you can see "The base commit of this local dependent-bugfix branch exists in bugfix, which was submitted as D1234, therefore, D1234 is a dependency."

Now, Alice *again* requires that Bob really fix the bug in bugfix, and requests changes. dependent-bugfix will also need updates now.

$ git branch
  dependent-bugfix
* bugfix
  master
$ emacs # respond to Alice's comments
$ git commit -asm "Fix, based on Alice's comments"
$ git push HEAD:review
...
Bold move, there. Your updated diff is at https://secure.example.com/D1234
$ git checkout dependent-bugfix
$ git pull --rebase . bugfix
$ git push HEAD:review # push for a rebase
...
Bold move, there. Your updated diff is at https://secure.example.com/D1235

These primary cases (which can be extended to multiple patches and depends-on relationships based on your Git branches) are almost no different, except instead of calling arc diff you call git push HEAD:review. Again, I don't think the actual enforced flow is the problem as much as the anxiety from feeling like a "tool" is backseat-driving for them. (Naturally almost any long-term person gets used to such tools, but newcomers are inherently suspicious and the Popular Internet is very monocultural around GitHub these days.)


Why not .patch files that are uploaded through the web interface?

  1. Finnicky. .patch files are actually binary, so introducing whitespace problems through contributor mistakes can happen making diffs seemingly impossible to apply, and defy all logic. This can happen if the naive new contributor for example uploads the diff somewhere so they can copy/paste it (see SSH case below) in the browser and the upload silently mangles the whitespace in some way.
  2. Error prone. The above approach should be able to unambiguously identify the repository, allowing metadata to be populated like Reviewers through Herald rules or the new Owners work this past week. It also allows critically important metadata like Repository to be assigned, which is vital for other rules to trigger on patch submission, like Harbormaster build rules (to date, if you have more than 1 repo, key your Herald rule on the repository in question, and someone puts a .patch file up for review and forgets this, it doesn't trigger the rule). I am not sure if it's possible to require a repository when copy/pasting.
  3. Tedious. Developers in The Cloud Today may often be working from a powerful remote server which has an SSH key available for development, from a laptop. To post a patch file, I'll have to manually dump it using git diff <range> > foo.patch, followed by finding some way to get the patch file uploaded. If it's over an SSH terminal, you'll either need your terminal scrollback to be big enough for the entire .patch file so you can copy/paste by using cat foo.patch directly. If you're in something like tmux, this is basically too tedious to be usable, so you'll have to push your branch elsewhere, or use a service like http://sprunge.us or GitHub gists/forks.
  4. Can Linux even reliably copy and paste from an arbitrary application into the browser yet, or is it 2016 and it's still all messed up? Because it couldn't do that the last I tried (*ba-dum tish*)

Finally, I think the above workflows sound reasonable, so correct me if I'm wrong, but I do believe the Git protocol massaging should allow you to correctly calculate all the necessary information about the state of the repository to build diffs correctly in most cases, I think, except perhaps for the Depends On case.

Anyway, this is just a braindump. Thanks for taking a look at the initial investigation, and please let me know if this all sounds like complete nonsense jibberish or if this hits at something you think is vaguely reasonable at all.

The expectation is basically, "I want to contribute a typofix, and maybe some more later, but why I do I have to spend 20 extra minutes up front on my first patch, a typofix? Can't this be easier up front?" Note that GHC is a compiler so it inherently has a somewhat high activation energy; this is 20 minutes spent on top of the 20-40 minutes you spent already getting it to build, etc.

Oh, and just to elaborate here, I think this is mostly brought up because humans are just goddamned awful at cost and risk estimates. People really, really would rather not spend 20 minutes now if there's ever the possibility of doing it later and they can't see why not.

Even if 20 minutes now saves everyone over time. In reality with our docs I think arc takes like 5 minutes running, not 20, but the perception is that "I'm going to have to spend a whole lot of time figuring this all magical stuff out just to submit a typofix", because human risk estimation, and when people get scared in any way by that they often don't ask for help but confuse themselves into a corner (for example, by doing something wrong with arc somehow, then just coming and later saying "I tried it and it didn't work!!!" but they can't exactly reproduce or remember it and they conveniently set their bash history on fire 10 minutes prior).

Anyway, just an extra thought, some of that is just life, I guess.

...

  1. Can Linux even reliably copy and paste from an arbitrary application into the browser yet, or is it 2016 and it's still all messed up? Because it couldn't do that the last I tried (*ba-dum tish*)

seriously though if anyone knows how to fix this so that copying between Emacs and Chrome on Linux isn't a nightmare I will pay you 💰 plz help

Wow, good job! Check out your review at https://secure.example.com/D1234

Without substantial additional work, I think it won't/can't say this, instead it would say:

Complete this review at https://secure.example.com/differential/diff/1234/

You'll still need to go there and fill out all the stuff you would have filled out in your commit message. That is, this flow would drop you at the screen right after you copy/paste:

You would still need to choose to create/update a revision, then fill out all of the information:

Although we could theoretically read this from the commit message sometimes, I'd guess our hit rate will be very low if users aren't using arc to get a template and are just writing whatever their heart desires in their messages. We could reject poorly-formatted commits in the push but then users would need to --amend and potentially get into trouble with local commit stacks. Without any tools on the client, this seems hard to guide them through.

Bold move, there. Your updated diff is at https://secure.example.com/D1234

I'm very hesitant to update diffs automatically, because I think the rules won't be clear (and we don't have any client tools to define them). If you rebased your change, I think we have no way to guess that it's related and that you want to update. We could guess that it's related in some cases if there's a common history, but then we'd have inconsistent/unpredictable behavior.

This would be a new review, as Phabricator will presumably be able to see what the local incoming reference was (so it can see dependent-bugfix is different from bugfix).

This information is not available on the server. We only see the remote reference. I believe we can not distinguish between an update and a dependent revision, without having users explicitly embed all that information in the reference with git push HEAD:review%dependencies=D123 or similar.

$ git push HEAD:review # push for a rebase

Particularly without client tools, I think this workflow is almost impossibly difficult to make align with user expectations consistently. If this is marked as "depends on" but not rebased on the thing it depends on, what does that mean? Did they remove the dependency? Is this an error and they forgot to rebase? Did they merge because they don't like rebasing? We can reject pushes, but we can't prompt them or do anything interactive without client tools.

In the general case of stacked/dependent diffs, we can't see any local references and can't guess that you mean "only some of the stuff I pushed".

We can safely funnel everything to /differential/diff/1234/ and make them figure it out from there, but only replaces copy+paste with git push.

If we are to support git push HEAD:review, I'd like users who are interested in this to explain exactly what they expect it to do, and why that's dramatically better than copy-and-paste.

My problem with arc is that I want the following things in my commit/review workflow:

  1. Keep a detailed history of stand-alone changes (builds & makes sense in itself, even if it's just a small change) as Git commits.
  2. Build on previous stand-alone changes with more stand-alone changes.
  3. Put a change up for review without earlier changes having been pushed to master yet that this new change depends on.

In other words, my issue is that if there is any dependency between small commits, on the same feature branch, then arc does not let me parallelize reviews. Reviews are slow. People don't generally respond before I have another small change ready. Or I accumulate several small changes to make sure they head in the right direction as a whole.

Large, squashed patches are harder to review and often skip desirable details in the commit message. Numerous small patches tend to stack upon each other and Phabricator doesn't give me an option to say "this commit depends on this other commit, so you can review it on its own but don't try to merge it yet".

Git solves this by separating the concept of a commit from a push, by providing rebases and merge commits that will preserve detailed history and smaller, easier-to-review commits while keeping a single "land" operation.

arc assumes that every change is based off of master, so changes cannot stack.

If I am asking for a workflow that supports plain Git without arc, what I hope to get is for Phabricator to allow Git concepts in reviews. That doesn't necessarily mean pull requests and Releeph, but it means recognizing several commits in a feature branch that can depend on each other and get reviewed separately.

For illustration, this is what I do now to keep arc from messing with my history while still providing small, reviewable commits to my team mates:

$ git commit -a -m "Commit that can stand on its own"
$ arc diff --head HEAD HEAD^ # top commit is ready, let people review it
(...make progress on the next commit for this feature,
    notice that something is off in the previous one...)
$ git commit --amend file.cpp -m "Updated commit message."  # Fix it up.
$ git commit -a -m "New commit that builds on the previous one."
$ git log  # Double-check that everything makes sense :)
$ arc diff --update D1234 --head HEAD^ HEAD^^
$ arc diff --head HEAD HEAD^ # creates D1235

Apart from the fact that arc diff --update uploads don't advertise when I change the commit message, this will also produce a new review D1235 that is entirely detached from D1234 and will fail the build if integrated with CI.

In my ideal world, this workflow would look like this instead:

$ git commit -a -m "Commit that can stand on its own"
$ git push origin HEAD:review # top commit is ready, let people review it
Created review D1234.
(...make progress on the next commit for this feature,
    notice that something is off in the previous one...)
$ git commit --amend file.cpp -m "Updated commit message."  # Fix it up.
$ git commit -a -m "New commit that builds on the previous one."
$ git log  # Double-check that everything makes sense :)
$ git push origin HEAD:review -f  # updates both reviews at once
Updated review D1234.
Created review D1235.

I personally don't like how GitHub de-emphasizes individual commits, and I found Gerrit awkward to use iirc, but what both get right is that I can update a review branch with separate commits still intact and correctly ordered.

As for correctly-formatted commits, I would argue that a required commit message template is far less intrusive and more reasonable than to take over the entire commit/squash/rebase process with arc. This is at least as easy to set up as installing arc, and promotes good practices that can be reused for projects outside of Phabricator.

Phabricator doesn't give me an option to say "this commit depends on this other commit, so you can review it on its own but don't try to merge it yet".

Write Depends on D123 in the commit message.

arc assumes that every change is based off of master, so changes cannot stack.

I don't understand this -- I stack almost all of my changes. For example, these changes were all stacked on top of each other locally, and IIRC most of them were written before the first one was reviewed:

D15909, D15910, D15911, D15912, D15913, D15914, D15915, D15916, D15917, D15918

$ git push origin HEAD:review -f # updates both reviews at once

This is strictly impossible in the presence of possible rebases or amends without a local commit hook.

Regardless, any workflow we implement here will mirror the arc workflow as closely as possible. No resolution of this task will change workflows or introduce new workflows. See T5000#176274.

Write Depends on D123 in the commit message.

Oh, that's awesome. Didn't know, sorry.

Regardless, any workflow we implement here will mirror the arc workflow as closely as possible. No resolution of this task will change workflows or introduce new workflows. See T5000#176274.

Understood, I guess the prompt for explaining what I expect git (vs. arc) to do was meant in a narrower way than I took it for.

bgamari added a comment.EditedMay 21 2016, 7:04 AM

You'll still need to go there and fill out all the stuff you would have filled out in your commit message. That is, this flow would drop you at the screen right after you copy/paste:

This makes me nervous: we currently do see copy/pasted Differentials submitted to GHC fairly regularly and they are almost invariably a pain to handle as they lack base commit information. This means that if the repository diverges enough to result in even a single conflict, arc patch fails to apply the patch and drops you with your working tree in an unknown state, with no easy ability to proceed in resolving the conflict, and if you manage to manually resolve it, no commit message. This is a severe regression from the usual git workflow.

Perhaps this should be considered to be a separate bug against arc, however.

To me, I imagined this task would be supporting something rather like:

git push HEAD:my-review-branch

and that non-master branches in the repository would be Differential Revisions (with each push forming a new diff), rather than a single endpoint of review that's completely isolated and seperate.

But maybe that's more Releeph than this task.

non-master branches in the repository would be Differential Revisions

I think the closest match for this is T10691 (GitHub-like forking). I'd plan to implement forking very similarly to GitHub.

I don't plan to ever encourage users to create un-namespaced personal branches in the remote because I believe this process doesn't scale well to a large number of users. Much of the value of forking in my mind is giving users who want to do this (particularly, under the "git push = save changes" mode of thinking) a namespaced place to do it.

It's possible that branches in forks could have some amount of automatic magic (e.g., push to update a revision) but I think the possible magic is limited, because "git push = save changes" doesn't want a revision, we can't distinguish between "save changes" and "create revision", we can't do much if we guess "create revision" and there's bad metadata, and some of the value of this feature is strictly in being like GitHub and GitHub requires you to use the web UI to create a revision ("pull request").

Perhaps this should be considered to be a separate bug against arc, however.

If you'd prefer to apply a patch manually, you can arc export --git ... to produce it, then go from there. We can possibly improve arc patch's behavior too, but that's a baseline where you can fall through to the underlying tools if it isn't working well for you.

This makes me nervous: we currently do see copy/pasted Differentials submitted to GHC fairly regularly and they are almost invariably a pain to handle as they lack base commit information.

Base commit information would be retained.

Information, particularly "title" and "summary", could be prefilled to some degree, but I would expect users to still need to go through the form workflow to verify it, and to fill out some of the fields (e.g., "Reviewers"). (If they type "Reviewers" somewhere in their commit messages and then make errors by specifying invalid reviewers, I'm not sure we really have any good behavior available to us). Overall, this workflow would be similar to the GitHub pull request workflow where you create a pull request based on some pushed changes, but must still provide additional information.

Updating would be more manual than GitHub. With pull requests, we can deduce that an update is an update because it has the same branch name in the remote. With git push HEAD:review, it does not.

Some types of information (like the names of local refs/branches, the local path, the local host, etc) would necessarily be lost compared to an arc workflow. We can not know this information with a git push workflow. And local unit tests and lint obviously wouldn't run.

hach-que added a comment.EditedMay 21 2016, 1:15 PM

non-master branches in the repository would be Differential Revisions

I think the closest match for this is T10691 (GitHub-like forking). I'd plan to implement forking very similarly to GitHub.

I don't plan to ever encourage users to create un-namespaced personal branches in the remote because I believe this process doesn't scale well to a large number of users. Much of the value of forking in my mind is giving users who want to do this (particularly, under the "git push = save changes" mode of thinking) a namespaced place to do it.

I'm not imagining this actually creates branches in the remote; rather Phabricator just tracks who pushed it and what the branch name was when it was pushed, and converts it to a Differential Revision on the fly. It then stores the branch name -> differential revision mapping somewhere, so when something else is pushed to that branch, the revision gets updated. The code however only exists in the revision; the branch push doesn't actually get accepted into the repository.

So for example, if I do git push HEAD:my-temporary-branch (or make a branch and set the upstream branch to origin/my-temporary-branch, and then push), I would expect:

  • Phabricator receives the branch and knows who is pushing it based on PHABRICATOR_USER
  • It looks up differential_branchnamespace where branchName = "somevalue" AND userPHID = "PHID-pusher"
  • If the entry exists, it updates the existing differential revision that table points to using the pushed data
  • If the entry does not exist, it creates a new revision and adds the entry
  • When a revision is closed (either by accepting it and abandoning it), the entry is removed from the table so users can re-use branch names
  • Commandeering it should probably just change userPHID so that if someone commandeers it back, everything still works right
  • Phabricator then rejects the push (or accepts it with magical wire re-writing, but doesn't actually put the changes into the repository)

So your desired behavior is that git push origin develop should create a revision, instead of creating a new branch in the remote like a normal Git repository would? How would users create new branches in this model?

I'd expect there to be either per-repository rules (like "Tracking Branches" for repositories right now), or Herald-based rules "on branch hook, convert to Differential Revision action" that would allow Phabricator to determine whether a branch is allowed in the remote or not.

For almost every workflow I can think of in Phabricator, in workflows encouraged by the upstream at least, it's strongly recommended that you don't have arbitrary or multiple branches in a remote. To put it another way, the number of branches in a remote should be small and contained (like stable and master) because you shouldn't have feature branches there.

For example, at our work we'd set up a global Herald rule something like this:

  • On branch hook
    • If repository projects is not in "allow any branches"
    • And, if pusher's projects is not in "allow push any branches"
    • Convert to differential revision

FYI we'd mainly be placing devops engineers in the "allow push any branches" project group, so that in the case where we need to hotfix production, authorized users can create a branch in the remote to create a release from it.

We'd probably also add another Herald rule that says if a branch starts with "personal-" then always convert it to a differential revision so that those users can still use the feature.

Another possibility is that we just build this behavior (intercept and react to ref changes) as an extension point prior to the Herald, and then you can implement whatever magic you want. I suspect very few installs want the default behavior of git push origin mybranch for arbitrarily named branches to be anything except "create a branch".


@thoughtpolice outlines two problems above which I think are somewhat persuasive arguments for git push over copy/paste, even if that's all it's doing:

  • Copy/paste is significantly less convenient than git push if you are working on a remove development server.
  • Copy/paste isn't always really a thing in Linux.

The whitespace issue ("copy/paste destroying patches") would be persuasive to me too, except that I'm not sure how it occurs or how to reproduce it. I don't believe I've experienced this myself.

One possible thing I might be missing here is that Facebook and Phabricator use linters to strip trailing whitespace before changes are published, so diffs generally apply cleanly even if all trailing whitespace is stripped from them.

If a project did not do this and the codebase routinely has trailing whitespace, then the removed lines in the diff/patch could end up with missing whitespace after stripping, which would cause an apply conflict. I would generally expect stripping trailing whitespace from added lines to be possibly inconvenient but not particularly problematic.

I don't know how common trailing whitespace is in codebases. I can't immediately identify any in Mediawiki (find . -name '*.php' | xargs grep ' $'). A "better" fix for this in some senses is probably to adopt a lint pipeline, but we shouldn't require that.

But I can't even figure out how to copy/paste things in a way that strips trailing whitespace. Even when I manually copy/paste content with trailing whitespace from my terminal by selecting it with my mouse after artificially resizing the window so it's about 12 characters wide and the whole thing is a horrible wrapped garbage mess on screen, whitespace copies faithfully and is correctly preserved.

But perhaps this is some combination of trailing whitespace in codebases, weird terminal software, weird Linux desktop environments, clipboard extensions, weird browsers, codebases full of mixed-encoding text and non-utf8 data, weird linux utf8 desktop browser terminal clipboard extensions, etc., and my toolchain just happens to be impervious and not have any broken components.


Generally, my inclination is to implement forking first, then reconsider "ref magic" and see if any of it is still valuable. For new users, I suspect GitHub's familiar forking model is at least as good as git push HEAD:review magic, and possibly much better (it's far more familiar to many users, and we can guide users along the flow much more easily because most of it happens from the web).

It's easier to understand, and easier to drive CI from, and solves "git push = save changes", and solves "git push = collaborate", and supports eventual release use cases, etc.

It also solves problems with copy/pasting being unreliable, and provides and preserves more metadata than ref magic. The arguments for ref magic over forking seem to mostly be things that don't impact new users and/or which I don't plan to address ("make the whole workflow work completely differently").

Grimeh added a subscriber: Grimeh.Aug 4 2016, 5:31 PM
eadler added a project: Restricted Project.Aug 5 2016, 5:23 PM
carwyn added a subscriber: carwyn.Sep 4 2016, 12:18 AM
ablekh added a subscriber: ablekh.Sep 12 2016, 6:20 AM
enckse added a subscriber: enckse.Oct 7 2016, 12:27 PM
urzds added a subscriber: urzds.Nov 28 2016, 12:48 PM
J5lx added a subscriber: J5lx.Dec 4 2016, 10:41 AM
ariel added a subscriber: ariel.Dec 7 2016, 7:40 PM
ppggff added a subscriber: ppggff.Dec 12 2016, 4:53 PM
Ltrlg added a subscriber: Ltrlg.Jan 20 2017, 11:52 PM
20after4 added a comment.EditedJan 25 2017, 11:12 AM

Another possibility is that we just build this behavior (intercept and react to ref changes) as an extension point prior to the Herald, and then you can implement whatever magic you want. I suspect very few installs want the default behavior of git push origin mybranch for arbitrarily named branches to be anything except "create a branch".

This would be really great, I think we (Wikimedia) would likely use it to implement virtual refs and custom access controls.

  • Currently it seems that phabricator only supports basic per-repo policies and Herald. Repo policies and herald rules each have their own limitations and drawbacks:
    • No easy way to apply a policy to a group of repos. This makes it difficult to maintain when you've got over 9000 repositories. Managing the policy on each one individually is cumbersome.
    • Herald rules are more powerful and can do most of what we want, however, it requires a rather fragile combination of herald rules, repository tags, user-group projects, policies and duct-tape to piece it all together.

No easy way to apply a policy to a group of repos.

You might already be aware of it, but you can currently create an "Object Rule" in Herald using a project, then tag all the repositories with that project.

For example:

  • Create a project like "Uses Core Contributor Rules".
  • Tag all your "Core" repositories with this project.
  • Write Herald "Object" rules for commits, choosing the "Uses Core Contributor Rules" project as the object the rule acts on.

however, it requires a rather fragile combination of herald rules, repository tags, user-group projects, policies and duct-tape to piece it all together.

For custom access controls, you should already be able to use existing Herald actions as an extension point. Just write one rule like this:

When:
Always
Take these actions:
Run my organization's custom access controls

Create that action by starting with DiffusionBlockHeraldAction and having it apply whatever custom access controls you prefer.

(I suspect a lot of that complexity might be irreducible, though, unless some other system already defines it. For example, you'll still need some way to define which rules apply to a particular repository, which will probably look a lot like tagging repositories. And you'll need a way to define which users have which permissions, which will probably look a lot like putting users in projects.)

What you can't currently do is write a Herald rule that has an effect like "pretend the push succeeded, but actually redirect or ignore the refs that were pushed".

pasik added a subscriber: pasik.Feb 17 2017, 12:42 PM
mves added a subscriber: mves.Mar 10 2017, 7:16 AM
zhen added a subscriber: zhen.Jun 1 2017, 10:40 AM
stevex awarded a token.Jun 9 2017, 9:15 AM
stevex added a subscriber: stevex.