Page MenuHomePhabricator

Commit into repository directly from differential
Open, NormalPublic

Assigned To
Authored By
tuomaspelkonen
May 26 2011, 11:57 PM
Referenced Files
F910883: Screenshot_20151027-192254.png
Oct 27 2015, 8:24 AM
Tokens
"Like" token, awarded by leoluk."Love" token, awarded by scp."Like" token, awarded by kuba-orlik."Like" token, awarded by amartin33."Love" token, awarded by austinstoker."Orange Medal" token, awarded by turadg."Like" token, awarded by tycho.tatitscheff."Like" token, awarded by halfsea."Like" token, awarded by hwinkel."Like" token, awarded by igorgatis.

Description

Just opening up a conversation about how to do this. This would be a really nice feature for deffirential.

From Facebook Tasks:

D255317 fixes (or at least should fix) the latest branch, which means we can get a working build out for testing.

Right now though, we need to get it reviewed, committed, and requested for merging in to latest.

I'd like to be able to request a diff to be merged, prior to it actually being committed. This is a valid thing to do as long as we're not using shadow branches in production (and only for testing), though I guess that might be debatable.

Revisions and Commits

rP Phabricator
Closed
Closed
D7486
D14350
D14349
D14348
D14347
D14346
D14343
D14341
D14338
D14337
D14314
D14295
D14280
D14270
D14266
D14259
D14258
D13022
D8719

Related Objects

StatusAssignedTask
OpenNone
Openepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
OpenNone
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
ResolvedNone
Resolvedepriestley
Resolvedepriestley

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

General progress update here: this now works, technically. Almost every part of it -- particularly, the UI -- is still total garbage for now, but if you ship a revision against rGITTEST up to this server you'll get a "Land Revision" button on it which has worked twice in the past and might work again. I would recommend extending the list of pasta varieties found in the pasta file if you want to try this (the button should appear and function without requiring the revision to be accepted for now, and everyone has write access to the repo anyway).

Although the UI is a mess, the underlying fundamentals are scalable: because this process leverages Drydock, it could perform two (maybe three!) pulls at the same time if you had enough hardware.

This stuff is done or mostly done or otherwise in good shape:

  • Blueprint selection needs a little more UI work but feels like it's headed in a good direction and generally in a good place already. T9519 has details.
  • Diffusion now has a "Repository Automation" section which allows blueprint selection. This seems fine for v1.
  • Merge requests exist as objects (DrydockRepositoryOperation) and about half the support code is in reasonable shape.
  • Drydock is generally holding up and beginning to feel like an actual piece of useful infrastructure.
  • And the thing does technically work in a reasonable subset of cases today.

This stuff needs significant additional work:

  • DrydockRepositoryOperation is still missing some fundamental interfaces (like a list view), and some major concepts (like making sure patches apply sequentially and probably a command queue) are undeveloped or undeveloped.
  • UI is a huge pile of trash, needs a bunch of work. Particularly, feedback about progress and outcomes is exceptionally poor.
  • The actual merge mechanics are OK, but have three major remaining areas which require work:
    • Mechanics live in DrydockRepositoryOperation, but probably need to live in WorkingCopyBlueprint (or otherwise be accessible from both places). The code could move easily in a purely technical sense, but picking up merge errors from WorkingCopyBlueprint is less easy.
    • We aren't trying to deal with any of the hard cases yet (like revisions not based on the target branch). Worse, we will happily do the wrong thing, which is dangerous.
    • Process is not trusted, and I think getting a chain of custody which covers "what you see is what will land" will be hard in the general case.
  • Lots of hard coding of things like "master" being the target branch.
  • General lack of configuration and workflow/application logic (e.g., no way to only allow accepted revisions to land).

We had a very slow week and haven't made much progress here. There is a new list view and some new, more reasonable UI. We landed our first production change with this mechanism.

The next pieces don't have obvious or unambiguous paths forward yet:

  • We have a hardcoded one-resource-per-blueprint limit we need to lift, but doing this in a way that is too simple may make things harder once the mechanism expands in the future.
  • We need to store results from operations so we can tell the user what happened (or what failed) but don't have a great place to put them right now, and underarchitecting this may also make things harder in the future.

I think these are probably both straightforward once I figure out how to proceed, they just need some thought about what the medium and long terms versions of these features will look like.

I think we should disable the "Land Revision" action item if the revision has already been landed.

Screenshot_20151027-192254.png (1×1 px, 227 KB)

The action is always available right now because it makes testing (particularly, testing things like "does it do the right thing when the revision is already landed") easier, but I'm going to require "Accepted" for v0 of this, which will indirectly disable it for landed ("Closed") revisions. We'll probably relax that later but it seems like the right place to start from, at least.

This is in reasonably functional shape for ambitious users. There's some documentation now:

We've landed most of the recent upstream changes with this mechanism. These specific things have been improved:

  • DrydockRepositoryOperation UI is in fairly reasonable shape now and provides critical progress feedback through the use of animation.
  • Merge mechanics are in WorkingCopy now.
  • Merge mechanics are less bad, but still a bit bad.
  • Limits and error handling are in better shape.
  • Only accepted revisions can land, and some other workflow sanity checks now exist.

This stuff remains:

  • Hard-coded target of master.
  • Hard cases (revisions with extra history / need rebase) aren't handled yet and may do the wrong thing.
  • Workflow does not have a chain of custody yet, and what lands can still be different from what was reviewed.
  • Probably plenty of other rough edges that we'll hit over time.

If you try setting this up, I'm interested in feedback about problems you run into or which missing features are the most important for your use case.

Because this works from the staging repository, it will be possible to trick it right? I mean, I could do this to sneakily commit virus.exe to the repository:

git add virus.exe
git commit -a
# Make a legitimate change
git commit -a
arc diff HEAD~

Yes, that's what this refers to:

Workflow does not have a chain of custody yet, and what lands can still be different from what was reviewed.

Another way to do it is:

  • Comment out the staging push in arc land.
  • arc diff normally.
  • Push arbitrary, wholly different changes to the staging area.

I can't remember which ticket, but I remember a similar discussion regarding arc patch (whether arc patch should patch onto HEAD or pull from the staging area).

Do you have any ideas on how to improve this issue?

I think the discussion is in T8090.

There are two cases here:

  1. The changes to be landed are on top of other changes (as in your example, where virus.exe is snuck in underneath).
  2. The changes to be landed are wholly different from the changes to be reviewed (as in my example, where a malicious user sent a good change for review and pushed virus.exe for merge).

For (1), we already know exactly which commit hashes we expect to land, so the merging work in the LandWorkflow will just get stricter and reject merges which would bring in other commits.

This will create a mild usability issue: if you diff A and then diff B on top of it, and then land A, it would be nice to be able to land B without rebasing it first, but you won't be able to in a mutable repository because the version of A on B will differ from the version of A on master.

For now, you'll just have to rebase. Later, we can refine this by keeping track of prior lands, so when we land commit X (diff A) as commit Q, and then see commit X in the history of B, we can recognize that it's equivalent to Q and just remove both commits from the histories before we check if the merge is exact. This will take some work but we don't need to get it right 100% of the time, since there will always be some cases where we hit conflicts and users legitimately need to rebase anyway.

For (2), I think we need to read the diff out of the staging area instead of having arc submit it. This is going to be pretty messy, but I think it's doable. I'm primarily worried that the client wait may be unreasonably long (I don't see a way to avoid doing a blocking, synchronous read of the pushed commit from arc), but hopefully we can get that down into an acceptable range.

In the long run, with hosted repositories and more flexibility around virtualizing refs (like T5000 and T8092) we can do much better here, and we can do something reasonable today for any hosted repository by generating a diff as a side effect in the commit hook if the push touches a diff tag, but I don't see a super clean way to do this with tracked, external staging areas.

The not-so-clean flow, particularly, is:

  • Push to staging area.
  • Call a new differential.createdifffromstagingarea() Conduit method.
  • This method blocks until the diff is read.

Internally, it has to do something like start a Drydock operation to read the diff, then wait until the operation completes.

One possible alternative is for the import pipeline to start recognizing "magic" refs and doing things as a side effect of import, and to put similar logic on the hook pipeline for hosted repositories. They can theoretically share some code. The workflow as a whole would look the same, except arc would be responsible for polling Conduit to wait until the diff shows up.

All the policy, merge strategy, etc., stuff is hard-coded to the easiest set of reasonable defaults. Some of these defaults may not be suitable for many installs (e.g., always rebase, always merge to master, require edit capability for imported repositories in order to perform merges). I expect to expand these shortly, I just don't want to focus on making things more flexible until they actually work.

Are you planning a fully specifiable merge command so we can run anything we want for merging?

We make heavy use of submodules at my work, such that every change is part of a submodule change. git refuses to merge a super-project if a submodule changed in both branches, even if the changes to the submodule would merge cleanly. We have extended git with a custom command so running git rmerge <branch> from the super project will go through submodules that changed in both branches and merge each of them. If we can fully specify the merge command itself then we can extend git on the phabricator install with our custom rmerge command and invoke it. I realize we could dig into the phabricator source and change anything we want but I'd prefer to stick with the upstream version if possible.

Are you planning a fully specifiable merge command so we can run anything we want for merging?

There will be more flexibility in the future, but I don't anticipate making the merge command configurable, per se (principally, the merge is more complex than a single command, and will become increasingly complex in the future).

You can create a new blueprint which provides WorkingCopy resources and merges in a different way, then use that blueprint type instead of the vanilla WorkingCopy blueprint. However, this will probably be a lot of work to maintain today. Once blueprints stabilize more it may be more realistic.

@epriestley I'll keep an eye on blueprints and the merge configurability. Thank you. Phabricator is awesome/and getting awesomer.

eadler added a project: Restricted Project.Feb 23 2016, 7:14 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Feb 23 2016, 7:18 AM

Just bookkeeping, this doesn't currently have external priority.

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:17 PM