Page MenuHomePhabricator

Allow Harbormaster to perform change handoff in a defensible way
Open, NormalPublic

Description

When triggering a build in an external system like Jenkins, the external system needs to be able to check out a working copy of the changes (say, D123). There are broadly four ways to do this:

  1. Check out the origin, then arc patch D123 the user's changes.
  2. Fetch some "real" branch which arc pushed to earlier (as a side effect of arc diff).
  3. Fetch a virtual branch which Phabricator provides.
  4. Phabricator builds the working copy and tells the external system where it is.

Implementations in the wild all use (1) or (2) today. These are both severely limited, fragile, complex, hard to configure, require needless dependencies, confusing, etc. For (1), you need arc on the build nodes, among many other problems. For (2), you pollute your remote with internal D123 branches or have to come up with some non-polluting side-band channel.

We should attempt to support (3).

(4) is Drydock, and lets us do a lot of other stuff, but it depends on some other handoff mechanism existing. So the ideal path forward here is building (3), then building Drydock on top of it later.

Generally, path forward is:

  • Evaluate (3).
  • Hopefully we can do it reasonably.
  • If we can't, figure out what the least-bad non-virtual-ref approach is.

Event Timeline

epriestley raised the priority of this task from to Normal.
epriestley updated the task description. (Show Details)
epriestley added a project: Harbormaster.
epriestley moved this task from Backlog to v1 on the Harbormaster board.
epriestley added a subscriber: epriestley.

I'm worried that plan (2) will make room for errors, where "git fetch-my-change" would produce a state that isn't identical to "download change as a patchfile and apply it over what the base should be", because it will also take any commit I have between my diff and HEAD.

For example, in order to test D13036, I've based it over the changes from D13034, although they are not actually dependent. arc-patch would currently fail for D13036, and fall back to applying to current state.
If in D13036 would also be pushed, and were accepted, an un-careful landing flow might also pick up the changes from D13034 (Which in this example is more risky).

The underlying issue is, I think, that we will now have 2 ways to represent a diff, and they might not be identical.

If some commit, "A", is on some other commit, "B", which is on master, we consider that to be the "correct" form of "A": it's the one you tested, it's the one any local lint / unit ran against, and it's what we expect you to land.

When you "arc patch A" on master, we'll attempt to "arc patch B" first to put the working copy in a "master > B > A" state (the same state that it was in when you ran "arc diff").

When you "arc land A", we'll attempt to warn you that A is on B and B hasn't landed yet (we expect you to land in the same order that the commits are stacked).

These cases are tricky, and a mixture of inherent difficulty, bugginess, and cases where we could probably do better mean that we don't always get them right, but that's the intent.

If you don't want "A" to be on "B", the expectation is that you don't put it on "B" locally. Any issues that "B" causes in the build pipeline are equally capable of affecting local testing, code review, etc.

It introduces more responsibility on the author, which we don't exactly have now, and might not be exposing very clearly.

As a reviewer, I expect the code that lands to be the code from the revision web page, except maybe rebased; And unless it's explicitly marked as "depends on B", that's also what I would expect the CI to test (And implicitly I want the author to say "I've tested this change", even if the parent commit is different).

This is one of the reasons I'm a strong proponent of T182 - I think the reviewed code should be a stronger signal than the intended code for what's being committed (Because that's the code we've all seen).

arc patch currently fits this thinking very well - it takes the change from the same source as the web page, and uses the formal dependency information. When I can't have T182, I tell users to arc patch and than land that, for the same effect.

With (2), we either force the author to be much cleaner in their local environment, or allowing them to say one thing to the reviewer and another thing to the CI bot.

As a reviewer, I expect the code that lands to be the code from the revision web page

I think I'm just not understanding something here, because this is also what I expect but what I believe we currently do.

The code from the revision web page includes "B", it just doesn't highlight it. When you review "A" on "B" on "master", the UI shows you that whole chain. It does not show you A vs master without B.

If we rebase A on top of master and test or land that, it's different from the code you reviewed in the web UI.

For example, suppose A is on B, which is on master.


Scenario 1

"B" adds "example.c", a completely new file.

Then "A" makes some changes to "example.c". A will not rebase cleanly on master.

The revision for "A" shows "example.c", as added by B. Changes made by A are highlighted, but the unhighlighted portion of the left and right sides of the diff came from B. It is impossible to review this change outside of the context of B. This change can't rebase cleanly against master, and A can't be tested independently of B.

I think everyone agrees that the current behavior in this case is reasonable and correct and consistent with expectations, except that maybe it would be nice if arc was better about automtically figuring out that A is on B and explicitly encoding "A depends on B" on the revision. In my experience, this is essentially always obvious from context, which is why I haven't prioritized it. Maybe this is less obvious in other use cases.

The proposed behavior is to assume that this change is wrong, rather than assuming that the dependency is implicit.


Scenario 2

"B" does something, and "A" does something totally unrelated. They touch no files in common and don't interact at all. Maybe B updates documentation and A fixes a typo in an error message.

The revision for "A" technically includes "B", but "A > B > master" is indistinguishable from "A > master", both from the source and from the behavior of the changes.

I think (?) the behavior in this case is moot, and that everyone can agree it doesn't matter. It would still be nice if we were more explicit about showing the context, but there is no practical difference in showing or omitting B.

I don't think anyone cares about this.


Scenario 3

"B" does something, and "A" does something related which interacts with "B", but can also rebase cleanly on "master".

Either they touch the same files, or maybe even touch the same feature, or their effects are somehow related, but "A > master" and "A > B > master" are distinct changes where "A" has different, distinguishable behavior, even though "A" rebases cleanly onto "master".

Specifically, the raw diff of "A vs master" and "A vs B" is the same, or is different only by line numbers such that rebase can apply cleanly.

Thinking about this scenario seems to diverge.

My claim is that you are reviewing a state -- "A > B > master" -- and this the real/authoritative version of this change. You are not reviewing "A vs master", nor "A vs B". We highlight these diffs because it is easiest to understand the state of "A" by seeing its differences from a known state, but this entire state is what the user actually wrote, what is shown in the web UI, what they tested, what their "Test Plan" and "Summary" make claims about, and what arc lint and arc unit are operating on.

The alternate approach seems to be that you are reviewing a diff -- "A vs master", which happens to have B in it but which everyone just ignores because it isn't highlighted in green, and that all local tests and information isn't really that relevant, and that "B" is so often not really part of the change that it is more reliable to ignore it than to include it in the build process.


I think the "diff" view of the world assumes that "B" being in the chain is essentially always a mistake? This is bizarre to me, because I use this workflow all the time and "B" is always intentional and frequently required for a change to make sense or work, even if "A" would rebase cleanly. (For example, B introduces a function and A calls it, and rebasing A on master will break the change.)

If I add a function X() in B and call it in A, and A is on top of B, and I tested my changes locally but didn't explicitly say "A depends on B", and then the build systems says "A is totally broken, function X() does not exist", I'd be very surprised by this behavior and consider the build system broken. Of course A depends on B -- my history is "A > B > master"! The web UI shows that X() exists! My local tests were fine! I can run the exact same unit test command locally and it passes! I can see "X()" in my working copy!

It seems like the proposed view of the world says that all of this is invalid, my working copy state is assumed to be a mistake by default, and that review, local testing, arc unit and arc lint are all so ineffective that we don't even care that they're running on the wrong state, and that they're not only ineffective but unsalvageable, so it isn't even worthwhile to bother trying to get things into the right state, and that only the unit test results matter.

This diverges heavily from my experience (I personally trust "Test Plan" much more than automated tests), so I'm not sure where the differences in use cases and thinking are derived from.

In your environment, is "B" essentially always a mistake? If so, why are users ending up in these states? Is local testing so ineffective that it's preferable to ignore it instead of trying to warn users about it?

For example, imagine arc did this instead:

$ arc diff
You are sending changes that begin with an unpublished commit.

This might mean you've accidentally built on top of local changes, instead of
making a clean branch from master.

Do you want to rebase on master first? [yes, no, never]

This workflow would be a mess and more complex than this in practice, but this approach seems better? I'm not sure where our scenarios are really diverging, though. The "diff" view of the world (vs the "state" view of the world) seems "obviously wrong" to me, not like a reasonable alternative approach.

The web UI shows that X() exists!

Only if X() exists in a file that the user changed in that diff. If X() was introduced in another file in the other diff or commit, then it is totally invisible to the reviewer that the patch relies on X(). Because X() exists in a file not included in the diff, it's not even viewable by requesting "show more lines" or by looking at the left hand side, because the file won't be included in the revision at all.

This (I believe) is the issue. Maybe in PHP this isn't so much of an issue, but in other languages where inherited classes are frequent and inter-file dependencies are common, it's essential to know all of the changes that were involved in making a diff work between what the reviewer has on their local machine (or more specifically, what they can get by fetching origin), and what is being presented for review.

I think we disagree on what the "current behaviour" is. I understand the current behaviour as diff-based: a Revision is a single Diff, which is analogous to a patch file.

This is in contrast to gerrit, in which a review is a commit hash, and "merging" it literally means "merge this to master".

For example, suppose A is on B, which is on master.


Scenario 1

"B" adds "example.c", a completely new file.

Then "A" makes some changes to "example.c". A will not rebase cleanly on master.

The revision for "A" shows "example.c", as added by B. Changes made by A are highlighted, but the unhighlighted portion of the left and right sides of the diff came from B. It is impossible to review this change outside of the context of B. This change can't rebase cleanly against master, and A can't be tested independently of B.

I think everyone agrees that the current behavior in this case is reasonable and correct and consistent with expectations, except that maybe it would be nice if arc was better about automtically figuring out that A is on B and explicitly encoding "A depends on B" on the revision. In my experience, this is essentially always obvious from context, which is why I haven't prioritized it. Maybe this is less obvious in other use cases.

The proposed behavior is to assume that this change is wrong, rather than assuming that the dependency is implicit.

In this scenario, we should make the dependency explicit (Either automatically or manually), which makes the argument moot.

Scenario 2

"B" does something, and "A" does something totally unrelated. They touch no files in common and don't interact at all. Maybe B updates documentation and A fixes a typo in an error message.

The revision for "A" technically includes "B", but "A > B > master" is indistinguishable from "A > master", both from the source and from the behavior of the changes.

I think (?) the behavior in this case is moot, and that everyone can agree it doesn't matter. It would still be nice if we were more explicit about showing the context, but there is no practical difference in showing or omitting B.

I don't think anyone cares about this.

In the behaviour (2) described in the task body (Virtual ref), the change everybody tests, possibly approving, and possibly automatically landing, for A, secretly includes B.
If B introduces a bug, this bug will be shown as belongs to A.
B might not have a Revision at all, or it may have been supposed to be removed before arc diff, and the author forgot (For instance - remove a bunch of test cases that can't run on the author's system. Lot's of developers are bad and do this, instead of fixing things).
These changes are now stowaways on A.

Scenario 3

"B" does something, and "A" does something related which interacts with "B", but can also rebase cleanly on "master".

Either they touch the same files, or maybe even touch the same feature, or their effects are somehow related, but "A > master" and "A > B > master" are distinct changes where "A" has different, distinguishable behavior, even though "A" rebases cleanly onto "master".

Specifically, the raw diff of "A vs master" and "A vs B" is the same, or is different only by line numbers such that rebase can apply cleanly.

Thinking about this scenario seems to diverge.

My claim is that you are reviewing a state -- "A > B > master" -- and this the real/authoritative version of this change. You are not reviewing "A vs master", nor "A vs B". We highlight these diffs because it is easiest to understand the state of "A" by seeing its differences from a known state, but this entire state is what the user actually wrote, what is shown in the web UI, what they tested, what their "Test Plan" and "Summary" make claims about, and what arc lint and arc unit are operating on.

You are not reviewing "A vs master", nor "A vs B"

The UI only shows "A vs base-of-diff" (Where base may be B, or master, or master~32), and we don't even have a way to show "A > B > master". If base is B, then all the changes in B v master not visible in the revision, and the state B is not visible in Diffusion (Or anywhere else).
If B introduces an evil change somewhere that isn't actually changed again in A, it won't be visible in any way, and can't be reviewed.

The alternate approach seems to be that you are reviewing a diff -- "A vs master", which happens to have B in it but which everyone just ignores because it isn't highlighted in green, and that all local tests and information isn't really that relevant, and that "B" is so often not really part of the change that it is more reliable to ignore it than to include it in the build process.

In this approach (Which I maintain is the current behaviour) - we ignore B as a separate change, but the content of the change is there (Highlighted in green) - we're actually reviewing "A+B vs master". If we want B to be distinct, it should be a different Revision, and A should Depend on it.

I think the "diff" view of the world assumes that "B" being in the chain is essentially always a mistake?

No... Not always, but sometimes.
The "diff" view wants to see the change between states of the codebase (Proposed vs Current), because it's easier to converse on, and easier to cherry-pick / reorder.
I want every change to the code based to be explicitly accepted before it's landed.


A few weeks ago, we were bitten by Gerrit's state-based view of the world:
A Change in gerrit is a git commit. The UI shows a tiny field for "base", which is the parent commit, and if it's not ahead of master, it shows a "needs rebase" notice.

We keep two branches (bA and bB) more-or-less in sync, and merge from one to the other routinely, so bB is often ahead of bA (In git's terms). A change was accidently authored on bB, and offered to bA. When merged, it took with it all changes between bB and bA into bA.
A few things went wrong to allow this to happen, but no-one in the team understood this is a possibility.

I want every change to the code based to be explicitly accepted before it's landed.

I think this is the meat of the issue? I strongly agree this is desirable, but I think the behavior you propose is not a good way to get there.

For example, suppose code.php contains this:

if (some-condition) {
  destroy-world();
}

In B, I make this change:

  if (some-condition) {
-  destroy-world();
+  thank-user();
  }

Then I put A on top of it, and make this change:

- if (some-condition) {
+ if (much-weaker-condition) {
    thank-user();
  }

I write up a revision "thank users more often" and send it to you for review. Maybe I'm doing this on purpose, or maybe it's totally by accident and this happens to be the effect of my change.

If anything rebases "A" on master without "B", it will have a severe, destructive effect -- calling destroy-world(); more often. You can't tell this by reviewing "A" if it is diffed against B.

I don't think it's safe or correct to ever silently remove "B" and assume it was a mistake. Although this is fine for some types of bad "B" (like deleting tests), it represents systematic failure for other types of bad "B" (like making dangerous things look safe). These changes might be a little more difficult to make accidentally or through laziness, but I think they're not much more difficult to make intentionally. I think it's imperative that claims that all code has been reviewed should be robust against an attacker compromising an account and submitting malicious changes. A malicious actor should not be able to trick a reviewer, and if we silently remove "B", they can.

Basically, if "A" is on "B", "B" must be reviewed before "A" can land. This must be a property of the system.

Today, it isn't, because users control what they git push so we can never enforce this property against malicious users (we could require that pushed commit hashes exactly match reviewed hashes, but I believe this would be unreasonably burdensome if implemented without the ability to automatically merge/rebase). We've made a half-hearted effort to enforce it for accidental mistakes because I haven't seen many reports of users making these mistakes. In particular, these are things we do weakly today that I plan to do strongly in the future:

  • Make the "A > B" dependency more clear. This is unambiguously desirable, I just haven't seen it be much of a problem in practice.
  • Make the fact that the base commit is not a published commit more clear. We do this in a very soft way today (it won't be linked). Again, this just hasn't been much of a problem in practice, as far as I've seen.
  • Prevent arc land if master does not contain "B". We make some effort to do this already, but primarily prevent mistakes in land order, not bad/secret "B". As above, I just haven't seen this be much of a problem in practice.

Once we have "Merge This", we no longer have the problem where the user can "git push" whatever they want at the end of the day. I would plan to have these behaviors:

  • Prevent "Merge This" if master does not contain "B" (I would certainly do this, at least by default: I imagine the default "merge this" being completely auditable -- we could let installs weaken this if they're willing to trade away trust for ease of use).
  • Prevent "arc diff" if the base commit is not a published commit for some or all developers (I think this is bad, and would slow good developers down, but is a more reasonable solution to the problem than ignoring "B").

The existing DifferentialLandingStrategy code in HEAD does not work like this, but I would plan to make it work like this before unprototyping it. It's a hack around making arc land easier, not an auditable custody. It should be trustworthy (and scalable, and not a sketchy hack) before it is released.

When merged, it took with it all changes between bB and bA into bA.

Put another way, I agree this is bad, but I think the behavior should be stricter than what you propose. Specifically:

  • We're landing "X" to "master" with a chain of custody.
  • Find the commit on "master" with the same tree hash as the base commit of "X", or which Phabricator vouches for as having had that tree hash prior to a trusted rebase.
  • If it does not exist, this operation is an error and may not continue. "X" depends on unpublished code. Tell the user they can not proceed (and give them as much help in possible in fixing the issue: usually the problem is "you have to land Y first").
  • In some configurations, we might give them a "Continue Anyway (Dangerous!)" button which would do the "apply as a patch" workflow, but this is not a safe operation and I would not allow it by default. I would resist building this capability at all.

then it is totally invisible to the reviewer that the patch relies on X()

  • This should be clear to the reviewer, and X() should be present in some other revision.
  • Regardless, this revision should not be allowed to land until the revision which introduces X() is reviewed and landed.

Today, clarity to the reviewer relies on context ("Ref"/"Fixes", human discussion in summary, knowing what the user is doing), explicit "Depends on X", and very weak cues (e.g., base commit not linked in UI). These are not robust, but my experience is that they're sufficient in the overwhelming majority of cases (I haven't seen this be a problem, personally, and haven't seen many reports of it being a problem), That is, I currently believe that these properties are actually essentially always true in practice. This is largely based on an assumption that all users provide about as much context about chains of diffs as we do in the upstream, although obviously that might not be the case, but I either haven't seen much discussion of it being a problem or have forgotten about it.

Improving this clarity is just an issue of priorities, not intent. My belief is that clarity is generally "good enough" today. Since we don't actually enforce the other part, making clarity perfect does nothing to discourage malicious actors, so the value of improving them is only in making accidents more difficult -- no level of clarity can actually establish a trustworthy chain of custody. My assumption is that these mistakes are fairly rare (I can't recall any in the upstream). This may not be true, but that's what's informing priority here.

Today, "not allowed to land" is also very weakly enforced (we attempt to issue some warnings in "arc land" which you can "Y" through; and you can write sanity check Herald rules to make sure the author at least claimed that the code was reviewed). Again, we can not build this in a trustworthy way that foils malicious actors today, so improving it only makes accidents more difficult. My believe is that the accident rate is very low (I can recall a handful in the upstream, but I think they were a long time ago, and might have been from before we introduced some of these checks).

Enforcing this is just an issue of priorities (and the high technical barrier to writing a functional, scalable, general-case "Merge This" button), not intent. I believe the major, primary value of "Merge This" is establishing a trustworthy chain of custody in the remote. Easier UI for merging is a distant secondary benefit, and I don't plan to make the button especially easy to use, at least by default (i.e., it won't let you break chain of custody unless there's huge demand for this and you explicitly configure it to be unsafe).

I intend to resist even implementing unsafe "Merge This" modes. We can make an unbroken chain of custody easier in other ways: for example, "arc diff" can warn you if your base commit's tree hash isn't present in the remote or any revision. This would warn users about "A > B > master" where "B" deletes unit tests before the change even made it off their machine. I generally think this is a better approach than letting "Merge This" break custody.

I think we agree on all the important things, and only differ on some of the terms;
Specifically:

  • We all want stricter control on what gets landed, and have auditable chain of custody
  • We agree that preventing arc diff just because the base commit isn't published isn't good
  • We agree that preventing arc land because we can't reconstruct the state the author had is reasonable
  • We agree on

if "A" is on "B", "B" must be reviewed before "A" can land. This must be a property of the system.

  • and we agree that the current DifferentialLandingStrategy is very hacky

I think we're not clear about the definition of "diff based" vs "state based" Differential.

eadler added a project: Restricted Project.Feb 22 2016, 4:57 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:20 PM