Page MenuHomePhabricator

Add another accept state for code review
Closed, WontfixPublic

Description

Motivation

Code review efficiency is essential for increasing Dropbox engineering productivity. The current code review process can be updated to formalize and encourage (when appropriate) an existing best practice already in use today at Dropbox.

Current State

Currently, when reviewing a diff, the reviewer has 2 options: Accept and Request changes. While these options indeed work in many cases, often enough, the general idea as reflected in the diff is correct, but there are some changes to be made. The reviewer trusts the author to implement the changes correctly without reviewing it another time, realizing such extra review will be neither necessary or productive.

Current Workarounds

In that case, the current 2 options [Accept, Reject] do not fit, and the reviewer is often tempted to use a ‘conditional’ Accept. Meaning the diff is accepted provided that the author will make those changes. This has the drawback that the author could accidentally miss the reviewers comment and land the diff as is.

Another workaround involves working with remote offices. The reviewer prefer not to block the author, so they resign as a reviewer and suggest someone from the author’s office as reviewer. This has two big issues. First, the commit message will not have the original reviewer on it, so if there’s a prod issue, or someone is just going through git log for context, they wouldn’t see who the actual reviewer was. Second, the diff doesn’t appear in the author’s queue but in the new reviewer’s queue. And the next action should be taken by the author.

It’s debatable, but it seems that the above behavior described in those workarounds is a good practice, and might be the preferred solution to the problem presented above. This is due to reducing non-productive iterations from the review process.

Proposed Solution

This situation above occurs often enough, and if we want to encourage and formalize this practice, then it would be useful to integrate this solution into the Phabricator tool. This can be implemented as a new state ‘conditional accept’ that is considered to be an accepted state by the arc diff tool. As an extra precaution from unintentional mistakes, arc land can prohibit landing a diff with conditional accept until a new change was introduced, or until all inline comments are marked as done, or both.

Benefits

While this proposal has the potential to speed up the review and development process in general, it can have an even greater impact, when working with remote offices in opposite timezones. When each iteration on the diff can take a day, it can save an entire day in each diff (or even 2 days on a weekend).

Additionally, such approach amplifies a sense of empowerment, ownership, and autonomy of the author, all of which are desired outcomes.

Event Timeline

dadi updated the task description. (Show Details)
dadi added a project: Restricted Project.
dadi added subscribers: jhurwitz, angie, dadi.

The concern is having alternate states of approval makes Phabricator significantly harder to use and understand. The current state right now is very clear. Either the code is safe to land or it isn't. If you have to place a conditional, it's not safe to land. I'd also be concerned that this is being sold as an efficiency update, but that comes clearly at the cost of code quality, which is another concern.

This has been suggested before, so I'll try to dig up the previous discussion, unless @epriestley gets it first.

Another workaround involves working with remote offices. The reviewer prefer not to block the author, so they resign as a reviewer and suggest someone from the author’s office as reviewer. This has two big issues. First, the commit message will not have the original reviewer on it, so if there’s a prod issue, or someone is just going through git log for context, they wouldn’t see who the actual reviewer was. Second, the diff doesn’t appear in the author’s queue but in the new reviewer’s queue. And the next action should be taken by the author.

I don't quite understand this -- if Reviewer A does not review a change, resigns, and adds Reviewer B, who actually reviews the change, isn't it correct to list "Reviewers: B"? And isn't the next action after this resign + add step for Reviewer B to review the change?

Or is Reviewer A actually reviewing the change, leaving a comment like "This looks good except for minor problems M and N, I'm deputizing Reviewer B to act on my behalf for the next 12 hours.", then swapping reviewers?


In the upstream, we handle this "conditional accept" by just accepting and trusting that the author will read and address comments and inlines before landing changes, which they almost always do. Although things do occasionally slip through because an author missed them, these issues are very rare (for example, I missed some inlines in D13662) and often addressable in a followup diff (in this case, D14049 later fixed the missed inlines). When this happens it does feel like the tool could do a better job of catching it, but, in practice for this project, the error rate is very low (I'd guess maybe 1 in 200 revisions?) and the cost of these errors is very small (since the reviewer wouldn't accept if the inline/feedback was important).

This use case has come up occasionally in the past too, but my general impression is that other installs have a roughly similar experience: it would be nice to prevent these, but the error rate is only slightly above the level of background noise.

Are you seeing a much higher error rate, or higher error cost, or both? Or is the perceived risk/cost here just greater? I just want to make sure I understand why this seems to be a more serious problem in practice for your install than it is for the upstream, or than I believe it to be for other installs.

One possibility is that maybe reviewers are accepting (or want to accept) changes with much riskier feedback/inlines than in most projects because of the timezone differences? For example, in the upstream we'll generally "accept" changes with very minor feedback (things like typos in docs or minor rendering issues), but "reject" changes with logical or behavioral errors. That is, the bar for "accept" is roughly "it would be fine to land this as-is, but it can be improved".

In your use case, is the bar different (or are reviewers trying to use a different bar)? Say, to something like "this should not land as-is, but can be made landable with straightforward changes"?

Generally, I anticipate making this workflow safer to have two effects:

  1. Authors make fewer mistakes by missing feedback.
  2. More code lands without review because reviewers accept more hypothetical changes.

The first effect is positive, but the second effect is negative. In moving forward here, I want to make sure we aren't reducing (1) a tiny amount and increasing (2) a large amount, because this could actually be worse on the balance even if there are fewer errors related to missing feedback.


Approaches

Some possible approaches here:

  • Add an "Accept After Next Update" action. I think this is roughly your suggestion. This would be a conditional accept that becomes a real accept on the next revision update, regardless of update content. This is just making sure authors don't miss feedback by mistake. This could get messy with multiple reviewers, since it doesn't ensure that each reviewer's feedback is dealt with. This could also encourage reviewers to pre-accept a wider range of changes, leading to more unreviewed code.
  • Add an "Accept In Theory" action. This would be an unconditional accept which says "the big picture here is fine, but someone else needs to do final signoff on the actual code" and clears any reject/block you're responsible for but does not count for moving the revision into "Accepted". Another reviewer would need to perform the effective "Accept". This might handle multiple reviewers better (the onus is now at least partially on the acceptor-of-record to make sure everyone's feedback is dealt with) and limit the cost of pre-accepting changes (the acceptor-of-record ultimately reviews them). The major downside is that it's not useful in 1-on-1 review.
  • Add both "Accept After Next Update" and "Accept In Theory" actions. This is more flexible but also more complicated, both in implementation and in UI (I think it will sometimes be overwhelming for new users to scroll down for the first time and see that they have many different flavors of "Accept" with no obvious distinction between them: unless you've used Differential before, you almost certainly need to read the documentation to guess what these quasi-accept actions would do).
  • Prompt users for comments since last update in arc land. This would be something like this:
This revision has comments since the last update:

(epriestley) Before you land this, can you double-check that the behavior is correct if the file is missing? Looks good otherwise.

Continue landing? [Y/n]
  • Prompt users for unmarked inlines in arc land. This would be something like this:
This revision has 2 inlines not marked "Done":

A.c:19 (alincoln) This is slightly cleaner as `T *`.
B.h:22 (alincoln) Typo in docs.

Land anyway, without addressing these inlines? [y/N]

I'm slightly hesitant to implement these prompts:

The "comments since update" prompt will have a very high false positive rate: when a reviewer accepts with comments, we can't tell if the comments are "this looks good, but could be improved with a minor change" or "this is great exactly as-is with no changes" without adding an explicit signal. If we do add "Accept In A Complex Way" actions, it also mostly or entirely moots this prompt.

The "unmarked inlines" prompt shows an incomplete picture (missing the actual comments) and I'm hesitant to nudge authors toward too much required/suggested interaction with the "done" feature. Philosophically, I think the scope of thinking about revisions should generally expand as you gain experience, and experienced reviewers should be approaching review focused mostly on the big picture questions ("is this a good change to make?", "is this the best way to make this change?"), and I want the product to nudge less experienced users into developing this kind of thinking (and not toward developing laser-focused line-by-line thinking).

There are a class of changes available to us which encourage very granular thinking, and frame review in terms of line-by-line inspection for essentially mechanical errors. Some examples of these features include partial review, checkmarks on individual files to mark them as reviewed, progress bars, general line-by-line or file-by-file tracking, etc.

Although this granular review is valuable, I believe it is much less valuable (thousands or tens of thousands times less valuable) than big-picture review (and, with an experienced team, some of it can be delegated to lint or prevented through good API design anyway). The most valuable feedback an author can receive is roughly "Do not pursue this change, there are fundamental problems in the approach.", and I want someone to catch that stuff as early and often as possible. I generally want to avoid product changes which make review more procedural and mindless, or teach less experienced engineers that review is about looking at each line in order until you've looked at all the lines.

This isn't to say that we'll never have any of these features, and "Done" certainly steps in this direction, but this is a high-level product direction concern I'm balancing in thinking about features in this vein. The current design of "Done" intentionally allows reviewers of all experience levels to ignore it if they don't feel they need it, and optionally use it to help work through a large amount of feedback. Although I think there's more work to do with "Done", I'm generally fairly happy with it as an "optional assistance for large feedback volumes" feature, and imagine it being used more heavily by new or less experienced engineers but staying out of the way for more experienced engineers.

Adding a prompt to arc land makes "Done" more obligatory, and pushes the workflow in a "check all the boxes" direction, particularly by teaching less experienced authors that they are expected to check all the boxes. At least for now, I'm very hesitant to have the product set this expectation.

If an "unmarked inlines" prompt was otherwise a complete solution, I would probably not be too concerned about this (the imposition on experienced authors is small, and we could maybe find a way to write the prompt that limited how much we were teaching authors that they need to check all the boxes), but it's not very complete: it would produce a mixture of false positives (addressed or irrelevant inlines) and false negatives (change requests in comments).


Upstream Mood

Of these approaches, I think "Accept In Theory" is the least problematic, but also the least powerful. I worry that "Accept Next Update" may lead to more unreviewed code and have a complexity cost higher than its utility, but could imagine implementing it.

The arc land flows I'm more hesitant about. I think these aren't complete enough (too many false positives and false negatives) and will often get in the way and prevent few errors.


Implementation Costs and Technical Pathway

Adding new actions is partially or fully blocked by T731, which is a large task. I'd also like to have a clearer path forward on the "bucketing" query UIs (some discussion in T1279, T9263, probably root cause of T9311) before making state more complex. We might be able to cheat our way past some of T731, but I've become increasingly convinced that the reviewer relationship should move away from edges to a separate table first (see T8996#133764) which we can't cheat through.

The arc land flows need access to comment text and are primarily blocked by T5873, which is a large task. They would be fairly straightforward otherwise, but there are also some likely followups (respond to comments from CLI, manage inlines from CLI) which I think are pretty low value and am not excited about pursuing in the near term.


Problem Fit

I'm not totally sure I understand your "remote office" case, but it sounds like you might actually want "Accept Next Update, In Theory"? That is: push this back to the author for changes, after changes clear any reject/block, if there are no other timezone-local reviewers I'll maybe add Reviewer B (in your timezone) to verify your update, then once they accept you're good to go with both me and Reviewer B as reviewers of record? Are people adding Reviewer B because authors are often sloppy and frequently miss changes (so Reviewer B would not be required if we had "Accept Next Update"), or because they have substantive feedback that they want to make sure is addressed, but not so substantive that they need to look at it themselves?

This remote office problem is also not a problem that the tool can ever solve completely: it will aways be more difficult to coordinate work with, say, 8-12 hours of latency than with 8-12 minutes of latency, since we can never reduce the number of operations that require round trips to 0 (if we could, it wouldn't be "blocking pre-publish review", and you can already perform fully asynchronous post-publish review in Audit).

We can take some steps to mitigate the cost here, there are limits on how much we can do if the organization is structured to often require high-latency round trips (e.g., engineers in Brazil and Turkey were randomly assigned to engineering teams; or everything must be reviewed by the Legal Compliance team in Cuba and the Security team in Laos; or engineers just don't know who the lowest-latency reviewers available to them are).

Before we spend too much time searching for technical solutions, how much of this problem would you estimate is a necessary cost of having a global team, versus an unnecessary cost that could theoretically be eliminated through better organization and planning in an ideal world? Obviously we can't just magically reorganize everything, but if this cost is, say, 80% organizational, it may make less sense to try to drive technical solutions for it (as they will never have much effect) than if it's, say, 10% organizational.

For example, I could imagine a hypothetical large company that acquires other companies globally but takes a very hands-off approach to managing them. I would expect this company might face very high ongoing coordination costs if it internally looks more like many small companies with slightly different workflows, cultures, and even timezones. The benefits of this approach might be huge and could greatly outweigh the costs, but I would expect high coordination costs to be fundamental to this kind of organization and for it to not be realistic to gain much ground on them technically.

Within the scope of what we can do, eliminating a round trip might be much harder than making a remote round trip local, and they have approximately the same impact on end-to-end review time (the cost reduction is not really 100% for a local round trip, but somewhere in that ballpark).

If we tackle this problem from the point of view of trying to trying to localize round trips instead of eliminate them, are there different approaches we might take? I'm not really sure what the organizational forces you're facing are that create remote round-trips -- could any of them have technical remedies?

For example, one problem might be authors working in a new part of the codebase not knowing how to select local reviewers. A remedy might be to make changes to arc cover (as per T2443) but also show reviewers ordered by timezone offset relative to the author. I don't know if this is a problem you actually face, but we're generally open to at least exploring approaches like this which might route around the latency problem.

Thanks for the feedback guys.

I don't quite understand this -- if Reviewer A does not review a change, resigns, and adds Reviewer B, who actually reviews the change, isn't it correct to list "Reviewers: B"? And isn't the next action after this resign + add step for Reviewer B to review the change?

Or is Reviewer A actually reviewing the change, leaving a comment like "This looks good except for minor problems M and N, I'm deputizing Reviewer B to act on my behalf for the next 12 hours.", then swapping reviewers?


In this case is the latter, meaning that the diff is fully reviewed, and the next action is for the author to do some changes in which reviewer B can accept. We want to leave reviewer A on the diff since he is also responsible for the changes in the diff.

As for the intent in remote offices: Its really a pain and we want the tools to be more flexible and help us be more productive. We are working in an environment where round trip is very very costly (in the beginning of the week in can be 2 full work days before getting feedback). So we basically need a button in the tool that is just saying:
"The approach of the diff looks good. There are some things that needed change, however either i trust the author to do those changes on their own or I trust some local reviewer to further review the changes. Either way, I'm done with this review, and I won't be necessary anymore before landing it"

In terms of code quality, I agree that this could lead to more unreviewable amount of code. But since this code was made under specific instruction of a reviewer, it somewhat reduce the risk of crucial errors. And maybe since we are working with remote office we are willing to take this kind of chance since on the other side lies heavily - productivity.

Now if the "Accept After Next Update" is not suitable in the general case, maybe we can have a configuration per diff that allow showing this field, so it would be visible only in cases that are problematic from the beginning (like remote offices)

Either way, we want to have the tool help us with our problem rather than limit us. Thanks again for the feedback.

epriestley claimed this task.

Closing this out since there's nothing actionable here.

angie moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Dec 11 2015, 8:22 PM