Page MenuHomePhabricator

arc revise: more collaborative code reviews (NOT FOR UPSTREAM)
Changes PlannedPublic

Authored by meitros on Sep 13 2015, 8:11 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Sep 5, 4:14 PM
Unknown Object (File)
Jul 28 2024, 11:50 AM
Unknown Object (File)
Jul 25 2024, 3:44 AM
Unknown Object (File)
Jul 16 2024, 3:30 AM
Unknown Object (File)
Jul 14 2024, 5:15 PM
Unknown Object (File)
Jul 14 2024, 5:14 PM
Unknown Object (File)
Jul 14 2024, 5:14 PM
Unknown Object (File)
Jul 13 2024, 1:29 AM
Tokens
"Like" token, awarded by 20after4.

Details

Reviewers
epriestley
Group Reviewers
Blessed Reviewers
Summary

As a hackathon/hackweek project, I wrote arc revise: a command that
automatically patches comments made on the differential web UI. I wanted to
send this to you to get some thoughts/feedback, but I'm not pushing for it
to be merged: I was just going to install it as a local extension and see how
well it works.

arc revise grabs all inline comments on the current differential revision, and
patches in any comment that starts with a code block (using the lint autofix
machinery).

  • It patches comments created by anyone. Supports multi-line comments
  • Right now, it patches comments on every diff, but it verifies that the original lines that were commented on are the same as the existing lines.
  • TODO: You can reply to a comment with :disagree to prevent it from being patched (picked an arbitrary syntax that didn't conflict with phabricator's !commands.) I may have to be smart about handling comments that aren't technically replies but are on the same line.
  • I need to better notify the user about what patches were dropped due to being outdated or disagreed.
  • It runs arc amend first as a child workflow.

IMO, this could be one of the larger shifts in the
diff-comment-revise-accept-land worflow that we've had since diffcamp. I'll
know more as people actually try it out, though.

P.S. Despite the disclaimer, its plausible to merge this upstream as a flag on
arc amend (e.g. arc amend --patch-comments), especially if its called out as
a prototype in help. I'm interested in first seeing how well this works in
practice, though.

Test Plan

Modified a few files in a local repo, ran arc revise. I'll test more
extensively after a round or two of feedback.

https://secure.phabricator.com/P1854

Event Timeline

meitros retitled this revision from to arc revise: more collaborative code reviews (NOT FOR UPSTREAM).
meitros updated this object.
meitros edited the test plan for this revision. (Show Details)
meitros added a reviewer: epriestley.
meitros added subscribers: jhurwitz, chad.

This sounds pretty cool, IMHO.

But I also write a lot of intern level code.

I think some kind of "step through the inlines workflow" is maybe-reasonable, but I worry that the diversity of editors / IDEs will make it hard to build a workflow that's very useful in general. I'm also hesitant about auto-patching out of inlines.

For "step through", maybe we could do something like git add -p:

$ arc revise
Inline 1 of 13:

 ... if (blah) {
 111   yada();
 112   yadb();
 113   yadc();
 ... }

(epriestley) I think we have to call yadc() before yadb()...

Actions: open in [e]ditor, [s]kip, mark [d]one, e[x]it, [r]eply, [u]ndo last action? e

This seems maybe-pluasible, at least, although we'd need to build out the API a bunch.

I'm a bit worried about encouraging reviewers to write code, because they can't test it, and I think authors are less likely to test it. I distinctly recall a case at Facebook where I wrote a snippet inline and the author copy/pasted it without re-testing and pushed, which brought down the target tier. Particularly for experienced-engineer-on-new-engineer review (this wasn't; the other engineer was also experienced), I worry that this kind of thing may be relatively common: the new engineer faces may reasonably just trust the code as written, especially if they can run a command to just swap it all out automatically.

At the root here, I'm sort of just not sure why some installs seem to have a huge number of inline comments and complex comment threads. Broadly, my expectations about inline comments are:

  • Changes made by experienced engineers generally have very little inline discussion.
  • Changes made by newer engineers have more inline discussion, but it is largely straightforward and mostly one-on-one.
  • Complex threading in inline comments should be very rare at all levels.
  • No reviewer ever writes production-ready code in an inline (even for completely new engineers, I'll almost always write pseudocode to emphasize the relevant parts of my point and that the code isn't production-ready).

My engineering environment for the last few years has been a fairly unique one, but has primarily adhered to this view of things. I recall this being my experience at Facebook, too.

This is at odds with some of the requests we receive and proposals for features like this, which don't make a huge amount of sense to me unless diffs are routinely collecting a ton of very substantive inline feedback.

Can you characterize your experience more qualitatively? What sort of environment are you developing in where reviewers routinely provide production-ready code inline?

Possible tl;dr: I think we have different perspectives on when this will be used - you added a git add -p proposal that makes this more heavyweight and substantiative (but also believe that this is just not the right approach), I want to make more lightweight diffs more convenient (lightweight diffs are probably a plurality, if not majority.)

Some of the uses I forsee (based on my experiences):

  • conveniently deleting stray slogs/console.logs/etc.
  • Rewriting an isolated block of code for clarity. In my experience this has often involved operations/transformations on lists (e.g. showing someone how to express something more clearly using array_pull, or any number of examples I've seen in javascript), but is generally about making code easier to understand or using the right idioms. I think its not quite inexperienced / experienced engineer as it is familiar/unfamiliar engineer.
  • Adding comments. Of all the various uses this one is my favorite (and possibly could justify this feature all by itself)...I want to be able to add, say, 2 line comments to code that I review. As the code reviewer reading this without bias, I'm sometimes better at writing comments than even the initial author.

An example of using this: if you were writing doorkeeper and I was the reviewer, I would have written two line comments for each of the classes so that someone who was later coming in to write their own custom feed worker would have been much faster at understanding the relationship between all of the doorkeeper classes.

Different IDEs/Editors:

Everyone still has to arc diff. I think the key here is that its not that every person has their own workflow, but that what they do varies depending on the diff. For any diff where the arc revise changes are similar to what I've described above, what IDE people use doesn't come into play.

This also is a bit of a pushback agaist the git add -p approach, as that seems more aimed at the case where the changes are heavier-weight.

Untestable review code:

This might be solved socially. I don't think either of us can really guess whether this will have the negative effect you describe - I'm not even sure I'd know until this has been tested for a few months.

Long inline comment threads:

I don't have that much more insight than you - our experiences are more similar than different (4.5 years at Facebook.) You could keep an eye on some of the major open source phabricator installs (wikimedia, ghc) to try to glean insight.

The contracting work I do is in an isolated little project, but the snippets I've seen and heard from other projects suggest that your model isn't complete. Its accurate based on my facebook experience, but things change. Sadly, I don't really have anything beyond this vague paragraph.

IMO, developing Phabricator is unique enough that I'd be leery of using it for insight on developer workflows.

Next steps:

Even if you became more enthusiastic about eventually landing this in upstream, its still worth seeing how people use it first. I'm almost definitely going to try this out for dropbox just to see what happens; can you carve out a portion of your feedback to focus on suggestions for what I should change/do before I ship it to them?

This seems like good idea at first sight, but gets dark pretty soon. I used to put code snippets in reviews, but the number of times guys simply copy pasted it and re-sent for review was staggering. I've even had intern that copied pseudocode believing it was C++ 😢

My experience says that this might be good idea to have inline comments available in code. For example: developer A has 9-5 workhours while developer B has 9-5 in different time zone, but developer B is reviewer. So. DevA creates arc diff and goes home. DevB reviews with couple inlines. DevA does arc commandX and his working copy is updated with inline comments, allowing him to start right-of without looking in phabricator. DevA makes appropriate changes, does arc diff that strips inline-comments from files andupdates diff for revision. This time DevB is A-OK with code. DevA does arc commandX and sees something like No new inlines, Your code is ready to land! or Some new inlines, but generally ready to land. arc land in that case would not strip inline comments (assuming those were TODO).

The problem with this is as in any with threading inlines. Plus this model could use arc diff to mark inline as done or disagree with inline etc 😉

That way, no code would be autopatched in, but be available within file, right where needed plus had comments etc.

lightweight diffs are probably a plurality, if not majority

This is interesting to me because it's not very common in my experience, and much of the feedback and requests we receive are motivated by heavy-sounding use cases (e.g., difficulty managing large amounts of inline feedback or huge diffs).

Offhand, the requests for better lightweight workflows that I can recall are all essentially represented in T6008, basically asking for "Edit from Web".

Would that conceivably be a better fix for you here, at least some of the time? How lightweight are these lightweight diffs?

conveniently deleting stray slogs/console.logs/etc.

Can you prevent these with lint?

(On its own, this approach also seems awkward as a solution for this, since you'd have to start your inline with an empty codeblock? But this is sovlable in theory, at least.)

Rewriting an isolated block of code for clarity.

This is the use case I mostly worry about being error prone, since it may not be tested and accountability is diffused by the workflow.

Adding comments

This use case makes sense to me and isn't at all concerning, although it doesn't resonate much with me as much as it sounds like it does with you. I also wonder if edit-from-web is maybe-better for it, in theory, at least some of the time?

(Generally, I think the reviewer should often just reject and say "Can you add some doc comments? I had a little trouble figuring out the relationship between the classes." and the engineering culture should ideally just calibrate to some appropriate bar based on this.)

what IDE people use doesn't come into play

Yeah, I was assuming the driving use case here was more like "I want to through my 300 inlines and address them", where the best solution I can come up with is IDE integration (i.e., some plugin which shows inlines in a window next to source). The git add -p flavored workflow is like a lowest-common-denominator of this which would half-work across a variety of editors.

An alternative is @johnny-bit's suggestion of dumping all the inline comments into the working copy as source code comments or some kind of annotated text in the source files, which has also been suggested in the past, but I think this approach is a huge error-prone mess and don't currently plan to pursue it.

Solving this particular problem ("How do I work thorough 300 inlines efficiently?") through IDE integrations seems best, and through a git add -p flavored workflow seems maybe-an-improvement, but my take on it is still broadly curiosity about the root cause ("Why do you have 300 inlines?").

Particularly, I suspect the reason is often "lots of substantially mechanical style and formatting feedback", and the far better answer is "lint".

But it generally sounds like this isn't your use case at all.

IMO, developing Phabricator is unique enough that I'd be leery of using it for insight on developer workflows.

Phabricator is certainly unique and I try not to assume that other projects face the same challenges we face, but I do want to make sure workflows we're supporting make sense given some reasonable set of challenges. (I also think our workflows are generally fairly good for the challenges we do face, and that they're usually not completely different from the challenges other teams face.)


can you carve out a portion of your feedback to focus on suggestions for what I should change/do before I ship it to them?

Sure. Here are my general thoughts, although I'm not sure how useful they are. They basically amount to this being legitimately difficult in a purely technical sense.

Were I to implement this feature, I think I'd make it modal in the UI: you'd press a modifier key or a button to go into "suggest" mode. In "suggest" mode, you get two boxes:

+----------------------------------------+
| Inline Suggestion                          | 
+----------------------------------------+
|   122    * @return true                |
|   123    */                            |
|       +-----------------------------+  |
|       | public function quack() {   |  |
|       |   yada();                   |  |
|       |   yadb();                   |  |
|       +-----------------------------+  |
|   127     yadc();                      |
|   128   }                              |
|                                        |
| +------------------------------------+ |
| | Swapped call order -- we have to   | |
| | call yadb() before yada() so...    | |
| |                                    | |
| +------------------------------------+ |
|                                        |
+----------------------------------------+
|                        [Cancel] [Save] |
+----------------------------------------+

The top box would show context around the selected lines and default to containing the selected text. The bottom box would let you provide a normal comment, explaining what you did. The context would help in general, but I'd particularly be trying to reduce specific errors related to eating blank lines when adding comments, making copy/paste errors, and getting indent levels wrong:

  • Blank lines: If you click a blank line and type a comment block, this will naively consume the blank line and leave all your code squished together. This UI might make it more clear that you need to start your comment block with a literal blank line if you don't want to do this. It's very possible this wouldn't be sufficient (I would expect users would often not add a leading blank line). Maybe an implicit rule where clicking a blank line always meant "insert after" would be OK, although I worry this would be confusing. Another possible rule would be to require the selection to start and end on non-blank lines, but this might be a confusing rule to communicate, understand, and use. The beginning and end of the file may also be tricky.
  • Copy/paste errors: if you're suggesting a small change, pre-filling the current code reduces the chance you get your starting place wrong by grabbing the wrong lines, missing whitespace, missing some text, etc.
  • Indent level errors: these will be very easy to make naively and are potentially severe in languages like Python. You make a reasonable effort to handle this already. Showing context may help reviewers align things properly. Your rule might be necessary anyway. If so, un-indenting is complicated to communicate. Per inline, there are other cases where this seems very hard to get right, like editing blank lines.

This UI would also make it clear how to delete text, like console.log(), etc.

This is probably mostly useless because you can't easily change the UI.

Perhaps more practically:

  • This attempts to apply the author's own inlines. That might be correct (if authors are using this feature to implement "edit from web"), but might be very surprising if they are not.
  • This attempts to apply "Done" inlines. It probably should not. If an author has marked an inline as done, presumably they do not expect it to apply because they believe they have addressed it.
  • If there is a thread of inlines with patches, I believe the conflict resolution behavior you'll get out of lint is that the first patch is applied and the others are ignored. This is probably not the behavior you want, as it makes it difficult to correct patches by replying with adjustments. You probably have little control over this while using lint to do patches, although you might be able to iterate through the inlines in reverse order to get "last patch wins" (the iteration might already occur in this order).
  • This will attempt to apply every inline from every diff. This will probably often produce reasonable behavior because of the "text must match" rule, but probably won't make sense sometimes. Applying inlines from only the latest diff probably isn't any better.

Broadly, I worry this will be hard for reviewers and authors to use in non-trivial cases.

I think reviewers will have a hard time producing accurate patches: technical difficulty with whitespace, blank lines and copy/paste, and fundamental difficulty with writing production-ready code in a web text box without being able to test it.

Authors will have a hard time keeping track of what has and hasn't been applied, anticipating and understanding the apply behavior, reviewing what has been applied, and handling this workflow across multiple changes.

I would anticipate these specific failures as things to watch out for:

  • Reviewers will frequently write patches which do not apply as intended, most often with whitespace errors.
  • Reviewers will somewhat frequently write patches which are not actually production ready (e.g., contain actual errors).
  • Authors will somewhat frequently push revised changes which contain whitespace errors because they have difficulty reviewing the patches and it's easier not to.
  • Authors will occasionally push revised changes which contain actual errors because the code looks good and is just a style rewrite (or whatever).
  • When a revision has several patches and one fails to apply, authors will frequently overlook it.
  • When a revision has patches, is updated, and has more patches, authors will frequently have difficulty understanding how to proceed and fall back to manually applying patches.
  • When a comment thread forms around a patch, authors and reviewers will have difficulty anticipating the effects of the command. Authors will somewhat frequently apply the wrong patch or fall back to handling this case manually.
  • No matter which rule you choose for author inlines (apply or ignore), authors will somewhat frequently be confused by the behavior and expect the other behavior.
differential_extension/DifferentialGetInlineCommentsConduitAPIMethod.php
33–43

You can omit the $revision_id check and use executeOne() to return a single $revision to slightly simplify this.

76

The lines in $file will not correspond to the lines of the file if the changeset contains multiple hunks, or the first hunk starts on a non-initial line.

For example, if someone copy/pastes a diff from git diff into the web UI, the diff may contain lines 10-20 and lines 200-210.

In this case, there will be 20 lines in $file and they will correspond to line numbers 10, 11, 12, ... 209, 210. When you array_slice() them later, you'll run off the end of the array or get some otherwise meaningless result.

You can ignore this (and assume all diffs come from arc diff). You can detect it (but not handle it) by testing if the changeset contains multiple hunks or the single hunk starts on a line other than 1 (then you could, e.g., throw an exception). You can handle it by iterating through the hunks. There may be some helper code somewhere (in Differential, we build a block of text where all the lines correspond and the missing lines are empty) that you can copy/paste if you can dig it up, but I'm 99% sure we don't expose at an API level in a way that's usable to you.

78–85

I think this will misbehave on "\r\n" files. Using phutil_split_lines() may improve behavior.

95–96

As a minor conceptual issue, this "type" will vary based on the viewer's language preference, so no caller should do anything but display it to the user. I think existing APIs aren't necessarily great about this, but I'd intend to cue this kind of thing better in the future in cases where it isn't obvious, e.g. call it changeTypeDisplayName or something less verbose.

100

Minor, but repliedPHID is probably more clear than repliedID when returning a PHID. Many objects have both IDs and PHIDs.

src/workflow/ArcanistReviseWorkflow.php
9

Extraneous backtick.

54–57

I don't immediately understand why you do this step? I can't think of anything that needs to be amended before we can patch.

I don't recall what arc amend does in working copies with history.immutable set, but my guess would be "exit with an error".

211

Wrong behavior for ``` inside a line, I think?

219

I think this will also do silly things if a user replaces a blank line with a comment: the indent level for the block will be 0 (no text on a blank line) so the comment will be inserted aligned with the left margin. This is probably not desirable in most cases.

Can you prevent these with lint?

Yes. But in my experience lint is not often set up well, for whatever reason (maybe linters are too hard/annoying for people to configure or write?) Items in this list are based in my experience, so whether or not projects should have done something better, they didn't. People also sometimes have test code (e.g. $some_variable = 'fixed_value';) that isn't caught by lint.

Also, this seems quite similar to me saying "won't unit tests catch it?" as a counterargument to people blindly amending code without testing it.

Edit from web?

I think something along those lines could solve many of the same problems, although each feature will have a different impact. I'll admit that I probably wouldn't have written this prototype if we had edit from web.

the reviewer should often just reject and say "Can you add some doc comments? I had a little trouble figuring out the relationship between the classes

In that situation, I think that the "right" thing to do is harder / more heavy weight than the easy thing to do (I'd probably be a little unhappy but still accept a diff), whereas a revise workflow makes it very easy to just write some solid comments yourself.

I also like the (maybe intangible) sense of collaboration...you send some code out, and your reviewer reads and understands your code well enough to write the comments that explain it to other people.

feedback

I appreciate this a lot...its very well organized and perceptive about the things to watch out for and/or alleviate via ui changes as I finish writing this and send it out for others to try.

As you said I won't be able to change the UI that much (I guess I could, but I don't really want to write the local patches), but that's ok for now...I think if you give people something simple, they naturally get more sophisticated in how they use it. e.g. technically you can't insert lines right now, but people will work around it by including an existing line in the replacement text.

I'll leave this diff up in secure.phabrictor for a little bit in case anyone else wants to comment, but my next steps are to revise my code and try it out in our local install.