
Differential Checklist Feature Request
Closed, Wontfix (Public)

Description

Often when reviewing code in a Differential, I check for a few common items every time to make sure the code is acceptable. It would be extremely helpful to have a way to define a repeatable checklist of items that could be embedded within the Differential, so each could be marked Pass/Fail. This would remind me, as a reviewer, of the basics I need to check each time, and could let my whole team standardize what everyone looks for in each code review. Automated linters can check for some things, but there are always checks that are very difficult to automate, where human eyes need to determine whether the code is OK. Having a way to include a checklist as part of the Differential itself would be very handy!

Event Timeline

Can you give us some examples of the kinds of things you're checking for? Specifically, things where rote human review is the best way to implement a check?

My general expectation is that these cases are better served by unit testing, lint rules, commit hooks, or fixing APIs so they're harder or impossible to get wrong, or that the things being checked for are so grand and high-level that a checklist isn't helpful (architecture, performance).

Here are a few off the top of my head:

  • Is the right data structure used for the job? E.g., std::map vs. std::unordered_map: both will work and pass unit tests, but they won't perform the same at scale (see the sketch after this list)
  • Are comments clear and descriptive?
  • Is the actual use case implemented per the design?
  • Did the code get updated in every place it needs to be (dependent libraries, database schema, wiki pages, etc.)?
  • Does the code follow established patterns, like throwing exceptions instead of using return codes?

And many more...
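
As a concrete illustration of the first and last items above, here is a minimal, hypothetical C++ sketch (the lookup helper and names are illustrative, not from the task itself): both container choices pass the same unit tests but diverge in lookup cost at scale, and a missing key is reported by throwing rather than by a return code a caller might silently ignore.

  #include <iostream>
  #include <stdexcept>
  #include <string>
  #include <unordered_map>

  // Both std::map and std::unordered_map satisfy this lookup's interface,
  // so unit tests pass either way. std::map::find is O(log n) (tree walk);
  // std::unordered_map::find is O(1) on average (hashing). Only a reviewer
  // thinking about scale is likely to flag the difference.
  std::string lookup(const std::unordered_map<int, std::string>& index, int key) {
      auto it = index.find(key);
      if (it == index.end()) {
          // Per the guideline above: throw instead of returning an error
          // code that the caller could ignore.
          throw std::out_of_range("no entry for key " + std::to_string(key));
      }
      return it->second;
  }

  int main() {
      std::unordered_map<int, std::string> index = {{1, "one"}, {2, "two"}};
      std::cout << lookup(index, 1) << "\n";
      try {
          lookup(index, 3);
      } catch (const std::out_of_range& e) {
          std::cout << "caught: " << e.what() << "\n";
      }
      return 0;
  }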

These things seem very general. I would expect them to be, e.g., on a wiki page like "Guidelines for Effective Code Review at <My Organization>". What advantage are you expecting to gain by building these into Differential as a checklist?

For example, two of the things you list are "Are comments clear and descriptive?" and (approximately) "Does the change do what it's supposed to do?". Surely no reviewer who is properly trained and taking their role seriously can legitimately need a reminder about these things? If a reviewer reads a confusing, unclear comment and doesn't object to it, I can't imagine that's because they didn't have a checklist of 15 things they should be looking for that includes "comments should be useful" and "code should do what it is supposed to do".

What sort of mindset do you imagine reviewers are in if they would ignore an unclear comment without a checklist, but flag it with a checklist?

Broadly, this seems like a training/leadership issue to me, not a technical/tooling issue. Do you already have this checklist in some form available to all engineers as onboarding documentation (e.g., "guidelines for effective code review", "how to handle errors in <our codebase>", "choosing between data structure X and Y", etc.)?

Adding a data point.

At my company, code reviews previously consisted of filling out several checksheets after a group-meeting-style review. The checklist items were our coding guidelines, a mix of lintable rules and higher-level ones (such as "Do comments explain Why instead of What?"). Each development team was required to hold approximately one of these reviews a month, and the signed checklist was archived into our QMS for audits.

Fortunately, we have updated our policies so that all code reviews happen in our Phabricator instance, which is overwhelmingly seen as A Good Thing. Documentation policies aside, one function of our code review process that we have yet to distinctly recover is that the old-style reviews both communicated our guidelines and exposed new developers to them. Right now this happens through more-senior developers reviewing code, so the communication of our guidelines is implicit. Every developer is trained on where our official document lives and is required to review it; however, working through an actual checklist with the team provides a more tangible learning experience and can take the bite out of "another policy document".

This is not a problem per se (and certainly not one in Phabricator's domain), more an observation about this change. I personally think the old-style code review offered some benefits over the per-commit style, largely a fun team-building experience as well as humanized exposure to our checklist. My personal approach is to force at least one old-style review upon my team as new developers join, as appropriate.

In my situation, the only need for a per-review checklist at this point would be strictly for policy reasons. Luckily I was able to work with QMS and convince them this new way has benefits miles ahead of where we were, and we updated our policies accordingly. Having a handy KB/wiki to refer to would be convenient, but I don't want to bog down reviewers with having to check off items from a hefty list for every review, especially since most of our reviews have multiple reviewers.

A couple of general notes/thoughts:

We'll probably build a "Manual QA" application at some point where the primary object is some kind of checklist. This is far away (it doesn't even have a name yet; see T9652), but might be part of an approach here. There are some processes and use cases where this kind of workflow seems valuable on its own, although "review checklists" aren't currently among the ones I'm really thinking of. But this application might be able to provide support for a review checklist workflow, offer a more useful mode for presenting a review training document, or be helpful as part of a secondary "in-depth review" process.


Earlier in Differential's life, at Facebook, some checklist-esque product features were added. Specifically, in addition to "Summary" and "Test Plan", revisions once had several additional fields (like "Database Impact", "Cache Impact", and "API Impact") and several additional checks on the detail page (e.g., reminders to test RTL if you touched any CSS). These weren't an explicit checklist and were aimed more at the author than the reviewer, but they served some of the same purpose (e.g., presenting a list of reminders of things to look out for).

I think these features were net negatives: they were always present and got in the way, making the workflow more heavy-weight, but rarely (or maybe even never) helped anyone catch issues. Because they usually were not relevant, they sort of trained users to ignore them.

This isn't to say that this necessarily holds in all environments, but at least my personal experience with adding more process to the review workflow at Facebook was a negative one.


One concern I have, which I've discussed elsewhere and which heavily shapes Differential product decisions, is that I don't want the tool to teach or encourage a rote or mechanical approach to code review.

I believe the most important feedback that can come from review is high-level feedback in the vein of "don't make this change", where the reviewer is reacting to the big picture and context. A comment like "I think this can cause a race condition" or "how can we actually deploy this without poisoning caches?" or "we might have built the wrong thing here", arising from stepping back and thinking about the change as a whole and the context that surrounds it, is far more valuable (thousands and thousands and thousands of times more valuable) than any comment about an individual block of code.

I want the tool to support and encourage this kind of thinking about review, and worry that there are some features we could build which would discourage it by framing code review as a mechanical, line-by-line process where you inspect one line at a time and are done when you've looked at every line and no lines contain errors. In my mind, this isn't code review. Every line in a change can be error-free and the change can still be a catastrophic mistake.

Some examples of these "rote, mechanical review" features are:

  • some other tools literally provide progress bars showing which files/lines you've looked at;
  • GitHub recently added a feature to filter reviews by filename, touting the ability to look at only Ruby code in a change and ignore (!!!) changes to HTML and CSS.

These sorts of features are things I'm generally resistant to pursue, and checklists seem a lot like a feature in this vein to me. Even if it's possible to write a "good" checklist that focuses on the right things, I suspect the presence of a checklist feature would encourage a lot of very mechanical, lint-like checklists to spring into existence.

Sometimes we have to give installs enough rope to shoot themselves in the foot, but if the best use cases for a feature are only marginal and the worst use cases are toxic, that makes me resistant to pursuing it.


Finally, Differential is trying to provide a solid baseline workflow which most changes and discussion can flow through, not an exhaustive workflow which is well-suited to every class of change.

I don't expect Phabricator flows to cover every case, and alternate flows like the in-person one @cspeckmim describes make perfect sense to me as supplements to Differential, particularly as onboarding or mentorship tools.

Another case where Differential may not be a great fit is high-level design/architecture review, where there's only a skeletal prototype that's really serving as a vessel to communicate a larger idea, or maybe the only code that exists hints at the outline of an API. I can recall a few cases at Facebook where just getting people in a room to talk about the technical pathway forward was more fruitful than Differential review would have been.

(A special case where Differential probably isn't the right tool is purely technical: huge changes that won't receive detailed human review, like adding an entire open source project to a repository. Creating a revision of 300,000 files is likely pointless since only the metadata about the change will be reviewed.)

Alternate flows may still use Differential to actually present the code, but layer a flow other than asynchronous, remote, individual review on top of it (by getting everyone in a room, or having the author and reviewer sit down and go over the code together in detail, or whatever else).

epriestley claimed this task.

I'm going to close this since I don't think there's anything we plan to act on here.

I do think we are likely to build some kind of checklist-esque application (see T9652) eventually, but do not have a timeline for this and suspect it won't happen in 2016. Such an application would integrate with other applications, but may or may not solve any of the problems discussed here.

Funnily enough, there were discussions among the engineering leads on Friday about this type of functionality in Phabricator being useful. It wasn't a main point of discussion, but several leads did look to Phabricator to solve some of these concerns. The primary concern was having accountability for specific quality points which may not necessarily be clear from the revision alone. Without Phabricator, we approach this pretty much as I mentioned above (training materials and a KB/wiki checklist, plus some pre-commit unit tests to help with items that can be checked automatically); however, we work in a well-regulated industry where checklists and documents as evidence are often highly valued. I think for the time being, if we decide to move forward with this in Phab, we may include a "template comment" to be filled out and used when accepting revisions.
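
Purely as an illustration, a hypothetical "template comment" along these lines might look something like the sketch below (the items and wording are invented for this example, not an actual artifact from this task or our QMS):

  Quality checklist (fill out and paste into the accepting comment):
  [x] Implements the approved design/use case
  [x] Comments explain why, not what
  [x] Dependent libraries, database schema, and wiki pages updated
  [x] Errors are handled per our exception guidelines
  Reviewed against guidelines document revision: <version>

Because it is just text, a template like this can be archived with the revision as audit evidence without adding any tooling to Phabricator itself.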