Page MenuHomePhabricator

Make it easier to know why herald adds reviewers
Closed, WontfixPublic

Description

At Dropbox, we have so many herald rules, that figuring out why herald did something is a huge pain - our herald transcripts are extremely long (on the order of 10 screen lengths) and mostly consist of rules that were not matched. And not all actions herald takes are equally important (it's fairly normal for herald to cc a mailing list based on which repository is being edited, which is not interesting); on the other hand, we also have herald rules that add blocking reviewers - and it's way more important to know why a blocking review was added than most things that herald had done. This is such an annoyance for some of my coworkers, that one of them wrote a chrome extension just to parse our herald transcript for the reason that a security review was added: https://chrome.google.com/webstore/detail/heraldsplainer/djkffgblckgfgjknbgcjnbffoocnhbcj

The current process for figuring out why herald did what it did looks like this:

  1. for a diff with lots of back and forth, expand all the old comments to find the link to the herald transcript
  2. on the herald transcript... it tells you what actions it took, and what rule numbers (e.g. H101), but not what those rules are. Scroll down and hope to find the PASS rules in the sea of FAILED. Or better yet, ctrl+f for "passed".
  3. Hope the rules are well-named so you can tell which action corresponded to which rule, and why it matched.

It would be helpful if matched herald rules could be surfaced more prominently for why blocking reviews are added - ideally like the chrome extension does, just below the test plan of the diff (or something functionally similar, that isn't prone to be buried in the comment stream). This way it's easy to see, at a glance, why your review (or some group of reviewers) was added to the diff.

Such an improvement could probably be generalized to any reviewers added by herald, with no loss of usefulness. Having all herald matches surfaced like this, though, would be noisy, because some herald rules match a large portion of diffs but aren't very important to the author or any reviewers / subscribers (like emailing code review mailing lists).

Event Timeline

dgoldstein raised the priority of this task from to Needs Triage.
dgoldstein updated the task description. (Show Details)
dgoldstein added a project: Herald.
dgoldstein added a subscriber: dgoldstein.

This is the most awesome feature request I've ever read. Well stated, sir.

Do you think the majority of these blocking reviewer rules are good, or is figuring out why blocking reviewers were added more-than-occasionally the first step in removing them? If these rules added normal reviewers -- instead of blocking reviewers -- would this be a problem?

My product expectation is that "blocking reviewers" will be used very rarely, so it's surprising to hear that use is so common that it's hard to figure out where blocking reviewers came from. Offhand, I would expect <5% of revisions in a "normal" environment to trigger blocking reviewers.

One product change which might be appropriate is to restrict who can create "blocking reviewer" rules if they're prone to abuse in the wild, although ideally this should be self-regulated / cultural. But, philosophically, the goal of all the tooling on top of this pipeline is to put the onus on reviewers to review promptly if they want to have a say in changes, not let them add rules to block everything.

So, to step back, do you think the use of blocking reviewers is currently appropriate/desirable in nearly all cases and the best solution is definitely to improve the tooling, or is the use of blocking reviewers sometimes questionable and improving the tooling is just a mitigating step?

Or, put another way: what actions do you take based on this information? That is, once you figure out why blocking reviewers were added, what do you do next?


I also get this error when following the Chrome extension link:

Screen_Shot_2015-03-18_at_3.27.43_AM.png (1×1 px, 97 KB)

In particular, at Facebook:

  • We didn't have "Blocking Reviewers".
  • I built, then removed "Add Reviewer" because I felt it was being used way too often. Revisions would often end up with 4-5 random reviewers and the sense of who was responsible for review would be lost. I think this was a beneficial change on the balance.

So, at least when I left in 2011, you could only add yourself as a subscriber to things you were interested in, not a reviewer. This worked well, even at Facebook's scale, and I think we were better off for the removal of the "Add Reviewer" feature.

We later restored "Add Reviewer" because users frequently requested it or found it confusing that it wasn't available, and didn't find my old war story about questionable use very compelling, and I got tired of telling it. I think this feature might make the product worse on the balance, but giving installs enough rope to shoot themselves in the foot makes the product much easier to support because no one finds "one time I saw it get misused so I'm pretty sure no one can handle it" to be a very convincing argument.

costanzapopcorn

Definitely lots that can be done to improve the ui here, but I'm most excited to learn what you do once you figure out why a given set of reviewers was added. (...I hope there's drama sometimes!)

So I guess I should explain more about our use case.

As far as I am aware, we only have 2 groups of blocking reviewers (we used to have a 3rd for schema changes, but that's been replaced by an automated schema validator). They are Security and Availability. The latter is pretty straightforward - generally triggers on nginx config changes, which aren't terribly common. Security, however, has a ton of herald rules to catch changes to security-sensitive files and review uses of otherwise dangerous functions. I believe the fact that we have a security review process is part of our SOC2 compliance, so it isn't going anywhere soon They are making efforts to limit what requires security reviews (e.g. by providing always safe abstractions in place of unsafe functions whose use needs to be reviewed, and better lint rules to warn about e.g. xss vectors), but fundamentally security reviews aren't going to stop being a thing anytime soon.

The heraldsplainer chrome extension was the Security team's hacky way of making security review easier on themselves. A lot of things trigger security review that are just uses of security abstractions - like using a certain unsafe function for redirects, or fetching our csrf token. Sometimes security reviews are triggered for spurious reasons, like mentioning a security-sensitive function in a comment (which arguably shouldn't trigger security review, but herald isn't that smart), or rearranging existing imports. In most cases, it's not really reasonable for the security reviewer to read a whole 500 line diff if there's only 1 line that herald thinks is important to them, so it's helpful if they can very quickly identify why herald flagged it and just focus on that. Since a lot of things trigger security review, the security reviewers need to be able to go through outstanding reviews quickly to ensure timely responses for everyone.

Heraldsplainer isn't really the best solution, though, as not everyone is going to install it, and security engineers aren't the only ones who want to know why security is a blocking reviewer on a review. Having this info natively in phabricator would be preferred.

It sounds like lint is a much better fit for this problem than Herald (for example, it won't raise warnings in comments, and it points out exactly where issues are, engineers get a chance to see issues before they get sent for review, etc) -- are there reasons it isn't that I'm missing, or reasons that moving these review rules to lint rules has been slow or impossible?

there are plenty of cases where we do want a security review instead of
just lint warnings - it's definitely not as simple as converting everything
to lint rules. And part of the problem is that lint isn't expressive
enough - and in some cases can't possibly be.

Here's one such example: we have an unsafe_redirect function (which
triggers security review) which can redirect to almost anywhere (the
preferred option is to use redirect, which checks that the input url has
a domain on our safe list). The general advice is that unsafe_redirect
should be given a constant url, or at least a url with a constant domain.
But not all use cases can use constant domains, and security definitely
wants to review these before they are committed.

Suppose it worked like this:

  • Lint flagged the line with a warning, like "(SECURITY) This unsafe_redirect call may be unsafe, see [some documentation] to learn more about redirects.".
  • Herald had a rule like "When any lint message text contains (SECURITY), add blocking reviewers: #security".

Then you'd still be a blocking reviewer, but all the security stuff would be in the "Lint warnings" at the top of the page, and you could click to jump to the exact lines where the issues occur. The lines would also be annotated inline by the lint messages, too, so you could quickly scroll through the diff and see where the problems are.

Setting aside the complexity of actually moving the rules to lint, would that be a better product interaction? Just want to make sure I understand the problem.

that would be nice (maybe even ideal) if lint was actually a reliable thing
for us.

Context: I've seen various issues in the past where linters aren't
installed properly and end up silently not running. And since we don't run
them serverside but only on dev vms, and can't guarantee vms all have the
latest linters... I'm not sure this is really a workable solution. Though
I should probably ask our dev vm guys what they think about it.

Overall, we can definitely improve the transcript view, but it will never be an especially good solution to this problem:

  • Herald can not realistically do real parsing, so it will always misfire for comments and can not implement complex rules (for example, in this project we have some security rules which require specific parameters to be statically evaluable scalar strings, but you can't really do this with a regular expression).
  • Knowing which rules fired is helpful, but it won't pinpoint where in the diff the problems are.

We'd rather tackle the lint problems directly, e.g.:

  • Provide better ways to make sure the right linters run.
  • Provide any infrastructure required to have linters make sure they're up to date.
  • Server-side lint will eventually be available through Harbormaster (T1049), although it is inherently more limited than client-side lint because it can not emit patches to automatically fix code. But, for the security use case, this would ensure that lint rules run.

Some changes to Herald which would make your use case better (like being very verbose about which rules matched, by default, in a highly visible way on the revision itself) would make more common use cases worse (by adding a lot of information which is usually low-signal in a prominent place), but improving linters makes things better for everyone.

One hacky workaround might be to create a project for each type of security issue and have the rules both "Add Blocking Reviewers" and "Add Project". This would let you examine the projects to see tags like "Security: unsafe_redirect", which would give you approximately as much information as knowing which rule matched. There's currently no Herald action for this, but it is approximately described in T6973 and not difficult to implement. The roughness of this solution would also only impact your install, rather than creating a UI tradeoff.

We're meeting at Dropbox next week -- I'm not sure if you're involved, but we can try to follow up on this. We expect to have more ability to prioritize requests after that.

hey! So I am in the security team at Dropbox. Would you have time to chat with some of us when you are at Dropbox next week? A lot of this might be easiest to figure out IRL.

Michelle Sordal owns the agenda / schedule. Any dropbox folks should shoot her a note to get included on the action...! :D

+1000 to improving serverside lint support. I have no problem with "can't
auto-change code" as the few uses of that we have have drawn some
complaints.

As far as meeting with people here, Dev is the right person for you to talk
to - I was raising this as a feature request mostly as a "we have a chrome
extension for what?" complaint. And also because Herald logs are fairly
hard to read.

angie added a project: Restricted Project.May 20 2015, 5:22 PM

I really enjoyed reading this comment thread! :)

I think reliance on lint for Security enforcement is probably blocked on T5064. Otherwise, someone could accidentally add an unsafe method after the code is accepted, but before they arc land, and lint would never catch it.

epriestley claimed this task.

I think T8405 and this are both basically discussing the same use case, which I understand to be:

  1. a very large number of lint-like rules implemented as Herald rules with regular expressions that match changes to source code;
  2. difficulty understanding and managing these rules because Herald doesn't really support this use case;
  3. a desire to treat these lint failures somewhat differently than normal lint failures in some cases (e.g., interact with revision acceptance state); and
  4. this situation came into being largely because of an environment where client-side lint is unreliable for various reasons, and impractical to make reliable in any reasonable timeframe (see also T9937).

Per discussion earlier/elsewhere, this isn't a use case we want to support in the upstream. Some tools you might be able to use instead:

Server-Side Lint: You can run server-side lint through Harbormaster, and report lint messages through harbormaster.sendmessage, since D13380 (June 2015), and arc itself has used this mechanism to do lint reporting since prior to that. In the future, arc lint will gain support for the --target flag (like arc unit currently has) to simplify this in some cases, although doing the call manually is straightforward (see the harbormaster.sendmessage API page for details and examples). Whatever is doing the reporting could make additional concurrent calls to differential.createcomment to submit actions (e.g., request changes). After T9132 comes to Differential (near-ish future?) a new differential.edit endpoint will provide additional power. I believe you don't currently use Harbormaster or staging areas in your CI pipeline, but T9529 discusses some glue which does work in your environment. These reporting mechanisms will work regardless of whether Harbormaster is truly driving the pipeline or not.

Herald Actions: Since T8957 (circa August 2015) you can write a custom Herald action as an extension, which runs in the "Differential Diff" phase (or "Revision Update" phase, or both) and does whatever you want (e.g., shells out to external programs). In the "Diff" phase, you can completely block writes and reject diffs with a custom message. In the "Revision" phase, you could apply state changes, add reviewers, leave comments, add inlines, report lint messages, etc. The major downside to this approach is that it will make creating diffs or updating revisions slower by however costly your ruleset is.

Custom Fields: You can write a custom field which examines revision contents and emits arbitrary markup onto the page. This is less flexible than alternatives and will (by default) run every time the revision is viewed, likely at a sharp performance penalty. While you could reduce this cost with caching, custom fields might be more useful in conjunction with Herald actions which set some flags (e.g., "this revision triggered badness X"), which could cause the fields to render additional large flashing animated GIFs and siren alert sounds or whatever.

Herald transcripts are also much more detailed now (after T8957) but I suspect the changes aren't very much help in this use case.

Minimally, you could reimplement all your regular expressions as a single custom HeraldField, and have that field emit a rich message into the transcript. This would be identical to the "lots of Herald rules" flow, except:

  • you'd have one rule;
  • the "rule body" would be in PHP (or some other script that PHP would call out to, at some overhead cost), although if you really want to keep writing regular expressions via the web UI you could have the rule do something like "Load the contents of paste P1234, then execute each line as though it was a regular expression in a Herald rule" and edit that "magic paste" to make updates;
  • users clicking through to the transcript would get an arbitrarily detailed message emitted by the rule.

sweet. I am sure we can work with those. I think Herald Actions in itself should work pretty easily in our existing workflow. I am very excited about server-side lint, but a bit more documentation about them would help me out. I can't understand it fully with your comments :)

Briefly, T9529 discusses some server-side system running on revisions that's live in your environment, presumably. Its current behavior is something like this:

  • Be told about a new revision (by "Changes", I think)?
  • Run some-coverage-program on the code.
  • Upload the coverage results with harbormaster.sendmessage (in unit messages).

Just add two more steps at the end (or copy/paste the whole thing or whatever, I have no idea what the system actually looks like):

  • After uploading coverage, run some-lint-program on the code.
  • Upload the lint results with harbormaster.sendmessage (in lint messages).

The harbormaster.sendmessage API page has detailed examples of how to format lint messages for the API call. arc itself also uses this API call in arc lint.

angie moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Dec 11 2015, 8:26 PM