Page MenuHomePhabricator

Phabricator audit queue: show which files are triggering the audit
Closed, ResolvedPublic

Description

From an engineer who has been auditing a package:

Usually audits are triggered by revisions that touch many files, only a few of which are in the package that triggered the audit. The auditor presumably wants to give those files special attention, but there is no good way to find them.

For many revisions, this makes auditing extraordinarily difficult. Can you give me any kind of estimate for when this might be done?

Revisions and Commits

Event Timeline

So, specifically, what's happening here is:

  • The user, say "Joe", is an auditor for some package, let's say "Chat".
  • Since Joe is an auditor for the Chat package, he isn't some random engineer; he must be somewhat familiar with its code -- probably an experienced Chat engineer who has spent a lot of time in the Chat codebase. This will be important later!
  • Joe visits Phabricator's home page and sees that he has some pending audits for Chat.
  • Joe clicks on one, and goes to the Diffusion commit interface for that commit.
  • Joe sees a yellow "Audit Required" row for the Chat package.
  • Joe scrolls down to the list of files.
  • Joe is like "OMG WHICH ONES OF THESE ARE CHAT?!?!?! I HAVE NO IDEA WHAT IS HAPPENING SO CONFUSED~~ PLEASE FIX IMMEDIATELY EXTRAORDINARILY UNUSABLE!"

While we could probably improve this, I'm not quite sure I understand the motivation -- surely an engineer who is auditing code is familiar enough with it to know which files they should be looking at?

Say there are 200 files in the revision, 40 directories in the Chat package, and about 5 of the 200 files in the revision are in the chat package. It's kind of a drag to examine each of the 200 paths. I have an audit in my queue right now with 2207 files changed.

It would be much easier with one of the following features or something similar:
—The files in a selected package were highlighed.
—The diffs for each file included somewhere the list of packages that the file was in. Then auditors could search for the name of their package(s).
—A button or something that would collapse/hide all diffs not in the selected package.

As an example, consider: https://secure.phabricator.com/rPd0af617818b6f56bcf90a99e34b0be925bbaad02 — imagine that was in your audit queue because you were the owner of a huge package containing 5 of the files in that diff. The 5 files don't show up together in the commit detail, they are each from different directories in your package. How can you do this without considering each of the 182 files in the diff?

tomovo triaged this task as High priority.May 29 2012, 11:20 AM
tomovo raised the priority of this task from High to Needs Triage.

Jason and I were discussing this, and he suggested another solution that seems like it might be less intrusive—if commit detail links took a query parameter like https://secure.phabricator.com/rPd0af617818b6f56bcf90a99e34b0be925bbaad02?package_id=12, causing only diffs from the package specified to be shown, that would also make things a lot easier.

I think this idea is perfectly reasonable, I just wanted to get a sense of where the disconnect between the language of the report ("many revisions .. extraordinarily difficult") and my subjective experience (that this is a rare issue, and inconvenient but not a huge problem) was for prioritization. It sounds like some of it is in poetic license, and some of it is in the elevated complexity of everything at Facebook (for instance, it's absurd to imagine a package with 40 directories in Phabricator, because the whole codebase is only 200,000 lines -- and for rPd0af617818b6f56bcf90a99e34b0be925bbaad02, I could easily scan to "src/applications/chat/" and identify changed files because there aren't really other places they can be).

For the actual implementation, here are a couple of other suggestions:

  • We know which package audits the viewer is responsible for, so show a bullet or marker next to each file in at least one of those packages. Maybe add a filter to show only those diffs, or highlight them in the diff list view, or add a key command to jump to the next file.
  • Maybe have a "Flat / Group by Package" option in the "Changes" table. I think this would be helpful in the case that you're auditing several packages. However, this would give you a nice summary view (you can scroll through the files and easily see what's affected) but once you jump down to diffs you'd lose some of that context. Maybe this would still be good in combination with open-in-new-window or something.

Adding some flavor of package filtering to the UI is probably easiest, but adds some per-package overhead if you're responsible for 2..N packages, since you have to change the filter every time. These solutions are less direct but add less overhead for each package.

These ideas sound great. For solutions involving marking each file, it would be nice if the mark was searchable by the browser. When there are thousands of files in a diff, scrolling around looking for a color or image marker won't be much fun.

There may be some poetic license, pardon, but most of the diffs I have to audit have several hundred files. I can only audit them by making a list of words to search for; fortunately I recently realized that most of the directories in my package contain one of a very short list of words. If that weren't the case, this would be infeasible. This feature really does not scale generally for large "packages" in large codebases. The (relatively few) diffs in my queue with thousands of files are quite difficult for me to audit, despite the ability to search for words. Part of this is simply that my browser goes so slow when show_all=true, so searching through hundreds of matches for my words ends up taking a really long time.

vrana changed file(s), attached 0: ; detached 0: .Jul 16 2012, 10:08 PM
chad changed the visibility from "All Users" to "Public (No Login Required)".Jul 3 2015, 3:55 AM
eadler added a project: Restricted Project.Jan 9 2016, 12:52 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
epriestley claimed this task.

I think this was effectively resolved by T8004 / D13923.