
Provide a workflow for "auditing a codebase" via Nuance or some new tool
Open, Wishlist, Public

Assigned To
None
Authored By
epriestley
Jan 27 2014, 5:27 PM
Referenced Files
None
Tokens
"Like" token, awarded by siepkes. "Like" token, awarded by ralph.van.etten. "Like" token, awarded by cspeckmim.

Description

An occasional request is roughly:

I have an existing codebase, and want to audit the whole thing. How can I do this / create an audit for every commit?

I'm not sure this is ever an especially valuable thing to do, but we don't have any answer right now beyond "you probably don't actually want to do that". One issue is that auditing every commit is almost certainly not the best approach, even if auditing is desirable: a file-oriented approach is much better than a commit-oriented one, because when auditing commit-by-commit you have no way to tell whether a bug you spot was fixed later.

Facebook had a tool for doing manual file operations against the codebase ("Mochi", I think?), where it would make a small task for every file and had some tools to track progress, assign files, mark them completed, etc. This was used to translate the codebase, and I think a few other times for other tasks. My sense was that it was not hugely popular/effective, but did seem like a reasonable solution to the i18n issue, at least.

Building a similar tool might make sense, and it could be used to express "audit a codebase" (this audit wouldn't be very good, but it would make most of the people who want this happy). It might also make sense to try to plug most of this workflow into Nuance, since the actual mechanical work (human processing of a big queue of stuff, locking items, etc.) is a good fit.
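A rough sketch of the per-file queue described above, in Python. Every name here (`FileTask`, `FileAuditQueue`, `claim_next`, and so on) is hypothetical: this is not how Mochi or Nuance works, just a minimal in-memory illustration of "one task per file, claim it, mark it completed, track progress".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class Status(Enum):
    OPEN = "open"
    CLAIMED = "claimed"
    DONE = "done"


@dataclass
class FileTask:
    """One unit of manual work: a single file in the codebase (hypothetical)."""
    path: str
    status: Status = Status.OPEN
    owner: Optional[str] = None


class FileAuditQueue:
    """Hypothetical in-memory queue holding one task per file."""

    def __init__(self, paths: Iterable[str]) -> None:
        # Deduplicate and sort so tasks are handed out in a stable order.
        self.tasks: Dict[str, FileTask] = {p: FileTask(p) for p in sorted(set(paths))}

    def claim_next(self, user: str) -> Optional[FileTask]:
        """Lock the first open file for `user`, or return None if none are open."""
        for task in self.tasks.values():
            if task.status is Status.OPEN:
                task.status = Status.CLAIMED
                task.owner = user
                return task
        return None

    def complete(self, path: str, user: str) -> None:
        """Mark a file as done; only the user holding the claim may do this."""
        task = self.tasks[path]
        if task.status is not Status.CLAIMED or task.owner != user:
            raise ValueError(f"{path} is not claimed by {user}")
        task.status = Status.DONE

    def progress(self) -> float:
        """Fraction of files marked done, between 0.0 and 1.0."""
        if not self.tasks:
            return 1.0
        done = sum(1 for t in self.tasks.values() if t.status is Status.DONE)
        return done / len(self.tasks)
```

A real version would need persistent storage and claim expiry so that abandoned files return to the queue, which is roughly the mechanical work a queue-processing tool would be expected to handle.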

This request is rare and I question how useful it ever really is. It seems like the kind of thing that sounds good on paper but is probably never actually effective. The number of hours required to seriously audit a codebase in a comprehensive way is just enormous, and I think looking at a codebase commit-by-commit or file-by-file isn't an effective way to audit it comprehensively. It's fine for stuff like i18n, where the task was "find all untranslated strings and mark them for translation", but the most important things to catch in an audit are large ideas which span commits and files (e.g., "the security model makes unsafe assumptions in its core, which are violated at the edges"). I would guess very few engineers can identify broad problems like that by examining a codebase file-by-file.

Event Timeline

epriestley raised the priority of this task from to Wishlist.
epriestley updated the task description.
epriestley added a project: Diffusion.
epriestley added subscribers: epriestley, btrahan, asherkin.

T5722 mentions a use case for this related to regulatory compliance (roughly, periodic audit of segments of a codebase). This is similar to the Mochi/i18n use cases.

T5744 mentions another narrower and more reasonable use case (getting specific badness that matches some pattern fixed), which is also similar to the Mochi/i18n stuff.
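For the pattern-matching case, the list of files to work through could be seeded by a simple scan. This is only a sketch under assumptions: `find_matches` is a hypothetical helper, the file contents are passed in as a dict rather than read from a repository, and the `mysql_query` pattern in the usage note is an invented example of "badness", not something taken from T5744.

```python
import re
from typing import Dict, List, Tuple


def find_matches(files: Dict[str, str], pattern: str) -> Dict[str, List[Tuple[int, str]]]:
    """Map each path to its matching (line number, stripped line) pairs.

    `files` maps a path to that file's full text. Files with no matching
    lines are left out, so every key in the result is a candidate task.
    """
    regex = re.compile(pattern)
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for path, text in files.items():
        matching = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if regex.search(line)
        ]
        if matching:
            hits[path] = matching
    return hits
```

For example, `find_matches(files, r"\bmysql_query\(")` would return only the files containing a line that calls `mysql_query(`, along with the line numbers a reviewer should look at.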