This is a general meta-task which discusses building data/reporting systems. Example requests include:
- Build more reporting for (Some Application).
- Build a smarter algorithmic sort for (Some List of Stuff).
- Build (some visualization) for showing (something, often the codebase).
- Have users annotate (stuff) so we can get (more data).
Broadly, we take a cautious approach in implementing data systems, especially when the proposal requires users to do work which only serves the data.
The overarching concern with data systems is that it's very easy to build huge, complex, expensive systems for analyzing and reporting data without actually getting much value out of them: essentially, data for its own sake. With some proposals, this data comes at the additional cost of requiring users to do more work in order to gather it. Data needs to drive action, not just look pretty. If a proposed report won't actually drive action, it's very difficult to prioritize.
How to Propose a Data/Reporting Feature in a Compelling Way
The best way to motivate a report is to make a strong argument that it has concrete value. Valuable reports should be able to answer this question:
- With this data, what decisions will you make differently than without it?
- (Or, with this data, what decisions will you be able to make much more quickly than without it?)
When proposing a report, you can motivate it by answering these questions as concretely as possible. Two good ways to answer these questions are:
- point to a similar, successful system which produces analogous data and describe the value of its reporting in concrete, specific terms; or
- manually generate an example report and point out the value it provides.
In particular, these are not compelling arguments for motivating prioritization of data/reporting features:
- The data "seems" interesting. / It looks cool.
- I believe it might be useful, but can't support that claim.
For example, an occasional proposal is to require users to tag each change as a "Feature" or "Bug", so we can do various kinds of analysis (which systems have the most bugs, which employees create the most bugs, etc). This sounds like it might be a good idea, but its value is not clear or obvious. Will the data really be actionable? A good way to motivate and support the value of a feature like this is to generate a manual report:
- Take a selection of 100-200 changes.
- Categorize them manually in Excel or a similar system.
- How long does this take? Is the categorization unambiguous? This can help you estimate the cost and accuracy of data collection. For example, many changes may not clearly be either a "Feature" or a "Bug", but both or neither or something else entirely. If this is too much work, or difficult to do accurately, is it reasonable to expect users to do it?
- Use some hacky JOINs or whatever to build some reports that you think might be useful.
- Are they actually useful? Do they give you new, actionable insight that causes you to make different decisions? Does the value of these insights clearly overwhelm the cost of generating them?
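As a sketch of what the analysis step of such a manual pilot might look like: assuming you have exported your hand-categorized changes from the spreadsheet (the field names, sample rows, and "Ambiguous" category below are hypothetical, invented for illustration), a few lines of Python can produce the candidate reports and the ambiguity estimate:

```python
from collections import Counter

# Hypothetical sample of manually categorized changes; in a real pilot
# this would be the 100-200 rows exported from your spreadsheet.
changes = [
    {"id": "D101", "system": "auth",    "category": "Bug"},
    {"id": "D102", "system": "auth",    "category": "Feature"},
    {"id": "D103", "system": "auth",    "category": "Bug"},
    {"id": "D104", "system": "billing", "category": "Bug"},
    {"id": "D105", "system": "billing", "category": "Ambiguous"},
    {"id": "D106", "system": "search",  "category": "Feature"},
]

# How often was categorization unclear? A high rate suggests users
# won't be able to tag changes accurately either.
ambiguous = sum(1 for c in changes if c["category"] == "Ambiguous")
ambiguity_rate = ambiguous / len(changes)

# The "hacky JOIN" report: bug counts per system.
bugs_per_system = Counter(
    c["system"] for c in changes if c["category"] == "Bug"
)

print(f"Ambiguity rate: {ambiguity_rate:.0%}")
for system, count in bugs_per_system.most_common():
    print(f"{system}: {count} bug(s)")
```

If a mockup like this, run against real data, doesn't make you decide anything differently, that's a strong signal the automated version won't either.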
An example of this kind of analysis in Phabricator itself is T3611, which proposes sorting tasks by interest or activity. This sounds like it might be a good idea, but we can be much more confident that it's actually a good idea by spending a few minutes mocking it up against real data. In T3611, as of the time of this writing, we generated several "activity" lists based on various algorithms, but none of them seemed very promising or useful.
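A mockup of that kind can be very cheap. As an illustration only (the task titles, the hand-collected counts, and the scoring weights below are all invented, not taken from T3611), one candidate "activity" algorithm might be prototyped like this:

```python
# Hypothetical hand-collected stats for a handful of real tasks.
tasks = [
    {"title": "Fix login timeout", "comments_30d": 12, "subscribers": 4},
    {"title": "Add dark mode",     "comments_30d": 3,  "subscribers": 19},
    {"title": "Refactor parser",   "comments_30d": 0,  "subscribers": 2},
]

def activity_score(task):
    # Weight recent discussion more heavily than passive interest.
    # The weights are guesses; trying several variants against real
    # data is the whole point of the mockup.
    return 2 * task["comments_30d"] + task["subscribers"]

ranked = sorted(tasks, key=activity_score, reverse=True)
for t in ranked:
    print(f"{activity_score(t):3d}  {t['title']}")
```

Eyeballing a few such rankings against real tasks takes minutes and answers the key question directly: does any variant surface tasks you'd actually want to look at first?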
An example of highly actionable data which is unambiguously valuable is the profiler integration in Phabricator's DarkConsole. Having access to profiling data makes many otherwise-difficult optimizations easy, and essentially every commit in Phabricator's history which starts with "improve performance of..." is possible only because we have a profiler.
The Way Forward
We will continue to build on and improve Phabricator's reporting capabilities (e.g., see T1562), but this is generally a lower-priority area with less focus than building new features and applications. Almost all data/reporting requests we receive are entirely speculative ("it would be cool if..."), with no concrete use cases behind them. Other products in this space (like GitHub) often offer only limited reporting, and much of it rarely if ever drives action. A number of codebase visualization tools were built at Facebook, but few provided any substantive, lasting value. Generally, this area lacks strong motivating forces to make it clear that it's more important than other things we could be working on.