
Building reporting and data systems
Closed, Resolved · Public

Description

This is a general meta-task which discusses building data/reporting systems. Example requests include:

  • Build more reporting for (Some Application).
  • Build a smarter algorithmic sort for (Some List of Stuff).
  • Build (some visualization) for showing (something, often the codebase).
  • Have users annotate (stuff) so we can get (more data).

Broadly, we take a cautious approach in implementing data systems, especially when the proposal requires users to do work which only serves the data.

The overarching concern with data systems is that it's very easy to build huge, complex, expensive systems for analyzing and reporting data without actually getting much value out of them: essentially, data for its own sake. With some proposals, this data comes at the additional cost of requiring users to do more work in order to gather it. Data needs to drive action, not just look pretty. If a proposed report won't actually drive action, it's very difficult to prioritize.

How to Propose a Data/Reporting Feature in a Compelling Way

The best way to motivate a report is to make a strong argument that it has concrete value. Valuable reports should be able to answer this question:

  • With this data, what decisions will you make differently than without it?
  • (Or, with this data, what decisions will you be able to make much more quickly than without it?)

When proposing a report, you can motivate it by answering these questions as concretely as possible. Two good ways to answer these questions are:

  1. point to a similar, successful system which produces analogous data and describe the value of its reporting in concrete, specific terms; or
  2. manually generate an example report and point out the value it provides.

In particular, these are not compelling arguments for motivating prioritization of data/reporting features:

  • The data "seems" interesting. / It looks cool.
  • I believe it might be useful, but can't support that claim.

Examples

For example, an occasional proposal is to require users to tag each change as a "Feature" or "Bug", so we can do various kinds of analysis (which systems have the most bugs, which employees create the most bugs, etc). This sounds like it might be a good idea, but it's also not clearly or obviously valuable. Will the data really be actionable? A good way to motivate and support the value of a feature like this is to generate a manual report:

  1. Take a selection of 100-200 changes.
  2. Categorize them manually in Excel or a similar system.
    • How long does this take? Is the categorization unambiguous? This can help you estimate the cost and accuracy of data collection. For example, many changes may not clearly be either a "Feature" or a "Bug", but both or neither or something else entirely. If this is too much work, or difficult to do accurately, is it reasonable to expect users to do it?
  3. Use some hacky JOINs or whatever to build some reports that you think might be useful.
  4. Are they actually useful? Do they give you new, actionable insight that causes you to make different decisions? Does the value of these insights clearly overwhelm the cost of generating them?
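The "hacky JOIN" in step 3 can be sketched in a few lines. This is a minimal illustration only: the schema, the `billing`/`search` system names, and the sample data are all hypothetical stand-ins for whatever your change-tracking database and manual Excel pass actually produce.

```python
import csv
import io
import sqlite3

# Hypothetical schema: a table of changes, plus the hand-made
# categorization ("Feature" / "Bug") from the manual Excel pass in step 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE changes (id INTEGER PRIMARY KEY, system TEXT)")
conn.execute("CREATE TABLE categories (change_id INTEGER, category TEXT)")

conn.executemany(
    "INSERT INTO changes VALUES (?, ?)",
    [(1, "billing"), (2, "billing"), (3, "search"), (4, "search")])

# Pretend this CSV came out of the manual categorization step.
manual_csv = "change_id,category\n1,Bug\n2,Bug\n3,Bug\n4,Feature\n"
rows = [(int(r["change_id"]), r["category"])
        for r in csv.DictReader(io.StringIO(manual_csv))]
conn.executemany("INSERT INTO categories VALUES (?, ?)", rows)

# The hacky JOIN: bug counts per system.
report = conn.execute("""
    SELECT c.system, COUNT(*) AS bugs
    FROM changes c JOIN categories k ON k.change_id = c.id
    WHERE k.category = 'Bug'
    GROUP BY c.system ORDER BY bugs DESC
""").fetchall()
print(report)  # [('billing', 2), ('search', 1)]
```

If even this throwaway version of the report doesn't change any decision you'd make, that's strong evidence the automated version isn't worth building.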

An example of this kind of analysis in Phabricator itself is T3611, which proposes sorting tasks by interest or activity. This sounds like it might be a good idea, but we can be much more confident that it's actually a good idea by spending a few minutes mocking it up against real data. In T3611, as of the time of this writing, we generated several "activity" lists based on various algorithms, but none of them seemed very promising or useful.

An example of highly actionable data which is unambiguously valuable is the profiler integration in Phabricator's DarkConsole. Having access to profiling data makes many otherwise-difficult optimizations easy, and essentially every commit in Phabricator's history which starts with "improve performance of..." is possible only because we have a profiler.

The Way Forward

We will continue to build on and improve Phabricator's reporting capabilities (e.g., see T1562), but this is generally a lower priority area with less focus than building new features and applications. Almost all data/reporting requests we receive are entirely speculative ("it would be cool if...") and have no concrete use cases behind them. Other products in this space (like GitHub) often offer only limited levels of reporting, and much of it rarely if ever drives action. A number of codebase visualization tools were built at Facebook, but few provided any substantive, lasting value. Generally, this area lacks strong motivating forces to make it clear that it's more important than other things we could be working on.

Related Objects

Event Timeline

qgil moved this task from Feedback welcome to Important on the Wikimedia board.May 10 2014, 6:29 AM

It would be excellent to see queue-based tools like cumulative flow diagrams and Little's Law modelers.

A CFD of tasks would provide a solid visualization of both demand and capacity over time, as well as the queue size over time. Right now the burnup report only shows queue size over time, but it's impossible to tell if e.g. an increase in queue size is the result of an increase in the rate of arrival of new tasks or a decrease in the rate of tasks being handled. If you're managing a project, that's a critical distinction to make.
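The distinction can be sketched with a few lines of Python. The weekly opened/closed counts below are hypothetical; a real CFD would derive them from task open/close timestamps.

```python
from itertools import accumulate

# Hypothetical weekly counts of tasks opened (demand) and closed (capacity).
opened = [10, 12, 11, 20, 10]
closed = [10, 10, 10, 10, 10]

cum_opened = list(accumulate(opened))   # arrival band of the CFD
cum_closed = list(accumulate(closed))   # completion band of the CFD
queue_size = [o - c for o, c in zip(cum_opened, cum_closed)]

# A burnup-style chart only shows queue_size; a CFD keeps both bands,
# so you can see that week 4's backlog growth came from arrivals (20),
# not from a drop in the completion rate (still 10).
print(queue_size)  # [0, 2, 3, 13, 13]
```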

Little's Law modelers are a bit more of a pony, but right now it's essentially impossible for me to do any modeling myself, because I can't get access to any of the parameters:

  • Average amount of time tasks spend unassigned or not otherwise in progress (i.e., enqueued)
  • Average number of tasks in the queue
  • Average rate of task completion

(These would be useful both for the system as a whole and also for various subqueues inside the system, such as "waiting for review" or "assigned to Mary".)

Getting access to these parameters enables all sorts of modeling, like "if the team has 40 tasks pending and does 10 a week, it will take an average of a month to complete a task", or figuring out how much of cycle time is queue time vs. value-added time, and so on.
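The arithmetic in that "40 tasks pending, 10 a week" example is just Little's Law (L = λW) rearranged to solve for average cycle time; a minimal sketch:

```python
def avg_cycle_time(avg_queue_size, completion_rate):
    """Little's Law rearranged: W = L / lambda.

    The result's time unit follows the rate's unit (tasks/week -> weeks).
    Only valid as a long-run average for a reasonably stable system.
    """
    return avg_queue_size / completion_rate

# 40 tasks pending, 10 completed per week -> 4 weeks (~a month) per task.
weeks = avg_cycle_time(avg_queue_size=40, completion_rate=10)
print(weeks)  # 4.0
```

The same formula applies to any subqueue (e.g. "waiting for review") once you can measure its average size and throughput.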

Yeah -- the burnup chart is mostly a joke, I didn't realize it was sort of a real thing when I wrote it and thought I was being oh so witty and clever. On this install, I primarily use it to conclude that Phabricator is getting much buggier over time. We actually had what I think was approximately a CFD a long time ago (i.e., just a plot of closure rate under the open rate):

In general, what kind of actions do you use this data to motivate? For example:

  • Does this data provide novel insight? Particularly, I would expect project-wide data to rarely be surprising (that is, someone managing a project probably has an approximately accurate gut feeling about arrival rate and queue time most of the time). Is this not true in your experience? Or is this data insightful mostly in the details (e.g., most insight arises from examining subqueues like "Jake's queue")?
  • Generally, what kind of actions do you want this data to drive? For example, how would you react differently to an increase in arrival rate vs a decrease in task rate? What sort of things might cause a decrease in task rate that wouldn't be indicated by other sources (e.g., if Jake goes on vacation for a couple of weeks you'd expect a decrease in task rate, but presumably these are not "who is on vacation?" graphs, and presumably there is no action to be taken in that case)? How would you identify and remedy them? These charts seem like they may be able to identify some kinds of systematic problems, but I'd guess they're not effective at identifying root causes and driving resolution? Or are they?
  • I currently believe tasks are extremely nonuniform in many software projects (essentially, they are often unstable systems with numerous extreme tasks that will, e.g., be open for years), and worry it may impact the utility of applying high-theory approaches. That is, a class of approach is more attractive to us if it works well for most projects than if it works very well for highly disciplined or highly structured projects, but poorly for less structured projects. What's your experience with this? Do you structure work so that tasks are relatively uniform? (How structured do things have to be for this kind of analysis to be useful?) Do high-theory approaches successfully deal with systems that are characterized by frequent outliers and general chaos and upheaval?
  • Tasks are also nonuniform in terms of how much work they represent. How do you normalize between situations where tasks are completed normally versus, e.g., broken into tiny pieces? In the case of Little's law, this will create an apparent increase in throughput, right? Do you just assume it all comes out in the wash? Require estimates and/or measurement of the size of tasks?

(Feel free to spend as little time as you want responding to that, we can probably find reasonable answers to these kinds of questions on our own when we get closer to building things.)

The big question I have overall is just: what specific actions or decisions are undermined by not having this data? If you managed two identical projects, and one was supported by infinite/perfect data while the other required you to scrounge around in the mud like Phabricator currently does, how would the courses of those projects differ?

This question might seem dumb/obvious, but I don't personally have experience in a work environment where this kind of data clearly drives meaningful actions, so I'm not sure what those actions are. I imagine they might be things like:

  • Improving the accuracy of timelines communicated to clients (or internally).
  • Identifying slow queues (Jake's queue) and reassigning work (to Molly's queue). Requires accurate predictions about task size? Is work really reassignable often enough for this to be valuable?
  • Identifying underperforming queues (Jake's queue) and fixing them (how? pep talks? firing? Where do you look next? When does this switch from a data/analysis problem to a human intelligence problem?).
  • Justifying things at the manager/executive level, e.g. "we need more resources on team X, and here's the data to prove it".
  • Justifying things at the HR level, e.g. "Jake is underperforming, and here's the data to prove it".
  • Justifying things at the technical level, e.g. "team Y is way more efficient than we are, we should go figure out why -- here's the data proving it".
  • General confidence / awareness / charts-are-cool-ness: having data may not drive many decisions on its own, but reinforces "gut" decisions and sometimes adjusts them slightly when they're inaccurate?

But I'm not sure which of these use cases are actually important or even real.

Just to get it out of the way, I think trying to derive any person-specific data is a terrible idea. Variance in individual performance is overwhelmingly determined by variance in the work. It seems like my off-hand "assigned to Mary" got read as a much more significant statement than it was. My apologies for the lack of clarity. I think the virtue here is being able to analyze the system as a whole; micro-managing individuals against performance targets is obviously poison.

You've asked some great questions, but it's going to be a lot easier for me to refer you to an existing book rather than write part of one here myself. I'm currently re-reading Principles Of Product Development Flow, which is piquing my interest about our Phab data. I'd recommend it.

Sounds good, thanks!

Charts don't need to provide unique insights/drive decisions in order to be useful.

Obviously, if you're in a project, you rarely need to look at charts/metrics in order to have an idea of where the project is at. In my world, burnup charts and other visualizations are more for the benefit of non-project members than the people in the project. But as such, they serve an important purpose. Dealing with internal stakeholders (or external clients as well, I suppose), it is very useful to be able to provide them with diagrams that succinctly summarize the progress (or non-progress), resource usage, and so on of a project.

Not everyone who is interested in following a project's progress will be a developer, or be willing to wade through the details of the tasks that make up the project.

A side-benefit of having prominently visible burndown charts, incidentally, is that it motivates developers to break down tasks into smaller work units, rather than just leaving them as big, multi-week tasks. I've been on a few teams where people did the latter. The burndown chart works against that behavior, because it looks bad when the chart flatlines.

timor added a subscriber: timor.Aug 7 2014, 2:01 PM
robla added a subscriber: robla.Aug 20 2014, 4:45 PM

If you needed a real use-case now we have one: T6041: Metrics for Maniphest. Your feedback is welcome.

codahale removed a subscriber: codahale.Sep 5 2014, 4:52 PM
svemir added a subscriber: svemir.Oct 3 2014, 12:08 AM

In the industry I work in, we require documentation surrounding our code quality, specifically documented code reviews/audits (currently this is all done on paper). This task regarding reports/metrics/data seems like a good place to mention our use case in hopes that it's considered during design & development.

Essentially for a given code review/audit, we need

  • Selected code review/audit - summary, list of files, changeset, etc., but not the actual contents/diffs.
  • All discussions/comments
  • Resulting new tasks

Compounding on that, the ability to create reports for all reviews/audits for a project would be handy, though not something we're actively looking for.

This would be largely beneficial to us not only for quality and documentation requirements, but also for doing post-mortem analysis of project/feature work.

ksmith added a subscriber: ksmith.May 4 2015, 4:19 PM
eadler added a subscriber: eadler.
eadler moved this task from Backlog to Nice To Have on the FreeBSD board.Jun 14 2015, 4:39 AM
srijan added a subscriber: srijan.Jul 19 2015, 12:22 PM
kwk added a subscriber: kwk.
aik099 removed a subscriber: aik099.Sep 17 2015, 10:22 AM
ox added a subscriber: ox.Nov 6 2015, 7:54 PM
eadler added a project: Restricted Project.Jan 9 2016, 12:46 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

Like cspeckmim, I work at a company where all code is safety related and must be certified. We'd like to have a report on code review activity. At present we have to submit a lot of screenshots to demonstrate we performed code review using Phabricator...

Some "nice to have"s:

  • Bug tagging: identify with a tag the kind of bug, e.g.:
  1. General Unit Testing
  2. Comment and Coding Conventions
  3. Error Handling
  4. Resource Leaks
  5. Thread Safety
  6. Control Structures
  7. Performance
  8. Functionality
  9. Security
  10. Safety

Reports could range from a minimal list of all code comments, with a final sentence about approval or non-approval of the submitted code, to a complete report showing statistical information.

This could be a good example of a reporting function.

@dserafin - For something in the meanwhile, this is what I do to get reports:

  1. Added custom field(s) to Maniphest Tasks to classify task type - see Configuring Custom Fields
  2. Install extension for exporting Maniphest Task search results to Excel spreadsheet
  • This requires PHPExcel to be installed on the machine, and for your php.ini to be configured to load that library.
  • The extension can be downloaded here (use at your own risk) - it adds some columns to the output to include some custom fields -- not all field types have been tested/implemented. If the task is on a project workboard, it will also list the workboard columns it's on. The code is fairly straightforward, so you should be able to modify it as needed.
  3. From web interface do a search query for tasks, then at bottom select Export to Excel, then select the 'Default with Custom Fields' option.

It's minimal and only gives a view into tasks, not into code review. If you need to, I imagine it should be fairly easy to add a column which lists the revisions associated with a task.
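If installing a PHPExcel-based extension isn't an option, a similar task export can be pulled over Phabricator's Conduit API. The endpoint name (`maniphest.search`) and the result envelope below match Conduit's documented shape, but the install URL and token are placeholders, and the specific fields pulled out are just illustrative choices:

```python
import csv
import json
import sys
import urllib.parse
import urllib.request


def tasks_to_rows(results):
    """Flatten maniphest.search result dicts into CSV-ready rows."""
    rows = [["id", "title", "status"]]
    for task in results:
        fields = task.get("fields", {})
        rows.append([task.get("id"),
                     fields.get("name"),
                     fields.get("status", {}).get("value")])
    return rows


def fetch_tasks(base_url, api_token):
    """Call maniphest.search and return its result list (needs network access)."""
    data = urllib.parse.urlencode({"api.token": api_token}).encode()
    url = base_url + "/api/maniphest.search"
    with urllib.request.urlopen(url, data) as resp:
        return json.load(resp)["result"]["data"]


# Usage (hypothetical install URL and token):
#   rows = tasks_to_rows(fetch_tasks("https://phab.example.com", "api-XXXX"))
#   csv.writer(sys.stdout).writerows(rows)
```

Note this fetches only the first page of results; a full export would follow the `cursor` value Conduit returns to page through the rest.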

ablekh added a subscriber: ablekh.Feb 7 2016, 1:06 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Feb 18 2016, 6:32 PM
urzds added a subscriber: urzds.Apr 20 2016, 2:23 PM
fanis added a subscriber: fanis.Apr 25 2016, 9:00 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:09 PM
cmmata added a subscriber: cmmata.Jul 8 2016, 11:49 AM
lyahdav added a subscriber: lyahdav. Edited Sep 17 2016, 11:57 PM

For anyone trying to get Kanban statistics / reports in the meantime, you can try this project I recently started working on:
https://github.com/lyahdav/analytics-limn-analytics-data/tree/kanban_stats. It generates a CFD (Cumulative Flow Diagram) and computes cycle time and lead time.

@lyahdav : That looks pretty cool. It's not quite clear to me where it runs. Is this a script that one would install on the same box that phab itself is running on?

@ksmith it's a Python script that you would run on your local machine. If you have arcanist setup for development it should be pretty simple to get it working. Ideally it would just be a plugin, but I didn't look into that approach yet.

rachel added a subscriber: rachel.Mar 28 2017, 10:42 PM
son.to added a subscriber: son.to.Apr 19 2017, 9:05 AM
rbalik added a subscriber: rbalik.Nov 21 2017, 8:33 PM
ivo added a subscriber: ivo.Mar 1 2018, 4:36 PM
epriestley closed this task as Resolved.Apr 30 2019, 8:57 PM
epriestley claimed this task.

This is approximately coming into existence and this task no longer serves much of a purpose. See T13083 / T13279 for followups.