
Build Charting for Facts
Open, NormalPublic

Assigned To
None
Authored By
epriestley
Apr 16 2019, 5:41 PM

Description

The next major step for Facts is better charting support.

I'm currently planning to follow the Query/SearchEngine model, where charts are addressed by hash (so you can quickly edit/share them) and can optionally be named/saved.

I'm not yet entirely clear on how chart "types" will work. Essentially all current charts are based on time-series facts where it makes sense for the X-axis to be "time" and the Y-axis to be "value" (often "count"). At least for now, perhaps this is left to the API, but it would be good to develop a clearer picture of this.

Charts should also support a widget and maybe some kind of {chart XYZ} quick reference markup.


Continuity Across Axes: For functions that are continuous between discrete datapoints (like "accumulate(...)"), if there are points on either side of a domain limit (e.g., a point at x = -1 and a point at x = 1 when the domain begins at x = 0), we should interpolate a point on the boundary itself. If we draw only the points inside the domain, we get this:

Screen Shot 2019-04-17 at 6.02.47 AM.png (861×976 px, 156 KB)

We'd like something more like this, but without the glitchy rendering:

Screen Shot 2019-04-17 at 6.03.14 AM.png (861×976 px, 156 KB)
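As a sketch of the boundary interpolation described above (Python for illustration; the shape of the real chart pipeline is an assumption), synthesizing a point exactly on the domain edge might look like:

```python
def clip_to_domain(points, x_min, x_max):
    # points: (x, y) pairs sorted by x. Returns the points inside
    # [x_min, x_max], adding a linearly interpolated point exactly
    # on a boundary whenever the data crosses it, so the drawn line
    # meets the edge of the chart instead of stopping short.
    def lerp(p0, p1, x):
        (x0, y0), (x1, y1) = p0, p1
        t = (x - x0) / (x1 - x0)
        return (x, y0 + t * (y1 - y0))

    out = []
    for i, (x, y) in enumerate(points):
        if x < x_min:
            # Data crosses the left boundary: synthesize a point on it.
            if i + 1 < len(points) and points[i + 1][0] >= x_min:
                out.append(lerp((x, y), points[i + 1], x_min))
        elif x > x_max:
            # Data crosses the right boundary.
            if i > 0 and points[i - 1][0] <= x_max:
                out.append(lerp(points[i - 1], (x, y), x_max))
        else:
            out.append((x, y))
    return out
```

With a point at x = -1 and a point at x = 1, clipping to a domain starting at x = 0 yields an interpolated point on the axis rather than a gap.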

Revisions and Commits

rP Phabricator
Abandoned
D20879
D20822
D20819
D20818
D20817
D20816
D20814
D20506
D20504
D20503
D20502
D20501
D20500
D20498
D20497
D20496
D20495
D20489
D20487
D20486
D20485
D20483
D20446
D20445
D20449
D20444
D20443
D20442
D20441
D20440
D20439
D20438

Event Timeline

epriestley triaged this task as Normal priority. Apr 16 2019, 5:41 PM

The concrete use cases that are currently filed are:

Multiple Project Burnups Stacked in an Area Chart: (T12403) We're close to being able to produce this today, although configuring the chart would be very complicated, and I'm not entirely sure which layer of abstraction the "stacked area" part should go into. For one project, you could draw the part below the axis like this:

[sum
  [accumulate [fact tasks.open-count.create.project <phid>]]
  [accumulate [fact tasks.open-count.status.project <phid>]]
  [accumulate [fact tasks.open-count.assign.project <phid>]]]
[scale -1]

...and the part above the axis like this, perhaps, provided we write window(min, max) (which is f(x) => x if min <= x < max; null otherwise):

[window 2019-01-01 2019-04-01]
[accumulate [fact tasks.open-count.status.project <phid>]]
[scale -1]
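A minimal sketch of the window(min, max) function defined above (the identity for x in [min, max), null otherwise), modeled here as a pointwise transform over sampled (x, y) pairs -- the actual function DSL shape is an assumption:

```python
def window(x_min, x_max):
    # window(min, max): pass y through where x_min <= x < x_max,
    # map everything else to None (null), per the definition in
    # the comment above. Pointwise over sampled (x, y) pairs.
    def apply(points):
        return [(x, y if x_min <= x < x_max else None)
                for (x, y) in points]
    return apply
```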

For 2+ projects, it would be possible to write each series as a very complex function, but it's probably easier to put the "stacking" part in the display logic.

If we're shipping this as an upstream chart, I think it either needs to be heavily templated or we need a lot of templating support so that you can copy/paste a template from somewhere on the internet and get a usable chart-building-thing out of it.

Forecasting: (T12459) This is a big can of worms that I don't really want to touch any time soon. Maybe we write a linear-fit(f) function which throws a line of best fit on some data and let installs write optimistic-forecast(f) by copy/pasting it. I suspect no function here provides much real insight, projects all become highly non-linear toward the end, etc., although maybe this doesn't matter.

Stacked Bar Grouped By Time Period: (T12402) If we write a group-by-week function, each color in the example in T12402 is [accumulate [fact tasks.open-count.status.project <phid>]] [scale -1] [group-by-week]. Then we render it as stacked-bar instead of stacked-area or line. The caveat here is that tasks in multiple projects will be double-counted. We'd need to move those events into a "multiple projects" group. This doesn't map cleanly to the function model as it exists today.
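A group-by-week step could be sketched like this (hypothetical helper; the real function would operate on epoch timestamps in the fact store):

```python
import datetime

def group_by_week(events):
    # Bucket (date, value) events by the Monday of their week and
    # sum the values, so a stacked-bar renderer gets one bar per
    # week. Sketch only; dates stand in for epoch timestamps.
    buckets = {}
    for day, value in events:
        week_start = day - datetime.timedelta(days=day.weekday())
        buckets[week_start] = buckets.get(week_start, 0) + value
    return sorted(buckets.items())
```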

Queue Length: Various requests focus on queue length.

In these cases, we need to ETL time-to-close. In the case of revisions, we could also pull time-to-first-response, time-to-first-accept, time-to-accept-state, number-of-updates, etc. We can do less of this automatically for tasks today, but we could do "time-to-assign" and could annotate task statuses, conceivably, if installs want to micro-manage task lifecycles.

I don't particularly love the example reports in either T12410 or T12177 as ways to visualize this data. In T12410, the magnitude is lost in the color, and the bar height shows count/points rather than lead time. The resulting chart doesn't seem especially useful for answering any of the questions posed (i.e., looking at it, there isn't much of an apparent pattern). I think it also excludes "tasks that were filed but which have not been completed", which is a pretty critical piece of information.

In T12177, the chart shows completion but not time, and doesn't indicate scale.

Given the current state of the world, I suspect a better visualization is a stacked bar chart, grouped by time period. The height of each bar is the number of objects created in that period; the bar colors show the current status of those objects (open, accepted, etc.).

This chart doesn't show changes over time well. I'm hesitant to put "time to close" on a minor axis like color, because the value is likely logarithmic. A histogram of age-at-close for tasks closed in a given period might be more insightful, although this omits tasks which have not been closed yet, which is pretty critical. A histogram of age-at-close for tasks created in a given period, including "still not closed", is perhaps more useful.
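The "age-at-close for tasks created in a given period, including still not closed" histogram could be sketched like this (bucket edges are illustrative, not anything the application defines):

```python
def age_at_close_histogram(tasks):
    # For tasks created in some period, bucket age-at-close, with
    # an explicit bucket for tasks that are still open. Each task
    # is (created_ts, closed_ts or None) in epoch seconds.
    labels = ['<1d', '<7d', '<30d', '>=30d', 'still open']
    edges = [1, 7, 30]  # bucket upper bounds, in days
    counts = dict.fromkeys(labels, 0)
    for created, closed in tasks:
        if closed is None:
            counts['still open'] += 1
            continue
        age_days = (closed - created) / 86400.0
        for days, label in zip(edges, labels):
            if age_days < days:
                counts[label] += 1
                break
        else:
            counts['>=30d'] += 1
    return counts
```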

In any case, most of the missing stuff here is just at the display level.

I think a possible model for the UI is to have some base ChartConfigurationEngine, similar to SearchEngine, which defines a set of fields. Some fields would be common (e.g., all time-series charts can have time controls for domains).

Then we'd provide concrete subclasses of this. In Maniphest/Projects:

  • Burndown, for zero or more projects or owners. One project draws the current "burnup" chart. Multiple projects stack ala T12402. Projects can have an on-project page for their burndown, plus a link into Facts to refine the chart.
  • Task age, for zero or more projects or owners.

In Differential/Diffusion:

  • Revision age, for zero or more projects, repositories, or authors.

We could put author charts in People but I suspect this is a bit of a nest of vipers, and maybe we shouldn't even build these for now.

In Facts, we could conceivably provide a FreeformChartConfigurationEngine that would give you raw access to the underlying JSON configuration and let you pull whatever you want out of the fact engine, but I suspect this is a very advanced feature and not of much general interest.

See also PHI1226. An install has an environment where automatic rules (Herald/Owners) frequently trigger group reviewers (projects/packages), and some groups are not prompt about performing review.

It would be helpful to have charts which can try to answer questions like:

When a rule adds group X as a reviewer, how much longer do we expect review to take?

We can make a first pass at this by charting time-to-review for revisions with that reviewer against time-to-review for all other revisions, or time-to-action for that reviewer against time-to-action for all other reviewers. This is probably best as overlaid box plots, although it would be nice to show total review count too (width of the box?) so we don't run into situations like "no one did any review while the company was having its annual pool party week, so the numbers are completely whack".

A problem with "time-to-review" or "time-to-action" is that it discards cases where no action occurred. If you review 5% of revisions in 1 hour and never look at the other revisions, you shouldn't get a chart out of it that says "average review time: 30 minutes".

Multiple Project Burnups Stacked in an Area Chart

Mostly rambling for my own purposes:

I think I'm going to start here. I think we get two flavors of chart out of this today: one where the Y axis is task count; one where the Y axis is points. They should mostly be similar. Points is available if maniphest.points is enabled.

This chart accepts a list of one or more projects as parameters. When a parent project is specified, subprojects or milestones are broken down and stacked separately. Maybe we can do this all in one chart, or maybe there's a separate summarize(project) token later to do the rollup visually.

On the points chart, maybe we overlay uncosted task count? Or just put that on a separate chart? These charts may otherwise be misleading when you have 30 points of open tasks and 92 uncosted open tasks. This should be on a second Y-axis if it's present.

When two or more projects are present, we need to create virtual groups for "Two or more projects" -- either one dumb "Two Or More" group, or smart "X and Y" groups, which we stack between X and Y. A "dumb" group is probably a requirement when you chart 30 different projects, so maybe we start there and then refine into "X and Y" if it's relevant. "X and Y" is a good group if your chart is "X" and "Y", but it's the same as "everything". "X and Y" might be a good group if your chart is "X and Y and Z", but we only have two slots between the stacks and it's probably better if stacking order is consistent, so the in-between groups would be forced. This might not be worthwhile.

Visually, the best stacking order is often "area", but that may feel somewhat random.

We could let users drag-to-restack.

So the specific things we need are:

  • Application-level configuration engines on top of generic charting.
  • Some kind of prototype place to put charts in Projects.
  • Stacked-area as a way to render a dataseries.
  • Support for dataseries with multiple functions (each tier of the stack), assuming the stacking logic is in the display layer ("draw this stacked area series"). Or a new object above "dataseries" that covers "a rendering of one or more series onto the chart", but I think "dataseries" is this object. Another way to approach this is "draw this specific area, bounded above by line X and below by line Y", but I suspect that's less powerful in the long run.
  • Labeling, click-stuff-to-do-things, customizable domains, etc.
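If the stacking logic lives in the display layer as suggested, it might amount to something like this (a sketch assuming each tier is sampled at the same x values; names are hypothetical):

```python
def stack_series(series):
    # series: a list of dataseries, each a list of (x, y) points
    # sampled at the same x values. Returns one list of
    # (x, y_bottom, y_top) bands per series, each band sitting on
    # the cumulative total of the series stacked below it.
    bands = []
    totals = None
    for points in series:
        if totals is None:
            totals = [0.0] * len(points)
        band = []
        for i, (x, y) in enumerate(points):
            band.append((x, totals[i], totals[i] + y))
            totals[i] += y
        bands.append(band)
    return bands
```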

I would also like to relate each point on the chart to the exact actions which affected it, e.g. "this line went up here because Txyz was rescoped". There's currently soft support for this.


When this chart shows only one project (e.g., a particular milestone, or a project with no subprojects) it would be nice to stack reasons instead. The reasons are "task created", "task status change" (i.e., task closed or reopened), and "task added/removed from project" (i.e., scope change). Knowing that a project is behind because of scope changes or late additions is potentially useful.

This is a little ways out, but the actual facts Maniphest currently generates need to be adjusted to properly support stacked-area-by-reason: currently, when a task is closed, we don't have enough data to figure out whether it should be removed from the "created into this project" area, the "reopened work" area, or the "moved work" area. This is okay if we're summing the opens/closes but not precise enough if we're breaking them down.

One issue with the model we're evolving toward is that if you view a chart in Klingon, then send the link to a user whose primary language is Kerbal, we'd prefer to relabel the chart in Kerbal, not restore serialized Klingon-language chart labels. That means label generation has to be at render time, not at chart construction time, and I'm serializing a data-structure which is slightly too low-level for common application cases.

There's some tension between concerns here. If we want to let you recolor or relabel a chart, save it, and present it later, we presumably need to be serializing fairly low-level primitives that are common across all charts so that serialized data can also include "but this function should be red, since the user recolored it". But perhaps the correct level of serialization is that we put unique keys on each function, then store more like "chart engine + engine configuration + per-unique-function-key-overrides" to let you specify the display behavior of each function. I don't anticipate letting you go any lower than that (e.g., move points on the chart) and things I could imagine allowing (e.g., draw on top of the chart) are compatible with higher-level serialization.

This doesn't have a lot of impact and is easy to move around (we just move chart generation forward a little bit: Engine -> Serialize -> Deserialize -> Chart -> Render instead of Engine -> Chart -> Serialize -> Deserialize -> Render) but the change does get somewhat more mechanically difficult over time since somewhat more lines of code will need to be touched.

If we do expose "raw chart API" in some sense eventually, the "Engine" and "Chart" steps are the same step anyway and we need most/all of the serialize/deserialize-on-functions code anyway.

The actual facts Maniphest currently generates need to be adjusted to properly support stacked-area-by-reason: currently, when a task is closed, we don't have enough data to figure out whether it should be removed from the "created into this project" area, the "reopened work" area, or the "moved work" area.

This is kind of a mess that's somewhat hard to reason about.

Suppose a task is created in project A with a value of 20 points. We're looking at the stacked area burndown for project A. The burndown is stacking "base work" (the cost of tasks created in the project), "rescoring" (the cost of changes to score of tasks already in the project, i.e. a realization that an estimate was incorrect), "rescoping" (tasks moved into the project), and "reopening" (work which was marked complete but then reopened).

If the task value is increased to 30 points, we can record that as a +10 rescore event, put that area on top of the basic "points" area, and color it differently to show how much new work is due to rescoring.

If the task is closed, we can record that as a -10 rescore event plus a -20 base event. The size of both areas decreases.

However, if the task is rescored to 5 points, things get tricky. If we record this as a -15 rescore event, our rescore line may now have negative values and the area may be negative. This is confusing and sort of nonphysical. Although it's technically possible to draw negative area, I suspect this is a very bad path to walk down and will make the resulting chart wildly perplexing to viewers without multiple Phabricator PHDs.

We can record this as a -10 rescore event plus a -5 base event. All the areas on the chart will remain positive, which seems good. But this looks like you completed work when you really just rescored it.

Another consequence of this is that if you have a task worth 50 points, increase the score to 100, decrease the score to 1, then increase the score to 100, the chart will now show two different pictures when the task was in the same state. After the first rescore to 100, it will show 50 points of base work and 50 points of rescoring. After the second rescore to 100, it will show 1 point of base work and 99 points of rescoring. This is a plausible interpretation, but the human expectation of changing a value from X to Y and back to X is usually that it restores us to the same state.

We could record this as a -15 rescore event, then draw the function [sum [fact base] [interval [fact rescore] null 0]], i.e. draw the base score as lower (the lower bound of the negative area) but track the real value in the database. Then the X -> Y -> X state change works as expected and we draw 50 points of base score and 50 points of rescoring at both points.

This doesn't fix the rescore making it look like work was completed, but maybe that's fine.

Under this approach, I think the data is (mostly?) okay as-is, and we just need more magic in the composition functions:

  • At the top of the stack is [interval [fact rescope] 0 null], i.e. the area where added scope is positive.
  • Under that is [interval [fact rescore] 0 null], i.e. the area where rescoring is positive.
  • Under that is [interval [fact status] 0 null], i.e. the area where reopening is positive.
  • Under that is [sum [fact create] [interval [fact rescope] null 0] [interval [fact rescore] null 0] [interval [fact status] null 0]], i.e. the remaining values plus the negative areas of the other regions.

This is perhaps not a good area of focus for the moment, since just summing everything rather than trying to stack stuff gets us an easy "useful" result, but interval is easy to write so maybe I'll at least take a shot at this.
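The interval clamp in the composition above could be sketched like this (semantics inferred from the comment -- interval(f, 0, null) keeps the positive part of f, interval(f, null, 0) the negative part -- not the real Facts function):

```python
def interval(f, lo, hi):
    # Clamp the value of f into [lo, hi], where None means
    # unbounded on that side. interval(f, 0, None) keeps the
    # positive part of f; interval(f, None, 0) the negative part.
    def g(x):
        y = f(x)
        if lo is not None and y < lo:
            return lo
        if hi is not None and y > hi:
            return hi
        return y
    return g
```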

Not sure where to report this, but since a recent upgrade at Wikimedia, the new version of the Burnup Graph (now Reports: Burndown), has a tendency to go below zero. I'm aware the old version had inaccuracies so perhaps it was happening before as well but hidden (e.g. artificially replaced with zero or something like that).

https://phabricator.wikimedia.org/project/reports/1212/

capture.png (216 KB)

Yeah, this is still in a transitional state, it's just been stalled for a bit (not blocked by anything, just other stuff has been getting attention). The two major issues I'm aware of right now are:

  • burnup chart data is pretty questionable; and
  • if prototypes are disabled, some stuff is broken, particularly "View Chart".

Both issues should be resolved in the next few patches once this gets more attention.

if prototypes are disabled, some stuff is broken, particularly "View Chart".

See also https://discourse.phabricator-community.org/t/maniphest-burnup-rate-chart-is-not-displaying/2852/.

I think there isn't really an easy way out of this other than marking Facts as not-a-prototype and putting prototype-and-application guards (instead of just application guards) on the integration points. This can happen relatively soon (in "development time" terms, not necessarily "linear time" terms) but I don't want to vanish parts of the UI even if they aren't entirely working.

burnup chart data is pretty questionable

I think we're closer than it looks to having a useful result here, but the opaqueness of this chart itself is a problem -- I wanted to take this opportunity to improve observability of the data before actually fixing the chart. That is, the chart should provide enough tools for a knowledgeable observer to figure out what's wrong without digging into the source and internals. The major tools I want to provide next are:

  • tooltips indicating the events associated with each datapoint; and
  • the ability to view a dataseries in tabular format.

These should make charts generally more useful, and let the viewer understand why things are doing whatever they're doing.

In stacked area charts, we accumulate some error by stacking integer points on top of interpolated points.

For example, if tasks close at T = 1 and T = 3, the "closed task" count at T = 2 will be, say, "7.5", interpolating between the points at T=1 and T=3.

If we have a real event in a dataseries stacked on top of this, it may be raised to an actual Y-value of "18.5". This isn't meaningful as a datapoint, even though it is a geometrically accurate interpretation of the interpolated data.

That is, the real values are integer step functions. We draw them with smooth lines since this looks nicer and it's what users expect (and it's convenient for all points to have unique X values), but the aggregation should be based on the underlying step function, not the graphical interpolation. We could continue using the interpolated points for graphical aggregation, although it's probably fine if the top and bottom of an area aren't completely in sync in terms of how they're interpolating the underlying step function.
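Evaluating the underlying step function (rather than the smoothed line) could be sketched like this, using made-up counts at T = 1 and T = 3 in the spirit of the example above:

```python
import bisect

def step_value(points, x):
    # The value of the underlying step function at x is that of
    # the last datapoint at or before x (0 before the first
    # point). Stacking should aggregate these step values rather
    # than the smoothed, interpolated line drawn on screen.
    xs = [p[0] for p in points]
    i = bisect.bisect_right(xs, x)
    return points[i - 1][1] if i > 0 else 0
```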

D20814 combines nearby points so we don't have tons of overlapping tiny circles. For now, the grouping is just based on X-axis distance:

Before: Screen Shot 2019-09-16 at 10.41.29 AM.png (62 KB)
After: Screen Shot 2019-09-16 at 10.41.38 AM.png (57 KB)

We could later imagine grouping points based on some other measure (like trying to put points at the bottom and top of sudden changes in value, or always putting a point on the 1st day of each week or 1st day of each month).
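X-axis-distance grouping in the style of D20814 could be sketched like this (the aggregates chosen here, mean x and last y, are illustrative, not necessarily what D20814 does):

```python
def group_nearby_points(points, min_dx):
    # Collapse runs of points whose x values fall within min_dx of
    # the start of the current group into a single marker, so the
    # chart doesn't draw piles of overlapping circles.
    groups = []
    for x, y in sorted(points):
        if groups and x - groups[-1][0][0] < min_dx:
            groups[-1].append((x, y))
        else:
            groups.append([(x, y)])
    return [(sum(p[0] for p in g) / len(g), g[-1][1]) for g in groups]
```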

The "compose(...)" function won't select input values correctly if the first function in the chain isn't a data function. For example, if we compose:

shift(-3600) ∘ fact(beep)

...to generate an hour-by-hour overlay chart of how many times something has beeped, we'll incorrectly select the current hour's x values, shift them by -3600 seconds, then push them through the map. This will accidentally work if we happen to sample at those points: for example, the underlying data has one datapoint exactly every 60 seconds and we're shifting by a multiple of 60. This will also accidentally work if our data function has been made continuous, e.g.:

shift(-3600) ∘ accumulate(fact(beep))

However, at least in the specific case of accumulate(...), we'll currently lose the datapoint refs if this happens.

The correct behavior in this case is likely to determine that shift(-3600) is invertible, remap the domain, generate input values from fact(beep) using the remapped domain, then pass the values back through shift(-3600)^-1. This is somewhat complex.

We can't do this if shift(-3600) is not invertible, but this kind of construction is probably meaningless anyway:

cos() ∘ fact(beep)

In this case, we could conceivably clip the domain into [-1, 1] and emit arccos() as the inverse function? I don't think the UI is going to let users have access to any of this stuff anyway, so it's pretty moot.
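The remap-the-domain fix for the invertible case could be sketched like this (semantics inferred from the comment; the function shape is an assumption):

```python
def sample_shifted_fact(datapoints, delta, x_min, x_max):
    # For shift(delta) composed over a data function: remap the
    # visible domain through the inverse shift, select the real
    # datapoints inside the remapped domain, then map them forward
    # through the shift. This keeps the actual datapoints (and
    # their refs) instead of resampling the visible domain.
    src_min, src_max = x_min - delta, x_max - delta
    selected = [(x, y) for (x, y) in datapoints if src_min <= x <= src_max]
    return [(x + delta, y) for (x, y) in selected]
```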

I believe charts in the UI now make physical sense. They aren't very fancy/polished yet, but everything I see on this install looks cohesive (no more overlapping negative areas, weird gaps, general nonsense, etc):

Screen Shot 2019-09-18 at 10.44.34 AM.png (416 KB)

The next steps I'm planning to take are focused around fixing the most severe sharp edges; notably:

  • Prototype / Fatals
    • get rid of all obsolete / policy-violating stuff so no part of the Facts application is broken;
    • get rid of the inline data generation in the chart controller, or move it behind a developer mode gate, since it's a huge performance cost in service of making development easier;
    • update the gating so that Facts can be unprototyped;
    • unprototype Facts, fixing the various weird fatals.
  • Basic Usability
    • move tabular views to a debugging UI;
    • charts should support basic stuff like zooming;
    • when you try to view a chart you don't have permission to access, we should raise this as a sensible policy exception;
    • range fitting could probably be much better without much work (e.g. "display Y max = data Y max + 5%");
    • point hover behavior is exceptionally un-useful right now and likely low-effort to fix.
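The range-fitting idea above ("display Y max = data Y max + 5%") is about one line of code; a sketch, with the 5% pad as the illustrative default:

```python
def fit_y_range(y_values, pad=0.05):
    # Display range = data range, with the top padded out by a
    # fraction of the data span so the tallest point doesn't touch
    # the chart edge. Returns (display_min, display_max).
    lo, hi = min(y_values), max(y_values)
    return (lo, hi + (hi - lo) * pad)
```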

get rid of all obsolete / policy-violating stuff so no part of the Facts application is broken;

I don't think we have any policy violations currently. The only thing you can really do is take the chart of everything, then pull every chart for every project you can see, and use the difference to derive total activity for all objects you can't see. Since this is generally in the class of information we let you know about objects you can't see, I think this is okay.

For example, if you can see T100 and T200 but nothing else, you can already determine that there are 99 tasks between them that you can't see. We're comfortable disclosing this because the value of human-readable Txxx monograms outweighs the policy implications of knowing that things you can't see exist.

Charting currently gives you very slightly more information (if you do a significant amount of legwork, you can probably figure out how many of those 99 tasks are open or closed) but I think we have to accept this as the cost of metric extraction.

I think the only obsolete part of Facts is the home screen. Minimally, I can just redirect it to the demo chart.

get rid of the inline data generation in the chart controller, or move it behind a developer mode gate, since it's a huge performance cost in service of making development easier;

I'll just gate this to developer mode for now.

update the gating so that Facts can be unprototyped;
unprototype Facts, fixing the various weird fatals.

These are straightforward.