Differential D20452

Separate "accumulate(...)" from Fact functions
Abandoned · Public

Authored by epriestley on Apr 18 2019, 9:07 PM.

Details

Reviewers

amckinley

Maniphest Tasks

T13279: Build Charting for Facts

Summary

Depends on D20446. Ref T13279. Currently, the raw ETL fact data is just changes to counts, e.g. a "+1" when a task is created or a "-1" when a task is closed.

We accumulate these changes into a line as part of the "fact()" function, but we can do this more cleanly by making accumulation a separate function.

The raw, unaccumulated functions become "impulse" functions, i.e. each point is like an acceleration "impulse" which we can accumulate to plot speed, since "accumulate" is really "crappy, low-budget integrate() that only works in super easy cases".
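As a minimal sketch of that accumulation step (in Python rather than the PHP of the actual change, and with a hypothetical `accumulate_impulses` helper and made-up data points), a list of "+1"/"-1" impulses can be folded into a running total like this:

```python
from itertools import accumulate as running_total


def accumulate_impulses(impulses):
    """Turn (epoch, delta) impulse points into (epoch, running total) points."""
    # Order the impulses by time so the running total is meaningful.
    ordered = sorted(impulses, key=lambda point: point[0])
    totals = running_total(delta for _, delta in ordered)
    return [(epoch, total) for (epoch, _), total in zip(ordered, totals)]


# Hypothetical impulse data: three tasks created, then one closed.
impulses = [(100, +1), (200, +1), (300, +1), (400, -1)]
open_tasks = accumulate_impulses(impulses)
```

Here `open_tasks` holds the count of open tasks after each event, which is the line that "fact()" used to produce directly.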

The "accumulate()" function can only operate on discrete "impulse" functions because I'm not expecting us to be able to chart "accumulate(mul(x, 2))" and have it figure out that ∫(2x)dx = x^2 and chart that. We can actually run "accumulate()" on sampled real functions and get a numerical approximation, but this is silly and far afield from the useful set of problems we're trying to solve, so just prevent it.
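For illustration only, here is roughly what the numerical approximation amounts to. This sketch assumes a left Riemann sum with evenly spaced samples; the helper name `accumulate_sampled` and the sample count are assumptions, not anything from the change itself:

```python
def accumulate_sampled(f, start, stop, steps):
    """Left Riemann sum of f over [start, stop] as (x, running area) points."""
    width = (stop - start) / steps
    area = 0.0
    points = []
    for i in range(steps):
        x = start + i * width
        # Each sample contributes a thin rectangle of height f(x).
        area += f(x) * width
        points.append((x + width, area))
    return points


# Accumulating 2x from 0 to 10 should land close to 10^2 = 100.
points = accumulate_sampled(lambda x: 2 * x, 0.0, 10.0, 1000)
x_end, approx = points[-1]
```

The result approaches x^2 as the sample count grows, but it never becomes the symbolic answer, which is why restricting "accumulate()" to impulse functions is the simpler rule.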

The name "impulse" may change since I'm still not totally sure how functions will end up organized. I'm just trying to move toward a reasonable definition of "add(x, y)" that works when X and Y are functions like "open tasks in project X" and "open tasks in project Y", so that we can get a sensible line out of it.

Test Plan

Here's accumulate(scale(x(), 2)) for kicks. This is not allowed, but does draw a fairly accurate chart numerically approximating x^2:

Screen Shot 2019-04-18 at 2.02.32 PM.png (855×1 px, 180 KB)

Here's accumulate(fact(open-tasks)), which is just the same thing that fact(open-tasks) used to be (the spike is when I used bin/lipsum to create a lot of tasks):

Screen Shot 2019-04-18 at 1.57.17 PM.png (895×1 px, 162 KB)

Diff Detail

Repository

rP Phabricator

Branch

chart11

Lint

Lint Passed

Severity	Location	Code	Message
Advice	src/applications/fact/chart/PhabricatorChartFunction.php:158	XHP16	TODO Comment

Unit

Tests Passed

Build Status

Buildable 22674
Build 31072: Run Core Tests
Build 31071: arc lint + arc unit

Event Timeline

epriestley created this revision. · Apr 18 2019, 9:07 PM

Herald added a subscriber: yelirekim. · Apr 18 2019, 9:07 PM

Harbormaster completed remote builds in B22674: Diff 48803. · Apr 18 2019, 9:09 PM

epriestley requested review of this revision. · Apr 18 2019, 9:09 PM

Also featured here is "load the data for all functions in the call tree, not just top-level functions".

I think I'm going to tackle this a little differently. Mostly more rambling:

accumulate(...) is not really a function of the value of fact(open-tasks). That is, accumulate() cannot produce a y-value at point X given only fact(open-tasks) evaluated at X. accumulate() is a sort of functor on the "open tasks" dataset.

Perhaps we're better off looking at this on two dimensions: each function is really a "functor" with configuration arguments, and these functors are chained together.

So sin(x()) is x | sin. Easy enough.

scale(x(), 2) is x | scale(2).

scale(shift(scale(cos(x()), 1), 2), 3) is x | cos | scale(1) | shift(2) | scale(3).

accumulate(fact(open-tasks)) is really x | accumulate(open-tasks), not x | fact(open-tasks) | accumulate.

The start of a pipe is always x (or constant(...), but we can pipe x to constant without any issues).
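A rough Python sketch of the pipe notation, assuming each stage is a plain one-argument callable and that `scale`, `shift`, and `run_chain` are hypothetical helpers invented for this example:

```python
import math
from functools import reduce


def scale(factor):
    """Configured stage: multiply the incoming value by factor."""
    return lambda y: y * factor


def shift(offset):
    """Configured stage: add offset to the incoming value."""
    return lambda y: y + offset


def run_chain(x, *stages):
    """Evaluate x | stage1 | stage2 | ... by feeding each output onward."""
    return reduce(lambda value, stage: stage(value), stages, x)


# x | cos | scale(1) | shift(2) | scale(3)
piped = run_chain(0.5, math.cos, scale(1), shift(2), scale(3))

# The same expression written as nested composition.
nested = (math.cos(0.5) * 1 + 2) * 3
```

The configuration arguments (1, 2, 3) live on the stages and only x flows through the pipe, which is also why accumulate(open-tasks) fits the model: the dataset is configuration, not an upstream value.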

So far, so good. This gets tricky with sum(), though.

We'd like to actually pass functions to sum() as real arguments, so sum(cos(x), sin(x)) is truly x | sum(cos, sin). sum evaluates the pipe as (x | cos) + (x | sin). That's fine.
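One way to picture sum() taking functions as real arguments, sketched in Python with a hypothetical `sum_of` helper:

```python
import math


def sum_of(*functions):
    """Build a stage that evaluates every argument at the same x and adds them."""
    return lambda x: sum(f(x) for f in functions)


# x | sum(cos, sin) evaluates as (x | cos) + (x | sin).
stage = sum_of(math.cos, math.sin)
value = stage(0.0)  # cos(0) + sin(0)
```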

But x is also a list of distinct samples, not some abstract mathematical ideal.

So I think the real approach here is:

Get rid of "source functions" as arguments. Functions are sources (like x) or not (like cos). (Actually, I'm unsure if we even care about this distinction anymore.)
Restructure functions to use chains instead of composition.
To find the list of points we're going to evaluate, we walk the chain until we hit a function with a domain, and fall back to sampling if we don't find one? I think this works. But what we actually care about is whether the function wants to suggest samples or not. We can figure the domain out from the samples.

So we actually figure out the domain like this:

Walk the chain until we find a function which has samples.
If we find one, those are our input samples. Their extent is our domain.
If we don't find one, pick a default domain (or we can do another pass and have functions guess a domain, some day).
If we have a domain but no samples, use linear samples (or we can walk the chain and ask functions to guess some reasonable samples, some day).
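The steps above could be sketched as follows. This is only an illustration: the dict-based stage representation, the "samples" key, the walk order, and the default domain are all assumptions made for the example.

```python
def pick_samples(chain, default_domain=(0.0, 100.0), count=5):
    """Return (domain, samples) for a chain of stages.

    Each stage is a dict. A stage that wants to suggest samples carries a
    "samples" key (a hypothetical convention used only in this sketch).
    """
    # Walk the chain until we find a stage which has samples.
    for stage in chain:
        samples = stage.get("samples")
        if samples:
            # Those are our input samples, and their extent is our domain.
            return (min(samples), max(samples)), sorted(samples)

    # No stage suggested samples: use a default domain with linear samples.
    lo, hi = default_domain
    step = (hi - lo) / (count - 1)
    return default_domain, [lo + i * step for i in range(count)]


with_data = [{"name": "x"}, {"name": "accumulate", "samples": [300, 100, 200]}]
pure_math = [{"name": "x"}, {"name": "cos"}]

data_domain, data_samples = pick_samples(with_data)
math_domain, math_samples = pick_samples(pure_math)
```

The two "some day" refinements would slot into the fallback branch, where stages could be asked to guess a domain or a better sample distribution.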

The last two steps are refinements, since fact(...) can guess that "last 90 days" is a good domain, and weird functions like atan() could guess that better sample density near the part of the graph where it goes "woosh" gives us a better graph shape.

Now we have samples, and we shove them through the chain and make each function image them. Whatever we get out of the other end is our actual data.

I've sort of headed in this direction by exploring things anyway, and I think this is going to look significantly more robust as an approach.

epriestley mentioned this in D20445: Make chart function argument parsing modular/flexible with 900 pages of error messages. · Apr 18 2019, 11:41 PM

On selecting a domain, you can build a chain like this:

x | shift(1000) | fact(Q)

Then, if fact(Q) supplies samples or a domain, we're out of luck.

We can only "fix" this if shift(1000) can be inverted. I'm not going to worry about this for now since I don't think this chain makes a ton of sense. I think it's only useful for "show today's activity against yesterday's activity", and we should accomplish that by having two different X axes overlaid, not by manually shifting the points around so that "evaluate(April 19)" pops out the datapoint for April 18.
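To make the inversion point concrete, here is a small sketch of what "fixing" the chain would involve if shift() were treated as invertible. The helper name and the sample values are hypothetical:

```python
def inputs_for_fact_samples(fact_samples, shift_offset):
    """Map the positions fact(Q) wants back through shift(offset).

    In x | shift(offset) | fact(Q), fact(Q) sees x + offset, so the input
    that lands on a wanted sample s is s - offset.
    """
    return [sample - shift_offset for sample in fact_samples]


# Hypothetical sample positions suggested by fact(Q).
fact_samples = [5000, 6000, 7000]
inputs = inputs_for_fact_samples(fact_samples, 1000)

# Pushing the inputs forward through shift(1000) recovers the wanted samples.
shifted = [x + 1000 for x in inputs]
```

A stage with no inverse gives us no way to compute `inputs`, which is the "out of luck" case.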