Page MenuHomePhabricator

Simplify implementation of "pure" Chart functions
ClosedPublic

Authored by epriestley on Apr 17 2019, 10:08 PM.
Tags
None
Referenced Files
F14633775: D20446.diff
Fri, Jan 10, 9:13 PM
F14621475: D20446.id48830.diff
Fri, Jan 10, 6:56 AM
F14620926: D20446.diff
Fri, Jan 10, 5:45 AM
Unknown Object (File)
Fri, Jan 3, 8:49 PM
Unknown Object (File)
Sun, Dec 15, 5:16 PM
Unknown Object (File)
Dec 11 2024, 3:56 AM
Unknown Object (File)
Nov 28 2024, 10:31 AM
Unknown Object (File)
Nov 27 2024, 4:56 PM
Subscribers

Details

Summary

Depends on D20445. Ref T13279. I'm not sure what the class tree of functions actually looks like, and I suspect it isn't really a tree, so I'm hesitant to start subclassing. Instead, try adding some isSomethingSomething() methods.

We have some different types of functions:

  1. Some functions can be evaluated anywhere, like "constant(3)", which always evaluates to 3.
  2. Some functions can't be evaluated anywhere, but have values everywhere in some domain. This is most interesting functions, like "number of open tasks". These functions also usually have a distinct set of interesting points, and are constant between those points (any count of anything, like "open points in project" or "tasks closed by alice", etc).
  3. Some functions can be evaluated almost nowhere and have only discrete values. This is most of the data we actually store, which is just "+1" when a task is opened and "-1" when a task is closed.

Soon, I'd like to be able to show ("all tasks" - "open tasks") and draw a chart of closed tasks. This is somewhat tricky because the two datasets are of the second class of function (straight lines connecting dots) but their "interesting" x values won't be the same (users don't open and close tasks every second, or at the same time).

The "subtract X Y" function will need to be able to know that subtract "all tasks" 3 and subtract "all tasks" "closed tasks" evaluate slightly differently.

To make this worse, the data we actually store is of the third class of function (just the "derivative" of the line chart), then we accumulate it in the application after we pull it out of the database. So the code will need to know that subtract "derivative of all tasks" "derivative of closed tasks" is meaningless, or the UI needs to make that clear, or it needs to interpret it to mean "accumulate the derivative into a line first".

Anyway, I'll sort that out in future changes. For now, simplify the easy case of functions in class (1), where they're just actual functions.

Add "shift(function, number)" and "scale(function, number)". These are probably like "mul" and "add" but they can't take two functions -- the second value must always be a constant. Maybe these will go away in the future and become add(function, constant(3)) or something?

Test Plan

Screen Shot 2019-04-17 at 2.56.58 PM.png (1×1 px, 520 KB)

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Mostly thinking out loud and this probably doesn't make much sense, but:

subtract "derivative of all tasks" "derivative of closed tasks" is meaningless

This isn't exactly meaningless, I think, it's just a weird operation.

That is, we currently store this data:

All Tasks
10:00 AM  +1
10:03 AM  +1
10:32 AM  +1
...

...that is, a single "+1" every time the value increased. This is basically the derivative of the actual "all tasks" line users would expect.

Right now, we apply a function like accumulate(that-data) to draw the line, which approximately takes the integral of the dataset (by adding all the numbers up) and turns that into a new dataset.

Then we connect the dots in a vaguely misleading way: if there were 5 tasks at 10AM and 6 tasks at 11AM, we draw a graph which shows there were 5.5 tasks at 10:30 AM. There weren't -- the number of tasks was 5 until 10:59:59.9999, then it jumped to 6, and there's a discontinuity there. But this is a convenient fiction which makes the data easier to read visually.

To subtract two derivative sets, we just invert all the values in the second one, then combine the points, summing the values where both are defined. We can then accumulate that set to get the line we want -- that is, these operations are equivalent:

add(accumulate(all-tasks), accumulate(closed-tasks))
accumulate(add(all-tasks, closed-tasks))

...assuming add() knows that it needs different behavior when the functions are derivative functions vs integrated functions.

So -- what if one function has been accumulated and the other one has not?

I think we can mostly use the same logic, we just need to know if the function is defined everywhere or not. If it's defined everywhere, we treat its value everywhere as the value of the last discrete point. If it's defined almost nowhere, we only process distinct values. Then an accumluate anywhere works like we expect it to?

So we're still left with the three classes of functions: continuous and defined everywhere ("constant(3)"), discontinuous and defined everywhere (on some domain) ("accumulate(open tasks)"), and discontinuous and defined almost nowhere / discrete points (raw "open tasks").

The first class is now "canEvaluateFunction()". The second class is like "isDefinedEverywhereInDomain()"? That's pretty wordy and unclear so I'm inclined to label the third class as "isRawFunction()" or "isDatapointFunction()" or "isImpulseFunction()" or something, instead, I guess? "Impulse" sounds super cool.

That screenshot produces quite the intense optical effect. I thought I was looking at moiré effect from a cellphone camera.

I'm sure you mean an intense productivity effect, optimizing your leverage to achieve key performance results.

src/applications/fact/chart/PhabricatorShiftChartFunction.php
26

This should be "+", fixed elsewhere.

amckinley added inline comments.
src/applications/fact/chart/PhabricatorChartFunction.php
111–113

Maybe make this take an array of x values to evaluate at? I know function overhead is a silly thing to complain about, but I bet 2000 calls vs 1 would show up on a profiler, and I bet some functions that have to spin up a lot of additional state could be implemented less painfully by evaluating every point in a single function call. (This is assuming that getDatapoints below is the only caller of evaluateFunction and there's no other caller that would want piecemeal results).

src/applications/fact/chart/PhabricatorChartFunctionArgumentParser.php
185

I'm honestly not sure if the correct language here should be "function" or "functions". Got an AP style book handy?

185

#literallyunusable

src/applications/fact/chart/PhabricatorCosChartFunction.php
4

Whoa, watch out, Mathematica!

src/applications/fact/controller/PhabricatorFactChartController.php
35–51

parrot

This revision is now accepted and ready to land.Apr 19 2019, 12:07 AM
src/applications/fact/chart/PhabricatorChartFunction.php
111–113

Yeah, the next change looks like it will make this a vector function.

(accumulate() was very hard to implement in a sane way if the function only took $x.)

This revision was automatically updated to reflect the committed changes.