Metrics: number of active Maniphest users over time
Closed, Duplicate, Public

Description

Let's start tackling T6041: Metrics for Maniphest with a very basic data point that most open projects will be interested in knowing:

How many active users do we have in Maniphest over time?

One way to calculate this could be: the number of registered users performing any action in Maniphest tasks (creating, assigning, prioritizing, resolving, commenting, subscribing, awarding) in a given month. Then we could use Facts to show a graph of the evolution of this number on a daily basis.
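To make the definition concrete, here is a minimal sketch in Python. It assumes that "in a given month, on a daily basis" means a trailing window recomputed for every day, which is one possible reading. The function name `rolling_active_users`, the `(day, user)` input shape, and the sample users are hypothetical; nothing here uses Facts or any Phabricator API.

```python
from datetime import date, timedelta


def rolling_active_users(actions, window_days=30):
    """For each day, count distinct users with any action in the trailing window.

    `actions` is an iterable of (day, user) pairs, where `day` is a
    datetime.date. The window covers `day` and the (window_days - 1)
    days before it. The input shape is an assumption for illustration.
    """
    users_by_day = {}
    for day, user in actions:
        users_by_day.setdefault(day, set()).add(user)
    if not users_by_day:
        return []

    first, last = min(users_by_day), max(users_by_day)
    series = []
    day = first
    while day <= last:
        # Union the sets of users seen on each day inside the window.
        active = set()
        for offset in range(window_days):
            active |= users_by_day.get(day - timedelta(days=offset), set())
        series.append((day, len(active)))
        day += timedelta(days=1)
    return series


# Hypothetical sample actions, using a 2-day window to keep it readable.
sample_actions = [
    (date(2014, 9, 1), "alice"),
    (date(2014, 9, 2), "bob"),
    (date(2014, 9, 2), "alice"),
    (date(2014, 9, 4), "carol"),
]
for day, count in rolling_active_users(sample_actions, window_days=2):
    print(day.isoformat(), count)
```

Any action type counts equally here; restricting the metric to particular actions would mean filtering the input before calling the function.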

How complex would it be to build this?

Event Timeline

qgil raised the priority of this task from to Needs Triage.
qgil updated the task description.
qgil added projects: Wikimedia, Facts, Maniphest.
qgil added subscribers: qgil, cburroughs, g-p-g and 2 others.

Almost all of the planned work here is in building general infrastructure. We don't plan to build charts, we're going to build tools that let you build charts. The upstream goal here is to avoid a dynamic where installs ask us for everything they want a chart of and we're stuck maintaining a large number of hard-coded ad-hoc charts. Assuming we do a good job of building tooling, we'll end up in a state that's better for everyone: you'll be able to build your own charts (by selecting what data to chart, configuring the chart, and then embedding the chart in other objects), and we will just be maintaining a relatively small and simple piece of infrastructure, which is actually large and complex, but simple when compared to the cost of maintaining hundreds of ad-hoc charts.

If we didn't plan to attack the problem like this, we could hard-code this specific chart very quickly using a raw SQL query -- it would take maybe an hour to have working, and probably less. But, over time, the cost to us to build all of the other charts you want and everyone else wants, and then maintain them, and deal with scaling problems, and then be required to proceed more carefully when making infrastructure changes elsewhere because we have a lot of fragile hard-coded chart queries against the raw SQL -- would be very large. We don't want to pay an hour per chart up front and then several hours per chart over the chart's lifetime: we want to pay as little per chart as possible, because we anticipate that users will want a very large number of different charts.

Because our plan is to build a tool, not to build specific charts, the upfront cost to get the first chart is large, essentially independent of what that chart is. Even if you want a straight line graph of time-over-time, we still need to make the initial tool investment. Assuming we do a good job of building the tool, chart generation afterward will be primarily self-service, with occasional upstream involvement to add new general-purpose features (e.g., support for pie charts or exporting raw data to Excel or whatever else).

Pulling this data against the raw SQL is very easy. Here's a query which can pull the raw data, and the data for this install:

mysql> SELECT DATE_FORMAT(FROM_UNIXTIME(dateCreated), '%Y-%m') AS period, COUNT(DISTINCT authorPHID) AS N FROM maniphest_transaction GROUP BY period ORDER BY period;
+---------+-----+
| period  | N   |
+---------+-----+
| 2011-02 |  10 |
| 2011-03 |   5 |
| 2011-04 |  27 |
| 2011-05 |  29 |
| 2011-06 |  45 |
| 2011-07 |  57 |
| 2011-08 |  66 |
| 2011-09 |  42 |
| 2011-10 |  45 |
| 2011-11 |  38 |
| 2011-12 |  46 |
| 2012-01 |  54 |
| 2012-02 |  49 |
| 2012-03 |  66 |
| 2012-04 |  71 |
| 2012-05 |  86 |
| 2012-06 | 108 |
| 2012-07 | 123 |
| 2012-08 | 138 |
| 2012-09 |  91 |
| 2012-10 | 138 |
| 2012-11 | 101 |
| 2012-12 |  83 |
| 2013-01 | 133 |
| 2013-02 | 138 |
| 2013-03 | 138 |
| 2013-04 | 157 |
| 2013-05 | 149 |
| 2013-06 | 120 |
| 2013-07 | 109 |
| 2013-08 | 134 |
| 2013-09 | 100 |
| 2013-10 | 107 |
| 2013-11 | 106 |
| 2013-12 |  84 |
| 2014-01 |  81 |
| 2014-02 |  96 |
| 2014-03 | 128 |
| 2014-04 | 154 |
| 2014-05 | 187 |
| 2014-06 | 160 |
| 2014-07 | 196 |
| 2014-08 | 193 |
| 2014-09 |  76 |
+---------+-----+
44 rows in set (0.29 sec)
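For readers who would rather post-process exported rows than query MySQL directly, the same aggregation can be sketched in Python. This is only an illustration, assuming the rows are available as `(dateCreated, authorPHID)` pairs with `dateCreated` as a Unix timestamp in seconds. The function name `monthly_active_users` and the sample PHIDs are hypothetical, and months are bucketed in UTC, whereas `FROM_UNIXTIME` uses the MySQL session time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_active_users(transactions):
    """Count distinct authors per calendar month.

    `transactions` is an iterable of (date_created, author_phid) pairs,
    where date_created is a Unix timestamp in seconds. Months are
    bucketed in UTC (an assumption; MySQL's FROM_UNIXTIME uses the
    session time zone instead).
    """
    authors_by_period = defaultdict(set)
    for date_created, author_phid in transactions:
        moment = datetime.fromtimestamp(date_created, tz=timezone.utc)
        authors_by_period[moment.strftime("%Y-%m")].add(author_phid)
    # Sort by period, mirroring ORDER BY period in the query.
    return [
        (period, len(authors))
        for period, authors in sorted(authors_by_period.items())
    ]


# Hypothetical sample rows, not data from this install.
sample = [
    (1296518400, "PHID-USER-a"),  # 2011-02-01 00:00:00 UTC
    (1296604800, "PHID-USER-a"),  # 2011-02-02 00:00:00 UTC
    (1296604800, "PHID-USER-b"),  # 2011-02-02 00:00:00 UTC
    (1298937600, "PHID-USER-a"),  # 2011-03-01 00:00:00 UTC
]
for period, n in monthly_active_users(sample):
    print(period, n)
```

Using a set per period plays the role of `COUNT(DISTINCT authorPHID)`: a user who acts many times in one month is counted once.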

Understood. Since I cannot help code the tools that let us build charts, I thought that defining specific metrics could help you code them. However, I don't want to create tasks that don't belong here. Just tell me if this feedback is useful, and how you prefer to get it. No problem if the answer is 'no thanks'. :)

Examples of use are always helpful.

btrahan added a subscriber: btrahan.

I merged this back into the parent task T6041 (which is really just use cases motivating T1562 and not directly actionable).