
Expose the ability to add some JS to every page
Closed, WontfixPublic

Assigned To
Authored By
jcox
Jun 28 2017, 7:42 PM
Referenced Files
F5022038: image.png
Jun 28 2017, 7:55 PM
F5022021: tumblr_inline_op017kGTqT1teupc5_540.gif
Jun 28 2017, 7:50 PM
F5021943: image.png
Jun 28 2017, 7:42 PM

Description

I'd like to add some tracking JavaScript at the top of every page (a la Google Analytics or Piwik). This is either roughly or explicitly related to T6621, T6914, Q571, and T4213. It seems the current suggested way to do this is to maintain a fork of Phabricator that directly modifies either PhabricatorStandardPageView.php or PhabricatorBarePageView.php. We don't have a fork of Phabricator and have instead opted to extend it when we need additional functionality that doesn't exist in the upstream. I'd like to continue doing that if possible, so I see a few options:

  1. I could add a config option called html.in-the-head-of-every-page which lets administrators insert arbitrary HTML. PhabricatorStandardPageView would then read that config value and insert it at the top of every page. This has obvious security concerns, as it would allow a malicious administrator to insert mean code at the top of every page and steal everyone's secrets.
  2. I could create an abstract PhabricatorThirdPartyAnalytics class (or something of that ilk) in the upstream that exposes a getTrackingHTML() function. Then PhabricatorStandardPageView would issue a PhutilClassMapQuery and insert the tracking HTML for each concrete subclass (which we'd write downstream). This would be safer than the config option, as it would require server access to edit the JavaScript.
  3. I could build out a more third-party-aware Tracking/Analytics application that exposes config options like google.analytics-key or piwik.application-key and then just knows what to do with that information for each application. This seems like a pretty intractable approach since Google tells me there are ~4.7 million client-side analytics libraries:

image.png (149×315 px, 12 KB)

  4. I could just get over it and fork Phabricator for that one little change, then pretend that the fork doesn't exist the next time someone asks me to make horrific changes to Phab. Then we'd likely want to just close this task as a duplicate of T4213.

I'm partial to approach #2 if that's something you'd be amenable to accepting upstream.
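
For illustration only, here is the shape of approach #2 sketched in Python rather than PHP. Every name in it (ThirdPartyAnalytics, concrete_subclasses, SelfHostedTracker, render_head, the example.com URI) is hypothetical, and the subclass walk is only a rough analogue of what a PhutilClassMapQuery would do. It is not Phabricator code.

```python
from abc import ABC, abstractmethod
from html import escape


class ThirdPartyAnalytics(ABC):
    """Hypothetical stand-in for the proposed abstract upstream class."""

    @abstractmethod
    def get_tracking_html(self) -> str:
        """Return the HTML snippet to place in the page head."""


def concrete_subclasses(base):
    """Recursively collect non-abstract subclasses of `base`.

    Assumption: this only mimics the idea of a class-map query; the real
    lookup in Phabricator works differently.
    """
    found = []
    for sub in base.__subclasses__():
        if not getattr(sub, "__abstractmethods__", None):
            found.append(sub)
        found.extend(concrete_subclasses(sub))
    return found


class SelfHostedTracker(ThirdPartyAnalytics):
    """Hypothetical downstream subclass; the script URI is a placeholder."""

    script_uri = "https://analytics.example.com/tracker.js"

    def get_tracking_html(self) -> str:
        return '<script async src="{}"></script>'.format(
            escape(self.script_uri, quote=True)
        )


def render_head(title):
    """Build a page head and append tracking HTML from every subclass."""
    parts = ["<title>{}</title>".format(escape(title))]
    subclasses = concrete_subclasses(ThirdPartyAnalytics)
    for cls in sorted(subclasses, key=lambda c: c.__name__):
        parts.append(cls().get_tracking_html())
    return "<head>" + "".join(parts) + "</head>"
```

The point of the design is that the page view never names a tracker: the downstream install drops in a subclass, and editing the emitted script requires changing code on the server rather than a config value.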

Event Timeline

Why do you want to track users with Google Analytics or a similar library?

I think the paths forward in the upstream are probably:

  • (5) We provide tools for doing whatever it is that you're trying to do.
  • (6) There are some use cases for modularizing Javascript in a similar way to how PHP is modularized, like adding new types of PHUIX control. Today, I believe these use cases are fairly weak, but that they'll eventually add up to building a system similar to the PHP system. This might be adjacent to (2) in some sense, at least eventually. But this is like three degrees of tentativeness removed from this actual feature request.

(Or just do (4).)

Why do you want to track users with Google Analytics or a similar library?

The general answer is "to better understand how our users are using Phab". For example, one thing we want to look at is how people review code, with the goal of making the code review process more efficient and less burdensome. Another thing I'm curious about is what our most common task-creation workflow is. Are people linked directly to a form, do they go to Maniphest and hit "Create Task", do bots create most of our tasks, etc.?

I know this is vague. As I was typing this up none of the answers felt super satisfactory to me. I'll probably just go with (4) for the time being unless I can think of more convincing upstream use-cases.

epriestley claimed this task.

I think you probably won't be able to answer most of those questions with Google Analytics. For example, bots will never hit client-side analytics, so any question about bots probably can't be served by GA. Likewise, GA can't see Conduit/API activity.

You can answer at least some of these questions by examining the schema today, or by using future tools (T1562) to make it easier to extract information from the schema. See also T12177 for Differential, specifically, although that's less about workflow.

For bot questions, you can look at whether the author of each task is a bot or not.

For some workflow questions, you can look at the content source of the first transaction (e.g., was the task created via email, Conduit, or the web?).
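
As a rough sketch of the two checks above, assume the relevant rows have already been exported into plain records. The field names author_is_bot and first_source are made up for this example and are not the real schema.

```python
from collections import Counter


def summarize_task_creation(tasks):
    """Tally how tasks were created.

    Each task is a dict with two hypothetical fields:
      author_is_bot: bool, whether the authoring account is a bot
      first_source:  str, content source of the task's first transaction
    """
    by_source = Counter(task["first_source"] for task in tasks)
    bot_count = sum(1 for task in tasks if task["author_is_bot"])
    bot_share = bot_count / len(tasks) if tasks else 0.0
    return {"by_source": dict(by_source), "bot_share": bot_share}


# Made-up sample data, purely for illustration.
sample_tasks = [
    {"author_is_bot": False, "first_source": "web"},
    {"author_is_bot": True, "first_source": "conduit"},
    {"author_is_bot": True, "first_source": "conduit"},
    {"author_is_bot": False, "first_source": "email"},
]
summary = summarize_task_creation(sample_tasks)
```

Unlike client-side analytics, this kind of tally sees bot and API activity, because it reads stored records rather than page views in a browser.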

Some usage questions can be answered with Multimeter (https://secure.phabricator.com/multimeter/) but this tool is aimed more at performance and resource utilization than user behavior. You could conceivably set the sample rate to 1, although I suspect this tool is generally not very useful for undirected exploration of data.

I think we're amenable to recording which form a task was created with in the transaction log, e.g. as metadata on the "Create" transaction. There's been some discussion of using this to drive UI behavior in the past, but we ended up with a slightly different model instead (the "Subtypes" model) which feels a little better to me.


Putting GA on Phabricator lets Google read all your data (and potentially write it too, including writing to repositories, if they're clever enough in mounting an attack), and the value of this data isn't clear, so this isn't something I want to encourage in the upstream. From a "pure security correctness" point of view I'd specifically discourage it if, say, your company happens to be involved in ongoing litigation with Google worth billions of trillions of dollars.

Purely as an anecdote, we deployed active Javascript countermeasures -- URIs which selectively served an attack script vs an inactive script based on details of the request -- at Facebook in ~2007-2008. The goal was to take down phishing sites rather than conduct corporate espionage, but I also didn't have an opportunity to use it for espionage at the time.

If you do use GA and derive significant classes of insight from it we'd be interested in hearing about your experience, but if GA proves to be a bottomless font of useful insight we'd likely write a "Spy on Your Users" application, not provide support for third-party tracking.

I think GA's normalization will also be bad for this purpose, just in different somewhat-less-bad-overall ways from the Multimeter normalization -- for example, it can't automatically normalize /D123 and /D124 by controller, but Multimeter can.
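
To make the /D123 vs /D124 point concrete, here is a minimal sketch of grouping paths by the kind of page they hit. The ROUTES table and its labels are invented for the example. It is neither Phabricator's routing nor how Multimeter actually normalizes.

```python
import re
from collections import Counter

# Hypothetical route table: pattern for a path, and the label to group it under.
ROUTES = [
    (re.compile(r"^/D\d+$"), "/D<id>"),  # Differential revisions
    (re.compile(r"^/T\d+$"), "/T<id>"),  # Maniphest tasks
]


def normalize_path(path):
    """Map a concrete path to its group label, or return it unchanged."""
    for pattern, label in ROUTES:
        if pattern.match(path):
            return label
    return path


def count_views(paths):
    """Count page views after collapsing per-object paths into groups."""
    return Counter(normalize_path(p) for p in paths)


views = count_views(["/D123", "/D124", "/T6621", "/multimeter/"])
```

Without a step like this, every revision shows up as its own page and the per-page counts say little about how a workflow is used overall.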

I guess everyone uses GMail nowadays anyway (as mandated by international galactic law) so my paranoia is probably mostly moot because they can undetectably read all email communication anyway.

Haha, to be clear, we're using an open-source, self-hosted tracking solution; I was just throwing GA out there as an example. But I do understand your point vis-a-vis data security.

I'll let you know if we get any useful insight from these tools.