
Surface repository pull logs in the web UI
Closed, Resolved (Public)


See PHI305. A cluster install is interested in expanded access to repository pull logs for auditing.

T12611 is a related operational issue that may make sense to fix here.

T4842 is somewhat adjacent.

Since exporting logs to CSV, text, or JSON is probably also desirable, there's some argument for pursuing T5954 here, maybe. This is probably not very difficult, although it may be the difference between 2 hours of work and 2 days of work.

Event Timeline

epriestley created this task.
epriestley added a revision: Restricted Differential Revision. (Jan 23 2018, 12:19 AM)

@20after4, heads up that we're going to start exposing pull logs, which contain remote addresses in the upstream. The visibility policy will likely be similar to push logs.

You can configure the retention policy by adjusting the diffusion.pull garbage collector (default: 30 days), stop remote addresses from being recorded by tweaking DiffusionSSHWorkflow->newPullEvent(), or mangle data in the UI once whatever I'm going to do here lands.

As some general commentary on this:

I've historically been resistant to features that look or feel like user tracking since I think they're generally "creepy". See D3620 and D3304 for some discussion, where the proposed feature was "Users Who Have Looked at This Revision: alice, bailey".

I think I'm fairly far on the privacy side of this, and most users are generally comfortable with features that are over the line into "creepy" territory for me, like read receipts in Facebook Messenger and iMessage, "Currently Viewing this Document" in Google Docs and Dropbox (which also shows "last viewed time" for each user), presence in Slack, etc. However, I don't think my viewpoint is completely extreme, and D3304 links to a case where users pushed back on Quora in 2012 for launching a "who looked at this" feature. Facebook also shows, e.g., "Someone is typing a message..." on comment threads without identifying the user; I suspect this might not be purely an implementation constraint and is at least somewhat motivated by not wanting the feature to feel too creepy.

At the same time, we retain access logs and I don't think anyone finds access logs "creepy" (although they might want a site to have a clearly documented retention policy for logs or whatever else). WMF is an organization fairly far on the privacy side of this, with rules around retaining and displaying user IP addresses, but no expectation that servers won't retain access logs.

Beyond general HTTP access logs, we've supported application access logs (log.access.path; log.ssh.path) for a number of years. Broadly, these are fancier versions of an Apache access log.

We also explicitly log views in one case today: when you expose the plaintext for a stored credential.

We've retained and displayed push logs for a long time without complaints about creepiness, but pushes are writes and your name is attached to them so it's generally unsurprising that this data is available (i.e., the same data is available in git log, broadly). We've also retained -- but not displayed -- pull logs since Jan 2016, although the motivation was originally diagnostic. These logs are materially valuable in diagnosing pull failures, especially in the cluster, so there's a good technical reason to record and retain them.

In PHI305 there is a reasonable (although not especially strong) desire to expose these logs for auditing/security purposes (e.g., identifying credential disclosure or improper access). I think there are other ways we could tackle some aspects of this (binding sessions or credentials to remote addresses, for example), but none can completely serve the broad role that an access log can.

After D18914, we will now display pull logs to all users, which means anyone with access to a repository can see which users have pulled from it and when.

For me, this doesn't cross the grey area into "creepy" territory, although I can't define a bright-line rule about what is or isn't creepy here and I think there's room for disagreement. That is, to me, the spectrum looks something like this:

Definitely Not Creepy: Showing other users your committed writes, like edits to tasks and push logs.
A Teeny Tiny Bit Creepy: Retaining access logs for operational reasons; showing other users your credential plaintext disclosures.
A Bit More Creepy: Showing other users pull logs.
----- Dividing Line of Acceptable Application Behavior Today? -----
Kinda Creepy: Showing other users your uncommitted writes (like draft comments, "typing a message...").
Sorta Creepy: Showing other users your general activity (active vs AFK).
Pretty Creepy: Showing other users your web views on objects; showing other non-admin users your access logs.
Really Creepy: Showing other users your physical location; hooks to notify other users when you look at stuff.
Extremely Creepy: Showing other users your webcam video stream; using CSS :visited selector disclosure bugs to detect your browser history and publishing it; pulling your Facebook private messages over OAuth and making them public, etc.

I think most users would probably be comfortable if the default settings moved the line down a couple of slots, and comfortable moving it even further if they opt in via settings. I generally think software users are becoming more comfortable with the line moving down the list. I think this trend is a little worrying in the abstract, and I'd like to keep the line as high as possible and make technical choices that advocate for users with more conservative viewpoints on privacy. That said, I think we're probably on safe ground shipping visible pull logs, even though they push Phabricator's feature set down the list.

(I also don't rule out pushing the line down in the future, I just want to put more thought into it than "Google/Facebook/Dropbox do it so it must be OK".)

epriestley added a commit: Restricted Diffusion Commit. (Jan 23 2018, 9:44 PM)

T13049 is a followup about generalizing exporters.

T12611 is desirable, but survives this task.

epriestley renamed this task from "Improve repository pull logs" to "Surface repository pull logs in the web UI". (Jan 26 2018, 8:44 PM)