Scramble file secrets when attached objects change their view policies
Closed, ResolvedPublic

Description

  • Prior to T5685, files were protected by unguessable secrets in URIs.
  • This is the standard method used by Facebook, Google, etc. However, it means that knowing a URI allows you to view the file data, and created a hypothetical risk where leaking, say, an access log might also leak valid URIs.
  • This was unacceptable to WMF, and in T5685 we changed this behavior to issue and require one-time use tokens to view files.
  • Uncachable images which require a one-time-token handshake to view cause performance problems essentially everywhere that we load file content.
    • UIs where the user might want to page back and forth between images (like Pholio and Lightboxes) are particularly crippled.

Some particular issues:

  • You can no longer share files by sharing the URLs (T6007).
  • Lightboxes and Pholio suffer crippling performance penalties (T9377, T8359).
  • All images in all applications are negatively impacted (T10261).
  • As of ~6 months ago, Safari flips out on thumbnails in comment previews and they're reduced to flickery messes.
  • Images can not be referenced in email (T9896).

Related Objects

epriestley renamed this task from Paranoid file security from WMF causes nearly universal performance problems to Paranoid file URL security mode causes nearly universal performance problems.
epriestley added subscribers: joshuaspence, Krinkle, nornagon.
epriestley added subscribers: liuxinyu970226, revi, chad and 2 others.
epriestley added a subscriber: swisspol.
epriestley added a subscriber: avivey.
epriestley updated the task description. (Show Details)Feb 2 2016, 1:18 AM
epriestley updated the task description. (Show Details)
avivey added a project: Restricted Project.Feb 2 2016, 1:39 AM

FWIW I'd be happy to trade performance for security by obscurity aka Facebook model etc... when it comes to files stored in Phabricator - because right now on Phacility, the current approach makes wiki pages with numerous image files feel slow since images are not cached at all. And we don't have anything that sensitive. Maybe a config setting could allow this?

tgr added a subscriber: tgr.Feb 10 2016, 3:04 AM

Preventing images from being cached in public proxies/CDNs is a reasonable security precaution. I'm not sure what benefit Phabricator's URL randomization adds though. the standard approach would be to stream files through the application and require an authenticated session (with the appropriate permissions), and set something like Cache-control: private so that the browser will cache it but upstream proxies won't. That would fix all the issues enumerated here (except maybe the Safari one - not sure how that's related) while still keep the files unleakable. The only performance impact that would remain is that files cannot be cached in a CDN and they need to be streamed through a lightweight PHP process - that's nontrivial but not huge.

There are several reasons we don't do that:

  • I believe Cache-control: private may not be respected on some nontrivial subset of corporate networks.
  • I don't think Cache-control: private is a standard approach, and can't find any evidence that it is.
  • A secondary goal of the security scheme as it currently exists is to make it obvious to users that the scheme works. That is, showing users errors is explicitly desirable so they trust that the scheme is secure. This is an independent goal from the scheme actually being secure.
  • We serve arbitrary user content, including arbitrary HTML files, SVG files, SWF files, etc, which are unsafe to serve from a domain which also has (meaningful) cookies. I am absolutely not willing to adopt blacklist-based approaches for enumerating all possible bad content, and unwilling to put any sort of meaningful authentication token on the user content domain in a cookie because this has been the source of many, many web security vulnerabilities over the last decade.

Particularly, on this last point, the approach you recommend has historically been the source of many, many dangerous and insecure behaviors, even recently with modern browsers. For example, see homakov's "cookie tossing" on GitHub pages in 2013:

http://homakov.blogspot.com/2013/03/hacking-github-with-webkit.html

Prior to that, Content-Disposition bypass in iOS until 2011:

https://support.apple.com/en-us/HT202349

Nothing has fundamentally changed about the browser security environment, and there is no reason to believe this strategy is any less dangerous today than it was in the past.


Cache-Control: private

Do you have empirical evidence to suggest that Cache-control: private is actually respected in the wild?

This was a number of years ago, but I caused a very bad privacy issue in 2007 at Facebook by misconfiguring HTTP headers. My recollection is that Cache-control: private was empirically not respected by many proxies on corporate networks, leading to coworkers seeing each others' content. I don't know whether this was proxies misinterpreting "private" or misconfiguration by administrators, but the issue seemed prevalent.

Broadly, the internet does not work and is broken in obvious ways if you ignore "Expires", but (as far as I know) everything works fine if you implement "Cache-Control: private" like "Cache-Control: public" on your corporate network appliance, so I don't think there's any real reason to believe this header actually works. From a quick search, here's a recent case of someone seeming to run into this issue recently:

http://www.rayofsolaris.net/blog/2015/incapsula-does-not-obey-cache-control-private/

(We model your coworkers as a threat, and they are most likely to have access to side-channel attacks which allow them to discover logs or observe URLs.)


Standard Approach

You say this is "the standard approach" -- why do you say that? What software uses this approach? Here are the approaches I can find in the wild:

ProviderApproach
FacebookUnique URL Hash + Fully Cacheable Resources
GoogleUnique URL Hash + Fully Cacheable Resources
TrelloUnique URL Hash + Fully Cacheable Resources
GitHubUnique URL Hash + Fully Cacheable Resources
BitBucketUnique URL Hash + Fully Cacheable Resources
AsanaUnique URL Hash + Fully Cacheable Resources (5 Minute Expiry)
Pivotal TrackerUnique URL Hash + Fully Cacheable Resources (30 Minute Expiry)
JIRACookies + Cache-Control: private + Same Domain
RedmineCookies + Cache-Control: private + Same Domain
BugzillaAuthenticated, but fully cacheable???

To me, it seems like a unique hash with fully cacheable resources is the most common approach by a large margin.


Obviousness to Users

Broadly, a primary goal of Phabricator's scheme from T5685 is to make the security behavior obvious to users.

No one has come up with a practical attack against unique URL hashes that doesn't require an existing high level of access (e.g., compromise of logs, network/proxy compromise) because there isn't one, which is why everything uses it, but it's uncomfortable for users to think that access to resources is protected by a large random hash in the URL instead of a large random hash in their cookies, even though the cookies are probably far more vulnerable in most situations.

Part of the implementation goal in T5685 is literally to make security errors painfully common so that users are more confident that their data is safe.

This is not a security goal, it's a "security theater" goal.


Arbitrary User Content

We serve arbitrary user content. I consider serving arbitrary content to be important to normal software development processes (for example, you should be able to upload or download a .html file, .svg file or .swf file from a user upload or from a commit, and get the exact same file downloaded that was originally uploaded).

The strategy used by JIRA, Redmine and Bugzilla (at least, by default) is vulnerable, or was vulnerable in the past, to various attacks involving UAs that ignore "Content-Disposition", flash text-bytecode attacks, Java applet attacks, and long ago to content-sniffing attacks, and fundamentally relies heavily on enumerating and blacklisting bad content. Although browsers and user agents are less vulnerable to this class of attacks today than they were in the past, this class of attacks has been the cause of vulnerabilities (and browser countermeasures) for years.

Two requirements I'm not willing to yield on are:

  • Content MUST be served from a user content domain, not the primary domain.
  • The user content domain MUST NOT have cookies which allow any kind of meaningful privilege escalation. It should be safe to execute an arbitrary .html file on the use content domain without risk of XSS, CSRF, etc (less severe issues like content injection / phishing / redirection are unavoidable and acceptable). We'll still try not to execute arbitrary HTML files, of course, but the failure of this blacklisting should not be a meaningful compromise.
eadler added a subscriber: eadler.Mar 3 2016, 1:35 AM

https://www.w3.org/TR/capability-urls/ is relevant - Good Practices for Capability URLs.

cache-control: private is defined in rfc7234 (which obsoletes rfc2616) but I do not know if it is commonly respected.

Naive question, but would it be possible to have a new "disable security theater" setting and have Phab then do "Unique URL Hash + Fully Cacheable Resources" like everyone else?

I'm strongly resistant to that, see T8227.

Understood. I'm of the very same opinion when it comes to adding settings / preferences to a product. It shouldn't be a way to avoid taking a decision or pushing the problem to the user.

What's the alternative here though? If I'm reading this correctly, some people want "security theater" (they have their reasons, that's fine), but now, in the very words of this task, this causes "nearly universal performance problems".

Sounds like a deadlock: either someone has to give in (perceived sense of security OR perceived UI responsiveness), or a setting is needed to satisfy everyone, no?

chad added a comment.Mar 15 2016, 4:21 PM

I imagine we also want to just remove the feature, but some install(s) rely on it? The point of this task then is to be a discussion around removing it outright, and giving those who rely on it time for input.

cduruk added a subscriber: cduruk.Mar 15 2016, 4:25 PM

In some cases, the effects of the theater can be mitigated issue-by-issue. For example, we may be able to write caching/rendering logic in Javascript to mitigate the performance impact on lightboxes and Pholio. We may be able to attach images directly to email. There may be a workaround for the Safari issue, or we may be able to do preview rendering partially in Javascript. With Quicksand, we can also increase the "hit rate" of client-side Javascript caches, at the cost of additional complexity.

There is probably little we can do about the general case (e.g., Phriction) or sharable URLs, although maybe solutions or workarounds will present themselves as options are evaluated for other issues. We could provide an explicit "generate a shareable URL" action, for example.

We may also be able to find data about how broadly "Cache-Control: private" is respected, and use that to inform a decision on using it, which could improve some of these behaviors.

We may be able to find an approach to remove this from the upstream without breaking installs or requiring us to maintain options by modularizing this part of the stack.

All else being equal, I would prefer to remove this code outright. This would be particularly problematic for one install, but user confusion about the security model is not limited to one install. For example, we received multiple HackerOne reports about this being a vulnerability before T5685. The hash + cacheable model defies user expectation.

Naive question, but would it be possible to have a new "disable security theater" setting and have Phab then do "Unique URL Hash + Fully Cacheable Resources" like everyone else?

"disable security theater" is both appropriate and hilarious.

No to beat on a dead horse, but I just finished creating a new wiki page with ~20 screenshots of mobile UI (in a series of tables to easily compare UIs on different OSes), and the fact images don't cache is a massive pain:

  • whenever editing the markup, even to add a single character, the preview area needs to reload 2+ MBs of screenshots
  • not only is this slow, but the layout of the preview jumps all around while the images are (re)loading, so you can't navigate easily
  • same thing when saving the changes: everything needs to reload again from scratch and layout jumps again all over the place

@epriestley I understand how you're very wary of adding new settings to Phab, but how is the current situation ideal for users? I mean this is 2016: having a decent UX when editing a wiki page with 20 images should be a solved problem 😉 We're on Phacility so we can't patch Phab but otherwise, I would most likely have disabled that "security" behavior somehow.

@chad You said "The point of this task then is to be a discussion around removing it outright, and giving those who rely on it time for input." but are such folks even aware that such a discussion is ongoing? This task doesn't seem active.

eadler added a project: Restricted Project.Mar 28 2016, 8:23 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Mar 28 2016, 8:29 PM
cduruk removed a subscriber: cduruk.Mar 28 2016, 8:35 PM

@csteipp was one of the people who originally advocated for such strict file visibility enforcement. He would be the best one to respond with justification for wanting the feature in the first place.

Personally I think that the less strict Unique/unguessable URL Hash is perfectly good enough.

Here's another big problem (IMHO) I just realized: using Phab on mobile or through tethering on your laptop is terrible: the fact user images don't get cached, kills usability: it takes forever to load / reload anything with user images.

I've really never experienced any poor performance when using phabricator. It's not blazing fast but that is mostly due to the complexity of the server side php bits. I have literally never seen slowness in loading static resources.

I am, however, fortunate enough to have a fast-enough connection (50 megabits per second at home, usually >10mbps on my phone) so I'm not accustomed to using phabricator over a very slow link.

User profile images should be getting cached. For example, your profile image here:

https://p.phcdn.net/file/data/@secure/4vmnhfzvqgco7knqd7pw/PHID-FILE-jju47qlbewk6awzgqfqa/profile-github_-profile.jpg

...is cacheable and CDN'd:

$ curl -v https://p.phcdn.net/file/data/@secure/4vmnhfzvqgco7knqd7pw/PHID-FILE-jju47qlbewk6awzgqfqa/profile-github_-profile.jpg
*   Trying 2400:cb00:2048:1::6812:20ec...
* Connected to p.phcdn.net (2400:cb00:2048:1::6812:20ec) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
* Server certificate: sni26336.cloudflaressl.com
* Server certificate: COMODO ECC Domain Validation Secure Server CA 2
* Server certificate: COMODO ECC Certification Authority
* Server certificate: AddTrust External CA Root
> GET /file/data/@secure/4vmnhfzvqgco7knqd7pw/PHID-FILE-jju47qlbewk6awzgqfqa/profile-github_-profile.jpg HTTP/1.1
> Host: p.phcdn.net
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: cloudflare-nginx
< Date: Thu, 31 Mar 2016 17:25:40 GMT
< Content-Type: image/jpeg
< Expires: Fri, 31 Mar 2017 17:25:40 GMT
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
...
< CF-Cache-Status: HIT
< Cache-Control: public, max-age=31536000
...

This should be true on Phacility instances as well. Here's my profile picture on meta.phacility.com:

https://d3cn662t4iw3q4.cloudfront.net/file/data/@meta/nzwdv5dkkwhdse2cspak/PHID-FILE-czwrzidp6jgarqopyezh/profile

$ curl -v https://d3cn662t4iw3q4.cloudfront.net/file/data/@meta/nzwdv5dkkwhdse2cspak/PHID-FILE-czwrzidp6jgarqopyezh/profile
*   Trying 54.230.144.140...
* Connected to d3cn662t4iw3q4.cloudfront.net (54.230.144.140) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
* Server certificate: *.cloudfront.net
* Server certificate: Symantec Class 3 Secure Server CA - G4
* Server certificate: VeriSign Class 3 Public Primary Certification Authority - G5
> GET /file/data/@meta/nzwdv5dkkwhdse2cspak/PHID-FILE-czwrzidp6jgarqopyezh/profile HTTP/1.1
> Host: d3cn662t4iw3q4.cloudfront.net
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: image/png
< Content-Length: 17135
< Connection: keep-alive
< Accept-Ranges: bytes
< Date: Thu, 31 Mar 2016 17:26:59 GMT
< Expires: Sat, 30 Apr 2016 17:26:59 GMT
...
< Age: 30
< X-Cache: Hit from cloudfront
< Via: 1.1 ff09df097f823b2834408d17e9779d62.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: Ia8rDvYZWKrJxZwyjs-fPBGnP7ZCHnHwUH5jKnLdmNPbBkXjCZg1AA==

When I said "user images" above, I meant to say "user uploaded images" like what you drag & drop in Pholio, Phriction, Maniphest, etc... i.e. explicitly not Phab built-in images which are indeed cached. Sorry if that was confused with "user profile images". There are no performance problems here. In general I have observed no performance issues with Phacility outside of the matter tracked by this very task.

Try editing a wiki page with a number of reasonably sized images in it over tethering (even LTE) and you'll see what I mean.

I'm on my laptop using tethering on T-Mobile right now and I just noticed the user profile photos in the workboards are not being cached. Whenever I reload the page (Chrome 49.0.2623.110 on OS X 10.11.4), these photos reload from scratch. Here are the headers as reported by the Chrome inspector:

General

Request URL:<REDACTED>
Request Method:GET
Status Code:200 OK
Remote Address:54.192.142.190:443

Response:

Accept-Ranges:bytes
Age:100
Connection:keep-alive
Content-Length:23766
Content-Type:image/png
Date:Wed, 06 Apr 2016 15:50:52 GMT
Expires:Fri, 06 May 2016 15:50:52 GMT
Server:Apache
Strict-Transport-Security:max-age=0; includeSubdomains; preload
Via:1.1 4f54bc08924be63f55c889270941ed23.cloudfront.net (CloudFront)
X-Amz-Cf-Id:vRmklEMhh-YtXddqyaRGUrE-m53rJKTzYQb1HCtJiuTVDQ726z-yzQ==
X-Cache:Hit from cloudfront
X-Content-Type-Options:nosniff
X-Frame-Options:Deny

Request:

Accept:image/webp,image/*,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch
Accept-Language:en-US,en;q=0.8,fr;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
DNT:1
Host:d3cn662t4iw3q4.cloudfront.net
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36

Interestingly, even though the user profile photos in the workboard are like 20x20px, it's downloading what appears to be the full-size profile photo (23K). Compare this with the photo downloaded when browsing the user profile page e.g. https://<REDACTED>.phacility.com/p/pol/ which is only 3.5KB.

Let me explain the WMF use case a little more. We have lots of users create public tasks, often with pictures attached, which should be private (security/privacy issues, legal issues, etc). We push notifications of new tasks to our public irc channels, and a lot of our users subscribe to huge numbers of projects. So once the task is created, the content is widely dispersed. And we need the ability to make that as private as possible.

We migrated from bugzilla where this Just Worked, so we've come to rely on it for our process.

If the unique url hash changed whenever the visibility policy changed, that would be fine. Bugzilla handled this by only requiring the expiring token for attachments to bugs that were private-- I haven't looked at the authz code since the big update a few months back, so I have no idea if that's possible or not.

If the unique url hash changed whenever the visibility policy changed, that would be fine.

Ah, thanks! This is practical to implement and seems like a reasonable compromise to me. I'll realign this task.

epriestley renamed this task from Paranoid file URL security mode causes nearly universal performance problems to Scramble file secrets when attached objects change their view policies.Apr 6 2016, 6:51 PM

D15641 has the technical details, but here's the behavior I've implemented:

  • When a file's view policy changes, we scramble it.
  • When an object's view policy or space changes, we scramble all attached files.

When we "scramble" a file:

  • If the file's view policy is "public" or "all users", we do nothing.
  • Otherwise, we regenerate the internal secret hash which feeds into access control. This will break existing URIs, and implicitly create new ones.

Currently, we still have the one-time token behavior on top of this so there's no meaningful change to actual caching behaviors, but I should be able to remove that now.

On this:

Whenever I reload the page (Chrome 49.0.2623.110 on OS X 10.11.4), these photos reload from scratch.
...
Expires:Fri, 06 May 2016 15:50:52 GMT
X-Cache:Hit from cloudfront
...
Cache-Control:max-age=0
...

I think your browser (or some other part of your user agent) is intentionally invalidating its cache (and sending this Cache-Control header to request that intermediate caches also invalidate), presumably because you're reloading the same page.

Browsers should normally cache resources with an "Expires" header (visible in the response), and CloudFront has cached this response (visible in X-Cache). This doesn't look like a problem on our side to me.

Compare this with the photo downloaded when browsing the user profile page

I can't reproduce this. Both pages generate the exact same URI for me:

https://p.phcdn.net/file/data/@secure/6jr7vs4rmmlnids4tac7/PHID-FILE-clfyipsks65542hv4vpd/profile

D15642 has additional details, but the new header/cache behavior is:

  • Access to files is controlled by knowing a URI with a 100-bit random secret, as before, with a lifetime of 30 days.
  • This URI changes when the file is scrambled, described above (file policy changes, or policy/space of any object it is attached to changes).
  • Immutably public files (CSS, JS, profile images, etc) use "Cache-Control: public" and should be cached by CDNs.
  • Files with mutable policies (all normal user uploads) use "Cache-Control: private" and can serve through CDNs, but should not be cached by CDNs.
    • When their URIs scramble, the effect should be immediate for users who don't already have the data locally.
    • If you have a bad corporate proxy on your corpnet that treats "Cache-Control: private" as "Cache-Control: public", this won't be true. The burden is on you to fix it.
  • No more one-time tokens.

This represents a slight compromise in favor of paranoia over the old behavior, but I think the compromise is generally reasonable and that this is probably a better model overall, and one which is more consistent with paranoia-favoring attitudes elsewhere in the product.

Those changes are live on this server now, let us know if you run into issues.

Previews are still flickery for me but I think that's a mostly-unrealted browser issue with redirect caching, and likely have a reasonable fix for it.

Thanks @epriestley, much looking forward to have this deployed to Phacility.

Regarding the user profile photos, I'm now on WiFi, not tethering through my phone and LTE anymore, but the workboard is still redownloading the photos. Some resources like .otf have 304s, but user photos are always 200 and re-downloaded AFAIK.

Regarding the image sizes, here's what I have in the Chrome inspector: 2 out of 4 are downloaded full-size instead of thumbnail-size or something:

I can't reproduce that unless I actually use the "reload" command to reload the page, which looks like it makes Chrome dump caches on my system.

Browsing between pages by following links gives me 200 + from cache.

I'm using the reload button next to the URL field indeed. I'm not sure this trashes the cache because if that was the case, there would be no 304s, right?

Chrome is free to send more "If-Modified-Since" headers when you press reload than when you navigate. I have no idea what its exact behaviors are.

When I press reload, I get a request:

When I click to the project detail page, then click the link to return to the workboard, as I would when browsing normally, I get a cache hit:

I conclude that caching works fine and reloading changes Chrome's behavior to something other than normal browsing behavior, which is more aggressive about revalidating or fetching data.

epriestley closed this task as Resolved.Jun 8 2016, 6:03 PM
epriestley claimed this task.

This has been live for a while and appears to be working properly. If anyone is still seeing specific, reproducible problems with cache strategies, feel free to file a new issue detailing them.