Page MenuHomePhabricator

Endpoint+controller for a remarkup image proxy
ClosedPublic

Authored by jcox on Sep 21 2016, 2:02 PM.
Tags
None
Referenced Files
F18825018: D16581.id.diff
Thu, Oct 23, 7:03 PM
F18821458: D16581.diff
Wed, Oct 22, 11:19 PM
F18819692: D16581.id39907.diff
Wed, Oct 22, 8:37 AM
F18804186: D16581.id39906.diff
Sat, Oct 18, 6:40 AM
F18803922: D16581.id39906.diff
Sat, Oct 18, 4:52 AM
F18798937: D16581.id39932.diff
Fri, Oct 17, 10:08 AM
F18798617: D16581.id39907.diff
Fri, Oct 17, 7:16 AM
F18798584: D16581.id39923.diff
Fri, Oct 17, 7:02 AM

Details

Summary

Ref T4190. Currently only have the endpoint and controller working. I added caching so subsequent attempts to proxy the same image should result in the same redirect URL. Still need to:

  • Write a remarkup rule that uses the endpoint
Test Plan

Hit /file/imageproxy/?uri=http://i.imgur.com/nTvVrYN.jpg and are served the picture

Diff Detail

Repository
rP Phabricator
Branch
ImageProxy (branched from master)
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 13792
Build 17820: Run Core Tests
Build 17819: arc lint + arc unit

Event Timeline

jcox retitled this revision from to Endpoint+controller for a remarkup image proxy.
jcox updated this object.
jcox edited the test plan for this revision. (Show Details)
jcox edited edge metadata.

Cache the image view URI so we don't fetch the same image twice

Looks good so far. I think the cache needs to be slightly more complicated than this -- I'll expand in T4190.

src/applications/files/controller/PhabricatorFileImageProxyController.php
21–22

Unless you need to, prefer $request->getStr('...') (or other typed accessors) over reading the entire data directly.

In PHP, users can request ?uri[x]=y, which PHP parses as an array. Then you end up with an array-valued $img_uri and probably a fatal a few lines below (but, in extreme cases, you can get a security vulnerability instead).

To test this, you should be able to access /file/imageproxy/?uri[]=anything on this code. I expect you'll get a fatal or some bizarre behavior.

Added a cache storage table so we can clean up the cache.
Don't quite have this working yet.

Got the DB backed cache working

epriestley added a reviewer: epriestley.

Some minor feedback, but mostly stylistic stuff. Tackle whatever makes sense; we can always revisit things later.

src/applications/files/controller/PhabricatorFileImageProxyController.php
25–27

This should query by uriHash = %s ... PhabricatorHash::digestForIndex($img_uri), since that column is the one with a key on it. Once the table gets big, query using a keyed column will execute much more quickly than this query.

(You can also omit LIMIT 1 since there's a unique key, and if we get more than one result back somehow it's probably good for loadOneWhere() to throw an exception instead of picking the first result.)

35

You can executeOne() above (instead of execute()) to avoid this (although maybe you're trying to dodge policy exceptions)?

41

For now, you could just throw an exception here (similar to what we'll do later on).

Actually, can you restructure things a bit so that the code is shared in both cases? Maybe something like:

$response = $this->getExternalResponse($cached_external_request);
if ($response) {
  return $response;
}

// <...do all the cache stuff...>

return $this->getExternalResponse($freshly_generated_external_request);

Maybe it can't look quite like that, but I think the logic for "turn an ExternalRequest object into a response" is the same in both cases, and it should be possible to consolidate them.

45–52

We should perform these validations before checking the cache so that if there's, say, a bug in requireValidRemoteURIForLink() then we can just push a fix and it will apply retroactively. By doing these checks later, existing security issues/buggy data already in the cache might remain active after a bugfix.

88

This seems fine to me.

89–94

I think it's probably possible for this to race:

  • Clear the cache.
  • Load a page with two (or more) references to the same image.

I suspect both pages will fairly regularly get through the initial check (cache is empty) and run the request, then whichever one is a little slower will fail here with an AphrontDuplicateKeyQueryException when it tries to save a second copy of the same record.

Some ways we could deal with this:

  1. Do nothing for now until we see that it's actually an issue.
  2. Catch AphrontDuplicateKeyQueryException, load the colliding record, pretend that one is the one we saved.
  3. Hold a lock based on the URI while making the request.

I think (1) is fine for now since multiple copies of an image on a page will be rare.

When this kind of request-based race happens we usually do something in the vein of (2), but I think there are at least some arguments for doing (3) instead here: if we don't lock, the duplicate requests will write duplicate files; and they'll incur duplicate points on the user's rate limit; and they'll make duplicate requests to the remote server. These costs are small but higher than the costs we usually face when choosing to catch the exception and load the colliding record.

The lock would look something like this:

try_the_cache();
$lock = PhabricatorGlobalLock::newLock($lock_name_based_on_URI_hash);
try_the_cache_again();
make_the_request_and_write_the_data_and_cache();
$lock->unlock();

Handling this is probably worth splitting out into a followup change, since I'd guess it will be somewhat hard to hit this in the first place.

100–105

To slightly reduce the amount of code here, you could create this object earlier, and then share it in both the success and failure branches.

src/applications/files/garbagecollector/PhabricatorFileTempGarbageCollector.php
3 ↗(On Diff #39923)

Maybe call this "PhabricatorFileExternalRequestGarbageCollector"? Using "Temp" in the name and "tempttl" as the key confuses it with "Temporary Files", which are a separate type of file with a TTL on the file itself.

src/applications/files/storage/PhabricatorFileExternalRequest.php
26 ↗(On Diff #39923)

We should also put key_tll here on the ttl column, so the GC query can take advantage of it.

I anticipate probably wanting to query this table when viewing the file detail page ("Was this an external request?") so it would also be reasonable to add a key_file on filePHID, although it's possible we won't actually use that so we could also wait until we're sure we really want it.

This revision now requires changes to proceed.Sep 22 2016, 7:18 PM
src/applications/files/controller/PhabricatorFileImageProxyController.php
25–27

ah yep. that was an oversight on my part.

35

I copied this pattern from elsewhere so there wasn't much reason behind it, other than I didn't know about executeOne.

Ah. execute() and executeOne() mostly do what you might expect (return a list vs return a single result), but have slightly different behavior when results aren't visible to the viewer:

  • execute() throws a policy exception if the matching object exits but isn't visible (technically: the user doesn't have all the required capabilities);
  • executeOne() just removes the un-permitted results from the list instead.

This mostly corresponds to the default behavior on a page like /T123 (show the user a policy exception) vs a page like /maniphest/ (don't list tasks the user can't see).

Occasionally code will do execute() + head($results) to dodge policy exceptions or do other sketchy/unusual things.

(This behavior can be adjusted with setShouldRaisePolicyExceptions(...) on the Query object.)

jcox edited edge metadata.
jcox marked 11 inline comments as done.

Tidied up code a bit based on code review

Updated based on CR comments

epriestley edited edge metadata.

Cool, this looks good to me. I'll write up a little guidance on the remarkup rule.

src/applications/files/application/PhabricatorFilesApplication.php
81

Oh, and nice catch on this being defunct.

src/applications/files/controller/PhabricatorFileImageProxyController.php
105

It's possible (albeit unlikely) that $file will come back null here. An administrator would pretty much have needed to go delete the file out from under our noses, but it would be a little cleaner to throw an explicit exception instead of fataling on $file->getViewURI() below.

This revision is now accepted and ready to land.Sep 23 2016, 1:10 PM
src/applications/files/controller/PhabricatorFileImageProxyController.php
105

Ah good call. If the file is null should we just go ahead and delete the cached request? Or should we leave it to the user to fix it, since there might be other weirdness that they were messing with?

I'd probably just throw and we can deal with it if/when it ever occurs in the wild. I can't come up with any way that we'd actually end up in this state.

I could imagine deleting the request possibly leading to some weird loop where we have a bug with loading the file, causing us to fetch and delete it over and over again. This could happen anyway and the rate limiting would stop it, but trying to recover here might make it slightly more likely. This is pretty weak reasoning but the best argument I can come up with one way or the other.


I think there's also a minor permissions improvement we can make: right now, I think users other than the first user to request the file won't actually be able to see it: the policy gets set to "No One" + specific author, which means "Only That User".

Two things here: first, the authorPHID itself. I think we probably should not set the authorPHID, since it's not necessarily the actual "author" in any real sense -- it's just the first person to fill the cache. This information feels like a little bit of a policy leak to me, not correct behavior for this request cache. We could fix this by either not setting the authorPHID at all, or setting it to the Files application PHID.

Second, the actual policy issue. We could set the file policy to "Public", but that would make the file listable in the file list at /file/. This seems slightly wrong -- if you know the URI you should be able to see the file, but if you don't know the URI you probably shouldn't be able to discover all the URIs that users are embedding anywhere.

For now, I think it's probably better to leave the policy at POLICY_NOONE, not set authorPHID(), and use PhabricatorUser::getOmnipotentUser() as the $viewer when querying for the file in getExternalResponse() to bypass policies. We know it's OK to bypass policies on the file since we looked up the PHID on an ExternalRequest, so the file isn't private/secret.

I think that should give us reasonable behavior for now.

src/applications/files/controller/PhabricatorFileImageProxyController.php
77

Oh, one other thing is that we should probably not set the authorPHID here -- see some discussion below.

jcox edited edge metadata.

Deal with null file and bypass policies for external files

This revision was automatically updated to reflect the committed changes.