Details

Maniphest Tasks

T5479: Unbeta Phragment
T4884: At some point, review hach-que's diffs

Summary

This allows users to configure Phabricator to redirect to Amazon S3 for file downloads. Download bandwidth is then served (and billed) by Amazon S3 directly, and Phabricator no longer needs to download data from S3 just to serve it back to the user.

Test Plan

Tested on https://code.redpointsoftware.com.au/. Applied the patch, migrated all the file data, and played around with the configuration options (and tested both downloading files and requesting a Phragment list via getstate).

Diff Detail

Repository

rP Phabricator

Branch

arcpatch-D8655_1

Lint

Lint Passed

Unit

Tests Passed

Build Status

Buildable 2130
Build 2134: [Placeholder Plan] Wait for 30 Seconds

Event Timeline

hach-que updated this revision to Diff 20521. Mar 31 2014, 5:37 AM

hach-que retitled this revision from to Allow Phabricator to redirect to Amazon S3 / CloudFront for file downloads.

hach-que updated this object.

hach-que edited the test plan for this revision.

hach-que added a reviewer: epriestley.

Herald added a reviewer: Blessed Reviewers. Mar 31 2014, 5:37 AM

Herald added subscribers: Korvin, epriestley.

Append the name parameter if provided

Append the name parameter if provided, since neither S3 nor CloudFront provides redirection rules that we could use to strip trailing characters. A user accessing files still needs to know the seed, so this doesn't allow users to access files based purely on the name.

Trim trailing slashes on config entry

Add option to configure whether to bypass data controller

This allows Phabricator to be configured to return URLs directly pointing to Amazon instead of redirecting through the data controller. Obviously this exposes raw links to Amazon through the UI and Conduit, but it allows methods such as phragment.getstate to return the public URIs (which means clients don't have to hit Phabricator to download lots of files).
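The three changes above (appending the name, trimming trailing slashes, and optionally bypassing the data controller) can be sketched roughly as follows. This is a Python illustration, not the PHP from the diff: `build_download_uri` and all of its parameter names are hypothetical, and it only models which URI is handed to a client, not the redirect itself.

```python
from urllib.parse import quote


def build_download_uri(data_uri, handle, name=None, public_base=None,
                       use_direct=False):
    """Return the URI a client should be given for a file.

    data_uri    -- URI of Phabricator's own data controller for the file
    handle      -- storage key of the object in the bucket (opaque here)
    name        -- optional file name to append to the public URI
    public_base -- configured public bucket URI, or None if not configured
    use_direct  -- hand out the bucket URI instead of data_uri (assumed flag)
    """
    if not public_base or not use_direct:
        # Default behaviour: clients go through the data controller.
        return data_uri

    # Trim trailing slashes so "https://host/" and "https://host" behave alike.
    base = public_base.rstrip("/")
    uri = base + "/" + handle.strip("/")

    if name:
        # Percent-encode the name so it stays a single path segment.
        uri += "/" + quote(name, safe="")
    return uri
```

With a configured base of `https://bucket.example.com/` (a made-up host) and the direct flag on, a handle `aa/bb/cc/defg` with name `dog.jpg` yields `https://bucket.example.com/aa/bb/cc/defg/dog.jpg`; with the flag off, the data controller URI is returned unchanged.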

hach-que edited the test plan for this revision. Mar 31 2014, 7:28 AM

Throughout this diff, the use of "CloudFront" probably suggests an undesirable configuration. Particularly, the right way to configure CloudFront (or another CDN) in the long term is to put the entire file domain behind it, so static resources (JS, CSS, images) also benefit from being served through CloudFront (see T2382).

What's the value of storage.use-direct-public-uri? It seems somewhat hard for users to evaluate the tradeoffs, limits our ability to detect or correct errors, breaks if the storage engine is changed, etc., and we just save one redirect which will eventually be served by a CDN anyway. Am I missing stuff, or is this basically just a micro-optimization?

I'd suggest:

Remove references to CloudFront, since it encourages configuration of a CDN a layer below where it should actually go. When we get around to T2382, we'll walk users through setting up CloudFront or another CDN, and all resources will benefit.
If possible, make storage.s3.public-uri a boolean instead of a string, so users have a more difficult time misconfiguring it.
Remove storage.use-direct-public-uri unless there's a really compelling reason for it.

Then some practical/technical issues I hit:

Enabling "Static Web Hosting" doesn't make all the existing junk in my bucket public for me.
We store files in S3 as /aa/bb/cc/defghijklmnop, which is probably secret enough, but means that files download as defghijklmnop instead of dog.jpg, which is a (pretty significant?) usability issue.
- We probably need to (a) publish to /aa/bb/cc/defghijklmnop/dog.jpg and possibly (b) include Content-Disposition information.
- A downside of including Content-Disposition when publishing to S3 is that we can't change it later. For example, if users change configuration so PDFs are viewable in the browser, any PDFs which have already published will continue to download instead of display.
- A further downside is that since we deduplicate content, if I upload a picture of you as "jerk.jpg", and you upload the same picture as "coolguy.jpg", your file will download as "jerk.jpg".
- We could disable content duplication detection for S3 to get around this, although this adds complexity and implies tradeoffs.
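The deduplication and Content-Disposition downsides in the list above can be illustrated with a toy content-addressed store. This is only a sketch: `PublishedStore` is hypothetical, and keying on SHA-256 is an assumption made for the example, not a description of how Phabricator or S3 identifies duplicate content.

```python
import hashlib


class PublishedStore:
    """Toy store where metadata is fixed when an object is first published."""

    def __init__(self):
        self.objects = {}  # content hash -> (data, content_disposition)

    def publish(self, data, filename):
        key = hashlib.sha256(data).hexdigest()
        if key not in self.objects:
            # First upload wins: the header is stored alongside the object
            # and is not rewritten by later uploads of the same bytes.
            disposition = 'attachment; filename="%s"' % filename
            self.objects[key] = (data, disposition)
        return key

    def download_header(self, key):
        return self.objects[key][1]


store = PublishedStore()
first = store.publish(b"same picture bytes", "jerk.jpg")
second = store.publish(b"same picture bytes", "coolguy.jpg")
print(first == second)                 # True: both uploads share one object
print(store.download_header(second))   # attachment; filename="jerk.jpg"
```

The second uploader gets the first uploader's filename, and a later configuration change (for example, serving PDFs inline) would not touch headers that were already stored.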

As we dig into this, I wonder if T2382 might be a better fix? Using S3 as a file server instead of just a datastore exposes more complexity than I'd initially thought.

src/applications/files/config/PhabricatorFilesConfigOptions.php
Lines 145–147: This caveat is a little misleading -- we always serve files without authentication if you know the data URI for a file, no matter what the file policy is. So this doesn't actually change anything in terms of policies: even without this flag enabled, someone who knows the data URI can access the file, regardless of policy settings.

Instead of having the user set this URI, can we figure it out ourselves (e.g., via the S3 API)? Are there reasons they would want some specific value here other than "yes/no"?

Is the process of enabling 'Static Web Hosting' obvious, impossible to get wrong, and free of side effects (like also enabling directory listings)?

Writing `**NOTE:**` will disable the specialized note formatting; just use `NOTE:`. Here are examples -- this is with asterisks: `**NOTE:** Some note.` This is without: `NOTE: Some note.`

This revision now requires changes to proceed. Mar 31 2014, 3:52 PM

Oh, you did deal with the filename thing.

I think the 'name' parameter is not normalized, and needs to go through PhabricatorFile::normalizeFileName() to be usable in a URI. For example, it may contain #, ?, /../, etc.
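As a rough illustration of why the name needs normalizing before it is appended to a URI, here is a conservative sketch in Python. `normalize_name_for_uri` is a hypothetical helper written for this example; it does not reproduce the behaviour of `PhabricatorFile::normalizeFileName()`.

```python
import re


def normalize_name_for_uri(name):
    """Reduce an arbitrary file name to a conservative single path segment.

    Illustrative only; not the logic of PhabricatorFile::normalizeFileName().
    """
    # Replace anything outside a small safe set; this covers '#', '?', '/',
    # and whitespace.
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    # Collapse runs of dots so ".." cannot survive as a path component.
    name = re.sub(r"\.{2,}", ".", name)
    # Strip leading/trailing separators left over from the replacements.
    name = name.strip("._-")
    # Fall back to a fixed name if nothing usable remains.
    return name or "file"
```

A plain name such as `dog.jpg` passes through unchanged, while a name like `a/../b?x#y.png` comes out with no slashes, query or fragment markers, or parent-directory components.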

Unify configuration options

Removed a CloudFront reference

hach-que retitled this revision from "Allow Phabricator to redirect to Amazon S3 / CloudFront for file downloads" to "Allow Phabricator to redirect to Amazon S3 for file downloads". Mar 31 2014, 11:38 PM

hach-que updated this object.

I'm finding this causes a few problems, especially because Amazon forces file downloads. This means it's impossible to link people to pictures uploaded on a Phabricator instance without them logging in.

I'm tempted to change this so that the direct public URI is mostly an internal thing, and we have an option in Phragment to use it when returning file URIs, or provide a Conduit API to resolve direct public URIs for a specified list of files. Exposing this functionality to all file links breaks a few workflows.

@epriestley Does this sound like a better solution?

Additionally, if this functionality was restricted to consumers of Conduit then it means we don't need to deal with appending the file name to Amazon storage or deal with the deduplication issue (as a consumer of Conduit, one assumes the program can save the file as whatever it likes).

Only use direct, public URLs for files on S3 in Phragment's Conduit APIs

Herald added a subscriber: hach-que. Apr 3 2014, 6:23 AM

Oops

joshuaspence added a subscriber: joshuaspence. Jun 20 2014, 5:29 AM

Rebase on master

Harbormaster completed remote builds in B2130: Diff 24556. Aug 9 2014, 5:32 AM

Harbormaster completed remote builds in B2130: Diff 24556. Aug 11 2014, 7:59 AM

chrisbolt added a subscriber: chrisbolt. Aug 12 2014, 1:33 AM

I think this no longer makes sense to ever bring upstream: we can now stream large files, and do not store large files as single S3 artifacts.

This revision now requires changes to proceed. May 13 2015, 1:55 PM

To be solved with T5517 instead.

Allow Phabricator to redirect to Amazon S3 for file downloads
Abandoned, Public

Revision Contents
Changeset List

Diff 24556

src/applications/files/config/PhabricatorFilesConfigOptions.php

src/applications/files/engine/PhabricatorFileStorageEngine.php

src/applications/files/engine/PhabricatorLocalDiskFileStorageEngine.php

src/applications/files/engine/PhabricatorMySQLFileStorageEngine.php

src/applications/files/engine/PhabricatorS3FileStorageEngine.php

src/applications/files/engine/PhabricatorTestStorageEngine.php

src/applications/files/storage/PhabricatorFile.php

src/applications/phragment/conduit/PhragmentQueryFragmentsConduitAPIMethod.php
