Page MenuHomePhabricator

`arc download` doesn't support chunked downloads
Closed, ResolvedPublic

Description

It seems that arc download doesn't support chunked downloads, whereas arc upload does support chunked uploads.

> arc --trace --conduit-uri='https://phabricator.mydomain.com' download F92760
libphutil loaded from '/home/joshua/workspace/github.com/phacility/libphutil/src'.
arcanist loaded from '/home/joshua/workspace/github.com/phacility/arcanist/src'.
Config: Reading user configuration file "/home/joshua/.arcrc"...
Config: Did not find system configuration at "/etc/arcconfig".
Working Copy: Reading .arcconfig from "/home/joshua/workspace/github.com/phacility/phabricator/.arcconfig".
Working Copy: Path "/home/joshua/workspace/github.com/phacility/phabricator" is part of `git` working copy "/home/joshua/workspace/github.com/phacility/phabricator".
Working Copy: Project root is at "/home/joshua/workspace/github.com/phacility/phabricator".
Config: Did not find local configuration at "/home/joshua/workspace/github.com/phacility/phabricator/.git/arc/config".
Loading phutil library from '/home/joshua/workspace/github.com/phacility/phabricator/src'...
>>> [0] <conduit> conduit.connect() <bytes = 586>
>>> [1] <http> https://phabricator.mydomain.com/api/conduit.connect
<<< [1] <http> 1,224,952 us
<<< [0] <conduit> 1,225,282 us
Getting file information...
>>> [2] <conduit> file.info() <bytes = 179>
>>> [3] <http> https://phabricator.mydomain.com/api/file.info
<<< [3] <http> 437,911 us
<<< [2] <conduit> 438,102 us
Downloading file 'phabricator.box' (1,024,403,951 bytes)...
>>> [4] <conduit> file.download() <bytes = 212>
>>> [5] <http> https://phabricator.mydomain.com/api/file.download
<<< [5] <http> 55,982,948 us
<<< [4] <conduit> 55,983,142 us

[2015-05-29 00:49:10] EXCEPTION: (HTTPFutureHTTPResponseStatus) [HTTP/500] Internal Server Error
>>> UNRECOVERABLE FATAL ERROR <<<

Maximum execution time of 30 seconds exceeded

/usr/src/phabricator/src/aphront/response/AphrontResponse.php:159


┻━┻ ︵ ¯\_(ツ)_/¯ ︵ ┻━┻ at [<phutil>/src/future/http/BaseHTTPFuture.php:339]
arcanist(head=master, ref.master=8fe013b0ecb5), phabricator(head=master, ref.master=1aa8bc319b52), phutil(head=master, ref.master=693207bcd81d)
  #0 BaseHTTPFuture::parseRawHTTPResponse(string) called at [<phutil>/src/future/http/HTTPSFuture.php:415]
  #1 HTTPSFuture::isReady() called at [<phutil>/src/future/Future.php:39]
  #2 Future::resolve(NULL) called at [<phutil>/src/future/FutureProxy.php:36]
  #3 FutureProxy::resolve() called at [<phutil>/src/conduit/ConduitClient.php:58]
  #4 ConduitClient::callMethodSynchronous(string, array) called at [<arcanist>/src/workflow/ArcanistDownloadWorkflow.php:99]
  #5 ArcanistDownloadWorkflow::run() called at [<arcanist>/scripts/arcanist.php:382]

Event Timeline

joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description. (Show Details)
joshuaspence added a project: Arcanist.
joshuaspence added a subscriber: joshuaspence.

I have no idea why this works through the web UI...

I'm also seeing this issue, even just using curl -d api.token=... https://phabricator.url/api/file.download

This might be because the file in question is ~150MB, so it's probably trying to download the whole thing and base64 it (?!) before spitting it back.

I'm also seeing this issue, even just using curl -d api.token=... https://phabricator.url/api/file.download

I wouldn't expect curl to ever support this. The way arc download will support chunked files is by downloading chunks and then pieces that back together client side. This wouldn't be possible with pure curl.

@epriestley, I was thinking of implementing this as follows:

  1. Have file chunks inherit policies from the parent file, so that individual chunks can be downloaded with file.download.
  2. Return file PHIDs for chunks from file.querychunks so that we can iterate over all chunks.
  3. Write an ArcanistFileDownloader class similar to ArcanistFileUploader.

I was hoping to copy whatever "Download File" does, but its not obvious to me how this functionality works.

A possibly easier approach is to let the user just get an HTTP URI for the download (i.e. a normal, non-Conduit URI) and then call it in a standard way. Maybe by adding a needDownloadURIs parameter to file.info (or waiting for T7715 and adding it to file.query) -- the links are slightly expensive to generate because we have to do one-time token writes so it's probably better not to generate them by default.

This is much simpler, although it won't support parallel downloads or resumable downloads by default. However, we should accept and respect the "Range" HTTP header and be able to process it efficiently, so if we did want to build those features it would probably be less complex overall to just use "Range" headers on the client.

I think that would leave us with only two potential scale issues:

  • We'll still buffer the whole file in memory on the client. For very large files, we need to adjust HTTPSFuture to be able to stream response bodies directly to disk (this isn't too difficult).
  • We may still need a set_time_limit(-1) call somewhere in the pathway of PhabricatorFileDataController to actually prevent the 30s timeout.

Could the server do the chunk stitching and stream the response to the
client, rather than fetching it all up front?

Could the server do the chunk stitching and stream the response to the client, rather than fetching it all up front?

This is what it does for normal HTTP downloads.

Can you point me at the code where the chunk stitching happens?

PhabricatorFileChunkIterator does the stitching. Basically the flow is like this:

  • The controller loads the PhabricatorFile and does policy, etc., checks.
  • It asks the StorageEngine for an iterator, then configures it with start/end if there's a "Range" header.
  • It wraps the iterator in a FileResponse and returns that.

The FileResponse iterates the iterator and dumps the results out as a response stream.

The iterator loads chunks one at a time and hands them off to the response.

eadler added a project: Restricted Project.Aug 7 2016, 8:00 PM

After D17614, arc download uses file.search to retrieve a URI it can GET, then does a normal HTTP GET to that URI, retrieving the file content in the HTTP response body.