`arc download` doesn't support chunked downloads
Closed, ResolvedPublic

Assigned To

Authored By

	joshuaspence
	May 29 2015, 12:50 AM

Description

It seems that arc download doesn't support chunked downloads, whereas arc upload does support chunked uploads.

> arc --trace --conduit-uri='https://phabricator.mydomain.com' download F92760
libphutil loaded from '/home/joshua/workspace/github.com/phacility/libphutil/src'.
arcanist loaded from '/home/joshua/workspace/github.com/phacility/arcanist/src'.
Config: Reading user configuration file "/home/joshua/.arcrc"...
Config: Did not find system configuration at "/etc/arcconfig".
Working Copy: Reading .arcconfig from "/home/joshua/workspace/github.com/phacility/phabricator/.arcconfig".
Working Copy: Path "/home/joshua/workspace/github.com/phacility/phabricator" is part of `git` working copy "/home/joshua/workspace/github.com/phacility/phabricator".
Working Copy: Project root is at "/home/joshua/workspace/github.com/phacility/phabricator".
Config: Did not find local configuration at "/home/joshua/workspace/github.com/phacility/phabricator/.git/arc/config".
Loading phutil library from '/home/joshua/workspace/github.com/phacility/phabricator/src'...
>>> [0] <conduit> conduit.connect() <bytes = 586>
>>> [1] <http> https://phabricator.mydomain.com/api/conduit.connect
<<< [1] <http> 1,224,952 us
<<< [0] <conduit> 1,225,282 us
Getting file information...
>>> [2] <conduit> file.info() <bytes = 179>
>>> [3] <http> https://phabricator.mydomain.com/api/file.info
<<< [3] <http> 437,911 us
<<< [2] <conduit> 438,102 us
Downloading file 'phabricator.box' (1,024,403,951 bytes)...
>>> [4] <conduit> file.download() <bytes = 212>
>>> [5] <http> https://phabricator.mydomain.com/api/file.download
<<< [5] <http> 55,982,948 us
<<< [4] <conduit> 55,983,142 us

[2015-05-29 00:49:10] EXCEPTION: (HTTPFutureHTTPResponseStatus) [HTTP/500] Internal Server Error
>>> UNRECOVERABLE FATAL ERROR <<<

Maximum execution time of 30 seconds exceeded

/usr/src/phabricator/src/aphront/response/AphrontResponse.php:159


┻━┻ ︵ ¯\_(ツ)_/¯ ︵ ┻━┻ at [<phutil>/src/future/http/BaseHTTPFuture.php:339]
arcanist(head=master, ref.master=8fe013b0ecb5), phabricator(head=master, ref.master=1aa8bc319b52), phutil(head=master, ref.master=693207bcd81d)
  #0 BaseHTTPFuture::parseRawHTTPResponse(string) called at [<phutil>/src/future/http/HTTPSFuture.php:415]
  #1 HTTPSFuture::isReady() called at [<phutil>/src/future/Future.php:39]
  #2 Future::resolve(NULL) called at [<phutil>/src/future/FutureProxy.php:36]
  #3 FutureProxy::resolve() called at [<phutil>/src/conduit/ConduitClient.php:58]
  #4 ConduitClient::callMethodSynchronous(string, array) called at [<arcanist>/src/workflow/ArcanistDownloadWorkflow.php:99]
  #5 ArcanistDownloadWorkflow::run() called at [<arcanist>/scripts/arcanist.php:382]

Revisions and Commits

rARC Arcanist
	D17614	rARC82b7cd778a28 Make "arc download" use "file.search" if available

Related Objects

		Status	Assigned	Task
		Open	None	T11357 Move Files to EditEngine and modern APIs
		Resolved	joshuaspence	T8348 `arc download` doesn't support chunked downloads

Event Timeline

joshuaspence created this task. May 29 2015, 12:50 AM

joshuaspence raised the priority of this task to Needs Triage.

joshuaspence updated the task description.

joshuaspence added a project: Arcanist.

joshuaspence added a subscriber: joshuaspence.

I have no idea why this works through the web UI...

joshuaspence claimed this task. Jul 23 2015, 9:05 AM

I'm also seeing this issue, even just using curl -d api.token=... https://phabricator.url/api/file.download

This might be because the file in question is ~150MB, so it's probably trying to download the whole thing and base64 it (?!) before spitting it back.
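To put a number on the base64 guess: base64 emits 4 output bytes for every 3 input bytes, so a response that carries the whole file this way is about a third larger than the file itself. A rough sketch (the helper name `base64_size` is made up for illustration, and whether `file.download` actually buffers and encodes like this is only the guess above):

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity check of the formula against the standard library on a small input.
assert base64_size(10) == len(base64.b64encode(b"x" * 10))

# Size of the encoded payload for the 1,024,403,951-byte file in the trace.
print(base64_size(1_024_403_951))
```

For the file in the task description this comes to roughly 1.37 GB of encoded text, before any JSON wrapping.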

In T8348#143186, @nornagon wrote:

> I'm also seeing this issue, even just using curl -d api.token=... https://phabricator.url/api/file.download

I wouldn't expect curl to ever support this. The way arc download will support chunked files is by downloading the chunks and then piecing them back together on the client side. This wouldn't be possible with pure curl.

@epriestley, I was thinking of implementing this as follows:

Have file chunks inherit policies from the parent file, so that individual chunks can be downloaded with file.download.
Return file PHIDs for chunks from file.querychunks so that we can iterate over all chunks.
Write an ArcanistFileDownloader class similar to ArcanistFileUploader.
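The client-side half of that plan could look roughly like the sketch below. It is Python rather than PHP and is not the proposed ArcanistFileDownloader. It assumes each chunk record carries `byteStart` (inclusive) and `byteEnd` (exclusive) offsets, and `fetch_chunk` is a hypothetical stand-in for a `file.download` call on the chunk's file PHID.

```python
import io
from typing import BinaryIO, Callable, Iterable, Mapping


def download_chunked(
    chunks: Iterable[Mapping[str, int]],
    fetch_chunk: Callable[[Mapping[str, int]], bytes],
    out: BinaryIO,
) -> int:
    """Fetch every chunk and write it at its byte offset in `out`.

    Returns the total number of bytes written.
    """
    written = 0
    for chunk in sorted(chunks, key=lambda c: c["byteStart"]):
        data = fetch_chunk(chunk)
        expected = chunk["byteEnd"] - chunk["byteStart"]
        if len(data) != expected:
            raise ValueError(
                f"chunk at {chunk['byteStart']}: got {len(data)} bytes, "
                f"expected {expected}"
            )
        # Seeking first means chunks could also arrive out of order.
        out.seek(chunk["byteStart"])
        out.write(data)
        written += len(data)
    return written


# In-memory demonstration with a fake "server" payload.
payload = b"hello chunked world"
demo_chunks = [{"byteStart": 10, "byteEnd": 19}, {"byteStart": 0, "byteEnd": 10}]
demo_out = io.BytesIO()
download_chunked(
    demo_chunks, lambda c: payload[c["byteStart"]:c["byteEnd"]], demo_out
)
assert demo_out.getvalue() == payload
```

Only one chunk is held in memory at a time, which is the main point of doing the stitching on the client.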

I was hoping to copy whatever "Download File" does, but it's not obvious to me how this functionality works.

A possibly easier approach is to let the user just get an HTTP URI for the download (i.e. a normal, non-Conduit URI) and then call it in a standard way. Maybe by adding a needDownloadURIs parameter to file.info (or waiting for T7715 and adding it to file.query) -- the links are slightly expensive to generate because we have to do one-time token writes so it's probably better not to generate them by default.

This is much simpler, although it won't support parallel downloads or resumable downloads by default. However, we should accept and respect the "Range" HTTP header and be able to process it efficiently, so if we did want to build those features it would probably be less complex overall to just use "Range" headers on the client.
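As a rough illustration of the "Range" idea, a client could split one download into byte ranges, or resume from an offset, just by computing header values like this. The function name and parameters are made up for the example; the only thing taken from HTTP itself is that both ends of a `bytes=` range are inclusive.

```python
from typing import List


def range_headers(total_size: int, part_size: int, resume_from: int = 0) -> List[str]:
    """Return "Range" header values covering bytes resume_from..total_size-1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    headers = []
    start = resume_from
    while start < total_size:
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + part_size, total_size) - 1
        headers.append(f"bytes={start}-{end}")
        start = end + 1
    return headers


# A 10-byte file fetched in 4-byte parts.
assert range_headers(10, 4) == ["bytes=0-3", "bytes=4-7", "bytes=8-9"]
```

Parallel downloads would issue these requests concurrently; a resumed download would pass the size of the partial file as `resume_from`.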

I think that would leave us with only two potential scale issues:

We'll still buffer the whole file in memory on the client. For very large files, we need to adjust HTTPSFuture to be able to stream response bodies directly to disk (this isn't too difficult).
We may still need a set_time_limit(-1) call somewhere in the pathway of PhabricatorFileDataController to actually prevent the 30s timeout.
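For the first point, streaming to disk means copying the response body in fixed-size blocks instead of collecting it into one string. A minimal Python sketch of the idea follows (HTTPSFuture itself is PHP; `stream_to_file` and its parameters are illustrative names, and `body` is any file-like object such as an HTTP response):

```python
import io
from typing import BinaryIO


def stream_to_file(body: BinaryIO, dest: BinaryIO, block_size: int = 64 * 1024) -> int:
    """Copy `body` to `dest` block by block; return the number of bytes copied."""
    total = 0
    while True:
        block = body.read(block_size)
        if not block:  # an empty read signals end of stream
            break
        dest.write(block)
        total += len(block)
    return total


# In-memory demonstration: memory use is bounded by block_size, not file size.
source = io.BytesIO(b"a" * 200_000)
sink = io.BytesIO()
assert stream_to_file(source, sink) == 200_000
```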

Could the server do the chunk stitching and stream the response to the client, rather than fetching it all up front?

> Could the server do the chunk stitching and stream the response to the client, rather than fetching it all up front?

This is what it does for normal HTTP downloads.

Can you point me at the code where the chunk stitching happens?

PhabricatorFileChunkIterator does the stitching. Basically the flow is like this:

The controller loads the PhabricatorFile and does policy, etc., checks.
It asks the StorageEngine for an iterator, then configures it with start/end if there's a "Range" header.
It wraps the iterator in a FileResponse and returns that.

The FileResponse iterates the iterator and dumps the results out as a response stream.

The iterator loads chunks one at a time and hands them off to the response.
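The flow above can be sketched as a generator. This is a Python illustration of the pattern, not the PhabricatorFileChunkIterator code: the chunk tuples, `load_chunk`, and the exclusive `end` offset are assumptions made for the example (an HTTP "Range" end is inclusive, so a caller would pass that value plus one).

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

Chunk = Tuple[int, int]  # (byte_start, byte_end), with byte_end exclusive


def iter_file_data(
    chunks: Sequence[Chunk],
    load_chunk: Callable[[Chunk], bytes],
    start: int = 0,
    end: Optional[int] = None,
) -> Iterator[bytes]:
    """Yield file data one chunk at a time, trimmed to [start, end).

    `chunks` must be sorted by byte_start.
    """
    for byte_start, byte_end in chunks:
        if byte_end <= start:
            continue  # chunk lies entirely before the requested range
        if end is not None and byte_start >= end:
            break  # chunk lies entirely after the requested range
        data = load_chunk((byte_start, byte_end))
        lo = max(start - byte_start, 0)
        hi = len(data) if end is None else min(end - byte_start, len(data))
        yield data[lo:hi]


# A 10-byte "file" stored as three chunks.
payload = bytes(range(10))
demo_chunks = [(0, 4), (4, 8), (8, 10)]
pieces = iter_file_data(demo_chunks, lambda c: payload[c[0]:c[1]], start=3, end=9)
assert b"".join(pieces) == payload[3:9]
```

A response object would simply loop over this generator and write each piece to the output stream, so at most one chunk is in memory at any moment.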

epriestley added a project: Files. Jul 21 2016, 11:53 AM

Herald added a subscriber: eadler. Jul 21 2016, 11:53 AM

epriestley added a parent task: T11357: Move Files to EditEngine and modern APIs. Jul 21 2016, 11:53 AM

epriestley moved this task from Backlog to vNext on the Files board.

epriestley moved this task from Backlog to API Changes on the Arcanist board. Jul 21 2016, 11:55 AM

eadler added a project: Restricted Project. Aug 7 2016, 8:00 PM

epriestley mentioned this in D17613: Only require POST to fetch file data if the viewer is logged in. Apr 4 2017, 4:10 PM

epriestley added a revision: D17614: Make "arc download" use "file.search" if available. Apr 4 2017, 4:53 PM

After D17614, arc download uses file.search to retrieve a URI it can GET, then does a normal HTTP GET to that URI, retrieving the file content in the HTTP response body.
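In outline, that two-step flow looks something like the sketch below. It is Python, not the arcanist code: `call_conduit` and `http_get` are hypothetical callables supplied by the caller, and the response shape (a `data` list whose entries expose the URI as `fields.dataURI`) and the `ids` constraint are assumptions that should be checked against the `file.search` documentation on your install.

```python
from typing import Any, Callable, Dict


def download_via_search(
    call_conduit: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    http_get: Callable[[str], bytes],
    file_id: int,
) -> bytes:
    """Look up a file's data URI with file.search, then fetch it with a plain GET."""
    result = call_conduit("file.search", {"constraints": {"ids": [file_id]}})
    rows = result.get("data", [])
    if not rows:
        raise LookupError(f"no file with id {file_id}")
    # Assumed field name; the URI is an ordinary, non-Conduit HTTP URI.
    data_uri = rows[0]["fields"]["dataURI"]
    return http_get(data_uri)
```

Because the second step is an ordinary HTTP request, the file content never has to be wrapped in a Conduit response.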

epriestley closed this task as Resolved by committing rARC82b7cd778a28: Make "arc download" use "file.search" if available. Apr 4 2017, 11:16 PM

epriestley mentioned this in rP2896da384cb7: Only require POST to fetch file data if the viewer is logged in.

epriestley added a commit: rARC82b7cd778a28: Make "arc download" use "file.search" if available.
