Page MenuHomePhabricator

Huge diffs (>8MB) should stream directly into file storage in Diffusion when downloaded
Closed, ResolvedPublic

Description

Hi, I have trouble downloading big diff files from server.

No configured storage engine can store this file. See "Configuring File Storage" in the documentation for information on configuring storage engines.

It seems that the T10186 is still not solved.
This is my settings:


and this is the debug of the error I encounter from the apache logs:

[Tue Feb 23 12:53:56.122521 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #1 <#2> PhabricatorFile::newFromFileData(string, array) called at [<phabricator>/src/applications/files/storage/PhabricatorFile.php:218]
[Tue Feb 23 12:53:56.122525 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #2 <#2> PhabricatorFile::buildFromFileDataOrHash(string, array) called at [<phabricator>/src/applications/diffusion/controller/DiffusionCommitController.php:953]
[Tue Feb 23 12:53:56.122528 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #3 <#2> DiffusionCommitController::buildRawDiffResponse(DiffusionGitRequest) called at [<phabricator>/src/applications/diffusion/controller/DiffusionCommitController.php:31]
[Tue Feb 23 12:53:56.122532 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #4 <#2> DiffusionCommitController::handleRequest(AphrontRequest) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:237]
[Tue Feb 23 12:53:56.122535 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #5 phlog(Exception) called at [<phabricator>/src/aphront/handler/PhabricatorDefaultRequestExceptionHandler.php:32]
[Tue Feb 23 12:53:56.122538 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #6 PhabricatorDefaultRequestExceptionHandler::handleRequestException(AphrontRequest, Exception) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:632]
[Tue Feb 23 12:53:56.122541 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #7 AphrontApplicationConfiguration::handleException(Exception) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:242]
[Tue Feb 23 12:53:56.122544 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #8 AphrontApplicationConfiguration::processRequest(AphrontRequest, PhutilDeferredLog, AphrontPHPHTTPSink, MultimeterControl) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:149]
[Tue Feb 23 12:53:56.122550 2016] [php5:error] [pid 29189] [client 132.187.4.112:46528]   #9 AphrontApplicationConfiguration::runHTTPRequest(AphrontPHPHTTPSink) called at [<phabricator>/webroot/index.php:17]

Event Timeline

Hi! Can you tell me what version of Phabricator you're running?

The version is nomally reported in stack traces, but it looks like it either didn't end up there or you removed it. You can also find it in ConfigVersions in the web UI, or by grabbing the hash out of git show in phabricator/ on the command line.

See Contributing Bug Reports in the documentation for help on collecting information (like the current version) that we need in order to resolve bug reports. If you don't include this information we almost always have to ask you for it immediately and then wait for you to provide it, so you can speed up the process and get a solution to your problem faster by providing it in the first place. We really, truly need this information and aren't just asking you to do extra work for no reason!

Sorry I totally forget the version:
phabricator

023cfbb23af36b7f15be86a1f5950b62d68b246d (Mon, Feb 22)

arcanist

fcc11b3a278199fa15b90692aba65f54fef5d7f7 (Wed, Feb 10)

phutil

f43291e99d36045bc459e5133454c0d8fd8768ea (Jan 21 2016)
epriestley triaged this task as Wishlist priority.Feb 23 2016, 12:36 PM

Thanks, that's helpful.

T10186 and similar are about downloading individual files, while this is about downloading diffs, so the changes in T10186 aren't expected to affect this.

We should fix this, but fixing it is complicated (especially without impacting performance for small diffs) and I think it is very rare that users want to download individual diffs which are more than 8MB in size, so it will be difficult to prioritize.

Here's what's happening now:

  • Diffusion calls diffusion.rawdiffquery and fetches the entire diff text over Conduit.
  • Diffusion tries to save the diff in Files storage, but does not engage the chunk engine.
  • The diff is so large that it needs to be chunked, so the write (which does support chunking) fails.

Simply engaging the chunk engine isn't really a fix: it raises the limit from 8MB to "all the available memory on the machine" -- say, maybe a few GB -- but that's still an artificial limit.

What needs to happen instead is for the endpoints to work like diffusion.filecontentquery:

  • Diffusion calls diffusion.filecontentquery.
  • The call streams the result into Files storage, engaging the chunk engine, and returns only the PHID.
  • The client just redirects to the data.

However, we don't want to do this in the common case, because 99.99%+ of the time the diff is small and we'll be slowing down the other UI on the page which reads the data for no reason if we unconditionally route it through the file engine.

Thanks for the heads up. I will contact my user with this information. What you said make totally sense to me :).

epriestley renamed this task from Can not Download large diffs in diffusion "No configured storage engine can store this file" to Huge diffs (>8MB) should stream directly into file storage in Diffusion when downloaded.Feb 23 2016, 6:26 PM
epriestley added a project: Files.
epriestley raised the priority of this task from Wishlist to High.Jul 10 2016, 1:26 PM
epriestley added a project: Phacility.

We had a shard disruption in the cluster this morning that I don't have a clear smoking gun on, but one issue that looks questionable is on this pathway -- I see several requests to generate diffs of an extremely large (5M line) commit coming from the web.

The web pathway does not timeout these requests, but should. Fixing this may not really overlap with the streaming issue, but they're at least somewhat adjacent.

I suspended the instance in question since it was a test instance, and continuing to generate web requests for the 5M-line diff and disrupt other processes on the shard.

This is actually happening on the cluster/daemon pathway so it's not really related to the original issue here. However, the daemon client is timing out:

Daemon 6 STDE [Sun, 10 Jul 2016 13:52:49 +0000] [2016-07-10 13:52:49] EXCEPTION: (PhutilProxyException) Error while executing Task ID 32258. {>} (HTTPFutureCURLResponseStatus) [cURL/28] (http://172.30.0.20:80/api/diffusion.internal.gitrawdiffquery) <CURLE_OPERATION_TIMEDOUT> The request took too long to complete. at [<phutil>/src/future/http/HTTPSFuture.php:408]

This leads to degenerate behavior. A limited improvement would be to remove the timeout from this command.

This is largely the result of --find-copies-harder being explosively slow for large result sets in large repositories. With this flag, the runtime rises from <1s to >3m, at which point git itself bails out:

warning: only found copies from modified paths due to too many files.

This didn't really end up being related to the other issue here (streaming huge diffs into Files) but it's a little late for me to split it out now.

I deployed that stuff to the affected shard and things look OK now. The original issue described in this task wasn't affected.

D16460 is marked as fixing this, and we always send data via Files afterwards. This is different from the approach I'd outlined above (no mode where we continue to return text directly), but two additional considerations have arisen:

  • Even small diffs may not be utf8, and thus not encodable with JSON. This caused issues in T11524.
  • We now plan to build a changeset cache (T11482) which largely moots the performance concern.

We might revisit this after building the cache based on performance data (e.g., provide a way for the caller to request the diff inline, perhaps) but the performance impact here should be minimal and this should let us move forward on the UTF8/cache issues.