
Having trouble with chunked uploads
Closed, ResolvedPublic

Description

I'm having trouble getting chunked file uploads to work (introduced in D12053). Specifically, I'm getting 413 Request Entity Too Large both when using /file/upload/ and when using arc upload. I have also set up the following:

  • Set the client_max_body_size for nginx to 32M
  • Set the post_max_size for PHP to 32M.
  • Set the upload_max_filesize for PHP to 32M.

Event Timeline

joshuaspence raised the priority of this task to Needs Triage.
joshuaspence updated the task description.
joshuaspence added a project: Files.
joshuaspence added a subscriber: epriestley.
joshuaspence added a subscriber: joshuaspence.

I suspect this is an nginx thing.

Hmm, I'm not sure. We use nginx on this server -- client_max_body_size is technically set to 128m, but 32m should be more than enough.

My best guess is that the directive is in the wrong place, maybe? Ours is in the http { ... } block, and this StackOverflow answer suggests that at least some users have solved a similar issue by moving it around:

http://stackoverflow.com/questions/2056124/nginx-client-max-body-size-has-no-effect

Ah. I did try testing it out on this install with a 100MB file. Perhaps I shall try with a larger file instead.

My Phabricator Installation
> arc --conduit-uri=https://phabricator.example.com --trace upload large-file
libphutil loaded from '/home/joshua/dotfiles/modules/phabricator/libphutil/src'.
arcanist loaded from '/home/joshua/dotfiles/modules/phabricator/arcanist/src'.
Config: Reading user configuration file "/home/joshua/.arcrc"...
Config: Did not find system configuration at "/etc/arcconfig".
Working Copy: No candidate locations for .arcconfig from this working directory.
Working Copy: Path "/home/joshua" is not in any working copy.
>>> [0] <conduit> conduit.connect() <bytes = 586>
>>> [1] <http> https://phabricator.example.com/api/conduit.connect
<<< [1] <http> 3,184,008 us
<<< [0] <conduit> 3,184,671 us
Uploading 'large-file'...
>>> [2] <conduit> file.allocate() <bytes = 295>
>>> [3] <http> https://phabricator.example.com/api/file.allocate
<<< [3] <http> 614,302 us
<<< [2] <conduit> 614,989 us
>>> [4] <conduit> file.upload() <bytes = 209715421>
>>> [5] <http> https://phabricator.example.com/api/file.upload
<<< [5] <http> 1,327,965 us
<<< [4] <conduit> 1,328,623 us

[2015-03-17 09:41:15] EXCEPTION: (HTTPFutureHTTPResponseStatus) [HTTP/413] 
<html>
<head><title>413 Request Entity Too Large</title></head>
<body bgcolor="white">
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html> at [<phutil>/src/future/http/BaseHTTPFuture.php:337]
arcanist(), phutil()
  #0 BaseHTTPFuture::parseRawHTTPResponse(string) called at [<phutil>/src/future/http/HTTPSFuture.php:414]
  #1 HTTPSFuture::isReady() called at [<phutil>/src/future/Future.php:39]
  #2 Future::resolve(NULL) called at [<phutil>/src/future/FutureProxy.php:36]
  #3 FutureProxy::resolve() called at [<phutil>/src/conduit/ConduitClient.php:57]
  #4 ConduitClient::callMethodSynchronous(string, array) called at [<arcanist>/src/workflow/ArcanistUploadWorkflow.php:119]
  #5 ArcanistUploadWorkflow::run() called at [<arcanist>/scripts/arcanist.php:378]
secure.phabricator.com
> arc --trace --conduit-uri=https://secure.phabricator.com upload large-file
libphutil loaded from '/home/joshua/dotfiles/modules/phabricator/libphutil/src'.
arcanist loaded from '/home/joshua/dotfiles/modules/phabricator/arcanist/src'.
Config: Reading user configuration file "/home/joshua/.arcrc"...
Config: Did not find system configuration at "/etc/arcconfig".
Working Copy: No candidate locations for .arcconfig from this working directory.
Working Copy: Path "/home/joshua" is not in any working copy.
>>> [0] <conduit> conduit.connect() <bytes = 586>
>>> [1] <http> https://secure.phabricator.com/api/conduit.connect
<<< [1] <http> 2,760,262 us
<<< [0] <conduit> 2,760,926 us
Uploading 'large-file'...
>>> [2] <conduit> file.allocate() <bytes = 296>
>>> [3] <http> https://secure.phabricator.com/api/file.allocate
<<< [3] <http> 998,750 us
<<< [2] <conduit> 999,510 us
Beginning chunked upload of large file...
>>> [4] <conduit> file.querychunks() <bytes = 217>
>>> [5] <http> https://secure.phabricator.com/api/file.querychunks
<<< [5] <http> 877,833 us
<<< [4] <conduit> 878,746 us
Uploading chunks (38 chunks to upload).
>>> [6] <conduit> file.uploadchunk() <bytes = 1398419>
>>> [7] <http> https://secure.phabricator.com/api/file.uploadchunk

It seems that my custom PhabricatorFileStorageEngine class is being selected instead of PhabricatorChunkedFileStorageEngine. If I manually decrease the priority of PhabricatorChunkedFileStorageEngine to 0, I am able to perform a chunked upload.

Ohhh. Set these options in your custom engine:

hasFilesizeLimit() = true
getFilesizeLimit() = 8MB

You can set getFilesizeLimit() to whatever you want; the largest limit any engine exposes is the chunking threshold.

For example, if it's set to 256MB, we'll upload files in single blobs if they're under 256MB, and split them into chunks if they're over 256MB.

(8MB just stores all 1-chunk and 2-chunk files as single files, and then starts chunking as soon as we'd have 3 chunks.)
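
The threshold rule described above can be sketched roughly as follows. This is only an illustration, not Phabricator's actual code: the function names are hypothetical, the 4 MiB chunk size is inferred from the "1-chunk and 2-chunk" remark, the behavior exactly at the threshold is a guess, and treating an engine without a limit as "never chunk" is an assumption based on the symptom reported earlier in this thread.

```python
MIB = 1024 * 1024
CHUNK_SIZE = 4 * MIB  # assumption: inferred from the thread, not confirmed


def chunking_threshold(engine_limits):
    """Return the largest single-blob limit any engine exposes, in bytes.

    A limit of None models an engine where hasFilesizeLimit() is false.
    Assumption: such an engine accepts any size, so chunking never starts.
    """
    if any(limit is None for limit in engine_limits):
        return None
    return max(engine_limits)


def plan_upload(file_size, engine_limits, chunk_size=CHUNK_SIZE):
    """Decide between a single blob and a chunked upload (hypothetical)."""
    threshold = chunking_threshold(engine_limits)
    if threshold is None or file_size <= threshold:
        return ("single", 1)
    # Integer ceiling division: number of chunks needed to cover the file.
    chunk_count = -(-file_size // chunk_size)
    return ("chunked", chunk_count)


if __name__ == "__main__":
    print(plan_upload(8 * MIB, [8 * MIB]))        # fits in one blob
    print(plan_upload(8 * MIB + 1, [8 * MIB]))    # first size needing 3 chunks
    print(plan_upload(200 * MIB, [None]))         # engine without a limit
```

Under these assumptions, an engine that reports no limit absorbs a 200MB file as one blob, which would match the 413 seen with the custom engine.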

So, setting a filesize limit allows me to upload files with an unlimited size?

Yeaaaahhhhhh...

getLargestFileStorableAsASingleBlobBeforeChunkingActivates__SetThisToUnder4MBToDisableChunking()

This API is partly odd because I want to disable chunking into MySQL by default, but allow users to enable it relatively easily. That motivates the <4MB -> 4MB chunk-enabling threshold. We could hard-code the 8MB single-file threshold, but I suspect the total number of custom storage engines in the wild is about 3, so I figured I'd wait until the next iteration to figure out where the API should go (I want to move the config into the Files web UI, which would let us expose [ ] Enable chunk storage for this engine or whatever).

So where does 32MB come from? If I set the filesize limit to 8MB, why do I need to set the nginx/PHP limits to 32MB?

It's just 8MB * headroom factor.

When uploaded via arc upload, chunks are sent as JSON-encoded base64 with additional data and headers, so that's at least >12MB as a request body. We might want to change the chunk size later, etc. 32MB is small enough that it should never be too large for any system, but large enough that we won't hit it even if we adjust chunk sizes and encoding schemes.
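
As a rough check of the headroom argument, the sketch below computes only the base64 part of the overhead and compares it with a configured limit. The helper names are hypothetical, and it deliberately ignores the JSON escaping, request encoding and headers mentioned above, so real request bodies will be somewhat larger than these figures.

```python
import base64

MIB = 1024 * 1024


def base64_size(raw_bytes):
    """Length of the padded base64 encoding of raw_bytes bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def parse_size(text):
    """Parse shorthand sizes such as '32M' or '128m' into bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def fits_with_headroom(raw_bytes, limit):
    """True if the base64 payload alone is below the configured limit."""
    return base64_size(raw_bytes) < parse_size(limit)


if __name__ == "__main__":
    # Sanity check of the formula against the standard library.
    assert len(base64.b64encode(bytes(1000))) == base64_size(1000)
    for size in (4 * MIB, 8 * MIB):
        print(size, base64_size(size), fits_with_headroom(size, "32M"))
```

For a 4 MiB input this gives 5,592,408 bytes, which is close to the 5,597,458-byte file.uploadchunk request in the trace further down, assuming that request carried a 4 MiB chunk.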

joshuaspence claimed this task.

OK. I'm having some more issues...

> arc upload some_file
Uploading 'some_file'...
Beginning chunked upload of large file...
Uploading chunks (63 chunks to upload).
Unable to use allocate method, trying older upload method.                    
  F79654 some_file: https://phabricator.mydomain.com/F79654

Done.

> arc upload --trace some_file
libphutil loaded from '/home/josh/workspace/github.com/joshuaspence/dotfiles/modules/phabricator/libphutil/src'.
arcanist loaded from '/home/josh/workspace/github.com/joshuaspence/dotfiles/modules/phabricator/arcanist/src'.
Config: Reading user configuration file "/home/josh/.arcrc"...
Config: Did not find system configuration at "/etc/arcconfig".
Working Copy: Reading .arcconfig from "/home/josh/workspace/git.mydomain.com/sys/puppet/.arcconfig".
Working Copy: Path "/home/josh/workspace/git.mydomain.com/sys/puppet" is part of `git` working copy "/home/josh/workspace/git.mydomain.com/sys/puppet".
Working Copy: Project root is at "/home/josh/workspace/git.mydomain.com/sys/puppet".
Config: Did not find local configuration at "/home/josh/workspace/git.mydomain.com/sys/puppet/.git/arc/config".
>>> [0] <conduit> conduit.connect() <bytes = 600>
>>> [1] <http> https://phabricator.mydomain.com/api/conduit.connect
<<< [1] <http> 1,178,754 us
<<< [0] <conduit> 1,179,246 us
Uploading 'some_file'...
>>> [2] <conduit> file.allocate() <bytes = 305>
>>> [3] <http> https://phabricator.mydomain.com/api/file.allocate
<<< [3] <http> 346,970 us
<<< [2] <conduit> 347,585 us
Beginning chunked upload of large file...
>>> [4] <conduit> file.querychunks() <bytes = 216>
>>> [5] <http> https://phabricator.mydomain.com/api/file.querychunks
<<< [5] <http> 329,929 us
<<< [4] <conduit> 330,327 us
Resuming upload (17 of 63 chunks remain).
>>> [6] <conduit> file.uploadchunk() <bytes = 5597458>
>>> [7] <http> https://phabricator.mydomain.com/api/file.uploadchunk
<<< [7] <http> 4,496,103 us
<<< [6] <conduit> 4,496,747 us
Unable to use allocate method, trying older upload method.
>>> [8] <conduit> file.info() <bytes = 212>
>>> [9] <http> https://phabricator.mydomain.com/api/file.info
<<< [9] <http> 1,318,127 us
<<< [8] <conduit> 1,318,592 us
  F79654 some_file: https://phabricator.mydomain.com/F79654

Done.

Hmm, anything in the error log on the server?

(That's also kind of buggy since it doesn't actually try the older method, I'll fix that.)

I suspect retrying it repeatedly will succeed eventually, since you got 17 chunks up, but I don't have any guesses about what the failure might be.
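
The reason a retry can make progress is that the client first asks which chunks are still missing and only uploads those, as the "Resuming upload (17 of 63 chunks remain)" line in the trace suggests. A minimal sketch of that loop is below. It is not arcanist's implementation: the chunk dictionaries, the `complete` key, and the `upload_chunk` callback are assumptions made for illustration.

```python
def missing_chunks(chunks):
    """Return the chunks that have not been uploaded yet."""
    return [chunk for chunk in chunks if not chunk["complete"]]


def resume_upload(chunks, upload_chunk, max_attempts=3):
    """Upload incomplete chunks, retrying failures on later passes.

    upload_chunk is a hypothetical callable that raises IOError on failure.
    Returns the list of chunks that are still missing afterwards.
    """
    for _ in range(max_attempts):
        todo = missing_chunks(chunks)
        if not todo:
            break
        for chunk in todo:
            try:
                upload_chunk(chunk)
            except IOError:
                continue  # leave the chunk incomplete; retry on the next pass
            chunk["complete"] = True
    return missing_chunks(chunks)


if __name__ == "__main__":
    # 63 chunks, 46 already uploaded, mirroring the counts in the trace.
    chunks = [{"index": i, "complete": i < 46} for i in range(63)]
    seen = set()

    def flaky_upload(chunk):
        # Simulated transport: every chunk fails once, then succeeds.
        if chunk["index"] not in seen:
            seen.add(chunk["index"])
            raise IOError("simulated 502")

    print(len(missing_chunks(chunks)), len(resume_upload(chunks, flaky_upload)))
```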

Nope, nothing in the error logs on the server.

Okay, let me fix the fallback behavior and get it printing whatever the chunk upload throws and we'll see where that gets us.

OK interesting...

> arc upload some_file
Uploading 'some_file'...
Beginning chunked upload of large file...
Resuming upload (17 of 63 chunks remain).
Exception                                                                     
[HTTP/502] 
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>
(Run with `--trace` for a full exception trace.

Checking the nginx logs on the server:

2015/03/17 23:16:04 [error] 13532#0: *1 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 59.100.120.194, server: phabricator.mydomain.com, request: "POST /maniphest/transaction/preview/21950/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.socket:", host: "phabricator.mydomain.com"
2015/03/18 22:24:33 [error] 14727#0: *80310 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 59.100.120.194, server: phabricator.mydomain.com, request: "POST /api/file.uploadchunk HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.socket:", host: "phabricator.mydomain.com"
2015/03/18 22:25:00 [error] 14727#0: *80431 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 59.100.120.194, server: phabricator.mydomain.com, request: "POST /api/file.uploadchunk HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.socket:", host: "phabricator.mydomain.com"
2015/03/18 22:25:45 [error] 14727#0: *80536 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 59.100.120.194, server: phabricator.mydomain.com, request: "POST /api/file.uploadchunk HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.socket:", host: "phabricator.mydomain.com"

Oh nice, I saw this in /var/log/syslog:

Mar 19 03:10:59 ip-10-154-233-161 kernel: [5119211.690207] php5-fpm[30535]: segfault at 102babac0 ip 00007f81f136ec80 sp 00007fffc0763038 error 4 in libc-2.15.so[7f81f122f000+1b4000]

I wonder if there's a way we can detect it. From the report and patch, it's not obvious to me how we could trigger it (to see if it segfaults), and we probably can't usually read /var/log/syslog (to see if there are logs of a segfault). I can't immediately find it in the PHP changelog, either, to just blacklist affected versions.

I wonder if it's worthwhile to try to build a mechanism that lets users hook up Phabricator access to things like syslog so we can attempt to detect things like this, SELinux/AppArmor breakage, Phabricator processes getting killed by the OOM killer, etc. There are a handful of errors like this...
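
To make the idea concrete, here is a minimal sketch of what scanning syslog for this kind of failure might look like. Nothing like this exists in Phabricator as far as this thread shows. The regular expression is written against the single kernel line quoted above, so the format it expects is an assumption and may not hold on other kernels or distributions, and the function name is hypothetical.

```python
import re

# Assumption: kernel segfault lines look like the one quoted in this thread.
SEGFAULT_RE = re.compile(
    r"kernel: \[\s*[\d.]+\] (?P<process>\S+?)\[(?P<pid>\d+)\]: "
    r"segfault at (?P<address>[0-9a-f]+) .* in (?P<module>\S+?)\["
)


def find_segfaults(lines, process_prefix="php"):
    """Return segfault records for processes whose name starts with a prefix."""
    hits = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match and match.group("process").startswith(process_prefix):
            hits.append({
                "process": match.group("process"),
                "pid": int(match.group("pid")),
                "module": match.group("module"),
            })
    return hits


if __name__ == "__main__":
    sample = (
        "Mar 19 03:10:59 ip-10-154-233-161 kernel: [5119211.690207] "
        "php5-fpm[30535]: segfault at 102babac0 ip 00007f81f136ec80 "
        "sp 00007fffc0763038 error 4 in libc-2.15.so[7f81f122f000+1b4000]"
    )
    print(find_segfaults([sample]))
```

A real mechanism would also need read access to the log, which is the obstacle noted above.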

epriestley claimed this task.

I don't think we have enough to go on here to try to build some syslog wizard yet, but maybe after we collect more problems.