Page MenuHomePhabricator

Phabricator HTTP repository hosting has fairly severe scalability limits
Open, Needs TriagePublic

Tags
Tokens
"Love" token, awarded by rafaelrabeloit."Love" token, awarded by 20after4."Manufacturing Defect?" token, awarded by Grimeh.
Assigned To
None
Authored By
squimmy, Feb 2 2014

Description

Hi,

I'm unable to clone a Mercurial repository hosted by my Phabricator instance that I initially pushed using the HTTP interface. I suspect this is some bizarre configuration error on my end and not a major problem with Phabricator, but i'm at a loss of how to approach this seeing as the only error I get is:

requesting all changes
abort: HTTP Error 500: Internal Server Error

None of the Phabricator daemons report any errors, which is highly annoying. I do have other repositories which work fine, it's just this one repository which i'm having problems with so I suspect this will be rather difficult to reproduce. Any ideas?

Thanks,

Matthew

Related Objects

Event Timeline

squimmy updated the task description. (Show Details)
squimmy raised the priority of this task from to Needs Triage.
squimmy added a project: Diffusion.
squimmy added a subscriber: squimmy.

Any PHP errors? (check your web server / fcgid logs)

Yeah, the daemons don't handle HTTP requests so they aren't expected to report errors. You should check your webserver error logs instead (usually /var/log/httpd/error.log or similar for Apache), or php-fpm logs if you're using nginx + php-fpm.

CHeck the Apache log, and i'm guessing this must have something to do with it...

[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191] [2014-02-06 13:12:48] ERROR 8: Undefined offset: 1 at [/root/phab/phabricator/src/applications/diffusion/conduit/ConduitAPI_diffusion_historyquery_Method.php:140]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #0 ConduitAPI_diffusion_historyquery_Method::getMercurialResult(Object ConduitAPIRequest) called at [/root/phab/phabricator/src/applications/diffusion/conduit/ConduitAPI_diffusion_abstractquery_Method.php:168]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #1 ConduitAPI_diffusion_abstractquery_Method::getResult(Object ConduitAPIRequest) called at [/root/phab/phabricator/src/applications/diffusion/conduit/ConduitAPI_diffusion_historyquery_Method.php:32]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #2 ConduitAPI_diffusion_historyquery_Method::getResult(Object ConduitAPIRequest) called at [/root/phab/phabricator/src/applications/diffusion/conduit/ConduitAPI_diffusion_abstractquery_Method.php:157]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #3 ConduitAPI_diffusion_abstractquery_Method::execute(Object ConduitAPIRequest) called at [/root/phab/phabricator/src/applications/conduit/method/ConduitAPIMethod.php:68]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #4 ConduitAPIMethod::executeMethod(Object ConduitAPIRequest) called at [/root/phab/phabricator/src/applications/conduit/call/ConduitCall.php:163]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #5 ConduitCall::executeMethod() called at [/root/phab/phabricator/src/applications/conduit/call/ConduitCall.php:90]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #6 ConduitCall::execute() called at [/root/phab/phabricator/src/applications/diffusion/query/DiffusionQuery.php:70]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #7 DiffusionQuery::callConduitWithDiffusionRequest(Object PhabricatorUser, Object DiffusionMercurialRequest, diffusion.historyquery, Array of size 6 starting with: { commit => default }) called at [/root/phab/phabricator/src/applications/diffusion/controller/DiffusionController.php:171]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #8 DiffusionController::callConduitWithDiffusionRequest(diffusion.historyquery, Array of size 6 starting with: { commit => default }) called at [/root/phab/phabricator/src/applications/diffusion/controller/DiffusionHistoryController.php:31]
[Thu Feb 06 08:12:48 2014] [error] [client 82.25.184.191]   #9 DiffusionHistoryController::processRequest() called at [/root/phab/phabricator/webroot/index.php:84]
[Thu Feb 06 08:16:19 2014] [error] [client 157.55.33.120] File does not exist: /var/www/robots.txt
[Thu Feb 06 13:10:17 2014] [error] [client 82.25.184.191] PHP Fatal error:  Maximum execution time of 30 seconds exceeded in /root/phab/phabricator/src/applications/diffusion/controller/DiffusionServeController.php on line 450
[Thu Feb 06 13:10:17 2014] [error] [client 82.25.184.191] >>> UNRECOVERABLE FATAL ERROR <<<\n\nMaximum execution time of 30 seconds exceeded\n\n/root/phab/phabricator/src/applications/diffusion/controller/DiffusionServeController.php:450\n\n\n\xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb \xef\xb8\xb5 \xc2\xaf\\_(\xe3\x83\x84)_/\xc2\xaf \xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb
[Thu Feb 06 13:12:06 2014] [error] [client 82.25.184.191] PHP Fatal error:  Maximum execution time of 30 seconds exceeded in /root/phab/phabricator/src/applications/diffusion/controller/DiffusionServeController.php on line 450
[Thu Feb 06 13:12:06 2014] [error] [client 82.25.184.191] >>> UNRECOVERABLE FATAL ERROR <<<\n\nMaximum execution time of 30 seconds exceeded\n\n/root/phab/phabricator/src/applications/diffusion/controller/DiffusionServeController.php:450\n\n\n\xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb \xef\xb8\xb5 \xc2\xaf\\_(\xe3\x83\x84)_/\xc2\xaf \xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb
epriestley renamed this task from Unable to clone Mercurial repository hosted by Phabricator to Phabricator HTTP repository hosting has fairly severe scalability limits.May 15 2014, 3:49 PM

I think there's another task somewhere, but two issues here are:

  1. We don't set_time_limit(), so long-running requests may exceed max_execution_time and be killed, as with the kill after 30 seconds.
  2. We don't stream results properly, so web processes consume unreasonably huge amounts of memory while cloning, and clones over 2GB hit PHP string length limits.

Yeah, we hit the 2GB PHP string size limit recently (String size overflow in [...]/ExecFuture.php on line 712).
A solution to this problem would be nice.

For now, the best workaround is to use SSH instead of HTTP.

◀ Merged tasks: T5961.

We've hit the

>>> UNRECOVERABLE FATAL ERROR <<<\n\nString size overflow\n\n/home/ec2-user/phabricator/libphutil/src/future/exec/ExecFuture.php:724

Are there any plans to support HTTP clone mode for large projects?

We plan to resolve this eventually but it's not a priority. SSH does not have these scalability limits and is the recommended workaround for large repositories.

Grimeh added a subscriber: Grimeh.Nov 12 2015, 9:53 PM
kaya added a subscriber: kaya.Mar 10 2016, 1:36 AM
eadler added a project: Restricted Project.Mar 10 2016, 6:20 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
aubort added a subscriber: aubort.Mar 23 2016, 3:31 PM

T7789 has some discussion, but there's now a clear pathway forward here which doesn't require anything too drastic. Specifically:

Disable enable_post_data_reading: Add a setup warning to instruct users to disable enable_post_data_reading in versions of PHP after PHP 5.4.0. In earlier versions, we could warn about scalability limits and suggest updating to PHP 5.4.0 or newer. Users could ignore the warning and accept limits on scalability.

Build $_POST only on demand: This means that $_POSTwon't be populated, but that's generally fine: we rebuild it ourselves in AphrontDefaultApplicationConfiguration->buildRequest() so we can fix problem behaviors in the native parsing. However, this needs to be moved to read $_POST data only once it is requested -- instead of automatically at the time of request construction -- because we don't want to read the whole stream if we're processing a VCS HTTP request. This may be a bit thorny/involved and require some cleanup of the initialization sequence, but shouldn't be technically difficult.

Stream Requests: We need to add a bit of glue to let AphrontRequestStream stream into ExecFuture. Each already supports streaming, they just don't expose compatible endpoints.

Stream Responses: We need to add a bit of glue to let ExecFuture stream a response. Again, both already support streaming, they just don't expose compatible endpoints. Only this step is required to relieve the clone limits, the other steps relieve push limits.

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:18 PM
epriestley moved this task from Backlog to v3 on the Diffusion board.Aug 26 2016, 11:10 AM
epriestley edited projects, added Diffusion (v3); removed Diffusion.

GHC appears to be running into this limitation. We would prefer to use anonymous HTTP access with our build bots; is there any hope that this will be fixed in the near-term?

Probably not. Because it's a scalability issue this does get some amount of internal priority, but we have a large amount of external prioritized work in queue ahead of it (see Prioritized).

urzds added a subscriber: urzds.Sep 8 2016, 6:12 AM
tjgould added a subscriber: tjgould.Feb 2 2017, 3:30 AM

A wrench in the gears here is that disabling enable_post_data_reading also disables $_FILES, which we use in about 20 places with vanilla HTTP <input type="file" /> controls. So on top of initialization order issues with $_POST, we also need to build a multipart/form-data parser that can populate a synthetic $_FILES or some similar construct.

Would auto_globals_jit help?

When enabled, the SERVER, REQUEST, and ENV variables are created when they're first used (Just In Time) instead of when the script starts.

Then you just have to make sure you figure out you're in a request where you shouldn't use the $_POST superglobal before you actually try to use it. For any case where it's okay to read it, just carry on normally. You'd be able to incrementally fix different subsystems: the unfixed ones would continue to use the superglobals and have scalability issues, but the fixed ones would be fixed and usable right away. You wouldn't have to eliminate all $_POST use from Phabricator, just any pre-routing use. Right?

Though, does XHProf read __profile__ every time before the rest of the request lifecycle, even in production? Cause that would kill this plan.

Would auto_globals_jit help?

From the documentation, it looks like doesn't apply to $_POST (only $_SERVER, $_REQUEST, and $_ENV), so it doesn't seem likely to be useful. Are you aware of different ($_POST-affecting) behavior in practice?

Our access to superglobals in PhabricatorStartup comes from D6672, and works around an issue where PHP's (fairly insane) sanitization filters are configured.

I suspect we can just remove $_POST there, since we'll be reading and parsing it ourselves now. If PHP is sufficiently crazy to sanitize the raw php://input stream (which I doubt, but you never know), we can just refuse to run with sanitization configured.

This potentially dovetails somewhat with T12907, where we need a stream parser for HTTP responses, although since we're parsing partial requests (vs full responses) here and may need to do some of the parsing very early I'm not sure if it actually makes sense to try to overlap things.

The __profile__ stuff is not important, and, in an extreme case, we could pass it via path (/!options=xprofile/T123 or similar) if strictly necessary.

I don't see any easy way to cheat through $_FILES, though. We could remove it, but we have a nonzero number of users who compile their own copy of Linux and use a window manager that doesn't support drag and drop and disable Javascript and don't believe a monitor should ever have more than 256 colors, so that would leave them out to dry. The conversion also isn't trivial, so even if we want to move in this direction eventually I think it's a significant scope expansion over just building a synthetic $_FILES.

I don't see any easy way to cheat through $_FILES, though. We could remove it, but we have a nonzero number of users who compile their own copy of Linux and use a window manager that doesn't support drag and drop and disable Javascript and don't believe a monitor should ever have more than 256 colors, so that would leave them out to dry. The conversion also isn't trivial, so even if we want to move in this direction eventually I think it's a significant scope expansion over just building a synthetic $_FILES.

twm4lyfe!