File Storage setup doc makes a hokey suggestion
Open, Needs TriagePublic

Description

https://secure.phabricator.com/book/phabricator/article/configuring_file_storage/

This page suggests the following PHP config change:

max_input_vars: When files are uploaded via HTML5 drag and drop file upload APIs, PHP parses the file body as though it contained normal POST parameters, and may trigger max_input_vars if a file has a lot of brackets in it. You may need to set it to some astronomically high value.

This seems like a really silly and inefficient workaround, when disabling enable_post_data_reading would probably also fix it. I wasn't sure if Phabricator would be okay with it, but @epriestley's comment on T4369 seems to indicate that it would be fine.

If this would work, the docs could be updated to describe the better solution.

If this wouldn't work, perhaps some code changes could be made that would fix any issues. I'd possibly be willing to dive in and do it if someone points me in the right direction. :)

Yes, I wasn't aware that enable_post_data_reading existed when I wrote this.

That's a better approach (and something we want to pursue for T4369 anyway) but probably requires a few changes to the request pathway.

I'm not sure exactly what's involved. Access to $_POST should be mostly isolated into AphrontRequest, but, at a minimum, PhabricatorStartup::normalizeInput() runs early and will force $_POST into existence.

Pathway forward is probably something in the vein of:

  • Go blame that stuff and try to figure out why it's as early as it is. It may be because there are other early reads of $_GET or $_REQUEST from before we build an AphrontRequest.
  • Move it to happen on-demand in AphrontRequest instead.
  • Fix all possible prior calls to superglobals?

This is probably hard as an initial task for a new contributor since it's so poorly defined. I suspect the individual issues are not difficult to resolve, but identifying them and choosing an approach may require a lot of intricate knowledge of the Phabricator initialization phase.

peterfaiman added a comment.EditedAug 29 2016, 1:10 PM

Go blame that stuff and try to figure out why it's as early as it is.

What do you mean that stuff? normalizeInput?

It's pretty poorly defined but it mostly involves reading code. Might be fun! I don't know if I'll have time to take a crack at it, but it could happen.

This stack overflow post suggests that with apache you could use something like:

<LocationMatch "^/file/dropupload/">
    php_value enable_post_data_reading Off
</LocationMatch>

But that still might cause issues.

On the bright side a casual search for _POST shows few results in the Phabricator code.

What do you mean that stuff? normalizeInput?

Yeah, normalizeInput() -- that seems like the biggest issue.

I think the StackOverflow fix won't work because we always run nomalizeInput() immediate after receiving requests.

Overall, the desired data flow is something like this:

  1. Turn enable_post_data_reading off for all pages.
  2. Don't inspect $_POST until we route the request to a Controller.
  3. Once we reach a Controller, it is allowed to either ask for POST data to be parsed normally (as though we had read $_POST) or read it as a stream instead (file data uploads, VCS operations). This decision could be inferred lazily ("did the controller try to access $_POST?") or could be explicit (e.g., some method that controllers can override to opt out of POST parsing, or a method call on AphrontRequest to put it into stream mode).
  4. Have the drop upload and HTTP VCS controllers take the "stream" path.

Currently, we won't get past step (2) because we immediately inspect $_POST in normalizeInput() at the beginning of every request.

We're fairly disciplined about use of $_POST and $_REQUEST (and some hits for them are in unreachable third-party libraries) but the cases where we do use them are likely tricky to resolve.

For example, DarkConsoleXHProfPluginAPI inspects $_REQUEST['__profile__'] to activate the profiler, which allows us to include __profile__ as POST data in a form submission to activate the profiler for the request which processes the form. Activating the profiler as soon as possible is desirable (so we don't miss performance information for code which runs before we build a Controller) but I don't see a way to preserve this behavior exactly when some controllers no longer parse $_POST and we activate the profiler before we know if this is a parsing controller or not.

I don't think there's a way to cheat in that case either, because the Controller that will handle a request is a function of a lot of state, not just the request path (are applications installed? what's the host name? Is the user logged in? Is the user disabled? and so on...).

This is pretty tractable because activating the profiler by submitting forms isn't an important capability, but each callsite probably has similar weirdness. For example, AphrontStandaloneHTMLResponse uses $_REQUEST and may be built if exceptions are raised before we reach a controller or even construct an AphrontRequest, I believe.