Page MenuHomePhabricator

Updates for Mercurial's HTTP protocol
ClosedPublic

Authored by cspeckmim on May 26 2023, 8:21 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 3, 2:56 PM
Unknown Object (File)
Tue, Dec 31, 10:32 PM
Unknown Object (File)
Mon, Dec 16, 7:58 AM
Unknown Object (File)
Thu, Dec 12, 6:51 AM
Unknown Object (File)
Wed, Dec 11, 9:16 PM
Unknown Object (File)
Wed, Dec 11, 12:18 PM
Unknown Object (File)
Fri, Dec 6, 7:55 AM
Unknown Object (File)
Dec 5 2024, 7:16 AM
Subscribers

Details

Summary

While testing https://secure.phabricator.com/D21864 I ran into some issues getting mercurial HTTP access working. Using wireshark I confirmed that my local mercurial 6.4 was not including command arguments as HTTP headers but in the querystring.

I didn't dig too deep into understanding when/why this started happening. The protocol documents this in wireprotocol.txt.

Command arguments can be sent multiple ways. The simplest is part of the URL query string using `x-www-form-urlencoded encoding (see Python's urllib.urlencode()`. However, many servers impose length limitations on the URL. So this mechanism is typically only used if the server doesn't support other mechanisms.

Based on that either the mercurial on the server is really old (it's 6.1.1 tho) or maybe some other parsing/info passing in Phab's handling of the wire protocol is causing the client to downgrade the wire protocol support.

Test Plan

Host mercurial repo using HTTP, test push/pull.

Diff Detail

Repository
rP Phabricator
Branch
cspeck-hg-http
Lint
Lint Passed
SeverityLocationCodeMessage
Advicesrc/applications/diffusion/controller/DiffusionServeController.php:914XHP16TODO Comment
Unit
Tests Passed
Build Status
Buildable 25817
Build 35645: arc lint + arc unit

Event Timeline

cspeckmim held this revision as a draft.
cspeckmim added inline comments.
src/applications/diffusion/controller/DiffusionServeController.php
898

Would iterating through $this->getRequest()->getRequestData() be better here? It looks like that would capture both GET query params and POST params encoded with x-www-form-urlencoded which is what hg expects. From the mercurial spec there's an additional header for POST requests to specify the length of the arguments in the POST body but allows for additional content after it. Blehhhhh...

This is a big mess, but here's some general guidance (see also inline):

The only thing I see that's materially wrong here (plausibly produces broken behavior for sensible requests) is the missing urlencode() stuff. I think everything else is optional. Instead of doing "{$key}={$value}" and then re-parsing the string, you could probably just avoid the whole urlencode() thing by doing $args_filtered[$key] = $value; ... return $args_filtered;


You can read the GET query string most-purely like this:

$query = idx($_SERVER, 'QUERY_STRING', '');
$pairs = id(new PhutilQueryStringParser())
  ->parseQueryStringToPairList($query);

You can read the POST query string most-purely by following the logic in AphrontApplicationConfiguration->readHTTPPOSTData().

It is acceptable and reasonable to read $_GET (and, if you want, $_POST) instead, but they will differ from the "pure, raw" input in two ways:

  • The pure raw input may encode the same value multiple times, e.g. a=1&a=2. The $_GET and $_POST arrays will discard all but one value when duplicates are present.
  • The processing to generate $_GET and $_POST decodes b[]=1&b[]=2 as an array with values [1, 2], but this is a PHP-specific rule.

So light cheating here by using $_GET/$_POST and/or storing the list as an array instead of a list of pairs may mangle or mutate requests from Mercurial in the general form a=1&a=2&b[]=1&b[]=2. It is likely that Mercurial never sends any requests in this form.

If you want to try to handle these cases, I think it would be reasonable to make AphrontApplicationConfiguration store the raw strings somewhere before it sends them to PhutilQueryStringParser so you don't have to copy/paste 100 lines out of readHTTPPOSTData().

But presumably we're more in "fix urlencode() and call it a day" territory here than "rewrite the whole stack to support pure passthrough for a case that probably never occurs in the wild".

src/applications/diffusion/controller/DiffusionServeController.php
901

This is probably missing urlencode() calls for keys and values which contain symbols like = and &.

This revision now requires changes to proceed.May 27 2023, 12:11 AM

If you want to try to handle these cases, I think it would be reasonable to make AphrontApplicationConfiguration store the raw strings somewhere before it sends them to PhutilQueryStringParser so you don't have to copy/paste 100 lines out of readHTTPPOSTData().

But presumably we're more in "fix urlencode() and call it a day" territory here than "rewrite the whole stack to support pure passthrough for a case that probably never occurs in the wild".

I will take the fix urlencode route for now but this is good to know

src/applications/diffusion/controller/DiffusionServeController.php
901

x_x I actually made a local change to urlencode both key and value but did not amend as I thought they would already be urlencoded coming from $_GET

Maybe better querystring parsing

This version looks correct to me in all non-pathological cases, thanks!

src/applications/diffusion/controller/DiffusionServeController.php
890

Nothing wrong here, but this is equivalent to the simpler if ($args_raw) -- in PHP, an array is "falsey" if it has zero elements and "truthy" if it has one or more elements.

899

I think this never has an effect, since we can only reach this line with $args_raw having the value array() anyway.

910

As above.

This revision is now accepted and ready to land.May 28 2023, 11:29 PM
This revision was automatically updated to reflect the committed changes.