Page MenuHomePhabricator

Fix issues with recent Mercurial wire protocol versions
Closed, ResolvedPublic

Description

This is a rebranding of T9548, which has a lot of philosophy. See PHI275.

Briefly, our parsing of the Mercurial wire protocol is buggy, and recent versions of Mercurial have extended the wire protocol so that we don't have proper behavior. Known issues:

Bundle2 Filtering: Since D14241/T9450, we attempt to filter the protocol and tell the client that we don't support bundle2. This probably doesn't work. A more desirable behavior is to support bundle2.

Here's some evidence this doesn't work:

$ hg push --debug
...
remote: 346
remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Aerror%3Dabort%2Cunsupportedcontent%2Cpushraced%2Cpushkey%0Ahgtagsfnodes%0Alistkeys%0Aphases%3Dheads%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN

That looks a lot like the server claiming it supports bundle2.

This doesn't necessarily cause any problems on its own, and may fix some problems, but ideally we'd remove this code and then fix whatever that breaks.

Pushing A Large File: These steps can reproduce a specific wire protocol parsing issue:

Server version 4.4.1, client version 4.4.1, SSH.

$ head -c4096000 /dev/zero > 4MB.zero
$ hg add 4MB.zero
$ hg commit -m 'High quality change.'
$ hg push
$ hg push
pushing to ssh://local@localvault.phacility.com/diffusion/39/new-api-mercurial-repo/
searching for changes
remote: abort: stream ended unexpectedly (got 0 bytes, expected 4)
<... snip ...>
remote: ValueError: need more than 1 value to unpack
remote: Exception: Writing to a closed pipe!

Pushing a smaller change doesn't reproduce this issue, so Mercurial's protocol negotiation presumably considers the size of the change being transmitted in selecting the protocol format.

Event Timeline

epriestley triaged this task as Normal priority.Jan 4 2018, 7:59 PM
epriestley created this task.

Here's some evidence [that filtering bundle2] doesn't work:
we attempt to filter the protocol and tell the client that we don't support bundle2

Over SSH, I think we're trying to filter a "capabilities" command from the client, but the client doesn't send this command: the server does.

D14241 filters responses (server -> client) over HTTP, but filters requests (client -> server) over SSH. This is probably just an implementation error that escaped notice because Mercurial doesn't consistently drop into bundle2 mode.

We don't currently decode the half of the conversation the server is sending, and I'd like to avoid decoding it, so I'm just going to remove this nonfunctional filtering on the SSH pathway as a first step. This should result in no behavioral changes.

I think D18857 fixes the pipe issues. Here's the problem:

As we read data from hg (the client, over the wire -- running hg push), it includes a lot of raw data frames like <data-length>\n<that-many-bytes>. We try to retransmit these frames (to hg on the server) immediately to get them into the pipe and out of process memory, even if we have to repackage them. For example, if we receive a 1024-byte data frame, then 512 bytes of data, we'll emit a 512-byte data frame immediately and expect to emit another 512-byte data frame later.

In practice, this optimization is probably pointless since Mercurial never appears to choose a frame size larger than 4096 bytes, but it could be sort-of-important if some future version of Mercurial decides to, e.g., spool a file off disk in a single protocol frame. In this case, the client would be able to emit a very large frame length and then a very large amount of data, and we'd either need to begin transmitting partial frames or split frames apart as we do now. It's a little easier to do the accounting if we split frames apart and a little nicer for debugging since we only have to ever deal with "legitimate" protocol messages, although maybe this is the wrong call.

Regardless, currently, if we receive a frame length but no data is available on the wire yet, we'll emit a 0-length frame. I think this was previously harmless, but it is never helpful, and I think it's probably an "end of stream" signifier for the protocol's streaming sub-protocol after bundle2? I can't immediately prove this in the Mercurial source, but it seems reasonable/likely.

D18857 stops us from emitting these 0-length clutter frames, and appears to fix the protocol proxying.

Mercurial's protocol negotiation presumably considers the size of the change being transmitted in selecting the protocol format

Given the nature of the fix in D18857, this may well have just been "large files change the timings of data arriving on the wire".

After D18857, I'm not aware of any remaining, reproducible issues with Mercurial. If you're still encountering protocol issues after upgrading through D18857, let me know how to reproduce the problem you're seeing.

D18857 focused on SSH, so it's possible there are still some issues on HTTP, but I wasn't able to distill a set of reproduction steps that actually work to reproduce any sort of problem out of T9548 et al. (This may be because problems are masked by bundle2 capability filtering from D14241, which I currently assume functions correctly on the HTTP pathway).

epriestley claimed this task.

Presuming this is resolved since we've seen at least some confirmation that it fixed issues and aren't aware of any remaining outstanding problems.

For recent versions of mercurial to work over http I had to force httpheader=1024 in code which filters bundle2 capabilities.