Page MenuHomePhabricator

Support Mercurial's bundle2 wire protocol
Closed, ResolvedPublic

Description

In Mercurial 3.4 a new wire protocol is supported, bundle2. In 3.4 this wire protocol was enabled by default only for servers, and in 3.5 it was enabled by default for both clients and servers.

ProsCons
Support for atomic push/pull transactions-> Mercurial has gone quite a while without this (up to 3.4)
May be required for Changeset Evolution or other newer features-> Phabricator may not utilize Changeset Evolution at all
Very littled documentation about bundle2, mostly defers to reading source/comments
A more complex solution for bundle2 -> significant effort to support

Implementation Resouces

Some initial discussions in T9450, which resolved errors by dropping support for bundle2 altogether.


Discussions of bundle2, specifically mentioning atomic push/pull operations:

Event Timeline

cspeckmim updated the task description. (Show Details)
cspeckmim added projects: Mercurial, Diffusion.
cspeckmim added a subscriber: cspeckmim.

I suspect it's worthwhile for us to support bundle2 eventually. As far as I can tell from my investigations in T9450:

SSH: The pathway on SSH looks reasonably simple. I think what's happening is that normally, protocol frames look approximately like this:

  • <length in bytes>
  • <that many bytes>

It looks like bundle2 means that the content of a frame (HG20...) can sometimes tell us to switch into bundle2 mode, which uses different rules to encode protocol frames. This probably amounts to a couple more states in DiffusionMercurialWireProtocol to parse the frames of the sub-protocol and some extra test cases, but not much really significant work.

HTTP: We implement HTTP by invoking hg serve --stdio and then emitting its response as an HTTP response. The ruleset for doing this seems to be sort of implicit inside Mercurial, and this is not exactly what Mercurial does. Instead, it appears to emit different bundle2 responses when it thinks the client is SSH (--stdio) vs HTTP. The HTTP wire response format also differs between commands, and differs further after the introduction of bundle2. We have some shaky special casing to deal with this already.

Particularly, for SSH it looks like it emits errors (like commit hook messages) over stderr, while for HTTP it includes them as additional parts in the bundle2 response.

So we probably need to special case the unbundle command, decode the response, build additional parts from stderr, and then insert them into the response.


While I'm hesitant to throw stones here and don't really have any context on bundle2 or the wire protocol in general and there may be plenty of reasons that it works like it does, my general impression is that the Mercurial wire protocol is significantly under-designed, and is now on at least the 3rd and possibly 4th iteration of hacks to try to address the lack of actual protocol design.

The existing base protocol feels like very little thought went into it, and the implementation just took the shortest path to running hg commands in the remote that it could, accepting a very leaky protocol that requires a high level of application-specific knowledge on both the client and server.

By comparison, the Subversion protocol defines a grammar and message format which can be completely decoded with no knowledge of the application (and substantially understood with very little knowledge of the application).

At some point Mercurial ran into the limitations of the protocol and built the batch mode, wrapping the protocol in a meta-protocol. This was also not sufficiently designed and also just had actual long-standing bugs in it, like the ; escaping issue, although it looks like that was finally fixed in late June so we should probably remove (or special case) this code:

DiffusionMercurialWireProtocol.php
// NOTE: Mercurial has some code to escape semicolons, but it does not
// actually function for command separation. For example, these two batch
// commands will produce completely different results (the former will run
// the lookup; the latter will fail with a parser error):
//
//  lookup key=a:xb;lookup key=z* 0
//  lookup key=a:;b;lookup key=z* 0
//               ^
//               |
//               +-- Note semicolon.
//
// So just split unconditionally.

$cmds = explode(';', $cmds);

One cost of this approach that the protocol is difficult to extend (and batch doesn't really help), which is one of the motivators for bundle2. But rather than actually fixing the protocol as a whole, bundle2 is just adding a new sub-protocol and making only that extensible: most operations still go through the main protocol, and bundle2 is just a sub-mode that the protocol drops into opportunistically for this one command.

So we're now multiple iterations into a protocol approach where the problems don't seem to be considered carefully and resolved through program design, but rather by carving out a new super-protocol or sub-protocol which makes the protocol as a whole strictly more complex. It looks like the first bundle2 commits appeared in Mercurial about 12 months before it was enabled by default, so it's disappointing to me that more consideration isn't being given to design and architecture on these timescales.

Particularly, we have no reason to believe that the protocol will ever stop evolving in ways that fundamentally change the grammar, since none of the changes that have been made so far move in this direction. This, combined with the protocol having so much implicit, application-specific or otherwise variant (or broken or nonsensical) logic is wearying and makes it difficult to get excited about spending time implementing bundle2. It doesn't move the protocol forward, it just moves the protocol deeper into the dark mire it is already stuck in.

In fairness, the Mercurial upstream has been fairly responsive to bug reports (see, e.g., T5896) and I'm reasonably confident the semicolon issue would have been fixed promptly if I'd reported it. But I'd guess "please just throw your bad protocol away and write a new one, and also whatever your process for protocol design is needs to be thrown out" would be hard for them to prioritize, and we couldn't throw the current stuff away for like 5 years anyway.

Anyway, we have no real options but to continue implementing whatever they implement and the cost isn't that high in absolute terms, it's just that it's approximately the single most miserable part of Phabricator to work on.

So we probably need to special case the unbundle command, decode the response, build additional parts from stderr, and then insert them into the response.

It's possible we could ship and invoke a version of hgweb.cgi instead of using hg serve --stdio to avoid this, although I'm not sure how much more luck we'd have dealing with CGI than bundle2, especially since we probably have to parse bundle2 to deal with SSH anyway.

I have also started looking into other potential projects which may have had to deal with these issues, though I don't believe anything I've found has caught up to bundle2 yet.

  • Kallithea - probably not helpful - imports mercurial python library
  • Allura/ForgeHg - probably not helpful - imports mercurial python library
  • JavaHg - uses command server
  • chg - maintains command servers to avoid python spinup time

Thanks (again) for the in-depth description and analysis.

We have some CommandServer stuff too, although I don't think it has ever been reachable from the UI. I don't remember if the original use case was improving arc performance or improving import/parser performance.

https://secure.phabricator.com/diffusion/ARC/browse/master/src/hgdaemon/ArcanistHgProxyClient.php

I don't know if I still have the branch anywhere but I had an arc hg ... branch at one point which just worked like hg except substantially faster, looks very similar to what chg does. I don't think that'll get us much headway against the bundle2 / wireproto stuff, though.

Particularly, the command server wire protocol is completely different from the HTTP/SSH wire protocol (and pretty reasonable overall, although it's solving a much simpler problem).

I've been looking at this the last couple of days as it's a pain point for me -- I'd like to host my repositories on Diffusion, and I would like to stay with Mercurial rather than git..

Would a simpler solution to actually decoding bundle2 simply be to stop decoding the protocol stream and pass everything through untouched and unparsed as soon as we see the unbundle command?

I might be missing something in the Diffusion code, but it looks like the only reason to decode the protocol stream is to enforce read only vs read/write, and to determine when the repository needs to be scanned for changes (due to a write operation). The "unbundle" command is considered a write operation, and so as soon as we see it we can either stop the stream (read only) or simply start passing everything verbatim without parsing (read write) because we have already answered the question of whether we saw a write operation..

Or am I missing something obvious?

I'm not sure why phabricator is trying to parse the bundle stream (I don't know anything about phabricator), but you should be able to invoke hg with bundle2 disabled by setting experimental.bundle2-advertise to false in an hgrc (or using --config if you're shelling out to hg). Given that it's an experimental flag, it's not something I can promise will exist forever, but it might be a start.

@durin42 - that would probably be better than parsing the bundle stream but, is that documented anywhere? I tried looking fairly extensively for details on the bundle2 protocol and never came across that. Also, the bundle stream itself isn't what's being parsed/modified, it's the capabilities/outer protocol which I found at least some documentation, which is what the parsing/filtering is based on.

We've got a volunteer (slowly) working on documenting formats, but right now bundle2 is not documented extensively outside of the code and the wiki page already mentioned in this thread.

Can you provide me some idea of what you're trying to *do* by wrapping the protocol? I still think it's likely that something else is a significantly more elegant and maintainable solution.

I am totally on-board with finding something more elegant/maintainable, and would be able to help with some testing and possibly development. I'll try to look over the code in the near future to recall what it was doing, but I believe it's trying to determine whether the mercurial operations are read/write in order to determine whether the actor has necessary permissions to do so. I *believe* that's what it's doing, but it's been close to a year since I was looking through this.

Ah, okay. We can probably do a lot better with some sort of pre-transaction hook then (that was my gut feeling to begin with). Once you've got an idea what the use case is, maybe we can chat on the hg mailing list?

Thanks!

Today, we technically do not need to decode the protocol, but we will need to be able to distinguish between read and write requests to implement cluster read-only nodes (T10883) in the future and need to be able to rewrite the protocol to keep parity with fork-like/virtual-ref features that we plan to support in Git (T10691, T8092).

Working around this by using, e.g., a commit hook will take us further away from those features.

I'm not a phabricator developer so I'm not sure if supporting mercurial hooks is something they're willing to take upstream. I can join the hg mailing list, but wouldn't this conversation be specific to phabricator and beneficial to be carried out here?

It's not entirely phabricator specific, as others probably have similar use cases. I'm literally the only hg dev that knows this is an issue (someone flagged me on twitter a month ago, and it's only a strange inability to sleep tonight that got me deep enough in my stack taht I got to this), but if you go to our ML with a decent problem statement ("we need a hook to let us identify read vs write transactions and do our own ACL checking" sounds like a start?) you'll get more help than just me when I'm unable to sleep. :)

As for the virtual-forks thing, I've had some ideas about "overlay" repositories that might do what you need, and I think *maybe* someone at Mozilla has given thought to something similar, but it's all sketchy at best right now. There are some BitBucket employees on our ML, they might have ideas on that front too.

In the read/write case, it's too late once we hit a hook -- we need to be able to decide which cluster node to send the traffic to, and we need to know if it's a read or write request before we make that decision, since there may be a nearby read-only node and a distant read/write node. If we can examine the stream and make a determination that the request is read-only, we can freely pick a nearby read-only node.

(We do already implement most logic in a hook, we just can't put request routing logic there since it's too late to make a decision.)

While I'm hesitant to throw stones here and don't really have any context on bundle2 or the wire protocol in general and there may be plenty of reasons that it works like it does, my general impression is that the Mercurial wire protocol is significantly under-designed, and is now on at least the 3rd and possibly 4th iteration of hacks to try to address the lack of actual protocol design.

The existing base protocol feels like very little thought went into it, and the implementation just took the shortest path to running hg commands in the remote that it could, accepting a very leaky protocol that requires a high level of application-specific knowledge on both the client and server.

Speaking as a contributor to Mercurial, there is some truth to these statements :)

The existing protocol has evolved organically over 10+ years. And from my time being involved with the project, it feels like the protocol design is a secondary concern, taking a back seat to more pressing matters like "what is the quickest path to accomplishing X." I think a lot of this is due to sub-optimal choices several years coupled with Mercurial's commitment to backwards compatibility. Another stems from a simple fact: there has historically only been ~1 client and ~1 server implementations. When you don't have to worry about interop, it's really easy to ship the simplest thing that works.

Recently, a few groups have explored implementing alternate servers. And - surprise - they are finding all the wonkiness in the protocol's design and have subsequently started to press for a more well-designed protocol.

Anyway, I recently started documenting the wire protocol as part of the "internals" help pages that ship with Mercurial. You can find that work at https://www.mercurial-scm.org/repo/hg/file/default/mercurial/help/internals/wireprotocol.txt. (There are other docs at https://www.mercurial-scm.org/repo/hg/file/default/mercurial/help/internals/ you may find useful as well.) Unfortunately, there are currently no "internals" docs for bundle2.

I would echo what @durin42 said: please send an email to the mercurial-devel list documenting your wire protocol desires. I'll likely be making changes to the bundle exchange facilities in Mercurial 4.1 as part of supporting non-zlib compression formats (namely zstd). I've also explored changes to discovery to facilitate infinitely scalable servers using key-value stores for repository storage (see https://bugzilla.mozilla.org/show_bug.cgi?id=1055298). If the wire protocol is overhauled, I'll be doing a lot of the work myself and/or will be intimately involved in reviewing it. But without having a clear understanding of Phabricator's wants, it will be difficult to assess whether we're solving your problems. Creating a wiki page like https://www.mercurial-scm.org/wiki/PhabricatorWants would also be a good place to record everything that Phabricator wants from Mercurial.

@indygreg - Thanks for reaching out and discussing a way forward. I am not on the phabricator team but I do have an interest in its integration capabilities with mercurial. At best I might be able to summarize my understanding of some of phabricator's needs based on the history of this task, but I do not feel I could properly represent them on mercurial's mailing-list or wiki. For hosting repositories I believe this task would require needing to know if the mercurial operation is read or write -- the two primary reasons being read/write access control and cluster picking.

There is another scenario where understanding the bundle wire protocol might be necessary, which is for supporting extensions on hosted repositories: T6118: Support for Mercurial extensions in hosted repositories

You mention working on updates to the bundle wire protocol, as early as Mercurial 4.1. Are there any shareable timelines for which more discussion would be needed by?

It should be possible to determine whether an operation is read or write by the command name: a command either performs writes or it doesn't. The only write commands you should see are unbundle and pushkey. While it might be possible for batch to contain pushkey operations, I don't think that's done today (but I'd have to audit the code to be sure). One thing you should be able to rely on is the HTTP method: GET requests should never perform any significant writes (although some cache files may be written to as a result of resolving data, but that shouldn't be important). It should be perfectly valid to treat GET and POST differently. Although if you route requests to different endpoints within the same TCP/HTTP stream, inconsistent state of a read-only mirror from a master could lead to race conditions (e.g. push fails because a mirror is out of sync). If you route by HTTP request method, you should not need to know anything about the Mercurial wire protocol.

The SSH protocol doesn't have an outer protocol like HTTP does. So there you would need wire protocol knowledge to detect commands. It might be best to stick to HTTP.

It's worth noting that at Mozilla we have a single writable master for hg.mozilla.org accessed via SSH and read-only mirrors accessed via HTTPS. We use the pushurl sub-option of [paths] config entries to define an ssh:// and https:// endpoint for repos. We have a separate server/service that authenticates POST requests to allow pushes over HTTP while allowing unauthenticated GET requests for public access. The Apache server config can be seen at https://hg.mozilla.org/hgcustom/version-control-tools/file/08e27bfb0aaa/ansible/roles/hg-reviewboard/templates/vhost.conf.j2.

Regarding wire protocol updates timeline, my immediate goal is for the getbundle command to support non-zlib responses. I'll probably just hack that into the capabilities system. I'd love to do a more complete protocol rewrite. But it's one of those things I can't justify unless the existing protocol fails hard. So there's no timeline for that.

Today, we already decode the SSH wire protocol and figure out what's going on inside it up to the bundle2 stuff -- here's the, uh, "protocol spec":

https://secure.phabricator.com/diffusion/P/browse/master/src/applications/diffusion/protocol/DiffusionMercurialWireProtocol.php

We're paranoid and examine the component commands in "batch" commands, since we also do initial policy enforcement at this layer and don't want anyone sneaking a write command into a "batch" workflow by being clever on the client.

This is the actual "parser":

https://secure.phabricator.com/diffusion/P/browse/master/src/applications/diffusion/ssh/DiffusionMercurialWireClientSSHProtocolChannel.php

In Git, we support multi-master hosting today, and support SSH in both Git and Mercurial, prior to clients that use bundle2. We're likely within striking distance of multi-master support on Mercurial. I'd like have parity in our Mercurial support, so "don't support multi-master" and "don't support SSH" aren't very attractive as approaches -- if anything, they're potentially worse than "don't support recent versions of Mercurial" for many installs.

(Does Mercurial actually enforce that the client isn't allowed to hide a write command in a batch command in a GET request? I haven't looked at the code in a while, but it seems potentially dangerous to rely HTTP-level configuration given the complexity and irregularity of the protocol, and, uh, the CVE-2014-9462 bug was a thing.)

The only thing we really care about is parsing the protocol. I'll get it to parse, it's just going to take me a day or two of misery and cursing and wringing of hands compared to, e.g., the relatively regular-ish SVN grammar which took a couple of hours to get working pretty well and doesn't require us to enumerate every different command to parse a stream or recognize and support a bouquet of sub-protocols. But it's perfectly understandable that regularizing the protocol grammar isn't a priority, particularly since I imagine very few other projects are trying to decode the protocol.

As a user of both Phabricator and Mercurial, opening up HTTP isn't an option we want to consider and would instead stick to having the repository hosted elsewhere.

In terms of Phabricator supporting Mercurial extensions (T6118), would it also be necessary to understand the wire protocol as well? Specifically for something like the largefiles extension which I believe adds new commands that read/write the large files. We turned on largefiles for our repositories and have later come to the conclusion that we should not have, however we can't turn it off without converting our repository (re-writing the entire history).

I'd love to do a more complete protocol rewrite. But it's one of those things I can't justify unless the existing protocol fails hard. So there's no timeline for that.

I understand that situation :s. I was primarily asking about timeline as my interpretation of your previous comment was that there was rewrite/overhaul of the wire protocol planned and I was concerned about whether we would be too late in starting a conversation about Phabricator's needs.

Heads up: Mercurial 4.1 (to be released ~February 1) introduces media type negotiation in the HTTP protocol. https://www.mercurial-scm.org/repo/hg/rev/753b9d43ca81 captures the wire protocol changes and the commit message explains why things are done the way they are.

HTTP bodies may now be compressed with non-zlib compression formats. Mercurial 4.1 now ships with zstd and will prefer zstd over zlib on the wire protocol. And, Mercurial 4.1 supports pluggable compression engines via extensions, so random servers may have random compression formats being used over the wire protocol. There is a config option (https://www.mercurial-scm.org/repo/hg/file/98bfce9bd5e5/mercurial/help/config.txt#l1548) to control which compression formats are advertised over the wire protocol. Worst case, an intermediate proxy like Phabricator can sniff the capabilities string coming from the server and remove things it doesn't know how to speak.

There is currently no config option to disable advertisement of the new HTTP media types. Although I may add that before 4.1.

There are no changes to the SSH wire protocol in 4.1. The SSH protocol is still horrible.

Thanks!

We probably would not use an option to turn the feature off (versus just supporting it) so don't worry about adding it on our behalf. At the point where I was set up to convincingly test that turning the feature off worked -- and we had navigated bundle2 in some way -- I'd probably be most of the way toward proper support anyway.

From a cursory read of the linked commit, everything seems pretty reasonable to me. I'm also definitely not in the "you should use 'Accept'" camp of HTTP protocol design, and firmly agree with the rationale presented there and the choice not to overload that header.

Just curious if there is any news on this front?

See Planning for information on timelines and priorities.

My solution to this problem was to compile version 3.4.2 of mercurial and install with a different PREFIX, and only use it for pushing to phabricator. That way I can use the latest version of mercurial for everything else.

In my ~/.bash_profile:

hg() {
    if [ "$1" = "push" ]; then
        /usr/local/bin/mercurial-3.4.2/bin/hg "$@"
    else
        /usr/bin/hg "$@"
    fi
}

This is pretty sad that this bug is still open. I haven't found a way to use Phabricator with recent Mercurial, so I'm sticking on 4.0.1 ATM.

https://www.mercurial-scm.org/repo/hg/file/default/contrib/phabricator.py is a Mercurial extension that allows you to send a series of changesets to Phabricator by calling Conduit APIs directly. The extension is currently tailored for the use cases of the Mercurial project itself, isn't distributed with Mercurial, and may require Mercurial 4.3. And it obviously bypasses Phabricator's built-in Mercurial server. But if you are willing to live with the caveats, you may find it a suitable workaround.

There are enough contributors to Mercurial that use Phabricator that if you want this extension to be part of the Mercurial distribution, I could potentially see that happening. It is best to express desires for this with the Mercurial project - not on this Phabricator instance.

In addition, Mozilla is rolling out Phabricator and will almost certainly not be using Phabricator's built-in Mercurial server for various reasons. Whatever tools are written to bridge the gap between a Mercurial server and Phabricator will be open source. But AFAIK that code hasn't been written yet. Ping me in 3 months if you don't hear anything.

https://www.mercurial-scm.org/repo/hg/file/default/contrib/phabricator.py is a Mercurial extension that allows you to send a series of changesets to Phabricator by calling Conduit APIs directly. The extension is currently tailored for the use cases of the Mercurial project itself, isn't distributed with Mercurial, and may require Mercurial 4.3. And it obviously bypasses Phabricator's built-in Mercurial server. But if you are willing to live with the caveats, you may find it a suitable workaround.

If I understand correctly this bypasses the use of arc diff to create revisions. However the issue is mostly with the bundle2 protocol and pushing from mercurial to phabricator's mercurial server. Is there any way this extension helps in that regard?

arc diff works mostly fine for me.

epriestley claimed this task.

T13036 is a more focused rebrand of this issue, since this has a lot of philosophy and not a lot of actionable reproduction steps. The three line fix in D18857 may have resolved these issues.

Mercurial 4.6 has added hg help internals.bundle2 which will render a book to your console with details about the format/protocol. I did a high-speed skim and then tried out hg help internals.changegroup on a whim which produced an even longer book about the stream protocol (SSH only maybe?).

Indeed hg help internals will render the table of contents