Evaluate virtualizing Git refs by proxying the protocol
Open, Normal, Public

Description

We may be able to implement T8092 by proxying the protocol, without needing to embed an implementation of Git. We do this to some degree in Mercurial and SVN already, with success. Although this is complex, it's potentially much less complex than embedding a Git implementation.

Event Timeline

epriestley raised the priority of this task from to Normal.
epriestley updated the task description.
epriestley added a project: Harbormaster.
epriestley added a subscriber: epriestley.

After tinkering a bit, I think this is viable. The Git wire protocol is relatively straightforward to proxy and rewrite at the ref level. However, we'll need to proxy both SSH and HTTP traffic, so we need to fix T4369 at a minimum before we can pursue this.

eadler added a project: Restricted Project. Aug 5 2016, 4:44 PM

Very soon now, git is getting an exciting new wire protocol. Highlights are improved performance on repos with an unholy number of refs, and being "easier to expand".

exciting new wire protocol

My plan for now is to do v1 support only, since: (a) we'll need v1 for 15 years anyway for everyone running Ubuntu 3 on original Xbox hardware in their corporate enterprise cluster; (b) I can't immediately trick my git into v2 anyway; and (c) it looks easier.

The v1 protocol looks like it's pretty one-shot and straightforward: whether we're running upload-pack or receive-pack, the server immediately sends a complete list of refs to the client when the client connects. This is sort of a weird way for the protocol to work for 10+ years (?), also considering that this is the "smart" protocol, but it makes our job easier, since it looks like we can (as a starting point, at least) just parse the first few frames of the protocol, delete/rewrite some refs, and then drop into passthru mode.
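A minimal sketch of that read-side rewrite, in Python, assuming the standard pkt-line framing (four hex digits of length, counting the length bytes themselves, with `0000` as a flush packet) and assuming the capability list rides on the first ref line after a NUL byte. The names `read_pkt_lines`, `pkt_line`, `filter_advertisement`, and the `is_visible` predicate are hypothetical, not existing Phabricator code. If the original first ref is hidden, the capabilities are moved to the first surviving ref. If every ref is hidden, this sketch emits only a flush packet, and whether every client is happy with that is not verified here.

```python
FLUSH = b"0000"


def read_pkt_lines(data):
    """Yield pkt-line payloads from a byte string; None marks a flush packet."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
        else:
            yield data[pos + 4:pos + length]
            pos += length


def pkt_line(payload):
    """Frame a payload as a pkt-line (the length includes the 4 length bytes)."""
    return b"%04x" % (len(payload) + 4) + payload


def filter_advertisement(data, is_visible):
    """Drop hidden refs from a v1 ref advertisement, keeping capabilities."""
    kept = []
    caps = None
    for payload in read_pkt_lines(data):
        if payload is None:
            break  # the flush packet ends the advertisement
        line = payload.rstrip(b"\n")
        if caps is None:
            # Only the first line carries "\0<capabilities>".
            line, _, caps = line.partition(b"\0")
        sha, _, name = line.partition(b" ")
        if is_visible(name):
            kept.append((sha, name))

    out = b""
    for index, (sha, name) in enumerate(kept):
        line = sha + b" " + name
        if index == 0 and caps:
            line += b"\0" + caps  # re-attach capabilities to the new first ref
        out += pkt_line(line + b"\n")
    return out + FLUSH
```

After the filtered advertisement is written to the client, the proxy would copy bytes in both directions unmodified for the rest of the session.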

This will just hide the refs from the client. A "malicious" client could still use want commands to fetch the underlying commits. However, this is fine: we aren't planning to treat different views of the same repository as having different permissions.

The want/have stuff seems ref-independent, so editing the initial list of refs looks like it fixes the whole read pathway with no other changes.

The "push" part is a little messier since the client sends what it's pushing, then sends PACK data, then the server acknowledges what was written. We need to parse all of that so we can rewrite refs in the first part (client thinks it's pushing A, tell the server it's pushing secret/A) and the last part (server acknowledges a write to secret/A, we tell the client the server acknowledged a write to A).
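The two rewrites on the push side could look roughly like this in Python. This is a sketch under assumptions: each client command payload is `<old-sha> <new-sha> <refname>`, with `\0<capabilities>` appended on the first command only, and the server's status lines are `unpack ...`, `ok <refname>`, or `ng <refname> <reason>`. The functions operate on payloads with the pkt-line framing already stripped, and they do not handle a status report wrapped in side-band framing. The `refs/heads/secret/` prefix and all function names are hypothetical.

```python
HIDDEN_PREFIX = b"refs/heads/secret/"  # hypothetical virtual namespace
PUBLIC_PREFIX = b"refs/heads/"


def to_server(name):
    """Map the ref the client sees to the ref the server stores."""
    return HIDDEN_PREFIX + name[len(PUBLIC_PREFIX):]


def to_client(name):
    """Map the ref the server stores back to the ref the client sees."""
    return PUBLIC_PREFIX + name[len(HIDDEN_PREFIX):]


def rewrite_command(payload, map_ref):
    """Rewrite the ref name in one push command, preserving capabilities."""
    newline = b"\n" if payload.endswith(b"\n") else b""
    core = payload[:len(payload) - len(newline)]
    body, sep, caps = core.partition(b"\0")
    old, new, name = body.split(b" ", 2)
    return b" ".join((old, new, map_ref(name))) + sep + caps + newline


def rewrite_status(payload, map_ref):
    """Rewrite the ref name in an "ok"/"ng" status line; pass others through."""
    parts = payload.rstrip(b"\n").split(b" ", 2)
    if parts[0] in (b"ok", b"ng") and len(parts) >= 2:
        parts[1] = map_ref(parts[1])
    return b" ".join(parts) + b"\n"
```

The PACK data between the command list and the status report does not mention ref names, so in this sketch it would be passed through untouched.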

When there are no refs in a repository, the server does not appear to send a capabilities frame:

! git-upload-pack -- '/Users/epriestley/dev/core/repo/local/12/'

< Write [4 bytes]
<  30303030                                                                                                     0000

> Read [4 bytes]
>  30303030                                                                                                     0000

_ <End of Session>

This makes our job a lot easier but also is absolutely bananas?
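For reference, the four bytes in the trace above decode to the ASCII string `0000`, which under pkt-line framing is a flush packet with no payload. A small Python check, where `is_empty_advertisement` is a hypothetical helper name and the hex string is copied from the trace:

```python
def is_empty_advertisement(raw_hex):
    """Return True if the captured bytes are a lone flush packet."""
    data = bytes.fromhex(raw_hex)
    # A pkt-line starts with four hex digits of length; zero means "flush".
    return len(data) == 4 and int(data, 16) == 0


trace_bytes = "30303030"  # the server's entire response in the trace
print(bytes.fromhex(trace_bytes))           # b'0000'
print(is_empty_advertisement(trace_bytes))  # True
```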