Page MenuHomePhabricator

Make the "git upload-pack" proxy more robust
ClosedPublic

Authored by epriestley on Apr 16 2019, 4:23 PM.

Details

Summary

Depends on D20381. Ref T8093. This makes minor improvements to the protocol proxy to handle cases where we add, remove, or replace refs and may need to move the "capabilities" section.

Rather than invoking a callback on every ref: parse the whole ref list into a data structure, mutate it if necessary (in a future diff), then dump it back into wire format.

This allows us to shift the capabilities data (which needs to be coupled with the first ref) around if we modify the first ref, and reorder the reflist alphabetically like git does.

When the server has no refs, Git sends no capabilities data. This is easy to emulate, just surprising.

Test Plan

Tested the cases not covered by D20381:

  • Fetching where the fetch actually fetches data.
  • ls-remote when we hide the first ref (capabilities data properly moves to the first visible ref).
  • ls-remote when the remote is empty (we just drop the capabilities frame completely).

Diff Detail

Repository
rP Phabricator
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley created this revision.Apr 16 2019, 4:23 PM
epriestley requested review of this revision.Apr 16 2019, 4:25 PM
epriestley added inline comments.Apr 16 2019, 4:26 PM
src/applications/diffusion/protocol/DiffusionGitWireProtocolCapabilities.php
9–13

For now, we don't care about parsing/modifying these. We may once we look at protocol v2 support.

amckinley accepted this revision.Apr 17 2019, 9:26 PM
amckinley added inline comments.
src/applications/diffusion/protocol/DiffusionGitUploadPackWireProtocol.php
3–4

Not sure if there's a better place to put this, but it would be nice to have a little comment here like "Researched from the upstream git implementation written in C, primarily from https://blah.com/repo/git/wire-protocol.c" or whatever.

177–179

"// it through unmodified, but not until we know where this final frame should go."

This revision is now accepted and ready to land.Apr 17 2019, 9:26 PM
epriestley added a comment.EditedApr 18 2019, 11:59 AM
This comment has been deleted.
src/applications/diffusion/protocol/DiffusionGitUploadPackWireProtocol.php
3–4

For the moment I've just reverse engineered things by looking at the bytes on the wire and the man page:

https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols

...but I'll go dig up wire-whatever.c (which might really be wire-protocol.sh given that this is Git) at some point and make sure it says the same thing.

  • Update the "final frame" comment to make the "pass it through, but later" part more clear.
This revision was automatically updated to reflect the committed changes.

There's not much of a wire-proto.c to reference. Here's the code that generates the entire thing we're parsing (in Git 2.21-rc1):

upload-pack.c
...

static int send_ref(const char *refname, const struct object_id *oid,
		    int flag, void *cb_data)
{
	static const char *capabilities = "multi_ack thin-pack side-band"
		" side-band-64k ofs-delta shallow deepen-since deepen-not"
		" deepen-relative no-progress include-tag multi_ack_detailed";
	const char *refname_nons = strip_namespace(refname);
	struct object_id peeled;

	if (mark_our_ref(refname_nons, refname, oid))
		return 0;

	if (capabilities) {
		struct strbuf symref_info = STRBUF_INIT;

		format_symref_info(&symref_info, cb_data);
		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s agent=%s\n",
			     oid_to_hex(oid), refname_nons,
			     0, capabilities,
			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
				     " allow-tip-sha1-in-want" : "",
			     (allow_unadvertised_object_request & ALLOW_REACHABLE_SHA1) ?
				     " allow-reachable-sha1-in-want" : "",
			     stateless_rpc ? " no-done" : "",
			     symref_info.buf,
			     allow_filter ? " filter" : "",
			     git_user_agent_sanitized());
		strbuf_release(&symref_info);
	} else {
		packet_write_fmt(1, "%s %s\n", oid_to_hex(oid), refname_nons);
	}
	capabilities = NULL;
	if (!peel_ref(refname, &peeled))
		packet_write_fmt(1, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);
	return 0;
}

...

This is the only thing that jumps out to me:

	if (!peel_ref(refname, &peeled))
		packet_write_fmt(1, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);

peel_ref() accesses a global named the_repository, and can return 0 to indicate that the object was peeled (?) or various other values (-1, -2) to indicate various error conditions. I can't figure out what this is actually doing or how to trigger it.

Notable is that git already has a pseudo-virtualization feature baked in, GIT_NAMESPACE:

https://git-scm.com/docs/gitnamespaces

I think we want/need all this fancy virtualization stuff anyway because we want to do more than just namespace the repository, though (for example, master should still be available in the "build" view of the repository).

The reproduction case for this may be "have a ref which is an annotated tag". Let me see if that works.

The reproduction case for this may be "have a ref which is an annotated tag". Let me see if that works.

Yes, it is:

$ git tag -m hi annotag
$ git push origin annotag
$ git ls-remote origin
...
ad0a62a12a00e478485ba56c0a1f4fece1d16cd8	refs/tags/annotag
eae726bc814e71b61c3f58c891866647ad6a649f	refs/tags/annotag^{}
...

The refs/tags/annotag has the hash of the tag. The refs/tags/annotag^{} is the hash of the object the tag points at.

For now, we parse this correctly (see inline). In the future, we might want to apply some extra parsing to this.

src/applications/diffusion/protocol/DiffusionGitUploadPackWireProtocol.php
199

This (and the copy below) are sufficiently liberal to match ^{}.