Page MenuHomePhabricator

Fix isValidGitShallowCloneResponse
ClosedPublic

Authored by dzduvall on Sat, Oct 24, 10:43 PM.

Details

Summary

Changes the heuristic method by which non-zero exit statuses from git-http-backend are found to be due to packfile negotiation during shallow fetches, etc.

Instead of checking git-http-backend stderr for a generic "hung up" error message, see if the pack-result response contains a terminating flush packet ("0000"). This should give a greater assurance that the request was handled correctly and the response is complete.

Test Plan

Run GIT_CURL_VERBOSE=1 git fetch --depth 1 https://host.example/source/repo.git HEAD to ensure it completes and includes two successful POST requests during packfile negotiation (the last one actually receives the packfile).

Diff Detail

Repository
rP Phabricator
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Minor nitpicks inlines, I'll just tweak them locally as I land this.

(Since this wasn't submitted with arc, the Git "Author" information has been lost and the commit will be incorrectly attributed to me as an author. If you submit further patches, you can use arc to retain authorship information if you'd like.)

src/applications/diffusion/controller/DiffusionServeController.php
937

For clarity and consistency in this codebase, prefer explicit parentheses when using the result of an expression as a value. That is, this:

$x = ($y > $z);

...is preferable to:

$x = $y > $z;

This should be a lint rule, but this change didn't come through arc and missed lint anyway, so it's sort of moot.

The existing code also incorrectly doesn't follow this convention, and returns an expression later on (return x && y instead of return (x && y)).

937

For correctness, prefer === over ==. This should also be a lint rule, but would be ambitious at this point, see also "Increase Strictness" in T2312.

937

$stdout may be fewer than 5 characters long. In PHP 7 and newer, substr() returns "" in this case. In versions older than PHP 7, it returns false.

It seems at least plausible that this may emit a warning in some future version of PHP (or possibly a manufactured version of PHP under T2312). Even if it doesn't, the ambiguity of the input and variance of behavior across PHP versions is probably worth avoiding by either testing for the string length before taking a slice of it or using substr_compare() instead, where the semantics of out-of-bounds ranges are less ambiguous.

937

The byte stream has "<length><raw bytes>" sections and may naturally contain \n0000 outside of a flush packet (that is, none of these bytes are guaranteed to be escaped if they appear elsewhere in the protocol).

I think this is still a better approach than matching error text on stderr. Phabricator can parse significant chunks of the Git protocol to some degree now (see D20381; this capability was built after the code here was written) and we could imagine pursuing a full-power version of this change where we parse the entire protocol. I intend to do this eventually, but I think this change is a fine stopgap in the meantime.

937

There is no guarantee that the frame prior to the 0-length frame terminates with "\n". Locally, if I fetch a HEAD I already have (git fetch --depth 1 origin HEAD), I get this over the wire (I've added visual newlines for clarity; newlines in the raw wire bytes are represented with \r and \n):

Expires: Fri, 01 Jan 1980 00:00:00 GMT\r\n
Pragma: no-cache\r\n
Cache-Control: no-cache, max-age=0, must-revalidate\r\n
Content-Type: application/x-git-upload-pack-result\r\n
\r\n
0034
shallow 2d34f0e1ea69324a12f7e4a3f277b13f104370a9
0000

The documentation has examples with a reflist frame that does terminate with \n, but the frame format in general is just <length><stuff>, and not all frames terminate like this.

A check against just 0000 (versus \n0000) seems like the simplest adjustment.

This revision is now accepted and ready to land.Fri, Oct 30, 8:32 PM
This revision was automatically updated to reflect the committed changes.

Minor nitpicks inlines, I'll just tweak them locally as I land this.

(Since this wasn't submitted with arc, the Git "Author" information has been lost and the commit will be incorrectly attributed to me as an author. If you submit further patches, you can use arc to retain authorship information if you'd like.)

Thanks for the review/land! I'll be sure to use arc next time.

FWIW, I've submitted a patch to git-upload-pack to correct the behavior, and someone suggested in the review that Phabricator/Diffusion might be better off ignoring the exit code as the client can handle any truncation on its end (since the protocol contains explicit line length headers). In testing the behavior, I noticed that Apache/mod_cgi indeed doesn't care about git-http-backend's non-zero exit so perhaps that's why this hasn't been an issue for most folks—I was certainly surprised it hadn't been more widely reported.

Here's the start of the thread.

https://lore.kernel.org/git/20201030090902.GA3268509@coredump.intra.peff.net/T/#t
https://lore.kernel.org/git/20201030223504.45978-1-dan@mutual.io/T/#u

Ah, thanks for the pointers!

I'm generally pretty hesitant to ignore CLI exit codes because they can signal a whole host of "obviously not our problem" conditions, like this recent report on Discourse:

https://discourse.phabricator-community.org/t/svn-repository-does-not-import/4074/5

The user appears to have some kind of linker error in (a copy of?) their svn binary in their third-party-container environment. This sort of thing is super easy to identify when Phabricator is strict about errors, but is can sometimes be a crazy wild goose chase (since there's basically zero chance I'll be able to reproduce the issue) if it isn't.

The bright side is that the number of cases where we currently need to trust a binary that exited with an error is very small, something like ~3-5 cases across all of Arcanist and Phabricator. Some of them are very narrow, like this one:

// NOTE: Between versions 2.1 and 2.1.1, Mercurial changed the behavior
// of "hg pull" to return 1 in case of a successful pull with no changes.
// This behavior has been reverted, but users who updated between Feb 1,
// 2012 and Mar 1, 2012 will have the erroring version. Do a dumb test
// against stdout to check for this possibility.
// See: https://github.com/phacility/phabricator/issues/101/

Testing that stderr or stdout "look right" to discard the error is gross (and, as here, often fragile) but I think it's better on the balance than just ignoring the error entirely: users have broken binaries or systems far more often than these tests legitimately break because of changes in the underlying tool.

(Philosophically, see also Command Line Exit Codes, perhaps.)