Fix handling of gzip in VCS responses

Description

Summary:
Fixes T10264. I'm reasonably confident that this is the chain of events here:

First, prior to 8269fd6e, we would ignore "Content-Encoding" when reading inbound bodies. So if a request was gzipped, we would read a gzipped body, then give git-http-backend a gzipped body with "Content-Encoding: gzip". Everything matched normally, so that was fine, except in the cluster.

In the cluster, we'd accept "gzip + compressed body" and proxy it, but not tell cURL that it was already compressed. cURL would treat it as raw data, so it would arrive on the repository host with a compressed body but no "Content-Encoding: gzip" header. Then we'd hand it to git in the same form. This was the issue that 8269fd6e addressed: git received compressed data, but no "this is compressed" header.

To fix that, I made us decode the body according to its "Content-Encoding" when we read it, so the cluster now proxies raw data instead of proxying gzipped data. This fixed the issue in the cluster, but created a new issue on non-cluster hosts. The new issue is that we accept "gzip + compressed body" and decompress the body, but then pass the original header to git-http-backend. So now we have the opposite problem from what we originally had: a "gzip" header, but a raw body.
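
As a rough illustration of the states described above, here is a minimal Python sketch (not Phabricator's PHP code) that checks whether a "Content-Encoding" header agrees with the body it travels with. The function names and the sample payload are hypothetical, and detecting gzip by its two magic bytes is a simplifying assumption for the example.

```python
import gzip

# gzip streams begin with these two magic bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(body: bytes) -> bool:
    """Heuristic: does the body start with the gzip magic bytes?"""
    return body[:2] == GZIP_MAGIC


def header_matches_body(headers: dict, body: bytes) -> bool:
    """True when the header's claim about gzip agrees with the body."""
    declared = headers.get("Content-Encoding", "").strip().lower() == "gzip"
    return declared == looks_gzipped(body)


# Hypothetical payload standing in for a git request body.
raw = b"example request payload"
packed = gzip.compress(raw)

# Before 8269fd6e, non-cluster: gzip header + gzipped body (consistent).
before_single_host = header_matches_body({"Content-Encoding": "gzip"}, packed)

# Before 8269fd6e, cluster: header lost in proxying, body still gzipped.
before_cluster = header_matches_body({}, packed)

# After 8269fd6e, non-cluster: original header kept, body already decoded.
after_single_host = header_matches_body({"Content-Encoding": "gzip"}, raw)
```

Only the first case is consistent; the other two are the mismatches described above, in opposite directions.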

To fix this, we could do two things:

  • Revert 8269fd6e, then change the proxy request to preserve "Content-Encoding" instead.
  • Stop telling git-http-backend that we're handing it compressed data when we're handing it raw data.

I did the latter here because it's an easier change to make and test, and because we'll need to interact with the raw data later anyway to implement repository virtualization in connection with T8238.
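
The second option can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual PHP change: `prepare_backend_request` is a hypothetical helper, headers are modeled as a plain dict, and updating "Content-Length" after decoding is an assumption added to keep the example self-consistent.

```python
import gzip


def prepare_backend_request(headers: dict, body: bytes):
    """Decode a gzipped body and stop advertising it as compressed.

    Returns (headers, body) where the body is raw and the headers no
    longer carry "Content-Encoding", so the two always agree.
    """
    encoding = None
    forwarded = {}
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            # Remember the encoding, but do not forward the header.
            encoding = value.strip().lower()
            continue
        forwarded[name] = value

    if encoding == "gzip":
        body = gzip.decompress(body)
    elif encoding not in (None, "", "identity"):
        # Only gzip is handled in this sketch.
        raise ValueError("unsupported Content-Encoding: %s" % encoding)

    # Assumption: keep any declared length in sync with the decoded body.
    for name in list(forwarded):
        if name.lower() == "content-length":
            forwarded[name] = str(len(body))

    return forwarded, body
```

Because the header is dropped at the same point the body is decoded, every consumer downstream sees raw data with no claim of compression, whether it is the proxy request or git-http-backend.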

Test Plan: See T10264 for users confirming this fix.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10264

Differential Revision: https://secure.phabricator.com/D15258