Page MenuHomePhabricator

On Git cluster read failure, retry safe requests
ClosedPublic

Authored by epriestley on Tue, Sep 3, 4:50 PM.

Details

Summary

Depends on D20775. Ref T13286. When a Git read request fails against a cluster and there are other nodes we could safely try, try more nodes.

We DO NOT retry the request if:

  • the client read anything;
  • the client wrote anything;
  • or we've already retried several times.

Although some requests where bytes went over the wire in either direction may be safe to retry, they're rare in practice under Git, and we'd need to puzzle out what state we can safely emit.

Since most types of failure result in an outright connection failure and this catches all of them, it's likely to almost always be sufficient in practice.

Test Plan
  • Started a cluster with one up node and one down node, pulled it.
  • Half the time, hit the up node and got a clean pull.
  • Half the time, hit the down node and got a connection failure followed by a retry and a clean pull.
  • Forced $err = 1 so even successful attempts would retry.
  • On hitting the up node, got a "failure" and a decline to retry (bytes already written).
  • On hitting the down node, got a failure and a real retry.
  • (Note that, in both cases, "git pull" exits "0" after the valid wire transaction takes place, even though the remote exited non-zero. If the server gave Git everything it asked for, it doesn't seem to care if the server then exited with an error code.)

Diff Detail

Repository
rP Phabricator
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley updated this revision to Diff 49539.Tue, Sep 3, 4:50 PM
epriestley created this revision.
  • Check read bytes first, since this number is more helpful/impressive on git fetch.
  • Only count bytes as read when we emit them to the wire. If we're proxying the git upload-pack ref-list and are still buffering it, we can safely retry the request without any peril.
Harbormaster completed remote builds in B23365: Diff 49539.
epriestley requested review of this revision.Tue, Sep 3, 4:53 PM
This revision was not accepted when it landed; it landed in state Needs Review.Tue, Sep 3, 5:08 PM
This revision was automatically updated to reflect the committed changes.