Page MenuHomePhabricator

On Git cluster read failure, retry safe requests
ClosedPublic

Authored by epriestley on Sep 3 2019, 4:50 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 17, 9:31 PM
Unknown Object (File)
Sun, Nov 17, 8:28 PM
Unknown Object (File)
Tue, Nov 12, 4:36 AM
Unknown Object (File)
Tue, Nov 12, 2:39 AM
Unknown Object (File)
Mon, Nov 4, 10:45 AM
Unknown Object (File)
Mon, Nov 4, 10:45 AM
Unknown Object (File)
Mon, Nov 4, 10:45 AM
Unknown Object (File)
Mon, Nov 4, 10:45 AM
Subscribers
None

Details

Summary

Depends on D20775. Ref T13286. When a Git read request fails against a cluster and there are other nodes we could safely try, try more nodes.

We DO NOT retry the request if:

  • the client read anything;
  • the client wrote anything;
  • or we've already retried several times.

Although some requests where bytes went over the wire in either direction may be safe to retry, they're rare in practice under Git, and we'd need to puzzle out what state we can safely emit.

Since most types of failure result in an outright connection failure and this catches all of them, it's likely to almost always be sufficient in practice.

Test Plan
  • Started a cluster with one up node and one down node, pulled it.
  • Half the time, hit the up node and got a clean pull.
  • Half the time, hit the down node and got a connection failure followed by a retry and a clean pull.
  • Forced $err = 1 so even successful attempts would retry.
  • On hitting the up node, got a "failure" and a decline to retry (bytes already written).
  • On hitting the down node, got a failure and a real retry.
  • (Note that, in both cases, "git pull" exits "0" after the valid wire transaction takes place, even though the remote exited non-zero. If the server gave Git everything it asked for, it doesn't seem to care if the server then exited with an error code.)

Diff Detail

Repository
rP Phabricator
Branch
fallback2
Lint
Lint Passed
SeverityLocationCodeMessage
Advicesrc/applications/diffusion/ssh/DiffusionGitUploadPackSSHWorkflow.php:131XHP16TODO Comment
Unit
Tests Passed
Build Status
Buildable 23365
Build 32095: Run Core Tests
Build 32094: arc lint + arc unit

Event Timeline

epriestley created this revision.
  • Check read bytes first, since this number is more helpful/impressive on git fetch.
  • Only count bytes as read when we emit them to the wire. If we're proxying the git upload-pack ref-list and are still buffering it, we can safely retry the request without any peril.
This revision was not accepted when it landed; it landed in state Needs Review.Sep 3 2019, 5:08 PM
This revision was automatically updated to reflect the committed changes.