(stable) Fix an issue where subprocesses could have data left on stdout/stderr


(stable) Fix an issue where subprocesses could have data left on stdout/stderr

Ref T9724. This is likely a fix for that issue, which could cause hangs on git clone and similar requests.

It appears that we were sometimes leaving data on the subprocess pipes. Specifically, here's what would happen:

  • we'd have full or nearly-full read buffers on the ExecFuture;
  • we'd update the subprocess;
  • it would read only some of the data, but not be able to read everything (since the buffers were full, or became full after reading a little bit);
  • however, since the subprocess had exited, we'd terminate the subprocess and close everything down.

In this situation, we would never read the final bytes off stdout or stderr.

This could lead to git clone correctly hanging, waiting for those last few bytes, which would never come.

Test Plan:

  • Ran git clone ... against a large repository (Phabricator) in a loop for a long time.
  • Before the patch, it would very occasionally hang randomly, on the "Receiving objects: ..." step (on my local install, I could get this maybe 2-3% of the time?)
  • After the patch, I can't get it to hang anymore and have been running it for ~10x longer than it ever needed before.

Beyond this, I added a bunch of debugging/logging which seemed to confirm that my analysis/fix are correct. Specifically:

  • I added a log when we were tearing down the process to see if either stdout or stderr were not feof()'d. When things hung, this message logged; normally, it did not.
  • I added logging to show the final bytes read from the subprocess and the final bytes sent to the client. The correct final bytes look something like ...\002Total 190735 (delta 132738), reused 186896 (delta 129329)\n0000. When the error occurred, different final bytes were observed.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9724

Differential Revision: https://secure.phabricator.com/D14920