Page MenuHomePhabricator

Writing to streams can run into issues with EINTR
Closed, ResolvedPublic

Description

See https://discourse.phabricator-community.org/t/sporadic-git-cloning-hang-over-ssh/2233/. This is probably a real issue but the reproduction case is "clone 9000 times in a loop under a full moon" and the proposed patch involves "preg_match()" against English language error messages. Hopefully we can do better on both dimensions.

See also https://bugs.php.net/bug.php?id=39598 in the PHP upstream.

Event Timeline

epriestley created this task.

Here's a script which can hit these behaviors without cloning 9000 working copies in a loop:

#!/usr/bin/env php
<?php

pcntl_async_signals(true);
error_reporting(E_ALL | E_STRICT);
set_error_handler('onerror');
pcntl_signal(SIGHUP, 'onsignal');

$stdout = fopen('php://stdout', 'w');
if (!$stdout) {
  throw new Exception(pht('Unable to open stdout.'));
}

stream_set_blocking($stdout, 0);
$one_megabyte = str_repeat('.', 1024 * 1024);

xlog('This PID', getmypid());

while (true) {
  xlog('Trying a write', '...');

  $result = fwrite($stdout, $one_megabyte);
  xlog('fwrite()', $result);

  $read = array();
  $write = array($stdout);
  $except = array();

  xlog('Starting a', 'stream_select()');
  $result = stream_select($read, $write, $except, 5);
  xlog('stream_select()', $result);
  xlog('error_get_last()', error_get_last());
  xlog('posix_get_last_error()', posix_get_last_error());

  sleep(1);
}


function xlog($label, $value) {
  error_log($label.': '.json_encode($value));
}

function onerror($errno, $errstr) {
  xlog($errno, $errstr);
}

function onsignal($signo) {
  xlog('Got signal', $signo);
}

The general idea here is that you run it like:

$ ./select_errors.php | sleep 15

...and then maybe send it some signals with:

$ kill -HUP <pid>

The general behavior here is:

  • The first fwrite() will write some bytes. These bytes will fill up the buffer on our stdout / sleep's stdin.
  • Since sleep does not read stdin, the buffer will remain full. Subsequent fwrite() calls will write 0 bytes. This corresponds to EAGAIN.
  • After 5 seconds (whatever you pass to sleep X), the sleep process will exit. This will break the pipe, and put us in an EPIPE condition.
  • If you HUP the process in another terminal at the right time, you can interrupt the stream_select() call. This is EINTR.

My observations are:

  • In EAGAIN, the write writes 0 bytes and the stream_select() says we're not writable. This is reasonable.
  • In EPIPE, the write writes 0 bytes but the stream_select() says we're writable. This is super not great and the original problem we're working around.
  • In EINTR, the write writes 0 bytes and the stream_select() returns an error. This is the new condition we aren't handling properly.

That is:

fwrite(): errorfwrite(): 0fwrite(): 1+
stream_select(): errorTrust fwrite() error.NEW! Assume EINTR.Trust fwrite() success.
stream_select(): 0Trust fwrite() error.Legitimate EAGAIN.Trust fwrite() success.
stream_select(): 1Trust fwrite() error.Old. Heuristically guess whether this is a real EAGAIN that we raced against or an EPIPE.Trust fwrite() success.

That table isn't very helpful, I guess.

The patch is straightforward, at least, although we have to assume that the only condition which causes fwrite() to return 0 and then stream_select() to return an error is EINTR. That seems reasonable, but this is a huge mess so who knows.

None of error_get_last() or posix_get_last_error() do anything. I don't have socket_last_error() and we don't depend on socket support, and the Discourse thread suggests this doesn't matter anyway.

I wouldn't exactly say it's a huge mess. It's just an interrupted system call, which is fairly well documented. See for example http://poincare.matf.bg.ac.rs/~ivana/courses/ps/sistemi_knjige/pomocno/apue/APUE/0201433079/ch10lev1sec5.html. It doesn't exactly help that PHP gives you a broken abstraction over the underlying C implementation.

The "mess" part is the broken PHP. In C, where you can tell what happened, the behavior is all reasonable.

That upstream bug report is really something else.

It's true that it has been 12 years, but I'm sure fwrite() will start returning an error code when it encounters a permanent, fatal EPIPE error condition soon.

It's true that it has been 12 years, but I'm sure fwrite() will start returning an error code when it encounters a permanent, fatal EPIPE error condition soon.

In PHP 7.4 fwrite() will return 0 for EAGAIN/EWOULDBLOCK and false for other errors, in line with the requested behavior.
https://bugs.php.net/bug.php?id=39598

There's a light at the end of the tunnel!

I believe our behavior is correct under this change to PHP behavior without modifications, since we already trusted return values from fwrite() other than 0.