Page MenuHomePhabricator

PHP 7 does not reasonably support asynchronous signal handling until `async_signals` lands
Closed, DuplicatePublic

Description

When I try to stop/restart my daemons they don't seem to want to stop, and I saw this error over and over again in the error log.

[04-Jul-2016 16:25:20 America/New_York] arcanist(head=master, ref.master=4d4d16f25985), phabricator(head=master, ref.master=c7e7f113fd97), phutil(head=master, ref.master=dde2f74f2793)
[04-Jul-2016 16:25:20 America/New_York]   #0 ob_end_flush() called at [<phutil>/src/daemon/PhutilDaemon.php:259]
[04-Jul-2016 16:25:20 America/New_York]   #1 PhutilDaemon::endStdoutCapture() called at [<phutil>/src/daemon/PhutilDaemon.php:94]
[04-Jul-2016 16:25:20 America/New_York]   #2 PhutilDaemon::__destruct()
[04-Jul-2016 16:25:23 America/New_York] [2016-07-04 16:25:23] ERROR 8: ob_end_flush(): failed to delete and flush buffer. No buffer to delete or flush at [/home/phabricator/libphutil/src/daemon/PhutilDaemon.php:259]
[04-Jul-2016 16:25:23 America/New_York] arcanist(head=master, ref.master=4d4d16f25985), phabricator(head=master, ref.master=c7e7f113fd97), phutil(head=master, ref.master=dde2f74f2793)
[04-Jul-2016 16:25:23 America/New_York]   #0 ob_end_flush() called at [<phutil>/src/daemon/PhutilDaemon.php:259]
[04-Jul-2016 16:25:23 America/New_York]   #1 PhutilDaemon::endStdoutCapture() called at [<phutil>/src/daemon/PhutilDaemon.php:94]
[04-Jul-2016 16:25:23 America/New_York]   #2 PhutilDaemon::__destruct()

Here is some console output:

+ /home/phabricator/phabricator/bin/phd stop
Interrupting process 27311...
Interrupting process 27533...
Process 27311 exited.
Terminating process 27533...
Killing process 27533...
There are processes running that look like Phabricator daemons but have no corresponding PID files:

27534 php ./exec_daemon.php PhabricatorRepositoryPullLocalDaemon
27535 php ./exec_daemon.php PhabricatorTriggerDaemon
27537 php ./exec_daemon.php PhabricatorTaskmasterDaemon


Stop these processes by re-running this command with the --force parameter.

Event Timeline

Mnkras created this task.Jul 4 2016, 8:27 PM
Mnkras updated the task description. (Show Details)Jul 4 2016, 8:27 PM
Mnkras updated the task description. (Show Details)Jul 4 2016, 8:33 PM

I can't reproduce this. Here's what I did:

I ran bin/phd start:

$ ./bin/phd start
Freeing active task leases...
Freed 0 task lease(s).
Launching daemons:
(Logs will appear in "/Users/epriestley/dev/core//log/local/phd/daemons.log".)

    PhabricatorRepositoryPullLocalDaemon (Static)
    PhabricatorTriggerDaemon (Static)
    PhabricatorTaskmasterDaemon (Autoscaling: group=task, pool=4, reserve=0)

Done.

Then, I ran bin/phd stop:

$ ./bin/phd stop
Interrupting process 90097...
Process 90097 exited.
$

The daemons stopped as expected.

Do you have any idea what I can do differently to reproduce the issue?

Im really not sure, when I run restart this is what my daemon page looks like:

which is not right, I verified its running as the daemon user,

I can't reproduce that either:

I ran bin/phd restart. I observed the daemons page, and saw this:

What can I do differently to reproduce the issue?

@Mnkras gave me access to the host.

By the time I logged in, the existing daemons had exited (did not exist in ps auxwww) and the heartbeats had failed to fire for long enough to move them to "Unknown" in the web UI:

bin/phd status reported this:

root@phab:/home/phabricator/phabricator# ./bin/phd status
Log Daemon       Host            Overseer Started                Class                                       Arguments
530 31505:zgmahu localhost       31505    Jul 4 2016, 6:45:32 PM <DEAD> PhabricatorRepositoryPullLocalDaemon 
531 31505:mep6lm localhost       31505    Jul 4 2016, 6:45:32 PM <DEAD> PhabricatorTriggerDaemon             
532 31505:6a6jxy localhost       31505    Jul 4 2016, 6:45:32 PM <DEAD> PhabricatorTaskmasterDaemon          
533 31714:jug2ib localhost       31714    Jul 4 2016, 6:46:57 PM PhabricatorRepositoryPullLocalDaemon        
534 31714:5pgmm7 localhost       31714    Jul 4 2016, 6:46:57 PM PhabricatorTriggerDaemon                    
535 31714:ccfjp2 localhost       31714    Jul 4 2016, 6:46:57 PM PhabricatorTaskmasterDaemon                 
529 27912:kbqbvw phab.krasnow.me 27912    Jul 4 2016, 4:32:09 PM PhabricatorTaskmasterDaemon                 
528 27912:un7bsw phab.krasnow.me 27912    Jul 4 2016, 4:32:09 PM PhabricatorTriggerDaemon                    
527 27912:cd7iub phab.krasnow.me 27912    Jul 4 2016, 4:32:09 PM PhabricatorRepositoryPullLocalDaemon

I ran bin/phd stop and saw similar behavior:

# ./bin/phd stop
Interrupting process 31505...
Interrupting process 31714...
Process 31505 exited.
Terminating process 31714...
Killing process 31714...
There are processes running that look like Phabricator daemons but have no corresponding PID files:

31715 php ./exec_daemon.php PhabricatorRepositoryPullLocalDaemon
31716 php ./exec_daemon.php PhabricatorTriggerDaemon
31717 php ./exec_daemon.php PhabricatorTaskmasterDaemon


Stop these processes by re-running this command with the --force parameter.

I ran bin/phd stop --force:

# ./bin/phd stop --force
Interrupting process 31714...
Process 31714 exited.

This left no daemons running on the host.


I ran bin/phd debug task; the dameon does not respond to ^C:

# sudo -u phab-daemon ./bin/phd debug task
Launching daemons (in debug mode):
(Logs will appear in "/var/tmp/phd/log/daemons.log".)

    PhabricatorTaskmasterDaemon (Static)


    phabricator/scripts/daemon/ $ ./phd-daemon '--verbose'

2016-07-04 7:16:36 PM [INIT] Starting process.
^C^C^C^C

With --trace, the daemon isn't doing any work, and also isn't hung (just normal queries to look at the queue).

pcntl and posix are both available.

The debug daemon does not appear to respond to an external SIGTERM (via kill <pid> either).

The PhutilNiceDaemon daemon also does not respond to ^C.

This trivial script works correctly:

<?php

declare(ticks=1);

pcntl_signal(SIGINT, 'onsignal');

function onsignal($sig) {
  echo "got {$sig}\n";
  exit;
}

while(1) {
  sleep(1);
}

I changed the SIGINT handler to exit immediately, still unresponsive to SIGINT:

   public function didReceiveGracefulSignal($signo) {
+    echo "Got Int!\n";
+    exit;
+

Removing the SIGINT handler entirely made it responsive.

I'm going to try disabling daemon behavior until I can make the installed handler responsive again.

I disabled the entire actual daemon behavior, still unresponsive.

I disabled the other signal handlers, still unresponsive.

I made the constructor just do this, still unresponsive:

public function __construct(array $argv) {
  pcntl_signal(SIGINT,  array($this, 'didReceiveGracefulSignal'));

 while(1) {
   sleep(1);
 }
}

I no-op'd the constructor and put the logic in run(), still unresponsive.

I put this logic in phd-daemon, now responsive:

 pcntl_signal(SIGINT, 'onsignal');

 while(1) {
   sleep(1);
 }

...

I put the global signal handler in run(), unresponsive.

I put the global signal handler in the script and the sleep() in run(), unresponsive.

I put the global signal handler in the script, made run() return immediately, put the sleep in the global script, responsive.

Here's a minimal test case, which requires two files:

test.php
# cat test.php 
<?php

declare(ticks = 1);

pcntl_signal(SIGINT, 'onsignal');

function onsignal($sig) {
  echo "Got SIGINT.\n";
  exit;
}

require_once 'C.php';

$o = new C();
$o->sleep();
C.php
<?php

class C {
  public function sleep() {
    while (1) {
      sleep(1);
    }
  }
}

This program is unresponsive to SIGINT.

Here's the local PHP:

# php -v
PHP 7.0.8-3+deb.sury.org~trusty+1 (cli) ( NTS )
Copyright (c) 1997-2016 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies
    with Zend OPcache v7.0.8-3+deb.sury.org~trusty+1, Copyright (c) 1999-2016, by Zend Technologies

I produced this test script to try to see about version issues via 3v4l.org, but you can't write files there. Requiring C as part of a different file is important, so I don't see a way to use that.

<?php

$c = <<<EOPHP
<?php

class C {

  public function sleep() {
    while(1) {
      sleep(1);
    }
  }
}

EOPHP;

$d = <<<EOPHP
<?php

declare(ticks = 1);

require_once 'C.php';

pcntl_signal(SIGINT, 'onsignal');

function onsignal(\$signal) {
  exit;
}

\$o = new C();
\$o->sleep();
EOPHP;

file_put_contents('C.php', $c);
file_put_contents('D.php', $d);

$spec = array(
  array('pipe', 'r'),
  array('pipe', 'w'),
  array('pipe', 'w'),
);

$pipes = null;

$proc = proc_open('php -f D.php', $spec, $pipes);

$status = proc_get_status($proc);
$pid = $status['pid'];

echo "Interrupting subprocess.\n";
posix_kill($pid, SIGINT);

echo "Waiting for subprocess to exit.\n";
while ($status['running']) {
  usleep(10000);
  $status = proc_get_status($proc);
}

echo "Done.\n";

Here's a smaller, self-contained test case:

<?php

declare(ticks = 1);

pcntl_signal(SIGINT, 'onsignal');

function onsignal($signal) {
  echo "Got SIGINT.\n";
  exit;
}

$c = <<<EOPHP
class C {

  public function sleep() {
    while(1) {
      usleep(10000);
      posix_kill(getmypid(), SIGINT);
    }
  }
}

EOPHP;

eval($c);

$o = new C();
$o->sleep();

Locally, on 5.5.22, this exits.

On the remote server, this hangs.

3v4.org also doesn't support pcntl_signal() though, so this isn't really much help in that regard.

I'm building PHP 7.0.8 locally. In the meantime, I found these possibly similar reports:

https://github.com/udokmeci/yii2-beanstalk/issues/25
https://github.com/Behat/Behat/issues/832

Here's an upstream report:

https://bugs.php.net/bug.php?id=71448

This is closed WONTFIX:

For PHP 7.1 we may make automatic pcntl signal handling work without ticks, as we now have a new internal mechanism for handling ordinary timeouts which we can reuse.

So it appears like this behavior was intentionally changed by PHP 7.

If this holds more generally, I don't see any way we can write daemons which respond to signals given the PHP 7 behavior.

Yes, this reproduces on a local build of PHP 7.0.8. Here's what I did:

  • Downloaded PHP 7.0.8.
  • Ran ./configure --enable-pcntl.
  • Ran make
  • Ran ./sapi/cli/php -f test.php, where test.php is this script:
<?php

declare(ticks = 1);

pcntl_signal(SIGINT, 'onsignal');

function onsignal($signal) {
  echo "Got SIGINT.\n";
  exit;
}

$c = <<<EOPHP
class C {

  public function sleep() {
    while(1) {
      usleep(10000);
      posix_kill(getmypid(), SIGINT);
    }
  }
}

EOPHP;

eval($c);

$o = new C();
$o->sleep();

This hangs indefinitely.

I think all we can do is refuse to start under PHP 7. If PHP 7.1 includes usable signal handling, we can start working with that once it lands.

The PHP council of elders is currently voting on an asynchronous signals proposal (as of June 24, about a week ago):

https://wiki.php.net/rfc/async_signals

It appears to have unanimous support so far.

So this is almost certainly the pathway forward, but presumably not until PHP 7.1.

We could continue running on PHP 7.0 if there's some way to simulate the old declare(ticks = 1); behavior, but I don't see one other than putting this in every file, which I don't think is reasonable, particularly given that PHP 7.1 appears to provide a real mechanism.

We could make some effort to do synchronous signal processing in the daemons, but it will always be much worse than asynchronous processing. Since it appears that we'll be in the clear again after PHP 7.1, blacklisting PHP 7 until the async_signals proposal lands seems like the cleanest way forward.

epriestley renamed this task from Daemons fail to stop and must be terminated to PHP 7 does not reasonably support asynchronous signal handling until `async_signals` lands.Jul 5 2016, 12:38 AM
<?php

pcntl_signal(SIGINT, 'onsignal');

function onsignal($signal) {
    echo "Got SIGINT.\n";
      exit;
}

$c = <<<EOPHP
class C {

  public function sleep() {
    while(1) {
      usleep(10000);
      posix_kill(getmypid(), SIGINT);
      pcntl_signal_dispatch();
    }
  }
}

EOPHP;

eval($c);

$o = new C();
$o->sleep();

works for me

note that in the above we're explicitly handling signals. The signal_dispatch call must be placed inside of some activity loop.

Yes, that's synchronous signal handling.

We could make some effort to do synchronous signal processing in the daemons, but it will always be much worse than asynchronous processing.

That won't work if usleep(10000) is really execx('git pull %s', $huge_repo);, etc.

I don't think the worseness and support burden of trying to limp along with synchronous handling is worthwhile given the async_signals proposal.

I was writing my reply while you were writing the above. Just figured I'd mention it.

@Mnkras, I think I've undone everything I did and terminated all sessions on the host. You can revoke my key at your convenience.

(And thanks for providing access to an environment where the issue reproduces!)

Mnkras added a comment.Jul 5 2016, 5:55 AM

Of course! Thanks for poking around!

J5lx added a subscriber: J5lx.Jul 29 2016, 3:21 PM
thoughtpolice added a subscriber: thoughtpolice.EditedSep 1 2016, 11:36 PM

FWIW, PHP 7.1.0 RC1 was released today, and it includes the async signal change. The following amended script from T11270#184087 seems to work:

<?php

pcntl_async_signals(1);

pcntl_signal(SIGINT, 'onsignal');

function onsignal($signal) {
  echo "Got SIGINT.\n";
  exit;
}

$c = <<<EOPHP
class C {

  public function sleep() {
    while(1) {
      usleep(10000);
      posix_kill(getmypid(), SIGINT);
    }
  }
}

EOPHP;

eval($c);

$o = new C();
$o->sleep();
aseipp@ubuntu:~/src/php-7.1.0RC1$ ./sapi/cli/php -f test.php 
Got SIGINT.
aseipp@ubuntu:~/src/php-7.1.0RC1$

However, it seems the ability to set pnctl.async_signals in php.ini was removed, so the call to pcntl_async_signals(1); is mandatory - but otherwise I suppose you don't have to include it in every file (like declare(ticks=1), which has more subtle semantics that just changes the current file), just in some preamble that will always necessarily be run?

J5lx awarded a token.Dec 15 2016, 7:11 PM

Now that PHP 7.1 is released you could think what will need to be changed and give a priority to this task.

chad added a comment.Jan 12 2017, 2:50 PM

I can't think of a reason to prioritize this. Specifically, it seems like since Phabricator for most (all?) companies is a business critical piece of software, you'd always choose to run it on the most reliable / stable version of PHP. Is there a reason we should consider that not to be the case?

See Planning if you have questions about how we prioritize tasks.

I haven't explain correctly, what I mean is that this task is "Needs Triage" now but you can know how many work you will need to do (more or less) then you can set its priority (low, hight, whatever).

I'm just going to merge this into T9640, it isn't meaningfully different from an implementation perspective and is the major compatibility issue.