Page MenuHomePhabricator

In "ExecFuture", "proc_get_status()" may be passed NULL, possibly leading to a daemon stall
Closed, ResolvedPublic

Description

See PHI1821. Details here are spotty, but I dug this out of the daemon log prior to what appeared to be a daemon stall:

[2020-07-22 10:34:15] EXCEPTION: (RuntimeException) proc_get_status() expects parameter 1 to be resource, null given at [<arcanist>/src/error/PhutilErrorHandler.php:263]
arcanist(head=stable, ref.master=3b6b523c2b23, ref.stable=2565cc7b4d1d), libcore(), phabricator(head=stable, ref.stable=2710a1e7aa42), services(head=stable, ref.master=772620edd80c, ref.stable=911b948c8e1f)
  #0 PhutilErrorHandler::handleError(integer, string, string, integer, array)
  #1 proc_get_status(NULL) called at [<arcanist>/src/future/exec/ExecFuture.php:893]
  #2 ExecFuture::procGetStatus() called at [<arcanist>/src/future/exec/ExecFuture.php:94]
  #3 ExecFuture::getPID() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemonHandle.php:351]
  #4 PhutilDaemonHandle::startDaemonProcess() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemonHandle.php:152]
  #5 PhutilDaemonHandle::update() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemonPool.php:190]
  #6 PhutilDaemonPool::updatePool() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemonOverseer.php:174]
  #7 PhutilDaemonOverseer::run() called at [<phabricator>/scripts/daemon/launch_daemon.php:24]

Whether or not this directly causes any problems, proc_get_status() should not be reachable with a NULL process handle.

Related Objects

Event Timeline

epriestley triaged this task as Normal priority.Jul 23 2020, 2:58 PM
epriestley created this task.

The call sequence here is approximately:

$this->future = $this->newExecFuture();
$this->future->start();

$this->pid = $this->future->getPID();

As an aside:

There's a partial behavioral mismatch here after changes in T11968, where some Future methods have blurry behavior and responsibilities.

In particular, when start() is called like this (an external caller explicitly starts a future), and starting the future raises an exception, it should set the exception state on the Future. It currently does not.

A piece of muddiness here is that there are two different "start" methods ("start()" and "startFuture()") which do slightly different things, don't strictly compose one another, and aren't clear about their behavior.

The actual semantic difference we want is one method that means "startThisFutureAndThrowExceptions()", and one method that means "startThisFutureAndHoldExceptionsUntilResolution()"; both should set exception state on the future. The latter is an internal API.

Since "start()" is external and widely used, I'm hesitant to change its name, and I don't have a much better name for "startFuture()" at the moment, but I'm at least going to realign the semantics of these methods.

This issue doesn't impact behavior here, since the future never actually raises an exception.


Instead, we can reproduce the behavior by making the future fail to start a subprocess immediately:

diff --git a/src/future/exec/ExecFuture.php b/src/future/exec/ExecFuture.php
index 0264d4d3..c0e0cee7 100644
--- a/src/future/exec/ExecFuture.php
+++ b/src/future/exec/ExecFuture.php
@@ -545,6 +545,15 @@ final class ExecFuture extends PhutilExecutableFuture {
    * @task internal
    */
   public function isReady() {
+    $this->recordResult(
+      array(
+        1,
+        '',
+        'fake failure',
+      ));
+
+    return true;
+
     // NOTE: We have a soft dependencies on PhutilErrorTrap here, to avoid
     // the need to build it into the Phage agent. Under normal circumstances,
     // this class are always available.
$ ./bin/phd debug task
Launching daemons (in debug mode):
(Logs will appear in "/Users/epriestley/dev/core//log/local/phd/daemons.log".)

    (Pool: 1) PhabricatorTaskmasterDaemon


    phabricator/scripts/daemon/ $ ./phd-daemon --verbose -l local

2020-07-23 8:33:02 AM [OVER] Started new daemon overseer (with PID "42883").
2020-07-23 8:33:02 AM [AUTO] Scaling pool "debug" up to 1 daemon(s).
2020-07-23 8:33:03 AM [INIT] Starting process.
[2020-07-23 08:33:03] EXCEPTION: (RuntimeException) proc_get_status() expects parameter 1 to be resource, null given at [<arcanist>/src/error/PhutilErrorHandler.php:263]
arcanist(head=master, ref.master=65cda1596f25, custom=1), corgi(head=master, ref.master=72dc914db4d4), instances(head=master, ref.master=790fd1d96b6d), libcore(), phabricator(head=master, ref.master=fcb75d0503c6, custom=10), services(head=master, ref.master=35f9c1919842)
  #0 PhutilErrorHandler::handleError(integer, string, string, integer, array)
  #1 proc_get_status(NULL) called at [<arcanist>/src/future/exec/ExecFuture.php:902]
  #2 ExecFuture::procGetStatus() called at [<arcanist>/src/future/exec/ExecFuture.php:94]
  #3 ExecFuture::getPID() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemonHandle.php:351]
  #4 PhutilDaemonHandle::startDaemonProcess() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemonHandle.php:152]
  #5 PhutilDaemonHandle::update() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemonPool.php:190]
  #6 PhutilDaemonPool::updatePool() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemonOverseer.php:174]
  #7 PhutilDaemonOverseer::run() called at [<phabricator>/scripts/daemon/launch_daemon.php:24]

This exception can cause the daemons to stop, by propagating out of run() in the Overseer. This isn't great, but I think it's ultimately the correct behavior -- this is a completely unexpected condition, and we're better off shutting down than continuing. The error is also logged properly.

I believe the above changes resolved this, and it hasn't cropped up again.