`phd` can't handle PIDs properly when there are multiple Phabricator daemons.
Closed, Wontfix (Public)

Description

Summary

I have been running multiple Phabricator daemons on two independent servers (one Ubuntu 14.04, the other 16.04) for more than a year.
To update Phabricator daily, I run a script that stops all the daemons (without --force), updates Phabricator, and restarts them. This script worked well for a year.

However, recently (I don't know when the problem started, because I have rebooted the servers frequently this month), the phd daemons no longer shut down properly. For example:

$ ./bin/phd status
Log Daemon       Host  Overseer Started                 Class                                Arguments
679 8006:2jacsm3 RND02 8006     Apr 12 2017, 4:53:39 AM PhabricatorTaskmasterDaemon
678 8006:akgm6dl RND02 8006     Apr 12 2017, 4:53:39 AM PhabricatorTriggerDaemon
677 8006:3pbxj62 RND02 8006     Apr 12 2017, 4:53:39 AM PhabricatorRepositoryPullLocalDaemon

$ ./bin/phd stop
There are processes running that look like Phabricator daemons but have no corresponding PID files:

7909 php ./phd-daemon
8006 php ./phd-daemon

When I run only one Phabricator install, there is no problem (I can't reproduce it).
But when multiple Phabricator services are running, bin/phd stop reports that the running daemons have no corresponding PID files, even though bin/phd status shows the correct PID.

I've checked each pid directory and confirmed that the PID files exist and that the daemon.*** files record the correct PIDs.
But when I send the stop signal through phd, it says the processes have no corresponding PID files.
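
A minimal sketch of that manual check, assuming each PID file is a JSON object that begins with a "pid" field (as in the daemon.2675 file quoted later in this report). The function name check_piddir is hypothetical and is not part of phd:

```shell
#!/bin/sh
# Hypothetical helper: for every daemon.* file in a phd pid directory,
# print the recorded PID and whether a process with that PID is alive.
check_piddir() {
  piddir="$1"
  for f in "$piddir"/daemon.*; do
    [ -f "$f" ] || continue
    # Extract the leading "pid" value; assumes the {"pid":NNNN,...} layout.
    pid=$(sed -n 's/^{"pid":\([0-9][0-9]*\).*/\1/p' "$f")
    if [ -z "$pid" ]; then
      echo "$f: no pid field found"
      continue
    fi
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
      echo "$f: pid $pid alive"
    else
      echo "$f: pid $pid not running (or not signalable by this user)"
    fi
  done
}
```

For example, `check_piddir /var/tmp/phdA/pid`, run as the user that owns the daemons. Note that `kill -0` also fails for processes owned by another user, so a "not running" line from the wrong account is not conclusive.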

Of course, the same message (There are processes ... no corresponding PID files:) also appeared before whenever multiple Phabricator installs were running, but phd could still find and stop the process described in its own /var/tmp/phd*/pid/ directory. Now it no longer does so reliably.

Reproducing Steps:

Run the script repeatedly. Each run:

  • Stops each Phabricator daemon
  • (update Phabricators)
  • Starts each Phabricator daemon

Expected results:

  • Every time, each daemon stops and starts again.
  • No duplicated daemons for each environment/configuration.

Actual result:

  • Even though bin/phd status for each environment shows the corresponding daemons, bin/phd stop does not stop them.
  • There are duplicated daemons for each environment.
  • When there are two or three duplicated daemons (mostly two) for an environment, bin/phd stop sometimes stops one of them, but not every duplicate (I'm not sure under what conditions).

Configurations/environments

Phabricator version

phabricator cd7547dc5760bd0fde42f38118dcb9af3ddc17a0 (Wed, Apr 12)
arcanist a59cfca5f190c44403dfc7449c678a2aa1626bb4 (Wed, Apr 5)
phutil fb9e0642c4ea9065e68d2cd2b250c0fa71190e7b (Tue, Apr 11)

Phabricator installed paths

Each environment has its own pid directory and log directory:

  • /opt/phabA/phabricator/
    • /var/tmp/phdA/pid
    • /var/tmp/phdA/log
  • /opt/phabB/phabricator/
    • /var/tmp/phdB/pid
    • /var/tmp/phdB/log

The same applies to aphlict, but aphlict has no problem.

The script

echo "Stops Phabricators"

su userA -c "cd /opt/phabA/phabricator; ./bin/phd status;./bin/phd stop;./bin/aphlict stop"
su userB -c "cd /opt/phabB/phabricator; ./bin/phd status;./bin/phd stop;./bin/aphlict stop"

sleep 4

echo "Update Phabricators"

# for each
# git pull ...
# bin/storage upgrade --force

echo "Restart Phabricators"

su userA -c "cd /opt/phabA/phabricator; ./bin/phd start;./bin/aphlict start"
# sleep for DB connection issue
sleep 10
su userB -c "cd /opt/phabB/phabricator; ./bin/phd start;./bin/aphlict start"

Because the server had a memory issue, I periodically counted every process, and I am sure that there was exactly the expected number of Phabricator daemon instances throughout that year.
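
A periodic count like this can be sketched as a small filter over ps output. This is only an illustration: the function name count_overseers_by_user is hypothetical, and it assumes the overseer command line appears exactly as "php ./phd-daemon", as in the phd output in this report. Grouping by user helps here because each install runs under its own account (userA, userB):

```shell
#!/bin/sh
# Hypothetical helper: read "USER PID ARGS..." lines on stdin and print,
# per user, how many overseer processes (php ./phd-daemon) are present.
count_overseers_by_user() {
  awk '$3 == "php" && $4 == "./phd-daemon" { n[$1]++ }
       END { for (u in n) print u, n[u] }' | sort
}
```

Typical use would be `ps -eo user=,pid=,args= | count_overseers_by_user`. A count above 1 for a user suggests a duplicated overseer for that install.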

More logs

First time

$ sudo bash reproduce.sh
There are no running Phabricator daemons.
There are no running Phabricator daemons.
Reading configuration from: phabricator/conf/aphlict/aphlict.default.json
Stopping Aphlict Server (8982)...
Aphlict Server (8982) exited normally.
There are no running Phabricator daemons.
There are no running Phabricator daemons.
Reading configuration from: phabricator/conf/aphlict/aphlict.custom.json
Stopping Aphlict Server (8481)...
Aphlict Server (8481) exited normally.
Storage is up to date. Use "storage status" for details.
Synchronizing static tables...
Verifying database schemata on "localhost:3306"...
Found no adjustments for schemata.
There are no running Phabricator daemons.
Freeing active task leases...
Freed 0 task lease(s).
Launching daemons:
(Logs will appear in "/var/tmp/phdA/log/daemons.log".)

    (Pool: 1) PhabricatorRepositoryPullLocalDaemon
    (Pool: 1) PhabricatorTriggerDaemon
    (Pool: 4) PhabricatorTaskmasterDaemon

Done.
Reading configuration from: phabricator/conf/aphlict/aphlict.default.json
Aphlict is not running.
Writing logs to: /var/log/aphlictA.log
Aphlict Server started.
Storage is up to date. Use "storage status" for details.
Synchronizing static tables...
Verifying database schemata on "databasehost"...
Found no adjustments for schemata.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9626 php ./exec_daemon.php PhabricatorTaskmasterDaemon


Stop these processes by re-running this command with the --force parameter.
Freeing active task leases...
Freed 0 task lease(s).
Launching daemons:
(Logs will appear in "/var/tmp/phdB/log/daemons.log".)

    (Pool: 1) PhabricatorRepositoryPullLocalDaemon
    (Pool: 1) PhabricatorTriggerDaemon
    (Pool: 4) PhabricatorTaskmasterDaemon

Done.
Reading configuration from: phabricator/conf/aphlict/aphlict.custom.json
Aphlict is not running.
Writing logs to: /var/log/aphlictB.log
Aphlict Server started.

Second time

$ sudo bash reproduce.sh
Log Daemon       Host  Overseer Started                 Class                                Arguments
997 9619:mm4c5h3 RND02 9619     Apr 12 2017, 5:45:20 AM PhabricatorTaskmasterDaemon
996 9619:funyzsh RND02 9619     Apr 12 2017, 5:45:20 AM PhabricatorTriggerDaemon
995 9619:csufoho RND02 9619     Apr 12 2017, 5:45:20 AM PhabricatorRepositoryPullLocalDaemon
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9716 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.
Reading configuration from: phabricator/conf/aphlict/aphlict.default.json
Stopping Aphlict Server (9640)...
Aphlict Server (9640) exited normally.
Log Daemon       Host  Overseer Started                 Class                                Arguments
688 9716:f5req5t RND02 9716     Apr 12 2017, 5:45:31 AM PhabricatorTaskmasterDaemon
687 9716:rfrbgrx RND02 9716     Apr 12 2017, 5:45:31 AM PhabricatorTriggerDaemon
686 9716:kj6ny5f RND02 9716     Apr 12 2017, 5:45:31 AM PhabricatorRepositoryPullLocalDaemon
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9716 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.
Reading configuration from: phabricator/conf/aphlict/aphlict.custom.json
Stopping Aphlict Server (9737)...
Aphlict Server (9737) exited normally.
Storage is up to date. Use "storage status" for details.
Synchronizing static tables...
Verifying database schemata on "localhost:3306"...
Found no adjustments for schemata.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9716 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.
Freeing active task leases...
Freed 0 task lease(s).
Launching daemons:
(Logs will appear in "/var/tmp/phdA/log/daemons.log".)

    (Pool: 1) PhabricatorRepositoryPullLocalDaemon
    (Pool: 1) PhabricatorTriggerDaemon
    (Pool: 4) PhabricatorTaskmasterDaemon

Done.
Reading configuration from: phabricator/conf/aphlict/aphlict.default.json
Aphlict is not running.
Writing logs to: /var/log/aphlictA.log
Aphlict Server started.
Storage is up to date. Use "storage status" for details.
Synchronizing static tables...
Verifying database schemata on "databasehost"...
Found no adjustments for schemata.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9716 php ./phd-daemon
9843 php ./phd-daemon
9850 php ./exec_daemon.php PhabricatorTaskmasterDaemon


Stop these processes by re-running this command with the --force parameter.
Freeing active task leases...
Freed 0 task lease(s).
Launching daemons:
(Logs will appear in "/var/tmp/phdB/log/daemons.log".)

    (Pool: 1) PhabricatorRepositoryPullLocalDaemon
    (Pool: 1) PhabricatorTriggerDaemon
    (Pool: 4) PhabricatorTaskmasterDaemon

Done.
Reading configuration from: phabricator/conf/aphlict/aphlict.custom.json
Aphlict is not running.
Writing logs to: /var/log/aphlictB.log

Third time

$ sudo bash reproduce.sh
Log  Daemon       Host      Overseer Started                 Class                                Arguments
998  9843:m7qiaxl localhost 9843     Apr 12 2017, 5:46:19 AM PhabricatorRepositoryPullLocalDaemon
1000 9843:7i2c3wx RND02     9843     Apr 12 2017, 5:46:20 AM PhabricatorTaskmasterDaemon
999  9843:evldzyb RND02     9843     Apr 12 2017, 5:46:20 AM PhabricatorTriggerDaemon
997  9619:mm4c5h3 RND02     9619     Apr 12 2017, 5:45:20 AM PhabricatorTaskmasterDaemon
996  9619:funyzsh RND02     9619     Apr 12 2017, 5:45:20 AM PhabricatorTriggerDaemon
995  9619:csufoho RND02     9619     Apr 12 2017, 5:45:20 AM PhabricatorRepositoryPullLocalDaemon
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9716 php ./phd-daemon
9843 php ./phd-daemon
9940 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.
Reading configuration from: phabricator/conf/aphlict/aphlict.default.json
Stopping Aphlict Server (9864)...
Aphlict Server (9864) exited normally.
Log Daemon       Host  Overseer Started                 Class                                Arguments
691 9940:um3qovz RND02 9940     Apr 12 2017, 5:46:30 AM PhabricatorTaskmasterDaemon
690 9940:ddskuvn RND02 9940     Apr 12 2017, 5:46:30 AM PhabricatorTriggerDaemon
689 9940:f6cwbq7 RND02 9940     Apr 12 2017, 5:46:30 AM PhabricatorRepositoryPullLocalDaemon
688 9716:f5req5t RND02 9716     Apr 12 2017, 5:45:31 AM PhabricatorTaskmasterDaemon
687 9716:rfrbgrx RND02 9716     Apr 12 2017, 5:45:31 AM PhabricatorTriggerDaemon
686 9716:kj6ny5f RND02 9716     Apr 12 2017, 5:45:31 AM PhabricatorRepositoryPullLocalDaemon
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9716 php ./phd-daemon
9843 php ./phd-daemon
9940 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.
Reading configuration from: phabricator/conf/aphlict/aphlict.custom.json
Stopping Aphlict Server (9961)...
Aphlict Server (9961) exited normally.
Storage is up to date. Use "storage status" for details.
Synchronizing static tables...
Verifying database schemata on "localhost:3306"...
Found no adjustments for schemata.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9716 php ./phd-daemon
9843 php ./phd-daemon
9940 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.
Freeing active task leases...
Freed 0 task lease(s).
Launching daemons:
(Logs will appear in "/var/tmp/phdA/log/daemons.log".)

    (Pool: 1) PhabricatorRepositoryPullLocalDaemon
    (Pool: 1) PhabricatorTriggerDaemon
    (Pool: 4) PhabricatorTaskmasterDaemon

Done.
Reading configuration from: phabricator/conf/aphlict/aphlict.default.json
Aphlict is not running.
Writing logs to: /var/log/aphlictA.log
Aphlict Server started.
Storage is up to date. Use "storage status" for details.
Synchronizing static tables...
Verifying database schemata on "databasehost"...
Found no adjustments for schemata.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

9619 php ./phd-daemon
9716 php ./phd-daemon
9843 php ./phd-daemon
9940 php ./phd-daemon
10360 php ./phd-daemon
10367 php ./exec_daemon.php PhabricatorTaskmasterDaemon


Stop these processes by re-running this command with the --force parameter.
Freeing active task leases...
Freed 0 task lease(s).
Launching daemons:
(Logs will appear in "/var/tmp/phdB/log/daemons.log".)

    (Pool: 1) PhabricatorRepositoryPullLocalDaemon
    (Pool: 1) PhabricatorTriggerDaemon
    (Pool: 4) PhabricatorTaskmasterDaemon

Done.
Reading configuration from: phabricator/conf/aphlict/aphlict.custom.json
Aphlict is not running.
Writing logs to: /var/log/aphlictB.log
Aphlict Server started.

P.S.

  • I've tried changing bin/phd start to bin/phd restart, but the results were the same. I also killed all daemons, cleaned the PID files for each install, and rebooted, but got the same results.
  • I host multiple Phabricator installs because the two teams are expected to split up and use different offices/servers later.

Event Timeline

I've checked that:

  • With 699ab153e375 (stable) Promote 2017 Week 14, I have no problem.
  • With 944f7da48670 (stable) Correct two parameter strictness issues with file uploads, I can reproduce the problem.

Never mind, that wasn't the condition. It seems to depend on something else.
But sometimes bin/phd stop works, even though it did not work a minute earlier.
For example (2203 does not correspond to this install, but 2675 and 2753 do; they are duplicates):

$ bin/phd stop
There are processes running that look like Phabricator daemons but have no corresponding PID files:

2203 php ./phd-daemon
2675 php ./phd-daemon
2753 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.

$ bin/phd stop
Interrupting process 2753...
Process 2753 exited.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

2203 php ./phd-daemon
2675 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.
$ bin/phd stop
There are processes running that look like Phabricator daemons but have no corresponding PID files:

2203 php ./phd-daemon
2675 php ./phd-daemon


Stop these processes by re-running this command with the --force parameter.

$ cat /var/tmp/phdB/pid/daemon.2675
{"pid":2675,"start":1491981968,"config":{"daemonize":true,"log":"\/var\/tmp\/phdB\/log\/daemons.log","piddir":"\/var\/tmp\/phdB\/pid","daemons":[{"class":"PhabricatorRepositoryPullLocalDaemon","label":"pull"},{"class":"PhabricatorTriggerDaemon","label":"trigger"},{"class":"PhabricatorTaskmasterDaemon","label":"task","pool":4,"reserve":0}]},"daemons":[]}
$ cat /var/tmp/phdA/pid/daemon.2203
{"pid":2203,"start":1491981713,"config":{"daemonize":true,"log":"\/var\/tmp\/phdA\/log\/daemons.log","piddir":"\/var\/tmp\/phdA\/pid","daemons":[{"class":"PhabricatorRepositoryPullLocalDaemon","label":"pull"},{"class":"PhabricatorTriggerDaemon","label":"trigger"},{"class":"PhabricatorTaskmasterDaemon","label":"task","pool":4,"reserve":0}]},"daemons":[]}
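
To see which install a listed overseer PID belongs to, one option is to look for its daemon.<pid> file in each pid directory. A minimal sketch, assuming the daemon.<pid> naming shown above; locate_pidfiles is a hypothetical helper, not a phd command:

```shell
#!/bin/sh
# Hypothetical helper: for each PID in the first argument (space-separated),
# print the first pid directory (remaining arguments) that contains a
# daemon.<pid> file, or "none" if no directory has one.
locate_pidfiles() {
  pids="$1"
  shift
  for pid in $pids; do
    found=""
    for dir in "$@"; do
      if [ -f "$dir/daemon.$pid" ]; then
        found="$dir"
        break
      fi
    done
    echo "$pid ${found:-none}"
  done
}
```

For example, `locate_pidfiles "2203 2675" /var/tmp/phdA/pid /var/tmp/phdB/pid` prints one line per PID with the directory that holds its PID file.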
epriestley triaged this task as Wishlist priority. Apr 12 2017, 11:53 AM
epriestley added a project: Daemons.
epriestley added a subscriber: epriestley.

Running multiple different versions of Phabricator on a single host is not currently supported. We should probably handle this situation better than we do, and there is no technical reason we can't support this, but this use case is very rare.

epriestley claimed this task.

We've run instances in the Phacility cluster for a long time now, but this is generally not something we really support or plan to support since there's no real customer interest in instancing Phabricator.