
Stale daemons
Closed, Resolved (Public)

Description

When I run ./bin/phd restart, I always get the following output:

Interrupting daemon 'PhabricatorRepositoryPullLocalDaemon' (18696)...
Interrupting daemon 'PhabricatorGarbageCollectorDaemon' (18701)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (18741)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (18759)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (18781)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (18800)...
Daemon 18759 exited.
Daemon 18781 exited.
Daemon 18696 exited.
Daemon 18701 exited.
Daemon 18741 exited.
Daemon 18800 exited.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

0 4417 php /home/sites/phabricator/web/phabricator/scripts/daemon/phd-daemon PhabricatorTaskmasterDaemon --daemonize --log=/var/tmp/phd/log/daemons.log --phd=/var/tmp/phd/pid
18819 php /home/sites/phabricator/web/libphutil/scripts/daemon/exec/exec_daemon.php PhabricatorTaskmasterDaemon --load-phutil-library=/home/sites/phabricator/web/arcanist/src --load-phutil-library=/home/sites/phabricator/web/phabricator/src --log=/var/tmp/phd/log/daemons.log --


Stop these processes by re-running this command with the --force parameter.
Freeing active task leases...
Freed 0 task lease(s).
Preparing to launch daemons.
NOTE: Logs will appear in '/var/tmp/phd/log/daemons.log'.

Launching daemon "PhabricatorRepositoryPullLocalDaemon".
Launching daemon "PhabricatorGarbageCollectorDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Done.

Note the daemon reported as out of sync with the PID files. I then ran ./bin/phd restart --force as advised, but that didn't help. The next time I run ./bin/phd restart I still get the same notice, but the process ID of the daemon is different.

I'm only getting this on one of my 2 Phabricator installs. I have Maniphest disabled on that one, so maybe this is somehow related.
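For readers following along, the "no corresponding PID files" notice in the output above can be thought of as a set difference: processes whose command line looks like a daemon, minus the PIDs that phd is tracking. The sketch below only illustrates that idea and is not Phabricator's actual implementation. The function name, the marker strings, and the simplified "PID command" line format are assumptions made for the example.

```python
# Illustrative sketch only; not how phd itself is implemented.
# Assumes each input line looks like "<pid> <command line>".

def find_rogue_pids(ps_lines, tracked_pids,
                    markers=("phd-daemon", "exec_daemon.php")):
    """Return PIDs that look like daemons but are not in tracked_pids."""
    rogue = []
    for line in ps_lines:
        fields = line.split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip headers and malformed lines
        pid, command = int(fields[0]), fields[1]
        looks_like_daemon = any(marker in command for marker in markers)
        if looks_like_daemon and pid not in tracked_pids:
            rogue.append(pid)
    return rogue


# Hypothetical process listing, loosely modelled on the output above.
SAMPLE_LINES = [
    "4417 php /home/sites/phabricator/web/phabricator/scripts/daemon/phd-daemon PhabricatorTaskmasterDaemon",
    "3061 php /home/sites/phabricator/web/phabricator/scripts/daemon/phd-daemon PhabricatorRepositoryPullLocalDaemon",
    "3999 vim notes.txt",
]

# 3061 stands in for a PID that has a PID file; 4417 does not.
print(find_rogue_pids(SAMPLE_LINES, tracked_pids={3061}))
```

With this framing, a daemon that keeps showing up in the notice is simply one that never makes it into the tracked set, whatever the reason.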

Event Timeline

aik099 raised the priority of this task to Needs Triage.
aik099 updated the task description.
aik099 added a project: Diffusion.
aik099 added a subscriber: aik099.

Nope. Just reported what I saw. Since I only had this problem on 1 of 2 Phabricator installs I thought this might be something to do with configuration.

Okay, thanks. There was a bug that I think rP2fdd7f0f3d2e316a15bf7a903b7c963b04290af8 fixed. Once you update, let us know if this persists. :) Closing for now in optimistic spirits.

Thanks for the report!

It got worse, in fact. Now I have 2 daemons reported as hanging.

You updated libphutil when you updated, yes? rPHUae43ce54f61f200028b44f2400ca68008e297f60 is needed too.

Yes. I'm using the update_phabricator.sh script from the documentation.

Can you give me the latest output from running ./bin/phd restart --force and ./bin/phd restart?

Yes. No change.

I guess this might have something to do with the Maniphest application being turned off in the Phabricator installation in question, since only Maniphest daemons are reported in that error message.

It would be really helpful to see the actual output from running those two commands back to back.


Sorry. I misinterpreted your reply. I thought you asked if I've run these commands.

Here is the output of ./bin/phd restart --force:

Daemon 'Rogue overseer' has no PID!
Interrupting daemon 'Rogue daemon' (28924)...
Interrupting daemon 'Rogue daemon' (28925)...
Daemon 28924 exited.
Daemon 28925 exited.
Freeing active task leases...
Freed 0 task lease(s).
Preparing to launch daemons.
NOTE: Logs will appear in '/var/tmp/phd/log/daemons.log'.

Launching daemon "PhabricatorRepositoryPullLocalDaemon".
Launching daemon "PhabricatorGarbageCollectorDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Done.

Here is the output of ./bin/phd restart:

Interrupting daemon 'PhabricatorRepositoryPullLocalDaemon' (3061)...
Interrupting daemon 'PhabricatorGarbageCollectorDaemon' (3066)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (3086)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (3116)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (3175)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (3187)...
Daemon 3061 exited.
Daemon 3066 exited.
Daemon 3086 exited.
Daemon 3116 exited.
Daemon 3175 exited.
Daemon 3187 exited.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

0 4417 php /home/sites/phabricator/web/phabricator/scripts/daemon/phd-daemon PhabricatorTaskmasterDaemon --daemonize --log=/var/tmp/phd/log/daemons.log --phd=/var/tmp/phd/pid


Stop these processes by re-running this command with the --force parameter.
Freeing active task leases...
Freed 0 task lease(s).
Preparing to launch daemons.
NOTE: Logs will appear in '/var/tmp/phd/log/daemons.log'.

Launching daemon "PhabricatorRepositoryPullLocalDaemon".
Launching daemon "PhabricatorGarbageCollectorDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Done.

@chad - did you hear of other reports or something?

@aik099 - so the actual process that is still running is the same

0 4417 php /home/sites/phabricator/web/phabricator/scripts/daemon/phd-daemon PhabricatorTaskmasterDaemon --daemonize --log=/var/tmp/phd/log/daemons.log --phd=/var/tmp/phd/pid

This appears in your *original* description as well as from re-running it.

Can you try a sudo ./bin/phd stop --force then restart the daemons normally? My current theory is that particular daemon can't be killed by your user account.
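One way to test the permission theory above without actually stopping anything is to send signal 0 to the PID. On Unix-like systems this performs the existence and permission checks but delivers no signal. The helper below is a small sketch with a made-up name (signal_check); it is not part of phd.

```python
import os


def signal_check(pid):
    """Report whether the current user could signal pid (Unix-like systems).

    Signal 0 only runs the kernel's existence and permission checks;
    nothing is delivered to the target process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return "not running"
    except PermissionError:
        return "no permission"
    return "ok"


# Checking our own process should always be allowed.
print(signal_check(os.getpid()))
```

If a call such as signal_check(4417) reported "no permission" for the account that runs phd, that would be consistent with the theory and would explain why sudo is needed.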

Ah sorry, it sounded like the issue was still open - didn't want to lose track.

Here is the output of sudo ./bin/phd stop --force:

Interrupting daemon 'PhabricatorRepositoryPullLocalDaemon' (3278)...
Interrupting daemon 'PhabricatorGarbageCollectorDaemon' (3283)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (3315)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (3347)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (3364)...
Interrupting daemon 'PhabricatorTaskmasterDaemon' (3400)...
Daemon 3278 exited.
Daemon 3315 exited.
Daemon 3283 exited.
Daemon 3347 exited.
Daemon 3364 exited.
Daemon 3400 exited.
Daemon 'Rogue overseer' has no PID!

and then ./bin/phd restart gives this:

There are no running Phabricator daemons.
There are processes running that look like Phabricator daemons but have no corresponding PID files:

0 4417 php /home/sites/phabricator/web/phabricator/scripts/daemon/phd-daemon PhabricatorTaskmasterDaemon --daemonize --log=/var/tmp/phd/log/daemons.log --phd=/var/tmp/phd/pid


Stop these processes by re-running this command with the --force parameter.
Freeing active task leases...
Freed 0 task lease(s).
Preparing to launch daemons.
NOTE: Logs will appear in '/var/tmp/phd/log/daemons.log'.

Launching daemon "PhabricatorRepositoryPullLocalDaemon".
Launching daemon "PhabricatorGarbageCollectorDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Launching daemon "PhabricatorTaskmasterDaemon".
Done.

Looks like there is another player around: a Supervisor daemon. When Phabricator kills the daemons, the Supervisor sees that they are dead and relaunches them itself. I guess that happens right before rogue daemon detection.

I'll check with our System Administrator to see if that's the case.

I've found the problem. It appears there were 10 rogue daemons, and only one was detected at a time. I killed all "php" processes started by Phabricator using "sudo", and now the problem is gone. Several restarts since then haven't caused rogue daemons to appear.
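For anyone in a similar spot, it can help to list every daemon-like process at once, grouped by owner, rather than discovering them one restart at a time. The sketch below parses text in the shape produced by a command like "ps -eo pid,user,args". The function name, the "phd-daemon" match string, and the sample owners are assumptions for illustration, not details taken from this install.

```python
from collections import defaultdict


def daemons_by_owner(ps_output, needle="phd-daemon"):
    """Group PIDs of matching processes by owning user.

    Expects lines of the form "<pid> <user> <command line>".
    """
    owners = defaultdict(list)
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        pid, user, args = parts
        if needle in args:
            owners[user].append(int(pid))
    return dict(owners)


# Hypothetical listing; the owners shown here are invented.
SAMPLE_PS = """\
  PID USER     COMMAND
 4417 root     php /home/sites/phabricator/web/phabricator/scripts/daemon/phd-daemon PhabricatorTaskmasterDaemon
 3061 phab     php /home/sites/phabricator/web/phabricator/scripts/daemon/phd-daemon PhabricatorRepositoryPullLocalDaemon
 3070 phab     vim notes.txt
"""

print(daemons_by_owner(SAMPLE_PS))
```

Any PIDs listed under a user other than the one running phd are the ones likely to need sudo to stop.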

Ah okay, I found the code bug here. Patch in a little...

Thanks @aik099 for all the help. The code did have yet another bug in it, which I think D10386 resolves. Post D10386 someone in this situation should get a warning to try again with sudo, and then using sudo should actually work.

btrahan triaged this task as Normal priority. (Aug 29 2014, 6:32 PM)
btrahan added a project: Daemons.