In T10786, I mentioned seeing a phd-daemon crash on our QA box, which lead me to want to use process supervision. @epriestley expressed interest in getting more details. We just observed another crash, but this time on our production phab instance. I tried to grab the log info that I could, but unfortunately there doesn't appear to have been a stack trace from the actual crash.
The rough timeline appears to be:
- at 17:19 UTC, a bunch of No such daemon errors appear in /var/tmp/phd/log/daemons.log.
- at some point after this, phd-daemon disappears
- I have daemontools set to poll the output of phd status every 5 minutes
- at roughly 17:24 UTC, daemontools notices that there are no daemons running, and executes a phd start
I'm including all of daemons.log around the time window (
). I didn't find anything else that was relevant in other logs.Here is the version information of our install:
{ "phabricator":"8b56e0082bff4807ea1ea275741d5f145804e0ee", "libphutil":"76425eaa812572cc02487db79f2dd43d93e3085f", "arcanist":"1439aaa871837031faa1ef26b81f1fb08e4a41e7" }
Please let me know if there is any other data that you would like, or any more instrumentation that I can add.
Thanks!