Page MenuHomePhabricator

Restart daemons automagically
Closed, ResolvedPublic

Description

I've often wondered if we can restart the daemons automagically. Essentially, I was imaging running a very lightweight "watchdog" daemon which could detect configuration changes and then restart the other daemons accordingly.

Event Timeline

joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description. (Show Details)
joshuaspence added a project: Daemons.
joshuaspence added a subscriber: joshuaspence.

I'm sure this question has been asked previously, but why can't the daemons read the updated configuration themselves? I imagine that the configuration would be cached, but surely it could be purged periodically?

@epriestley, I'm happy to work on this if you give me some pointers

This is more possible after the centralization of the overseer and the introduction of phd reload and the SIGHUP handling. Specifically, we could do this:

  • Make the overseer periodically emit some sort of timer event (e.g., once a minute).
  • Have an event listener in Phabricator query the database to look for config changes.
  • If it detects a config change, SIGHUP the process.

However, this implies a fairly long delay. Better would be:

  • Let the overseer connect to the notification server.
  • Push a notification.

This could also let the overseer react to "queue is no longer empty" events and send SIGUSR2 to daemons to wake them up.

This is somewhat more interesting to solve in the upstream now because the beahavior is worse in the Phacility cluster (users must go to the admin console to restart daemons, which isn't obvious). In general, though, I still think this is an enormous amount of work for a tiny amount of benefit.

One possible minor followup to maybe keep on the radar is showing some extra warnings in a few cases:

  • When editing the small amount of config which can't auto-restart (phd.user, phd.start-taskmasters). I think phd.user already has an existing warning anyway.
  • Some notification stuff requires restarts but this isn't really material to the phd stuff and we haven't seen issues with it.
  • We're now essentially training users that the daemons auto-restart, but they don't when you bin/config. So maybe that should just have a "restart the daemons" message.

We can wait for these to actually be real issues, though.

This also seems to work properly on this host, although I hit the "daemons are not running" setup warning once, presumably by racing their restart. If this is an issue we can tweak how that warning works, I think.

  • When editing the small amount of config which can't auto-restart (phd.user, phd.start-taskmasters). I think phd.user already has an existing warning anyway.

Yeah, I would like to handle this properly but I'm not sure if it is worthwhile doing so?

  • Some notification stuff requires restarts but this isn't really material to the phd stuff and we haven't seen issues with it.

Did you mean that some config changes requires Aphlict to be restarted?

  • We're now essentially training users that the daemons auto-restart, but they don't when you bin/config. So maybe that should just have a "restart the daemons" message.

I think that this is worthwhile fixing but I think that doing so involves implementing some form of config.hash or config.id. In the general case we can't really just tell users to run ./bin/phd reload because there could be daemons running on multiple hosts.

This also seems to work properly on this host, although I hit the "daemons are not running" setup warning once, presumably by racing their restart. If this is an issue we can tweak how that warning works, I think.

Yeah, I've hit the setup warning a couple of times myself.