
Ubuntu20 systemd restart script does not reliably execute on Ubuntu20/m4 chassis hosts
Closed, Resolved (Public)

Description

See PHI2190, etc. When Phacility hosts on the m4.large chassis restart by surprise, they don't (or, at least, don't reliably) run bin/host upgrade --hold-versions and remount volumes.

The trigger system switched from "upstart" to "systemd" in rCORE75c788bd and I probably tested it at the time, but evidently missed something.

I'm going to kick this host (secure-01a4c8f6) a bit and see how far I get in figuring out what's going on.

Event Timeline

epriestley created this task.
epriestley added a commit: Restricted Diffusion Commit. Apr 19 2022, 4:43 PM
epriestley added a commit: Restricted Diffusion Commit. Apr 19 2022, 5:15 PM

...probably tested...

Uh... probably not.

It seems like systemd units need to be explicitly enabled (sudo systemctl enable phacility.service), and must (?) have an [Install] section with a WantedBy directive (or some other entanglement with the systemd graph of services, perhaps).
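A minimal sketch of what such a unit might look like, written as a shell function that prints the unit text. Only the phacility.service name, the bin/host upgrade --hold-versions command, and the need for an [Install] section with WantedBy come from the notes above. The CORE_ROOT placeholder path, the Description, Type=oneshot, and the choice of multi-user.target are assumptions for illustration, not the deployed unit.

```shell
#!/bin/sh
# Sketch only: prints a hypothetical phacility.service unit to stdout.
# CORE_ROOT is an assumed placeholder for wherever bin/host lives.
CORE_ROOT="${CORE_ROOT:-/path/to/core}"

render_unit() {
  cat <<EOF
[Unit]
Description=Phacility host upgrade and volume remount (hypothetical)

[Service]
# oneshot: run the command once, then exit (an assumption).
Type=oneshot
ExecStart=${CORE_ROOT}/bin/host upgrade --hold-versions

[Install]
# "systemctl enable" needs an [Install] section to know what to link.
# multi-user.target is an assumed choice of target.
WantedBy=multi-user.target
EOF
}

render_unit
```

To try it, the output would be written to /etc/systemd/system/phacility.service, followed by sudo systemctl daemon-reload and sudo systemctl enable phacility.service. Enabling a unit that has no [Install] section only produces a warning and creates no links, which would match the behavior described above.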

The --hold-versions upgrade process also doesn't work because it tries to run service ... restart on various services, and some of them (at a minimum, mysql) have slam-protection on restarts built in. So this fails and the script aborts.

I've deployed a new version which entangles itself with systemd, and uses a new --no-service-restarts flag to run service ... start rather than service ... restart. It also uses After to try to run after the other services start.

... service ... start rather than service ... restart ...

No good, service ... start is not idempotent. I changed things to just assume that After works; kicking things again.

epriestley added a commit: Restricted Diffusion Commit. Apr 19 2022, 5:27 PM
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a commit: Restricted Diffusion Commit. Apr 19 2022, 5:41 PM

No dice. We need bin/upgrade to run before mysql because it has to mount the data volume. So now I'm trying this:

  • Keep the --no-service-restarts flag, but it only covers sshd now, which we reasonably want the system to control autostart for -- if a bad Phabricator version breaks ssh, it's a big mess to fix it.
  • Pull MySQL and Apache out of systemd-managed autostart. Start them after performing volume mounts instead.
  • Hope that service X start is smart enough to respect After when run from inside another service's start context?
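The plan in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the deployed script: the service names mysql and apache2, the CORE_ROOT placeholder path, the run helper, and the DRY_RUN switch are all hypothetical, and passing --no-service-restarts alongside --hold-versions on the same command is a guess at how the flags combine. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. Service names (mysql, apache2) and CORE_ROOT are assumptions.
# DRY_RUN=1 (the default here) prints commands instead of running them.
CORE_ROOT="${CORE_ROOT:-/path/to/core}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# One-time setup: take the data-dependent services out of systemd autostart.
# sshd is deliberately left alone so the system keeps control of it.
disable_autostart() {
  run systemctl disable mysql apache2
}

# Boot-time sequence: mount volumes first, then start what depends on them.
boot_sequence() {
  run "${CORE_ROOT}/bin/host" upgrade --hold-versions --no-service-restarts || return 1
  run service mysql start || return 1
  run service apache2 start
}

disable_autostart
boot_sequence
```

The point of the ordering is that MySQL never starts before the data volume is mounted, because nothing but this sequence starts it.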

Kicking things again.

Hey, it worked once. Good enough for me!