Change Details

See PHI1315, where an instance restarted without any apparent notification. This happened once before previously, although I'm not immediately able to dig up the issue. Some user on Reddit suggests this is routine, and/or fabricates an imaginary story about it being routine: https://www.reddit.com/r/aws/comments/ax944r/is_it_possible_that_amazon_itself_rebooted_my_ec2/ehsgb7y/ ...although other users in that thread suggest this isn't expected: > Whenever aws schedules an ec2 for migration/reboot due to underlying hardware/software issues you will receive usually an email with the notice and also you will have a notification in the console. I checked these places for any kind of "host is restarting because X" information: - AWS email. - AWS notifications bell in the menu. - CloudTrail logs. - `/var/log/syslog` - `/var/log/dmesg.*` - `last reboot` - `/var/log/boot.log` No luck finding any actual cause for the restart. --- Prior to these recent cases, all restarts have been scheduled/announced. However, now that this has happened more than once, it seems like something we should handle better. The major issues are: - After restart, volumes don't automatically remount. - After restart, services don't automatically start. These can be remedied by running `bin/host upgrade` on restart. However, this //also// upgrades application libraries (e.g., Phabricator), which we //don't// want. The shortest path to a reasonable fix is probably: - Add a flag like `bin/host upgrade --keep-deployed-version` to disable the `git pull` steps. - Run this on host startup. - Kick some hosts to make sure it works. Ideally, this would also include a flag like `--and-complain` to generate a notification/alert.

See PHI1315PHI1315, where an instance restarted without any apparent notification. This happened once before previously, although I'm not immediately able to dig up the issue. Some user on Reddit suggests this is routine, and/or fabricates an imaginary story about it being routine: https://www.reddit.com/r/aws/comments/ax944r/is_it_possible_that_amazon_itself_rebooted_my_ec2/ehsgb7y/ ...although other users in that thread suggest this isn't expected: > Whenever aws schedules an ec2 for migration/reboot due to underlying hardware/software issues you will receive usually an email with the notice and also you will have a notification in the console. I checked these places for any kind of "host is restarting because X" information: - AWS email. - AWS notifications bell in the menu. - CloudTrail logs. - `/var/log/syslog` - `/var/log/dmesg.*` - `last reboot` - `/var/log/boot.log` No luck finding any actual cause for the restart. --- Prior to these recent cases, all restarts have been scheduled/announced. However, now that this has happened more than once, it seems like something we should handle better. The major issues are: - After restart, volumes don't automatically remount. - After restart, services don't automatically start. These can be remedied by running `bin/host upgrade` on restart. However, this //also// upgrades application libraries (e.g., Phabricator), which we //don't// want. The shortest path to a reasonable fix is probably: - Add a flag like `bin/host upgrade --keep-deployed-version` to disable the `git pull` steps. - Run this on host startup. - Kick some hosts to make sure it works. Ideally, this would also include a flag like `--and-complain` to generate a notification/alert.