See PHI1566. Just noting this for future tooling considerations: AWS may take several hours to terminate an instance.
In PHI1566, repo023 is inconsistent/degraded (AWS claims it's okay and it's doing some stuff, but not enough stuff to be a functional repository shard). I issued a "Reboot" which hung for several minutes (which I've seen before), then issued a "Terminate" which has hung for about 8 minutes so far (which I haven't seen before).
The AWS documentation suggests that slow termination is at least routine enough to document:
> If your instance remains in the shutting-down state for several hours, Amazon EC2 treats it as a stuck instance and forcibly terminates it.
> If it appears that your instance is stuck terminating and it has been longer than several hours, post a request for help to the Amazon EC2 forum. To help expedite a resolution, include the instance ID and describe the steps that you've already taken.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesShuttingDown.html
Some notes:
- The host uptime was 983 days. Possibly, we should just assume AWS hosts decay over time and need to be cycled every X days (maybe once a month). We've been seeing more issues in this general vein recently with high-uptime hosts. Most of our hosts are high-uptime, so the correlation may be coincidental and this might just be AWS getting worse over time, but high-uptime hosts are likely unusual in general.
- For automation, a strategy of "launch a new instance, swap over, terminate the old instance" is probably desirable anyway, and particularly so given that termination may take up to several hours.
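
A rough sketch of what the two notes above might look like in tooling, assuming the real AWS calls are supplied by the caller. Every name here (`hosts_due_for_cycling`, `wait_for_state`, `replace_instance`, and the `launch_new` / `swap_to` / `terminate` / `get_state` callables) is hypothetical and not part of any existing tool. The 30-day threshold and the 4-hour timeout are placeholder assumptions, since AWS only says "several hours".

```python
import time


def hosts_due_for_cycling(uptime_days, max_days=30):
    """Return the names of hosts whose uptime exceeds max_days, sorted.

    uptime_days maps a host name to its uptime in days. The 30-day
    default is a placeholder for the "every X days" policy, not a
    measured threshold.
    """
    return sorted(name for name, days in uptime_days.items() if days > max_days)


def wait_for_state(get_state, target, timeout_s, poll_s=15.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns target or timeout_s elapses.

    Returns True if the target state was observed, False on timeout.
    clock and sleep are injectable so the loop can be tested without
    actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def replace_instance(old_id, launch_new, swap_to, terminate, get_state,
                     timeout_s=4 * 3600, poll_s=15.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Launch a replacement, swap over, then terminate the old instance.

    Assumes launch_new() returns the new instance ID once that instance
    is ready for service. The swap happens before termination, so a slow
    or stuck termination never blocks service. Returns
    (new_id, terminated), where terminated is False if the old instance
    was still not "terminated" when the timeout expired.
    """
    new_id = launch_new()
    swap_to(new_id)
    terminate(old_id)
    terminated = wait_for_state(lambda: get_state(old_id), "terminated",
                                timeout_s, poll_s, clock, sleep)
    return new_id, terminated
```

In practice the callables would wrap whatever EC2 client the tooling already uses. Returning a flag rather than raising on timeout is a deliberate choice in this sketch: by that point the swap has already succeeded, so a stuck old instance is something to report, not a failure of the replacement.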