After an AWS event, Phacility hosts may come up with swap only partially configured
Closed, Resolved (Public)

Authored By: epriestley, Apr 30 2021, 4:43 PM

Description

See PHI2063. See PHI2062. See PHI2089.

I failed to manually reboot an instance in response to a scheduled AWS event. I normally do this during weekly deploy windows, but didn't deploy in the event window and forgot that I'd received an event notification. This is a problem on its own, but isn't fundamentally a technical problem.

In theory, AWS could also apply this kind of reboot without a scheduled-event notification, so this could still have caused issues even if there had been no operator error.

Since I didn't manually do the reboot, AWS rebooted the instance automatically. Technically, there were two instances with events at similar times. It seems like the initial reboot took much longer than a manual reboot does, which caused PHI2062. Both instances came back up without swap, which later caused PHI2063.

The code that sets up swap just tests for the existence of /mnt/swap and assumes swap is properly configured if it exists:

$swapfile = '/mnt/swap';

// Only checks that the file exists; it does not verify that swap is active.
if (Filesystem::pathExists($swapfile)) {
  return;
}

Normally, this test appears to produce the correct result. However, these instances came up in a state where /mnt/swap existed but swap was not actually enabled.

This test should be more surgical and examine the actual swap state -- possibly by parsing the output of swapon -s (swapon -a enables swap rather than reporting it) or by reading /proc/swaps.
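As a rough illustration of the /proc/swaps approach, here is a minimal sketch. It is written in Python purely for illustration (the real workflow is PHP), and the names parse_proc_swaps and swap_is_active are hypothetical, not part of any existing API. It assumes the usual /proc/swaps layout: one header line, then whitespace-separated Filename, Type, Size, Used, Priority columns.

```python
from pathlib import Path


def parse_proc_swaps(text):
    """Return {path: size_kib} for each active swap area listed in /proc/swaps text."""
    active = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        filename, size_kib = fields[0], fields[2]
        active[filename] = int(size_kib)
    return active


def swap_is_active(swapfile, swaps_text=None):
    """True only if the kernel reports `swapfile` as an active swap area.

    `swaps_text` is a hypothetical hook so the check can be tested without
    reading the real /proc/swaps.
    """
    if swaps_text is None:
        swaps_text = Path("/proc/swaps").read_text()
    return swapfile in parse_proc_swaps(swaps_text)
```

With a check like this, the setup step would return early only when swap_is_active('/mnt/swap') is true, rather than whenever the file merely exists.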

It's less clear why the initial reboot took so long ("about an hour", from PHI2062), and I'm not sure this can be reproduced from the AWS console.

Revisions and Commits

		Restricted Diffusion Commit
rARC Arcanist
	D21733	rARC59b273fd15d3 (stable) Promote 2021 Week 49
	D21733	rARCc23222438b30 (stable) Provide an API for parsing swap information from "/proc/meminfo"
	D21733	rARCc53bb21bbd3e Provide an API for parsing swap information from "/proc/meminfo"

Event Timeline

epriestley triaged this task as Low priority. (Apr 30 2021, 4:43 PM)

epriestley created this task.

epriestley updated the task description. (Apr 30 2021, 4:50 PM)

epriestley updated the task description. (Jun 1 2021, 5:52 AM)

epriestley added a revision: D21733: Provide an API for parsing swap information from "/proc/meminfo". (Nov 22 2021, 1:30 PM)

epriestley added a commit: rARCc53bb21bbd3e: Provide an API for parsing swap information from "/proc/meminfo". (Nov 22 2021, 1:45 PM)

epriestley added a commit: rARCc23222438b30: (stable) Provide an API for parsing swap information from "/proc/meminfo".

epriestley added a commit: Restricted Diffusion Commit.

This appears resolved: the workflow now tests that /proc/meminfo reports an appropriate value for SwapTotal.
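For reference, a check along these lines might look like the sketch below. This is not the API added in D21733; it is a Python illustration with hypothetical names (parse_meminfo, has_enough_swap), and the min_kib threshold is an assumption, since the task does not say what counts as an "appropriate" value. It relies only on the standard /proc/meminfo layout of "Key: value kB" lines, where the swap total appears under the SwapTotal key.

```python
import re
from pathlib import Path

# Matches lines such as "SwapTotal:  4194300 kB" and unitless ones such as
# "HugePages_Total:  0".
_MEMINFO_LINE = re.compile(r"^([^:]+):\s+(\d+)(?:\s+kB)?\s*$")


def parse_meminfo(text):
    """Return {key: integer value} for every parseable line of /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        match = _MEMINFO_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def has_enough_swap(min_kib, meminfo_text=None):
    """True if SwapTotal is at least `min_kib` (a hypothetical threshold).

    A missing SwapTotal line is treated as zero swap.
    """
    if meminfo_text is None:
        meminfo_text = Path("/proc/meminfo").read_text()
    return parse_meminfo(meminfo_text).get("SwapTotal", 0) >= min_kib
```

Reading the total from the kernel, instead of testing for the file, catches the case described above where /mnt/swap exists but SwapTotal is 0.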

Losing swap on restart is apparently normal behavior, so this may have always happened: swapon only lasts until the next reboot, and to permanently configure swap you are expected to use a text editor to edit /etc/fstab. It is truly a wonder that computers work at all.
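To make the /etc/fstab point concrete: a persistent swap file is conventionally declared with a line like "/mnt/swap none swap sw 0 0". The sketch below (Python, hypothetical function name) checks whether fstab text already has such an entry. Whether the Phacility workflow manages /etc/fstab at all is not stated in this task, so treat this purely as an illustration.

```python
def fstab_has_swap_entry(fstab_text, swapfile):
    """True if `fstab_text` contains a swap entry whose first field is `swapfile`.

    fstab fields are whitespace-separated: device, mount point, filesystem
    type, options, dump, pass. Lines starting with '#' are comments.
    """
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        if len(fields) >= 3 and fields[0] == swapfile and fields[2] == "swap":
            return True
    return False
```

A host that fails this check would need swapon run again after every reboot, which matches the behavior observed here.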

epriestley added a commit: rARC59b273fd15d3: (stable) Promote 2021 Week 49. (Dec 1 2021, 9:21 PM)
