MySQL may take several seconds after restart to begin listening on domain socket
Closed, Resolved · Public

Authored By: epriestley, Oct 14 2017, 12:28 PM

Description

In remote deploy, we currently restart MySQL and then try to connect to it shortly afterward.

Sometimes, probably when MySQL has a large amount of data (in this instance, one affected host has 76GB of data), the socket may not be listening by the time the restart command exits, leading to this error:

[db010] [2017-10-14 12:19:13] EXCEPTION: (CommandException) Command failed with error #1!
[db010] COMMAND
[db010] echo 'DELETE FROM mysql.user WHERE User = "root" AND Host != "localhost"' | mysql -uroot
[db010] 
[db010] STDOUT
[db010] (empty)
[db010] 
[db010] STDERR
[db010] ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

We should probably make sure the socket is listening before continuing past the service mysqld restart.
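One way to make sure the socket is listening is to poll it until a connection succeeds or a deadline passes. The sketch below is illustrative only and is not the patch that resolved this task (that revision is restricted and not shown here). The function name `wait_for_unix_socket`, the timeout, and the polling interval are all assumptions made for the example.

```python
import socket
import time


def wait_for_unix_socket(path, timeout=60.0, interval=0.5):
    """Poll a Unix domain socket until something accepts connections on it.

    Hypothetical helper: returns True as soon as connect() succeeds and
    False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return True
        except OSError:
            # Typically "no such file" (the socket file does not exist yet)
            # or "connection refused" (the file exists but nothing is
            # listening). Either way, the server is not ready.
            pass
        finally:
            sock.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A deploy step could call `wait_for_unix_socket("/var/run/mysqld/mysqld.sock")` after the restart and only run the `mysql -uroot` command once it returns True, failing with a clear message otherwise. Checking with an actual `connect()` rather than testing that the socket file exists avoids treating a stale socket file as a live server.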

In this case, the two affected hosts (db010 and db014) haven't been purged in a while and have some large test instances, so I expect I can just reduce the data size to something manageable with the current workflow fairly easily (bin/host destroy --instance-kinds test --instance-statuses suspended,disabled). I'm running the destruction workflows now.

Revisions and Commits

Restricted Differential Revision

Restricted Diffusion Commit

Event Timeline

epriestley created this task. Oct 14 2017, 12:28 PM

Herald added a subscriber: eadler. Oct 14 2017, 12:28 PM

> I expect I can just reduce the data size to something manageable with the current workflow fairly easily

This worked correctly.

epriestley added a revision: Restricted Differential Revision. Oct 23 2017, 6:03 PM

We hit one more of these (db024) last week, so D18725 should fix it.

epriestley added a commit: Restricted Diffusion Commit. Oct 23 2017, 8:09 PM

I pushed secure004 with that patch and it worked fine, although MySQL came back up fast enough that we didn't have to wait. Since I don't have a way to actually trigger this condition, I'm going to assume this is resolved until we have evidence otherwise.
