The /core/data volume on secure001 filled, which paused MySQL.
It was filled by an unusual number of binlogs. We have binlogs set to GC, but MySQL generated approximately 100GB of them in a day or so. The GC config seems correct; the fill rate was just unusually high.
I remedied the issue by:
- Connecting to the replica on secure002.
- Making sure replication was caught up (with SHOW SLAVE STATUS) and verifying that the current binlog on the replica was a recent binlog.
- Removing most of the older binlogs with rm.
- Restarting MySQL with sudo service mysql restart.
- (The replica was a little slow to resume (per SHOW SLAVE STATUS), so I also ran STOP SLAVE; START SLAVE; on the replica.)
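The "which binlogs are old enough to remove" decision in the steps above can be sketched roughly as follows. This is a minimal illustration, not the procedure that was actually run: the function names, the `keep` safety margin, and the `mysql-bin.NNNNNN` file names are assumptions for the example. It only lists candidates and deletes nothing.

```python
import re


def binlog_sequence(name):
    """Return the numeric suffix of a binlog file name such as 'mysql-bin.000123'."""
    match = re.search(r"\.(\d+)$", name)
    if match is None:
        # e.g. the 'mysql-bin.index' file; callers should pass binlog files only.
        raise ValueError(f"not a binlog file name: {name!r}")
    return int(match.group(1))


def binlogs_safe_to_remove(binlog_files, replica_log_file, keep=2):
    """List binlogs older than the one the replica is reading, oldest first.

    `replica_log_file` is the primary binlog the replica reports it is reading
    (the Master_Log_File field of SHOW SLAVE STATUS). `keep` is a hypothetical
    safety margin: that many files just before the replica's position are kept.
    """
    cutoff = binlog_sequence(replica_log_file) - keep
    older = [f for f in binlog_files if binlog_sequence(f) < cutoff]
    return sorted(older, key=binlog_sequence)


# Hypothetical file names, for illustration only.
files = [f"mysql-bin.{n:06d}" for n in range(100, 110)]
candidates = binlogs_safe_to_remove(files, "mysql-bin.000108", keep=2)
print(candidates)
```

MySQL's PURGE BINARY LOGS TO statement is an alternative to rm that also updates the binlog index file.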
Inspecting the logs with mysqlbinlog <log> suggests they are mostly updates from the rebuild-identities script. My guess is that the script is running in a loop and that I have a bug in PhabricatorQueryIterator.
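One way to make the "mostly updates from one script" observation more concrete is to tally write statements per table in the text that mysqlbinlog prints. The sketch below is an assumption-laden illustration: `count_writes` and the sample table names are hypothetical, and the pattern only covers plain SQL lines and the `### UPDATE ...` pseudo-SQL lines that mysqlbinlog --verbose emits for row-based events.

```python
import re
from collections import Counter

# Matches statement-based lines ("UPDATE t SET ...") and the "### UPDATE `db`.`t`"
# pseudo-SQL lines printed by mysqlbinlog --verbose for row-based events.
WRITE_RE = re.compile(
    r"^(?:###\s+)?(UPDATE|INSERT\s+INTO|DELETE\s+FROM)\s+([`\w.]+)",
    re.IGNORECASE,
)


def count_writes(lines):
    """Count write statements per (verb, table) in mysqlbinlog text output."""
    counts = Counter()
    for line in lines:
        match = WRITE_RE.match(line.strip())
        if match:
            verb = match.group(1).split()[0].upper()
            table = match.group(2).replace("`", "")
            counts[(verb, table)] += 1
    return counts


# Hypothetical sample lines; real input would be the text output of mysqlbinlog.
sample = [
    "# at 4",
    "### UPDATE `example_db`.`identity`",
    "### WHERE",
    "###   @1=1",
    "### UPDATE `example_db`.`identity`",
    "UPDATE identity SET name = 'x' WHERE id = 1",
    "### INSERT INTO `example_db`.`other`",
]
counts = count_writes(sample)
for (verb, table), n in counts.most_common():
    print(verb, table, n)
```

A table that dominates the counts across many consecutive binlogs would be consistent with a script rewriting the same rows in a loop, though it does not by itself confirm where the bug is.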