Oct 26 2022
I patched and partially deployed this in early August. Another unattended MySQL upgrade went out on Monday night, also didn't restart MySQL on affected hosts, and caused some downtime on hosts that didn't have the patch (to "disable unattended upgrades"). I've now deployed this everywhere, and am presuming this is fixed until evidence arises to the contrary.
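The patch itself isn't shown in this feed. As a rough sketch of how a host could be audited for it, assuming the hosts use the stock Ubuntu `APT::Periodic::Unattended-Upgrade` setting (normally found in `/etc/apt/apt.conf.d/20auto-upgrades`), something like the following could flag hosts where unattended upgrades are still enabled. The function names are hypothetical, and treating a missing setting as "disabled" is an assumption.

```python
import re

# Matches lines such as: APT::Periodic::Unattended-Upgrade "1";
_SETTING = re.compile(r'^\s*APT::Periodic::([A-Za-z-]+)\s+"([^"]*)"\s*;')


def parse_periodic(text):
    """Return a dict of APT::Periodic settings found in apt.conf-style text."""
    settings = {}
    for line in text.splitlines():
        # Skip lines commented out with "//".
        if line.strip().startswith("//"):
            continue
        match = _SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def unattended_upgrades_enabled(text):
    """True only if Unattended-Upgrade is present and set to a nonzero value.

    Assumption: a missing setting is treated as disabled.
    """
    value = parse_periodic(text).get("Unattended-Upgrade", "0")
    return value not in ("", "0")
```

On a host, this could be fed the contents of the relevant apt configuration file; it only inspects text and doesn't change anything.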
Oct 25 2022
Jul 29 2022
Apr 20 2022
There's nothing particularly useful or actionable here now, so closing it out. (I believe this was the most severe incident Phacility ever experienced while actively maintained.)
This hasn't caused any more problems in like 4 years, so I guess it's kind of whatever.
This isn't really resolved, but it almost certainly doesn't make sense to pursue given the Phacility wind-down.
Almost every host currently in production was provisioned with Piledriver and things have been stable for quite a while, so I'm calling this resolved. See elsewhere for issues with Ubuntu20, mail, etc.
Moved the rest of this to T13640.
Apr 19 2022
I deployed this and it seems to be working properly.
Hey, it worked once. Good enough for me!
No dice. We need bin/upgrade to run before mysql because it has to mount the data volume. So now I'm trying this:
... service ... start rather than service ... restart ...
Apr 1 2022
This has some rough edges that I'm not going to deal with for now:
Dec 19 2021
See T12847. All the technical parts of this are now solved except for billing, but since Phacility is winding down I no longer plan to pursue it.
I resolved this in rCORE320b2854.
Only one instance was impacted by this and I just credited them until 2099. I don't currently expect to pursue this.
I no longer expect to pursue this.
- Hosts in the repo class are now built by Piledriver (see T13630), which automatically creates the rbak device entries, so this error isn't likely to occur again.
- I also don't expect to launch any more hosts.
I compacted secure onto new hardware (T13671) and shut down saux001 ("Land Revision") and sbuild001 (Harbormaster remote builds). I think all the remaining work is covered under T13630 (largely, just a handful of large database migrations remain).
I just swapped configs over without merging the LBs, since it wasn't immediately obvious to me what the Application vs Classic state of the world is and swapping was good enough.
The aphlict/notify stuff still needs to be tweaked. I think the snlb + slb setup can be merged into a single slb with "TCP (Secure)" forwarding now.
Databases are moved and secure is out of read-only mode. I'm going to adjust repository configuration, then I should be able to tear down secure001.
I'm going to put secure back into read-only mode now and move the databases to the new host.
I brought up the new host and pointed the slb001 load balancer at it. The database is still on the old host, and the new host doesn't have repositories yet, but the basics seem to be working.
Dec 18 2021
Merging 003 into 001 worked fine with a few expected tricks (e.g., when secure is in read-only mode, you can't push a change to take it out of read-only mode, since pushing is a write). Next up is launching a modern m4.large secure-pool host and then migrating the data.
I'm putting secure into read-only mode now, with the intent of completing steps 1-5 above.
Dec 17 2021
Dec 16 2021
It would still be nice to have this from a completeness/correctness perspective, but other changes have made it less valuable:
Dec 11 2021
I put all the database migration stuff everywhere and it appears stable. I'm hooking up Postmark as an outbound pathway now. If I get that working, I'll let it sit for a while and start migrating databases.
Dec 10 2021
Finally, there are some other MySQL version issues which can be avoided with:
Dec 9 2021
The new core/ support for the API is partially deployed; the new services/ support isn't anywhere yet.
Dec 8 2021
Dec 4 2021
The latest version of Phabricator itself is everywhere.
I'm going to hold it until the weekend and try deploying then if things look calm on my end.