Wed, Jan 30
Success! D20046 worked to fix the "profiler not sticking across form posts" issue on secure. 🐈
Mon, Jan 28
Yeah, I think the issue is:
The "keep the profiler on across form submissions" code isn't working on secure.phabricator.com, even though it works locally and __profile__=page appears on the "Request" tab.
Nov 3 2018
I think everything here is now fully cycled, synchronized, and cleaned up.
Taking care of these now. I expect everything to be pretty routine.
Oct 22 2018
Plus: db018.phacility.net, repo001.phacility.net, db024.phacility.net.
Oct 19 2018
One more of these just came in for repo003.
Oct 8 2018
I think this is all done but want to let things run against bastion007 for a bit before I tear down bastion005.
I also needed to copy the old master.key from bastion005 to bastion007 in /core/lib/keystore/.
I turned bastion.phacility.net and bastion-external.phacility.net into CNAME records and pointed them at the new bastions.
There's a minor deadlock on bastion deployment with the current scripts: during deploy, we run deploy-key to copy the deploy key from the bastion to the target host, so that we don't need to put the entire keystore on normal cluster nodes or have the keystore on the control host (staff laptop) outside the cluster.
Oct 6 2018
I cycled all the hosts except bastion. saux001 needs to be vetted a bit (it handles "Land Revision" from the web UI) but it isn't critical if it needs a bit more work.
I'm going to get these underway once the deploy finishes.
Oct 1 2018
"Use the API" seemed to work OK. Of those instances, only bastion005 is at all unusual.
Sep 11 2018
This is now live.
Deploying the changes to web now.
Sep 10 2018
I've issued all instances a 24-hour service credit for the disruption. This should be reflected on your next invoice.
Here's the request rate leading up to the rate limiting:
D19653 (above) changes the per-"Host" rate limit to require "X-Forwarded-For" be present in the request. This should exempt ELB requests from these limits.
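As a rough illustration of the idea (not the actual code in D19653), the check can be sketched like this. The function name `per_host_limit_applies` and the plain header mapping are hypothetical and chosen only for this example; the sketch assumes that a request with no "X-Forwarded-For" value came directly from the load balancer and should skip the per-"Host" limit.

```python
from typing import Mapping


def per_host_limit_applies(headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: decide whether the per-"Host" rate limit should count this request."""
    # HTTP header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Assumption: a missing or empty X-Forwarded-For means the request did not
    # pass through the proxy layer on behalf of a client (for example, an ELB
    # health check), so it is exempt from the per-"Host" limit.
    forwarded_for = normalized.get("x-forwarded-for", "").strip()
    return bool(forwarded_for)
```

Under this sketch, a request carrying `X-Forwarded-For: 203.0.113.7` would be counted against the limit, while a request with no such header, or an empty one, would be exempt.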
... [Mon Sep 10 20:48:43.928021 2018] [:error] [pid 21570] [client 172.30.0.171:16516] Array\n(\n [f] => \n [h] => 172.30.0.60\n)\n ...
... in production today as a next step.
This should have the pleasant side effect of letting us drop the goofy hard-coded internal rate limiting IP list.
There are four rate limits, and I don't currently have enough information to figure out which one triggered. The rate limits are:
Sep 5 2018
Aug 25 2018
That one seemed straightforward.
Doing admin001 now.
Kicking secure001 now.
(The lack of coverage is tracked in T12879.)
I think the only thing on secure or admin which isn't properly covered by deploy automation is the crontab on secure001:
I'm going to do admin001 and secure001 today.
Aug 18 2018
Think I got through the easy ones without any issues. I suspect admin and secure may be a little more involved so I'm going to leave the cat in the bag for the moment.
I'm going to stop/start at least some of these now.
Aug 15 2018
Oh, right, "meetings". I've heard of those!
Aug 13 2018
Hahah you beat me to it!