See T12798. I think I'm ready to start breaking production. My overall plan is:
- Bring up repo025 (I'm just going to do this normally -- in the main subnet, with a public IP -- see T12816).
- Provision and deploy it, but don't open it for new instance allocation.
- Close db012 and repo012 to instance allocation so new instances cannot be allocated there.
- Merge bin/instances move and the various improvements to bin/host restore, etc., to stable.
- Update code on repo012, repo025, and admin to pick up these changes.
- Use the new staff tools to forcibly allocate a new instance on db012 / repo012.
- Add a test repository and push some code to it.
- Use the new tools to move the instance from repo012 to repo025.
- Push more code and make sure everything works, the instance moved properly, and the writes go to repo025, not repo012.
- Move all suspended and disabled instances from repo012 to repo025.
Then, tomorrow during the normal deploy window:
- Deploy normally.
- Move all the remaining live instances.
- Force allocate a new instance on db012 / repo012 and put some repository data on it. On June 19th, AWS will help us run an operational drill by creating an "abrupt" shard failure on the host, and we can then try to recover the shard.
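
For anyone following along, here is a rough toy model of the close / force allocate / move / verify sequence from the first list. It is only a sketch of the intended semantics, not the real tooling: `Host`, `allocate`, `move`, and `host_for` are hypothetical names I made up for illustration, and none of this reflects the actual flags or behavior of bin/instances or bin/host.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for a shard host such as repo012 or repo025."""
    name: str
    open_for_allocation: bool = True
    instances: set = field(default_factory=set)


class AllocationClosed(Exception):
    """Raised when a normal allocation targets a closed host."""


def allocate(host, instance, force=False):
    # Normal allocation is refused on a closed host; staff tools may force it.
    if not host.open_for_allocation and not force:
        raise AllocationClosed(f"{host.name} is closed to new instances")
    host.instances.add(instance)


def move(instance, src, dst):
    # A move ignores the allocation flag: the destination is deliberately
    # still closed to normal allocation while instances are moved onto it.
    if instance not in src.instances:
        raise KeyError(f"{instance} is not on {src.name}")
    dst.instances.add(instance)
    src.instances.discard(instance)


def host_for(instance, hosts):
    # After a move, exactly one host should own the instance.
    owners = [h.name for h in hosts if instance in h.instances]
    if len(owners) != 1:
        raise RuntimeError(f"{instance} has owners {owners}, expected one")
    return owners[0]


# Assumed state after the first few steps: both hosts closed to allocation.
repo012 = Host("repo012", open_for_allocation=False)
repo025 = Host("repo025", open_for_allocation=False)

allocate(repo012, "test-instance", force=True)  # forced staff allocation
move("test-instance", repo012, repo025)         # the move under test
assert host_for("test-instance", [repo012, repo025]) == "repo025"
```

The final assertion corresponds to the "writes go to repo025, not repo012" check: after the move, the old host should no longer own the instance at all.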