(Of course, it'll probably just work the first time now...)
Feb 13 2024
Feb 12 2024
The export process is already robust at a coarse level: the dump is retained on disk, so the process can be retried at the "upload the whole file again" level and then picked up with bin/host export using the --database or --database-file flags (probably with --keep-file).
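As a concrete sketch of that retry (the instance name and dump path below are made up, and the exact flag semantics are assumptions):

```
# Hypothetical retry of a failed export for an instance named "turtle".
# --keep-file retains the dump on disk so the upload can be retried again.
bin/host export --database turtle --keep-file

# Or, assuming --database-file takes a path to a dump already on disk:
bin/host export --database-file /tmp/turtle.sql --keep-file
```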
The (anonymized) error the process encountered while transferring the dump to central storage was:
Nov 13 2023
See D21862.
Next issue: can't pull from secure.
Issue 3:
With bin/provision events working again:
Oct 26 2022
I patched and partially deployed this in early August. Another unattended MySQL upgrade went out on Monday night; it likewise didn't restart MySQL on affected hosts, and caused some downtime on the hosts that didn't yet have the patch (which disables unattended upgrades). I've now deployed this everywhere, and am presuming this is fixed until evidence arises to the contrary.
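For reference, a minimal sketch of the stock Ubuntu way to disable unattended upgrades (the actual patch may do this differently):

```
# Turn off apt's periodic unattended upgrades host-wide. This writes the
# standard Ubuntu knobs; the deployed patch may use another mechanism.
cat > /etc/apt/apt.conf.d/20auto-upgrades <<'EOF'
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
EOF
```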
Oct 25 2022
Jul 29 2022
Apr 20 2022
There's nothing particularly useful or actionable here now, so closing it out. (I believe this was the most severe incident Phacility ever experienced while actively maintained.)
This hasn't caused any more problems in like 4 years, so I guess it's kind of whatever.
This isn't really resolved, but almost certainly does not make sense to pursue given the Phacility wind-down.
Almost every host currently in production was provisioned with Piledriver and things have been stable for quite a while, so I'm calling this resolved. See elsewhere for issues with Ubuntu20, mail, etc.
Moved the rest of this to T13640.
Apr 19 2022
I deployed this and it seems to be working properly.
Hey, it worked once. Good enough for me!
No dice. We need bin/upgrade to run before mysql because it has to mount the data volume. So now I'm trying this:
... service ... start rather than service ... restart ...
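In sketch form (script and service names are assumptions, not the actual deploy code):

```
#!/bin/sh
# bin/upgrade must run first: it mounts the data volume MySQL lives on.
bin/upgrade

# Then a plain "start", not "restart" -- MySQL must not already be running,
# since its datadir only exists once the volume is mounted.
service mysql start
```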
...probably tested...
Apr 1 2022
This has some rough edges that I'm not going to deal with for now:
Dec 19 2021
See T12847. All the technical parts of this are now solved except for billing, but since Phacility is winding down I no longer plan to pursue it.
I resolved this in rCORE320b2854.
After T13630:
Only one instance was impacted by this and I just credited them until 2099. I don't currently expect to pursue this.
I no longer expect to pursue this.
- Hosts in the repo class are now built by Piledriver (see T13630), which automatically creates the rbak device entries, so this error isn't likely to occur again.
- I also don't expect to launch any more hosts.
I just swapped configs over without merging the LBs, since it wasn't immediately obvious to me what the Application vs Classic state of the world is and swapping was good enough.
The aphlict/notify stuff still needs to be tweaked. I think the snlb + slb setup can be merged into a single slb with "TCP (Secure)" forwarding now.
Databases are moved and secure is out of read-only mode. I'm going to adjust repository configuration, then I should be able to tear down secure001.
I'm going to put secure back into read-only mode now and move the databases to the new host.
I brought up the new host and pointed the slb001 load balancer at it. The database is still on the old host, and the new host doesn't have repositories yet, but the basics seem to be working.
Dec 18 2021
Merging 003 into 001 worked fine with a few expected tricks (e.g., when secure is in read-only mode, you can't push a change to take it out of read-only mode, since pushing is a write). Next up is launching a modern m4.large secure-pool host and then migrating the data.
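For reference, a sketch of the workaround, assuming read-only mode maps to Phabricator's stock phabricator.read-only option (which may not be how secure actually manages it):

```
# Since pushing is a write, flip the option locally on the host instead of
# pushing a change. Assumes the stock "phabricator.read-only" option.
bin/config set phabricator.read-only false
```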
I'm putting secure into read-only mode now, with the intent of completing steps 1-5 above.
Dec 17 2021
Dec 16 2021
It would still be nice to have this from a completeness/correctness perspective, but other changes have made it less valuable:
Dec 11 2021
I put all the database migration stuff everywhere and it appears stable. I'm hooking up Postmark as an outbound pathway now. If I get that working, I'll let it sit for a while and start migrating databases.
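A rough sketch of what that hookup might look like via cluster.mailers (the key and option names here are from memory, the token is a placeholder, and the real configuration may differ):

```
# Register Postmark as an outbound mailer. Sketch only: key/option names
# are assumptions and the token is a placeholder.
bin/config set cluster.mailers '[
  {
    "key": "postmark",
    "type": "postmark",
    "options": {"access-token": "<postmark-server-token>"}
  }
]'
```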
Dec 10 2021
Finally, there are some other MySQL version issues which can be avoided with:
Dec 9 2021
The new core/ support for the API is partially deployed; the new services/ support isn't anywhere yet.
Dec 8 2021
Dec 4 2021
The latest version of Phabricator itself is everywhere.
I'm going to hold it until the weekend and try deploying then if things look calm on my end.
Dec 2 2021
I'm satisfied that we aren't violating our commitment to our customers by continuing to use Mailgun as a service provider...
Dec 1 2021
While waiting to deploy db stuff, I was planning to look at pruning dead data out of S3 -- but, on closer examination, the total S3 bill is something like $1/day, so no priority on that whatsoever.
Piledriver also needs to be able to provision database hosts, but these are more-or-less a trivial subset of repository hosts.
- Make InstancesStateQuery use a dictionary when building the database ref information internally.
I completed all the repository migrations over the weekend and seemingly haven't run into any issues.
Nov 22 2021
This appears resolved: the workflow now tests that /proc/meminfo reports an appropriate SwapTotal value.
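Something along these lines (a sketch, not the actual workflow code):

```
# Fail if /proc/meminfo reports no configured swap (SwapTotal is in kB).
awk '/^SwapTotal:/ { ok = ($2 > 0) } END { exit ok ? 0 : 1 }' /proc/meminfo
```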
Nov 21 2021
Just for completeness: vault used to be an HAProxy host serving as an SSH load balancer, but this responsibility moved to lb001 once ELBs became able to listen on inbound port 22 and forward TCP, so there is no longer a vault class of machines.
Nov 20 2021
The new provisioning process for repository shards is:
Nov 19 2021
Piledriver was built before the FutureGraph stuff settled in T11968; it runs into the same general set of sequencing problems and yield would likely be a good approach.
Nov 18 2021
I can't figure out how to delete...
I got rid of everything I could, and nothing appears to be affected.
We have a lot of leftover VPC cruft that I'm going to nuke, notably meta and admin VPCs that (as far as I can tell) have nothing in them, and then a bunch of subnets (meta.private-a, meta.private-b, block-public-222, admin.public-a, admin.public-b, meta.public-a, meta.public-b, block-private-3) and some NGWs etc. I'm like 99% sure this stuff is all leftover from testing years ago and nothing depends on it, but I guess we'll see what happens when I delete all of it.
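The deletions themselves are plain AWS CLI calls (the IDs below are placeholders); order matters, since a VPC can't be deleted while it still contains subnets or NAT gateways:

```
# Dependencies first, then the VPC itself. IDs are placeholders.
aws ec2 delete-nat-gateway --nat-gateway-id nat-0123456789abcdef0
aws ec2 delete-subnet --subnet-id subnet-0123456789abcdef0
aws ec2 delete-vpc --vpc-id vpc-0123456789abcdef0
```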
Here's the last known state of the world from T12816:
Nov 17 2021
Closing this in favor of T13630, which covers the same ground.
Nov 15 2021
I'm planning to simply delete the Discourse forum without preserving any content.
Jul 21 2021
Jul 9 2021
For now, this has been working fine as a simple CLI flow.
Jun 1 2021
This is approximately working now, although the "button" is currently this mess:
See T13656 for followup.
Instances technically have a formal "Deleted" status -- but it isn't really used by anything, nothing ever puts them into that status, and there are no instances in that status. For consistency with existing CLI workflows, I'm going to rename this to "Destroyed".
A related issue is that I think nothing currently destroys S3 data. For most instances this isn't significant, but it isn't helping anything. This should likely be part of the database destruction step, although it can probably interact with the S3 bucket directly.
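A sketch of the direct-to-S3 version (the bucket layout is an assumption):

```
# Assumption: instance file data lives under a per-instance prefix.
# Remove it recursively as part of the destruction step.
aws s3 rm --recursive s3://<bucket>/<instance>/
```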