May 28 2020
Piledriver would also benefit from having some functional equivalent of destroying an Almanac resource. This can be implemented as a piledriver.destroyed property, but a formal disabled state would be cleaner. PHI1331 is vaguely related.
- When Piledriver destroys a resource pile, it's helpful if it can read the entire authoritative state back from the underlying sources using only a pile ID.
- EC2 can do this with "DescribeTags" (see the sketch after this list).
- Almanac currently cannot. Almanac types should support searching by property value.
- This could be directly on almanac.*.search.
- Or this could be generic, via T12799.
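For the EC2 case, a hedged sketch of reading state back with the AWS CLI (the tag key "piledriver:pile" and the pile ID below are assumptions, not Piledriver's actual tagging scheme):

```
# Sketch only: read back every tag attached to resources in a given pile.
# The tag key and pile ID are assumptions, not the real tagging scheme.
aws ec2 describe-tags \
  --filters "Name=key,Values=piledriver:pile" \
            "Name=value,Values=pile-example-1234"
```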
May 26 2020
Effectively mooted by T13542.
I've written some Terraform-class tooling which can likely automate all the actual hardware allocations. This needs some more work, but I believe the tricky stuff (mostly: representing resources and allowing templating to reference resources which haven't been built yet) is at least working.
Subnet/NAT issues in T12816.
The major offender here (services per instance) was fixed by updating caching, and I destroyed all the old services. This is perhaps spiritually continued in T13542.
Continued in T13542. I wrote a Terraform/CloudFormation-style service in PHP over the last couple of days.
May 22 2020
Since many of these options probably don't have "right answers", I'm trying this reasonable-seeming variation on a few repositories that look like they will benefit from a repack:
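(The exact invocation isn't reproduced in this excerpt; the sketch below shows a plausible shape for it, and every option value is an assumption rather than a record of the command actually run.)

```
# Plausible shape only; the option values here are assumptions, not the
# command that was actually run.
git repack -a -d -f --window=250 --depth=50
git count-objects -v
```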
May 21 2020
See PHI1748. I ran a query against a subset of instances to determine how widespread usage of "Dark Mode" is, to help inform a decision to either implement the mode properly (see T12311) or remove the mode. The query was of this form:
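(The original query isn't reproduced in this excerpt; the sketch below shows the general shape only, and the table, column, and preference-key names are assumptions about the schema, not the query from PHI1748.)

```
# Rough shape only; the table, columns, and preference key are assumptions
# about the schema, not the actual query.
mysql -h "$DB_HOST" -e '
  SELECT COUNT(DISTINCT userPHID)
  FROM phabricator_user.user_preferences
  WHERE preferences LIKE "%resource-postprocessor%";'
```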
May 18 2020
See T13537 for a subtle issue where digestWithNamedKey() keys were cached in APCu on the web tier. Importing instance data may require restarting the web tier until the import process can either dump these caches or version them (versioning may be easier).
So I'm going to turn web off and on again and see if that fixes things; my expectation is that it will.
The specific issue I'm trying to debug is fairly bizarre.
May 15 2020
May 13 2020
May 12 2020
The SSH username change wasn't sufficient because there's a hard-coded piece of logic to select the username by instance name:
May 8 2020
May 5 2020
There was a related AccountIdentifier issue with InstancesShadowUserQuery: we loaded shadow users based on accountID, but this is no longer consistently populated after T13493.
AccountIdentifiers do not sync during setup after T13493.
bin/services sync --instance X exits with no error if X does not exist.
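Illustratively (the instance name below is made up):

```
# Illustrative transcript only: the sync exits 0 even though the instance
# does not exist, where an explicit error would be more useful.
$ bin/services sync --instance no-such-instance
$ echo $?
0
```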
May 1 2020
Apr 17 2020
Mar 2 2020
Does using --max-pack-size to reduce the maximum packfile size really let Git "checkpoint" after each packfile, so the process is effectively resumable?
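As a concrete framing of the question (a sketch, not a confirmed behavior; the size cap and repository layout are assumptions):

```
# Hypothetical experiment (assumes a non-bare working copy): cap packfile
# size, interrupt the repack partway through, then re-run it and see
# whether the packs that already completed are reused. Whether Git
# actually "checkpoints" this way is the open question above.
git repack -a -d --max-pack-size=1g
ls -lh .git/objects/pack/
```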
PHI1655 identifies a specific case where enormous packfiles may create problems:
Feb 3 2020
Both of these hosts restarted cleanly.
AWS is also rebooting web007.
Jan 30 2020
Jan 21 2020
The logic here appears to be that gc.auto is set to some value (by default, 6,700). If the number of loose objects exceeds this threshold (technically, if the number of loose objects in objects/17/ is more than 1/256th of this value), Git triggers a repack (per a comment in the source, effectively git repack -d -l).
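A rough way to check this heuristic by hand (a sketch; assumes a non-bare working copy layout):

```
# Approximating the auto-gc check by hand: Git samples objects/17/ and
# assumes loose objects are spread evenly over the 256 fan-out
# directories, so the trigger is roughly gc.auto / 256 (6700 / 256 ≈ 26).
git config gc.auto                      # no output means the 6700 default
ls .git/objects/17/ 2>/dev/null | wc -l
git count-objects -v                    # "count" is the total number of loose objects
```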
See PHI1613, where an install hit this warning (and resolved it by running git prune):
Jan 15 2020
This went through cleanly.
Nov 26 2019
There is no way to run bin/host query against the set of instances that use a particular repository shard service.