This depends on all the previous Drydock diffs, so when those are reviewed / landed I'll split this one up into individual diffs.
- rP Phabricator
Lint Passed, with advice:
- XHP16 TODO Comment at src/applications/drydock/blueprint/DrydockAmazonEC2HostBlueprintImplementation.php:480
- XHP16 TODO Comment at src/applications/harbormaster/step/HarbormasterLeaseHostBuildStepImplementation.php:43
No Test Coverage
- Build Status
Buildable 3956 Build 3969: [Placeholder Plan] Wait for 30 Seconds
Notes for myself.
Due to eventual consistency, we can allocate an elastic IP successfully but then have to wait until it replicates to other machines serving up the AWS API.
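The fix for this is a poll-until-visible loop. The real code is PHP against the EC2 API; this is a minimal Python sketch of the pattern, where `check`, `attempts`, and `delay` are hypothetical names:

```python
import time

def wait_for_visibility(check, attempts=10, delay=5):
    """Poll until a just-created AWS object (e.g. an elastic IP) is
    visible, tolerating eventual consistency across API endpoints.
    `check` returns True once the object shows up."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

The allocation itself succeeded, so the loop only waits for replication; if the object never appears, the caller decides whether to fail the resource.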
When the target is a Windows machine, ssh can sometimes connect successfully during the initial EC2 setup and then hang forever, which would leave the resource stuck in a pending status. Capping each check at 60 seconds ensures a later attempt will eventually connect successfully.
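The per-check cap can be sketched as running the connectivity command under a hard timeout and treating a hang as an ordinary failure. Python sketch of the pattern (the real code shells out to ssh from PHP); `check_with_timeout` is a hypothetical name:

```python
import subprocess

def check_with_timeout(argv, timeout=60):
    """Run a connectivity check, treating a hang as failure:
    subprocess.run kills the child and raises TimeoutExpired when the
    timeout elapses, which we map to False so the poll loop retries."""
    try:
        return subprocess.run(argv, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False
```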
This ensures we don't crash during resource cleanup if the resource no longer exists. We should probably do the same for TerminateInstances, but I haven't seen it throw an exception when the instance has already been terminated.
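The cleanup pattern is best-effort release: catch the not-found error and treat it as success. Python sketch; `ResourceNotFound` and `cleanup` are hypothetical stand-ins for the actual API error and PHP cleanup path:

```python
class ResourceNotFound(Exception):
    """Stand-in for the API error returned when the object is gone."""

def cleanup(release, resource_id):
    """Best-effort release: if the resource is already gone, swallow
    the not-found error instead of crashing resource cleanup."""
    try:
        release(resource_id)
    except ResourceNotFound:
        pass  # already released; nothing left to do
```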
mkdir fails on Windows if the directory already exists. Moreover, the command may fail transiently due to Windows being unreliable, so attempt it up to 10 times while we wait for things to stabilize (this only appears to be an issue on freshly started EC2 instances).
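Both problems fold into one retrying helper: treat "already exists" as success, and retry other transient failures. Python sketch of the pattern (the real code runs mkdir over ssh from PHP); `mkdir_with_retries` is a hypothetical name:

```python
import os
import time

def mkdir_with_retries(path, attempts=10, delay=1):
    """Create a directory, treating 'already exists' as success and
    retrying transient failures up to `attempts` times before giving
    up and re-raising the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            os.makedirs(path, exist_ok=True)
            return
        except OSError as e:
            last_error = e
            time.sleep(delay)
    raise last_error
```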
Lines 32–45 (On Diff #25197):
I need to revert this; $minimum_lease_resource_id will always be the current resource ID if there's no resource with fewer leases.
I potentially need to revert this. I have had situations where EC2 instances haven't allocated properly and I've wanted to clean them up (by closing them), but there's no way in the interface to close pending or allocating resources.
Allows bin/drydock ssh to be used with pending or allocating resources.
Line 38 (On Diff #25197):
Provides more detail in the Drydock log (useful when many log entries arrive within a short timespan).
This prevents the Drydock worker from picking up non-pending leases. Previously it could attempt to pick up a Drydock allocation task again after the daemons were restarted, even if the first allocation was halfway through (such that it had already allocated an EC2 instance). Instead of losing that partial state, just skip leases that have already been partially worked on.
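The filter itself is simple: only hand leases still in the pending state to the worker. Python sketch under assumed status names (the real states live in Drydock's PHP lease class):

```python
# Hypothetical lease statuses mirroring Drydock's lease lifecycle.
PENDING, ACQUIRING, ACTIVE = "pending", "acquiring", "active"

def leases_to_allocate(leases):
    """Only hand pending leases to the allocation worker; leases that
    were already partially worked on before a daemon restart are
    skipped so their partial state (e.g. a started EC2 instance) is
    not allocated a second time."""
    return [lease for lease in leases if lease["status"] == PENDING]
```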
We need to allow leasing against allocating resources so that we don't get broken resources if the entire resource pool is currently used up by allocating resources.
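In other words, a resource that is still allocating must count as a valid lease target, or a pool that is entirely mid-allocation looks empty and spawns broken extra resources. Python sketch under assumed status names:

```python
# Hypothetical resource statuses mirroring Drydock's resource lifecycle.
OPEN, ALLOCATING, CLOSED = "open", "allocating", "closed"

def leasable_resources(resources):
    """Treat resources that are still allocating as valid lease
    targets; the lease simply waits for the resource to finish
    allocating instead of forcing a redundant new allocation."""
    return [r for r in resources if r["status"] in (OPEN, ALLOCATING)]
```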