
Various improvements and bug fixes to Drydock / Harbormaster
AbandonedPublic

Authored by hach-que on Sep 11 2014, 3:16 PM.
Tags
None
Referenced Files
F12838326: D10479.id25905.diff
Thu, Mar 28, 6:05 PM

Details

Reviewers
epriestley
Group Reviewers
Blessed Reviewers
Summary

This depends on all the previous Drydock diffs, so when those are reviewed / landed I'll split this one up into individual diffs.

Test Plan

Tested on a production system :)

Diff Detail

Event Timeline

hach-que retitled this revision to Various improvements and bug fixes to Drydock / Harbormaster.
hach-que updated this object.
hach-que edited the test plan for this revision.

Notes for myself.

src/applications/drydock/blueprint/DrydockAmazonEC2HostBlueprintImplementation.php
274–296

Due to eventual consistency, we can allocate an elastic IP successfully but then have to wait until it replicates to other machines serving up the AWS API.
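As a rough illustration of the wait described above (a Python sketch, not the PHP in this diff), the allocation can be followed by a bounded poll until the new object is visible. The `describe` callable, the attempt count, and the delay are all hypothetical placeholders for whatever lookup and limits the blueprint actually uses.

```python
import time


def wait_until_visible(describe, attempts=10, delay=2.0, sleep=time.sleep):
    """Poll `describe` until it returns a non-None value or attempts run out.

    `describe` is a hypothetical callable standing in for an API lookup that
    returns None while the newly allocated object has not replicated yet.
    """
    for attempt in range(attempts):
        result = describe()
        if result is not None:
            return result
        # Only sleep when another attempt will follow.
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError("resource did not become visible in time")
```

Bounding the number of attempts means a resource that never appears surfaces as an error instead of blocking the worker forever.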

372–386

When the target is a Windows machine, ssh can sometimes connect successfully during the initial EC2 setup and then wait forever, which would leave the resource in a pending status. This caps each check at 60 seconds, so a hung check is abandoned and a later one can eventually connect successfully.
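A minimal Python sketch of the per-check timeout idea follows (again, not the PHP from this diff). The `check_once` helper is hypothetical; it treats a command that runs longer than the limit the same as a failed check so the caller can simply try again.

```python
import subprocess


def check_once(command, timeout=60):
    """Run one connectivity check; a hang longer than `timeout` counts as failure.

    `command` is an argument list for the check (for example an ssh
    invocation); this sketch does not assume any particular flags.
    """
    try:
        completed = subprocess.run(
            command,
            timeout=timeout,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return False
    return completed.returncode == 0
```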

463–484

This ensures we don't crash during resource cleanup if the resource no longer exists. We should probably do this for TerminateInstances, but I haven't seen that throw an exception if the instance has already been terminated.
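The cleanup behaviour described above could look roughly like this in Python (a sketch only, not the PHP from this diff). `ResourceNotFound` and the `release` callable are hypothetical names; the real code would catch whatever exception the AWS client raises for a missing resource.

```python
class ResourceNotFound(Exception):
    """Hypothetical error raised when the remote resource no longer exists."""


def release_quietly(release, resource_id):
    """Release a resource, treating 'already gone' as a successful cleanup.

    Returns True if the release call ran, False if the resource was missing.
    Any other exception still propagates to the caller.
    """
    try:
        release(resource_id)
        return True
    except ResourceNotFound:
        return False
```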

538–556

mkdir fails on Windows if the directory already exists. Moreover, this command may fail because Windows is unreliable right after boot, so attempt it up to 10 times while we wait for things to stabilize (this only appears to be an issue on newly started instances on EC2).
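To illustrate the two ideas in this note (tolerate an existing directory, retry a bounded number of times), here is a local Python sketch. The diff itself runs mkdir on the remote host, so this is only an analogy; the `ensure_directory` name, the delay, and the injectable `mkdir` parameter are assumptions made for the example.

```python
import os
import time


def ensure_directory(path, attempts=10, delay=1.0, sleep=time.sleep, mkdir=os.makedirs):
    """Create `path` if needed, retrying transient failures.

    Returns the number of attempts that were needed. An existing directory
    is not an error because exist_ok=True is passed through to `mkdir`.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            mkdir(path, exist_ok=True)
            return attempt + 1
        except OSError as error:
            last_error = error
            # Only sleep when another attempt will follow.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```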

src/applications/drydock/blueprint/DrydockMinMaxBlueprintImplementation.php
32–45 ↗(On Diff #25197)

I need to revert this; $minimum_lease_resource_id will always be the current resource ID if there's no resource with fewer leases.

src/applications/drydock/controller/DrydockResourceCloseController.php
26–28

I potentially need to revert this. I have had situations where EC2 instances haven't allocated properly and I've wanted to clean them up (by closing them), but there's no way in the interface to close pending or allocating resources.

src/applications/drydock/management/DrydockManagementSSHWorkflow.php
42–43

Allows bin/drydock ssh to be used with pending or allocating resources.

src/applications/drydock/view/DrydockLogListView.php
38

Provides higher detail in the Drydock log (useful when a lot of logs are occurring within a short timespan).

src/applications/drydock/worker/DrydockAllocatorWorker.php
42–47

This prevents the Drydock worker from picking up non-pending leases. Previously, it could pick up a Drydock allocation task again after the daemons had been restarted, even if the first attempt was already halfway through allocating (for example, it had already allocated an EC2 instance). Instead of losing that partial state, just skip leases that have already been partially worked on.
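The guard amounts to a status check before any work starts. A minimal Python sketch, assuming a lease is a dict with a `status` field and that `"pending"` is the untouched state (both assumptions for illustration, not the actual Drydock data model):

```python
def run_allocation(lease, allocate):
    """Allocate only leases no worker has started on; skip everything else.

    Returns True if `allocate` was called, False if the lease was skipped.
    """
    if lease["status"] != "pending":
        # A previous worker already began this lease; leave its state alone.
        return False
    allocate(lease)
    return True
```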

130–135

We need to allow leasing against allocating resources so that we don't get broken resources if the entire resource pool is currently used up by allocating resources.
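In sketch form, the change widens the set of resource statuses that a lease may target. The status names and the dict shape below are assumptions for illustration only, written in Python, not the PHP in this diff.

```python
# Assumed status names; the point is that "allocating" is now included.
LEASABLE_STATUSES = {"open", "allocating"}


def leasable_resources(resources):
    """Return the resources a new lease may be placed against."""
    return [r for r in resources if r["status"] in LEASABLE_STATUSES]
```

With only open resources eligible, a pool made up entirely of still-allocating resources would offer no candidates at all, which is the situation this note describes.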

hach-que edited edge metadata.

More bug fixes

Remove changes to DrydockLogListView.php

epriestley added a reviewer: epriestley.

Depends on premature code, contains major errors (ex => $ex) fixed in later diffs, adds sleeps, etc.

This revision now requires changes to proceed.
Aug 8 2015, 6:36 PM