
Various improvements and bug fixes to Drydock / Harbormaster

Authored by hach-que on Sep 11 2014, 3:16 PM.


Group Reviewers
Blessed Reviewers

This depends on all the previous Drydock diffs, so when those are reviewed / landed I'll split this one up into individual diffs.

Test Plan

Tested on a production system :)

Diff Detail

Event Timeline

hach-que retitled this revision to Various improvements and bug fixes to Drydock / Harbormaster.
hach-que updated this object.
hach-que edited the test plan for this revision.

Notes for myself.


Due to eventual consistency, we can allocate an elastic IP successfully but then have to wait until it replicates to other machines serving up the AWS API.
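A minimal sketch of the wait described in this note, in Python rather than the PHP the diff is written in. The `describe` callable and the use of `LookupError` for "not visible yet" are assumptions made for illustration; they are not the AWS API or the code in this diff.

```python
import time


def wait_until_visible(describe, attempts=10, delay=1.0, sleep=time.sleep):
    """Call `describe` until it stops raising LookupError.

    `describe` is a hypothetical callable that looks up a freshly
    allocated address and raises LookupError while the allocation has
    not yet replicated to the endpoint being queried.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return describe()
        except LookupError:
            # Give up on the final attempt; otherwise wait and retry.
            if attempt == attempts - 1:
                raise
            sleep(delay)
```

The `sleep` parameter is injectable so the retry loop can be exercised without real delays.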


When the target is a Windows machine, ssh can sometimes connect successfully during the initial EC2 setup and then wait forever, which would leave the resource in a pending status. This limits each check to at most 60 seconds, so a hung check is abandoned and a later check can connect successfully.
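The idea of bounding each check can be sketched as follows, in Python rather than PHP. The function name `check_once` and the shape of `command` are assumptions for illustration only; the 60-second default mirrors the limit mentioned in the note.

```python
import subprocess


def check_once(command, timeout=60):
    """Run one connectivity check and treat a hang as a failed attempt.

    `command` is an argument list such as an ssh invocation (hypothetical
    here). If it has not exited after `timeout` seconds, the child is
    killed and the check reports failure, so the caller can try again.
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

A caller would loop over `check_once` until it returns `True` or an overall deadline passes.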


This ensures we don't crash during resource cleanup if the resource no longer exists. We should probably do this for TerminateInstances, but I haven't seen that throw an exception if the instance has already been terminated.
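A rough Python sketch of cleanup that tolerates an already-missing resource. `ResourceNotFound` and the `release` callable are hypothetical stand-ins, not names from the AWS API or from this diff.

```python
class ResourceNotFound(Exception):
    """Hypothetical error raised when the resource no longer exists."""


def release_quietly(release, resource_id):
    """Release a resource, ignoring the case where it is already gone.

    Returns True if the release call succeeded and False if the resource
    was already missing. Any other exception still propagates, so real
    failures are not hidden.
    """
    try:
        release(resource_id)
    except ResourceNotFound:
        return False
    return True
```

Catching only the "not found" case keeps cleanup idempotent without swallowing unrelated errors.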


mkdir fails on Windows if the directory already exists. Moreover, this command may fail due to Windows being unreliable, so attempt it up to 10 times while we wait for things to stabilize (this only appears to be an issue on newly started instances on EC2).
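As a local-filesystem analogue of this note (the diff itself runs mkdir on a remote host, which is not reproduced here), a Python sketch of "tolerate an existing directory, retry up to 10 times" might look like this. The function name and parameters are illustrative assumptions.

```python
import os
import time


def ensure_directory(path, attempts=10, delay=1.0, sleep=time.sleep):
    """Create `path`, accepting that it may already exist.

    Retries on OSError up to `attempts` times, pausing `delay` seconds
    between tries, and re-raises the last error if every try fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            # exist_ok=True makes an existing directory a success.
            os.makedirs(path, exist_ok=True)
            return
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(delay)
```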

32–45 ↗(On Diff #25197)

I need to revert this; $minimum_lease_resource_id will always be the current resource ID if there's no resource with fewer leases.


I potentially need to revert this. I have had situations where EC2 instances haven't allocated properly and I've wanted to clean them up (by closing them), but there's no way in the interface to close pending or allocating resources.


Allows bin/drydock ssh to be used with pending or allocating resources.

38 ↗(On Diff #25197)

Provides higher detail in the Drydock log (useful when a lot of logs are occurring within a short timespan).


This prevents the Drydock worker from picking up non-pending leases. Previously it could attempt to pick up a Drydock allocation task again after the daemons had been restarted, even if the first allocation was halfway through allocating (such that it had already allocated an EC2 instance). Instead of losing that partial state, just skip leases that have already been partially worked on.
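The guard described here amounts to a status check before doing any work. A toy Python sketch, with leases modelled as plain dictionaries and status strings that are assumptions rather than Drydock's actual constants:

```python
PENDING = "pending"  # hypothetical status name, for illustration only


def leases_to_process(leases):
    """Return only the leases that no worker has started on yet.

    Leases in any other status (for example one that was partway through
    allocation when the daemons restarted) are skipped, so their partial
    state is left alone instead of being redone.
    """
    return [lease for lease in leases if lease["status"] == PENDING]
```

With this filter, a task replayed after a restart does nothing for a lease that had already moved past the pending state.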


We need to allow leasing against allocating resources so that we don't get broken resources if the entire resource pool is currently used up by allocating resources.

hach-que edited edge metadata.

More bug fixes

Remove changes to DrydockLogListView.php

epriestley added a reviewer: epriestley.

Depends on premature code, contains major errors (ex => $ex) fixed in later diffs, adds sleeps, etc.

This revision now requires changes to proceed.
Aug 8 2015, 6:36 PM