Ref T2015. Currently the 'Lease has been broken!' message is not very helpful when diagnosing when "Lease Host" targets fail. This adds more logging around why leases move to the broken status, and exposes this information easily.
Details
- Reviewers
hach-que - Group Reviewers
Blessed Reviewers - Maniphest Tasks
- T2015: Implement Drydock
Tested this by setting blueprints to have a maximum of 0 and attempting to lease.
Diff Detail
- Repository
- rP Phabricator
- Branch
- drydock-logging
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 2553 Build 2557: [Placeholder Plan] Wait for 30 Seconds
Event Timeline
I would greatly appreciate this feature.
Currently I have two blueprints, one working-copy and one host, both of which have one resource each (manually created using the drydock create-resource command), but still I get:
exception 'Exception' with message 'Lease has been broken!' in /opt/phabricator/phabricator/src/applications/drydock/storage/DrydockLease.php:171 Stack trace: #0 /opt/phabricator/phabricator/src/applications/drydock/storage/DrydockLease.php(197): DrydockLease::waitForLeases(Array) #1 /opt/phabricator/phabricator/src/applications/harbormaster/step/HarbormasterLeaseHostBuildStepImplementation.php(32): DrydockLease->waitUntilActive() #2 /opt/phabricator/phabricator/src/applications/harbormaster/worker/HarbormasterTargetWorker.php(52): HarbormasterLeaseHostBuildStepImplementation->execute(Object(HarbormasterBuild), Object(HarbormasterBuildTarget)) #3 /opt/phabricator/phabricator/src/infrastructure/daemon/workers/PhabricatorWorker.php(91): HarbormasterTargetWorker->doWork() #4 /opt/phabricator/phabricator/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php(158): PhabricatorWorker->executeTask() #5 /opt/phabricator/phabricator/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php(20): PhabricatorWorkerActiveTask->executeTask() #6 /opt/phabricator/libphutil/src/daemon/PhutilDaemon.php(179): PhabricatorTaskmasterDaemon->run() #7 /opt/phabricator/libphutil/scripts/daemon/exec/exec_daemon.php(119): PhutilDaemon->execute() #8 {main}
Drydock is currently not supported, and it's advised that you don't use it unless you know how it works. These diffs are probably still like 6 months away (if not more) from being reviewed.
The recommendation is to not use Harbormaster / Drydock (but what you use instead is entirely up to you).
By storing the reason for the lease breaking in a field, we can show the actual reason for the lease moving into this state inside Harbormaster. If we just used log() to put the broken reason in the Drydock logs, we wouldn't be able to show more useful information inside Harbormaster.
For example, the EC2 blueprint uses this to indicate when EC2 spot instances fail to lease because the spot price is too low. Just logging this information to Drydock and showing "Lease was broken" in Harbormaster doesn't give users actionable information to go on. By storing it in a separate field, we can show the actual "The spot price is too low to allocate a host in EC2" message in Harbormaster.
D14212 theoretically does this. I didn't add the "broken reason" stuff, but logs are a lot more structured now, so I think we could either just show recent logs or show recent logs of some subset of log types (e.g., "leasebrokenlogtype"). I also don't like shoving strings into the database if we can avoid it, since they aren't translatable (old logs weren't either, but new logs are as translatable as possible).
I think this is just a display logic issue for Harbormaster now, though.