
Allow for better monitoring of Drydock Blueprints
Closed, Duplicate | Public

Description

Use Case: I have gotten to a point where everything is configured properly™ in Drydock and failed leases are a rare occurrence. However, every once in a while someone creates a build plan that destroys the state of a working copy in such a way that all subsequent leases fail. In this case, I noticed the problem while looking through the Harbormaster buildables, where I saw a high number of builds all failing on leasing a working copy. I was able to SSH in and fix the issue, but some kind of notification system would be nice so I could catch issues like this sooner.

Possibly a forced email would be a sufficient way to alert people to Drydock failures?

Event Timeline

T8153 possibly also related (detect someone broke the resource and destroy it so it gets recreated).

Is that plausible in the scenarios you've experienced, or is the brokenness something Drydock can't reasonably detect (e.g., indistinguishable from a test failure)?

I hadn't seen that ticket, but I just read over it and that would be what I need. I think I am also seeing a very rare edge case: the specific issue I see every day or so is that git clean -ffxd fails with warning: failed to remove /something, but every time I go in to look around afterward, everything looks completely normal, and running git clean -ffxd again succeeds.
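
Since a manual rerun of git clean -ffxd succeeds, one possible stopgap is to retry the command a few times before treating the working copy as broken. The following is a minimal sketch under stated assumptions, not something Drydock does itself: the helper name run_with_retries, the attempt count, and the delay are all hypothetical, and checking stderr for "failed to remove" in addition to the exit code is a defensive assumption based only on the warning text quoted above.

```python
import subprocess
import time


def run_with_retries(cmd, cwd=None, attempts=3, delay=1.0, runner=subprocess.run):
    """Run cmd up to `attempts` times.

    Returns (succeeded, tries_used, last_stderr). `runner` is injectable so
    the retry logic can be exercised without actually invoking git.
    """
    last_stderr = ""
    for attempt in range(1, attempts + 1):
        result = runner(cmd, cwd=cwd, capture_output=True, text=True)
        last_stderr = result.stderr or ""
        # Assumption: treat either a non-zero exit code or the quoted
        # warning text on stderr as a failed cleanup.
        failed = result.returncode != 0 or "failed to remove" in last_stderr
        if not failed:
            return True, attempt, last_stderr
        if attempt < attempts:
            time.sleep(delay)  # give a transient condition time to clear
    return False, attempts, last_stderr


# Hypothetical usage inside a build step (the path is a placeholder):
# ok, tries, err = run_with_retries(["git", "clean", "-ffxd"], cwd="/path/to/working-copy")
# if not ok:
#     raise SystemExit("git clean kept failing after %d tries: %s" % (tries, err))
```

This only papers over the transient failure; it does not explain why the first attempt fails, and it does not replace the notification or the detect-and-destroy behavior discussed above.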