If an exception occurs when saving the artifact (e.g. due to a duplicate key exception), then we need to release the lease otherwise it will be orphaned and hold resources open.
This fixes it so that the lease is released if needed.
Differential D13751
[harbormaster/core] Fix issue where exception during artifact save would result in orphaned lease hach-que on Jul 29 2015, 1:02 AM. Authored by Tags None Referenced Files
Subscribers
Details
If an exception occurs when saving the artifact (e.g. due to a duplicate key exception), then we need to release the lease otherwise it will be orphaned and hold resources open. This fixes it so that the lease is released if needed. Not tested.
Diff Detail
Event TimelineComment Actions
This pattern is inherently race-prone and unsafe. Another process can load the object between reload() and save(), then either our changes or the changes made by that process are lost. setBrokenReason() isn't in HEAD and this use is upstream-controlled but untranslatable. Resources and leases should tend to self-heal, and errors other than duplicate names should be sufficiently rare that we probably don't need to special case this, and duplicate names should theoretically only occur through error/misconfiguration. (I'm not sure duplicate names should even be an error.) We could provide a race-safe way to quick-release a resource, or build artifacts in multiple phases, but this particular case seems not worth resolving until other failure modes are more clearly understood. If this is the only case of this, I'd lean toward not handling this specially. Comment Actions The exception I was seeing here was duplicate artifact keys. When an artifact happened to have a hash collision, the creation of the artifact would fail and the lease would be left open (because even though the build fails afterwards, the artifact isn't registered). This would result in Drydock resources being kept open by leases that were no longer being used. Comment Actions
I think this is fine -- Drydock needs to be able to recover from this anyway, since the process holding the lease may also be struck by lightning. If this is rare and mostly the result of human error, it seems fine to leave leases around until they naturally expire. As written, this patch can't fix this problem on its own -- even ignoring race issues -- because the reload() could throw and then we're right back to where we were before. So we either need to make the current behavior recoverable or make this patch way more complex anyway. Comment Actions Leases don't naturally expire though and they'll be held open forever otherwise. Whenever artifact hash collisions occur (we see them pretty regularly at work), people would then need to remember to release the leases. Comment Actions We have a problem right now: unused open leases don't expire. This is a general problem with Drydock leases as a whole, and affects all use cases, because pretty much anything can allocate a lease and then get hit by lightning or otherwise be unable to continue. This change (a) does not prevent the problem in general, and does not prevent it even in this use case; (b) introduces a race on the lease which may introduce new -- and potentially more severe -- problems; and (c) would very likely be completely unnecessary if we fixed the problem. Instead of doing this, I think we should fix the lease problem. Comment Actions I'm pretty sure leases have an expiry field in the new Drydock now, even if they aren't used. |