See PHI115. See PHI272. Maybe see T12145.
- Drydock should ship with a spot instance EC2 allocator which works properly and makes it clear how to write this kind of allocator.
- Drydock should ship with a one-lease-per-host allocator or a mechanism for configuring this.
- Drydock should ship with at least one reclaim-on-release resource.
- PHI272 has some other reasonable use cases which upstream builtins and/or documentation should make clear.
See PHI272. See PHI299. Drydock blueprints use `CustomField` to support type-level fields, but should use `EditField` instead.
See PHI182. `DrydockRepositoryOperationStatusView` has hard-coded errors. These should be modular.
See T11693. It has a number of generally sane ideas, although it may be somewhat speculative.
See T11694. Some of D16594 can likely come upstream.
See PHI270. This discusses adding "temporarily unavailable" or some other similar maintenance state to blueprints.
See PHI129. See T8153. This discusses better (somewhat semantic) logging on resources.
---
Recovery Issues:
- See T10559. Unclear what this is, but I'll make some effort to reproduce it.
- See T11495. The problem //appears to be// that we don't recover correctly when cloning from a staging area fails.
- See PHI312. Drydock can attempt to lease a resource which is currently being reclaimed, and does not recover gracefully.
---
Errata:
- Resources in BROKEN can receive commands. Leases in BROKEN cannot. Probably just an oversight?
- (T13073#235779) Can we remove `setActivateWhenAcquired()` on `Lease` and make this do the right thing automatically?
- Some logs render with HTML in them when running `bin/drydock lease`.
---
Compatibility breaks:
- `Lease->releaseOnDestruction()` is now `Lease->setReleaseOnDestruction(bool)`. Passing `true` preserves identical behavior to the old call.
---
Documentation:
- Because we can throw leases back in the pool if they acquire on dead resources, `acquireLease()` must not cause side effects other than acquiring slot locks (or we need a mechanism for reversing these side effects).
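As a rough illustration of the rule above, here is a toy model in Python rather than Drydock's actual PHP code. Every name in it (`SlotLockTable`, `acquire_lease`, `activate_or_requeue`, the `resource` dictionary) is hypothetical. It only shows why an acquire step that touches nothing but slot locks is cheap to reverse when the resource turns out to be dead.

```python
class SlotLockTable:
    """Toy stand-in for a slot lock table: lock key -> owning lease id."""

    def __init__(self):
        self._owners = {}

    def acquire_all(self, lease_id, keys):
        # All-or-nothing: either take every requested lock or none of them.
        if any(key in self._owners for key in keys):
            return False
        for key in keys:
            self._owners[key] = lease_id
        return True

    def release_all(self, lease_id):
        # Drop every lock held by this lease.
        self._owners = {
            key: owner
            for key, owner in self._owners.items()
            if owner != lease_id
        }

    def owner(self, key):
        return self._owners.get(key)


def acquire_lease(locks, lease_id, resource):
    """Hypothetical acquire step: it touches nothing except slot locks."""
    return locks.acquire_all(lease_id, resource["slots"])


def activate_or_requeue(locks, lease_id, resource, pool):
    """If the resource died after acquisition, undoing is just a lock release."""
    if not resource["alive"]:
        locks.release_all(lease_id)
        pool.append(lease_id)  # back in the pool to try another resource
        return "requeued"
    return "active"
```

If the acquire step also did something like create a working copy, the requeue path would need a matching cleanup step. That is the "mechanism for reversing these side effects" mentioned above.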
See PHI570, which identifies a possible issue with recycling resources.
See PHI885, which identifies an issue with `git fetch` hanging during repository operations. We should add timeout behavior and possibly try to figure out if `git fetch` is "actually doing something" versus hung.
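One possible shape for the timeout behavior, sketched in Python rather than in Drydock's own code. It assumes that output on stderr is a usable signal that the process is "actually doing something" (`git fetch --progress` reports progress there). The helper name `run_with_timeouts` and both timeout parameters are hypothetical.

```python
import subprocess
import threading
import time


def run_with_timeouts(argv, overall_timeout, idle_timeout):
    """Run argv; kill it if it runs too long or stays silent too long.

    Returns (status, returncode). status is "ok", "failed",
    "timeout" (overall limit hit), or "stalled" (no stderr output).
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # progress output is expected on stderr
    )
    start = time.monotonic()
    last_activity = [start]

    def watch_stderr():
        # Any bytes on stderr count as evidence the process is still working.
        while True:
            chunk = proc.stderr.read1(4096)
            if not chunk:
                break
            last_activity[0] = time.monotonic()

    reader = threading.Thread(target=watch_stderr, daemon=True)
    reader.start()

    status = None
    while proc.poll() is None:
        now = time.monotonic()
        if now - start > overall_timeout:
            status = "timeout"
        elif now - last_activity[0] > idle_timeout:
            status = "stalled"
        if status:
            proc.kill()
            proc.wait()
            break
        time.sleep(0.05)

    reader.join(timeout=1)
    if status is None:
        status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.returncode


# Example call (not executed here):
# run_with_timeouts(["git", "fetch", "--progress", "origin"], 600, 60)
```

Separating "timeout" from "stalled" lets a caller treat a slow but progressing fetch differently from a hung one. Killing only the top-level process may leave helper processes (for example, an SSH transport) behind, so a real implementation would likely need process-group handling.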
See PHI882, which requests Kubernetes support.