This adds a build step which will block a build from continuing if there are previous builds of the build plan still running.
Details
- Reviewers
epriestley - Group Reviewers
Blessed Reviewers - Maniphest Tasks
- T1049: Implement Harbormaster
- Commits
- Restricted Diffusion Commit
rP270c8d27ab4e: Implement "Wait for Previous Builds" build step
Configured a build plan with a wait of 60 seconds and a "wait for previous builds", then started a build. While that was still building, reconfigured the plan to have a wait time of 3 seconds, started it, and saw it move into the "Waiting" status. When the 60 second build finished, both builds passed.
Diff Detail
- Branch
- wait-for-previous-builds
- Lint
Lint Passed - Unit
No Test Coverage
Event Timeline
Finding Earlier Buildables: This is tricky. Testing for smaller IDs isn't sufficient: there's no guarantee builds triggered by Herald will be be in order. If commits A, B and C all get discovered at about the same time, they may make their way through the commit parsing pipeline at different speeds and their Herald rules may trigger in any order. The buildables will then get created in arbitrary order, and this rule will enforce arbitrary order.
Finding all earlier buidables seems quite difficult: it's not only possible that they have larger IDs, it's also possible that they don't even exist yet. We may be running the build plan for "C" before commits "A" and "B" even make it through Herald.
I think the rule needs to be:
- Load the buildable's parents (for revisions, nothing I guess; for commits, the commit's parents).
- Check that they have all completed their Herald phase. If not, wait.
- Check that their buildables are all complete. If not, wait.
In practice, this will look like:
- Call diffusion.commitparentsquery with ConduitCall to find parents.
- Check their importStatus for IMPORTED_HERALD to determine that Herald rules have triggered.
- If all Herald rules have triggered, do a Buildable query for (automatic) buildables.
- Check that all matching Buildables are complete.
Waiting: We also can't implement "wait" as sleep(). If we do, the entire task pipeline will block once the number of waiting tasks equals the number of daemons. This is an unreasonably low limit.
Wait has to be implemented as either releasing the task's lease for later reacquisition (which is easier, but less efficient) or recursing into subtask processing (which is harder, but more efficient). Drydock does the latter, but we need more metadata about queued tasks in order to do it. It also may be less suitable here: Drydock has many shallow trees, but this may have a small number of very deep trees.
The easy way is to throw new Exception("Blocked by other work!"); or similar. This is trivial to implement, but will gum up the log a bit and not re-execute as efficiently as it could. I'd do this for now, though -- the reentrant thing is super messy. A small step forward from this would be to implement a new type of exception like PhabricatorWorkerPermanentFailureException, but which represents a temporary failure, say PhabricatorWorkerWaitingForABitException or whatever. This could be made to not log in PhabricatorTaskmasterDaemon->run() (so it doesn't gum up the log) and could have a shorter or alternate retry policy at some point.
The throw vs sleep() thing is important so we can't deadlock the queue.
src/applications/harbormaster/query/HarbormasterBuildQuery.php | ||
---|---|---|
142–148 ↗ | (On Diff #17526) | Revert these changes now? |
src/applications/harbormaster/step/WaitForPreviousBuildStepImplementation.php | ||
60 | We should do the throw new Exception("blocked...") here to prevent infinite deadlock. | |
117 | I think this behavior is correct (i.e., wait only for other copies of this build plan) but the description could make it more clear that we aren't waiting for the entire build, just other copies of this plan. |