Harbormaster scheduling should be more predictable
Open, Needs Triage, Public

Description

Root Problem: When one of our builders falls behind for some reason, the order in which builds happen is quite unpredictable. This is annoying for users submitting differentials, as they end up waiting a long time for the CI to build their revision. It would be better if there were a predictable order which favoured builds that have been waiting longer.

What I think is happening is that once a build is triggered, it continually tries to acquire the builder every 15 seconds. Once a builder becomes available, the first build which tries to acquire the builder gets it, regardless of how long it has been waiting. A build queue would be more suitable for our purposes.
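To illustrate the suspected behaviour, here is a minimal sketch in Python. It is not Harbormaster code: the function names, the build names and the fixed times at which the builder frees up are all assumptions made for the example. Only the 15-second retry interval comes from the description above.

```python
POLL_INTERVAL = 15  # seconds between acquisition attempts (from the description)


def next_poll(submitted_at, now):
    """First poll time at or after `now` for a build submitted at `submitted_at`."""
    if now <= submitted_at:
        return submitted_at
    elapsed = now - submitted_at
    ticks = -(-elapsed // POLL_INTERVAL)  # integer ceiling division
    return submitted_at + ticks * POLL_INTERVAL


def polling_order(submit_times, free_times):
    """Whenever the builder frees up, the waiting build that polls first wins."""
    waiting = dict(submit_times)  # copy so the caller's dict is untouched
    order = []
    for free_at in free_times:
        winner = min(
            waiting,
            key=lambda name: (next_poll(waiting[name], free_at), waiting[name]),
        )
        order.append(winner)
        del waiting[winner]
    return order


def fifo_order(submit_times):
    """What a build queue would do: oldest submission first."""
    return sorted(submit_times, key=submit_times.get)


# Hypothetical scenario: three builds submitted at t=0, 7 and 11 seconds,
# and a single builder that frees up at t=100, 200 and 300 seconds.
submit_times = {"A": 0, "B": 7, "C": 11}
free_times = [100, 200, 300]

print(polling_order(submit_times, free_times))
print(fifo_order(submit_times))
```

With these made-up numbers, polling hands the builder to C, then B, then A (the exact reverse of submission order), while the queue runs A, B, C. Which build wins under polling depends only on where each build's 15-second ticks happen to fall relative to the moment the builder frees up.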

msz awarded a token. Dec 22 2016, 10:32 AM

See some discussion starting here:

T11153#180636

Broadly, today, builds get a first try in roughly the order users expect. If those first tries succeed, I expect builds to process in roughly queue-order.

If those tries fail (currently: for any reason, including because resources are not yet available) they lose their place in queue and fall into the chaotic land of retries.

You can force things to queue more narrowly by applying a build limit of 1 or using "wait for previous builds" (for commits), but you'll give up parallelism in doing so. If you actually want strict queue behavior, maybe this is suitable, but you presumably do not mind if builds A and B come in but finish in B, A order because they ran in parallel. You also presumably do not mind if builds X, Y, and Z come in but Y and Z finish first because X hit a bug in the test harness and can't get as far as actually building, rather than the queue deadlocking while it waits for X to finish.

In the general case, the goal "have the highest possible total work throughput" conflicts with the goal "complete work in the order it arrives". We currently care almost entirely about the former goal. We can tweak behavior so work finishes in roughly queue-like order more often, but "complete work in the order it arrives" is explicitly not a goal of the system.
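A small sketch of why the two goals pull against each other, written in Python. The scheduler, the job names and the resource names below are hypothetical and are not how Harbormaster is implemented. The sketch only shows that a strictly ordered queue can leave an available resource idle when the job at the head of the queue cannot start.

```python
def startable(jobs, free_resources, strict):
    """Return the names of the jobs that may start right now.

    jobs: list of (name, required_resource) tuples in arrival order.
    strict: if True, no job may start before every earlier job has started.
    """
    free = set(free_resources)
    started = []
    for name, resource in jobs:
        if resource in free:
            free.remove(resource)
            started.append(name)
        elif strict:
            break  # head-of-line blocking: everything behind this job waits
    return started


# Hypothetical state: A needs a "linux" host that is busy, B and C need "mac".
jobs = [("A", "linux"), ("B", "mac"), ("C", "mac")]
free = ["mac"]

print(startable(jobs, free, strict=True))   # []
print(startable(jobs, free, strict=False))  # ['B']
```

In strict mode the idle "mac" host does no work until A can start. In the throughput-oriented mode B starts immediately, which means it may well finish before A.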

epriestley changed the title from "Harbourmaster scheduling should be more predictable." to "Harbormaster scheduling should be more predictable". Dec 23 2016, 10:19 PM
epriestley added a project: Harbormaster.

Thanks for these suggestions. We only had one CI machine so these kinds of concerns about parallelism are not important for us. I enabled the options that you suggested.

It turns out the problem was caused by a builder failing to terminate due to a bad commit. We've added a timeout now, and the builder has managed to catch up and keep up with the current demand.
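For anyone hitting the same thing, a rough sketch of a build-step timeout in Python follows. The thread does not say how the timeout was actually added, so the `run_build` helper, its return values and the example command are assumptions. `subprocess.run` with `timeout=` is standard library behaviour: it kills the child process and raises `subprocess.TimeoutExpired` when the limit is exceeded.

```python
import subprocess
import sys


def run_build(command, timeout_seconds):
    """Run one build command and report "passed", "failed" or "timed out"."""
    try:
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The child has already been killed, so the builder is free again.
        return "timed out"
    return "passed" if result.returncode == 0 else "failed"


# Hypothetical runaway build: sleeps far longer than the allowed time.
hung_build = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_build(hung_build, timeout_seconds=1))
```

Note that the timeout only kills the direct child process. A build script that spawns its own long-running children may need process-group handling on top of this.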

bgamari added a subscriber: bgamari. Jan 7 2017, 4:03 PM

We are still occasionally running into this problem. A queuing system which ensures some kind of fairness would be really useful for us and others in the situation where demand is much greater than the builder resources. (A similar sentiment to T11153#180635)