
User reports Drydock not parallelizing working copy builds
Closed, Resolved · Public

Description

A user reports that they ran "Test Configuration" in two browser windows on a large (15GB) repository, but found only one clone occurred and that the second test waited for the first test to finish.

Event Timeline

Sorry, this is actually expected if the resource pool is completely empty to begin with. Here's why:

  • You start 2 requests.
  • We limit how fast resource pools are allowed to grow: if 10 requests come in at once, and they all need new resources, we won't start to build 10 new resources right away. This limit on allocating resources is currently ceil(1 + 25% of the active pool size). For example, if you have 0 resources, ceil(1 + 0) = 1 new resource is allowed to build concurrently. If you have 10 resources, ceil(1 + 2.5) = 4 resources are allowed to allocate concurrently.
  • The first request tries to build a new resource, and that's fine. It starts allocating.
  • The second request tries to build a new resource, but the pool is already at its allocation limit (ceil(1 + 0) = 1), so it waits.
  • The first request finishes allocating and runs the test. The test is very fast, so it will generally finish almost immediately.
  • The second request wakes up. What happens next depends on whether it wakes up before the first request's test finishes. If it wakes up first, it will start allocating a second resource, then finish on the first resource once that resource is free. If it wakes up afterward, it will just finish on the first resource.
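
As a rough illustration of the limit described above, here is a minimal Python sketch of the ceil(1 + 25% of the active pool size) formula. This is not Drydock's actual code: the function name `allocation_limit` and the `growth_fraction` parameter are hypothetical, chosen only for this example.

```python
import math


def allocation_limit(active_pool_size: int, growth_fraction: float = 0.25) -> int:
    """Return how many resources may allocate concurrently.

    Hypothetical helper mirroring the formula from the thread:
    ceil(1 + 25% of the active pool size).
    """
    return math.ceil(1 + growth_fraction * active_pool_size)


# Worked examples taken from the explanation above.
print(allocation_limit(0))   # empty pool: 1
print(allocation_limit(10))  # ceil(1 + 2.5) = 4
print(allocation_limit(20))  # ceil(1 + 5) = 6
```

With an empty pool the limit is 1, which is why the second "Test Configuration" request has to wait for the first allocation.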

If you add more requests, you can get the pool to expand further. For example, if you have a third request, it will likely start building a second resource once it wakes up (although it will probably actually execute on the first resource, which will likely become available sooner than the second resource does, and it may not wake up until both other requests are done).

This won't normally affect parallelism because you'll normally have a larger pool size (say, 20) which allows ceil(1 + 5) = 6 simultaneous allocations, and most operations will not need to perform allocations (since they can reuse existing working copies). The expectation is that it only affects things when a lot of unusual requests arrive at once (e.g., 100 requests to do builds on a repository that no one ever uses). In that case, the expectation is that you probably want the throttling in most cases: you don't want to convert the whole pool over to the weird repository immediately.
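
To get a feel for how the throttle shapes pool growth from empty, here is a small Python sketch. It rests on simplifying assumptions that are not confirmed by the thread: allocations happen in discrete rounds, every allowed allocation in a round completes before the next round starts, and demand is high enough that each round uses the full limit. The names `allocation_limit` and `pool_growth` are hypothetical.

```python
import math


def allocation_limit(active_pool_size: int) -> int:
    # Hypothetical helper: ceil(1 + 25% of the active pool size).
    return math.ceil(1 + 0.25 * active_pool_size)


def pool_growth(initial_size: int, rounds: int) -> list[int]:
    """Return the pool size after each round, starting with the initial size.

    Assumes (for illustration only) that each round allocates exactly
    the current limit and that all of those allocations finish before
    the next round begins.
    """
    sizes = [initial_size]
    for _ in range(rounds):
        sizes.append(sizes[-1] + allocation_limit(sizes[-1]))
    return sizes


print(pool_growth(0, 4))  # [0, 1, 3, 5, 8]
```

Under these assumptions the pool grows slowly at first and faster as it gets larger, so a sudden burst of requests for a rarely used repository cannot take over the whole pool at once.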

Does that make sense? Are you seeing any issues outside of a testing/empty-pool environment?

(And, obviously, all of this could be way more clear in the UI.)

Okay, I am able to expand the pool further by adding more requests. Thanks for the explanation!