Page MenuHomePhabricator

Make Drydock lease infrastructure more nimble
ClosedPublic

Authored by epriestley on Sep 28 2015, 2:28 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Dec 8, 7:40 PM
Unknown Object (File)
Tue, Dec 3, 2:21 AM
Unknown Object (File)
Thu, Nov 28, 7:52 AM
Unknown Object (File)
Tue, Nov 26, 11:01 PM
Unknown Object (File)
Tue, Nov 26, 6:03 AM
Unknown Object (File)
Fri, Nov 22, 3:13 AM
Unknown Object (File)
Thu, Nov 21, 10:13 PM
Unknown Object (File)
Wed, Nov 20, 12:28 PM
Subscribers
None

Details

Summary

Ref T9252. Currently, Harbormaster does this when trying to acquire a working copy:

  • Ask for a working copy.
  • Yield for 15 seconds.
  • Check if we have a working copy yet.

That's OK, but Drydock takes ~1s to acquire a working copy lease if a resource is already available, so we end up doing this:

  • T+0: Ask for a working copy.
  • T+0: Yield for 15 seconds.
  • T+1: Working copy lease activates.
  • T+15: Working copy lease is used.
  • T+16: Build finishes.

So we end up spending about 2 seconds doing work and 14 seconds sleeping.

One way to fix this would be to fiddle with the yield duration, so we yield for 1, 2, 4, ... seconds or something. This probably isn't a bad idea for longer leases (i.e., wait for 15, 30, 45 ... seconds or similar) but it implies a lot of churn for short leases.

Instead, let tasks "awaken" other tasks when they complete. The "awaken" operation means: if a task is in a yielded state (no failures, no owner, explicitly yielded, future expires time), pretend it only yielded until right now instead of whenever it really yielded to.

Basically, this rewrites history so that even though Harbormaster did a yield(15), we pretend it did a yield(4) after we activate the lease if lease activation took 4 seconds.

If this misses, it's fine: we fall back to the normal yield behavior and things move forward normally a few seconds later.

If it hits, we get a more nimble process pretty cleanly.

Test Plan
  • Restarted a build plan (lease working copy + run ls) with this patch no-op'd, took about 16 seconds.
  • Restarted a build plan with this patch active, took about 1 second.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision from to Make Drydock lease infrastructure more nimble.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added reviewers: chad, hach-que.
src/infrastructure/daemon/workers/PhabricatorWorker.php
130–136

This is just formalizing the $options before it gets out of control.

274–278

This can race with PhabricatorWorkerLeaseQuery in some sense, but any sequence of updates and outcomes is still fine and nondestructive.

Since we require the lease owner be (yield), we can't interact with unleased tasks.

If we're awakening a task while LeaseQuery is leasing it, either might execute first:

  • if LeaseQuery executes first, it will change the leaseOwner before we act, and we'll just miss the row;
  • if we execute first, LeaseQuery will fix the expiration time when it acts.

These outcomes are both fine and produce the desired result.

chad edited edge metadata.
This revision is now accepted and ready to land.Sep 28 2015, 4:21 PM
This revision was automatically updated to reflect the committed changes.