Page MenuHomePhabricator

Implement Amazon EC2 blueprint for Drydock
AbandonedPublic

Authored by hach-que on Aug 9 2014, 5:56 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 8, 2:11 PM
Unknown Object (File)
Mon, Jan 6, 2:35 PM
Unknown Object (File)
Wed, Jan 1, 4:43 PM
Unknown Object (File)
Fri, Dec 27, 7:34 PM
Unknown Object (File)
Thu, Dec 26, 7:53 PM
Unknown Object (File)
Wed, Dec 25, 9:44 PM
Unknown Object (File)
Sun, Dec 22, 12:55 AM
Unknown Object (File)
Tue, Dec 17, 4:41 AM

Details

Reviewers
epriestley
Group Reviewers
Blessed Reviewers
Summary

This implements an Amazon EC2 blueprint that allocates and leases hosts from EC2. It starts EC2 instances according to the parameters specified on the blueprint, using the public key of a Passphrase key as the key pair for AWS (which means Harbormaster can automatically connect to the machine on default Linux-based AWS images).

It handles allocation of Elastic IPs, which provides a public IP address for connecting to the allocated machine when Phabricator is not hosted in the same VPC (or on Amazon).

This is built to allocate into a VPC; it doesn't support allocating a classic EC2 instance (and mainly because AWS is trying to deprecate that model anyway).

It provides min-count, max-count and leases-per-instance parameters to allow the user to configure the ideal setup, and to set minimum and maximum hosts for the blueprint.

This also implements the appropriate logic for performing clean up when releasing leases or closing resources (which in this case terminates the instance and releases the Elastic IP).

NOTE: This hasn't been tested with Windows AMIs yet, because that requires significant more setup to test. Basically, a Windows machine needs to be configured with FreeSSHD and the image saved; there are no default Windows AMIs provided by Amazon that have a SSH daemon running by default (and with the key pair specified).
Test Plan

Tested with various configurations of min-count, max-count and leases-per-instance and saw the correct behaviour when leasing and releasing via bin/drydock.

In addition, this has been tested with a Harbormaster build plan that allocates a Linux host and runs uname -a on it, which worked correctly, as can be seen in:

harbormasteraws.png (1×1 px, 165 KB)

Event Timeline

hach-que retitled this revision from to Implement Amazon EC2 blueprint for Drydock.
hach-que updated this object.
hach-que edited the test plan for this revision. (Show Details)
hach-que added a reviewer: epriestley.
hach-que edited the test plan for this revision. (Show Details)
hach-que edited edge metadata.
hach-que added inline comments.
src/applications/drydock/interface/command/DrydockSSHCommandInterface.php
63 ↗(On Diff #24560)

This prevents SSH from prompting for a password when using bin/drydock lease on the command line (due to the connection test), because bin/drydock lease has a TTY available and thus it prompts otherwise.

Ensure $caused_by_closing_resource is true when releaseLease is called from closeResource.

Mark resources as CLOSING to avoid races and allow leasing on pending resources

Fix race conditions and exceptions

So there's still possible race conditions within Drydock when it comes to allocating resources. One particular scenario that I hit was:

There's >8 taskmasters capable of picking up tasks. There are currently 0 host resources, and 4 builds all start at the same time (whose first task is Lease Host).

Basically what happens here is 4 of the taskmasters pick up the Lease Host task and all start allocating at the same time. They check to see if there are open or pending resources (and there aren't), so they decide to allocate a resource (which will pass the resource limits because according to each allocator there are 0 open and 0 pending resources). This race condition is basically present up until allocateResource first calls save(). You can see the race condition in effect here:

drydockrace.png (938×1 px, 98 KB)

To solve this race condition, I think we need to perform a global lock over the full allocator logic, and unlock right after save() is first called to place a pending resource. We could do this by creating the resource initially inside the allocator, saving it, unlocking and then passing it to allocateResource. Then we need to handle the logic to clean up the resource if allocateResource fails (especially since other allocators may have leased on it because it was pending).

epriestley edited edge metadata.

For the allocator race, I think your solution sounds good, except that I'd try to do as little logic as possible in the lock. Can the logic go roughly like this?

  • Lock on resource type.
  • Check if we can allocate; if we can, save a new STATUS_ALLOCATING resource (this is a new "placeholder" status which has no logic and just makes other allocators fail) with enough data that it "counts" for other allocators.
  • Release the lock.
  • Then have executeAllocateResource() (or whatever) fill in the resource instead of creating it?

In particular, I definitely don't want to hold locks while making AWS service calls.

Some inlines in general. Major issues are "select all stuff" and locking behavior.

About half of this is fine to upstream without resolving the tricky bits -- maybe separate out the field stuff and things like the SSH connection timeout? We can get those in as-is and that will make this easier to move forward with.

src/applications/drydock/blueprint/DrydockAmazonEC2HostBlueprintImplementation.php
208–215

Can this use $this->getInterface()?

287–306

Do we need this special-over-the-limit behavior for now? It kind of seems like bad news. If we just say "no", what goes wrong?

313–318

Did we already do this? It seems weird that we let you acquire leases on pending resources...

325–350

some ocpy/paste junk from preallocated host I think

381

(This should be SCP in the non-Windows world, probably...)

src/applications/drydock/blueprint/DrydockBlueprintImplementation.php
130–150

I don't want to pursue this, at least for now -- we should assume there may be a very large number of resources and leases (e.g, at least a million of each). We should not need to load them all to perform allocation tasks.

Behaviors like balancing loads across nodes are reasonable, but we should pursue them a few iterations out, once all the basics work, and in ways that avoid "load everything".

324–328

Same deal as above.

This revision now requires changes to proceed.Aug 11 2014, 9:35 PM
src/applications/drydock/blueprint/DrydockAmazonEC2HostBlueprintImplementation.php
287–306

Builds will randomly fail. Executing another build in parallel on the same machine and taking the performance hit is probably more preferred than allocating another EC2 instance (or specifying a higher limit and having slower builds by default).

313–318

Previously it didn't lease against pending resources, but it was then impossible to adhere to resource limits because a resource could be allocating / pending and wouldn't count as available for new leases.

What this means is that previously you could start 10 builds and it would have to start 10 EC2 instances (resources), even though it's probably faster to wait for an existing resource to finalize allocation and lease against that.

325–350

I think this is still needed (it creates a directory underneath the path specified on the blueprint, so that leases have a unique directory on the host).

381

Probably. The SFTP interface has been tested pretty well in production for uploading files from a host, for about the last 6 months, so while we could add an SCP interface for non-Windows, I don't think it's highly valuable right now.

src/applications/drydock/blueprint/DrydockBlueprintImplementation.php
130–150

These are actually only used for counting the number of resources / leases on a resource. So if scalability is an issue, we should add a field onto blueprints / resources that indicate how many objects they've currently got allocated / leased out.

Then we can leave it up to shouldAllocateLease to query what it needs to.

324–328

Again this is just used to count the number of resources, so this can use the same field as the above logic.

hach-que edited edge metadata.

Rebased on master

Still need to fix the allocator lock.

  • Move resource and lease counting to DrydockAllocationContext

No need to re-review this yet, just pushed an update so i can access it on different computers.

  • Fix race conditions in resource allocation

So I've verified that I've fixed the major race condition which allowed more than the maximum number of resources to be allocated (as per @epriestley's instructions).

However, there's now a scenario where if you run the following command with the allocator:

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do bin/drydock lease --type host --attributes platform=linux & done

What happens is that (correctly) only two resources will be placed into the pending state, and those leases will be resolved correctly. There may be 1 or 2 additional lease that lease correctly against the pending resources. However, a substantial number of leases just simply won't resolve; the bin/drydock process never terminates. I don't know what could be causing this, but my guess is that it's something in the new locking logic.

I'm not sure what aspect of this code could be causing a permanent wait / deadlock, but I've verified that it isn't waiting for resources to open (i.e. leases against pending resources).

src/applications/drydock/blueprint/DrydockAmazonEC2HostBlueprintImplementation.php
336–337

I've verified that it is not waiting here.

Note that when I set the lease and resource statuses manually, I get:

[2014-08-17 11:56:48] EXCEPTION: (Exception) Lease has already been released! at [<phabricator>/src/applications/drydock/storage/DrydockLease.php:152]
  #0 DrydockLease::waitForLeases(array) called at [<phabricator>/src/applications/drydock/storage/DrydockLease.php:182]
  #1 DrydockLease::waitUntilActive() called at [<phabricator>/src/applications/drydock/management/DrydockManagementLeaseWorkflow.php:50]
  #2 DrydockManagementLeaseWorkflow::execute(PhutilArgumentParser) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:394]
  #3 PhutilArgumentParser::parseWorkflowsFull(array) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:290]
  #4 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/drydock/drydock_control.php:21]

This implies that the leases do exist in the database, but just aren't appearing against any known resource.

This is the status of all the (unreleased) leases in the database. Curiously, they don't appear to have a resource ID, yet the lease still exists?

MariaDB [phabricator_drydock]> select * from drydock_lease where status <> 2;
+-----+--------------------------------+------------+--------+-------+-----------+------------------------------------------+-------------+--------------+--------+--------------+                                                                                             
| id  | phid                           | resourceID | status | until | ownerPHID | attributes                               | dateCreated | dateModified | taskID | resourceType |
+-----+--------------------------------+------------+--------+-------+-----------+------------------------------------------+-------------+--------------+--------+--------------+
| 324 | PHID-DRYL-opjtnkcoafy3rnrim3v3 |         94 |      1 |  NULL | NULL      | {"platform":"linux","path":"\/srv\/324"} |  1408276707 |   1408276785 |      0 | host         |
| 325 | PHID-DRYL-v5hvm4uw3pkslwtxmo32 |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 326 | PHID-DRYL-v5dbfvip4dbrfvpgt2ci |         95 |      1 |  NULL | NULL      | {"platform":"linux","path":"\/srv\/326"} |  1408276707 |   1408276785 |      0 | host         |
| 327 | PHID-DRYL-br6tp5ovit2wx2xv3ivq |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 328 | PHID-DRYL-tvcd55ilqzvaqm2hcrbj |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 329 | PHID-DRYL-2o5logjxouzc4nluldo4 |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 330 | PHID-DRYL-sk62cqyqibeovhdhcq5c |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 331 | PHID-DRYL-jbarszbjdjai7a2ckm6r |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 332 | PHID-DRYL-yspccndrzg637tpjie3h |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 333 | PHID-DRYL-i6dzwxqvgf5dyxwidety |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 334 | PHID-DRYL-b3n5zzjxhjtgzqxpch73 |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 335 | PHID-DRYL-htgce7ypptlbsdrm6kvy |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 336 | PHID-DRYL-hto7s4h2eojmd4otxgxd |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 337 | PHID-DRYL-qjilvvxxaksfuc3gzyds |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
| 338 | PHID-DRYL-4ryx74f56csh2265ukae |       NULL |      0 |  NULL | NULL      | {"platform":"linux"}                     |  1408276707 |   1408276707 |      0 | host         |
+-----+--------------------------------+------------+--------+-------+-----------+------------------------------------------+-------------+--------------+--------+--------------+

Sorry for all of the comments.

Evidently, acquireLease does not get called, because otherwise these leases would have a resource ID. The task queue is empty, and when running a bunch of daemons with bin/phd debug PhabricatorTaskmasterDaemon, none of them appear to crash. So I don't think this is a fatal error causing an early exit.

The only thing I can see in the allocator that would cause it to terminate or exit, without throwing an exception and without running acquireLease would be return that occurs when no blueprints can allocate a resource. But there's nothing in the Drydock log to indicate it hits this code.

The log for a pending lease looks like this:

1 Blueprints Enabled
Found 1 Blueprints
0 Pending Resource(s) Remain
Found 0 Pending Resource(s)
0 Open Resource(s) Remain
Found 0 Open Resource(s)
Allocating Lease

So it's almost certainly hitting the locking code and then waiting forever. @epriestley, is there any reason why this code would deadlock or wait forever?

src/applications/drydock/worker/DrydockAllocatorWorker.php
176–198

Given that "1 Blueprints Enabled" shows in the logs for a pending lease (that is waiting forever), it is almost certainly deadlocked here.

You mentioned elsewhere that you moved away from PhutilGlobalLock to beginReadLocking() -- can you walk me through that?

That is, I would expect that PhutilGlobalLock is the correct lock type to use here and will be far simpler than using row locks.

So from memory, I started with something like:

$lock = PhabricatorGlobalLock::newLock('drydockallocation');
$lock->lock();

try {
  // ...
  $lock->unlock();
} catch (Exception $ex) {
  $lock->unlock();
}

But I soon found out that lock is not blocking, and throws PhutilLockException if it can't lock successfully. So I thought, well maybe this is a lock you have to wait for; so I then tried something like:

$lock = PhabricatorGlobalLock::newLock('drydockallocation');
while (true) {
  try {
    $lock->lock();
    break;
  } catch (PhutilLockException $ex) {
    // Wait a little while before trying to grab the lock again.
    usleep(rand(0, 500000));
  }
}

// ...

But then I started hitting "Lock '...' has already been locked by this process.", even after I restarted the daemons and web server to clear any previous locks. At that point I wasn't quite sure what I was doing wrong, tried beginReadLocking and got 99% of the expected locking behaviour.

Use $lock->lock(123); to wait 123 seconds for the lock.

Oh okay, that explains why I had all those issues :)

(I'm used to languages / frameworks where the default is to block on locks, with optional non-blocking mode)

hach-que edited edge metadata.

Use PhabricatorGlobalLock. Will test this when I'm in the office since I don't want to be calling AWS APIs on a sketchy mobile internet connection.

Include the working directory fix in D10293 against the new EC2 allocator as well

Testing the new locking behaviour now, will report back when I've verified that it behaves correctly.

Will rebase on D10304 once that lands / the locking issue is solved.

Sending through a new diff.