Use Drydock to allocate build machines for Harbormaster
Closed, Resolved, Public

Description

Alright, let's tackle this. At the very least it'll give us a good idea of what's working in Drydock and what's not.

The general use cases that Drydock needs to support for build machines are:

  1. A preset list of build machines that the user has defined (e.g. preallocated hosts).
  2. Allocating out new hosts using various blueprints (AWS, etc.)
  3. Finding a host with a suitable set of capabilities.
  4. Allocating out a working directory on hosts of various types.
  5. Invoking commands remotely on the host.
  6. Cleaning up the working directory when the build finishes, regardless of whether the build succeeds or fails.
  7. Cleaning up and terminating the host resource if it's not a preallocated resource.
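
To make the cleanup requirements in items 6 and 7 concrete, here is a minimal sketch in Python (not Phabricator's PHP). All names in it (`leased_working_directory`, `run_build`, the `/tmp/build-...` path, the `"false"` failure trigger) are hypothetical and exist only for illustration. It shows the working directory being cleaned up whether or not the build succeeds, and the host being terminated only when it is not preallocated.

```python
from contextlib import contextmanager


@contextmanager
def leased_working_directory(host, log):
    """Allocate a working directory on `host` and always clean it up."""
    workdir = f"/tmp/build-{host}"  # hypothetical path, for illustration only
    log.append(("allocate", host, workdir))
    try:
        yield workdir
    finally:
        # Runs on success and on failure (use case 6).
        log.append(("cleanup", host, workdir))


def run_build(host, command, log, preallocated=True):
    """Run `command` in a leased working directory, recording each step in `log`."""
    try:
        with leased_working_directory(host, log) as workdir:
            log.append(("run", host, workdir, command))
            if command == "false":  # stand-in for a failing build
                raise RuntimeError("build failed")
    finally:
        # Only dynamically allocated hosts are terminated (use case 7).
        if not preallocated:
            log.append(("terminate", host))
```

Calling `run_build("build1", "false", log, preallocated=False)` raises, but the log still ends with the cleanup and terminate steps.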

Looking at how Drydock behaves for allocating a working copy, it seems like it'll handle the AWS scenario quite well, but we'll need some sort of interface in Harbormaster for defining preallocated build machines (wherein Harbormaster will provide a blueprint and use that to create a specific resource).

We also need some mechanism to ensure that requesting a working copy uses a particular host; I'm reasonably confident this can just be an attribute on the working copy lease (host.lease=4) when requesting it.

If this all sounds good, I think we should proceed like so:

  1. Define DrydockPreallocatedHostBlueprint. This should only answer leases with an attribute of preallocated=yes, and trust the lease attributes to describe the resource it allocates. I'm reasonably confident that it should allocate resources with a type of host. It can then provide the command interface based on the attributes of the resource. The resources it allocates should also have attributes preallocated=yes and remote=yes (in contrast with DrydockLocalHostBlueprint, which will only provide resources where remote=no).
  2. Add a UI in Harbormaster to list preallocated host resources and to create and edit them. This will basically query Drydock for resources of the host type with attribute preallocated=yes. The UI will allow the user to define the SSH connection details as well as the target machine type (windows, mac or linux).
  3. Add a host lease PHID column and a working copy lease PHID column to the HarbormasterBuild object. Each build has a lease on the host and a lease for a working copy on that host.
  4. Update the DrydockWorkingCopyBlueprint implementation to accept an attribute of host.lease to force it to allocate on a particular host.
  5. Update the Harbormaster worker to request a lease for the host and working copy. It'll first request a host lease for the particular build type (windows, mac or linux) with the attribute remote=yes (but leave preallocated unspecified to allow for dynamically allocated hosts). Once it has a lease on the host, it'll request a lease on a working copy where host.lease is equal to the ID of the lease it just got. The reason for requesting the lease on the host explicitly is that we need to be sure of the target machine's type (because we will be executing build commands on it).
  6. Update HarbormasterBuildPlan to add a machine type column whose value is one of windows, mac or linux. We'll make this a drop-down field in the UI, but we could change it to a text field if there's demand for it later on.
  7. Introduce DrydockRemoteCommandBuildStepImplementation, a variant of RemoteCommandBuildStepImplementation that instead uses the command interface on the host lease to run the command in the working directory provided by the working copy lease. It seems DrydockCommandInterface already supports everything we'll need (it returns an ExecFuture), so this should be reasonably trivial.
  8. Ensure that build plans configured for Windows and Linux target machines execute and lease correctly, and that DrydockRemoteCommandBuildStepImplementation executes the builds correctly.
  9. If UploadArtifactBuildStepImplementation is present in upstream, also create a version of it that uses the host and working copy leases. We should probably add a transfer interface for transferring files, independent of how the host resource is provided. This would even allow us to switch between SFTP and SCP on varying host types (e.g. differences between Windows and Linux).
  10. Once the dust settles and this all looks like it's working, drop the RemoteCommandBuildStepImplementation and UploadArtifactBuildStepImplementation build steps. Remove the harbormaster.temporary.hosts.whitelist configuration option.
  11. Celebrate that Drydock is now being used.
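
The two-step leasing in step 5 can be sketched roughly as follows. This is a toy in-memory model in Python, not Drydock's actual API: the `Lease` class, the function names, and the `platform` attribute key are all assumptions made for illustration. Only `remote=yes`, the omitted `preallocated` attribute, and `host.lease` come from the plan above.

```python
import itertools
from dataclasses import dataclass, field

_lease_ids = itertools.count(1)


@dataclass
class Lease:
    """Hypothetical stand-in for a Drydock lease."""
    resource_type: str
    attributes: dict
    id: int = field(default_factory=lambda: next(_lease_ids))


def request_host_lease(platform):
    # 'preallocated' is deliberately left unspecified so that both
    # preallocated and dynamically allocated hosts can satisfy the lease.
    return Lease("host", {"platform": platform, "remote": "yes"})


def request_working_copy_lease(host_lease):
    # Pin the working copy to the host we already hold a lease on.
    return Lease("working-copy", {"host.lease": host_lease.id})


def lease_for_build(platform):
    """Lease the host first, then a working copy on that same host."""
    host = request_host_lease(platform)
    working_copy = request_working_copy_lease(host)
    return host, working_copy
```

The ordering is the point of the sketch: the host lease is acquired first, so the build knows the target machine's type before it asks for a working copy bound to that host.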

Event Timeline

hach-que claimed this task.
hach-que raised the priority of this task from to Needs Triage.
hach-que updated the task description.
hach-que added a project: Harbormaster.
hach-que added subscribers: hach-que, epriestley.

Looking at how Drydock behaves for allocating a working copy, it seems like it'll handle the AWS scenario quite well, but we'll need some sort of interface in Harbormaster for defining preallocated build machines (wherein Harbormaster will provide a blueprint and use that to create a specific resource).

Some discussion in D7593, but the working theory here was that these allocations would just happen through a side channel, not dynamically at runtime through Drydock itself. That is, it looks something like:

./bin/drydock create-resource --type host --attrs host=build1.mycompany.com,platform=windows,credentials=K123...
./bin/drydock create-resource --type host --attrs host=build2.mycompany.com,platform=windows,credentials=K123...

Then the Harbormaster build plan looks like:

  1. Lease a <Windows> host <with some attributes> from Drydock, emitting the lease as an artifact named "buildhost".
  2. Run command "xbuild ..." on host "buildhost".

So Harbormaster never has a dedicated pool, and different build plans can operate on different effective pools. If you want to lock it down to a pool, we'd add some attribute like "role=build" and then the build plan would request "role=build" as "<with some attributes>". In the simplest case, you'd just have an extra machine or two and use them for everything.
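
As a rough illustration of how attributes define an "effective pool", here is a small Python sketch. The helper names (`parse_attrs`, `matches`, `effective_pool`) are hypothetical, and the matching rule (every requested attribute must equal the resource's value) is an assumption rather than a description of how Drydock actually matches leases to resources.

```python
def parse_attrs(spec):
    """Parse a 'key=value,key=value' string into a dict."""
    return dict(pair.split("=", 1) for pair in spec.split(",") if pair)


def matches(resource_attrs, required):
    """True if the resource carries every requested attribute value."""
    return all(resource_attrs.get(key) == value for key, value in required.items())


def effective_pool(resources, required):
    """The subset of resources a build plan with these attributes could lease."""
    return [attrs for attrs in resources if matches(attrs, required)]


# Two hosts, only one of which is tagged for builds.
resources = [
    parse_attrs("host=build1.mycompany.com,platform=windows,role=build"),
    parse_attrs("host=build2.mycompany.com,platform=windows"),
]
build_pool = effective_pool(resources, {"platform": "windows", "role": "build"})
```

A plan that requests only `platform=windows` would see both hosts, while one that also requests `role=build` would see only `build1.mycompany.com`.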

We also need some mechanism to ensure that requesting a working copy uses a particular host; I'm reasonably confident this can just be an attribute on the working copy lease (host.lease=4) when requesting it.

Yeah, we can force a lease to a specific resource. But the theory here is that you don't need to do this, I think: instead, you allocate a "working copy" resource:

  1. Lease a working copy of <repository> [with <attributes...>], named "buildrepo".
  2. Run command "xbuild ..." on working copy "buildrepo".

And Drydock has all the logic to figure out how to allocate and lease a working copy. Particularly, shoving this underneath Drydock's layer of abstraction means that Drydock can recycle working copies and get builds running faster by skipping clone steps.
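
A sketch of why recycling helps, assuming a simple pool keyed by repository. `WorkingCopyPool`, its methods, and the `/var/drydock/...` path are hypothetical names for illustration. A real implementation would also need to update and reset a recycled checkout, which is omitted here.

```python
class WorkingCopyPool:
    """Toy model of reusing working copies instead of cloning every time."""

    def __init__(self):
        self._idle = {}   # repository name -> list of idle working copy paths
        self.clones = 0   # how many (expensive) clones we had to perform

    def lease(self, repository):
        idle = self._idle.get(repository, [])
        if idle:
            # Reuse an existing checkout and skip the clone step.
            return idle.pop()
        self.clones += 1
        return f"/var/drydock/{repository}-{self.clones}"  # hypothetical path

    def release(self, repository, path):
        # Return the working copy to the pool for the next build.
        self._idle.setdefault(repository, []).append(path)
```

With this model, a build that leases, releases, and leases again pays for one clone; a second clone only happens when two builds need the same repository at the same time.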

  1. Add a host lease PHID column and a working copy lease PHID column to the HarbormasterBuild object. Each build has a lease on the host and a lease for a working copy on that host.

I think these aren't general enough; there's no guarantee a build happens on only one machine. For multi-platform builds, it can't be on only one machine.

  1. ... We should probably add a transfer interface used to transfer files independent of how the host resource is provided. This would even allow us to switch between SFTP and SCP on varying host types (e.g. differences between Windows and Linux).

Yeah, hosts should be able to provide some filesystem/transfer Interface.
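
One possible shape for such an interface, sketched in Python. Everything here is hypothetical: the class names, the `upload_plan` method, and especially the platform-to-protocol mapping, which is an arbitrary assumption since nothing above says which host type should use which protocol. The sketch only builds command lines for the standard OpenSSH `scp` and `sftp` clients and does not execute them.

```python
from abc import ABC, abstractmethod


class TransferInterface(ABC):
    """Hypothetical file transfer interface provided by a host resource."""

    def __init__(self, user, host):
        self.target = f"{user}@{host}"

    @abstractmethod
    def upload_plan(self, local_path, remote_path):
        """Return (argv, stdin_text) describing the upload; nothing is run."""


class ScpTransfer(TransferInterface):
    def upload_plan(self, local_path, remote_path):
        return (["scp", local_path, f"{self.target}:{remote_path}"], None)


class SftpTransfer(TransferInterface):
    def upload_plan(self, local_path, remote_path):
        # 'sftp -b -' reads batch commands from standard input.
        return (["sftp", "-b", "-", self.target], f"put {local_path} {remote_path}\n")


# Assumed mapping, for illustration only.
TRANSFER_BY_PLATFORM = {
    "linux": ScpTransfer,
    "mac": ScpTransfer,
    "windows": SftpTransfer,
}


def transfer_for(platform, user, host):
    """Pick a transfer implementation based on the host's platform."""
    return TRANSFER_BY_PLATFORM[platform](user, host)
```

A build step would then call `transfer_for(...)` with the host's attributes and never need to know which protocol is in use.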