
[drydock/working-copies/v0] Implement the working copy blueprint
Abandoned, Public

Authored by hach-que on Jul 1 2015, 4:32 AM.
Tags
None

Details

Reviewers
epriestley
Group Reviewers
Blessed Reviewers
Maniphest Tasks
T2015: Implement Drydock
Summary

Ref T2015. This implements a working copy blueprint which leases a working copy of a repository, commit or Differential revision (using the staging repository).

This only supports Git repositories at the moment, as I only use Git repositories and don't have the capacity to test or implement support for Mercurial or Subversion.

This implementation also attempts to acquire a working copy cache before cloning the repository. If a cache is acquired, it uses the path provided by the cache; otherwise it clones directly from the original URL. In v0, cache acquisition is expected to always fail, since the working copy cache implementation is planned for v1.
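The cache-then-fallback order described above can be sketched as follows. This is a minimal Python illustration, not the blueprint's PHP code; `CloneSource`, `choose_clone_source` and the `acquire_cache_path` hook are hypothetical names standing in for the cache lease.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CloneSource:
    """Where a new working copy should be cloned from (hypothetical type)."""
    url: str
    from_cache: bool


def choose_clone_source(
    repository_url: str,
    acquire_cache_path: Callable[[str], Optional[str]],
) -> CloneSource:
    """Prefer a local working copy cache; fall back to the original URL.

    `acquire_cache_path` is a hypothetical hook: it returns a local path
    if a cache could be acquired, or None if acquisition failed.
    """
    cache_path = acquire_cache_path(repository_url)
    if cache_path is not None:
        return CloneSource(url=cache_path, from_cache=True)
    # No cache available: clone straight from the original remote.
    return CloneSource(url=repository_url, from_cache=False)


def no_cache(_url: str) -> Optional[str]:
    # Models v0, where cache acquisition always fails.
    return None
```

With `no_cache`, every allocation falls through to the original URL, which matches the expected v0 behavior.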

Test Plan

Tested on a development machine. Created a working copy blueprint and leased against it using the Harbormaster build steps in the next revision.

Diff Detail

Branch: working-copy-extract-2
Lint: Lint Passed
Unit: Test Failures
Build Status: Buildable 7085
  Build 7222: [Placeholder Plan] Wait for 30 Seconds
  Build 7221: arc lint + arc unit

Event Timeline

hach-que retitled this revision to [drydock/working-copies/v0] Implement the working copy blueprint.
hach-que updated this object.
hach-que edited the test plan for this revision.
hach-que added a reviewer: epriestley.
hach-que edited edge metadata.

Run 'arc liberate'

epriestley edited edge metadata.

I think there are several ideas here:

Harbormaster-Awareness: This makes the blueprint smarter about interacting with buildables, but I think this logic might be better in Harbormaster? It feels better to keep this more dumb -- i.e., pass it tag=x or commit=x instead of buildablePHID=x? The URL stuff might make sense, not sure.

Submodule Handling: This probably makes sense to upstream shortly, although I'd like to get T9123 stable in production first before making this blueprint more complex.

Working Copy Caches: I'd like to see a broader need for this before bringing it upstream. My expectation is that working copy resources themselves will provide adequate caching for most workloads, and this runs into complex constraint problems in T8671.

Windows Support: I want to keep this out of v1.

Overall, I expect to figure out Harbormaster as part of T9123, and submodule handling is probably reasonable to upstream after that. The other parts I want to wait on.

This revision now requires changes to proceed. (Sep 23 2015, 6:12 PM)

> Working Copy Caches: I'd like to see a broader need for this before bringing it upstream. My expectation is that working copy resources themselves will provide adequate caching for most workloads, and this runs into complex constraint problems in T8671.

I have a pretty strong use case for working copy caches.

Basically, working copies themselves don't cache anything: every build clones a new working copy from the Internet, with a single lease on each working copy (because multiple leases don't make sense). This means every build performs a Git clone from the Internet.

We have build plans that clone repositories with individual submodules that are over 400MB to clone. The repository as a whole is even larger (probably around 600MB+). Cloning this much data from the Internet means builds take about 10 minutes just to get started, because of the data transfer rate. With working copy caches, this drops to about 30 seconds, which is much more reasonable.
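As a rough check on those figures, assuming the ~10 minute startup is dominated by transferring the ~600MB repository (an assumption; the comment gives no breakdown of where the time goes):

```latex
\text{throughput} \approx \frac{600\ \text{MB}}{10 \times 60\ \text{s}} = 1\ \text{MB/s},
\qquad
\text{speedup} \approx \frac{600\ \text{s}}{30\ \text{s}} = 20\times
```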

> Basically, working copies themselves don't cache anything: every build clones a new working copy from the Internet, with a single lease on each working copy (because multiple leases don't make sense). This means every build performs a Git clone from the Internet.

This isn't how WorkingCopy resources behave at HEAD. The initial resource allocation will clone a new working copy, but we don't throw it away when we're done. Leases just do fetch + reset when acquired, so the same clone is used over and over again as long as the resource is alive.

Is there a reason this can't work with your build process?
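A lease that reuses an existing clone only needs to bring it up to date rather than clone again. The following is a minimal sketch, assuming the working copy already exists on disk and that `remote` and `ref` identify what the lease asked for; `lease_refresh_commands` and `refresh_working_copy` are hypothetical helpers, not the blueprint's actual command sequence, and cleaning untracked files is left as a separate step.

```python
import subprocess
from typing import List


def lease_refresh_commands(remote: str, ref: str) -> List[List[str]]:
    """Commands to reuse an existing clone instead of cloning again.

    Fetch only the requested ref, then hard-reset the checkout onto it.
    After fetching a single ref, FETCH_HEAD points at the fetched commit.
    """
    return [
        ["git", "fetch", remote, ref],
        ["git", "reset", "--hard", "FETCH_HEAD"],
    ]


def refresh_working_copy(path: str, remote: str, ref: str) -> None:
    # Run each command inside the existing working copy; stop on the first failure.
    for cmd in lease_refresh_commands(remote, ref):
        subprocess.run(cmd, cwd=path, check=True)
```

Because only new objects are fetched, the transfer cost after the first clone is proportional to what changed, not to the size of the repository.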

Does it run git clean -xdff on the working copy at HEAD to ensure a clean working state?

Also how does HEAD behave if there are 5 builds running of the same repository in parallel? Do we end up with 5 working copies permanently on that machine, or do they expire over time (my concern here is disk space usage with a large number of repositories)?

> Does it run git clean -xdff on the working copy at HEAD to ensure a clean working state?

Yes (well, d and f -- we should likely add x):

https://secure.phabricator.com/diffusion/P/browse/master/src/applications/drydock/blueprint/DrydockWorkingCopyBlueprintImplementation.php;24845c70b918789be5309f88ed3f6455f5f29748$160-162

> Do we end up with 5 working copies permanently on that machine, or do they expire over time (my concern here is disk space usage with a large number of repositories)?

Currently, 5 permanent copies. Resources don't expire right now, but will be able to soon (T6569). Leases can already expire; resources just need the same mechanism copied over.

Disk is cheap enough that I'd expect not expiring these to generally cost a few dollars per year, but we'll be able to expire them if this is more of a concern in some environments.
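The behavior described above (idle copies are reused, concurrent leases each get their own copy, nothing is reclaimed yet) can be modeled with a toy pool. Everything here is hypothetical: `WorkingCopyPool`, the path layout, and especially `expire_idle`, which only illustrates what idle-based resource expiry might look like; it is not Drydock's mechanism.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkingCopy:
    path: str
    leased: bool = False
    last_released: float = field(default_factory=time.monotonic)


class WorkingCopyPool:
    """Toy model of working copy resources for a single repository."""

    def __init__(self) -> None:
        self.copies: List[WorkingCopy] = []
        self._next_id = 0

    def acquire(self) -> WorkingCopy:
        # Reuse any idle copy (a real lease would then fetch + reset it).
        for wc in self.copies:
            if not wc.leased:
                wc.leased = True
                return wc
        # Every existing copy is busy, so allocate (clone) a new resource.
        wc = WorkingCopy(path=f"/hypothetical/drydock/wc-{self._next_id}", leased=True)
        self._next_id += 1
        self.copies.append(wc)
        return wc

    def release(self, wc: WorkingCopy) -> None:
        # Only the lease ends; the copy stays on disk for the next lease.
        wc.leased = False
        wc.last_released = time.monotonic()

    def expire_idle(self, max_idle_seconds: float, now: Optional[float] = None) -> int:
        """Destroy copies idle longer than max_idle_seconds; return how many were removed."""
        now = time.monotonic() if now is None else now
        keep = [wc for wc in self.copies
                if wc.leased or now - wc.last_released <= max_idle_seconds]
        removed = len(self.copies) - len(keep)
        self.copies = keep
        return removed
```

Five simultaneous `acquire()` calls leave five copies in the pool; after they are released, later leases reuse them, and they only go away if something like `expire_idle` runs.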

It's more about running into "out of disk space" scenarios than about storage costs, since when a disk runs out of space, typically everything breaks in a horrible fashion.
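One way to avoid that failure mode is to check free space before allocating another working copy. This is a minimal sketch, assuming the allocator can estimate a clone's size up front; `has_room_for_clone` and the 10% reserve are hypothetical choices, not something Drydock does. It uses the standard-library `shutil.disk_usage`.

```python
import shutil


def has_room_for_clone(path: str, estimated_clone_bytes: int,
                       reserve_fraction: float = 0.10) -> bool:
    """Return True if a clone of the given size still leaves a safety margin free.

    `path` is any directory on the filesystem that would hold the clone.
    The reserve is a fraction of the filesystem's total size.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    reserve = int(usage.total * reserve_fraction)
    return usage.free - estimated_clone_bytes >= reserve
```

For the repository described earlier, a caller might check `has_room_for_clone(working_copy_root, 600 * 1024 * 1024)` before cloning, where `working_copy_root` is whatever directory holds the copies.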

> Yes (well, d and f -- we should likely add x):

You need to pass -f twice to force the removal of untracked nested repositories, such as leftover submodule checkouts (hence -ff).
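For reference, the flags under discussion map onto `git clean` as follows. This is an illustrative Python helper (the name `clean_command` is hypothetical), not the blueprint's code; the flag meanings follow the `git clean` documentation.

```python
from typing import List


def clean_command(remove_ignored: bool = True,
                  remove_nested_repos: bool = True) -> List[str]:
    """Build a `git clean` invocation for resetting a reused working copy.

    -d: also recurse into and remove untracked directories.
    -f: required to delete anything unless clean.requireForce is false;
        given a second time, it also removes untracked nested Git repositories.
    -x: ignore .gitignore rules, so ignored files such as build outputs go too.
    """
    args = ["git", "clean", "-d", "-f"]
    if remove_nested_repos:
        args.append("-f")  # second -f, the "ff" in `git clean -xdff`
    if remove_ignored:
        args.append("-x")
    return args
```

With both options enabled this is equivalent to `git clean -xdff`; with both disabled it is the `-d -f` form.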

Upstream has a radically different implementation / approach to working copies.