Release Server / Workflow app / Future of Releeph
Open, Needs Triage, Public

Assigned To: None
Authored By: avivey, Oct 7 2015

Description

This text is written after much discussion in this task, so some terms may have changed.

We're looking for a Release system, and hope to fit it into Phabricator.

We have 3-4 different Release Flows, and the whole process is manual. The "Apps" release flow has about 40 Android apps.

Primary use-cases for Release Server:

  • Codify and trace release procedures.
  • Answer questions like: When did commit X hit production (or some deployment group)? Is commit Y in production now? (See the sketch after this list.)
  • Make Release Notes easy to build (List changes between Releases)
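As a minimal sketch of the "is commit Y in production?" question, assuming each deployment records the commit its Release was cut from: the ancestry test is just git, while the cut-commit field on the Release is a hypothetical placeholder, not an existing API.

```
import subprocess

def is_commit_deployed(repo_path, commit, deployed_cut_commit):
    """True if `commit` is an ancestor of the commit the currently
    deployed release was cut from, i.e. it has "hit production"."""
    result = subprocess.run(
        ["git", "-C", repo_path, "merge-base", "--is-ancestor",
         commit, deployed_cut_commit])
    return result.returncode == 0

# Hypothetical usage, once a Release records its cut commit:
# is_commit_deployed("/var/repo/backend", "abcd123", release.cut_commit)
```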

Example Release Flows:

Pull Requests: I love GitHub and hate Phabricator, but I work at a company that has forced me to use Phabricator. I curse the Phabricator upstream daily. I work on a team (the "Backend Server" team) where everyone rightly feels the same way I do. We only want to use Pull Requests. These are the One True Way to build software.

  • I create a Product called "Backend Server".
  • I create one Release called "master".
  • Everyone pushes their local feature branches into the remote, then makes a "Release Change Request" (aka "Pull Request") to have their changes pulled to master.
  • When a request is accepted, Phabricator automatically merges it to master. Just like GitHub!
  • We reconfigure the header to say "GitHub (Bad Heathen Version)". For a time, there is peace.
  • After a while we add a "Run Tests" step to the Product, triggered "When a merge is requested". This is better than triggering on commits being pushed because we love pushing our local code into the remote in many different states of brokenness, retaining checkpoint commits, etc. But this is acceptable and we stop merging stuff that breaks the build.
  • A while later we add a "Deploy to Production" step to the product, triggered "When Deploy is Clicked".
  • Eventually we move that to "When Release is Updated" so that the thing deploys after every pull request is merged.

Facebook-Style Cherry-Picks / Phabricator-Style Stable / Backporting: I am the Phabricator upstream and have a long-lived stable branch.

  • I create a "Phabricator" Product, and an "Arcanist" product and a "libphutil" product.
  • During the week, after fixing a bad bug I merge it into "stable" using the release tool ("Release Change Request" for a single commit). This helps keep track of what got merged.
  • Maybe "stable" is a single Release or maybe we cut a new one every week. Probably the former at first and then the latter once we get bigger.
  • Every time "stable" gets updated, Harbormaster starts rolling it across all servers.

Binary Release: My Release is sent to users in a box via the Post [Installed on my Enterprise servers]

  • I create a new numbered Release every Wednesday, cut from master.
  • Harbormaster compiles the whole thing as "3.11 RC1" and installs it on some Test environment
  • QA runs some tests, finds some bugs
  • I make a new "Release Change Request" to cherry-pick single commits to fix the bugs. Rebuild and deploy.
  • I Freeze the Release, not allowing any more changes. If we find more issues, I create a new Release.
  • HM Builds "3.11 Final"
  • HM asks for QA to sign off on the release, and then automagically sends it to the movable-press-company [Starts rolling servers].

Details / Plans

We'll be taking some elements from Pipelines (https://github.com/PageUpPeopleOrg/phabricator-pipeline), and mostly use Releeph and Harbormaster.

  • A "Release" is the object currently known as Releeph Branch. We'll rename it, augment it with some more information, and maybe detach it from vcs branches.
    • A Release is a HM Buildable
  • "Release Plan"/"Product Line" is the template for a Release. It will define:
    • Several HM Build plans, to run at different occasions (Similar to Pipeline): Triggers for Cut, New Change, Update, Deploy, etc.
    • Maybe instructions about "how to cut", "how to version"
    • In my example, all 40 apps will use a single "Product Line" template for their releases
  • A Release may have several Artifacts - these will be either Files or Phragments or HM Artifacts.
  • A "Workflow" is just a Harbormaster Build.
    • We'll add a "Wait For User Approval" step, to allow tracking manual steps (Probably end up using Quorum T9515).
  • Release Change Request object is essentially Releeph Pull Requests, but with some pending changes.
  • Probably rename "Releeph" to something else, or maybe write a new thing (Depending on Releeph's code state).
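To make the vocabulary above concrete, here is a rough data-model sketch of the relationships described in this list (Product Line → Release → Release Change Request, plus the triggered build plans). None of these class or field names are real Phabricator objects; they're just an illustration.

```
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ProductLine:                    # the "Release Plan" template
    name: str                         # e.g. "Android Apps"
    repository: Optional[str]         # "how to cut"; may be detached/None
    build_plans: dict = field(default_factory=dict)
    # e.g. {"on_cut": "PLAN-1", "on_change": "PLAN-2", "on_deploy": "PLAN-3"}

@dataclass
class Release:                        # today's "Releeph Branch"
    product_line: ProductLine
    name: str                         # e.g. "2015 Week 41"
    cut_commit: Optional[str]         # may be detached from a vcs branch
    artifacts: List[str] = field(default_factory=list)  # Files / Phragments / HM Artifacts
    frozen: bool = False

@dataclass
class ReleaseChangeRequest:           # today's "Releeph Pull Request"
    release: Release
    commit_or_diff: str
    status: str = "requested"         # requested / accepted / merged
```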

Details

Differential Revisions
Restricted Differential Revision
D16981: Initial code-dump of Release

Broadly, here's where I think the closest mappings to roadmap/starmap stuff are:

  • Overall, this workflow is mostly owned by Releeph.
  • Releases are Releeph Products (these are currently tightly bound to having cut points and being associated with individual repositories, although we could theoretically generalize that.)
  • We add Artifacts to Releeph Products (this seems entirely reasonable).
  • We add an optional "Deploy" action to Releeph products which lets you deploy a product.
  • We add "On cut / merge" and "on deploy" build plans, or something wrapping build plans.
    • e.g.: on cut or merge, rebuild the release artifact

I think "Workflow" can likely just be a Harbormaster build plan, but with "ask for approval" steps along the way. One approach might be to use the "Quorum" stuff discussed in T9515. But it's generally reasonable for Harbormaster to include a "Wait for someone to click a button" build step, or a more specific "Wait for Product deployment approval in Releeph" action, or something like that.

I think most of this is fairly natural. The biggest issues are:

  • Releeph is still a huuuuuuuuge mess.
  • Like 9/10ths of this stuff is either in planning or just-barely-working stages of completion.
  • Tracking this all the way to servers might be tricky.
  • Modeling "Deploy" as "run a build plan" seems very under-powered for real deploy processes. But maybe adding a finite number of deploy actions which are each build plans ("Deploy to staging", "deploy to limited production") would be good enough.

There is a similar problem elsewhere of roughly "which of Debian Jessie, Debian Kraken, and Debian Longhorn is this bug fixed in?", where releases are mutable, and our stable "releases" are also mutable, and Facebook's www deploy is mutable, and Releeph currently spends most of its effort on making products mutable. But it makes sense that a large class of releases might reasonably be modeled as immutable, especially if they have meaningful build steps.

(give me a heads up before this task is made public so I can remove screenshots)

Thanks for the inputs :)

I think that in my terminology, stable and www are not actually "releases", but rather "environments" or "deployment targets"; rP0db86 would be a Release "cut point" (and possibly "version name"), and rPf760b3 would be the "deployment".

That allows releases to still be immutable.

Tracing deployment all the way to servers/field is a huge undertaking which I'm not immediately interested in; I think we've discussed a "production log" in some related context, which will fill that gap.

Actual deployment in the real world can be so ridiculously complex that I don't even think about starting to build a generic tool for any of it.

Extending HM to support "Workflow" behavior looks like it would be simple enough, and would solve an important part of the use-case (formalizing and auditing the release process), although it will require some UI around "Approve This Step".

I'll start poking around Releeph, I guess.

Oh, I actually meant "Releeph Branch" not "Releeph Product", I think. Products are like "The iPhone App", branches are like "The iPhone App, 2012 Week 29". Branches could be renamed to "Releases" reasonably.

I think the idea that a release is potentially based in some number of repositories other than 1 is the only real conceptual mismatch. That one seems hard to deal with -- it's particularly desirable that a release can be a Harbormaster Buildable so we can just hand it off to build plans in the case of mutable releases, but this doesn't make much sense if release is 0 repositories or 5 repositories.

I think bringing triggers like those in @hach-que's Pipeline into Releeph is generally sensible, and this is also probably the most effective way to support cherry-pick-into-branch-in-Releeph (after T182).

It would maybe almost be better to start by throwing the Releeph codebase away or starting with a clean slate and bringing pieces over selectively. I think the basic ideas of a "Product", a "Branch/Release" and a "Request" (not applicable in this workflow, but relevant for mutable workflows) are solid enough but there is a lot more code beyond that, some of which has no short path to generality.

(I can also make a Community Skunkworks space or something for nefarious plotting if scope expands here, but I think there's good alignment with my vision for Releeph, triggers like Pipeline, your actual requirements, Harbormaster/Drydock starting to get useful in general, and plans for letting Phabricator do repository writes in T182.)

"release is more than one repository" is a purely theoretical use case, because I was brainstorming; I don't actually want to handle it right now.

I also sort of vaguely want to just rename Releeph to "Release". I don't actually know the origin of the name but I don't think it makes sense? It's just a word that sounds like "Release" but has an "f" in it? Why isn't it "Relieph"?

"release is more than one repository" is a purely theoretical use case, because I was brainstorming; I don't actually want to handle it right now.

We don't actually use multiple repositories at PageUp yet for release lines (only multiple branches), but I can see it being a reasonable scenario where you want to build a bunch of different things and then combine them all onto an AMI (where the AMI is the release artifact, and the component artifacts are used to make that).

I also sort of vaguely want to just rename Releeph to "Release". I don't actually know the origin of the name but I don't think it makes sense? It's just a word that sounds like "Release" but has an "f" in it? Why isn't it "Relieph"?

Steal "Pipeline" kthxbai

That one seems hard to deal with -- it's particularly desirable that a release can be a Harbormaster Buildable so we can just hand it off to build plans in the case of mutable releases, but this doesn't make much sense if release is 0 repositories or 5 repositories.

We actually do this in Pipeline already; we make the release and release state classes inherit from the buildable interface, and then we expose release variables for versioning. However, we always have either 0 or 1 repository URIs, because there's a build for each release state (which has one repository), and builds for the release (which have zero repositories and are expected to use published artifacts from the release state builds).

Also there was some mention elsewhere about "meta repositories" (I think I brought it up in the discussion of working copies), which would be a "repository" that tracks other repository branches. Each of the tracked repositories would be a folder when the working copy is checked out, but primarily this model gives us a unique ID for the combination of repositories, and would allow buildables to be formed as sets of repositories. If we built that mechanism, it could be re-used in the release app when those releases consist of multiple components or repositories.

Modeling "Deploy" as "run a build plan" seems very under-powered for real deploy processes. But maybe adding a finite number of deploy actions which are each build plans ("Deploy to staging", "deploy to limited production") would be good enough.

For our system, we deploy to RC after each release is built (but not "deployed"). We deploy to all production and UAT servers when the deploy button is pressed. This deploy build plan consists of about 7 "Start or Wait for Build Plan" steps which run a "Deploy to Environment {$env}" plan, and we use parameterized builds to change which datacenter each plan deploys to.

I think we can formalize this as "Environments" though, and allow deployments to individual environments (or importantly, all environments or groups of environments). Each of these environments would have a deployment build plan, which uses the release as the buildable. Each environment then also allows specification of custom build parameters which are then passed through the build. This would also allow us to track which environment is supposed to have which release (since we can query "what is the last deployment run on that environment"), and allow release managers to know if something is out of sync.
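As a sketch of that "Environments" idea (the names here are illustrative, not an existing Phabricator API): each environment pairs a deploy build plan with its own parameters, and "which release is on that environment" becomes a lookup over recorded deployments.

```
from dataclasses import dataclass, field

@dataclass
class Environment:
    name: str                  # e.g. "staging", "production-us"
    deploy_plan: str           # identifier of the Harbormaster build plan to run
    parameters: dict = field(default_factory=dict)   # e.g. {"datacenter": "us-east-1"}

deployments = []               # (environment name, release) records, newest last

def deploy(environment, release):
    # In the real system this would start the environment's build plan
    # with the release as the buildable, passing environment.parameters through.
    deployments.append((environment.name, release))

def current_release(environment_name):
    """Last release deployed to an environment, so a release manager can
    spot environments that are out of sync."""
    for env_name, release in reversed(deployments):
        if env_name == environment_name:
            return release
    return None
```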

+1 on "Pipeline" instead of "Releeph".

I have a somewhat similar (although much more complex) use case. Basically, I want to eventually have the ability to spin up a QA environment from a diff. Currently we have developers do this manually, which means that the instances sit dormant for hours to days until the QA person is able to test it. Instead, I want to build a "Spin me up a QA environment from this diff" button into Differential. Roughly, I expect this to work as follows (a lease-lifecycle sketch follows the list):

  1. Whenever a diff is submitted, a Harbormaster build plan would go through the lint-test-build steps. At the end of the process, an artifact would be produced and stored in Phragment.
  2. There would be a Drydock blueprint for a "staging environment". I don't know what exactly this would involve, but at a bare minimum it would include:
    1. An EC2 instance.
    2. A DNS name pointing to the instance.
    3. An RDS database (with some bootstrap process to populate it with dummy data, I don't expect this to happen within Phabricator though).
  3. Clicking "Spin me up a QA environment from this diff" would basically create a "staging environment" resource which would be associated with the diff. This would mean that:
    1. It would link to the lease from the diff itself, such that I can easily discover the DNS name for the instance (for example).
    2. If the diff is closed or abandoned, the lease would automatically expire and the staging environment would be torn down.
    3. After some timeout period, the lease would expire anyway.
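A sketch of that lifecycle, under the assumption that the staging environment is modeled as a Drydock-style lease bound to the diff; the class and field names are hypothetical.

```
import time

class StagingLease:
    """Hypothetical lease binding a staging environment to a diff."""

    def __init__(self, diff_id, dns_name, ttl_seconds=3 * 24 * 3600):
        self.diff_id = diff_id
        self.dns_name = dns_name     # surfaced on the diff so QA can find the instance
        self.expires_at = time.time() + ttl_seconds
        self.active = True

    def should_release(self, diff_status):
        # Torn down when the diff is closed/abandoned, or on timeout.
        return diff_status in ("closed", "abandoned") or time.time() >= self.expires_at

    def release(self, teardown):
        if self.active:
            teardown(self)           # terminate EC2, remove DNS record, drop RDS, etc.
            self.active = False
```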

We have a somewhat similar process in place for our dev environments, where we give our developers an instance running with HEAD and allow them to rsync their code onto it. One of the issues that we face is that spinning up instances (especially spot instances) takes time. From clicking the button, it can take 10-15 minutes to do the following:

  1. Submit a spot instance request.
  2. Wait for the request to be fulfilled.
  3. Submit a DNS record change.
  4. Wait for the DNS change to propagate.
  5. Wait for the instance to be provisioned (and also our bootstrap process).

This process isn't currently managed in Phabricator, although I would like it to be. One thing that would help here is being able to define some sort of autoscaling group of instances. This way, instead of developers needing to wait for instances to become available, we could set up some sort of rule that says that, during business hours, we should always have at least X instances unleased to any developer. Therefore, when a developer requests a development environment we could ideally fulfil this with unleased instances in the pool rather than needing to make additional spot instance requests.
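A sketch of that pool rule ("during business hours, keep at least X unleased instances available"); the threshold and the `leased` attribute are assumptions for illustration, not Drydock's actual allocator.

```
from datetime import datetime

MIN_UNLEASED_DURING_BUSINESS_HOURS = 3     # the "X" in the rule above

def instances_to_provision(pool, now=None):
    """How many extra spot instances to request so developer leases can
    usually be served from warm, unleased instances."""
    now = now or datetime.now()
    business_hours = now.weekday() < 5 and 9 <= now.hour < 18
    target = MIN_UNLEASED_DURING_BUSINESS_HOURS if business_hours else 0
    unleased = sum(1 for instance in pool if not instance.leased)
    return max(0, target - unleased)
```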

Is there an unavoidable cost to keeping QA environments up for multiple days?

We had a feature like that at Facebook ("Sandcastle") but the cost was negligible (roughly 1GB of code on disk, except that it used hardlinking to reduce that to nearly nothing) so it just spun them up automatically and kept them around for a long time. I'd imagine this is desirable in general and I presume it is usually realistic, although it sounds like that's not how things work now. Is there a straightforward technical path available there which is maybe just not worthwhile at the current scale, or are there factors which make it very difficult?

Sorry, I don't quite follow... (It's early and I'm still somewhat jetlagged... I'm in Vegas for AWS re:Invent at the moment).


Haha, no problem. I just mean:

Currently we have developers do this manually, which means that the instances sit dormant for hours to days until the QA person is able to test it.

Why is it a problem that the instance sits dormant for days? That is, I'd expect that a QA environment should not need a whole machine to itself, so one VM can hold thousands of QA environments and the cost per environment should be very small (a few cents per day?). Is there a strong technical reason that it's impossible to put 1K+ QA environments on a single host?

The cost is probably small at the moment because we are only talking about an application server, but in the future I'd expect a QA environment to consist of:

  1. Application server
  2. Database
  3. Redis
  4. Memcache
  5. Elasticsearch
  6. RabbitMQ
  7. Various internal microservices

It's possible that many of these pieces won't exist in the QA environment, or will exist as shared services, but I expect the cost to add up quickly.

Unfortunately, I don't think it's easy to just run thousands of environments on a single host. In the general case, the application assumes that it isn't sharing resources with any other host. I expect that bad things might happen if we tried to deploy multiple instances side by side. It's also not possible in the general case because a QA environment might require changes to multiple repositories (there may be a dependent diff which changes the puppet code, for example).

Generally though, I'd rather keep these environments close to production where possible and, as such, I'd prefer to deploy a greater number of smaller instances than larger shared instances.

My concern is not primarily with cost but rather with process. The current process requires developers to perform build steps on their local machine. Our build process isn't overly complex, but I'd rather that staging environments are deployed with the same artifacts (or similar) that would be deployed to production rather than relying on developers. This would also help enforce that what is being tested is exactly what is being reviewed (there are no local changes which were made by the developer after submitting the diff).

To clarify on my terminology, my eventual goal is to have the following environments:

  1. Dev
  2. Test / QA
  3. Staging
  4. Production

Each environment progressively approaches production. The Dev environment would essentially be an (ideally local) environment with no external dependencies. The database would be localized and most external services would be disabled or mocked out. The QA / Test environment would have some extra pieces, specifically external services. The staging environment would basically be a replica of production.

avivey added a comment. Oct 8 2015, 8:02 PM

@joshuaspence: I have practically the same desire, but I think HM and Drydock are moving to answer this need ("Build me a complex environment based on this diff").
I'm not sure how this fits into the "Release" and "Workflow" use-cases though?

Well, it is essentially releasing into a non-production environment. We may then want to promote the artifact from the test environment to production.

I would want to use the same (or a very similar) workflow for deploying across environments.

Releases would be explicitly marked as being non-production.

avivey added a comment. Oct 8 2015, 9:50 PM

OK, that makes sense to me.

Keep in mind those test releases will be based off diffs, not commits in that model, so we almost certainly can't re-use any of the built artifacts since they aren't integrated.

In this case I'd almost advocate for a "non-production" and "production" set of environments, so a non-production release can never be accidentally pushed to production.

The name "Pipeline" brings "Data Pipeline" to mind for me, possibly because AWS has a product called "AWS Data Pipeline", although it looks like no one else particularly likes "Releeph" either.

Maybe "Conveyor"?

I'd be OK with "Release" too, but that would somewhat preclude us from having an object within the application called a "Release", and I suspect we might want to rename "Branches" to "Releases".

Maybe "Culvert", although that's sort of an ugly, odd word.

Chuckr would be my vote

avivey added a comment. Oct 9 2015, 6:11 PM

maybe "Produce"?

("Pipeline" and "Conveyor" both sound like ETL to me. didn't get "Culvert" and "Chuckr").

A culvert is just a big drainpipe.

chuckr is Chuck Rossi, the Facebook release engineering lead of fame, legend and renown.

As an engineer at Facebook, if you saw "chuckr mentioned you in IRC", pants were ruined.

avivey added a comment. Oct 9 2015, 6:44 PM

So, here's what I understand:

  • Harbormaster will learn the "Wait for user approval" step. Eventually it will use the Quorum UI (assuming it's coming), so a quick fix can be based on Policy. This should handle more-or-less all the use-cases of "workflow", so we don't need that name any more.
    • This will also answer "Deploy Workflow", at least for now.
  • Release object will be mostly the existing "Releeph Branch" object, with some UI parts from Pipeline.
    • Release will be as immutable as possible
    • Release will be Buildable, and based on either a Commit or a Diff (Revisions are mutable)
    • Release will have (a single?) HM Build
    • Release will have "artifacts" (HM Artifacts? Files? Phragmants? TBD)
  • We'll need a Release Plan object, similar to how Build Plan relates to a Build
    • It will hold some parameters, to allow me to release 40 apps in one go.
    • It will reference the Build Plan for the Builds
  • Releeph might be renamed or re-written, depending on how brave we are.

ps: I obviously can't vote against "chuckr".

avivey added a comment. Oct 9 2015, 6:52 PM

Artifacts: Either as Files or Phragments, they can be:

  • Attached to a Release with some flag / slot (A Release expects some specific artifacts)
  • Trigger a build for "validation" (if uploaded from outside the system)
  • We can feed them via the HM Build, as in "expect the file to be somewhere, then create a HM Artifact". This sounds a bit convoluted.

Release will be as immutable as possible

I still want to support a pretty-much-exactly-like-GitHub pull request workflow (where the release is mutable and never closes, e.g. master) and a Facebook-style mutable merge model (where the release is mutable and closes after a period of time, e.g. production-20151101) in this tool, so I'd expect Releases to retain full mutability, just not actually be mutated in your environment.

Release will have (a single?) HM Build

I'd expect there to potentially be a bunch of builds triggered per release in the long run.

with some UI parts from Pipeline.
We'll need a Release Plan object, similar to how Build Plan relates to a Build

Minor technical distinction, but I'd expect Products to pick up the "run builds" parts of Pipeline, rather than Branches/Releases directly. So you'd configure a Product like this (a config sketch follows the list):

  • When a new Release is cut or updated, run plans: [build artifacts]
  • If this is a mutable release, when a merge is requested, run plans: [(none)]
  • When (hand waving here) Ops clicks the "Deploy" button, run plans: [deploy to staging]
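A config-style sketch of that Product setup, mapping the triggers in the list above to the build plans they run; the trigger names and plan identifiers are invented for illustration.

```
# Hypothetical Product configuration: each trigger maps to the build
# plans that should run when it fires.
backend_server_product = {
    "on_release_cut_or_updated": ["build-artifacts"],
    "on_merge_requested":        [],                   # mutable releases only
    "on_deploy_clicked":         ["deploy-to-staging"],
}

def start_build(plan, buildable):
    print(f"starting plan {plan!r} for {buildable!r}")

def fire(product, trigger, release):
    # Hand the Release to Harbormaster as the buildable for each plan.
    for plan in product.get(trigger, []):
        start_build(plan, buildable=release)
```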

So the "Release Plan" would probably just be a batch way to say "Click the 'deploy' button on these 40 releases"?

I think they need more than one "deploy" button, and maybe we need to introduce the idea of a "Target Environment" or something, so the actual button is "Click the 'deploy to staging' button on all these releases", and then the next screen says "3 of those releases have no way to deploy to staging, deploy the other 37?".

avivey added a comment. Oct 9 2015, 7:09 PM

I'm thinking of "Release Plan" as a more generic Product:

  • Instructions on how to cut
  • Instructions on which HM build(s) to run when
  • List the expected artifacts

For my 40 apps, I'd like to have a single Release Plan, which is parameterized over "name" and "repository", and somehow invents a "version name/number" and "cut commit hash". I can do 40 calls to "make Release from Release Plan", but I don't want to have 40 copies of essentially the same "Use master and run build X" information.
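A sketch of that parameterization: one shared template, forty (name, repository) pairs, and a loop that stamps out a Release for each. All the names and values below are made up for illustration.

```
release_plan = {
    "cut_from": "master",
    "build_plan": "build-android-app",    # the shared "Use master and run build X" part
}

apps = [
    ("App Alpha", "rAPPALPHA"),
    ("App Beta",  "rAPPBETA"),
    # ... 38 more (name, repository) pairs
]

def cut_release(plan, name, repository, version):
    # One call per app; everything except name/repository/version comes
    # from the shared template.
    return {
        "name": f"{name} {version}",
        "repository": repository,
        "cut_from": plan["cut_from"],
        "build_plan": plan["build_plan"],
    }

releases = [cut_release(release_plan, name, repo, "2015.41")
            for name, repo in apps]
```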

avivey added a comment. Oct 9 2015, 7:12 PM

The "sort-of-like-github-pr" flow of master is essentially a hook for master being updated? Or more "pull request" where "deploy" means "merge to master and close this"?

Specifically, here are three workflows with different levels of mutability that I'd like to support:

Pull Requests: I love GitHub and hate Phabricator, but I work at a company that has forced me to use Phabricator. I curse the Phabricator upstream daily. I work on a team (the "Backend Server" team) where everyone rightly feels the same way I do. We only want to use Pull Requests. These are the One True Way to build software.

  • I create a Product called "Backend Server".
  • I create one Release called "master".
  • Everyone pushes their local feature branches into the remote, then makes Pull Requests to have their changes pulled to master.
  • When a request is accepted, Phabricator automatically merges it to master. Just like GitHub!
  • We reconfigure the header to say "GitHub (Bad Heathen Version)". For a time, there is peace.
  • After a while we add a "Run Tests" step to the Product, triggered "When a merge is requested". This is better than triggering on commits being pushed because we love pushing our local code into the remote in many different states of brokenness, retaining checkpoint commits, etc. But this is acceptable and we stop merging stuff that breaks the build.
  • A while later we add a "Deploy to Production" step to the product, triggered "When Deploy is Clicked".
  • Eventually we move that to "When Release is Updated" so that the thing deploys after every pull request is merged.

Facebook-Style Cherry-Picks / Phabricator-Style Stable / Backporting: I am the Phabricator upstream and have a long-lived stable branch.

  • I create a "Phabricator" Product, and an "Arcanist" product and a "libphutil" product.
  • During the week, after fixing a bad bug I merge it into "stable" using the release tool. This helps keep track of what got merged.
  • Maybe "stable" is a single Release or maybe we cut a new one every week. Probably the former at first and then the latter once we get bigger.

Binary/Build-style Releases: Releases are versioned, build real binaries, and are immutable. All the same stuff above, except there are never any pull requests or "on pull request" actions. Maybe there's just an option to disable them in the Product.

"Pull Requests" are the existing "Pull Requests" in Releeph. They're literally just pull requests. Releeph today is like 90% about implementing pull requests and then 10% about surfacing pertinent details about those requests prominently so chuckr and peers can bulk process hundreds of them per day.

(The pull requests are just useless outside of the Facebook workflow because they can't merge and the "you can do hundreds of them really quickly" aspect isn't useful at less-than-Facebook scales.)

Specifically, you:

  • Go to a Branch/Release page.
  • Click the "New Pull Request" button.
  • That goes into the queue for the Branch/Release.
  • Whoever owns the Branch/Release can approve/reject it.
  • That's the end of the workflow today since Harbormaster didn't exist and none of T182 was planned. Last I knew, Facebook completed the rest of the workflow with custom arc do-a-bunch-of-git-stuff extensions.

I think this is totally compatible with immutable Branch/Releases, we just might need a way to hide/disable the workflow and hide/disable any configuration options that are specific to it ("On Pull Request", etc).

avivey added a comment. Oct 9 2015, 9:12 PM

"Binary" style releases might actually not be as immutable as I'm hoping; A Release Candidate might start it's life as a cherry-pick style release, and then be frozen at some point. If we're building binaries each time a new cherry-pick is picked (To test in Staging, e.g.), we might call them all "3.11 RC3", and when finalizing, build a new one as "3.11".

If we find a bug after freezing, I'd like to think that we'll start on a "3.12 RC1".

This is setup-specific, so "frozen" might just be a state on the Release object (And HM plan will be allowed to "Freeze Release"?)

Yeah, that seems reasonable to me. You can already "Close Branch" today which is effectively the same action as "Freeze Release". I'm broadly comfortable with moving Harbormaster in the direction of having richer application awareness and interactions, although we'll have to think a bit about what happens when you "Freeze Release" in a build plan and run it on a commit (does it fail? get skipped? configurable?).

For the "40 applications with similar plans" case, I don't think all the Releases under a Product necessarily need to have the same repository, but then you're still looking at some sort of API/script action to do the actual creation of releases (a Product could be more like a "Product Line" in that case). But maybe that's fine, at least for now.

For example, I think Phabricator, Arcanist and libphutil probably have identical Product rules except for which repository they come from, so there might be a use case for that even in the upstream.

@hach-que - I'm getting ready to public-ize this task (After updating the description).

avivey renamed this task from RFC: Release Server / Workflow app to Release Server / Workflow app / Future of Releeph. Oct 12 2015, 8:20 PM
avivey updated the task description.
avivey removed avivey as the assignee of this task.
avivey added a project: Harbormaster.

Not sure if any of this is relevant to T8297, but everyone loves walls of text!

avivey changed the visibility from "Subscribers" to "Public (No Login Required)". Oct 13 2015, 3:53 PM
avivey added a project: Restricted Project. Dec 23 2015, 1:15 AM

FWIW, at the WMF we make releases that involve hundreds of repositories: one for mediawiki, plus one for each mediawiki extension that we host. I don't think it is an ideal situation, and it causes me all kinds of grief, but it's the current state of affairs. So there is at least one potential use case for a release that encompasses a snapshot of multiple repos.


So I was thinking about software components recently and one of the issues I've had with both Jenkins and Harbormaster is that builds of one repository aren't aware of the builds from another repository. This is a common scenario in the software I build:

  • Repository A is some software in source form
    • Each commit of repository A gets built and published into an external package repository. The package is literally versioned by the Git commit hash, and the external package repository also has a copy of the branch pointers.
  • Repository B is some software in source form
    • It depends on https://packagerepo/SoftwareA or w/e, and it tracks the master branch of that software.
    • Each commit of repository B gets built and published into an external package repository, etc.

The problem is when someone makes changes to both A and B, and commits them one after another. In this scenario, one of two things happens:

  • Repository B incorrectly uses an older binary version because Repository A hasn't finished building yet, or
  • Repository B clones the source of Repository A and builds it from source form because the binary isn't ready yet

Both of these options are undesirable, and I'd rather have repository B wait until the dependency from A's build is available.

I thought of some sort of componentized-build server that layers on top of Harbormaster. Instead of building arbitrary buildables however, you set up "components" in this system.

Each component tracks one or more repositories and creates a Harbormaster buildable when one of those repositories changes. Components also have identifier URIs. In addition, Harbormaster build steps can push dependencies back to the component, to make it aware of other related components and dependencies.

So in this case you'd have component A tracking repository A and component B tracking repository B. When something from B starts building, it has something like a "Scan Dependencies" build step which looks at the contents of the working copy and picks up dependencies from the package management files in there (we'll need to make this extensible to support different package formats or something? maybe we can just make it run a command with the expectation that the command returns a JSON blob?). So this step from component B would post back something like "component A's package URL at version XYZ" or "component A's Git source URL at version XYZ". Then the "Scan Dependencies" step would wait until there's a Component A built with that version, or wait until Component A's build stabilizes (in the scenario where it's tracking a mutable pointer like master). In order to resolve the ambiguity around "is master up-to-date according to Phabricator", we can make the stabilization check request that the repository be updated now (in the case of imported repositories), and wait until all commits are imported.

As per the other things in this task, you'd be able to have multiple phases / stages (manual or automatic) that trigger separate Harbormaster builds, and you'd just flag one of these stages as "the component version is now considered published". We could extend this and have like "the component version is now considered published in XYZ environment" or something, and then "Scan Dependencies" could wait until the component is available in a certain environment too? That would allow us to have like "this package is available in the NuGet repository" or "this AMI is now available in the AMI registry" as different phases / environments?
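A sketch of that "Scan Dependencies" idea: the extensible part is a per-project command that prints a JSON blob of dependencies, and the step then waits until each dependency's component has a published build for the requested version (or environment). Everything here, including the `./scan-deps` command name, is hypothetical.

```
import json
import subprocess
import time

def scan_dependencies(working_copy):
    """Run a per-project command that emits dependencies as JSON, e.g.
    [{"component": "A", "version": "abc123"}, ...]."""
    output = subprocess.check_output(["./scan-deps"], cwd=working_copy)
    return json.loads(output)

def wait_for_dependencies(working_copy, is_published, poll_seconds=30):
    """Block this component's build until every dependency is published
    (or, for a mutable pointer like master, until its build stabilizes)."""
    for dep in scan_dependencies(working_copy):
        while not is_published(dep["component"], dep["version"]):
            time.sleep(poll_seconds)
```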

I don't know whether this architecture sounds useful to anyone else?

[/end ramblings?]

avivey added a comment (edited). Sep 12 2016, 11:51 PM

One thing that happened locally wrt "Product Lines" and 40 apps:

  • RelMan wants to think about the 40 apps going out at the same time as a single thing, even if they have slightly different code changes.

For instance, they want to cut them all at the same time, and deploy them all to the staging environment/prod in one go, etc.
ATM, that means that we have a single Release with lots of repos in it, but that might one day grow into a "Meta-Release"/"Release Bag"/"Train", which is basically a collection of releases that are managed together.

@hach-que what you describe is being used by OpenStack (a cloud management system) for their CI. They wrote an ad hoc piece of software named Zuul, and the feature you describe matches the description at http://docs.openstack.org/infra/zuul/gating.html. It uses Gerrit (a code review tool by Google for Android) as its source. So you are not alone :-]

Reusing your example, with A being, say, a library and B depending on it: when you approve diff 1 on repo A and, immediately after, diff 2 on repo B, you want to pass both diffs to the buildable of B so it knows about the diff in A that is about to land. So you get:

  1. (test A + diff 1)
  2. (test B + diff 2) with (A + diff 1)

If you want to speed up the process by having the builds run in parallel, you will need to build (A + diff 1) twice, though in the second build you can probably skip the tests of A.

There are some gotchas:

  1. If the first build fails because (test A + diff 1) has some fault, the second build has to be retriggered/updated to build against plain A (since diff 1 did not merge).
  2. If the first build passes, it lands. The second build, if run in parallel, will also land, assuming (test B + diff 2) passes its tests.

Depending on your project, when there are a lot of cross-project dependencies and the tests are long, it might be worth parallelizing. Otherwise, throttle and update changes depending on the outcome of the changes ahead of them in the queue.
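A sketch of that gating model, purely for illustration: each queued change is tested assuming everything ahead of it is applied, and when a change fails, it drops out and the changes behind it are retriggered without it. `run_tests` is a placeholder for whatever actually builds and tests a (repo, diff) pair.

```
def gate(queue, run_tests):
    """Speculative gating: each (repo, diff) entry is tested assuming
    every change ahead of it (already landed or still queued) applies,
    so B's build can already see A's pending diff."""
    landed = []
    remaining = list(queue)
    while remaining:
        results = [run_tests(repo, diff, assume_landed=landed + remaining[:i])
                   for i, (repo, diff) in enumerate(remaining)]
        if all(results):
            landed.extend(remaining)
            break
        failed_at = results.index(False)
        landed.extend(remaining[:failed_at])    # everything ahead of the failure lands
        remaining = remaining[failed_at + 1:]   # drop the failure, retrigger the rest
    return landed
```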

avivey added a revision: Restricted Differential Revision. Dec 9 2016, 10:10 PM
avivey updated the task description. Dec 19 2016, 6:34 PM
avivey removed a project: Abuse.

So I was thinking about software components recently and one of the issues I've had with both Jenkins and Harbormaster is that builds of one repository aren't aware of the builds from another repository. This is a common scenario in the software I build:

This is one reason it makes a lot of sense (at least for some teams / orgs) to use a "monorepo".

I've made some progress in the direction of this task in D16981 and D17020, but I've since more-or-less lost the external pressure to implement this. I might get around to completing this eventually, but I might not.
The code in those diffs is mostly usable, but it does require some local extensions to be implemented. If anyone is interested in trying it out (or even taking over the changes), I can instruct you on how to do it locally and what's missing.

Those diffs are based on a local implementation of the full system, so it should be in working order.


@avivey: I'm somewhat interested in this. If you have any tips for getting it working locally, I would like to try it out and see if I can contribute anything towards a finished extension.