Discussion: How will iteration over build steps work?
Open, Wishlist, Public

Description

So there was some discussion about how in Harbormaster you'd be able to configure build steps to run across multiple sets of data (I'm trying to avoid the term targets here). Basically you could have a single build step called "Build Code", but get it to run over multiple architectures or configurations (like "x86", "x64", "ARM").

I want to get @epriestley's thoughts on how this will be represented in the UI and how this behavior will be structured in the context of existing build steps. I think it's probably useful to be able to designate a set of build steps to be run over a list of configurations, but I'm not sure how this will behave or interact with dependencies, nor how it will be structured internally.

hach-que created this task. Aug 6 2014, 4:06 AM
hach-que updated the task description.
hach-que raised the priority of this task from to Needs Triage.
hach-que added a project: Harbormaster.
hach-que added subscribers: hach-que, epriestley.

Here's a rough sketch of my idea, which is pretty nonfinal:

  • Steps split the graph by emitting more than one target.
  • Each target emits instances of the step's artifacts.
  • When a step emits more than one target, execution continues for each target independently.
    • For example, if you have steps "A" and "B", and "B" uses an artifact emitted by "A", and "A" splits 3 ways, you'll get 3 copies of "B", even if it's a totally mundane step like "sleep".
    • If there's another similar "C" afterward and "B" also splits three ways, you'd get 9 copies of "C".
    • These copies have different instances of input artifacts, so they're not identical. E.g., one sees $A.artifact.value as "x86", one sees it as "x64", one sees it as "ARM", etc.
  • The major execution change is HarbormasterBuildEngine::updateBuildSteps() becoming more target-oriented and slightly less step-oriented (basically, dealing with the possibility that a step may have more than one target).
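The split rules above can be sketched in a few lines of Python. This is only an illustration of the counting, not how Harbormaster is or would be implemented: `Target`, `expand_targets`, and the values emitted by "B" are hypothetical names chosen for the example, and the steps are assumed to form a simple linear chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One execution of a step, with the artifact values it sees."""
    step: str
    inputs: tuple  # ((upstream step name, emitted value), ...)


def expand_targets(steps):
    """Expand a linear chain of steps into per-step targets.

    `steps` is an ordered list of (name, emitted_values) pairs, where each
    step consumes the artifact of the step before it. A step that emits
    more than one value splits every execution line it is part of.
    """
    targets = {}
    lines = [()]  # each execution line records the values chosen so far
    for name, values in steps:
        # One copy of this step per execution line that reaches it.
        targets[name] = [Target(name, line) for line in lines]
        # Every emitted value continues as its own independent line.
        lines = [line + ((name, value),) for line in lines for value in values]
    return targets


plan = [
    ("A", ["x86", "x64", "ARM"]),            # splits 3 ways
    ("B", ["debug", "release", "profile"]),  # hypothetical values, also splits 3 ways
    ("C", ["done"]),                         # mundane step, no split
]
targets = expand_targets(plan)
for name, copies in targets.items():
    print(name, len(copies))  # A 1, B 3, C 9
```

Each copy of "B" sees a different value from "A", and each copy of "C" sees a different ("A", "B") pair, which is why the copies are not identical.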

So this lets you do stuff where all the complexity is hidden in a single value -- for example, if you want to run a build that loads 3 URLs, step 1 emits a "variable artifact" vector with 3 values, and step 2 is "make an HTTP request to $step1.url". This has the effect of loading all 3 URLs:

Foreach Step
Emits Artifact: Variable
Values:
  http://www.example.com/A
  http://www.example.com/B
  http://www.example.com/C
  
    +
    |
    +
    
HTTP Request Step
Makes Request To: $step1.url
( This runs 3 times. )
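As a rough sketch of the substitution in this diagram, assuming a `$name.field` variable syntax like the one shown: the `substitute` and `fan_out` helpers below are made up for illustration, and no HTTP request is actually made. The point is only that one templated step becomes three concrete ones.

```python
import re

# Matches variables such as $step1.url (dotted names after a "$").
VARIABLE = re.compile(r"\$([A-Za-z_]\w*(?:\.\w+)*)")


def substitute(template, artifacts):
    """Replace each $variable in `template` with its value from `artifacts`."""
    return VARIABLE.sub(lambda match: artifacts[match.group(1)], template)


def fan_out(template, name, values):
    """Produce one concrete parameter per value of a vector artifact."""
    return [substitute(template, {name: value}) for value in values]


urls = [
    "http://www.example.com/A",
    "http://www.example.com/B",
    "http://www.example.com/C",
]
requests_to_make = fan_out("$step1.url", "step1.url", urls)
for url in requests_to_make:
    print("would request", url)
```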

However, we'll need to do more for more complex cases:

  • A new type of step (or some new option) merges split execution lines back into a single execution line, then emits some merged results.

For example, provisioning "x86", "x64", and "ARM" machines might look very different for each architecture. If these steps are very different, I imagine the build plan looking like this:

Provision  Provision  Provision
  ARM        x86        x64
  
   +          +          +
   |          |          |
   +----------+----------+
              |
              +

        Merge Execution

              +
              |
              +

          Run Tests
      (This runs 3 times.)

In this case maybe we could abstract that away, but I think having "merge" in the general case would provide more flexibility in mixing parameterizable and nonparameterizable steps.
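One way to read the "Merge Execution" box, sketched under the assumption that a merge simply collects each branch's artifact into a single vector so the next step fans out over it. `merge_execution`, `run_tests`, and the artifact fields are hypothetical names, not anything Harbormaster defines.

```python
def merge_execution(branches):
    """Join separate execution lines into one vector artifact.

    `branches` maps the name of each upstream step to the artifact it
    emitted. The result has one entry per branch, so a step placed after
    the merge fans out again, once per entry.
    """
    return [{"source": name, **artifact} for name, artifact in branches.items()]


def run_tests(machine):
    # Stand-in for a parameterizable step; it only describes what would run.
    return f"run tests on {machine['arch']} (from {machine['source']})"


branches = {
    "Provision ARM": {"arch": "ARM"},
    "Provision x86": {"arch": "x86"},
    "Provision x64": {"arch": "x64"},
}
merged = merge_execution(branches)
results = [run_tests(machine) for machine in merged]
for line in results:
    print(line)
```

The three provisioning steps stay free to differ as much as they need to, while "Run Tests" is written once and runs three times.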

But this is all super early and I don't imagine building it for a while. I'd like to get Harbormaster + Drydock solid before we do. Until then, I think there are some workarounds we can implement (like "run another build plan") which will let you accomplish most of these goals with a little extra work -- but maybe that's optimistic.

epriestley triaged this task as Wishlist priority. Aug 6 2014, 7:23 PM

but maybe that's optimistic

Specifically, how many of your use cases does "run another build plan" solve? Is it like 80% of the way there or like a huge messy pain?

Do you need merging of execution threads?

hach-que added a comment. Edited Aug 6 2014, 10:25 PM

"Run another build plan" is more about solving "I want to restart the build, but I don't want to repeat things that have already been successful". I think it solves this pretty well (where the relevant artifacts are stored in a separate system and the buildable's identifier is used to look them up).

Our main build is generalised, and publishes data-centre non-specific packages. However, our deployment is data-centre specific, and we need to run the same deployment process for each data-centre.

I was thinking of using multiple targets so we could say "run these steps for DC0, DC1, DC2, DC3, ...". The problem is that the targets system doesn't allow restarting. If one deployment fails, we don't want to redeploy to data centres that have already deployed successfully, so I think being able to restart these deployments individually is probably going to be a requirement. At the moment we can make them restartable by having a set of build plans for each data centre, but then we have to maintain N*X build plans, where N is the number of data centres and X is the number of build plans per data centre. That's probably not very maintainable in the long term.
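The restart behaviour being asked for can be sketched as follows. This assumes per-target results are remembered between runs; `run_targets`, `deploy`, and the "passed"/"failed" status strings are hypothetical and only illustrate the requirement, not the current targets system.

```python
def run_targets(names, action, previous=None):
    """Run `action` for each target, skipping targets that already passed."""
    results = dict(previous or {})
    for name in names:
        if results.get(name) == "passed":
            continue  # succeeded in an earlier run, so do not redeploy
        try:
            action(name)
            results[name] = "passed"
        except RuntimeError:
            results[name] = "failed"
    return results


attempts = []
broken = {"DC2"}  # pretend this data centre fails on the first run


def deploy(dc):
    attempts.append(dc)
    if dc in broken:
        raise RuntimeError(f"deployment to {dc} failed")


data_centres = ["DC0", "DC1", "DC2", "DC3"]
first = run_targets(data_centres, deploy)

# Fix the problem, then restart: only the failed data centre is retried.
broken.clear()
attempts.clear()
second = run_targets(data_centres, deploy, previous=first)
print(attempts)
```

With this shape there is one set of steps and N targets, rather than N*X separately maintained build plans.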

We could maybe have "template build plans" (which are just normal build plans), and have a build step that instantiates and executes build plans from templates, with variable substitution. That probably requires being able to mark build plans as temporary (so they don't show up in Manage Build Plans). It also raises questions about how to make the restart behaviour work correctly: the build plan that's run has to be the same across restarts, so if the step is instantiating from a template, it needs to know when it can reuse an already-defined build plan (and then, how does this behave if you change the template?).
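One possible answer to the reuse question, sketched purely as an assumption: key each instantiated plan on a hash of the template contents plus the variables, so a restart finds the same plan, while an edited template produces a new one. `plan_key`, `PlanStore`, and the step strings are hypothetical, and Python's `string.Template` stands in for whatever substitution syntax would really be used.

```python
import hashlib
import json
from string import Template


def plan_key(template_steps, variables):
    """Derive a stable identifier from the template contents and variables."""
    payload = json.dumps({"steps": template_steps, "vars": variables}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


class PlanStore:
    """Keeps instantiated (temporary) plans so restarts can reuse them."""

    def __init__(self):
        self.plans = {}

    def instantiate(self, template_steps, variables):
        key = plan_key(template_steps, variables)
        if key not in self.plans:
            # First use of this template/variable combination: build the plan.
            self.plans[key] = [
                Template(step).substitute(variables) for step in template_steps
            ]
        return key, self.plans[key]


store = PlanStore()
template = ["upload package to $dc", "restart services in $dc"]

key_a, plan_a = store.instantiate(template, {"dc": "DC0"})
key_b, _ = store.instantiate(template, {"dc": "DC0"})  # restart: same plan
key_c, _ = store.instantiate(template + ["run smoke test in $dc"], {"dc": "DC0"})
print(key_a == key_b, key_a == key_c)
```

Under this scheme a changed template never alters a plan that has already run; whether that is the desired behaviour is exactly the open question.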

So I think, even when we have multiple targets implemented, we'll need to find a way to intersect the two feature sets of "this part of a build is restartable" and "do the same thing for a list of things".

Also, given what I've outlined in the above comment, I don't think we need merging of execution threads.

eadler added a project: Restricted Project. Aug 5 2016, 4:54 PM
urzds added a subscriber: urzds. Jul 12 2017, 11:15 AM