
Allow Harbormaster builds to run another build plan on the same buildable
Closed, Invalid · Public

Description

We never had a task for this when D9807 and D13635 were originally under review.

Per @epriestley's original comments, we should first aim for a build step that simply runs another build plan on the same buildable. Even if that plan has already been run (with the same or different parameters), the step should always create and start a new build (i.e. no implicit de-duplication).
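For concreteness, a minimal sketch of what such a step might look like as a Harbormaster build step implementation. The class name, the 'planPHID' setting, and the applyPlan() call are illustrative assumptions rather than confirmed upstream API; only the general HarbormasterBuildStepImplementation shape follows existing steps.

  // Sketch only: "always start a new build of the configured plan on the
  // same buildable". applyPlan() and the 'planPHID' setting are assumptions,
  // not a confirmed upstream API.
  final class HarbormasterRunBuildPlanStepImplementation
    extends HarbormasterBuildStepImplementation {

    public function execute(
      HarbormasterBuild $build,
      HarbormasterBuildTarget $build_target) {

      $viewer = PhabricatorUser::getOmnipotentUser();

      // Load the plan configured on this build step.
      $plan = id(new HarbormasterBuildPlanQuery())
        ->setViewer($viewer)
        ->withPHIDs(array($this->getSetting('planPHID')))
        ->executeOne();

      // Always create and start a fresh build of that plan on the same
      // buildable; never reuse an earlier build (no implicit de-duplication).
      $build->getBuildable()->applyPlan($plan, array(), $viewer);
    }
  }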


Original feedback from code review:

These semantics don't make any sense to me for a "Run another plan" step. They mean that all builds from the same build plan are equivalent. This is probably mostly-true today, but not entirely, and definitely won't be true in the future: they may have different parameters. I think the semantics should always be "start a new build". If plan X says to run plan Y 5 times, plan Y should run 5 times.

I think you're solving some more complicated problem here, but that "run build" should always run a build. Some version of this might be reasonable as "generate artifact if not already generated", but that needs more infrastructure in place first. As written, this plan is not appropriate for subdividing work, because if you have a build plan like "make" and run two builds on different hosts, only one of them will actually build the binary. This is unintuitive. We should have "run build" first, which does what it says, and once that actually works we can have "generate or wait for artifact".

We cannot put while(true) { sleep(...); } loops in build plans; these can lock the queue. Build plans which need to wait must yield, like HarbormasterWaitForPreviousBuildStepImplementation does.
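For illustration, a rough sketch of the yield pattern that feedback refers to, loosely modeled on how HarbormasterWaitForPreviousBuildStepImplementation behaves; the isConditionSatisfied() helper is a placeholder, not real code.

  // Instead of while(true) { sleep(...); }, a waiting step should yield so
  // the worker queue can reschedule it and run other work in the meantime.
  public function execute(
    HarbormasterBuild $build,
    HarbormasterBuildTarget $build_target) {

    // Placeholder for whatever condition the step is waiting on.
    if (!$this->isConditionSatisfied($build)) {
      // Re-queue this target and retry in roughly 15 seconds.
      throw new PhabricatorWorkerYieldException(15);
    }
  }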

I'm not really sure this is the right solution to any problem. Presumably you're doing something like:

  • Build 1
  • Run or wait for "make" artifact
  • Run lint

Then:

  • Build 2
  • Run or wait for "make" artifact
  • Run unit

But this pattern won't work if you want to do this across two hosts, since the "make" artifact on one host will conflict with the "make" artifact on the other host. I'd expect this to be structured as:

  • Build 1
  • Run "make"
  • Run unit (artifacts = make)
  • Run lint (artifacts = make)

Then you can run as many copies of that as you want without issues.

Basically, this looks like a hack to solve a very specific problem, but the general version of that problem requires solving a number of larger problems (sub-builds, artifact scope, parameterized builds, parallelism [now solved or better-solved in HEAD?], partial restarts). This kind-of fixes all those problems in your environment, but isn't nearly general enough for the upstream. I suspect almost no other builds outside of your environment could actually use this step.

I'll accept a version of this which actually runs another plan, but I don't think that does much to solve the problem you encountered on its own.


Some more feedback from T5542:

I tend to agree, mostly, but I think D9807's global-namespace-on-artifacts thing isn't the best approach.

I'd prefer to see these things expressed as:

  • Build "A"
  • Retention Policy:
    • After a failure, retain artifacts for 24 hours.
  • Steps:
    • Run "make", producing an artifact visible to this build and its children called "B"
    • (1.1) Run build X (on artifacts = B)
    • (1.2) Run build Y (on artifacts = B)

This says: build a binary, then run several things with it in parallel, providing them data about it. If this build fails, retain its artifacts for 24 hours before destroying them.

You could then individually restart 1.1 or 1.2 for up to 24 hours. After that, the artifacts might get GC'd and you'd have to restart the whole build.
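As a rough sketch of that retention rule (all names below are hypothetical; this is not existing Harbormaster code):

  // Hypothetical GC check for "after a failure, retain artifacts for
  // 24 hours": a failed build's artifacts only become collectable once
  // the retention window has elapsed.
  function canCollectBuildArtifacts($build, $retention_seconds = 86400) {
    if (!$build->isFailed()) {
      // Passed or still-running builds follow the normal GC policy.
      return true;
    }
    return (time() - $build->getDateCompleted()) > $retention_seconds;
  }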

We could probably put a retention policy on target state and actually let you restart targets, but that feels messy and complicated and hard to understand, and probably unnecessary.

This probably solves T5803 too.

Event Timeline

There were also some comments around embedding build plans (with individual build steps being restartable), but I can't find them now. I'd like to pursue these features in the near future, as we're finding that Jenkins "multi-configuration projects" don't actually let you restart individual builds, which we rely on because AWS operations are inherently unreliable.

I managed to find the original restart target feedback that @epriestley wrote.

I think with the trigger infrastructure we can now at least start to pursue artifacts with expiries, but it depends on what @epriestley sees as most important:

  • Adding a run build plan step
  • Allowing failed builds to be continued from where they left off, using artifact expiries

I'll wait until I hear back from upstream before I pursue any of this stuff.

(Those two things seem reasonably separate, so this might be better split into two tasks, but I'll leave that up to @epriestley. I mainly wanted to give better visibility into the discussions around those original code reviews, in case any community member wants to tackle implementing these features with upstream guidance.)

eadler added a project: Restricted Project. (Aug 5 2016, 4:45 PM)

(Feature request with no support mana.)