Provide a workflow to restart Harbormaster builds
Closed · Public

Authored by yelirekim on Sep 2 2016, 6:08 AM.

Details

Reviewers

epriestley

Group Reviewers

Blessed Reviewers

Maniphest Tasks

T10867: Version daemons more clearly in daemon console so it's clear when `phd reload` has taken effect

Commits

rP403073c989b4: Provide a workflow to restart Harbormaster builds

Summary

Ref T10867 for original use case. This workflow provides a plausible way for administrators to stop the daemons when performing upgrades or maintenance, then bring those daemons back up without resulting in the failure of builds that were running at the time.

On our organization's phab install, builds are running 24/7. The majority of these builds last for at least several minutes, and contain build steps which fail if interrupted and then resumed, as happens when turning daemons on and off.

Instead of allowing these build steps to resume execution as normal, this workflow will instruct active builds to restart their entire build process instead of just resuming whichever step they were on.

Test Plan

contrived a build plan which would fail if resumed partway through:

lease a working copy
command touch restart_{build.id}
command test -e restart_{build.id} && rm restart_{build.id} && sleep 60
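The effect of this plan can be reproduced outside Harbormaster with a small simulation. This is only a sketch: it assumes that "resuming" re-executes the interrupted step on its own, while "restarting" runs every step again. The `build_id` value and the `step_touch` / `step_check` function names are hypothetical, and the `sleep 60` is shortened so the sketch finishes immediately.

```shell
#!/bin/sh
# Simulation of the contrived build plan. Nothing here talks to Harbormaster;
# build_id and the step_* function names are made up for illustration.
workdir=$(mktemp -d)
build_id=123

# Step 2 of the plan: create the marker file.
step_touch() { touch "$workdir/restart_$build_id"; }

# Step 3 of the plan: require the marker, consume it, then "work".
# The real step sleeps for 60 seconds; shortened here.
step_check() {
  test -e "$workdir/restart_$build_id" &&
    rm "$workdir/restart_$build_id" &&
    sleep 0
}

# First attempt: both steps run. Imagine the daemons being stopped
# while step 3 is in its sleep, after the marker was already removed.
step_touch
step_check

# Resume (assumed behavior): only the interrupted step runs again.
if step_check; then resume_result=pass; else resume_result=fail; fi

# Restart: the whole plan runs again from the first command step.
if step_touch && step_check; then restart_result=pass; else restart_result=fail; fi

echo "resume: $resume_result"
echo "restart: $restart_result"

rm -rf "$workdir"
```

The resumed step finds no marker file and fails, while the restarted plan recreates the marker first and passes, which matches the two outcomes described in the procedures.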

followed old procedure:

run a few of these builds manually
./bin/phd stop
./bin/phd start
saw the builds fail

followed new procedure:

run a few of these builds manually
./bin/phd stop
./bin/harbormaster restart --active
./bin/phd start
saw the builds pass
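The new procedure could be wrapped in a small helper so the three commands always run in the same order. This is a sketch rather than part of the change: the function name `restart_daemons_safely` and its single argument (the Phabricator install root) are hypothetical, and only the three commands from the test plan are used. The `restart` workflow may ask for confirmation before acting, so this is best run interactively.

```shell
#!/bin/sh
# Hypothetical helper: stop daemons, mark active builds for restart,
# then start daemons again. Stops at the first command that fails so
# daemons are not started if the restart step did not succeed.
restart_daemons_safely() {
  root=$1

  "$root/bin/phd" stop &&
    "$root/bin/harbormaster" restart --active &&
    "$root/bin/phd" start
}
```

It would be invoked as `restart_daemons_safely /path/to/phabricator`, where the path is a placeholder for the local install.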

Diff Detail

Repository

rP Phabricator

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

yelirekim updated this revision to Diff 39663. · Sep 2 2016, 6:08 AM

yelirekim retitled this revision to Provide a workflow to restart Harbormaster builds.

yelirekim updated this object.

yelirekim edited the test plan for this revision.

yelirekim added a task: T10867: Version daemons more clearly in daemon console so it's clear when `phd reload` has taken effect.

Herald added a reviewer: Blessed Reviewers. · Sep 2 2016, 6:08 AM

Herald added a subscriber: epriestley.

yelirekim updated this object. · Sep 2 2016, 6:10 AM

yelirekim edited edge metadata.

What are the advantages of this approach over using bin/phd graceful in your environment? (Predictability of what the "restart" script does / how long it will run for?)

Do you have plans for coordinating the stop + harbormaster restart + start sequence across multiple daemon hosts ("internal deploy magic")?

> contain build steps which fail if interrupted and then resumed

Is this fixable? In theory? In practice? Offhand, this seems a little unusual (I'd expect most builds to be repeatable, e.g. commands like `arc unit` or `make` should work even if run in a working copy with some leftovers from a previous failed build).

I think this is likely fine as a general operations/administration tool, but this particular restart workflow is probably not universally applicable. In particular, if your builds last several hours instead of several minutes, this throws away their work.

Actual code looks fine, and I think this is justified on the basis of making it easier to debug/develop Harbormaster even if it isn't a universal solution to build/restart interactions.

PhutilConsole is sort of out-of-favor versus tsprintf() but I'm not really happy with either API at the moment so I think "do whatever" is reasonable for now, since I suspect the One True API For Telling Users Stuff From The Console has yet to be written.

src/applications/harbormaster/management/HarbormasterManagementRestartWorkflow.php
Line 58: Slightly more flexible as: `pht('Restart %s build(s)?', new PhutilNumber($count))` ...then go translate it in `PhabricatorUSEnglishTranslation` if you want pretty text.

This revision is now accepted and ready to land. · Sep 2 2016, 12:18 PM

use pretty numbers when displaying build count

Herald added a subscriber: Korvin. · Sep 2 2016, 12:46 PM

In theory we could make it so that steps resume correctly, but in practice I have very little control over the contents of the scripts that get run. People tend to wrap up everything their build is supposed to do into a single script, and assume they're starting fresh each time it's executed.

Graceful stop isn't a great strategy because we do have builds that run for hours, and I want to be awake when the update completes. Restarting hours-long builds is better than failing hours-long builds.

Closed by commit rP403073c989b4: Provide a workflow to restart Harbormaster builds (authored by yelirekim, committed by yelirekim). · Sep 2 2016, 1:31 PM

This revision was automatically updated to reflect the committed changes.

michaeljs1990 awarded a token. · Sep 2 2016, 4:10 PM