Page MenuHomePhabricator

Improve several exception behaviors for Harbormaster workers
ClosedPublic

Authored by epriestley on Jan 12 2014, 8:00 PM.

Details

Summary

Ref T2015. Several fixes:

  • checkForCancellation() no longer exists, and isn't relevant for resumable stops. Throw it away for now.
  • Fix an issue where a build could pass even if the final step failed.
  • phlog() exceptions so they show up in bin/harbormaster and the daemon logs.
  • Write an exception log if a step fails.
  • Add a "throw an exception" step to debug this stuff more easily.
Test Plan
  • Grepped for checkForCancellation().
  • Ran a failing build where the final step caused the failure.
  • Observed phlog() in bin/harbormaster output.
  • Observed log in web UI:

{F101168}

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

src/applications/harbormaster/step/CommandBuildStepImplementation.php
73

I presume the reason for not using resolvex is that we want to dump stdout / stderr to the log first?

src/applications/harbormaster/step/WaitForPreviousBuildStepImplementation.php
33โ€“36

Why is this removed? This will mean the build will not display waiting information to the user.

src/applications/harbormaster/step/CommandBuildStepImplementation.php
73

Yeah. This would maybe be a little bit nicer as resolvex + capture exception + log + rethrow, I just wanted to get rid of $build mutations in workers.

src/applications/harbormaster/step/WaitForPreviousBuildStepImplementation.php
33โ€“36

Two reasons:

  • I've removed all updates of $build from the actual steps. This avoids a bunch of locking and race issues where multiple things are trying to write to $build at once: instead, the build update process does all build writes, and it uses locks to guarantee it is the only simultaneous writer. So these direct writes to $build violate that (and, e.g., could overwrite changes in other fields which had happened since $build loaded).
  • Now that steps can run in parallel, a single step being in the "waiting" state does not mean that the whole build is waiting: other steps may be running. Instead, we should set the step as waiting and fail temporarily. Elsewhere, in the build update process, we should set the whole build as waiting only if all of the current steps are waiting. Since the only thing that not doing this costs us is that statuses say "running" instead of "waiting" and it's somewhat complex, I left it out of this diff (as the other changes represent meaningful unbreaks). I plan to restore it in the near future.