Allow short "result" message for CI builds
Open, Needs Triage, Public

Description

Allow builds to report a short message, to let users distinguish between various error cases ("Failed to obtain code" is different from "Failed to compile" and from "Unit tests failed").

Show these anywhere we currently show the build status (for example, on Revisions).

We could hard-code N possible results, but I fear users will always come up with new ways to fail.

The use-case I have is to let users distinguish between different kinds of failures:

  • build system blew up -> call build-eng
  • something doesn't compile / tests fail - these are only slightly different; I think users might take "doesn't compile" more seriously than "2 tests fail".
  • Deployment failed because of network -> try again later.
  • Deployment failed because of something else -> call build-eng.

And, depending on how smart the build system is:

  • Build failed because master is broken -> fix master and try again.
  • 4 tests failed, because Alice broke them last week -> go bug Alice to fix it.
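To make the lists above concrete, here is a rough sketch of how a build script might pair each failure kind with a short message and a suggested next step. Everything in it (`FailureKind`, `SUGGESTED_ACTION`, `short_result`) is a hypothetical name made up for illustration, not anything Harbormaster provides today, and the set of kinds is deliberately open-ended.

```python
from enum import Enum


class FailureKind(Enum):
    # Hypothetical categories, taken from the use-cases listed above.
    BUILD_SYSTEM = "Build system blew up"
    COMPILE = "Failed to compile"
    TESTS = "Unit tests failed"
    DEPLOY_NETWORK = "Deployment failed (network)"
    DEPLOY_OTHER = "Deployment failed"
    BROKEN_MASTER = "Build failed because master is broken"


# What the user is expected to do for each kind of failure.
SUGGESTED_ACTION = {
    FailureKind.BUILD_SYSTEM: "call build-eng",
    FailureKind.COMPILE: "fix the code",
    FailureKind.TESTS: "fix the tests",
    FailureKind.DEPLOY_NETWORK: "try again later",
    FailureKind.DEPLOY_OTHER: "call build-eng",
    FailureKind.BROKEN_MASTER: "fix master and try again",
}


def short_result(kind, detail=None):
    """Build the short message that would be shown next to the build status."""
    message = kind.value
    if detail:
        message = f"{message} ({detail})"
    return f"{message} -> {SUGGESTED_ACTION[kind]}"


print(short_result(FailureKind.TESTS, "2 tests"))
print(short_result(FailureKind.DEPLOY_NETWORK))
```

A free-form string is still the safer thing to store, since a fixed enum like this one will always miss some new way to fail.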

Event Timeline

eadler added a project: Restricted Project. Jan 11 2016, 9:27 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board. Jan 11 2016, 9:43 PM

T6139 is also a request for this (I'll leave it up to @epriestley to decide which way to merge it).

While I'm at it, I'll add that the lack of this feature is a significant cause of developers ignoring build failures. Recently we had builds failing for over a week without anyone paying attention, because there was no distinction between "failed to clone because of some network issue" and "the database scripts are broken and someone really needs to fix this now". It's now at the point where developers just ignore builds because they don't say what's wrong, and nothing happens about broken builds until a build engineer notices and emails the rest of the development floor.

I think this ticket is much more modest than T6139; I'm thinking this is the front-end part (store and display), and T6139 is more back-endish (find the root cause in the log). This is more similar to T9124.

My stack is still running on Jenkins and calling Conduit, so I can (and need to) figure this out in code, based on the exceptions thrown, rather than by parsing logs (mostly).
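Roughly what I mean by figuring it out in code, as a minimal sketch: the exception classes and `summarize` are hypothetical names for illustration, and how the resulting message would actually reach Phabricator is left out, since a field for it is exactly what this task asks for.

```python
class CheckoutError(Exception):
    """Hypothetical: raised when the code could not be obtained."""


class CompileError(Exception):
    """Hypothetical: raised when compilation fails."""


class TestFailure(Exception):
    """Hypothetical: raised when one or more unit tests fail."""

    def __init__(self, failed_count):
        super().__init__(f"{failed_count} tests failed")
        self.failed_count = failed_count


def summarize(build):
    """Run `build` (any callable) and return (status, short message)."""
    try:
        build()
    except CheckoutError:
        return ("fail", "Failed to obtain code")
    except CompileError:
        return ("fail", "Failed to compile")
    except TestFailure as exc:
        return ("fail", f"Unit tests failed ({exc.failed_count})")
    except Exception as exc:
        # Anything unexpected is treated as a build system problem.
        return ("fail", f"Build system error: {type(exc).__name__}")
    return ("pass", "OK")


def flaky_checkout():
    raise CheckoutError("network unreachable")


print(summarize(flaky_checkout))
```

Because the classification happens in the same process that runs the build, it does not depend on a later build step getting a chance to run.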

I don't really mind how it's implemented. My main concern is this: if a separate task is required to set the reason for failure, what happens when *that* task fails or never runs (because the error is too fatal for the script or application to continue)? Harbormaster currently doesn't have build steps that run on failure, so there are no post-build steps that are guaranteed to run to assess the failure result.

(Although another benefit of T6139 is being able to surface build logs in the emails that are sent out, so developers don't have to dig for the answer.)

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board. Jul 4 2016, 9:15 PM

I tried adding a build step to collect the summary from Jenkins, but it turns out that the next step doesn't run if the first step failed (which is the case I'm more interested in), so I'm back to square one with this.