I am using Harbormaster with Jenkins, and I would like to be able to report back an "unstable" state rather than just pass or fail. Currently, if any tests fail but the build itself worked, I have to report back fail.
Status | Assigned | Task
---|---|---
Resolved | epriestley | T8089 Unprototype Harbormaster (v1)
Resolved | epriestley | T8097 Allow external systems to report results into Harbormaster
Resolved | epriestley | T5920 Support grey-area results ("unstable" builds, "non-critical" builds) in Harbormaster
Event Timeline
Personally, I'm not a fan of this. If your unit tests fail, surely you aren't going to deploy code with failing unit tests? Or if you are, then the unit tests obviously don't matter to customers, clients, or anyone else, in which case they aren't affecting the outcome of your build (i.e., what you do doesn't change based on whether the unit tests pass or fail).
If you have unit tests that are unreliable or environment-dependent, they don't tell you anything useful either. Is a failure a real failure? How can you ever know? Eventually this leads developers to ignore unit tests, and when there really is a serious issue, you won't know about it until it's too late.
I get your point, and agree, but I'm trying to serve multiple projects, not just my own, and some of those projects will be very used to Jenkins and its unstable status; hence the request to be able to set that status via the Conduit call.
Also, it is useful to know whether the tests failed or the build process failed without having to go and look in Jenkins, which I currently have to do. From an end-user point of view, it's really nice to get as much information as I can from a single interface/dashboard.
In an ideal world I would also be able to send back more information, such as a link to the unit test or sonar results page that has the fail. This would then display as a message along with the failed build. If this were possible then the idea of an unstable status is probably not useful any longer.
It's not urgent, I understand that, but it would be useful nonetheless until Harbormaster has features capable of replacing Jenkins, which is currently a long way off.
If you want to show more information in Harbormaster, you can probably implement a custom build step that starts a build on Jenkins, polls for the logs, and outputs the Jenkins logs into Harbormaster logs. You can also output "link" artifacts when implementing custom build steps, which can link to whatever external URIs you want.
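The polling half of such a custom build step might look roughly like the following. This is a Python sketch, not Harbormaster code (real build steps are written in PHP against Harbormaster's build step classes). It assumes Jenkins' JSON API at `<build URL>/api/json`, with its `building` and `result` fields, and the plain-text log at `<build URL>/consoleText`; `wait_for_jenkins_build` and its parameters are hypothetical names. Starting the build, authentication, and actually writing the Harbormaster log and "link" artifact are left out.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str) -> dict:
    """Fetch and decode a JSON document (no authentication handling)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_text(url: str) -> str:
    """Fetch a plain-text document such as a Jenkins console log."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def wait_for_jenkins_build(
    build_url: str,
    get_json: Callable[[str], dict] = fetch_json,
    get_text: Callable[[str], str] = fetch_text,
    poll_seconds: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a Jenkins build until it finishes.

    Returns the Jenkins result, the console log (to copy into a Harbormaster
    log), and the build URL (to publish as a "link" artifact).
    """
    if not build_url.endswith("/"):
        build_url += "/"
    for _ in range(max_polls):
        info = get_json(build_url + "api/json")
        # Jenkins reports result=None while the build is still running.
        if not info.get("building", False) and info.get("result") is not None:
            return {
                "result": info["result"],  # e.g. "SUCCESS", "UNSTABLE", "FAILURE"
                "log": get_text(build_url + "consoleText"),
                "link": build_url,
            }
        sleep(poll_seconds)
    raise TimeoutError(f"Jenkins build did not finish: {build_url}")
```

The fetch and sleep functions are injectable so the loop can be exercised without a live Jenkins server.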
Thanks for the info; I'll look into doing that in my case. If I do get around to it, I'll of course post the code on GitHub for anyone else.
What does an "unstable" state mean? How is it different from "fail"?
Do specific tests get marked as "not really very important" and they only cause things to become "unstable" when they fail?
Should I just go read the Jenkins documentation?
"Unstable" in Jenkins is where the build completes but a publisher reports unstable. In the case of a default Maven build with JUnit, for example, the default JUnit publisher reports unstable if any test fails. I believe other publishers can do more complex things, like marking the build unstable until certain thresholds are hit and then reporting fail.
In practice, what does "unstable" mean for you, personally? (Or does it not have a specific meaning in your environment, and you're just trying to accommodate multiple projects which have some per-project meaning for the status?)
For example, at Facebook, we had some old tests which depended on making service calls, and these sort of had a flag on them which said "this test is known to be flaky so kind of just ignore it a bit". However, this grew out of the test suite being a huge unstable mess and we basically declared test bankruptcy, marked all tests "not that important", and then migrated mostly-reasonable tests to a new "actually important tests" location.
I'm also personally not a fan of supporting "unstable", unless it has some meaningful distinction from "fail" that has a reasonable use case behind it. In the Facebook case, arguably the use case was that you could declare test bankruptcy: keep a suite of bad tests running for a while to make it easier to migrate them forward, by letting you see their current/recent results and make a decision about whether to delete or fix a test. I think this is a rare/weak use case, though, and with a system like Harbormaster this could probably be accomplished with two separate builds (one for "bad tests", and one for "good tests").
> the default JUnit publisher reports unstable if any test fails
This seems really weird to me -- when does JUnit report "fail"? Only if every test fails? Or does it never report "fail", and "unstable" is just a synonym for "fail"?
We've also talked in the past about letting builds fail without failing the buildable. For example, if quasi-build steps like internal documentation generation or symbol indexing fail, that might not be a deployment-blocking issue. This would probably result in some "soft pass" state (all the tests are good, but some noncritical auxiliary thing is having issues), which might be similar to "unstable".
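The "soft pass" idea can be made concrete as an aggregation rule over builds. This is a hypothetical Python sketch: the `critical` flag, the `BuildResult` type, and the `soft-pass` label are illustrative names, not existing Harbormaster features.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BuildResult:
    name: str
    passed: bool
    # Hypothetical flag: does a failure of this build block deployment?
    critical: bool = True


def buildable_status(builds: Iterable[BuildResult]) -> str:
    """Aggregate build results into a buildable status.

    "fail":      at least one critical build failed
    "soft-pass": only noncritical builds (docs, symbol indexing) failed
    "pass":      everything passed
    """
    builds = list(builds)
    if any(not b.passed and b.critical for b in builds):
        return "fail"
    if any(not b.passed for b in builds):
        return "soft-pass"
    return "pass"
```

For example, passing tests plus a failed documentation build (marked noncritical) yields `soft-pass`, while a failed test build yields `fail` regardless of what else happened.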
Basically, I'd like to understand why something would report "unstable" instead of "fail", and how users would react differently to "unstable" and "fail".
From reading the Jenkins documentation, it sounds like "unstable" means "the build worked and produced a binary/executable/result, but that binary is known to be bad (due to test failures)". Is that roughly accurate? If you have processes like that, do you do anything with "unstable" build output, or otherwise treat these builds differently from a build failure?
> This seems really weird to me -- when does JUnit report "fail"? Only if every test fails? Or does it never report "fail", and "unstable" is just a synonym for "fail"?
After reading the documentation, this is less weird to me -- what I gather is that "fail" means "the build did not produce an executable/result", while "unstable" means "the build produced an executable/result, but the tests say it's busted".
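That reading can be written down as a tiny decision function. This is a simplified Python sketch of the default JUnit-publisher behaviour described earlier in the thread (any failing test makes a completed build unstable); real Jenkins publishers can apply other rules, and `jenkins_style_result` is a hypothetical name.

```python
def jenkins_style_result(produced_artifact: bool, failed_tests: int) -> str:
    """Classify a build per the reading of the Jenkins docs above.

    FAILURE:  the build did not produce an executable/result at all
    UNSTABLE: it produced a result, but the tests say the result is busted
    SUCCESS:  it produced a result and every test passed
    """
    if not produced_artifact:
        return "FAILURE"
    if failed_tests > 0:
        return "UNSTABLE"
    return "SUCCESS"
```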
> and you're just trying to accommodate multiple projects which have some per-project meaning for the status
This is exactly the case: this isn't for my own projects, but to accommodate any number of other projects, each with its own definitions and meanings.
To most Jenkins users (the majority I know use it with Maven builds and JUnit), "unstable" is just a way of saying "failed because not all tests passed". You've pretty much got that in your last comment. After the previous comments, I don't think an "unstable" status is actually necessary, since it sounds possible to add additional information to the log for a buildable. Can I do this via Conduit calls, the way I can set the build status, or will I need to look at creating a build step?
It would, however, be really useful to be able to display some kind of message in the build/buildable banner, either on a dashboard or in the buildables list in Harbormaster.
We're going to expand support for things like linking to an external build page, publishing top-level results, publishing results over Conduit in general, and having builds affect overall buildable status in ways other than "always fail it"; we just don't have this stuff built out yet.
That sounds great, and I totally realise that Harbormaster is still in its infancy; I just thought it worthwhile registering my interest in something I'd like to see. Feel free to close this, as it sounds like your existing plans will make this request obsolete.
Cool. I'm going to leave it open for now since I think the discussion is useful, and taking this stuff into account as we figure out some of the status features will be helpful. I imagine we might have other users with similar questions as this gets built out a little bit, too.
In the Jenkins job configuration, you can set thresholds that will then mark the build as:
- success
- unstable
- failure
Then each build step can set its own. I haven't tried it for unit tests, but for coding standard validation you can set this:
- stable = 0 warnings
- unstable = < 100 warnings
- failure = > 100 warnings
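The thresholds in that example amount to a simple classification of the warning count. This is a hypothetical Python sketch, not Jenkins' configuration format or plugin code; the example leaves exactly 100 warnings unspecified, so the sketch assumes 100 counts as a failure.

```python
def warnings_to_result(warnings: int, unstable_limit: int = 100) -> str:
    """Map a warning count to a Jenkins-style result.

    0 warnings -> SUCCESS (stable); fewer than `unstable_limit` -> UNSTABLE;
    otherwise FAILURE (assumption: exactly `unstable_limit` fails).
    """
    if warnings < 0:
        raise ValueError("warning count cannot be negative")
    if warnings == 0:
        return "SUCCESS"
    if warnings < unstable_limit:
        return "UNSTABLE"
    return "FAILURE"
```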
There's nothing specifically actionable here so I'm going to close this. We'll probably expand states in the future, but I think many/most/all of these cases only differ in what they're labeled, and not in the actions the system takes as a result.
For example, in most cases Harbormaster likely doesn't care if a build failed because the build tier was down vs the build failed vs tests failed vs the build plan was unresolvable vs credentials were wrong vs it got aborted. It probably does the same things next in all these cases (stopping the build process, warning users who try to land changes, alerting people, etc). When two statuses are identical in their effects and just have different names, I'd prefer to have one status (e.g., "Fail" in all these cases) and use other features (perhaps "summary logs" in T6139) to communicate details to humans who are interested in understanding the reason for the failure.
For setting a threshold range of "acceptable brokenness" (e.g., pass if < 100 unit test failures, or pass if < 100 warnings, or whatever), I think such a threshold is often a bad idea. If you do want to implement it, you could just have your test harness tell Harbormaster that things passed if they produce 0-100 warnings, then provide additional information in the build log.
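On the harness side, that approach collapses the count into a binary verdict plus a log line. This is a minimal Python sketch; `harness_verdict` is a hypothetical name, and it only computes the values. Reporting them to Harbormaster (for example as a pass/fail message through the `harbormaster.sendmessage` Conduit method) is not shown, since the exact parameters depend on your Phabricator version.

```python
from typing import Tuple


def harness_verdict(warnings: int, acceptable: int = 100) -> Tuple[str, str]:
    """Collapse a warning count into the binary result Harbormaster understands.

    0..acceptable warnings pass; anything more fails. The second value is a
    human-readable line for the build log, so the detail lives in the log
    rather than in a new status.
    """
    verdict = "pass" if warnings <= acceptable else "fail"
    detail = f"{warnings} warning(s); threshold is {acceptable} ({verdict})"
    return verdict, detail
```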
For unit tests, we do support "fail/unstable", except that our "fail" state is called "broken" (e.g., the software does not build, or the test harness does not run) and the "unit tests failed" is called "fail". I think this makes a little more sense than "fail/unstable", because "test failures" is idiomatic while "test instabilities" is not, although maybe this is more conventional in other build systems than I believe.
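Read that way, the Jenkins vocabulary translates fairly directly. This is a hedged Python sketch of the correspondence described in this comment: the table is an illustration, not an official mapping, and Jenkins results such as ABORTED are deliberately left unmapped.

```python
# Jenkins build result -> closest unit-result name, per the comment above:
# "broken" = the software did not build or the harness did not run,
# "fail"   = it ran, but unit tests failed.
JENKINS_TO_HARBORMASTER_UNIT = {
    "SUCCESS": "pass",
    "UNSTABLE": "fail",
    "FAILURE": "broken",
}


def to_harbormaster_unit_result(jenkins_result: str) -> str:
    """Translate a Jenkins result, rejecting ones with no clear equivalent."""
    try:
        return JENKINS_TO_HARBORMASTER_UNIT[jenkins_result]
    except KeyError:
        raise ValueError(f"no unit-result equivalent for {jenkins_result!r}") from None
```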
In all cases, I'd like to wait for use cases where Harbormaster should act differently based on the result because of some real-world need before introducing more statuses. If we introduce statuses that are indistinguishable except for their names, we'll end up with a million statuses that aren't meaningful or useful to very many users.
If many installs want different labels which don't actually behave differently, it may make sense to introduce sub-statuses, like Maniphest statuses. Internally, Maniphest tasks essentially have two statuses (Open, Closed), but they have customizable labels for these statuses which are not distinguishable from one another. We could do something similar with builds, where "Fail" is what Harbormaster cares about but you can add a bunch of sub-labels to it to match the expectations of users familiar with an external build system.
We also need to have a way to aggregate target statuses into a build status, and aggregate build statuses into a buildable status. With "pass" and "fail" this is straightforward, but it seems less possible with arbitrary first-class statuses (if a build has one "Unstable" target, one "Violates Standards" target, and one "Performance Concern" target, what is the status of the build?).
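Combining the sub-status idea with aggregation sidesteps that question: labels can multiply freely as long as each one reduces to a core status. This is a hypothetical Python sketch; the label names come from the example above, while the `LABELS` table and `aggregate` function are illustrative, not Harbormaster features.

```python
from typing import Dict, Iterable

# Hypothetical install-defined labels, each mapped onto one of two core
# statuses that Harbormaster would actually act on (compare Maniphest's
# open/closed statuses with custom labels).
LABELS: Dict[str, str] = {
    "Passed": "pass",
    "Unstable": "fail",
    "Violates Standards": "fail",
    "Performance Concern": "fail",
    "Failed": "fail",
}


def aggregate(labels: Iterable[str]) -> str:
    """Aggregate target labels into a build status via their core statuses.

    Because every label reduces to pass/fail, the rule stays simple: the build
    fails if any target fails, and passes otherwise. Unknown labels raise
    KeyError rather than being silently treated as passing.
    """
    core = [LABELS[label] for label in labels]
    return "fail" if "fail" in core else "pass"
```

The human-facing label can still be shown in the UI; only the core status takes part in aggregation, so the three-target example above simply fails.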