
harbormaster.sendmessage does not update build status for the same build target.
Closed, Wontfix (Public)

Description

If the harbormaster.sendmessage method is called multiple times for the same build target, only the first message is consumed. In other words, if I send "fail" and then "pass" to the same build target, the build is still marked as failed.

For example, take the build target PHID-HMBT-dwor4e56q7f2mzrn3xrr:

echo '{ "buildTargetPHID": "PHID-HMBT-dwor4e56q7f2mzrn3xrr", "type": "fail" }' | arc call-conduit --conduit-uri http://phabricator.local/ harbormaster.sendmessage
echo '{ "buildTargetPHID": "PHID-HMBT-dwor4e56q7f2mzrn3xrr", "type": "pass" }' | arc call-conduit --conduit-uri http://phabricator.local/ harbormaster.sendmessage

The first message is consumed and the build is marked as "failing". It makes no difference if "pass" is sent some time later: the second message is stored but never consumed.

mysql> SELECT * FROM harbormaster_buildmessage WHERE buildTargetPHID = 'PHID-HMBT-dwor4e56q7f2mzrn3xrr';
+----+--------------------------------+--------------------------------+------+------------+-------------+--------------+
| id | authorPHID                     | buildTargetPHID                | type | isConsumed | dateCreated | dateModified |
+----+--------------------------------+--------------------------------+------+------------+-------------+--------------+
| 10 | PHID-USER-7mufbthf75bu64nprxfd | PHID-HMBT-dwor4e56q7f2mzrn3xrr | fail |          1 |  1441973511 |   1441973511 |
| 11 | PHID-USER-7mufbthf75bu64nprxfd | PHID-HMBT-dwor4e56q7f2mzrn3xrr | pass |          0 |  1441973527 |   1441973527 |
+----+--------------------------------+--------------------------------+------+------------+-------------+--------------+
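The two calls in the reproduction above can be wrapped in a small helper, which makes it easier to repeat the test against another build target. This is only a sketch: the function names and the CONDUIT_URI variable are hypothetical, and the arc invocation is copied from the commands above.

```shell
# Hypothetical helpers for repeating the reproduction; names are illustrative.
CONDUIT_URI="${CONDUIT_URI:-http://phabricator.local/}"

# Print the JSON payload for harbormaster.sendmessage.
# $1 = build target PHID, $2 = message type ("pass" or "fail")
build_message() {
  printf '{ "buildTargetPHID": "%s", "type": "%s" }\n' "$1" "$2"
}

# Send one message for a build target.
# Requires arc and a reachable Phabricator install.
send_message() {
  build_message "$1" "$2" |
    arc call-conduit --conduit-uri "$CONDUIT_URI" harbormaster.sendmessage
}
```

Calling `send_message PHID-HMBT-dwor4e56q7f2mzrn3xrr fail` followed by `send_message PHID-HMBT-dwor4e56q7f2mzrn3xrr pass` should leave the same two rows in harbormaster_buildmessage as shown above.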

It is possible to go to the buildable page and restart the failed build. A new build target is then created for the diff, and the diff is marked as passed (if the new build passes, of course).

But when an external CI platform is used (Jenkins, in my case), builds sometimes fail for environmental reasons, and harbormaster.sendmessage is called with "fail". In that case it is simpler to just restart the build on the Jenkins side with the same parameters (and the same Harbormaster build target). Assume the build passes after the restart: harbormaster.sendmessage is called again with the same HMBT and "pass", and the record is stored in harbormaster_buildmessage but not consumed.

Is it expected behavior that the build status cannot be updated once a message has been consumed?

Event Timeline

Pawka assigned this task to epriestley.
Pawka updated the task description.
Pawka added a project: Conduit.
Pawka added a project: Harbormaster.
Pawka added subscribers: Pawka, joshuaspence.

Is Jenkins restarting the build automatically, or is a human going into Jenkins and clicking "restart"?

Human.

We usually restart builds on Jenkins if they failed due to some environment issue, for example connection problems or git misconfiguration. In other words, if a build failed for reasons other than the code changes in the diff, we restart it on Jenkins.

But Jenkins is only one real-world example of this use case.

Broadly, this behavior is expected.

Harbormaster cares about knowing that a build is "complete" because it wants to clean up state associated with the build once things are done. You might imagine a hypothetical build like this:

  1. Bring up a machine to run tests on.
  2. Build binary.
  3. Run tests.
  4. Publish good binary somewhere.

If we treat "fail" in step (3) as "temporary failure which may later become a pass without any notice", that means we need to keep the machine, working copy, build, etc., around indefinitely because we might need them later to complete step (4) after we learn that the "fail" was really a "pass". This particular scenario can be approached with more nuance, but in the general case Harbormaster can't tear down a build until it knows things are done.

This isn't hugely relevant today because Harbormaster doesn't normally control meaningful resources right now, but it is likely to become relevant in some cases within weeks (T9123 + T9252).

Even if we did overwrite states, restarting the build would potentially produce double test results, double lint, double logs, etc. It seems desirable to separate these in Harbormaster.
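The behavior reported in this task can be pictured with a toy model: the first "pass" or "fail" for a target is treated as final, and anything sent afterwards is recorded but not applied. The sketch below is not Harbormaster's code. The variable and function names are hypothetical, and it models a single build target only.

```shell
# Toy model of one build target. Not Harbormaster's implementation.
target_status=""   # empty means the target is still building
last_result=""     # what happened to the most recent message

# Apply a message ("pass" or "fail") to the target.
apply_message() {
  if [ -n "$target_status" ]; then
    # The target already reached a final state, so later messages
    # are stored (isConsumed = 0 in the table above) but not applied.
    last_result="ignored"
  else
    target_status="$1"
    last_result="consumed"
  fi
}

apply_message fail   # first message: consumed, target is now "fail"
apply_message pass   # second message: ignored, target stays "fail"
```

Under this model a target can be torn down as soon as its status is set, which is the property the explanation above relies on.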

Also, are "environmental reasons" things that Jenkins can meaningfully distinguish when reporting? That is, could Jenkins report one result for "human intervention required", and a different result for "test failure"? Or are these not distinguishable, and the environmental issues cause test failures and Jenkins can't tell them apart?

I could imagine several approaches:

  • Tell humans to restart builds from Harbormaster instead of Jenkins.
  • Have Jenkins tell users to restart builds from Harbormaster instead of Jenkins.
  • Add a Harbormaster-side API for restarting builds, and have Jenkins call that instead of doing an internal restart.
  • Add a Harbormaster-side API for restarting builds, and have Jenkins call that and pass some kind of parameter? So when Harbormaster calls Jenkins again it just restarts the same build but with a different target?
  • If these failures are distinguishable, have Jenkins send a "temporary failure, need human help" result instead of a "fail" result.
  • Add some kind of "this build has no internal state and merely reflects the state in some other system" flag to builds which changes a bunch of resource management and garbage collection behavior.

Some of these might be reasonable if they're easy in Jenkins, maybe.
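If the failures do turn out to be distinguishable, one rough way to approximate the "temporary failure, need human help" idea with the existing "pass"/"fail" messages is to have the Jenkins job hold back "fail" when the log looks like an environment problem. This is only a sketch under assumptions: the function names are hypothetical, the log patterns are guesses at what connection problems and git misconfiguration look like, and holding the message back is a workaround rather than anything Harbormaster provides.

```shell
# Hypothetical heuristic: decide whether a failed build log looks like an
# environment problem. The patterns are assumptions; adjust to real logs.
# $1 = path to the build log
classify_failure() {
  if grep -E -q 'Connection timed out|Could not resolve host|fatal: unable to access' "$1"; then
    echo "environment"
  else
    echo "test"
  fi
}

# Decide what to report after a failed Jenkins build.
# Prints "fail" when the failure should be sent to harbormaster.sendmessage,
# or "hold" when nothing should be sent until a human has looked at it.
decide_report() {
  case "$(classify_failure "$1")" in
    environment) echo "hold" ;;
    *)           echo "fail" ;;
  esac
}
```

With this approach the build target stays unresolved in Harbormaster until the Jenkins build is restarted and reports a final result, so a build that nobody restarts will appear to hang.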

Thanks for the info!