
Harbormaster failing to trigger Jenkins build through HTTP GET and producing abnormal HTTP error code
Open, Needs Triage, Public

Assigned To
None
Authored By
swisspol
Mar 3 2016, 5:15 PM
Referenced Files
F1139874: pasted_file
Mar 3 2016, 5:16 PM
F1139871: pasted_file
Mar 3 2016, 5:15 PM
F1139865: pasted_file
Mar 3 2016, 5:15 PM

Description

I've observed this puzzling behavior a few times now:

  • A new commit is pushed
  • Herald triggers Harbormaster build plans correctly
  • They start the builds on various Jenkins servers successfully except for 1 build plan:
    • The build starts correctly in Jenkins, but Harbormaster reports that it failed to start

pasted_file (619×1 px, 157 KB)

The log shows a completely strange HTTP error code: 28?

Restarting the build fixes the issue every time:

pasted_file (1×2 px, 258 KB)

PS: This is the hyper instance on Phacility.

Event Timeline

OK, I just noticed this build plan has 2 duplicate steps (with the same 1.1 identifier). That certainly shouldn't be the case. Did something get corrupted?

pasted_file (409×446 px, 35 KB)

They both link to the same page: https://hyper.phacility.com/harbormaster/step/view/2/

Actually all build plans show these duplicate steps. Is this an (unrelated) bug?

BTW, I realized I forgot to say in the task description what I'd actually like to know: why do the logs look like this, and what is error 28? Does it come from Phabricator? Is there a lower-level log one could look at?

This may be cURL 28, CURLE_OPERATION_TIMEDOUT. We have a 60-second timeout on this build step. Is it possible your server requires more than 60s to acknowledge the initial request?

Also known as the far more hilarious CURLE_OPERATION_TIMEOUTED.

It should not take anywhere near 60 seconds; normally it takes less than 1 second. Only the Jenkins instance running inside a Windows VM has this intermittent bug, though. So it still sounds like a reasonable explanation: maybe Jenkins on Windows gets suspended if not used for some time, or the VM itself is paged out on the host hardware and takes forever to become responsive, who knows...

Feel free to close this task or repurpose it into having better logged error messages in such cases, which I wouldn't mind 😉
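One way to test the timeout theory above is to time how long the Jenkins endpoint takes to acknowledge a plain HTTP GET. The following is only a rough sketch using Python's standard library, not what Harbormaster does internally; the function name `time_acknowledgement` is made up for illustration, and the 60-second default simply mirrors the timeout mentioned earlier in the thread.

```python
import time
import urllib.error
import urllib.request


def time_acknowledgement(url, timeout_seconds=60.0):
    """Issue one HTTP GET and return (outcome, elapsed_seconds).

    outcome is "HTTP <status>" if the server answered at all,
    otherwise "failed: <reason>" (timeout, connection refused, ...).
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            outcome = f"HTTP {response.status}"
    except urllib.error.HTTPError as error:
        # The server responded, just with a 4xx/5xx status.
        outcome = f"HTTP {error.code}"
    except (urllib.error.URLError, OSError) as error:
        # No HTTP response at all: a transport-level failure such as a
        # timeout or a refused connection.
        reason = getattr(error, "reason", error)
        outcome = f"failed: {reason!r}"
    return outcome, time.monotonic() - start
```

Running this periodically against the build-trigger URL (after a long idle period in particular) would show whether the Windows VM occasionally needs far longer than usual to respond.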

Since that sounds possibly plausible, I'll confirm that timeouts end up looking like this in the UI, and repurpose this for "make timeouts readable to humans" if they do. I can dig into it some more if that turns out to be a dead end.

I've rebooted the Windows VM and haven't seen this issue in a few days. So we're good. Thanks for the help.

However, I saw another strange "HTTP" error code a couple of times: 7. Now that I know where to look, that is CURLE_COULDNT_CONNECT.

This task should likely be repurposed to improve how cURL errors are logged (there has to be an API to convert them to a string) so that they don't appear as "fake" HTTP errors. I can create a new task if you prefer.
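For what it's worth, libcurl itself exposes `curl_easy_strerror()` for turning a `CURLcode` into a message. As a rough illustration of the kind of log output being requested here, below is a small Python sketch; it is not Phabricator code, the helper name `describe_status` is hypothetical, and the table covers only a handful of libcurl codes (the two seen in this task plus two other common ones).

```python
# A small subset of libcurl's CURLcode values, named as in curl.h.
CURL_ERROR_NAMES = {
    6: "CURLE_COULDNT_RESOLVE_HOST",
    7: "CURLE_COULDNT_CONNECT",
    28: "CURLE_OPERATION_TIMEDOUT",
    35: "CURLE_SSL_CONNECT_ERROR",
}


def describe_status(code):
    """Render a numeric status for a log line.

    Real HTTP status codes fall in the range 100-599, so anything
    outside that range cannot be a genuine HTTP response and is
    treated as a transport-level (cURL) error instead.
    """
    if 100 <= code <= 599:
        return f"HTTP {code}"
    name = CURL_ERROR_NAMES.get(code)
    if name is not None:
        return f"cURL error {code} ({name})"
    return f"unrecognized status {code} (not a valid HTTP status)"


print(describe_status(28))
print(describe_status(7))
print(describe_status(404))
```

The point of the range check is that values like 28 or 7 can never be mistaken for HTTP responses, so the log can label them as connection problems rather than as "HTTP errors".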

eadler added a project: Restricted Project. Aug 5 2016, 4:45 PM