Page MenuHomePhabricator

Make Harbormaster build variables extensible
Open, Needs TriagePublic

Description

In some cases, gluing Harbormaster to external build systems is easier if some of the glue logic occurs in Phabricator. We can modularize build variables and make the system extensible to support this.


Original Description

Our short names are script-referenceable names that we can use to talk to CI. It would be very useful to support this as a variable in harbormaster similar to repository.callsign to avoid having to replicate the same rules per repo.

Event Timeline

eadler added a project: Restricted Project.Mar 5 2016, 7:30 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Mar 5 2016, 11:26 PM

If diffusion.search existed, you could call it with repository.phid to pull the short name, callsign, ID, etc. Would this be sufficient?

I'm hesitant to push a huge bundle of repository information to the build server, since this makes the API complicated, and build servers need to be able to call harbormaster.sendmessage to report results anyway, so it shouldn't let you skip anything.

The API endpoint for "make an http request" is ci.example.com/job/{repository_short_name}/build. We don't need to push much more than that but in order to construct the HTTP request we need the short name.

You control that API though, right? And could conceivably expose /harbormaster/job/{repository-phid}/ or whatever?

I'm wary about using "there exists a third-party API which would be more convenient to call if X was available" as justification for X, particularly when that API is a private, proprietary one which exists on only one install.

(Relying on this as a linkage seems vaguely tenuous, too -- e.g., you may have repositories with short names which are not valid in Phabricator.)

To the degree that the root problem here is "our proprietary API would take less work to invoke if the short name was exposed as a build variable", I'd rather look into, e.g., making build variables more extensible in service of a more general version of that problem, like "proprietary APIs with varied calling conventions are common in the wild, and it's often easier to glue the link together in Phabricator than in the remote build system".

To be clear: this API is jenkins. It isn't unique to us or proprietary. The repo short name is exactly equal to what Jenkin's call the "job name"
The most unique part perhaps is using the short name rather than the callsign. That said, we want to move away from callsigns and Jenkins predates Phabricator for us.

Why is the approach used by Uber's Jenkins/Phabricator Plugin (which does not require repository.name) unsuitable in your situation?

There is a different job per repo. What might work for us is a special phabricator job that uses diffusion.search to figure out the short name and then set off the correct job. I suspect this is more convoluted than ideal but likely works.

Yeah, my expectation is that whatever's on the Jenkins side needs to know about Phabricator, because it needs to call harbormaster.sendmessage to report results when it's done. So it seems like the job has to be some-harbormaster-aware-build, and like it should be able to look up the repository name without undue work, as it already needs to know how to call Conduit.

I presume it's straightforward to make some-harbormaster-aware-build trigger the real build and read the results after looking up relevant information, but maybe that's not true.

If you didn't care about results and purely wanted to trigger builds, you could conceivably use stock Jenkins with some theoretical repository.name, but I think you can't get any further than that, and that seems too limited to be a good first step that you start with and then later refine, since there's no reporting back, not merely limited reporting. If we could get back a pass/fail I'd think this was a better starting point, but firing a build into the void seems questionably useful even as a rough cut.

If we had a "Call Jenkins" build step in the future which POST'd to some standardized Jenkins API and then polled Jenkins to fetch results, it would make sense to make that easy to set up, but I think you pretty much need glue on the Jenkins side today so it seems like it should be easier to put all the glue there? But maybe Jenkins works in a way such that this isn't true? I don't have very much experience with it.

The remote is aware of phabricator. We just don't have a hardcoded job
name like uber does.
It isn't hard for us to add indirection but IMHO this adds needless
complexity.

The API endpoint for "make an http request" is ci.example.com/job/{repository_short_name}/build

To be clear: this API is jenkins. It isn't unique to us or proprietary. The repo short name is exactly equal to what Jenkin's call the "job name"

Job-name == repo-name and 1 job per repo is a specific convention though.

Most of the discussion here is about conduit, but I think the original ask was just about "Make HTTP Request" to be able to template the short name? (In other words this is about "how to call the right job" not "how to make the job smart enough to call back".)

how to make the job smart enough to call back

My contention is that if the job isn't smart enough to call back, it isn't useful. As far as I know, the job has to be smart enough to call back to report results. If it isn't, it just fires off into the void and Harbormaster can never tell you if it passed or not.

If it's already that smart, it seems like it should be a small step to make it slightly smarter.

I don't want to add upstream build variables in every case where they'd make things more convenient for someone, since that could get out of hand quickly.

For example, T7516 asks for clone URIs to be available in the API. I don't know if the underlying use case there is build-realted or not, but it seems possible that it is and seems silly to populate repository.uri.clone.git.read-write, repository.uri.clone.git.read, repository.uri.staging.git.read, etc., particularly after T10366, even though these could conceivably make gluing Harbormaster to a build system easier. Likewise, T5508 asks for a branch variable, but builds aren't branch-based and that's an arbitrarily long list with zero or more elements that isn't necessarily meaningful for some buildables.

This list is already pretty silly, and I'd really like to reduce it to repository.phid and let the other side use Conduit to pull the nearest-regional read-only clone URI or whatever it needs:

Screen Shot 2016-03-07 at 10.29.47 AM.png (180×660 px, 47 KB)

A bunch of this won't work or won't make sense after T10366. (Also, step.timestamp is completely ridiculous?)

I'm very open to making build variables extensible, but I think the bar for upstream defaults needs to get stricter and become "this is critical information for many build processes", not "this helps to simplify at least one build process on one install somewhere in the world".

Having a layer of indirection to call off another jenkins job is a much larger step. Not unreasonable but a level of complexity which is more than just "make one more call to conduit".

(In other words this is about "how to call the right job" not "how to make the job smart enough to call back".)

This is exactly correct.

Maybe this is a better question: if someone else files a similar task saying that the repository projects or date created or full list of tags or default branch would help them glue to Jenkins/Travis/Hudson, when should I say "yes" or "no"?

Do you think I should just always say "yes", and accept that we may have hundreds of build variables in a couple of years?

If I should sometimes say "no", what makes this particular build variable special?

Maybe this is a better question: if someone else files a similar task saying that the repository projects or date created or full list of tags or default branch would help them glue to Jenkins/Travis/Hudson, when should I say "yes" or "no"?

I think this question should be answerable the same way as

Is this build variable reasonable to require in the build/test API?

I can't imagine any API that would want the repo projects, date created, or tags in the API. The default branch is questionable (I can't imagine such an API where the branch would be required in the HTTP request but possibly could be convinced).

Another way to reword the question:

Is it reasonable for this value to be part of the job name?

In this wording things such as callsign or shortname are valuable but date created and projects are not.

Is it reasonable for this value to be part of the job name?

I don't think any mutable, optional, multi-purpose, human-facing identifier is a reasonable way to interface with a build system API (previously, callsign was immutable and required, but is now mutable and optional, which is why I want to remove it).

Builds won't work if someone forgets to set this optional value. Builds will break if someone changes it because they want to change the display name. It's unique-per-repository, so two repositories can never run the same build, even though that might be desirable. The character set is restricted, so it may not work for some range of values in the other system.

Anyway, I think we're attacking this problem from two very different points of view, and I'm not having much luck convincing you of mine on the merits.

As the upstream, I'm willing to make this extensible. I'm not willing to support repository.short-name or similar as it seems very much like it benefits only one install using one particular approach to gluing into one particular build system, and plan to remove repository.callsign, repository.vcs, step.timestamp, etc. When this is extensible, you can choose whether maintaining Jenkins glue or a Phabricator extension is preferable. The extension would look something like this:

final class JenkinsJobNameBuildVariableExtension
  extends HarbormasterBuildVariableExtension {
  
  public function getVariableName() {
    return 'jenkins.jobname';
  }

  public function hasValue($object) {
    $value = $this->getVariableValue($object);
    return strlen($value);
  }

  public function getVariableValue($object);
    $repository = null;
    
    if ($object instanceof DifferentialRevision) {
      $repository = $object->getRepository();
    }

    if ($object instanceof DifferentialDiff) {
      $repository = $object->getRepository();
    }

    if ($object instanceof PhabricatorRepositoryCommit) {
      $repository = $object->getRepository();
    }
    
    if ($repository) {
      return $repository->getRepositorySlug();
    }

    return null;
  }
}
epriestley renamed this task from Permit access to repository "Short Name" from harbormaster. to Make Harbormaster build variables extensible.Mar 7 2016, 7:08 PM
epriestley updated the task description. (Show Details)

I'm very open to making build variables extensible, but I think the bar for upstream defaults needs to get stricter and become "this is critical information for many build processes", not "this helps to simplify at least one build process on one install somewhere in the world".

My view an an early adapter is that a composable HarbormasterBuildVariableExtension is the best long term solution, but you typed that out as code faster than I could do prose. There is way too much variance in build naming to cover anything. That at least feels cleaner than custom herald rules, or the OneTrueMegaJenkinsBuildStep.

I'm a little worried that doesn't make for a great out of the box or support experience since almost anything interesting will require extensions, but that is basically the case anyway today.

I'm a little worried that doesn't make for a great out of the box or support experience since almost anything interesting will require extensions, but that is basically the case anyway today.

We can (in theory) tackle this by building build steps that have the right logic (e.g., the "Build with CircleCI" step in D14286 has a bunch of variables internally, they're just local to the step). I think this is a more promising approach than exposing every variable CircleCI needs as a build variable and then trying to make it work with the generic "Make HTTP Request" step.

I don't think there's really a way forward along this path for Jenkins, though, since it seems like the way builds are triggered isn't very standardized from install to install (e.g., you have package-based builds IIRC, eadler has repository-based builds, Uber has something-else-based builds).

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Apr 7 2016, 6:05 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Apr 7 2016, 6:08 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:19 PM