Page MenuHomePhabricator

Allow `arc diff` to run a build step like `gradle` first, then read lint and unit messages from the output
Open, Needs TriagePublic

Description

"arc unit" should be able to run Java tests.

I have a basic implementation of this for Maven-based Java projects that I can contribute as a starting point.

Event Timeline

sgrimm updated the task description. (Show Details)
sgrimm added a project: Arcanist.
sgrimm added a subscriber: sgrimm.

The general plan here is to switch unit test configuration to an .arcunit file by default (T5568). This technically works today but isn't quite ready for use yet.

(We switched lint to .arclint a while ago and this seems to have worked well in nearly all cases -- my sense is that the biggest issue for lint is generally now package and version management, per T5055, not actual configuration.]

Is your implementation running tests granularly (and possibly making some effort to run "related" tests when passed a list of changed paths), or just running everything (cf T9109)? I'd ideally like to get more engines running related tests since I think it's pretty useful, but I'm not sure how many engines and codebases can actually do this in a reasonable way (in some codebases, even human users are probably powerless to generate a reasonable "related test" list from a list of changed files).

Broadly, we're happy to upstream this stuff but it should likely wait until APIs modernize for T5568. The mechanical "run some command and parse the results" stuff should all be straightforward.

The general way forward on granularity/related-tests also remains a little unclear to me: I'm not sure if this is a core capability we should expect most test engines to support, or a more niche capability.

(I also wonder if the default arc unit <path> signature would be better as arc unit <test name> or similar for some test engines.)

I've attached

just so we have something concrete to talk about. Right now this is sitting in our source repository and we load and run it via .arcconfig entries as you'd expect. It is much closer to a "run everything" implementation than a highly selective one, though it is able to deal with repositories that have multiple Java projects (where a "project" is indicated by the presence of a pom.xml file) and will only run tests in projects that have changes. So for source bases that have code for multiple server components, it'll be at least somewhat selective. And if the pending change list has no changes in any Java projects, it won't run anything. (Obviously this has the potential to be wrong if, e.g., a change modifies data files that affect a test, but it's likely to be right in cases like a repository that has both front-end and back-end code.)

In the future I would love to integrate coverage information but it's not clear what the best way to do that is, given that there's no widely-used command-line tool for it.

The test runner I'm using here does support running individual lists of tests. Your last comment is correct: it is based on test class names rather than paths, though in Java there's almost always a 1-to-1 correspondence between class names and filenames so that should be a pretty easy gap to bridge. It also supports globbing of test names, if that helps. If this code seems reasonable enough to accept, I can make it pay attention to additional command line arguments.

This also has another limitation: it works with just one of the popular Java build tools. It's a very popular one but there are plenty of people using others too. I think many of those other tools should be able to do almost exactly what I'm doing here, just with a different command line syntax.

One other aspect of this that is maybe worth discussing since you mentioned architectural changes: in Java land we get lint output from the compiler rather than from a separate lint tool. We also compile the code implicitly when we run tests. I haven't implemented a Java linter yet but it would sure be convenient to not have to compile twice to get both lint and test results, since the from-scratch compilation I need to do to get full lint results can be a little time-consuming. I wonder if it makes sense to, instead of having a "run the linter, then run the tester, then run the stager" process, instead have "run the list of tools for operation X" and any tool can report multiple kinds of results. Then I would have my test module report both lint and test results. I suspect this pattern is going to be at least somewhat useful for compiled languages in general: although there is a C lint tool, you are probably running "gcc -Wall" to build tests anyway.

Can you compile the code explicitly, without running tests?

The big reason the phases are separated is that there's an interactive step between them where we ask the user if they want to continue (and possibly if they want to apply patches) which may invalidate test results or abort the workflow.

  • In the simple case, the compiler or linter points at errors, we prompt the user, and the user decides they're real problems. They say "oh, I'll fix those" and save some time since they don't have to wait for tests to run before they get results.
  • In the more complex case, the compiler or linter points at errors and provides patches. We prompt the user, the user chooses to apply the patches, and this changes the codebase and invalidates any build or test work done up to this point. In this case we'd have to re-run the tests anyway to get a clean result.

We could do this all in one shot, but it will do unnecessary work and slow the process down in a bunch of cases.

I'm not at all familiar with the modern state of Java tools, but I'd expect compiled language lint/unit processes to generally look like this:

  • Lint runs make or whatever, inspects the output, and returns lint messages.
  • Unit runs make or whatever, expects it to exit immediately if there are no changes, and ignores the output except for the error code. It throws if the build fails. Otherwise, it runs unit tests and returns test results.

In your environment, does the equivalent of make not exist, and/or does it not have "exit immediately if there are no changes" behavior? Or is this generally a reasonable sort of pattern to follow?

Also, just to check that my thinking is right since I'm not in touch with the Java world, we're really talking about Maven here, not Java in general, right? Maven is just a build tool for Java, not part of the core language? Is it so ubiquitous that it's effectively universal, or is it just the most common/popular tool? How are Maven, Gradle, and Ant related? Is Maven also used for other languages (Scala, Groovy)?

That pattern is definitely possible, but the "make or whatever" step can take a noticeable amount of time on a complex project even when there's nothing to build, so you're kind of faced with choosing between "minimize time to lint errors" and "minimize total time for a successful diff submission."

This is dependent on build tools and there are some tools (like Facebook's "Buck") that are focused on making builds really fast, and of course it's also dependent on project structure. The main project I'm working on is a much worse case than average since we do database-driven code generation and thus figuring out if there's anything to build is not a simple matter of comparing filesystem timestamps; most Java projects won't have anything like that but there may still be a noticeable delay. (In day-to-day development we use an IDE that incrementally compiles on the fly and runs tests for us, so the speed of the command-line build tool is something we mostly encounter in the context of arc diff.)

One option that would work is to pass the "treat lint warnings as errors" option to the compiler unless the user has indicated that they want to ignore lint. Then you'd only have the overhead of multiple checks for code changes in cases where there were lint warnings, which seems fine since you'd be stopping for user interaction anyway.

The test runner I wrote is Maven-specific and would work for any language that outputs its test results in XUnit format in the default location for Maven test runs. Maven can compile Groovy and Scala code but I don't have a good sense for how common it is in those languages.

Gradle is another build tool that does the same thing as Maven but is configured using Groovy scripts instead of XML files. Ant is an older build tool that's more analogous to make in that it has none of Maven/Gradle's dependency management or lifecycle management features. The code I have here would probably be very easy to adapt to Gradle – possibly even as simple as replacing the "mvn" command with "gradle" and checking for Gradle build files instead of Maven ones – but Ant would be trickier since, unlike Maven, there is no well-accepted convention for scripting test runs using Ant.

By "One option that would work" I meant, "One option that would work in a world where a single command can be both a linter and a tester," in case that wasn't clear.

/me is from a java shop and jumps in.

There are many different "java" build tools including ant, maven, gradle, buck, pants, bazel/blaze . Most of those tools are "java" centric but may take a stab at other JVM languages. For example maven can build Scala projects, but I think most Scala developers use their Special Scale Build tool.

Maven is a not part of the core langauge but I think it's uncontroversial to say it's the most popular for open source projects. "Maven" is both a build tool and a repository/package format and the maven repository is very close to ubiquitous. Most other java build tools use "maven" repositories. For example a project using ant might have a 'go-fetch-all-of-my-dependencies-from-a-maven-repo' step.

Maven the program has a large plugin ecosystem. So you can do things live mvn compile but also mvn run-an-entire-jenkins-instance-so-I-can-test-my-plugin or mvn build-and-deploy-my-website. You can also customize the stages so that if your static-analysis plugin detects an error it can abort the build. It's common in java-land to integrate these sorts of tools with maven or IDEs instead of calling them directly.

For "linting" there are a few things that go on:

  • There are static analysis tools like findbugs. They look at your code and tell you that you never released a lock.
  • There are style tools like checkstyle that do 'your indentation sucks'.
  • IDE's and other projects have custom compilers that try to catch more than vanilla javac. This may overlap with static analysis. To make it more confusing static analysis tools might run on jvm bytecode (aka after compilation).
  • Maven might integrate with all of those either to strictly abort the build or dump out a report later.

Maven is perceived at being sucky at incremental compilation & testing which is part of why java developers lean on their IDEs and why there is a new build tool each week. It would (probably?) not be terribly hard to only run certain tests based on file name, but I'm not familiar with your exactly that works with arc and other systems today.

We could fix the "maven-make can't detect a no-op quickly" issue in arc itself by letting your test engine say "run maven-make unless some other part of the current arc workflow already ran it and hasn't made changes to the working copy in the meantime", since we're fine with assuming that the rest of the world is frozen while you're running arc.

If we did that, does mvn test have some sort of mode where it reuses the earlier build, or does mvn test always imply maven-make, which is potentially slow even if it's a no-op?

That is, would this roughly work, in theory?

  • Some shared piece of code knows how to run maven-make and return raw results (e.g., return code and unparsed raw output or whatever).
  • Lint runs maven-make if it hasn't already been run, then parses the output.
  • Unit runs maven-make if it hasn't already been run, then runs maven-unit and parses those results.
  • When arc invalidates test results (e.g., by applying patches to the working copy) it also invalidates the stuff-that-has-been-run-already cache.

This seems straightforward if it's possible, but I'm not sure if a maven-unit command which cheaply reuses the results of maven-make exists or not.

Even if no such command exists, this capability would still let you get the result you want, with this less-graceful approach:

  • Some shared piece of code knows how to run maven-everything and return raw results.
  • Both lint and unit run maven-everything if it hasn't already been run, then look at only the part they care about in the raw results.

But if this command isn't decomposable, maybe no build systems exist with slow no-ops but decomposable commands and we'd be better off just formalizing "do everything at once in a big ball" build steps.

AFAIK, the common tool in building Scala is sbt, which is more-or-less the same as Maven or Gradle, but it's in Scala.

Also, from my experience in the Java world, a run of mvn/gradle build/sbt build of 5 minutes isn't uncommon, even without running tests, especially if the IDE isn't fully integrated into it (Maven is just unaware of anything the IDE already did, so it's practically a clean build).

That's why, when working in Java-land, I've focused on getting the offloading the lint/tests to the CI system instead of trying to run locally.

Yeah, maybe a good general question is: does mvn-unit <one trivial test> run in under, say, 60 seconds in any realistic environment? If the answer is "no", maybe arc diff isn't a good venue for these tests.

Differential will likely support a "--try" mode soon which puts revisions in a new "draft" state (no notification to reviewers) until CI comes back green, then (automatically or manually) submits the code for review. We'll hammer out the details once it actually gets built, but maybe this is conceptually a better fit?

By default, and the default is used in basically every project I've seen, the "run tests" command in Maven will do a build first. An analogous Makefile would be something like

compile: $(OBJECT_FILES)
   ...run the compiler...
test: compile
   ...run the tests...

where running make test will not just blindly run the tests using whatever binary happens to be sitting there already.

That said, it is possible to configure Maven to run tests without doing a build, and if we have a separate notion of "run make" in arc as proposed, then I'll modify our Maven configuration accordingly. The subtlety there is that you'd still want the default behavior in contexts other than arc since tools generally assume they can run mvn test on a clean source tree and get test results.

There are plenty of projects where running one trivial test, including the compilation, would run in well under 60 seconds. The "run tests on CI" option would probably be attractive for many larger projects. For my project I'd be on the fence about that, since for the common case of a small change to a single module where all the tests pass it potentially introduces a much longer delay. Right now arc diff including running the tests (but not linting) takes between 10 and ~90 seconds to run depending on which part of the code base we're working on.

Hmm, okay. In the short term, I think we can bring some equivalent of MavenTestEngine upstream after T5568, more or less as written.

In the longer term, maybe we can find some other use cases to motivate having a smarter build pipeline and installs can upgrade from "mvn test" to "mvn test-without-build" after adjusting their projects to provide this target. Having .arcunit will make this easier since the engine can now have configuration options (e.g., what command to run to execute tests, defaulting to "test"). Then things could work by default but there could be a pathway to make them work faster if you do a little legwork to decompose the build a bit.

If we do start down the path of having a smarter build pipeline we probably end up with something that looks like an entire build tool, so I don't want to commit to that lightly. But this may start making more sense at some point, especially once we get to a point where having Harbormaster run other arc commands (like arc unit) on a build server becomes reasonable (T9123). Provided that's ever straightforward, there may be a reasonable argument for giving arc more awareness of the build process.

(We can probably make arc unit <testname> work without any real issues separately from all this, although I don't know how valuable that really is if mvn test <testname> does the same thing.)

epriestley renamed this task from Add unit test engine for Java tests to Allow `arc diff` to run a build step like `gradle` first, then read lint and unit messages from the output.Feb 11 2016, 6:37 PM