Page MenuHomePhabricator

Add Java linters, checkstyle and PMD
AbandonedPublic

Authored by cspeckmim on Dec 2 2015, 5:53 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Nov 18, 8:53 PM
Unknown Object (File)
Mon, Nov 18, 2:05 AM
Unknown Object (File)
Mon, Nov 11, 12:03 AM
Unknown Object (File)
Sun, Nov 10, 3:09 PM
Unknown Object (File)
Fri, Nov 1, 1:05 PM
Unknown Object (File)
Fri, Oct 25, 1:52 AM
Unknown Object (File)
Oct 21 2024, 5:02 AM
Unknown Object (File)
Oct 19 2024, 10:31 PM
Tokens
"Like" token, awarded by rabahmeradi."Like" token, awarded by siepkes."Love" token, awarded by vlada.

Details

Reviewers
None
Group Reviewers
Blessed Reviewers
Maniphest Tasks
T9886: Add Java linters, checkstyle and PMD
Summary

Adds a basic linter for executing checkstyle and PMD, parsing the output into lint messages. These linters were based around the existing ArcanistCppcheckLinter.php.

Closes T9886

Test Plan

Checkstyle

  1. I configured checkstyle linter on a project with the following configuration:
"checkstyle": {
  "type": "checkstyle",
  "include": "(\\.java$)",
  "bin": "./misc-utils/lib/jars/checkstyle-6.13-all.jar",
  "flags": [
    "-c",
    "/Users/cspeckrun/Desktop/google_checks.xml"
  ],
  "checkstyle.simplify-source-classname": true
}
  1. I ran arc linters and verified that "checkstyle" was shown as "CONFIGURED"
  2. I ran arc linters checkstyle and verified that it displayed appropriate configurations for checkstyle as well as the version being 6.13.
  3. I ran arc lint MyBadJavaClass.java on a java file which contained a measurable number of lint errors.
  4. I verified that the reported lint errors corresponded to my expected lint errors from previously running checkstyle manually on the same file.
  5. I ran arc lint MyGoodJavaClass.java on a java file which contained no lint errors.
  6. I verified that there were no reported lint errors.
  7. I updated the .arclint file to remove the simplify-source-classname field, then re-ran steps 2-4 and additionally verified that the reported source name was no longer simplified.

PMD

  1. I configured PMD linter on a project with the following configuration:
"pmd": {
  "type": "pmd",
  "include": "(\\.java$)",
  "bin": "./misc-utils/lib/pmd-5.4.1/lib/pmd-core-5.4.1.jar",
  "flags": [
    "-rulesets",
    "rulesets/java/unusedcode.xml",
    "-dir"
  ]
}
  1. I ran arc linters and verified that "pmd" was shown as "CONFIGURED".
  2. I ran arc linters pmd and verified that it displayed appropriate configurations for pmd as well as the version being 5.4.1.
  3. I ran arc lint MyBadClass.java on a java file which contained a measureable number of lint errors.
  4. I verified that the reported lint errors corresponded to my expected lint errors from previously running pmd manually on the same file.
  5. I ran arc lint MyGoodJavaClass.java on a java file which contained no lint errors.
  6. I verified that there were no reported lint errors.

CPD

  1. I configured CPD linter on a project with the following configuration:
"pmd": {
  "pmd.command": "cpd",
  "type": "pmd",
  "include": "(\\.java$)",
  "bin": "./misc-utils/lib/pmd-5.4.1/lib/pmd-core-5.4.1.jar",
  "flags": [
    "--minimum-tokens",
    "5",
    "--files"
  ]
}
  1. I ran arc linters and verified that "pmd" was shown as "CONFIGURED".
  2. I ran arc lint MyBadClass.java on a java file which contained a measureable number of lint errors.
  3. I verified that the reported lint errors corresponded to my expected lint errors from previously running pmd manually on the same file.
  4. I ran arc lint MyGoodJavaClass.java on a java file which contained no lint errors.
  5. I verified that there were no reported lint errors.

Diff Detail

Repository
rARC Arcanist
Branch
arcpatch-D14632
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 9253
Build 10966: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@cburroughs: I read through some of the discussion on T9223 and I'm not entirely sure I follow what's desired - whether maven would be invoking arc, vice-versa, and when, etc. I'm also not familiar with arc unit, as we have a jenkins cluster running our unit tests without any integration to phabricator. I'm guessing when that task gets figured out the XML parsing code here can just be reused as needed by maven.

Sounds good to me. Thanks for looking.

Thought a bit more about the maven side - are you imagining some thing where running arc lint runs a maven task? arc lint โ†’ mvn basic-linting โ†’ report back to arc

In that case you would need a new ArcanistExternalLinter which would invoke maven, using the "flags" field to indicate which maven task to run. There's difficulty however in knowing how to parse the maven output -- since different tasks would have different outputs. Maybe something like:

"my-maven-lint": {
  "type": "maven",
  "include": "(\\.java$)",
  "bin": "maven",
  "flags": [ "maven-lint-task" ],
  "maven.task_type": "checkstyle"
}

Where the maven.task_type field references another ArcanistExternalLinter which is used to parse the output of the maven task.

Something similar could be done for gradle. We use gradle, but only for creating builds and not for linting or unit tests.

  • Updated ArcanistLinterTestCase to allow for specifying a filename to use in place of the testfile.lint-test used as input to the linter during test.
    • This is (unfortunately) needed as checkstyle will not report anything unless the file being linted has the .java extension.
  • Added test case for checkstyle
cspeckmim retitled this revision from [WIP] Adding checkstyle linter to Add checkstyle linter.Dec 4 2015, 7:09 AM
cspeckmim updated this object.
cspeckmim edited the test plan for this revision. (Show Details)

These changes are functionally complete and I have verified them according to the test plan - I removed the [WIP] flag from the title and I consider this now ready for review.

Going to add PMD as another java linter as well - which provides more typical linting capabilities compared to code-style of checkstyle.

My last commit was diff'd and reverted the initial checkstyle linter..

I wish I had more experience with git.

Need to add UT's for PMD

  • Adding CPD support to the PMD linter
  • Fixed lint issue with UT code

OK This is ready to go. I added PMD support as well, allowing for it's sub-command 'CPD' for copy-paste detection.

cspeckmim edited the test plan for this revision. (Show Details)

I'm finding some issues during verification

Fixed CPD to not throw an error when there's no copy/paste detected. Also adjusted the message to not spit out literal '\n'

cspeckmim retitled this revision from Add checkstyle linter to Add Java linters, checkstyle and PMD.Dec 8 2015, 7:09 AM
cspeckmim updated this object.
cspeckmim edited the test plan for this revision. (Show Details)

I apologize for the noise on this one. I've run through all the test cases I wrote up in the Test Plan field and everything appears to be working as expected.

src/lint/linter/ArcanistPMDLinter.php
18 โ†—(On Diff #35562)

Getting the command-line execution correct for PMD was a bit of a chore -- ArcanistExternalLinters checks against interpreter/binary made this a tad difficult to get right, but essentially the command looks like:

  1. java <- interpreter
  2. -Xmx1024m <- optional, interpreter.flags user provided
  3. -cp <- mandatory interpreter flags, causes the binary to be interpreted as a classpath, but in the case of PMD, the single jar is not sufficient (PMD consists of ~15 jars which need to be in the classpath) so we must override it later.
  4. pmd-core.jar <- binary
  5. -cp <- manadatory binary flags, this overrides the previous -cp argument
  6. {classpath} <- constructed listing of all jars in the same directory as binary (pmd-core.jar)
  7. {classname} <- the main entry-point, this is where we specify pmd vs. cpd
  8. -failOnViolation false <- by default pmd/cpd will return exit code of 4 if they find linting issues (this is also unfortunate... but at least they provide an option to change this)
  9. -format xml/--format xml <- specify the report to be XML
  10. -rulesets rulesets/java/naming.xml/--minimum-tokens 100 <- user-specified binary flags for configuring pmd/cpd
  11. -dir/--files <- user-specified binary flags. These specific ones are mandatory, however they must be last as the files-to-be-processed are passed into these arguments instead of taking as regular application arguments (another unfortunate).
104โ€“107 โ†—(On Diff #35562)

This was unfortunate -- the pmd command vs. the cpd command have different arguments to do the same thing, -format vs. --format both set output report format and both accept a single parameter (such as xml). Also -dir vs. --files both of which specify either directories/files to analyze (according to help either command will operate on both directories and files).

150 โ†—(On Diff #35562)

I'm not very familiar with this @$dom syntax in PHP so I confess I'm not sure I could explain why this is here (or probably needed). This was how the ArcanistCppcheckLinter did XML parsing -- I searched and found several XML parsing libraries seem to be available in PHP but this API is most familiar to me. Also PMD and checkstyle both seem to be using stringbuilding to make the XML instead of an XML API~

192โ€“197 โ†—(On Diff #35562)

I wasn't sure of what would be an appropriate way to handle this. The XML API is a little painful in this manner -- in java-land I was thinking

if ($violation instanceof DOMElementNode) {
  $first_child = ((DOMElementNode) $violation)->firstChild;
  if ($first_child instanceof DOMTextNode {
    $description = ((DOMTextNode) $first_child)->wholeText;
  }
}

Maybe doing a check against nodeType would be preferred?

223 โ†—(On Diff #35562)

The severity I map here is a little arbitrary. Is there a way I should have done this to expose the user's severity mapping?

256 โ†—(On Diff #35562)

I'm open for ideas on how best to report duplicated code like this. I was thinking something like:


Duplicated code detected:

./misc-utils/src/com/company/File1.java:177
./misc-utils/src/com/company/File1.java:293

Right now the paths reported are absolute and not relative. I wasn't sure if there was a good way to get the relative path, and particularly what it should be relative to (cwd probably?). Would there be a way that we could have links pointing to other files/lines? We have access to the code fragment that was detected as duplicate, but I didn't think it was necessary to put that in the message, particularly when the message is inlined.

262 โ†—(On Diff #35562)

This was fun to figure out. I seem to recall knowing about array's internal iterators eons ago when I wrote my own secure-robust-practical forum board in PHP but rediscovering it was enlightening.

318 โ†—(On Diff #35562)
src/lint/linter/__tests__/ArcanistCheckstyleLinterTestCase.php
8 โ†—(On Diff #35562)

Passing in the binary through the UT's config file thing didn't seem to work - I kept getting errors as it looked like it was failing on trying to find the default binary.

src/lint/linter/__tests__/ArcanistLinterTestCase.php
63 โ†—(On Diff #35562)

Unfortunately both checkstyle and pmd/cpd do absolutely nothing if the file extensions don't match as expected (or it might be just that way for java files). This allows for the UT lint-test file to specify what filename should be used when writing out the code, to be passed to the linter.

src/lint/linter/__tests__/checkstyle/google_checks.lint-test
1 โ†—(On Diff #35562)

Would it make sense to exclude lint-test files from Arcanist project linting?

src/lint/linter/__tests__/checkstyle/google_checks.xml
1 โ†—(On Diff #35562)

I wholly apologize for this file being included in the project, but it seemed necessary since the checkstyle UT relies on this to specify code-style to verify in order for the UT to pass. I suppose since we don't distribute the lint binaries that maybe this isn't necessary to be included and assume whomever is running the UT's will know to setup these linters appropriately for passing? This file does change per release of checkstyle, which is why I initially thought to add it.

from a quick run-down, I think simplify-source-classname should default to on.

from a quick run-down, I think simplify-source-classname should default to on.

I also use simplify classnames, but left the default to be explicit. I don't feel particularly strong one way about it -- mostly that I don't really know the extent to how users use it (would having colliding class names for checkstyle lint checks be common?).

I'll plan to change it soon from your feedback.

This is what it looks like without the simplify:

pasted_file (130ร—554 px, 16 KB)

i.e., the full message is there, and it's coming from the code (I assume). So even if someone is writing custom rules, and gives them the same class names for different issues (Which is pretty out there, imho), the long error would be different.

I haven't actually looked into this - maybe checkstyle exposes something else that would be a better name / short message?

The XML format doesn't have much else data -- here's the source that writes out the XML for an error. The source is the fully-qualified classname of the checker class.

@Override
public void addError(AuditEvent event) {
    if (event.getSeverityLevel() != SeverityLevel.IGNORE) {
        writer.print("<error" + " line=\"" + event.getLine() + "\"");
        if (event.getColumn() > 0) {
            writer.print(" column=\"" + event.getColumn() + "\"");
        }
        writer.print(" severity=\""
            + event.getSeverityLevel().getName()
            + "\"");
        writer.print(" message=\""
            + encode(event.getMessage())
            + "\"");
        writer.println(" source=\""
            + encode(event.getSourceName())
            + "\"/>");
    }
}

I just poked around at their other formatter (the default one) which does use this code, essentially same thing except also removes 'Check' from the beginning of the checker's classname.

/**
 * Returns check name without 'Check' suffix.
 * @param event audit ivent.
 * @return check name without 'Check' suffix.
 */
private static String getCheckShortName(AuditEvent event) {
    final String checkFullName = event.getSourceName();
    final String checkShortName;
    final int lastDotIndex = checkFullName.lastIndexOf('.');
    if (lastDotIndex == -1) {
        if (checkFullName.endsWith(SUFFIX)) {
            checkShortName = checkFullName.substring(0, checkFullName.lastIndexOf(SUFFIX));
        }
        else {
            checkShortName = checkFullName.substring(0, checkFullName.length());
        }
    }
    else {
        if (checkFullName.endsWith(SUFFIX)) {
            checkShortName = checkFullName.substring(lastDotIndex + 1,
                checkFullName.lastIndexOf(SUFFIX));
        }
        else {
            checkShortName = checkFullName.substring(lastDotIndex + 1, checkFullName.length());
        }
    }
    return checkShortName;
}

The only other way to change that would be to make a custom 'logger' or there may be way to register a custom AuditEventFormatter. Digging through the AuditEvent class it looks like the only thing it tracks is the Class<?> of the checker - so getting further detail on the source for a more friendly name is likely not easily doable without a lot of customization in checkstyle itself.

"CustomImportOrderCheck" is not that bad a name, and "Custom Import Order" (Caps to space, sans "Check") is only a little bit clearer (For Java programmers ๐Ÿ˜‰ ). I'm not sure any more effort is useful beyond the current simplify code.
Certainly modifying checkstyle or adding more classes to it is not very useful use of time...

I have an update for defaulting to simplifying class names as @avivey suggests* however the unit test is now failing. It looks like the paths resolved when running unit test are not generating correct path names anymore:

'java' '-jar' '/usr/local/bin/checkstyle-6.13-all.jar' '-f' 'xml' '-c' '/Users/cspeckrun/Projects/neandrake/phacility/arcanist/src/lint/linter/tests/checkstyle/google_checks.xml

Notice that it's using /linter/tests/ and not /linter/__tests__/. I don't really know right now why this is happening. I did verify when setting the path in the ArcanistCheckstyleLinterTest class that dirname(__FILE__) does properly resolve with the __tests__ so there's probably something else going on that I'm not sure where to look. The change I made is working appropriately however.


_* I actually changed the format of the config option as it felt weird to just swap the default -- in that if you want to affect the behavior you would set the setting to false. I changed it to instead be checkstyle.fully-qualify-sourcename which defaults to false.

When I took this code to my library, the checkstyle linter refused to analyze files not ending with .java, so the tests failed anyway (CS 6.4-ish, I think, even after adding .lint-test in the xml).

I'm not sure what the state of the world in T10038 is, but I think it's something like "Very little chance of upstreaming", so you can probably cheat a little in the test code if it will get things going.

I have no guess about the /__test__/ becoming /test/ thing. Do other linter tests run?

It might make sense to extract the Interpreter Flags to a separate diff, because it's more generally useful.

When I took this code to my library, the checkstyle linter refused to analyze files not ending with .java, so the tests failed anyway

I specifically had to modify the ArcanistLinterTestCase to write out the intermediate file using a different name because of this issue (which was also an issue for PMD) (see my inline comment: https://secure.phabricator.com/D14632#inline-49549).

The issue with tests vs. __test__ I'm not sure how to wiggle in order to get passing. I explicitly have it set a path with __tests__ and somewhere along the line it's getting translated to tests. Other UT's seem to run fine (I don't have some of the tools installed though so they skip).

It might make sense to extract the Interpreter Flags to a separate diff, because it's more generally useful.

I wish I had done this earlier, as my knowledge of using git to manipulate commits is extremely limited -- I think I'd just need to know how to rebase this change on top of another diff/commit. Additionally I've had some difficulty in the past working with dependent diffs with arc/phab, though it was with using mercurial.

OK - I extracted the relevant changes to a new revision for now: D15067

Will need to figure out how to update this diff to be dependent on it rather than include the changes itself. I think:

  1. Remove conflicting changes with new commit
  2. Rebase on top of other commit (unsure how)
  3. Update revision?
  1. Remove conflicting changes with new commit
  2. Rebase on top of other commit (unsure how)
  3. Update revision?

That should work (use git rebase -i for both 1 and 2).

You might want to squash everything - that might reduce the total number of conflicts to resolve.

Then arc diff <other commit> to update this one...

Inversing the checkstyle flag for simplifying sourcename and defaulting so that sourcenames are simplified.

@avivey - thanks for the tips I think I did it on the first try?! Well I'm not sure if the dependency relationship is correct, but I believe this D15067 is supposed to be a "dependency" and not a "dependent".

Still need to determine why the UT isn't resolving the right path to __tests__.

@avivey @cspeckmim Sorry to intrude, but I'm also interested in this functionality. I've read the comments but what needs to be done to move this forward? I assume getting D15067 landed? And I think the unit tests for this change need to be fixed?

@avivey @cspeckmim Sorry to intrude, but I'm also interested in this functionality. I've read the comments but what needs to be done to move this forward? I assume getting D15067 landed? And I think the unit tests for this change need to be fixed?

Yes those things need to happen. The changeset was working but then I worked on splitting out part of it into D15067 but haven't revisited this since. Part of the reason is that there's little chance this would be accepted upstream (T10038#150629), and I haven't yet had a chance to work within my organization to setup a fork of arcanist which all developers would install from, in order to utilize this functionality.

@cspeckmim So basically T5055 needs to be done before any linter can be added in a maintainable way? There is no recent activity in T5055 and since it is an architectural change it is really hard for the community to do since it would always require closely working with upstream and apparently they have other priorities. I guess there is not much else to do then to off-load the linting to the CI build engine then.

I guess there is not much else to do then to off-load the linting to the CI build engine then.

Unfortunately I think that's the current situation until they are funded to prioritize T5055.

I would be willing to chime in for T5055 however I doubt anyone will want to foot the entire bill for such a large architectural change. I also think that upstream is the one that would benefit the most from it since it enables people to contribute more easy. I mean we both clearly want to contribute by extending phabricator (you even made the contribution already) but are unable.

Until T5055, this change looks like it can live in the Community Resources page... The only modified file is a test-engine, so just bundling a copy of the test engine in there would just work.

It's far from ideal, yeah, but at least it's usable.

In any case, these lints can take a long time to run, probably too long for users to agree to run them, so running them in the CI isn't a bad choice.

The only modified file is a test-engine, so just bundling a copy of the test engine in there would just work.

@avivey - This was split into two diffs and is dependent on D15067, as it does require some changes to the ArcanistExternalLinter. Unfortunately not as simple as dropping in new linter files.

A workaround I did for this in my package looks like this:

  • A script embedded in the package
resources/java-jar.sh
#! /usr/bin/env bash
java -jar "$@"
  • Replace the interpreter with that:
ArcanistCheckstyleLinter.php
public function getDefaultInterpreter() {
  $lib = 'libname';
  $root = dirname(phutil_get_library_root($lib));
  return $root.'/resources/java-jar.sh';
}

This probably doesn't work on Windows, but most other things don't either.

Plus, because I didn't want to bundle the jar in the library's git repo, I did this:

1<?php
2
3final class GetDependenciesWorkflow extends ArcanistWorkflow {
4 public function getWorkflowName() {
5 return 'get-dependencies';
6 }
7
8 public function getCommandSynopses() {
9 return phutil_console_format(<<<EOTEXT
10 **get-dependencies**
11EOTEXT
12 );
13 }
14
15 public function getCommandHelp() {
16 return phutil_console_format(<<<EOTEXT
17 Download external resources you might need.
18EOTEXT
19 );
20 }
21
22 public function run() {
23 $console = PhutilConsole::getConsole();
24
25 $lib = 'libname';
26 $root = dirname(phutil_get_library_root($lib));
27
28 $working_copy = ArcanistWorkingCopyIdentity::newFromPath($root);
29 $configuration_manager = clone $this->getConfigurationManager();
30 $configuration_manager->setWorkingCopyIdentity($working_copy);
31 $repository = ArcanistRepositoryAPI::newAPIFromConfigurationManager(
32 $configuration_manager);
33
34
35 $dependencies = array(
36 array(
37 'filename' => 'checkstyle-6.14.1-all.jar',
38 'sha1' => '8e2c3a2bcef100c084e6ea80cc426b3443632d8c',
39 'uri' =>
40 'http://my.web.server/checkstyle-6.14.1.jar',
41 ),
42 );
43
44 chdir($root.'/resources/');
45 foreach ($dependencies as $dep) {
46 phutil_passthru('curl -o %s %s', $dep['filename'], $dep['uri']);
47 }
48 }
49
50}

@cspeckmim: Is there any chance this will be merged into master? (Sorry for joining in late)

@cspeckmim: Is there any chance this will be merged into master? (Sorry for joining in late)

As-is definitely not (caused some UT failures and probably needs more UT). I stopped working on it primarily as Phacility has communicated that they would no longer be taking new linters upstream, per T10038 and related tasks/discussions.

Hello!
I am getting this error:
Usage Exception: Configured lint engine "ArcanistPMDLinter" is not a subclass of "ArcanistLintEngine", but must be.
Can anybody help me?
Thank in advance.

This revision is not in the upsteam, you should not be applying it locally.

@chad But I need PMD linter in my local phabricator.
What should I do?

@pranshu1995 - you can manually run PMD, or set it up to IDEs like Eclipse. Currently it's not supported by arcanist.

@cspeckmim Okay! Thanks for replying.
Is there any support for some other static code analyzer for Java?