Changeset View
Standalone View
src/lint/linter/ArcanistPMDLinter.php
- This file was added.
<?php | |||||
/** | |||||
* This linter invokes PMD for linting java code. | |||||
* The PMD report is given over stdout as a simple XML document | |||||
* which maps fairly easily to ArcanistLintMessage. | |||||
* | |||||
* The manner of executing PMD is a little round-about in order | |||||
* to work around the internals of ArcanistExternalLinter. | |||||
* | |||||
* The binary is passed as a classpath argument to `java` then | |||||
* the binary arguments override it with a full classpath. | |||||
* This also enables the binary being used to build the full | |||||
* classpath since the PMD distribution uses multiple jars. | |||||
*/ | |||||
class ArcanistPMDLinter extends ArcanistExternalLinter { | |||||
private $subCommand = 'pmd'; | |||||
cspeckmim: Getting the command-line execution correct for PMD was a bit of a chore… | |||||
public function getInfoName() { | |||||
return 'Java PMD linter'; | |||||
} | |||||
public function getLinterName() { | |||||
return 'PMD'; | |||||
} | |||||
public function getInfoURI() { | |||||
return 'https://pmd.github.io/'; | |||||
} | |||||
public function getInfoDescription() { | |||||
return pht('Use `%s` to perform static analysis on Java code.', | |||||
'PMD'); | |||||
} | |||||
public function getLinterConfigurationName() { | |||||
return 'pmd'; | |||||
} | |||||
public function getLinterConfigurationOptions() { | |||||
$options = array( | |||||
'pmd.command' => array( | |||||
'type' => 'optional string', | |||||
'help' => pht( | |||||
'Specify the subcommand to run, defaults to `pmd`. '. | |||||
'Available subcommands are `pmd` or `cpd`. See the PMD '. | |||||
'documentation for more: '. | |||||
'https://pmd.github.io/pmd-5.4.1/usage/running.html'), | |||||
), | |||||
); | |||||
return $options + parent::getLinterConfigurationOptions(); | |||||
} | |||||
public function setLinterConfigurationValue($key, $value) { | |||||
switch ($key) { | |||||
case 'pmd.command': | |||||
$this->setSubCommand($value); | |||||
return; | |||||
} | |||||
return parent::setLinterConfigurationValue($key, $value); | |||||
} | |||||
public function setSubCommand($sub_command) { | |||||
$this->subCommand = $sub_command; | |||||
return $this; | |||||
} | |||||
public function getVersion() { | |||||
// The pmd executable doesn't appear to actually be able to print a version | |||||
// However if you run the binary, the help output gives examples that | |||||
// appear to display the version number in paths... | |||||
$interpreter = $this->getInterpreter(); | |||||
$classpath = $this->buildClasspath(); | |||||
$mainclass = $this->getSubCommandMainClass(); | |||||
// having the help printout for PMD returns with error code 4. | |||||
list($code, $stdout, $stderr) = exec_manual('%s -cp %s %s', | |||||
$interpreter, $classpath, $mainclass); | |||||
$matches = array(); | |||||
$regex = '/^\$ pmd-bin-(?P<version>\d+\.\d+\.\d+)/m'; | |||||
if (preg_match($regex, $stdout, $matches)) { | |||||
return $matches['version']; | |||||
} | |||||
return false; | |||||
} | |||||
public function getInstallInstructions() { | |||||
return pht('Ensure java is configured as interpreter and '. | |||||
'the pmd jar is configured as the binary. If you need to pass '. | |||||
'additional additional JVM arguments include them with the '. | |||||
'`interpreter` field. Use the `flags` field to configure pmd '. | |||||
'arguments, including the `-rulesets my_rulesets.xml`. '. | |||||
'You must pass the `-dir` argument as the last flag.'); | |||||
} | |||||
protected function getMandatoryFlags() { | |||||
$classpath = $this->buildClasspath(); | |||||
$subcommand_mainclass = $this->getSubCommandMainClass(); | |||||
return array( | |||||
'-cp', | |||||
$classpath, | |||||
$subcommand_mainclass, | |||||
'-failOnViolation', | |||||
'false', | |||||
Not Done Inline ActionsThis was unfortunate -- the pmd command vs. the cpd command have different arguments to do the same thing, -format vs. --format both set output report format and both accept a single parameter (such as xml). Also -dir vs. --files both of which specify either directories/files to analyze (according to help either command will operate on both directories and files). cspeckmim: This was unfortunate -- the `pmd` command vs. the `cpd` command have different arguments to do… | |||||
'-format', | |||||
'xml', | |||||
); | |||||
} | |||||
public function shouldExpectCommandErrors() { | |||||
return false; | |||||
} | |||||
public function shouldUseInterpreter() { | |||||
return true; | |||||
} | |||||
public function getDefaultBinary() { | |||||
return 'pmd-core.jar'; | |||||
} | |||||
public function getDefaultInterpreter() { | |||||
return 'java'; | |||||
} | |||||
protected function getMandatoryInterpreterFlags() { | |||||
// this causes the jar binary to be interpreted as classpath | |||||
// later to be overwritten with a full classpath during execution | |||||
return array('-cp'); | |||||
} | |||||
protected function parseLinterOutput($path, $err, $stdout, $stderr) { | |||||
$dom = new DOMDocument(); | |||||
$ok = @$dom->loadXML($stdout); | |||||
if (!$ok) { | |||||
return false; | |||||
} | |||||
$files = $dom->getElementsByTagName('file'); | |||||
$messages = array(); | |||||
foreach ($files as $file) { | |||||
$violations = $file->getElementsByTagName('violation'); | |||||
foreach ($violations as $violation) { | |||||
$message = new ArcanistLintMessage(); | |||||
$message->setPath($file->getAttribute('name')); | |||||
$message->setLine($violation->getAttribute('beginline')); | |||||
Not Done Inline ActionsI'm not very familiar with this @$dom syntax in PHP so I confess I'm not sure I could explain why this is here (or probably needed). This was how the ArcanistCppcheckLinter did XML parsing -- I searched and found several XML parsing libraries seem to be available in PHP but this API is most familiar to me. Also PMD and checkstyle both seem to be using stringbuilding to make the XML instead of an XML API~ cspeckmim: I'm not very familiar with this `@$dom` syntax in PHP so I confess I'm not sure I could explain… | |||||
$message->setCode($this->getLinterName()); | |||||
// include the ruleset and the rule | |||||
$message->setName($violation->getAttribute('ruleset'). | |||||
': '.$violation->getAttribute('rule')); | |||||
$description = ''; | |||||
if (property_exists($violation, 'firstChild')) { | |||||
$first_child = $violation->firstChild; | |||||
if (property_exists($first_child, 'wholeText')) { | |||||
$description = $first_child->wholeText; | |||||
} | |||||
} | |||||
// unescape the XML written out by pmd's XMLRenderer | |||||
if ($description) { | |||||
// these 4 characters use specific XML-escape codes | |||||
$description = str_replace( | |||||
['&', '"', '<', '>'], | |||||
['&', '"', '<', '>'], | |||||
$description); | |||||
// everything else is hex-code escaped | |||||
$escaped_chars = array(); | |||||
preg_replace_callback( | |||||
'/&#x(?P<hexcode>[a-f|A-F|0-9]+);/', | |||||
array($this, 'callbackReplaceMatchesWithHexcode'), | |||||
$description); | |||||
$message->setDescription($description); | |||||
} | |||||
$column = $violation->getAttribute('begincolumn'); | |||||
if ($column) { | |||||
$message->setChar($column); | |||||
} | |||||
$severity = $violation->getAttribute('priority'); | |||||
switch ($severity) { | |||||
case '1': | |||||
$message->setSeverity(ArcanistLintSeverity::SEVERITY_ERROR); | |||||
break; | |||||
case '2': | |||||
$message->setSeverity(ArcanistLintSeverity::SEVERITY_WARNING); | |||||
break; | |||||
case '3': | |||||
case '4': | |||||
Not Done Inline ActionsI wasn't sure of what would be an appropriate way to handle this. The XML API is a little painful in this manner -- in java-land I was thinking if ($violation instanceof DOMElementNode) { $first_child = ((DOMElementNode) $violation)->firstChild; if ($first_child instanceof DOMTextNode { $description = ((DOMTextNode) $first_child)->wholeText; } } Maybe doing a check against nodeType would be preferred? cspeckmim: I wasn't sure of what would be an appropriate way to handle this. The XML API is a little… | |||||
case '5': | |||||
$message->setSeverity(ArcanistLintSeverity::SEVERITY_ADVICE); | |||||
break; | |||||
// The above five are the only valid PMD priorities, | |||||
// this is for completion as well as preparing for future severities | |||||
default: | |||||
$message->setSeverity(ArcanistLintSeverity::SEVERITY_WARNING); | |||||
break; | |||||
} | |||||
$messages[] = $message; | |||||
} | |||||
} | |||||
return $messages; | |||||
} | |||||
private function buildClasspath() { | |||||
$jar_files = array(); | |||||
$lib_path = Filesystem::resolvePath($this->getBinary()); | |||||
$lib_path = substr($lib_path, 0, strrpos($lib_path, DIRECTORY_SEPARATOR)); | |||||
$lib_path_contents = Filesystem::listDirectory($lib_path); | |||||
foreach ($lib_path_contents as $lib_path_file) { | |||||
if (preg_match('/\.jar$/', $lib_path_file)) { | |||||
Not Done Inline ActionsThe severity I map here is a little arbitrary. Is there a way I should have done this to expose the user's severity mapping? cspeckmim: The severity I map here is a little arbitrary. Is there a way I should have done this to expose… | |||||
$jar_files[] = $lib_path.DIRECTORY_SEPARATOR.$lib_path_file; | |||||
} | |||||
} | |||||
return implode(PATH_SEPARATOR, $jar_files); | |||||
} | |||||
private function getSubCommandMainClass() { | |||||
$command_mainclass = ''; | |||||
switch ($this->subCommand) { | |||||
case 'pmd': | |||||
$command_mainclass = 'net.sourceforge.pmd.PMD'; | |||||
break; | |||||
case 'cpd': | |||||
$command_mainclass = 'net.sourceforge.pmd.cpd.CPD'; | |||||
break; | |||||
} | |||||
return $command_mainclass; | |||||
} | |||||
private function callbackReplaceMatchesWithHexcode($matches) { | |||||
return $this->convertHexToBin($matches['hexcode']); | |||||
} | |||||
/** | |||||
* This is a replacement for hex2bin() which is only available in PHP 5.4+. | |||||
* Returns the ascii interpretation of a given hexadecimal string. | |||||
* | |||||
* @param $str string The hexadecimal string to interpret | |||||
* | |||||
* @return string The string of characters represented by the given hex codes | |||||
*/ | |||||
private function convertHexToBin($str) { | |||||
$sbin = ''; | |||||
Not Done Inline ActionsI'm open for ideas on how best to report duplicated code like this. I was thinking something like: Duplicated code detected: ./misc-utils/src/com/company/File1.java:177 ./misc-utils/src/com/company/File1.java:293 Right now the paths reported are absolute and not relative. I wasn't sure if there was a good way to get the relative path, and particularly what it should be relative to (cwd probably?). Would there be a way that we could have links pointing to other files/lines? We have access to the code fragment that was detected as duplicate, but I didn't think it was necessary to put that in the message, particularly when the message is inlined. cspeckmim: I'm open for ideas on how best to report duplicated code like this. I was thinking something… | |||||
$len = strlen($str); | |||||
for ($i = 0; $i < $len; $i += 2) { | |||||
$sbin .= pack('H*', substr($str, $i, 2)); | |||||
} | |||||
return $sbin; | |||||
} | |||||
Not Done Inline ActionsThis was fun to figure out. I seem to recall knowing about array's internal iterators eons ago when I wrote my own secure-robust-practical forum board in PHP but rediscovering it was enlightening. cspeckmim: This was fun to figure out. I seem to recall knowing about array's internal iterators eons ago… | |||||
} | |||||
Not Done Inline ActionsI borrowed this from cspeckmim: I borrowed this from
http://php.net/manual/en/function.hex2bin.php#113057 |
Getting the command-line execution correct for PMD was a bit of a chore -- ArcanistExternalLinters checks against interpreter/binary made this a tad difficult to get right, but essentially the command looks like: