Page MenuHomePhabricator

Code coverage reports from `arc unit` are sometimes inaccurate
Closed, ResolvedPublic

Description

I've found that sometimes my coverage reports are incorrect based on what appears to be some sort of idiosyncrasy in PHPUnit's --coverage-clover output, most notably when running arc diff against either a directory or a lot of files. Basically, the Clover coverage is giving false positives on which lines are not covered vs those which are unreachable. I'd wager this is some weird artifact of PHP7 being so new and some sort of static analysis being off, but that's beside the point.

From arc diff --detailed-coverage src/

Screen Shot 2016-02-22 at 5.41.51 PM.png (448×780 px, 55 KB)

Running on a single file isn't affected due to the way the coverage aggregation works; arc diff --detailed-coverage src/SignRequest.php

Screen Shot 2016-02-22 at 5.41.09 PM.png (524×808 px, 73 KB)

This patch fixes it for me, but I don't know if there are broader implications for other languages or test runners. If it looks OK I'll re-submit this as a diff. All it does is give preference to "not reachable" over "uncovered" when combining coverage reports.

diff --git a/src/unit/ArcanistUnitTestResult.php b/src/unit/ArcanistUnitTestResult.php
index a0dc2f7..b01b63c 100644
--- a/src/unit/ArcanistUnitTestResult.php
+++ b/src/unit/ArcanistUnitTestResult.php
@@ -137,6 +137,9 @@ final class ArcanistUnitTestResult extends Phobject {
         if ($more_coverage[$ii] == 'C') {
           $base[$ii] = 'C';
         }
+        if ($more_coverage[$ii] == 'N' && $base[$ii] = 'U') {
+          $base[$ii] = 'N';
+        }
       }

       // If a secondary report has more data, copy all of it over.

Event Timeline

In the very, very general case, I think it's undesirable to favor "N" over "U" because the reports could be coming from two different engines, only one of which knows how to execute the lines.

The most plausible example of this I can come up with is something like this:

#ifdef GIRAFFE
  do_giraffe_stuff();
#else
  do_no_giraffe_stuff();
#endif

In this scenario, when you run arc unit, it makes a --with-giraffe build and a --without-giraffe build and runs tests on both, and each call is "N" on one build but "U" on the other. Of these two, "U" is correct.

I'm just making this situation up (as far as I know, there are no real-world implications of your proposal today), but it seems sort-of plausible? While the only plausible situation I can immediately come up with where favoring "N" is better is the one here -- the coverage engine is buggy/inconsistent and sometimes decides reachability too broadly.

Maybe there's some way we can fix the behavior before it hits coverage aggregation, though.

That seems totally reasonable. It seems like a bit of a stretch for PHP code specifically, but sane as a general-purpose answer.

I'm playing with a different approach in the PHPUnit result parser that amounts to "if this test covered no lines in the file, just exclude it from the report", which I think will still usually exclude false-positive Uncovered marks from bad static analysis but still handle the theoretical situation you described above. Files with no coverage should still get marked as such during the aggregation at the end. And, of course, other languages will be totally unaffected.

Implementation of what's described above - also seems to provide the expected results in my case, and limits impact to PHPUnit results. If a file is executed at all, it should show up normally; when completely untouched (which tends to show up with phpunit whitelisting, now necessary in the newest versions).

diff --git a/src/unit/parser/ArcanistPhpunitTestResultParser.php b/src/unit/parser/ArcanistPhpunitTestResultParser.php
index 46e9c13..d4f7d1d 100644
--- a/src/unit/parser/ArcanistPhpunitTestResultParser.php
+++ b/src/unit/parser/ArcanistPhpunitTestResultParser.php
@@ -129,8 +129,10 @@ final class ArcanistPhpunitTestResultParser extends ArcanistTestResultParser {
       $line_count = count(file($class_path));

       $coverage = '';
+      $any_line_covered = false;
       $start_line = 1;
       $lines = $file->getElementsByTagName('line');
       for ($ii = 0; $ii < $lines->length; $ii++) {
         $line = $lines->item($ii);
         for (; $start_line < $line->getAttribute('num'); $start_line++) {
@@ -144,6 +146,7 @@ final class ArcanistPhpunitTestResultParser extends ArcanistTestResultParser {
             $coverage .= 'U';
           } else if ((int)$line->getAttribute('count') > 0) {
             $coverage .= 'C';
+            $any_line_covered = true;
           }
         }

@@ -153,10 +156,27 @@ final class ArcanistPhpunitTestResultParser extends ArcanistTestResultParser {
       for (; $start_line <= $line_count; $start_line++) {
         $coverage .= 'N';
       }

-      $len = strlen($this->projectRoot.DIRECTORY_SEPARATOR);
-      $class_path = substr($class_path, $len);
-      $reports[$class_path] = $coverage;
+      if ($any_line_covered) {
+        $len = strlen($this->projectRoot.DIRECTORY_SEPARATOR);
+        $class_path = substr($class_path, $len);
+        $reports[$class_path] = $coverage;
+      }
     }

     return $reports;

I also have a change that IMO simplifies the coverage report processing logic, which I feel is hard to read with the multi-pass for loop implementation:

+      $coverage = str_repeat('N', $line_count);
+      foreach ($lines as $line) {
+        if ($line->getAttribute('type') != 'stmt') {
+          continue;
+        }
+        if ((int)$line->getAttribute('count') > 0) {
+          $is_covered = 'C';
+          $any_line_covered = true;
+        } else {
+          $is_covered = 'U';
+        }
+        $line_no = (int)$line->getAttribute('num');
+        $coverage[$line_no-1] = $is_covered;
+      }

If either of these seem appropriate, I'll re-package this as a proper diff.

Sure, happy to bring those both upstream.