Page MenuHomePhabricator

Plans: Code coverage reporting
Open, NormalPublic

Description

  • See also PHI1816, which is roughly "all of this" but currently focuses on directory-level coverage aggregation.

See T13105. Previously, files in Diffusion showed a similar coverage column on the right. At time of writing, coverage support hasn't made it to DocumentEngine yet, although I intend to restore it soon.

T11372 / PHI181 want multiple columns of coverage information to support "checked/unchecked" annotations from a static analysis tool. This is conceptually reasonable, although I worry it is so exotic that we'd essentially be building a feature for exactly one install and one narrow use case. Offhand, it's hard to come up with other coverage-like annotations which are interesting and useful. Long ago, T1630 proposed complexity annotations (which might mark particular functions as "red" or "green" or something in between, depending on how complex it is) but I'm not very bought-in to this class of tools being especially useful and there's no current mana behind it. A somewhat-related request in PHI460 discusses an application for recording "ghost stories" about the code -- the mechanism is different, but the goal (warn engineers that they are touching risky code) is similar. I'm better aligned with this as an approach than with automated complexity.

PHI577 reports that this UI previously transposed "Not Executable" and "Not Covered" in tooltips.

These tasks discuss coverage reporting in arc:

They are all similar and likely fairly straightforward when arc unit gets revamped.

T9524 discusses a potentially substantial improvement to coverage (aggregating it at the directory level in Diffusion).

Currently, coverage is broadly clumsy to interact with. It can be updated in Diffusion with diffusion.updatecoverage and in Differential with harbormaster.sendmessage but these methods don't interact with one another or share anything.


I'm inclined to move gradually toward potentially making coverage modular. Although I don't find the "checked" or "complexity" use cases especially compelling, I don't think they're sufficiently exotic to warrant outright dismissal. Conceptually, support for multiple types of coverage-like information shouldn't be difficult, since a coverage-class is defined by a handful of simple rules (some labels and colors, and some rules for which coverage identifier wins when aggregating coverage). We may never get to a place where we formally support the "checked" or "complexity" cases, but I think modularity has little downside here and may put us in a better place even if we never implement any other types of coverage.

Event Timeline

epriestley created this task.

PHI1231 would like API access to coverage.

I'd like to do these first:

  • Support for multiple sets of coverage annotations (T11372). Although I am not in love with the use cases here, my concern is that they're low-value, not inherently bad. I'd mostly like to make sure the backend can store this data and the API won't need changes to read/write it; I'm less concerned about showing it in the UI or providing real support for it.
  • Backend changes for coverage aggregation (T9524).

These should put us in a position to build a stable read/write API for coverage.

The primary backend issue is the use of the the PhabricatorRepositoryBranch object. This is sort of a clumsy hack. A better API would be to push coverage information for a given path at a given commit and let Phabricator figure everything else out.

If we start with that assumption, we get an API which looks like this:

  • To write, push <repository, commit, path, coverageType, coverage>. (Possibly, we also have an API to delete old coverage, but this isn't critical.) This isn't quite a *.edit method, and should probably accept multiple writes.
  • To read, request <repository, refs/commits, paths, coverageTypes>. This can look like a *.search method.

One complicating concern here is that coverage is really multiple different datasets:

  • A test (or group of tests) may individually produce coverage, e.g. test X covers lines Y.
  • In the UI, we normally want to show aggregate coverage for all tests. We can produce this by combining individual coverage results.
  • T9524 wants a different type of aggregation (at the directory level). This would be inefficient to store line-by-line but can be stored as counts.
  • T11372 wants a different dimension of coverage, which might or might not have similar aggregation levels.

So maybe "coverage" is already several different datasets:

  1. Run coverage.
  2. Unified file-level coverage, aggregated from run coverage.
  3. Directory coverage.

The data is mostly just blobs of who-cares-what attached to paths at particular commits, but maybe all of this can just be expressed through coverage types? The trick is that we generally want to unify (1) directly into (2), then discard (1). We want to aggregate (2) into (3) without discarding it.

An issue with the <commit, path> aggregation is that directories like src/ will change frequently but we probably don't care much about aggregating their coverage all the time?

This whole problem is much easier if we do something like "only the HEAD of master can have coverage", and that's probably fine 95% of the time, but that's a real cheaty solution.

If we suppose:

  • An external script writes a bunch of run coverage data with the new "coverage write" API.
  • We verify it's properly formed and dump into the worker queue.
  • For each item we pull out of the queue, we build an object out of it and hand it the blob of data.
    • Since it's ephemeral run coverage, the blob of data works by loading the "aggregate coverage" row if it exists, merging into it, then dumping the original data.
    • Then it marks parent directories as dirty at that commit?
    • Once it's done, it queues tasks to do aggregation updates I guess?

Then the web UI can find the last touched commit and pull aggregate coverage for display.

The directory list is trickier for high-volume repositories and for directories with many files. I think we can get there, but this is a lot of whole lot of layers of daemons queueing other daemons and intermediate storage tables and locks.

PHI1816 would also like dimensional coverage: coverage by language; coverage by team/owner. This is perhaps worth considering in how coverage is aggregated.

Coverage by owner seems particularly tricky because one file may have many owners.

See PHI1997, which discusses things from the perspective of reporting coverage from a build system.

When building diffs/revisions, coverage of files not touched by change isn't relevant, and isn't likely to become relevant in the future. differential.changeset.search, implemented by T13605, allows builds to filter the set of files they're reporting coverage for and report a small coverage set.

In contrast, for commits, it likely makes sense to encourage clients to report coverage exhaustively. For example, if commit C just adds or removes tests, it may still affect coverage in files it did not touch directly. If clients report only coverage for modified files, the aggregate view of coverage in the repository will diverge from reality over time.

This implies a fair amount of cleverness on the aggregation side because this data will be 99%+ redundant.