Page MenuHomePhabricator

Allow revisions to be queried by affected path via the API
Closed, ResolvedPublic

Description

The "affected paths" index, which allows open revisions to be queried by which paths they affect, has been updated:

  • The frozen API method differential.query no longer supports the paths parameter (and will raise an error if it is provided).
  • The modern API method differential.revision.search now supports an affectedPaths constraint. Provide a list of paths:

Screen Shot 2021-03-15 at 4.01.12 PM.png (1×1 px, 408 KB)

It may be useful to use the repositoryPHIDs constraint alongside the affectedPaths constraint. If you do not, queries like /example.txt will find revisions affecting /example.txt in any repository. With both constraints, querying will be able to select a more efficient query plan, although the affectedPaths constraint will also work on its own.

Affected Path Index

With phabricator.developer-mode enabled, you can now inspect the affected path index directly:

Screen Shot 2021-03-15 at 1.28.40 PM.png (180×368 px, 16 KB)

Screen Shot 2021-03-15 at 1.28.50 PM.png (778×1 px, 140 KB)

These changes fixed a couple of bugs and limitations with the index. In particular:

  • If you edited the "Repository" of a revision, the index wasn't rebuilt properly (it still recorded the revision as affecting paths in the old repository).
  • If you created a revision without a "Repository", the index wasn't built at all (but there was also no way to query it).

The index is now always built, and always rebuilt when the repository is changed. This index update is not retroactive. You can optionally run this script if you'd like to fix existing indexes:

update-affected-path-index.php
<?php

require_once 'scripts/init/init-script.php';

$viewer = PhabricatorUser::getOmnipotentUser();

$query = id(new DifferentialRevisionQuery())
  ->setViewer($viewer)
  ->needActiveDiffs(true)
  ->setOrderVector(array('-id'));

$iterator = new PhabricatorQueryIterator($query);
foreach ($iterator as $revision) {
  try {
    $active_diff = $revision->getActiveDiff();

    $diff = id(new DifferentialDiffQuery())
      ->withPHIDs(array($active_diff->getPHID()))
      ->setViewer($viewer)
      ->needChangesets(true)
      ->executeOne();

    id(new DifferentialAffectedPathEngine())
      ->setRevision($revision)
      ->setDiff($diff)
      ->updateAffectedPaths();

    echo tsprintf(
      "%s\n",
      pht(
        'Reindexed %s.',
        $revision->getMonogram()));
  } catch (Exception $ex) {
    echo tsprintf(
      "%s\n",
      pht(
        'Exception while indexing %s: %s',
        $revision->getMonogram(),
        $ex->getMessage()));
  }
}

echo tsprintf(
  "%s\n",
  pht('Done.'));
phabricator/ $ php -f update-affected-path-index.php

This script is safe to run while Phabricator is online. There is probably little reason to bother with this unless you plan to run a bunch of historical queries, or just like tidying up.

Limitations

There is currently no way to search for revisions that affect /example.txt but do not have a repository. This probably isn't terribly useful. (As a workaround, you could query for revisions affecting /example.txt, then filter them on the client.)


Original Request

See PHI2025. An install would like to query revisions by affected path, as Diffusion does to generate the "Recently Open Revisions" panel (which is a bit oddly named?).

DifferentialRevisionQuery currently has withPath(repository_id, path_id), which may be called multiple times. It is called by:

  • Diffusion, to populate "Recently Open Revisions".
  • Differential, to populate the "Recent Similar Revisions" panel.
  • differential.query supports an explicit "paths" constraint.

The DifferentialAffectedPath table stores repositoryID, pathID, revisionID, and has a key on <repositoryID, pathID, epoch>.

epoch seems like it's useless; nothing ever queries it.

The <repositoryID, ...> would almost certainly be better as <pathID, ...>. This table is never queried with a repository ID and no path ID.

This would be easier to query with repositoryPHID than repositoryID, but that's probably not worth the effort of migration.

Upshot:

  • Rename "Recently Open Revisions" to "Recent Open Revisions".
  • Change the ->withPath(repository_id, path_id) API to ->withPaths(array $list_of_path_strings).
  • Drop the <repositoryID, pathID, epoch> key.
  • Drop the epoch column.
  • Add a <pathID, repositoryID> key.
  • Expose a constraint.

The withRepositoryPHIDs(...) + withPaths(...) API is technically less powerful than withPath(repository_id, path_id). For example, it can not, in a single query, select:

(revisions which affect /a/b/c.txt in repository A) unioned with (revisions which affect /x/y/z.txt in repository B)

There's likely no use case for this and any reasonable caller can almost certainly execute two queries.

Event Timeline

epriestley triaged this task as Wishlist priority.Mar 14 2021, 7:50 PM
epriestley created this task.

I'm likely to break the paths constraint for frozen API method differential.query. This was added by D2788 as a Facebook-specific patch with no rationale that I can dig up, and the author didn't have any related changes from around that time.

I believe arc has never used this constraint and it likely has had zero callers in the wild for many years.

epriestley updated the task description. (Show Details)
epriestley updated the task description. (Show Details)

This is now in stable; presuming it works until evidence to the contrary emerges.