Purge "Unknown Object"
Closed, Resolved, Public

Description

Basically, the way that we had set up our repositories was messy (both because they were added with an ugly callsign and because they were created by inserting rows directly into the DB), so I deleted the repositories (./bin/repository delete) and re-added them. However, there are now lots of "Unknown Object"s where a commit was previously linked.

Is there an easy way that I can purge these unknown objects?

Admittedly, our version of Phabricator is a bit old (maybe 6 months). I am upgrading it soon.

{F136765}

Event Timeline

joshuaspence assigned this task to epriestley.
joshuaspence raised the priority of this task from to Normal.
joshuaspence updated the task description.
joshuaspence added a subscriber: joshuaspence.

This is a hard problem in the general case because references to an object can end up anywhere (in extreme cases, in JSON fields, in log files, in email, in remote systems, etc), and it's not hugely practical to keep track of them all or clean them all up. So we make a (hopefully) reasonable effort and then try to make sure nothing disastrously bad happens when we miss things.

That said, we can do a much better job than we do now in making that reasonable effort. In particular, we can clean up edges (shown here) and transactions automatically in most cases.

Some specific issues here:

  • I'd like to move most destroy-by-name operations into a new bin/destroy or similar, which just takes any object name. We have the technical ability to destroy most objects, but it's spread across different scripts or, in some cases, has no user-facing UI.
  • When destroying objects with PHIDs, we should automatically destroy all edges.
  • When destroying objects with transactions, we should automatically destroy transactions.
  • When destroying objects with custom fields, we should automatically destroy all custom field data.
  • Possibly, we should implement some sort of bin/destroy clean which iterates over common tables with foreign PHIDs (like edge) and resolves the PHIDs, deleting rows which reference unresolvable objects (and maybe restoring rows with half the edge missing). This could repair this kind of issue.

I wouldn't expect updating to affect this, since it's intentional that we show these kinds of references in the UI when they exist (I think they make things less confusing in many cases, compared to having things ninja-vanish -- it's also way easier for us to support issues if the report is "my thing says unknown object", which has one fairly narrow cause, than if the report is "my thing isn't there at all", which could be anything).

For your purposes, if you want to get rid of those rows, you can either wait for bin/destroy clean or write an approximation of it. Roughly, it should look like this for your case:

$conn_w = id(new DifferentialRevision())->establishConnection('w');
$rows = new LiskRawMigrationIterator($conn_w, 'edge');
$viewer = PhabricatorUser::getOmnipotentUser();

foreach ($rows as $row) {
  $src = $row['src'];
  $type = $row['type'];
  $dst = $row['dst'];
  
  $dst_object = id(new PhabricatorObjectQuery())
    ->setViewer($viewer)
    ->withPHIDs(array($dst))
    ->executeOne();

  if (!$dst_object) {
    $editor = id(new PhabricatorEdgeEditor())
      ->setActor($viewer)
      ->removeEdge($src, $type, $dst);

    // This will permanently destroy data if uncommented.
    // $editor->save();
  }
}

Basically:

  • Iterate over all the edges in Differential. These are <src_phid, type, dst_phid> tuples which connect objects together with some type relationship, like "the src is a revision, and the dst is a commit which closes the revision".
  • For each edge, load the destination object. If it doesn't exist (for example, it's a commit that has been deleted)...
  • ...remove that edge from the database.

No promises that code actually works, but it might give you a reasonable shot at fixing this before we can build bin/destroy clean or some other more general solution.

Ok, so I used the script you provided but modified it slightly to perform a dry run. However, I am getting the following error:

[2014-04-02 07:52:08] EXCEPTION: (AphrontQuerySchemaException) #1054: Unknown column 'id' in 'where clause' at [/data/www/libphutil/src/aphront/storage/connection/mysql/AphrontMySQLDatabaseConnectionBase.php:308]
  #0 AphrontMySQLDatabaseConnectionBase::throwQueryCodeException(1054, Unknown column 'id' in 'where clause') called at [/data/www/libphutil/src/aphront/storage/connection/mysql/AphrontMySQLDatabaseConnectionBase.php:278]
  #1 AphrontMySQLDatabaseConnectionBase::throwQueryException(Resource id #220) called at [/data/www/libphutil/src/aphront/storage/connection/mysql/AphrontMySQLDatabaseConnectionBase.php:184]
  #2 AphrontMySQLDatabaseConnectionBase::executeRawQuery(SELECT * FROM `edge` WHERE `id` > 0 ORDER BY ID ASC LIMIT 100) called at [/data/www/libphutil/src/xsprintf/queryfx.php:9]
  #3 queryfx(Object AphrontMySQLDatabaseConnection, SELECT * FROM %T WHERE %C > %d ORDER BY ID ASC LIMIT %d, edge, id, 0, 100)
  #4 call_user_func_array(queryfx, Array of size 6 starting with: { 0 => Object AphrontMySQLDatabaseConnection }) called at [/data/www/libphutil/src/xsprintf/queryfx.php:25]
  #5 queryfx_all(Object AphrontMySQLDatabaseConnection, SELECT * FROM %T WHERE %C > %d ORDER BY ID ASC LIMIT %d, edge, id, 0, 100) called at [/data/www/phabricator/src/infrastructure/storage/lisk/LiskRawMigrationIterator.php:30]
  #6 LiskRawMigrationIterator::loadPage() called at [/data/www/libphutil/src/utils/PhutilBufferedIterator.php:131]
  #7 PhutilBufferedIterator::next() called at [/data/www/libphutil/src/utils/PhutilBufferedIterator.php:86]
  #8 PhutilBufferedIterator::rewind() called at [/data/www/phabricator/fix-edges.php:12]

Any thoughts?

The iterator pages through rows by an id column, which the edge table doesn't have. If the table isn't too large to fit in memory, you can do this:

- $rows = new LiskRawMigrationIterator($conn_w, 'edge');
+ $rows = queryfx_all($conn_w, 'SELECT * FROM edge');

Ok, so the script works now, and the edge gets deleted. But the "Unknown Object (Commit)" still appears in the differential.

It seems that the commit still exists in the phabricator_differential.differential_commit table.

Oh, sorry! That's not actually an edge, it's a weird table which predates edges. You should be able to use the same script, but:

  • Iterate over differential_commit instead.
  • Check the existence of commitPHID.
  • If it doesn't exist, delete all rows with that commitPHID.
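
The three steps above can be sketched roughly as follows. This is an untested approximation, not a supported tool: it assumes the script lives in the Phabricator root (so the `__init_script__.php` path resolves), that `differential_commit` is small enough to load in one query, and it introduces a hypothetical `$dry_run` flag that is not part of Phabricator. The connection, viewer, and object query are the same ones used in the edge script earlier in this thread.

```php
<?php
// Assumption: this file sits in the Phabricator root, next to scripts/.
require_once dirname(__FILE__).'/scripts/__init_script__.php';

// Hypothetical flag: leave as true to only print what would be deleted.
$dry_run = true;

$conn_w = id(new DifferentialRevision())->establishConnection('w');
$viewer = PhabricatorUser::getOmnipotentUser();

// Load each distinct commit PHID once. Assumes the table fits in memory.
$rows = queryfx_all(
  $conn_w,
  'SELECT DISTINCT commitPHID FROM %T',
  'differential_commit');

foreach ($rows as $row) {
  $commit_phid = $row['commitPHID'];

  // Try to resolve the PHID to a real object.
  $commit = id(new PhabricatorObjectQuery())
    ->setViewer($viewer)
    ->withPHIDs(array($commit_phid))
    ->executeOne();

  if ($commit) {
    // The commit still exists, so leave its rows alone.
    continue;
  }

  echo "Orphaned commit PHID: {$commit_phid}\n";

  if (!$dry_run) {
    // This permanently destroys data.
    queryfx(
      $conn_w,
      'DELETE FROM %T WHERE commitPHID = %s',
      'differential_commit',
      $commit_phid);
  }
}
```

Run it once with `$dry_run = true` and check the printed PHIDs against what you expect before flipping the flag, and take a database backup first.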

Ok sure.

Is there some sort of "editor" class to do that (similar to PhabricatorEdgeEditor), or would I have to execute raw SQL?

You can safely use raw SQL. With edges, there's a second half to the edge that the editor can clean up, but this table doesn't have any weird magic.

Awesome, this seems to work. Thanks!

If they aren't hurting anything, you can just leave them -- we might be able to clean them up more intelligently in the future. If they do start hurting something, you should be able to apply similar techniques to the above.

Ok, thanks for your help.