Rebuilding CCs from Maniphest Transactions
Closed, Resolved, Public

Description

I had deleted a user account in Phabricator but had not removed all references to the user beforehand (in this particular case, I did want to delete the user as opposed to disabling the account).

Whilst trying to fix stale references to the deleted user, I accidentally modified the ccPHIDs field for all rows in the maniphest_task table. I figured that I could essentially rebuild the ccPHIDs field by replaying the relevant Maniphest transactions. To do this, I used the following script:

#!/usr/bin/env php
<?php

require_once(__DIR__ . '/scripts/__init_script__.php');

$task_table = new ManiphestTask();
$conn_w = $task_table->establishConnection('w');

// Load every CC transaction, oldest first, so later changes win.
$rows = queryfx_all(
    $conn_w,
    'SELECT * FROM maniphest_transaction WHERE transactionType = %s ORDER BY dateCreated ASC',
    'ccs'
);

// Apply all updates in a single database transaction.
$conn_w->openTransaction();

foreach ($rows as $row) {
    $row_id = $row['id'];
    $task_id = $row['taskID'];

    echo "Replaying transaction {$row_id} (T{$task_id})...\n";

    // Overwrite the task's CC list with the value this transaction recorded.
    queryfx(
        $conn_w,
        'UPDATE %T SET ccPHIDs = %s WHERE id = %d',
        $task_table->getTableName(),
        $row['newValue'],
        $task_id
    );
}

$conn_w->saveTransaction();
echo "Done.\n";
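
Because the loop above overwrites ccPHIDs with each row's newValue, only the last 'ccs' transaction per task ends up mattering. A minimal dry-run sketch of that reduction, which touches no database, might look like the following. The function name `final_cc_values` and the sample rows are hypothetical and exist only for illustration; they are not part of Phabricator.

```php
<?php

/**
 * Collapse an ordered list of transaction rows to one final value per task.
 * Later rows overwrite earlier ones, mirroring the replay loop.
 * (Hypothetical helper, not a Phabricator API.)
 */
function final_cc_values(array $rows) {
  $final = array();
  foreach ($rows as $row) {
    $final[$row['taskID']] = $row['newValue'];
  }
  return $final;
}

// Hypothetical sample rows, assumed to be sorted by dateCreated ascending.
$rows = array(
  array('id' => 1, 'taskID' => 10, 'newValue' => '["PHID-USER-a"]'),
  array('id' => 2, 'taskID' => 11, 'newValue' => '["PHID-USER-b"]'),
  array('id' => 3, 'taskID' => 10, 'newValue' => '["PHID-USER-a","PHID-USER-c"]'),
);

// Print what would be written for each task, without writing anything.
foreach (final_cc_values($rows) as $task_id => $value) {
  echo "T{$task_id} would get ccPHIDs = {$value}\n";
}
```

Reviewing this kind of output before running the real UPDATE statements is one way to sanity-check the result, and it reduces the work to one write per task.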

Is there a more general way to recover data by replaying transaction logs? Also, are there any gotchas that I have missed in my recovery script?

Event Timeline

joshuaspence assigned this task to epriestley.
joshuaspence raised the priority of this task from to Normal.
joshuaspence updated the task description.
joshuaspence added a subscriber: joshuaspence.

This seems OK to me. I think the one "gotcha" is that Herald currently adds CCs in a non-transactional way, so if Herald added CCs after the last transaction they would not be restored. This should be fixed when we get around to it (see T4484).

There's no code to replay transaction logs in the general case. I'd guess it would run into a lot of issues:

  • some transactions have side effects not captured directly in the transaction (e.g., some state changes in Differential are clear to humans, but implicit in the transaction record, and difficult to reconstruct);
  • the application-level "new" value for transactions and the storage level "new" value for transactions aren't always formatted the same way, and code doesn't exist to reverse this transformation currently;
  • the original object state immediately after creation isn't stored, so we don't have a starting point for playing transactions forward;
  • no code exists to rewind transactions, so we can't play them backward to get back to the starting point (this code would probably be hard or impossible to write for all the same reasons listed here, except the lack of a start state, and very difficult to maintain);
  • we keep data in very old transactions in good enough shape that we can show it, but don't always convert it so that it can still be applied when formats change.

You could probably get, like, 60% of the way there for some objects by rewinding all the transactions in reverse order (setting each field back to its oldValue) and then replaying them in forward order (setting each field to its newValue again), but I think it would be challenging to get working well enough to be useful.
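
The rewind-then-replay idea can be sketched on plain in-memory arrays. This is only an illustration under strong assumptions: each transaction touches exactly one field, and oldValue/newValue are stored in the same format the object uses, which the list above notes is not always true in practice. The function names and the transaction shape are hypothetical, not Phabricator APIs.

```php
<?php

// Walk the log backward, restoring each field to its recorded old value.
function rewind_transactions(array $state, array $xactions) {
  foreach (array_reverse($xactions) as $xaction) {
    $state[$xaction['field']] = $xaction['oldValue'];
  }
  return $state;
}

// Walk the log forward, applying each recorded new value.
function replay_transactions(array $state, array $xactions) {
  foreach ($xactions as $xaction) {
    $state[$xaction['field']] = $xaction['newValue'];
  }
  return $state;
}

// Hypothetical transaction log, oldest first.
$xactions = array(
  array('field' => 'status', 'oldValue' => 'open', 'newValue' => 'resolved'),
  array('field' => 'ccs', 'oldValue' => array(),
    'newValue' => array('PHID-USER-a')),
  array('field' => 'ccs', 'oldValue' => array('PHID-USER-a'),
    'newValue' => array('PHID-USER-a', 'PHID-USER-b')),
);

// Damaged current state: the ccs field was accidentally emptied.
$damaged = array('status' => 'resolved', 'ccs' => array());

$start = rewind_transactions($damaged, $xactions);
$repaired = replay_transactions($start, $xactions);

echo json_encode($repaired), "\n";
```

Fields that no transaction ever touched keep whatever value the damaged state had, and side effects that are not recorded in the log are not reconstructed, which is part of why this approach only gets partway there.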

Fair enough.

Thanks for the insight.