Improve performance of bulk PHID assignment, particularly for "phabricator:20210215.changeset.02.phid-populate.php"
Closed, Resolved, Public


See PHI2003. An install with several million changesets reports uncomfortably slow application of phabricator:20210215.changeset.02.phid-populate.php.

The upstream apply rate for this patch was 153s for 347,385 changesets, i.e. about 2,270 changesets/second, or roughly 7.3 minutes (440s) per million changesets.

Improving the performance of this patch is a bit tricky because bulk update queries aren't trivial in MySQL. You can't INSERT ... VALUES ... ON DUPLICATE KEY UPDATE ... VALUES(...) without providing explicit values for every column that has no default value, even when every row already exists and only the UPDATE branch would run:

mysql> INSERT INTO differential_changeset (id, phid) VALUES (1, 'aaa') ON DUPLICATE KEY UPDATE phid = VALUES(phid);
ERROR 1364 (HY000): Field 'diffID' doesn't have a default value

Since this is running inside the migration context, it's fine to use a temporary table and update with a JOIN.
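
A minimal sketch of the temporary-table approach, for illustration only and not necessarily what the final patch does. The table name tmp_changeset_phid, the column types, and the literal PHID values are assumptions; only differential_changeset, id, and phid come from the task above.

```sql
-- Hypothetical staging table; the name and column types are assumed.
CREATE TEMPORARY TABLE tmp_changeset_phid (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  phid VARBINARY(64) NOT NULL
);

-- Load one batch of (id, phid) pairs with a single multi-row INSERT.
-- Only the staging table's own columns are needed, so the
-- "Field 'diffID' doesn't have a default value" error does not arise.
INSERT INTO tmp_changeset_phid (id, phid) VALUES
  (1, 'aaa'),
  (2, 'bbb'),
  (3, 'ccc');

-- Copy the PHIDs onto the real rows in one statement.
UPDATE differential_changeset c
  JOIN tmp_changeset_phid t ON t.id = c.id
  SET c.phid = t.phid;

-- Drop the staging table (or TRUNCATE it and reuse it for the next batch).
DROP TEMPORARY TABLE tmp_changeset_phid;
```

The idea is to replace one UPDATE per row with one INSERT and one UPDATE per batch. The batch size would need tuning against the install's packet and lock limits.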