Improve efficiency of worker task GC for huge loads

Summary:
Fixes T9808.

An instance imported a very large repository, generating approximately 4 million tasks over the course of a few days. A week later, these tasks started expiring and became candidates for garbage collection.

The GC works by repeatedly deleting 100 rows at a time. It finds the rows it's going to delete by querying for old rows.

Currently, this generates a query of the form WHERE dateCreated < X ORDER BY id DESC. This query cannot execute efficiently using a single key, because it relies on dateCreated order to find the rows, but on id order to sort them. With a table of 4M rows, this is slow.
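
For illustration, the old query has roughly the following shape (the table name, the SELECT form, and the cutoff value are assumptions for this sketch; the column names are as above):

  -- Illustrative cutoff; the real value comes from the collector's
  -- retention policy.
  SET @cutoff = UNIX_TIMESTAMP() - (60 * 60 * 24 * 14);

  -- Approximate shape of the current query. MySQL can use a key on
  -- dateCreated to find old rows, or the primary key to produce id
  -- order, but not both at once, so at 4M rows it can end up
  -- examining or filesorting far more than the 100 rows it needs.
  SELECT id FROM worker_archivetask
    WHERE dateCreated < @cutoff
    ORDER BY id DESC
    LIMIT 100;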

This would still be OK, except that the query has to execute many times, since each pass deletes only 100 rows. Clearing roughly 4 million expired rows at 100 rows per query means it needs to execute a total of ~40K times.

Instead, generate a query of the form WHERE dateCreated < X ORDER BY dateCreated DESC, id DESC. This should have the same effect in general (and the GC definitely doesn't care about the difference), but it should be more efficient at large scales.
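
A sketch of the new query shape, under the same assumptions as above and additionally assuming a secondary index on dateCreated (InnoDB secondary indexes implicitly end with the primary key, so that index is effectively ordered by (dateCreated, id)):

  -- Approximate shape of the new query. Both the WHERE clause and the
  -- ORDER BY can now be satisfied by walking a (dateCreated, id) index
  -- backward, so the server only has to touch about 100 rows per
  -- execution regardless of table size.
  SELECT id FROM worker_archivetask
    WHERE dateCreated < @cutoff
    ORDER BY dateCreated DESC, id DESC
    LIMIT 100;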

Test Plan:
I had to TRUNCATE the problem table, so I no longer have a perfect repro to test this completely convincingly. Both queries behave fine at small scales, which is why we haven't seen this before.

I was able to run the newer query in production before I nuked the table, and it completed in a reasonable amount of time, while the old query hung for longer than I wanted to wait (several minutes?). The query plan for the new query was also good, while the query plan for the old query was terrible.

I loaded the daemon console and ran bin/garbage collect --collector worker.tasks --trace. I verified the queries looked reasonable and produced reasonable results in production.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9808

Differential Revision: https://secure.phabricator.com/D14505