Page MenuHomePhabricator

Worker task table has some remaining awkward keys
Closed, ResolvedPublic

Description

See IRC.

  • With 1M tasks in queue, the query to pull a task off the top of the stack turns into a garbage mess full of tablescans.
  • Some of the keying assumes more failed tasks than queued tasks. This is technically true on a normal install (perhaps dozens of failed tasks, ~0 queued tasks) but the scale is irrelevant. All realistic installs with more than 100K rows should have 99% of them in queue.
  • The keys on this table are also a mess.

Event Timeline

epriestley updated the task description. (Show Details)
epriestley raised the priority of this task from to High.
epriestley claimed this task.

All realistic installs with more than 100K rows should have 99% of them in queue.

We also don't even use this key when selecting already-failed tasks.

D10895 appears to be a sufficient fix. I want to adjust some other keys on this table eventually so I'm going to leave this open, but I'll downgrade the priority once the dust settles.

Just to confirm I'm not crazy, here's some supporting documentation for "mixing ASC + DESC makes the key unusable":

http://explainextended.com/2010/11/02/mixed-ascdesc-sorting-in-mysql/

Here's a blog post covering exactly this problem (priority column + id column in a queue) and arriving at the same conclusion and solution:

http://beerpla.net/2009/03/18/mysql-indexing-considerations-of-implementing-a-priority-field-in-your-application/

epriestley renamed this task from Selection query on task table is garbage with many unlocked tasks to Worker task table has some remaining awkward keys.Nov 24 2014, 8:16 PM
epriestley lowered the priority of this task from High to Low.

Let us know if you're still seeing issues with large queues after those patches, but I think the meat of this issue is fixed.

epriestley moved this task from Backlog to Availability on the Daemons board.Apr 8 2016, 9:45 PM
epriestley moved this task from Availability to vNext on the Daemons board.Feb 21 2017, 12:37 AM

Both tables have this key:

  • key_object <objectPHID>

This key is useful to find tasks related to a particular object, and correct as-is.


The "Archive" table has these keys:

  • dateCreated <dateCreated>
  • leaseOwner <leaseOwner, priority, id>
  • key_modified <dateModified>

We use key_modified to build the "Recently Completed Tasks" panel.
We use the dateCreated key to GC the table.
I can't immediately identify anything that hits the leaseOwner query in the "Archive" table.


The "Active" table has these keys:

  • dataID <dataID>
  • taskClass <taskClass>
  • leaseExpires <leaseExpires>
  • leaseOwner <leaseOwner(16)>
  • key_failuretime <failureTime>
  • leaseOwner2 <leaseOwner, priority, id>

The dataID key is there to enforce a unique constraint. However, that's pointless and this table is high-volume. The key isn't interesting otherwise, and isn't useful to drive queries. It should probably be removed.
The taskClass key drives the "Queued Tasks" panel.
The leaseExpires key drives getting tasks with expired leases run again.
The leaseOwner key is a subset of leaseOwner_2 and should be removed.
The key_failuretime key drives the "failures" row in the daemon console. This could be extracted from the table but is reasonable for now.
The leaseOwner_2 key has a silly legacy name and should ideally be renamed.


Upshot:

  • Drop archive key leaseOwner.
  • Drop active key dataID.
  • Drop active key leaseOwner.
  • Rename active key leaseOwner_2 to key_owner or similar.