Worker task table has some remaining awkward keys
Closed, ResolvedPublic

Assigned To

Authored By

	epriestley
	Nov 23 2014, 6:48 PM

Description

See IRC.

With 1M tasks in queue, the query to pull a task off the top of the stack turns into a garbage mess full of tablescans.
Some of the keying assumes more failed tasks than queued tasks. This is technically true on a normal install (perhaps dozens of failed tasks, ~0 queued tasks), but at that scale the keying is irrelevant. All realistic installs with more than 100K rows should have 99% of them in queue.
The keys on this table are also a mess.

Revisions and Commits

rP Phabricator
	D20175	rP4b10bc2b643d Correct schema irregularities (including weird keys) with worker task tables
	D10895	rPb5f7e9eec60a Reverse meaning of task priority column
	D10894	rP7e1c31218395 Add `bin/worker flood`, for flooding the task queue with work

Related Objects

Mentioned In: T13253: Plans: Daemon Status Reporting
T10756: Make daemons work correctly no matter where they are or how many copies are running
T6768: Worker queue lease names are unwieldy and could be better implemented
T6667: Destroying a repository does not remove the commit worker tasks from the queue
Mentioned Here: D10895: Reverse meaning of task priority column

Event Timeline

epriestley created this task.Nov 23 2014, 6:48 PM

epriestley claimed this task.

epriestley raised the priority of this task from to High.

epriestley updated the task description.

epriestley added projects: High Support Impact, Wikimedia.

epriestley added subscribers: epriestley, • chasemp, 20after4.

epriestley added a revision: D10894: Add `bin/worker flood`, for flooding the task queue with work.Nov 23 2014, 7:10 PM

"All realistic installs with more than 100K rows should have 99% of them in queue."

We also don't even use this key when selecting already-failed tasks.

epriestley added a revision: D10895: Reverse meaning of task priority column.Nov 23 2014, 7:34 PM

D10895 appears to be a sufficient fix. I want to adjust some other keys on this table eventually so I'm going to leave this open, but I'll downgrade the priority once the dust settles.

Just to confirm I'm not crazy, here's some supporting documentation for "mixing ASC + DESC makes the key unusable":

http://explainextended.com/2010/11/02/mixed-ascdesc-sorting-in-mysql/

Here's a blog post covering exactly this problem (priority column + id column in a queue) and arriving at the same conclusion and solution:

http://beerpla.net/2009/03/18/mysql-indexing-considerations-of-implementing-a-priority-field-in-your-application/
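
The effect can be reproduced in miniature. The sketch below is only an illustration, not the Phabricator schema or query: it uses SQLite (via Python's `sqlite3`) as a stand-in for MySQL, and the table name `task`, the index name `key_owner`, and the helpers `plan_for` and `needs_sort` are all hypothetical. It builds a `<leaseOwner, priority, id>` key and asks the planner whether a "pull one unleased task" query needs a separate sort step for a given `ORDER BY`.

```python
import sqlite3


def plan_for(order_by: str) -> list:
    """Return the query-plan detail lines for a 'pull one task' query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for the active task table.
    conn.execute(
        "CREATE TABLE task ("
        "id INTEGER NOT NULL, leaseOwner TEXT, priority INTEGER NOT NULL)"
    )
    # Same column order as the <leaseOwner, priority, id> key; all ascending.
    conn.execute("CREATE INDEX key_owner ON task (leaseOwner, priority, id)")
    conn.executemany(
        "INSERT INTO task (id, leaseOwner, priority) VALUES (?, NULL, ?)",
        [(i, i % 4) for i in range(1000)],
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM task WHERE leaseOwner IS NULL "
        "ORDER BY " + order_by + " LIMIT 1"
    ).fetchall()
    conn.close()
    # The human-readable detail text is the last column of each plan row.
    return [row[-1] for row in rows]


def needs_sort(order_by: str) -> bool:
    """True if the planner adds a temporary B-tree to satisfy ORDER BY."""
    return any("TEMP B-TREE" in detail for detail in plan_for(order_by))


if __name__ == "__main__":
    for order in ("priority ASC, id ASC",
                  "priority DESC, id DESC",
                  "priority DESC, id ASC"):
        print(order, "-> extra sort:", needs_sort(order))
```

With a uniform direction the key can be walked forwards or backwards and the first matching row returned immediately. With mixed directions the planner should report an extra sort over the matching rows, which is the expensive part when nearly every row is queued.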

aklapper added a subscriber: aklapper.Nov 23 2014, 7:43 PM

epriestley added a commit: rP7e1c31218395: Add `bin/worker flood`, for flooding the task queue with work.Nov 24 2014, 7:10 PM

epriestley added a commit: rPb5f7e9eec60a: Reverse meaning of task priority column.

Let us know if you're still seeing issues with large queues after those patches, but I think the meat of this issue is fixed.

epriestley mentioned this in T6667: Destroying a repository does not remove the commit worker tasks from the queue.Dec 1 2014, 11:50 AM

epriestley mentioned this in T6768: Worker queue lease names are unwieldy and could be better implemented.Dec 16 2014, 11:21 PM

epriestley moved this task from Backlog to Availability on the Daemons board.Apr 8 2016, 9:45 PM

Herald added a subscriber: eadler.Apr 8 2016, 9:45 PM

epriestley mentioned this in T10756: Make daemons work correctly no matter where they are or how many copies are running.Apr 8 2016, 9:46 PM

epriestley moved this task from Availability to vNext on the Daemons board.Feb 21 2017, 12:37 AM

joshuaspence added a subscriber: joshuaspence.Feb 21 2017, 8:26 AM

epriestley mentioned this in T13253: Plans: Daemon Status Reporting.Feb 15 2019, 12:24 PM

Both tables have this key:

key_object <objectPHID>

This key is useful to find tasks related to a particular object, and correct as-is.

The "Archive" table has these keys:

dateCreated <dateCreated>
leaseOwner <leaseOwner, priority, id>
key_modified <dateModified>

We use key_modified to build the "Recently Completed Tasks" panel.
We use the dateCreated key to GC the table.
I can't immediately identify anything that hits the leaseOwner key in the "Archive" table.

The "Active" table has these keys:

dataID <dataID>
taskClass <taskClass>
leaseExpires <leaseExpires>
leaseOwner <leaseOwner(16)>
key_failuretime <failureTime>
leaseOwner_2 <leaseOwner, priority, id>

The dataID key is there to enforce a unique constraint. However, that's pointless and this table is high-volume. The key isn't interesting otherwise, and isn't useful to drive queries. It should probably be removed.
The taskClass key drives the "Queued Tasks" panel.
The leaseExpires key drives getting tasks with expired leases run again.
The leaseOwner key is a subset of leaseOwner_2 and should be removed.
The key_failuretime key drives the "failures" row in the daemon console. This could be extracted from the table but is reasonable for now.
The leaseOwner_2 key has a silly legacy name and should ideally be renamed.
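
The "subset" reasoning for leaseOwner can be checked mechanically: a non-unique key is redundant when its columns are a leading prefix of another key's columns. A minimal sketch, assuming the "Active" key list above modeled as plain Python data (the `ACTIVE_KEYS` mapping and the `redundant_keys` helper are hypothetical, and prefix lengths such as `leaseOwner(16)` are ignored):

```python
from typing import Dict, List, Tuple

# Keys on the "Active" table as listed above. Only column order is modeled;
# prefix lengths like leaseOwner(16) and uniqueness are not represented.
ACTIVE_KEYS: Dict[str, Tuple[str, ...]] = {
    "dataID": ("dataID",),
    "taskClass": ("taskClass",),
    "leaseExpires": ("leaseExpires",),
    "leaseOwner": ("leaseOwner",),
    "key_failuretime": ("failureTime",),
    "leaseOwner_2": ("leaseOwner", "priority", "id"),
}


def redundant_keys(keys: Dict[str, Tuple[str, ...]]) -> List[Tuple[str, str]]:
    """Return (short, long) pairs where short is a leading prefix of long."""
    pairs = []
    for short_name, short_cols in keys.items():
        for long_name, long_cols in keys.items():
            if short_name == long_name:
                continue
            is_prefix = (
                len(short_cols) < len(long_cols)
                and long_cols[: len(short_cols)] == short_cols
            )
            if is_prefix:
                pairs.append((short_name, long_name))
    return pairs


if __name__ == "__main__":
    print(redundant_keys(ACTIVE_KEYS))  # [('leaseOwner', 'leaseOwner_2')]
```

This check only covers the prefix case. It says nothing about dataID, which is a candidate for removal for a different reason (the unique constraint is not needed), so that decision still has to be made by hand.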

Upshot:

Drop archive key leaseOwner.
Drop active key dataID.
Drop active key leaseOwner.
Rename active key leaseOwner_2 to key_owner or similar.

epriestley added a revision: D20175: Correct schema irregularities (including weird keys) with worker task tables.Feb 15 2019, 12:54 PM

epriestley closed this task as Resolved by committing rP4b10bc2b643d: Correct schema irregularities (including weird keys) with worker task tables.Feb 16 2019, 3:17 AM

epriestley added a commit: rP4b10bc2b643d: Correct schema irregularities (including weird keys) with worker task tables.
