Page MenuHomePhabricator

Moving a task on a workboard is very slow (and sometimes times out with a fatal error)
Closed, ResolvedPublic

Description

I'm trying to gather some more information from the user, but a user has reported that they received a server error when trying to move a task on a workboard. I see this in our PHP error logs:

[26-Mar-2015 16:52:42 Australia/Sydney] PHP Fatal error:  Maximum execution time of 30 seconds exceeded in /usr/src/phabricator/src/infrastructure/storage/lisk/PhabricatorLiskDAO.php on line 166
[26-Mar-2015 16:52:42 Australia/Sydney] >>> UNRECOVERABLE FATAL ERROR <<<

Maximum execution time of 30 seconds exceeded

/usr/src/phabricator/src/infrastructure/storage/lisk/PhabricatorLiskDAO.php:166


┻━┻ ︵ ¯\_(ツ)_/¯ ︵ ┻━┻
[26-Mar-2015 16:53:41 Australia/Sydney] PHP Fatal error:  Maximum execution time of 30 seconds exceeded in /usr/src/libphutil/src/parser/PhutilTypeSpec.php on line 1909
[26-Mar-2015 16:53:41 Australia/Sydney] >>> UNRECOVERABLE FATAL ERROR <<<

Maximum execution time of 30 seconds exceeded

/usr/src/libphutil/src/parser/PhutilTypeSpec.php:1909


┻━┻ ︵ ¯\_(ツ)_/¯ ︵ ┻━┻
[26-Mar-2015 16:55:39 Australia/Sydney] PHP Fatal error:  Maximum execution time of 30 seconds exceeded in /usr/src/phabricator/src/infrastructure/storage/lisk/PhabricatorLiskDAO.php on line 166
[26-Mar-2015 16:55:39 Australia/Sydney] >>> UNRECOVERABLE FATAL ERROR <<<

Maximum execution time of 30 seconds exceeded

/usr/src/phabricator/src/infrastructure/storage/lisk/PhabricatorLiskDAO.php:166


┻━┻ ︵ ¯\_(ツ)_/¯ ︵ ┻━┻

Turning on the debug time limit, I see this:

Request aborted by debug time limit after 15 seconds.

STACK TRACE
PhutilParserGenerator.php:835 PhabricatorStartup->onDebugTick()
PhutilTypeSpec.php:1909 PhutilParserGenerator->parseTokensWithTables()
PhutilTypeSpec.php:285 PhutilTypeSpec->parseTokens()
PhutilTypeSpec.php:155 PhutilTypeSpec->newFromString()
PhabricatorCursorPagedPolicyAwareQuery.php:233 PhutilTypeSpec->checkMap()
ManiphestTaskQuery.php:1112 PhabricatorCursorPagedPolicyAwareQuery->buildPagingClauseFromMultipleColumns()
ManiphestTaskQuery.php:303 ManiphestTaskQuery->buildPagingClause()
PhabricatorPolicyAwareQuery.php:227 ManiphestTaskQuery->loadPage()
PhabricatorPolicyAwareQuery.php:167 PhabricatorPolicyAwareQuery->execute()
ManiphestTransactionEditor.php:673 PhabricatorPolicyAwareQuery->executeOne()
ManiphestTransactionEditor.php:687 ManiphestTransactionEditor->getAdjacentSubpriority()
...<snip many identical frames>...
ManiphestTransactionEditor.php:687 ManiphestTransactionEditor->getAdjacentSubpriority()
PhabricatorProjectMoveController.php:128 ManiphestTransactionEditor->getAdjacentSubpriority()
AphrontController.php:33 PhabricatorProjectMoveController->processRequest()
AphrontApplicationConfiguration.php:196 AphrontController->handleRequest()
AphrontApplicationConfiguration.php:121 AphrontApplicationConfiguration->processRequest()
index.php:19 AphrontApplicationConfiguration->runHTTPRequest()

Event Timeline

joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description. (Show Details)
joshuaspence added a project: Workboards.
joshuaspence added a subscriber: joshuaspence.

OK, I've managed to reproduce this issue. It only happen in the "Sort by priority" view (not in the "Natural" view).

epriestley triaged this task as Normal priority.

Presuming resolved, but let me know if that didn't actually fix it. D12116 has some more specific discussion.

jhurwitz reopened this task as Open.Apr 22 2015, 10:05 PM
jhurwitz added a subscriber: jhurwitz.

I'm seeing a similar (possibly the same?) bug. We just updated our install this week, so I presume that we have D12116.

I have a workboard with 100 tasks total, and trying to move one causes a 500 with the response:

>>> UNRECOVERABLE FATAL ERROR <<<

Maximum execution time of 30 seconds exceeded

/phab_path/libphutil/src/symbols/PhutilSymbolLoader.php:184


┻━┻ ︵ ¯\_(ツ)_/¯ ︵ ┻━┻

Smaller workboards are fine, and this workboard only breaks in sort by priority mode (works fine in natural sort mode).

Set debug.time-limit to a value like 15 and hit it again? That should give you a trace.

epriestley updated the task description. (Show Details)Apr 22 2015, 10:07 PM
Request aborted by debug time limit after 15 seconds.

STACK TRACE
PhabricatorStandardCustomField.php:40 PhabricatorStartup->onDebugTick()
ManiphestConfiguredCustomField.php:17 PhabricatorStandardCustomField->buildStandardFields()
PhabricatorCustomField.php:117 ManiphestConfiguredCustomField->createFields()
PhabricatorCustomField.php:71 PhabricatorCustomField->buildFieldList()
PhabricatorCursorPagedPolicyAwareQuery.php:803 PhabricatorCustomField->getObjectFields()
ManiphestTaskQuery.php:912 PhabricatorCursorPagedPolicyAwareQuery->getOrderableColumns()
PhabricatorCursorPagedPolicyAwareQuery.php:831 ManiphestTaskQuery->getOrderableColumns()
ManiphestTaskQuery.php:422 PhabricatorCursorPagedPolicyAwareQuery->buildOrderClause()
PhabricatorPolicyAwareQuery.php:227 ManiphestTaskQuery->loadPage()
PhabricatorPolicyAwareQuery.php:167 PhabricatorPolicyAwareQuery->execute()
PhabricatorCursorPagedPolicyAwareQuery.php:380 PhabricatorPolicyAwareQuery->executeOne()
ManiphestTaskQuery.php:962 PhabricatorCursorPagedPolicyAwareQuery->loadCursorObject()
PhabricatorCursorPagedPolicyAwareQuery.php:330 ManiphestTaskQuery->getPagingValueMap()
ManiphestTaskQuery.php:384 PhabricatorCursorPagedPolicyAwareQuery->buildPagingClause()
PhabricatorPolicyAwareQuery.php:227 ManiphestTaskQuery->loadPage()
PhabricatorPolicyAwareQuery.php:167 PhabricatorPolicyAwareQuery->execute()
ManiphestTransactionEditor.php:661 PhabricatorPolicyAwareQuery->executeOne()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
ManiphestTransactionEditor.php:697 ManiphestTransactionEditor->getAdjacentSubpriority()
PhabricatorProjectMoveController.php:128 ManiphestTransactionEditor->getAdjacentSubpriority()
AphrontController.php:33 PhabricatorProjectMoveController->processRequest()
AphrontApplicationConfiguration.php:196 AphrontController->handleRequest()
AphrontApplicationConfiguration.php:121 AphrontApplicationConfiguration->processRequest()
index.php:19 AphrontApplicationConfiguration->runHTTPRequest()

Ah, okay. Yeah, looks similar. I'll take a look, thanks!

Just to confirm, you're ahead of rP8efdc4aa? I'll keep digging, but I can't immediately reproduce this at HEAD.

Maybe it's naive of me to expect that our Monday update actually updated to the then-latest HEAD. Let me check and get back to you.

I suspect I'm just not hitting the right edge case, but this stuff did churn a lot recently so I just wanted to double check that it's definitely not me testing different code.

We are at rPd8ab5f59, which is ahead of rP8efdc4aa.

D12511 is a possible fix, although I was only sort of able to reproduce something sort-of similar to the issue you hit.

In the underlying code, we're approximating a double-linked list (which we don't have a reasonable way to represent in MySQL) with a double and sort of faking our way through things, which is why this is a bit of a mess.

Let me know if that fixes it when you have a chance? Still not totally sure I've nailed this down but it shouldn't be possible for it to recurse now.

(If you didn't restore debug.time-limit to 0, you should do that -- leaving it enabled imposes a global performance penalty.)

jhurwitz added a project: Restricted Project.Jun 6 2015, 3:38 AM

Sorry for the delay, we only just upgraded.

This is still broken for us, with Phabricator at rP1e4e12. With debug.time-limit at 15s, the AJAX request fails with:

Request aborted by debug time limit after 15 seconds.

STACK TRACE
AphrontBaseMySQLDatabaseConnection.php:168 PhabricatorStartup->onDebugTick()
queryfx.php:6 AphrontBaseMySQLDatabaseConnection->executeRawQuery()
ManiphestTransactionEditor.php:714 queryfx()
PhabricatorProjectMoveController.php:128 ManiphestTransactionEditor->getAdjacentSubpriority()
AphrontController.php:33 PhabricatorProjectMoveController->processRequest()
AphrontApplicationConfiguration.php:226 AphrontController->handleRequest()
AphrontApplicationConfiguration.php:140 AphrontApplicationConfiguration->processRequest()
index.php:21 AphrontApplicationConfiguration->runHTTPRequest()

At rPee8de4a, this is (mostly) no longer timing out -- but the AJAX request is taking anywhere from 7 to 29 seconds to complete.

jhurwitz renamed this task from Fatal error while moving a task on a workboard to Moving a task on a workboard is very slow (and sometimes times out with a fatal error).Jun 16 2015, 9:11 PM
jhurwitz moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

I'm suspicious that this and T8522 may share a root cause (something else being slow) -- unless T8522 evaporated after the database recovery and seems attributable to corruption or ghosts or something.

So my inclination is to start by just writing a better guide for using the existing profiling/performance tools (per T8522), which is probably an hour of work, and then seeing if that turns up anything useful/obvious. If that isn't fruitful, or you just want this fixed ASAP, let's schedule a time when I can come take a look at exactly what's happening? This one might be related to something unusual in your data but T8522 I have zero ideas about and "some queries take 20+ seconds because of <some other root problem>" might explain both of them, and possibly other issues you've hit.

I just updated T8522 with some more information -- we had database issues that night (possibly related to the later corruption, unsure) and I can't repro it.

That being said, I think:

  • a better guide for using profiling/performance tools would be a great idea
  • this doesn't need to be fixed ASAP, so this can wait on the guide

A broad guide for this stuff is available here now:

https://secure.phabricator.com/book/phabricator/article/performance/

To briefly summarize:

  • Use Multimeter to understand load across the install.
  • Use DarkConsole + Services Tab to understand individual request performance at a coarse level. This is probably the best tool to start with for this problem.
  • Use XHProf to understand request performance at a granular level.

These tools all give you the lay of the land and can't identify specific problems without operator interpretation, so there's a good chance they won't be much help unless coupled with a detailed knowledge of Phabricator's query strategy and call map. However, I expect they'll at least let you narrow down anything like this which is hugely out of line. In particular, DarkConsole + Services can tell you if that 20+ second page is spending time in queries or in PHP application code by showing you how much time we spent in each query. If there's one on there that has like a 19,234,283 us duration, that's pretty clearly the root of the problem.

So, take a look at that stuff when you have a chance and see if anything sticks out as a possible clue?

According to DarkConsole's services tab, query takes 12s overall.

I'd love a way to export DarkConsole info to a .csv file. I want to quickly answer the question "what's the slowest query?" but this is difficult because of the sheer number of queries.

There are 3363 queries of the form:

UPDATE `maniphest_task` SET subpriority = 0.027180371630429 WHERE id = 43680

Each one takes about 3-8ms. Since 0.003s * 3363 is about 10s I'm going to suspect that the problem is just the number of queries -- but without an easy way to sort by duration, or to export to .csv, I can't promise that I've looked through all queries so it's very possible that there's a single query taking many seconds and I didn't notice it.

I'm unsure why there are 3363 queries. When I sum up the numbers in the column headers, I get 128. When I change the filter to "all tasks" and sum the column numbers, I get 241. When I execute a Maniphest custom query for all tasks in the project, I get 293. So I'm unsure exactly how many tasks are on this workboard, but it's definitely not in the thousands.

Now that I think about it, one possible explanation is if we're executing O(N^2) queries for each workboard column of N open tasks. (The sum of the squares of the number of open tasks in each column is 2924.)

Pawka added a subscriber: Pawka.Aug 28 2015, 7:29 AM

I want to resolve T8588 before pursuing this to make sure the issues aren't conflated, although I think this isn't related.

This is primarily difficult to move forward on because I haven't been able to reproduce it, even when using the unit test framework to generate pathological test cases (enormous numbers of tasks with colliding subpriorities, etc).

chad added a subscriber: chad.Feb 7 2016, 4:11 AM

Given all the refactoring, is this still relevant?

Yes, in theory -- this is a problem with assigning sub-priorities as a side effect of dragging cards around on a priority-ordered workboard. That code hasn't been touched.

kbrkbr moved this task from Backlog to Workboards v2 on the Workboards board.Feb 19 2016, 8:35 AM
athulj moved this task from Future Work to Workboards v2 on the Workboards board.

We're still seeing this issue pretty severely, on our shiny upgraded 2017 Week 17 version on our shiny new Phabricator host. So I took a look just now with DarkConsole.

I went to a moderately-sized workboard (a few dozen tasks), with "Sort By Priority" set, turned on DarkConsole, and moved a task from one column to another. It took something like a minute. I went into the /project/move/2532/ tab (supertab?) on the left, which appeared under "Main Request".

This in the "Services" tab looks like kind of a smoking gun:

Type		Count	Total Cost	Page Weight
kvcache-get	15		523 us	0.0%
connect		11	17,770 us	0.0%
query		17,536	42,223,020 us	87.6%
kvcache-set	2		52 us	0.0%
All Services	17,564	42,241,368 us	87.6%
Entire Page	0	48,196,499 us	100.0%

And below that, a huge number of rows that look like this:

Event	Start		Duration	Details										Analysis
...
query	+602 ms		1,458 us	UPDATE `maniphest_task` SET subpriority = 0.010067506249605 WHERE id = 133786	Disabled
...
query	+1,800 ms	2,392 us	UPDATE `maniphest_task` SET subpriority = 0.0092034124888917 WHERE id = 135796	Disabled
query	+1,803 ms	3,878 us	UPDATE `maniphest_task` SET subpriority = 0.0092011143671876 WHERE id = 136225	Disabled
query	+1,807 ms	2,948 us	UPDATE `maniphest_task` SET subpriority = 0.0091988162454836 WHERE id = 136213	Disabled
...
query	+47,594 ms	1,227 us	UPDATE `maniphest_task` SET subpriority = 0.0099951154159282 WHERE id = 133625	Disabled
...

It looks like all the queries in my ellipses in between those SET subpriority queries are just more queries of the same form.

Does this help for debugging the issue? Happy to answer follow-up questions.

Staring a bit more at the data I just posted, it occurs to me that it seems kind of sluggish for an UPDATE like those -- one row, indexed by what I assume is the PRIMARY KEY -- to take 1-4ms or so, as these are. There's probably a DB (or network-to-DB?) performance issue there that we'd benefit broadly from addressing.

Which may explain why this isn't such a big deal on other installs (like this upstream install), if their small-update DB performance is better. Still, 17k queries is a lot and would be a good thing to fix in the code too.

It looks like even after D12166, the subpriority-assignment logic doesn't do a great job of spreading our subpriorities. P2053 shows that very many of our tasks have subpriorities near 0 (in fact, the vast majority of our tasks; a separate query without the > 1 limit indicates that only about 4k tasks have subpriority neighborhoods of their very own.)

It doesn't seem likely that this will resolve itself. As the queries Greg pasted indicate, the subpriority updates seem to be shuffling many tasks a small distance, rather than spreading them out meaningfully.

Batching the subpriority updates into fewer, larger queries might be a helpful bandaid.

This and T8588 have been a multi-year fishing expedition which I'm far from thrilled about, but I'll deal with this. 🐳

I'm not 100% sure, but from reading the subpriority movement code, it seems like our bell-curve distribution is self-perpetuating--when a block of tasks are moved, most of the time the next task is still relatively close, so $delta is small. I think this might be resolved by a big one-time readjustment to distribute tasks evenly throughout the space. I don't have an intuition on whether this distribution will redevelop over time, or if it's an artifact from the time before D12166.

  • I believe D17959 should resolve this.
  • The new algorithm tends overall toward dispersing tasks over time, and clusters with more tasks should tend to cover an larger range, roughly linearly.
  • The first few drags after the new algorithm is available may be very slow, since they'll spread the existing clusters out. After each large cluster has dispersed, all future non-pathological operations should be fast. (One pathological operation is to insert a million tasks at the same priority without performing any dragging, then perform a drag.) It's possible that cluster dispersal is fast enough that this isn't a big deal, although I may also just write a migration to normalize task subpriorities for everyone in advance of the new stuff landing so we pay this cost upfront rather than on-first-interacion.
  • master has a bunch of churn already, it's late in the week, and this change is moderately risky (it's hard to undo damage if I got something wrong), so I don't expect it to make it into this week's stable release.

I shored up D17959 a bit, landed it, and deployed it to secure.phabricator.com, then spent some time fiddling with workboards here, but couldn't find any issues with it. We have ~6,500 "wishlist" tasks on this server and it distributed all tasks near 0 (approximately 2,700 tasks) evenly across the range [4096, 4096] fast enough that I didn't notice it. I'm currently inclined not to write a dedicated migration since it seems plausible that all existing installs will survive the first dispersal without much fanfare.

This change is currently available in master, and will promote to stable next week (on or around May 26th). Let us know if things look better once you pick it up.

Super! We'll report results when we've got it deployed.

chad added a comment.May 20 2017, 12:37 AM

That'll be 6 beers pls.

alexmv closed this task as Resolved.May 27 2017, 4:32 AM
alexmv added a subscriber: alexmv.

Based on two points of anecdata so far, things indeed seem snappier with D17959.

Thank you!

🍺 🍺 🍺
🍺 🍺 🍺
(coupon redeemable upon meatspace co-location)