Version and lock search index rebuilds to reduce the cost of runaway writes
Closed, Resolved · Public

Description

After a runaway bot wrote hundreds of comments to a single object, an install reported poor queue performance while processing the reindex tasks.

  • We could probably avoid doing most of this work by versioning the index, so that we can just skip a rebuild if it won't do anything.
  • We could probably reduce the cost of contention by explicitly locking objects during indexing.

Original Description

I'm not sure how to report this issue properly, as it might sound a bit vague, but I have noticed that my taskmaster daemons (with a pool size of 4) consistently tried to execute the exact same MySQL queries, as if these daemons served as a sort of replica of one another. In my use case, this made the "remaining" 3 daemons in the same pool rather useless, to say the least, as they simply wait for the table lock on the search_documentfield table to be released. (And after the lock is released, they eventually go on to do the exact same thing...)

Please see the screenshot for an example:

issue.png (84×1 px, 14 KB)

The queries that I have been seeing are DELETE and INSERT queries against the search_documentfield MySQL table.
They seem to correspond to https://github.com/phacility/phabricator/blob/master/src/applications/search/engine/PhabricatorMySQLSearchEngine.php#L36

Nevertheless, each time the taskmaster daemon took about 30-320 seconds to process a relatively simple DELETE or INSERT query. And each time, the other taskmaster daemon threads were waiting for the lock to be released, in order to attempt the exact same query.
I've waited and also manually killed the queries to confirm this behavior (and again, another DELETE or INSERT query appeared which seemed to be 100% replicated by the other taskmaster daemons).

So, eventually, I set the daemon pool size to 1 to see whether this would have any effect. The same kind of queries still appeared - as expected - and were not replicated by any other taskmaster daemon threads - again as expected, since those threads simply were no longer created. However, the same type of queries, now handled by just one taskmaster daemon, suddenly became (and have remained) very fast!

However, the description shown when modifying the pool size is:

Raising this can increase the maximum throughput of the task queue. The pool will automatically scale down when unutilized.

In my case it did not increase the maximum throughput at all, and I am not sure whether the issue I have encountered is known or by design.
I also can't explain why, after setting the pool size to 1, everything seems to be fine again - perhaps this is a bug?

Requesting some background information
I've searched like crazy for any known or related issues, and I've also combed through the documentation and configuration options to gather some useful information, but no dice. I assume this daemon and these queries are part of the reindexing process of the MySQL search engine (which I am indeed using; Elasticsearch is not configured), and I would love some background information on why this process needs to exist and how it works, in order to understand Phabricator better.

Event Timeline

vintrax updated the task description.
vintrax added a project: Daemons.
vintrax added a subscriber: vintrax.

I can't reproduce this. Here's what I did:

  • I ran bin/search index --type TASK --background to queue up some search indexing tasks.
  • I made a local logging change to the taskmaster daemon (see below).
  • I ran bin/phd launch 8 task to run the queue at parallelism 8.
  • I timed how long the queue took to empty.
  • I repeated the process, but used bin/phd launch 1 task and timed how long the queue took to empty.

Here's the patch I applied:

diff --git a/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php b/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php
index 2b54b9a..182e90d 100644
--- a/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php
+++ b/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php
@@ -17,6 +17,10 @@ final class PhabricatorTaskmasterDaemon extends PhabricatorDaemon {
           $id = $task->getID();
           $class = $task->getTaskClass();
 
+          Filesystem::appendFile(
+            '/tmp/tasks.log',
+            pht("[%d] Working on task %d.\n", getmypid(), $id));
+
           $this->log(pht('Working on task %d (%s)...', $id, $class));
 
           $task = $task->executeTask();

Here's the log that was generated with launch 8 task:

8-tasks.log
[19044] Working on task 499574.
[19043] Working on task 496659.
[19045] Working on task 499575.
[19042] Working on task 499576.
[19047] Working on task 499577.
[19048] Working on task 499578.
[19046] Working on task 493740.
[19049] Working on task 499579.
[19042] Working on task 499580.
[19048] Working on task 499581.
[19044] Working on task 499582.
[19049] Working on task 499583.
[19045] Working on task 499584.
[19042] Working on task 499585.
[19047] Working on task 499586.
[19048] Working on task 499587.
[19044] Working on task 499588.
[19049] Working on task 499589.
[19044] Working on task 499590.
[19047] Working on task 499591.
[19048] Working on task 499592.
[19044] Working on task 499593.
[19049] Working on task 499594.
[19047] Working on task 499595.
[19049] Working on task 499596.
[19048] Working on task 499597.
[19044] Working on task 499598.
[19047] Working on task 499599.
[19048] Working on task 499600.
[19049] Working on task 499601.
[19044] Working on task 499602.
[19047] Working on task 499603.
[19048] Working on task 499604.
[19044] Working on task 499605.
[19049] Working on task 499606.
[19047] Working on task 499607.
[19044] Working on task 499608.
[19047] Working on task 499609.
[19048] Working on task 493741.
[19044] Working on task 499610.
[19047] Working on task 499611.
[19044] Working on task 499612.
[19047] Working on task 499613.
[19044] Working on task 499614.
[19047] Working on task 499615.
[19044] Working on task 499616.
[19047] Working on task 499617.
[19044] Working on task 499618.
[19047] Working on task 493742.
[19044] Working on task 499619.
[19044] Working on task 499620.
[19044] Working on task 499621.
[19044] Working on task 499622.
[19044] Working on task 499623.
[19044] Working on task 499624.
[19044] Working on task 499625.
[19044] Working on task 499626.
[19044] Working on task 499627.
[19044] Working on task 499628.
[19044] Working on task 499629.
[19044] Working on task 499630.
[19044] Working on task 499631.
[19044] Working on task 499632.
[19044] Working on task 499633.
[19044] Working on task 499634.
[19044] Working on task 499635.
[19044] Working on task 499636.
[19044] Working on task 499637.
[19044] Working on task 499638.
[19044] Working on task 499639.
[19044] Working on task 499640.
[19044] Working on task 499641.
[19044] Working on task 499642.
[19044] Working on task 499643.
[19044] Working on task 499644.
[19044] Working on task 499645.
[19044] Working on task 499646.
[19044] Working on task 499647.
[19044] Working on task 499648.
[19044] Working on task 499649.
[19044] Working on task 499650.
[19044] Working on task 499651.
[19044] Working on task 499652.
[19044] Working on task 499653.
[19044] Working on task 499654.
[19044] Working on task 499655.
[19044] Working on task 499656.
[19044] Working on task 499657.
[19044] Working on task 499658.
[19044] Working on task 499659.
[19044] Working on task 499660.
[19044] Working on task 499661.
[19044] Working on task 499662.
[19044] Working on task 499663.
[19044] Working on task 499664.
[19044] Working on task 499665.
[19044] Working on task 499666.
[19044] Working on task 499667.
[19044] Working on task 499668.
[19044] Working on task 499669.
[19044] Working on task 499670.
[19044] Working on task 499671.
[19044] Working on task 499672.
[19044] Working on task 499673.
[19049] Working on task 499674.
[19049] Working on task 499675.
[19049] Working on task 499676.
[19044] Working on task 499677.
[19049] Working on task 499678.
[19044] Working on task 499679.
[19049] Working on task 499680.
[19044] Working on task 493743.
[19049] Working on task 493744.
[19051] Working on task 493914.
[19052] Working on task 493915.
[19050] Working on task 493917.

Here's the log that was generated with launch 1 task:

1-task.log
[19069] Working on task 499681.
[19069] Working on task 499682.
[19069] Working on task 499683.
[19069] Working on task 499684.
[19069] Working on task 499685.
[19069] Working on task 499686.
[19069] Working on task 499687.
[19069] Working on task 499688.
[19069] Working on task 499689.
[19069] Working on task 499690.
[19069] Working on task 499691.
[19070] Working on task 499692.
[19071] Working on task 499693.
[19071] Working on task 499694.
[19071] Working on task 499695.
[19071] Working on task 499696.
[19071] Working on task 499697.
[19071] Working on task 499698.
[19071] Working on task 499699.
[19071] Working on task 499700.
[19071] Working on task 499701.
[19071] Working on task 499702.
[19071] Working on task 499703.
[19071] Working on task 499704.
[19071] Working on task 499705.
[19071] Working on task 499706.
[19071] Working on task 499707.
[19071] Working on task 499708.
[19071] Working on task 499709.
[19071] Working on task 499710.
[19071] Working on task 499711.
[19071] Working on task 499712.
[19071] Working on task 499713.
[19071] Working on task 499714.
[19071] Working on task 499715.
[19071] Working on task 499716.
[19071] Working on task 499717.
[19071] Working on task 499718.
[19071] Working on task 499719.
[19071] Working on task 499720.
[19071] Working on task 499721.
[19071] Working on task 499722.
[19071] Working on task 499723.
[19071] Working on task 499724.
[19071] Working on task 499725.
[19071] Working on task 499726.
[19071] Working on task 499727.
[19071] Working on task 499728.
[19071] Working on task 499729.
[19071] Working on task 499730.
[19071] Working on task 499731.
[19071] Working on task 499732.
[19071] Working on task 499733.
[19071] Working on task 499734.
[19071] Working on task 499735.
[19071] Working on task 499736.
[19071] Working on task 499737.
[19071] Working on task 499738.
[19071] Working on task 499739.
[19071] Working on task 499740.
[19071] Working on task 499741.
[19071] Working on task 499742.
[19071] Working on task 499743.
[19071] Working on task 499744.
[19071] Working on task 499745.
[19071] Working on task 499746.
[19071] Working on task 499747.
[19071] Working on task 499748.
[19071] Working on task 499749.
[19071] Working on task 499750.
[19071] Working on task 499751.
[19071] Working on task 499752.
[19071] Working on task 499753.
[19071] Working on task 499754.
[19071] Working on task 499755.
[19071] Working on task 499756.
[19071] Working on task 499757.
[19071] Working on task 499758.
[19071] Working on task 499759.
[19071] Working on task 499760.
[19071] Working on task 499761.
[19071] Working on task 499762.
[19071] Working on task 499763.
[19071] Working on task 499764.
[19071] Working on task 499765.
[19071] Working on task 499766.
[19071] Working on task 499767.
[19071] Working on task 499768.
[19071] Working on task 499769.
[19071] Working on task 499770.
[19071] Working on task 499771.
[19071] Working on task 499772.
[19071] Working on task 499773.
[19071] Working on task 499774.
[19071] Working on task 499775.
[19071] Working on task 499776.
[19071] Working on task 499777.
[19071] Working on task 499778.
[19071] Working on task 499779.
[19071] Working on task 499780.
[19071] Working on task 499781.
[19071] Working on task 499782.
[19071] Working on task 499783.
[19071] Working on task 499784.
[19071] Working on task 499785.
[19071] Working on task 499786.
[19071] Working on task 499787.
[19071] Working on task 499788.
[19071] Working on task 499789.
[19071] Working on task 499790.
[19071] Working on task 499791.
[19071] Working on task 499792.
[19071] Working on task 499793.
[19071] Working on task 499794.
[19071] Working on task 499795.
[19071] Working on task 499796.
[19071] Working on task 499797.
[19071] Working on task 499798.
[19104] Working on task 499799.
[19105] Working on task 499800.
[19105] Working on task 499801.
[19105] Working on task 499802.
[19105] Working on task 499803.
[19105] Working on task 499804.
[19105] Working on task 499805.
[19105] Working on task 499806.
[19105] Working on task 499807.
[19105] Working on task 499808.
[19105] Working on task 499809.
[19105] Working on task 499810.
[19105] Working on task 499811.
[19105] Working on task 499812.
[19105] Working on task 499813.
[19105] Working on task 499814.
[19105] Working on task 499815.
[19105] Working on task 499816.
[19105] Working on task 499817.
[19105] Working on task 499818.
[19105] Working on task 499819.
[19105] Working on task 499820.
[19105] Working on task 499821.
[19105] Working on task 499822.
[19105] Working on task 499823.
[19105] Working on task 499824.
[19105] Working on task 499825.
[19105] Working on task 499826.
[19105] Working on task 499827.
[19105] Working on task 499828.
[19105] Working on task 499829.
[19105] Working on task 499830.
[19105] Working on task 499831.
[19105] Working on task 499832.
[19105] Working on task 499833.
[19105] Working on task 499834.
[19105] Working on task 499835.
[19105] Working on task 499836.
[19105] Working on task 499837.
[19105] Working on task 499838.
[19105] Working on task 499839.
[19105] Working on task 499840.
[19105] Working on task 499841.
[19105] Working on task 499842.
[19105] Working on task 499843.
[19105] Working on task 499844.
[19105] Working on task 499845.
[19105] Working on task 499846.
[19105] Working on task 499847.
[19105] Working on task 499848.
[19105] Working on task 499849.
[19105] Working on task 499850.
[19105] Working on task 499851.
[19105] Working on task 499852.
[19105] Working on task 499853.
[19105] Working on task 499854.
[19105] Working on task 499855.
[19105] Working on task 499856.
[19105] Working on task 499857.
[19105] Working on task 499858.
[19105] Working on task 499859.
[19105] Working on task 499860.
[19105] Working on task 499861.
[19105] Working on task 499862.
[19105] Working on task 499863.
[19105] Working on task 499864.
[19105] Working on task 499865.
[19105] Working on task 499866.
[19105] Working on task 499867.
[19105] Working on task 499868.
[19105] Working on task 499869.
[19105] Working on task 499870.
[19105] Working on task 499871.
[19105] Working on task 499872.
[19105] Working on task 499873.
[19105] Working on task 499874.
[19105] Working on task 499875.
[19105] Working on task 499876.
[19105] Working on task 499877.
[19105] Working on task 499878.
[19105] Working on task 499879.
[19105] Working on task 499880.
[19105] Working on task 499881.
[19105] Working on task 499882.
[19105] Working on task 499883.
[19105] Working on task 499884.
[19105] Working on task 499885.
[19105] Working on task 499886.
[19105] Working on task 499887.
[19105] Working on task 499888.
[19105] Working on task 499889.
[19105] Working on task 499890.
[19105] Working on task 499891.
[19105] Working on task 499892.
[19105] Working on task 499893.
[19105] Working on task 499894.
[19105] Working on task 499895.
[19105] Working on task 499896.
[19105] Working on task 499897.
[19105] Working on task 499898.
[19105] Working on task 499899.
[19105] Working on task 499900.
[19105] Working on task 499901.
[19105] Working on task 499902.
[19105] Working on task 499903.
[19105] Working on task 499904.
[19105] Working on task 499905.

I only have 107 tasks locally so the timings weren't very good. I repeated the process, but ran search index --type TASK --background repeatedly until I had about 1,000 tasks.

With launch 8 task, this queue took about 11 seconds to process. With launch 1 task, this queue took about 108 seconds to process.

The timing difference isn't really measuring raw throughput changes: some of the local tasks fail with an error, so the launch 1 case was dominated by pauses for error recovery. This workload is also probably CPU bound, and I wouldn't expect a linear increase above 4 taskmasters given that this machine has 4 cores. But it does suggest that launch 8 task is quite a bit faster (almost 10x faster in this case).

From the logs, you can see that the first log (8 taskmasters) has 11 distinct PIDs in the first column (8 initial daemons, plus three re-launches after errors), and that every task is worked on only once. The second log (1 taskmaster) has 5 distinct PIDs in the first column (1 initial daemon, plus four re-launches after errors), and every task is still worked on only once. (The first log probably has fewer restarts after errors because one of the daemons that encountered an error didn't get a chance to grab any more tasks after it restarted, since the other daemons had already finished the work.)

The important part here is that neither of these scenarios resulted in taskmasters performing the same tasks. Broadly, we would expect dramatic, system-wide failures if the locks were not working properly. One highly visible example is that multiple copies of each email would be sent.


It's expected that multiple different tasks may be queued up which do the same work, particularly for search indexing. For example, every time you update a task, we queue a task to rebuild the search index. Search indexes are rebuilt from scratch every time -- we don't update them incrementally. This isn't as efficient as a theoretical incremental update might be, but it's much simpler and means that we don't need to have administrators do migrations if we make minor changes: the search index will tend to update itself fairly well over time.
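As a rough sketch of that "rebuild from scratch" pattern (illustrative only -- this is not the actual PhabricatorMySQLSearchEngine code; the table name matches the real schema but the column names and helper function here are made up):

// Illustrative sketch of a full rebuild (not the actual
// PhabricatorMySQLSearchEngine code; column names are made up).
function sketch_reindex_document($conn, $phid, array $fields) {
  // Throw away every existing index row for this document...
  queryfx(
    $conn,
    'DELETE FROM %T WHERE phid = %s',
    'search_documentfield',
    $phid);

  // ...then write the complete set of fields back, whether or not anything
  // actually changed. Simple, and it self-heals after minor schema changes.
  foreach ($fields as $field => $corpus) {
    queryfx(
      $conn,
      'INSERT INTO %T (phid, field, corpus) VALUES (%s, %s, %s)',
      'search_documentfield',
      $phid,
      $field,
      $corpus);
  }
}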

I think your observations are consistent with a simpler explanation -- this particular query being very slow for some reason, combined with the task in question being updated several times (for example, several users commenting on it). If the query were extremely slow and updates came in faster than it could execute, we'd expect all of the copies of it to queue up.

The biggest question to me is why that first query has been executing for 32 seconds. Does bin/storage adjust report that all your keys are correct? Are you using MySQL (not RDS / Maria / something weird)? Does this happen all the time, or only some of the time (e.g., do you have a daily backup process, and the slowness coincides directly with the backups)?

Wow, Evan, thank you so much for your debugging effort and for sharing your thoughts and elaborate explanation. It's (seriously) highly appreciated!

Broadly, we would expect dramatic, system-wide failures if the locks were not working properly.

It's expected that multiple different tasks may be queued up which do the same work, particularly for search indexing.

That's exactly what I thought as well!

One highly visible example is that multiple copies of each email would be sent.

"Unfortunately", no multiple copies of emails have been spotted.

I'll attempt to follow your steps, share the log results and answer your questions ASAP.
(FYI, I rely on others to assist me with this, as unfortunately I am not authorised to gather the desired information independently.)


What I, by the way, did forget to mention:

  • Phabricator is fully up-to-date (in order to exclude known bugs which might already have been fixed)
  • When this situation occurred, nobody was making use of Phabricator (in order to exclude user influence on this matter)

Hi, a co-worker of @vintrax here.
It seems this happened because I messed up. I wrote a bot that accidentally left hundreds of comments on a single task in the span of 20 minutes. The large number of comments made the search index jobs both slow (DELETEs/UPDATEs on a big MySQL record) and numerous (one index job per comment). To top it off, these jobs couldn't run in parallel efficiently because of search_documentfield table locks (it's MyISAM). This kept the queue clogged more than 24 hours after the 'bot mishap' happened, which is why I initially didn't think it had caused the issues.

I really want to thank you, @epriestley, for taking the time to look at our issue and sharing your insightful conclusion — it really helped a bunch with finding the problem.

Ah! That could definitely explain it.

One thing we could possibly do here is version the search index. For most objects, we can figure out what "version" they are by looking at the ID of the last applied transaction. When we write the index, we can write which version of the object we're indexing. When we go to update the index, we can just decline to do any work if the indexed version is as new as the current version.

If we had logic like this, the queue would try to keep up for about 20 minutes and then the next job would index the most current version of the task, so all the remaining tasks in queue should be able to exit immediately without doing any work.
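A minimal sketch of that version check (the helper methods here are hypothetical; nothing like loadIndexedVersion() exists today, and "version" is just the ID of the last transaction applied to the object):

// Hypothetical sketch: skip the rebuild when the index is already current.
function sketch_maybe_reindex($object, $index) {
  $current_version = $object->getLastTransactionID();
  $indexed_version = $index->loadIndexedVersion($object->getPHID());

  if ($indexed_version !== null && $indexed_version >= $current_version) {
    // The index already reflects this (or a newer) version of the object,
    // so this queued task can exit immediately without doing any work.
    return;
  }

  $index->rebuildDocument($object);
  $index->saveIndexedVersion($object->getPHID(), $current_version);
}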

Another thing we could do is use a per-object GlobalLock around the actual indexing logic, which could reduce the effects of situations like this by letting the other tasks suffer cheap temporary failures and drop to the end of the queue. Our current behavior is probably somewhat incorrect because multiple daemons may work on indexing the same object at the same time. Although the MySQL search engine might already use transactions to prevent them from stepping on one another, other engines may not and we'd be more correct to provide this guarantee in the infrastructure rather than requiring the engines to bring their own locks.
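Roughly, the locking could look like this (a sketch only: PhabricatorGlobalLock, PhutilLockException and PhabricatorWorkerYieldException are existing classes, but wiring them into the indexing worker this way is just the idea, not existing code; $engine and $document are assumed to be in scope and the 15-second values are arbitrary):

// Sketch: serialize indexing per object with a global lock. If another
// daemon already holds the lock, yield so this task drops to the end of
// the queue and retries cheaply later.
$lock = PhabricatorGlobalLock::newLock('search.index:'.$document->getPHID());

try {
  $lock->lock(15);
} catch (PhutilLockException $ex) {
  // Temporary, cheap failure: requeue instead of contending on table locks.
  throw new PhabricatorWorkerYieldException(15);
}

try {
  $engine->reindexAbstractDocument($document);
} finally {
  $lock->unlock();
}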

Since this situation is difficult to hit, I don't expect to implement either of these changes anytime soon, but they're both pretty reasonable things to do the next time we do work on search.

epriestley renamed this task from "Make parallel taskmaster daemons efficient by not letting them executing the same tasks" to "Version and lock search index rebuilds to reduce the cost of runaway writes". Dec 3 2015, 1:07 PM
epriestley triaged this task as Low priority.
epriestley edited projects, added Search; removed Daemons.

I suppose the "version number" I propose is a little tricky, because it won't be incremented if a user edits a comment -- this does not apply a new transaction. We could use a <MAX(transactionID), MAX(commentID)> version instead to catch edits.
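For example, something along these lines could compute that tuple (a sketch with Maniphest-style table and column names for illustration; $conn and $object_phid are assumed to be in scope, and storage of the result is glossed over):

// Sketch: build a <MAX(transactionID), MAX(commentID)> version for one
// object. queryfx_one() fetches a single row.
$xaction = queryfx_one(
  $conn,
  'SELECT MAX(id) v FROM %T WHERE objectPHID = %s',
  'maniphest_transaction',
  $object_phid);

$comment = queryfx_one(
  $conn,
  'SELECT MAX(c.id) v FROM %T c JOIN %T x ON c.transactionPHID = x.phid
    WHERE x.objectPHID = %s',
  'maniphest_transaction_comment',
  'maniphest_transaction',
  $object_phid);

// Editing a comment adds a new comment-version row without applying a new
// transaction, so comparing both components catches comment edits too.
$version = array((int)$xaction['v'], (int)$comment['v']);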

T6281 is an older task discussing a possible deadlock under concurrency conditions in custom field reindexing. I think the versioning / locking approach above is probably the cleanest approach to resolving that, as well.