After a runaway bot wrote hundreds of comments to a single object, an install reported poor queue performance while processing the reindex tasks.
- We could probably avoid doing most of this work by versioning the index, so that we can just skip a rebuild if it won't do anything.
- We could probably reduce the cost of contention by explicitly locking objects during indexing.
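The two ideas above can be sketched together. This is only an illustration, not how Phabricator implements indexing: the in-memory dictionaries, `compute_version`, and `reindex` are hypothetical names, and a real install would need a lock shared across daemon processes rather than an in-process `threading.Lock`.

```python
import hashlib
import threading
from collections import defaultdict

# Hypothetical in-memory stand-ins for the search index storage.
index_versions = {}    # object id -> version of the content last indexed
index_documents = {}   # object id -> indexed text

# One lock per object, so workers only contend when they touch the same object.
object_locks = defaultdict(threading.Lock)
locks_guard = threading.Lock()  # protects creation of entries in object_locks


def compute_version(comments):
    """Derive a version string from the content that feeds the index."""
    digest = hashlib.sha256()
    for comment in comments:
        digest.update(comment.encode("utf-8"))
        digest.update(b"\0")  # separator so ["ab"] and ["a", "b"] differ
    return digest.hexdigest()


def reindex(object_id, comments):
    """Rebuild the index for one object; return False if the rebuild was skipped."""
    with locks_guard:
        lock = object_locks[object_id]
    with lock:
        version = compute_version(comments)
        if index_versions.get(object_id) == version:
            # The index already reflects this content, so a rebuild would do nothing.
            return False
        index_documents[object_id] = "\n".join(comments)
        index_versions[object_id] = version
        return True
```

With this shape, several queued reindex tasks for the same unchanged object cost one rebuild plus a few cheap version checks, and tasks for different objects do not block each other.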
Original Description
I'm not sure how to report this issue properly, as this might sound a bit vague, but I have noticed that my taskmaster daemons (with a pool size of 4) consistently try to execute the exact same MySQL queries, as if the daemons were acting as replicas of one another. In my use case, this makes the "remaining" 3 daemons in the pool rather useless, to say the least, as they simply wait for the table lock on the search_documentfield table to be released. (And after the lock is released, they eventually do the exact same thing...)
Please see the screenshot for an example:
The queries that I have been seeing are DELETE and INSERT queries in the search_documentfield MySQL table.
These queries seem to correspond with https://github.com/phacility/phabricator/blob/master/src/applications/search/engine/PhabricatorMySQLSearchEngine.php#L36
Each time, the taskmaster daemon took about 30-320 seconds to process a relatively simple DELETE and INSERT query. And each time, the other taskmaster daemon threads were waiting for the lock to be released in order to attempt the exact same query.
I waited and manually killed the queries in order to confirm this behavior (each time, another DELETE or INSERT query appeared, which again seemed to be 100% replicated by the other taskmaster daemons).
So, eventually, I set the daemon pool size to 1 to see whether this would have any effect. The same kind of queries still appeared, as expected, and were not replicated by any other taskmaster daemon threads (again as expected, since those threads were simply no longer created). However, the same type of queries, now handled by just one taskmaster daemon, suddenly became super fast and have remained so!
However, the description of the pool size setting reads:
Raising this can increase the maximum throughput of the task queue. The pool will automatically scale down when unutilized.
In my case it did not increase the maximum throughput at all and I am not sure if the issue which I have encountered is a known issue / by design.
I can't explain why everything works properly again after setting the pool size to 1. Perhaps this is a bug?
Requesting some background information
I've searched like crazy to find any known or related issues. I've also sifted through the documentation and configuration options to gather some useful information, but no dice. I assume this daemon and the queries are part of the reindexing process of the MySQL search engine (which I am indeed using; Elasticsearch is not configured), and I would love some background information on why this process needs to exist and how it works, in order to understand Phabricator better.