
InnoDB FULLTEXT appears to fail catastrophically once it reaches a moderate size
Closed, Resolved (Public)

Description

Placeholder for now, but we've seen two of these so this is likely a real thing.

Revisions and Commits

Restricted Differential Revision
rPHU libphutil: D18492
rP Phabricator (Closed): D18594, D18593, D18592, D18590, D18589, D18588, D18587, D18586, D18581, D18580, D18579, D18576, D18573, D18572, D18569, D18568, D18567, D18566, D18565, D18564, D18556, D18555, D18554, D18553, D18552, D18551, D18550, D18548, D18547, D18544, D18540, D18539, D18536, D18534, D18533, D18513, D18503, D18502, D18500, D18499, D18498, D18497, D18487, D18484

Event Timeline


Unclear if building a search engine by just doing a lot of JOINs actually scales or not. Seems OK here.
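
As a rough sketch of the query shape this implies (assuming trigrams and the fngrams schema shown later in this task; the document-side table and column names here are hypothetical, and the real queries the Ferret engine generates will differ in detail):

/* Each trigram of the search term becomes one JOIN against the ngram
   table, narrowing the candidate set before a final LIKE pass verifies
   an actual substring match. "lorem" yields the trigrams lor/ore/rem. */
SELECT doc.id
  FROM phriction_document doc            /* hypothetical document table */
  JOIN phriction_document_fngrams ng0
    ON ng0.documentID = doc.id AND ng0.ngram = 'lor'
  JOIN phriction_document_fngrams ng1
    ON ng1.documentID = doc.id AND ng1.ngram = 'ore'
  JOIN phriction_document_fngrams ng2
    ON ng2.documentID = doc.id AND ng2.ngram = 'rem'
 WHERE doc.content LIKE '%lorem%'        /* hypothetical corpus column */
 ORDER BY doc.id DESC                    /* ORDER constraints can dominate cost */
 LIMIT 100;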

I generated and indexed 65,000 tasks locally with bin/lipsum. The lipsum corpus is very small (approximately 60 "words") so this "should" be a more stressful test of the index than real data, because almost every task has the same words and the same ngrams (that is, the ngram index isn't very useful for distinguishing between tasks). The query "lorem" matches 28,000 tasks and resolves in ~5-10s for me locally. This drops to ~0.01s if I remove the ORDER constraints and just let MySQL return whatever part of the result set it wants to. This does a pile of I/O but memory doesn't do anything crazy, which is consistent with spending time doing a big grep over the primary index while processing the LIKE constraints, which is what I was hoping for in the worst case.

For uncommon words (e.g., any word not part of the lipsum corpus) the ngram index is effective and results are instantaneous. For sufficiently constrained queries (e.g., adding a priority and a tag so we hit a few hundred results instead of 28,000) the indexes seem to get used sensibly and results are also instantaneous.

The actual indexes are fairly large (3GB for ~500MB of task data, although lipsum tasks have an exceptionally large amount of text).

None of this is perfect, but it all seems better than InnoDB fulltext in terms of having mostly reasonable behaviors and not completely exploding for no reason.

I briefly hit a bizarre case where a Ferret engine query took 10 seconds to find a document in 163 projects. However, running ANALYZE TABLE on the ngrams table resolved this completely. I suspect the ngrams join may require some tweaking (and maybe a bin/storage analyze). Analyzing the Maniphest table actually seems to improve performance by ~50% too (???) although that's not a very scientific measurement.

Here's a closer look at what's probably happening:

mysql> show indexes from phriction_document_fngrams;
+----------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                      | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| phriction_document_fngrams |          0 | PRIMARY    |            1 | id          | A         |        3560 |     NULL | NULL   |      | BTREE      |         |               |
| phriction_document_fngrams |          1 | key_ngram  |            1 | ngram       | A         |        3560 |     NULL | NULL   |      | BTREE      |         |               |
| phriction_document_fngrams |          1 | key_ngram  |            2 | documentID  | A         |        3560 |     NULL | NULL   |      | BTREE      |         |               |
| phriction_document_fngrams |          1 | key_object |            1 | documentID  | A         |        3560 |     NULL | NULL   |      | BTREE      |         |               |
+----------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

mysql> analyze table phriction_document_fngrams;
+--------------------------------------------+---------+----------+----------+
| Table                                      | Op      | Msg_type | Msg_text |
+--------------------------------------------+---------+----------+----------+
| local_phriction.phriction_document_fngrams | analyze | status   | OK       |
+--------------------------------------------+---------+----------+----------+
1 row in set (0.01 sec)

mysql> show indexes from phriction_document_fngrams;
+----------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                      | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| phriction_document_fngrams |          0 | PRIMARY    |            1 | id          | A         |        3560 |     NULL | NULL   |      | BTREE      |         |               |
| phriction_document_fngrams |          1 | key_ngram  |            1 | ngram       | A         |        3560 |     NULL | NULL   |      | BTREE      |         |               |
| phriction_document_fngrams |          1 | key_ngram  |            2 | documentID  | A         |        3560 |     NULL | NULL   |      | BTREE      |         |               |
| phriction_document_fngrams |          1 | key_object |            1 | documentID  | A         |          46 |     NULL | NULL   |      | BTREE      |         |               |
+----------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

Note that the cardinality estimate for the key_object index on documentID drops from 3560 (inaccurately estimating a unique value for each row) to 46 (more accurately estimating a unique value for each document) after analysis.
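
As a sanity check along these lines, a sketch of how to inspect and refresh these estimates by hand (information_schema exposes the same cardinality figures as SHOW INDEXES; the schema name matches the local_phriction database shown above):

-- List the optimizer's cardinality estimates for the Ferret ngram indexes;
-- a key_object cardinality near the table's total row count suggests the
-- stale statistics shown above.
SELECT table_name, index_name, column_name, cardinality
  FROM information_schema.statistics
 WHERE table_schema = 'local_phriction'
   AND table_name LIKE '%fngrams';

-- Refresh the estimates; this is what resolved the slow query here.
ANALYZE TABLE phriction_document_fngrams;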

All of the search that was previously driven by InnoDB FULLTEXT is now driven by the Ferret engine on this install.

> The actual indexes are fairly large (3GB for ~500MB of task data, although lipsum tasks have an exceptionally large amount of text).

D18580 improved this somewhat, but the tables are still large. On this install, 319MB of task data produces 1329MB of index data, i.e. the indexes are about 4x larger than the tasks.
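
For reference, a sketch of one way to take this kind of measurement from InnoDB's own size estimates (the schema name local_maniphest is an assumption, following the local_phriction naming shown above):

-- Approximate row data vs. index size per table, largest first.
SELECT table_name,
       ROUND(data_length  / 1048576, 1) AS data_mb,
       ROUND(index_length / 1048576, 1) AS index_mb
  FROM information_schema.tables
 WHERE table_schema = 'local_maniphest'
 ORDER BY (data_length + index_length) DESC;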

This is generally fine -- trading away disk space, which is cheap, to get an index which doesn't explode all the time is broadly desirable (that is, any time we can trade disk space for basically any other benefit, we're probably eager to do so).

This data is also extremely compressible, and the backup size increases by a much smaller amount (3.6GB before Ferret engine indexing to 3.9GB afterwards on this install). Much of the increase is index data, not row data, which MySQL constructs implicitly and which is thus not present explicitly in the backup at all. The largest tables (like the changeset data) also aren't indexed and don't expand, so even at runtime this is far from a 4x increase in size. Finally, we'll be able to drop the old InnoDB tables later, which will reclaim some space, although they are not exceptionally large.

However, this presents a possible deployment problem for the Phacility cluster, as some of the older shards may not have the free space to accommodate this increase in data size (see T12932). We're also already populating these indexes for new objects, and thus rocketing toward our doom in some sense.

I expect to complete T12932 this week since that's the lowest hanging change, but the actual deployment schedule for the Ferret engine may end up being complicated by this: I don't want to unprototype it if the cluster isn't prepared to upgrade into it. We have two well-defined pathways available to resolve this by increasing the available space (either moving instances to separate shards as in T12817, or upgrading the volume) but both involve a meaningful amount of downtime for instances on those shards so I'd prefer to find less disruptive solutions if they're reasonably available.

Per T12932, I don't think that's an especially promising way forward, so I'm going to take an operations pathway instead (see T12978) and expect to move ahead with unprototyping the Ferret engine here.

epriestley closed this task as Resolved. Edited Sep 15 2017, 5:16 PM

This has been promoted to stable. See T12974 for upgrade guidance. See T12985 for one followup change. This may still have some bugs (or it may scale very differently on real-world workloads and need substantial additional work), but I think we can handle those separately.