Page MenuHomePhabricator

Upgrading: Changes to Fulltext Indexing (InnoDB Fulltext)
Closed, ResolvedPublic

Assigned To
Authored By
epriestley
Nov 25 2016, 11:31 PM
Referenced Files
F2589905: Unhandled_Exception___AphrontQueryException__.png
Feb 6 2017, 12:34 PM
Tokens
"Mountain of Wealth" token, awarded by vinzent."Love" token, awarded by salvian."Love" token, awarded by cspeckmim."Party Time" token, awarded by chad.

Description

We're making changes to fulltext indexing which impact existing installs. Updating through these changes may take more time and attention than usual because search indexes need to be rebuilt. This task discusses these changes at a high level.

These changes are not included in stable for 2016 Week 48 (rPcfcc3b83). Upgrading past that (including to HEAD of master) will trigger the changes.

Summary for operations:

  • Phabricator now manages table engines, and bin/storage upgrade will ALTER TABLE and change engines if any table is using an engine which Phabricator does not expect. Generally, this means that any manual engine changes you have made will now be undone.
  • If InnoDB FULLTEXT indexes are available in MySQL (which is common), Phabricator will now change the fulltext tables to use InnoDB. See T11741 for broader discussion.
  • This may take a long time if your install has a lot of data.
  • After the engine change, search indexes also need to be rebuilt for stemming support (see T6740). Without this rebuild, search won't be able to find older results consistently.

If you operate a large install, you can minimize downtime associated with the migration like this:

  • Warn users that search will be degraded for some time after the upgrade.
  • TRUNCATE the phabricator_search.search_documentfield table to destroy all the data it contains. This will break search.
  • Perform the upgrade. The engine change should execute quickly because no data needs to be migrated.
  • Phabricator will now be usable again, except that global search won't return results.
  • Run bin/search index --all --background --force to rebuild the entire search index. Search will gradually return to service as these tasks complete.

Benefits for operations:

  • If InnoDB FULLTEXT is available, Phabricator now upgrades all tables to InnoDB.
  • We saw various reports of these MyISAM tables crashing and requiring repair. Since (if possible) we no longer use MyISAM, this should no longer occur.
  • ft_boolean_syntax no longer needs to be configured on any system. This is particularly relevant to RDS, where it could not be configured (see T10642).
  • ft_min_word_len no longer need to be configured (if InnoDB FULLTEXT is available).
  • ft_stopword_file no longer needs to be configured (if InnoDB FULLTEXT is available).

Benefits for users:

  • Searches which return very large result sets should now execute faster.
  • Results where the object title matches the query are now ranked higher.
  • Search now uses stemming, and a query like "delete" can now find documents with the words "deletion", "deleting", "deletes", etc.
  • Search should now find more relevant results when backed by Amazon RDS.

We'll continue tuning this, but future changes should be significantly less disruptive.

Event Timeline

@csilvers, not sure if you got directed here from IRC. See also T11932. At HEAD, we'll now issue more explicit guidance about the required index rebuild.

After this update I couldn't find three letter words any more. I needed to add this change to the MySQL config:

innodb_ft_min_token_size=3

.. as ft_min_word_len=3 only applies to MyISAM tables.

What is innodb_ft_min_token_size set to? The default is 3.

mysql> select @@innodb_ft_min_token_size;

Found it and edited my comment, it was indeed innodb_ft_min_token_size at fault.

Note that full text indexing for InnoDB and hence innodb_ft_min_token_size were only introduced in MariaDB 10.0.5. RHEL/CentOS 7 ships with MariaDB 5.5 :( Ref: https://mariadb.com/kb/en/mariadb/fulltext-index-overview/

I've still got ft_min_word_len=3 in my config but some three letter word searches (lots of IT acronyms) aren't working. Not all results are missing.

FYI this is where things are at after storage upgrade on Week 49 stable using MariaDB 5.5. Is this what you would expect?

SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE ENGINE = 'MyISAM';
+---------------------------+--------+
| TABLE_NAME                | ENGINE |
+---------------------------+--------+
| columns_priv              | MyISAM |
| db                        | MyISAM |
| event                     | MyISAM |
| func                      | MyISAM |
| help_category             | MyISAM |
| help_keyword              | MyISAM |
| help_relation             | MyISAM |
| help_topic                | MyISAM |
| host                      | MyISAM |
| ndb_binlog_index          | MyISAM |
| plugin                    | MyISAM |
| proc                      | MyISAM |
| procs_priv                | MyISAM |
| proxies_priv              | MyISAM |
| servers                   | MyISAM |
| tables_priv               | MyISAM |
| time_zone                 | MyISAM |
| time_zone_leap_second     | MyISAM |
| time_zone_name            | MyISAM |
| time_zone_transition      | MyISAM |
| time_zone_transition_type | MyISAM |
| user                      | MyISAM |
| conpherence_index         | MyISAM |
| search_documentfield      | MyISAM |
+---------------------------+--------+

That's expected if innodb_ft_max_token_size is not defined; specifically if this returns an error:

mysql> select @@innodb_ft_max_token_size;

If your IT acronyms look like short English words, it's possible they're being stemmed under the token size limit, but I can't look into this without knowing which specific acronyms you're having difficulty with. The stemmer is not exposed in the UI, so there's no easy way for you to check.

> select @@innodb_ft_max_token_size;
ERROR 1193 (HY000): Unknown system variable 'innodb_ft_max_token_size'

What you describe sounds familiar. I can find NTP and EoL. I can find NIS* but not NIS, DNS* but not DNS.

Applied D17001 and confirmed fix. Thank you.

This comment was removed by constantx.

This has seemed stable for a while. We've fixed a few stemming issues and have T12137 and some other followups outstanding, but these generally are not issues that are fundamental to InnoDB Fulltext.