Use InnoDB for FULLTEXT indexes where supported
Open, Needs TriagePublic

Description

A couple of installs are seeing various concrete problems with the MyISAM search index tables, and we've seen sporadic issues in the past ("table needs repair", etc). Broadly, MyISAM is known to be a less-than-perfect database engine. However, it was our only option for FULLTEXT indexes for some time.

On relatively recent versions of MySQL, FULLTEXT indexes are supported on the InnoDB engine. Ideally, we should detect that MySQL has InnoDB FULLTEXT support and then have storage adjust use the InnoDB engine instead of the MyISAM engine for the FULLTEXT tables.

However, it's at least slightly more complicated than that:

  • We need to detect that this capability is present (what command can we run to ask MySQL if it supports InnoDB FULLTEXT?)
  • InnoDB uses a different stopword mechanism: a stopword table instead of a stopword file. We need to generate this table (just unconditionally?) and configure it or have users configure it.
  • Does InnoDB FULLTEXT respect ft_boolean_syntax? If no: is there another similar option? If no: we need to parse queries ourselves and submit the +A +B version to the backend (see also T10642).
  • Does InnoDB FULLTEXT respect ft_min_word_len? If no: is there another similar option? If no: maybe we need to filter short words out when indexing/querying?
  • Does swapping the ENGINE on an existing MyISAM table also rebuild the index? Do we need to get users to run search index --all after this change? If so, how do we tell them?

Just my 2c, but we've seen "table needs repair" on such a regular interval over the past 2-3 years of using Phabricator that our startup script includes a call to MySQL to automatically run "REPAIR TABLE" on startup. It's actually part of the Docker image because a broken MyISAM table will prevent bin/storage upgrade from running:

# The search database may need to be repaired, and if so, will prevent bin/storage upgrade from working
# so preemptively run a repair on that database now.
mysqlcheck --host="$MYSQL_HOST" --port="$MYSQL_PORT" --user="$MYSQL_USER" --password="$MYSQL_PASS" --databases "${MYSQL_STORAGE_NAMESPACE}_search" || true

Does InnoDB FULLTEXT respect ft_boolean_syntax? If no: is there another similar option? If no: we need to parse queries ourselves and submit the +A +B version to the backend (see also T10642).

Apparently, no

Also, reindexing is required after switching to innodb.

@epriestley: Jaime, Senior DBA at the WMF, has posted some answers for your list of questions:

Copied from https://phabricator.wikimedia.org/T146673#2704843

  • We need to detect that this capability is present (what command can we run to ask MySQL if it supports InnoDB FULLTEXT?)

if SELECT count(@@global.innodb_ft_max_token_size) AS innodb_fulltext_enabled; returns 1 row, it is enabled, else (the query fails), it is disabled. If you want to check versions, it is enabled by default since MySQL 5.6.4 and MariaDB 10.0.5, but detecting the configuration variables is simpler and more reliable. Both have 2 stable versions supporting this feature as of this writing.

  • InnoDB uses a different stopword mechanism: a stopword table instead of a stopword file. We need to generate this table (just unconditionally?) and configure it or have users configure it.

See how we do it at Wikimedia: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/manifests/role/mariadb.pp;85887e74bb5abd95de5bec729d3b9e9137a263e7$311 and https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/templates/mariadb/phabricator-stopwords-update.sql.erb and https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/templates/mariadb/phabricator.my.cnf.erb;85887e74bb5abd95de5bec729d3b9e9137a263e7$77

  • Does InnoDB FULLTEXT respect ft_boolean_syntax? If no: is there another similar option? If no: we need to parse queries ourselves and submit the +A +B version to the backend (see also T10642).

Innodb does not respect such an option, and it has not an alternative- it has to add the +signs at application layer. It is documented, but see verified bug: https://bugs.mysql.com/bug.php?id=71551

  • Does InnoDB FULLTEXT respect ft_min_word_len? If no: is there another similar option? If no: maybe we need to filter short words out when indexing/querying?

It does not respect ft_min_word_len, but it uses innodb_ft_min_token_size. See our config at: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/templates/mariadb/phabricator.my.cnf.erb;85887e74bb5abd95de5bec729d3b9e9137a263e7$79

  • Does swapping the ENGINE on an existing MyISAM table also rebuild the index? Do we need to get users to run search index --all after this change? If so, how do we tell them?

We rebuilt the index, but I believe that if the above changes are done (which requires in some cases a server restart), just a database table rebuilt/optimize (including engine conversion) would work. Whenever the fulltext parameters are changed, the index requires rebuilt (as with myisam).

@20after4, just a heads up:

  • After D16941, we will automatically change these tables back to MyISAM, from InnoDB, if you've converted them manually (that is: bin/storage adjust now adjusts table engines, and will make the tables use what it thinks is the "right" engine, which is currently still MyISAM on all systems/versions).
  • A future change will swap the default to InnoDB, but I'm going to wait until we promote stable this week to do that, since we have enough drama in master already from T11044.
  • Thus, you may want to skip stable this week if you were planning to update. Should be safe again next week.
  • (In the worst case, if these get converted back to MyISAM by accident, you can always re-convert them back to InnoDB again -- they'll stay in InnoDB as long as you don't run bin/storage adjust.)

From downstream, MariaDB has a fairly severe bug when indexing tokens with length greater than 127 bytes:

We could work around this by changing isInnoDBFulltextEngineAvailable(). It currently tests for SELECT @@innodb_ft_max_token_size executing without an error. We could extend it to contain an additional test against @@version or some similar variable so we detect InnoDB as unavailable for MariaDB 10.0 (and possibly affected versions of 10.1).

Our DBAs are currently in the process of testing the a patched mariadb package. I will try to keep you posted on that progress but it looks like the fix is now backported to 10.0

I just got this email from Google Cloud:

Dear Google Cloud SQL customer,

Cloud SQL will begin requiring the use of the InnoDB storage engine for all new instances starting on March 28, 2017. Instances created prior to March 28, 2017 will be unaffected. This change may affect compatibility with your future deployments because your existing Cloud SQL instance uses a storage engine other than InnoDB for one or more data tables.

...

Existing instances will be unaffected, but Cloud SQL instances created on or after March 28, 2017 will require the use of the InnoDB storage engine for all tables.

...

Why is Cloud SQL standardizing on InnoDB?

InnoDB supports transactions (support for the ACID property) and is more resistant to table corruption than other MySQL storage engines, such as MyISAM.

How can I make sure any new tables I create use InnoDB?

By default, tables are created using the InnoDB storage engine. Make sure your CREATE TABLE syntax does not include the ENGINE table option.

How can I convert MyISAM tables to InnoDB tables?

Cloud SQL strongly recommends converting existing tables to InnoDB. This can be done manually by following MySQL’s recommendations. Cloud SQL will not automatically convert any non-InnoDB table to InnoDB engine.

So at least one cloud provider is dropping support for anything other than InnoDB.

Just a note for the above - the initial DB seed script will need to be updated accordingly as well. Right now, trying to perform a clean install on a 2nd-gen Google Cloud SQL instance (which has the above limitation) simply errors out when it reaches one of the tables coded as MyISAM.