Quickstart can fail to initialize databases if MyISAM is not available (currently, only in Google Cloud)
Open, WishlistPublic

Description

A couple of installs are seeing various concrete problems with the MyISAM search index tables, and we've seen sporadic issues in the past ("table needs repair", etc). Broadly, MyISAM is known to be a less-than-perfect database engine. However, it was our only option for FULLTEXT indexes for some time.

On relatively recent versions of MySQL, FULLTEXT indexes are supported on the InnoDB engine. Ideally, we should detect that MySQL has InnoDB FULLTEXT support and then have storage adjust use the InnoDB engine instead of the MyISAM engine for the FULLTEXT tables.

However, it's at least slightly more complicated than that:

  • We need to detect that this capability is present (what command can we run to ask MySQL if it supports InnoDB FULLTEXT?)
  • InnoDB uses a different stopword mechanism: a stopword table instead of a stopword file. We need to generate this table (just unconditionally?) and configure it or have users configure it.
  • Does InnoDB FULLTEXT respect ft_boolean_syntax? If no: is there another similar option? If no: we need to parse queries ourselves and submit the +A +B version to the backend (see also T10642).
  • Does InnoDB FULLTEXT respect ft_min_word_len? If no: is there another similar option? If no: maybe we need to filter short words out when indexing/querying?
  • Does swapping the ENGINE on an existing MyISAM table also rebuild the index? Do we need to get users to run search index --all after this change? If so, how do we tell them?

Just my 2c, but we've seen "table needs repair" on such a regular interval over the past 2-3 years of using Phabricator that our startup script includes a call to MySQL to automatically run "REPAIR TABLE" on startup. It's actually part of the Docker image because a broken MyISAM table will prevent bin/storage upgrade from running:

# The search database may need to be repaired, and if so, will prevent bin/storage upgrade from working
# so preemptively run a repair on that database now.
mysqlcheck --host="$MYSQL_HOST" --port="$MYSQL_PORT" --user="$MYSQL_USER" --password="$MYSQL_PASS" --databases "${MYSQL_STORAGE_NAMESPACE}_search" || true

Does InnoDB FULLTEXT respect ft_boolean_syntax? If no: is there another similar option? If no: we need to parse queries ourselves and submit the +A +B version to the backend (see also T10642).

Apparently, no

Also, reindexing is required after switching to innodb.

@epriestley: Jaime, Senior DBA at the WMF, has posted some answers for your list of questions:

Copied from https://phabricator.wikimedia.org/T146673#2704843

  • We need to detect that this capability is present (what command can we run to ask MySQL if it supports InnoDB FULLTEXT?)

if SELECT count(@@global.innodb_ft_max_token_size) AS innodb_fulltext_enabled; returns 1 row, it is enabled, else (the query fails), it is disabled. If you want to check versions, it is enabled by default since MySQL 5.6.4 and MariaDB 10.0.5, but detecting the configuration variables is simpler and more reliable. Both have 2 stable versions supporting this feature as of this writing.

  • InnoDB uses a different stopword mechanism: a stopword table instead of a stopword file. We need to generate this table (just unconditionally?) and configure it or have users configure it.

See how we do it at Wikimedia: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/manifests/role/mariadb.pp;85887e74bb5abd95de5bec729d3b9e9137a263e7$311 and https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/templates/mariadb/phabricator-stopwords-update.sql.erb and https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/templates/mariadb/phabricator.my.cnf.erb;85887e74bb5abd95de5bec729d3b9e9137a263e7$77

  • Does InnoDB FULLTEXT respect ft_boolean_syntax? If no: is there another similar option? If no: we need to parse queries ourselves and submit the +A +B version to the backend (see also T10642).

Innodb does not respect such an option, and it has not an alternative- it has to add the +signs at application layer. It is documented, but see verified bug: https://bugs.mysql.com/bug.php?id=71551

  • Does InnoDB FULLTEXT respect ft_min_word_len? If no: is there another similar option? If no: maybe we need to filter short words out when indexing/querying?

It does not respect ft_min_word_len, but it uses innodb_ft_min_token_size. See our config at: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/templates/mariadb/phabricator.my.cnf.erb;85887e74bb5abd95de5bec729d3b9e9137a263e7$79

  • Does swapping the ENGINE on an existing MyISAM table also rebuild the index? Do we need to get users to run search index --all after this change? If so, how do we tell them?

We rebuilt the index, but I believe that if the above changes are done (which requires in some cases a server restart), just a database table rebuilt/optimize (including engine conversion) would work. Whenever the fulltext parameters are changed, the index requires rebuilt (as with myisam).

@20after4, just a heads up:

  • After D16941, we will automatically change these tables back to MyISAM, from InnoDB, if you've converted them manually (that is: bin/storage adjust now adjusts table engines, and will make the tables use what it thinks is the "right" engine, which is currently still MyISAM on all systems/versions).
  • A future change will swap the default to InnoDB, but I'm going to wait until we promote stable this week to do that, since we have enough drama in master already from T11044.
  • Thus, you may want to skip stable this week if you were planning to update. Should be safe again next week.
  • (In the worst case, if these get converted back to MyISAM by accident, you can always re-convert them back to InnoDB again -- they'll stay in InnoDB as long as you don't run bin/storage adjust.)

From downstream, MariaDB has a fairly severe bug when indexing tokens with length greater than 127 bytes:

We could work around this by changing isInnoDBFulltextEngineAvailable(). It currently tests for SELECT @@innodb_ft_max_token_size executing without an error. We could extend it to contain an additional test against @@version or some similar variable so we detect InnoDB as unavailable for MariaDB 10.0 (and possibly affected versions of 10.1).

Our DBAs are currently in the process of testing the a patched mariadb package. I will try to keep you posted on that progress but it looks like the fix is now backported to 10.0

I just got this email from Google Cloud:

Dear Google Cloud SQL customer,

Cloud SQL will begin requiring the use of the InnoDB storage engine for all new instances starting on March 28, 2017. Instances created prior to March 28, 2017 will be unaffected. This change may affect compatibility with your future deployments because your existing Cloud SQL instance uses a storage engine other than InnoDB for one or more data tables.

...

Existing instances will be unaffected, but Cloud SQL instances created on or after March 28, 2017 will require the use of the InnoDB storage engine for all tables.

...

Why is Cloud SQL standardizing on InnoDB?

InnoDB supports transactions (support for the ACID property) and is more resistant to table corruption than other MySQL storage engines, such as MyISAM.

How can I make sure any new tables I create use InnoDB?

By default, tables are created using the InnoDB storage engine. Make sure your CREATE TABLE syntax does not include the ENGINE table option.

How can I convert MyISAM tables to InnoDB tables?

Cloud SQL strongly recommends converting existing tables to InnoDB. This can be done manually by following MySQL’s recommendations. Cloud SQL will not automatically convert any non-InnoDB table to InnoDB engine.

So at least one cloud provider is dropping support for anything other than InnoDB.

Just a note for the above - the initial DB seed script will need to be updated accordingly as well. Right now, trying to perform a clean install on a 2nd-gen Google Cloud SQL instance (which has the above limitation) simply errors out when it reaches one of the tables coded as MyISAM.

eliaspro added a subscriber: eliaspro.EditedFeb 28 2017, 12:16 PM

Just a note for the above - the initial DB seed script will need to be updated accordingly as well. Right now, trying to perform a clean install on a 2nd-gen Google Cloud SQL instance (which has the above limitation) simply errors out when it reaches one of the tables coded as MyISAM.

A few more details here:

Running storage upgrade results in this backtrace:

Loading quickstart template onto "104.155.xx.xx"...
[2017-02-28 11:44:51] EXCEPTION: (AphrontQueryException) #3161: Storage engine MyISAM is disabled (Table creation is disallowed). at [<phutil>/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:355]
arcanist(head=master, ref.master=3b6b523c2b23), phabricator(head=master, ref.master=be785d1e8ab6), phutil(head=master, ref.master=7b7820ccf85a)
  #0 AphrontBaseMySQLDatabaseConnection::throwQueryCodeException(integer, string) called at [<phutil>/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:289]
  #1 AphrontBaseMySQLDatabaseConnection::throwQueryException(mysqli) called at [<phutil>/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:185]
  #2 AphrontBaseMySQLDatabaseConnection::executeRawQuery(string) called at [<phutil>/src/xsprintf/queryfx.php:8]
  #3 queryfx(AphrontMySQLiDatabaseConnection, string, string) called at [<phabricator>/src/infrastructure/storage/management/PhabricatorStorageManagementAPI.php:268]
  #4 PhabricatorStorageManagementAPI::applyPatchSQL(string) called at [<phabricator>/src/infrastructure/storage/management/workflow/PhabricatorStorageManagementWorkflow.php:922]
  #5 PhabricatorStorageManagementWorkflow::doUpgradeSchemata(array, NULL, boolean, boolean) called at [<phabricator>/src/infrastructure/storage/management/workflow/PhabricatorStorageManagementWorkflow.php:840]
  #6 PhabricatorStorageManagementWorkflow::upgradeSchemata(array, NULL, boolean, boolean) called at [<phabricator>/src/infrastructure/storage/management/workflow/PhabricatorStorageManagementUpgradeWorkflow.php:78]
  #7 PhabricatorStorageManagementUpgradeWorkflow::didExecute(PhutilArgumentParser) called at [<phabricator>/src/infrastructure/storage/management/workflow/PhabricatorStorageManagementWorkflow.php:107]
  #8 PhabricatorStorageManagementWorkflow::execute(PhutilArgumentParser) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:441]
  #9 PhutilArgumentParser::parseWorkflowsFull(array) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:333]
  #10 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/sql/manage_storage.php:249]

Up to this point, the following DBs have been created:

performance_schema
phabricator_audit
phabricator_calendar
phabricator_chatlog
phabricator_conduit
phabricator_countdown
phabricator_daemon
phabricator_differential
phabricator_draft
phabricator_drydock
phabricator_feed
phabricator_file
phabricator_flag
phabricator_harbormaster
phabricator_herald
phabricator_maniphest
phabricator_meta_data
phabricator_metamta

Running Phabricator stable/be785d1

The issue might be caused by resources/sql/quickstart.sql still creating tables with ENGINE=MyISAM.
As there's no way to initially create them with MyISAM and then migrate them to InnoDB in case MyISAM isn't supported at all, another solution has to be found.

Manually running sed -i 's|MyISAM|InnoDB|g' resources/sql/quickstart.sql before storage upgrade solved the problem in a test environment for me.
Is there a way to non-hackishly do a programmatic s/ENGINE=MyISAM/ENGINE=InnoDB/g when using the quickstart template for an initial setup?
Should the quickstart template possibly be completely migrated to InnoDB by now?

  • The SQL templating should define some new variable like {$FULLTEXT_ENGINE} which evaluates to the appropriate engine at runtime.
  • The Quickstart SQL generator should replace the raw engine string output by the build + dump process with this variable in the affected tables. This piece will be a little hacky, but I think that's unavoidable.
  • The Quickstart SQL should be regenerated.
epriestley moved this task from Backlog to v2 on the Search board.

This is essentially complete except for the Google Cloud specific issue with MyISAM being unavailable; I'm going to retarget it for that.

epriestley renamed this task from Use InnoDB for FULLTEXT indexes where supported to Quickstart can fail to initialize databases if MyISAM is not available (currently, only in Google Cloud).Apr 12 2017, 8:19 PM
epriestley triaged this task as Wishlist priority.