
(stable) Stem fulltext tokens before filtering them for stopwords

Summary:
Fixes T12596. Currently, a query token (like "having") that stems to a stopword (like "have") survives stopword filtering, because the filter only checks the unstemmed token. Stem tokens first so these get caught.
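A minimal Python sketch of the ordering change, for illustration only. It is not Phabricator's code: `toy_stem`, `STOPWORDS`, and the two filter functions are hypothetical, and the stemmer is a deliberately crude stand-in for a real one (such as Porter) that happens to map "having" to "have".

```python
# Toy stopword list: an assumption for illustration, not MySQL's real list.
STOPWORDS = {"a", "be", "have", "the"}

VOWELS = set("aeiou")


def toy_stem(token: str) -> str:
    """Hypothetical crude stemmer standing in for a real one (e.g. Porter)."""
    token = token.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip the suffix if at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Restore a trailing "e" on short consonant-vowel-consonant stems,
            # so "having" -> "hav" -> "have".
            if (
                len(stem) == 3
                and stem[0] not in VOWELS
                and stem[1] in VOWELS
                and stem[2] not in VOWELS
            ):
                stem += "e"
            return stem
    return token


def filter_before_stemming(tokens):
    """Old order: compare the raw token with the stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def filter_after_stemming(tokens):
    """New order: stem first, then compare the stem with the stopword list."""
    return [t for t in tokens if toy_stem(t) not in STOPWORDS]


if __name__ == "__main__":
    query = ["having", "indexed", "documents"]
    print(filter_before_stemming(query))  # ['having', 'indexed', 'documents']
    print(filter_after_stemming(query))   # ['indexed', 'documents']
```

With the old order, "having" is not literally in the stopword list, so it slips through; with the new order its stem "have" is, so it is dropped.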

Also, InnoDB supports configuring a custom stopword table. If one is configured, read it instead of the default stopword list. (I configured one locally, but the default list is reasonable, so we never formally recommended that installs configure it.)
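A rough Python sketch of the "read the configured table, else the default list" logic. It assumes the setting in question is MySQL's server-wide `innodb_ft_server_stopword_table` (value `db_name/table_name`, NULL by default), that custom stopword tables keep their words in a single `value` column, and that the default list lives in `INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD`. `run_query` is a hypothetical callable standing in for a real database connection, and the session-level `innodb_ft_user_stopword_table` is ignored here.

```python
from typing import Callable, Iterable, Optional, Set

# Hypothetical: run_query(sql) executes SQL and returns rows as tuples.
QueryFn = Callable[[str], Iterable[tuple]]

# MySQL's built-in default InnoDB full-text stopword list.
DEFAULT_STOPWORD_TABLE = "INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD"


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def stopword_table(setting: Optional[str]) -> str:
    """Map an innodb_ft_server_stopword_table value ("db/table") to a SQL name."""
    if not setting:
        # Unset (NULL or empty): fall back to the built-in default list.
        return DEFAULT_STOPWORD_TABLE
    db, sep, table = setting.partition("/")
    if not sep or not db or not table:
        raise ValueError(f"expected 'db_name/table_name', got {setting!r}")
    return f"{quote_identifier(db)}.{quote_identifier(table)}"


def load_stopwords(run_query: QueryFn) -> Set[str]:
    """Read the configured stopword table, falling back to the default list."""
    rows = list(run_query("SELECT @@innodb_ft_server_stopword_table"))
    setting = rows[0][0] if rows else None
    table = stopword_table(setting)
    # Stopword tables expose their words in a single column named "value".
    return {row[0] for row in run_query(f"SELECT value FROM {table}")}


if __name__ == "__main__":
    # Fake connection with a custom table configured (illustration only).
    def fake_query(sql):
        if sql.startswith("SELECT @@"):
            return [("search/custom_stopwords",)]
        return [("have",), ("the",)]

    print(sorted(load_stopwords(fake_query)))  # ['have', 'the']
```

Passing the query function in keeps the sketch independent of any particular MySQL client library.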

Test Plan:
Queried for words that stem to stopwords, saw them filtered:

[Screenshot: Screen Shot 2017-04-19 at 8.50.18 AM.png]

Ran the original problem query and saw "having" caught, since "have" is in the stopword list:

[Screenshot: Screen Shot 2017-04-19 at 8.56.20 AM.png]

Fiddled with local InnoDB stopword table config and saw the stopword list get loaded correctly.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12596

Differential Revision: https://secure.phabricator.com/D17728