
Stem fulltext tokens before filtering them for stopwords
Closed, Public

Authored by epriestley on Apr 19 2017, 4:00 PM.
Tags
None
Subscribers
None

Details

Summary

Fixes T12596. A query for a token (like "having") which stems to a stopword (like "have") currently survives filtering. Stem it first so it gets caught.
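The ordering problem can be illustrated with a minimal sketch. This is not the code from the diff: `toy_stem`, `TOY_STEMS`, `STOPWORDS`, `filter_raw`, and `filter_stemmed` are hypothetical names, and the lookup-table "stemmer" is a stand-in for a real stemming algorithm.

```python
# Hypothetical stand-ins: a real setup would use a proper stemmer and the
# stopword list loaded from the search engine, not these tiny tables.
TOY_STEMS = {"having": "have", "tasks": "task", "searching": "search"}
STOPWORDS = {"have", "the", "of"}


def toy_stem(token):
    """Return the stem of a token, or the lowercased token if unknown."""
    token = token.lower()
    return TOY_STEMS.get(token, token)


def filter_raw(tokens):
    """Old order: compare the raw token against the stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def filter_stemmed(tokens):
    """New order: stem first, then compare the stem against the list."""
    return [t for t in tokens if toy_stem(t) not in STOPWORDS]


query = ["having", "tasks"]
print(filter_raw(query))      # "having" survives: it is not itself a stopword
print(filter_stemmed(query))  # "having" is caught: its stem "have" is a stopword
```

With the raw comparison, "having" passes through because only "have" is on the list; stemming first lets the same list catch both forms.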

Also, for InnoDB, a custom stopword table can be configured. If one is configured, read it instead of the default stopword list (I configured one locally, but the default list is reasonable, so we never formally recommended that installs configure it).
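A rough sketch of that choice, assuming the custom table is named by MySQL's `innodb_ft_server_stopword_table` variable (in `db_name/table_name` form) and that the default list lives in `INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD`. The function name `stopword_query` is hypothetical, and this is not the code from the diff; it only builds the SQL and does not talk to a database.

```python
DEFAULT_STOPWORD_TABLE = "INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD"


def stopword_query(configured_table):
    """Build the SQL that reads stopwords from the custom table if one is
    configured, falling back to the InnoDB default list otherwise.

    `configured_table` is assumed to be the value of
    innodb_ft_server_stopword_table ("db_name/table_name"), or None/"" when
    no custom table is set.
    """
    if not configured_table:
        return "SELECT value FROM " + DEFAULT_STOPWORD_TABLE

    db, sep, table = configured_table.partition("/")
    if not sep or not db or not table:
        raise ValueError("expected 'db_name/table_name', got %r" % configured_table)

    # Quote identifiers, doubling any embedded backticks.
    quoted_db = "`" + db.replace("`", "``") + "`"
    quoted_table = "`" + table.replace("`", "``") + "`"
    return "SELECT value FROM " + quoted_db + "." + quoted_table


print(stopword_query(None))
print(stopword_query("phabricator_search/stopwords"))
```

The database and table names in the usage lines are made up for illustration.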

Test Plan

Queried for words that stem to stopwords, saw them filtered:

Screen Shot 2017-04-19 at 8.50.18 AM.png (554×746 px, 52 KB)

Queried for the original problem query and saw "having" caught with "have" in the stopword list:

Screen Shot 2017-04-19 at 8.56.20 AM.png (670×903 px, 66 KB)

Fiddled with local InnoDB stopword table config and saw the stopword list get loaded correctly.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land. (Apr 19 2017, 4:14 PM)
This revision was automatically updated to reflect the committed changes.