
Stem fulltext tokens before filtering them for stopwords
ClosedPublic

Authored by epriestley on Apr 19 2017, 4:00 PM.
Tags
None
Subscribers
None

Details

Summary

Fixes T12596. A query for a token (like "having") which stems to a stopword (like "have") currently survives filtering. Stem it first so it gets caught.
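
To illustrate why the order matters, here is a minimal Python sketch, not the code from this revision. The `toy_stem` function is a hypothetical stand-in for a real stemmer and only handles a simple "-ing" case; the function names and the one-word stopword set are assumptions made for the example. Whether the real change keeps the original token or its stem after filtering is not stated here, so the sketch keeps the original token.

```python
VOWELS = set("aeiou")


def toy_stem(token):
    """Hypothetical, highly simplified stemmer (illustration only)."""
    word = token.lower()
    if word.endswith("ing") and len(word) > 4:
        base = word[:-3]
        # Re-add "e" when the base ends consonant-vowel-consonant,
        # e.g. "hav" -> "have". The final consonant must not be w, x, or y.
        if (len(base) >= 3
                and base[-1] not in VOWELS and base[-1] not in "wxy"
                and base[-2] in VOWELS
                and base[-3] not in VOWELS):
            return base + "e"
        return base
    return word


def filter_before_stemming(tokens, stopwords):
    # Old order: compare the raw token against the stopword list.
    return [t for t in tokens if t.lower() not in stopwords]


def filter_after_stemming(tokens, stopwords):
    # New order: stem first, then compare the stem against the stopword list.
    return [t for t in tokens if toy_stem(t) not in stopwords]


tokens = ["having", "search", "results"]
stopwords = {"have"}

survivors_old = filter_before_stemming(tokens, stopwords)
survivors_new = filter_after_stemming(tokens, stopwords)
```

With these inputs, `survivors_old` still contains "having", while `survivors_new` contains only "search" and "results".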

Also, for InnoDB, a custom stopword table can be configured. If one is configured, read it instead of the default stopword list (I configured one locally, but the default list is reasonable, so we never formally recommended that installs configure one).
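
The table selection could be sketched roughly as follows, in Python rather than the project's own language. This assumes the custom table is named through MySQL's `innodb_ft_server_stopword_table` setting, whose value has the form `db_name/table_name`, and that stopword tables expose a single `value` column, as the built-in `INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD` table does. The helper names and the example table name are hypothetical, and how this revision actually reads the configuration is not shown here.

```python
DEFAULT_STOPWORD_TABLE = ("INFORMATION_SCHEMA", "INNODB_FT_DEFAULT_STOPWORD")


def resolve_stopword_table(configured):
    """Return (database, table) for the stopword list that should be read.

    `configured` is the value of the stopword table setting: None or an
    empty string when unset, otherwise "db_name/table_name".
    """
    if not configured:
        return DEFAULT_STOPWORD_TABLE
    database, separator, table = configured.partition("/")
    if not separator or not database or not table:
        raise ValueError("expected 'db_name/table_name', got %r" % configured)
    return (database, table)


def quote_identifier(name):
    # Backtick-quote an identifier, doubling any embedded backticks.
    return "`" + name.replace("`", "``") + "`"


def build_stopword_query(configured):
    database, table = resolve_stopword_table(configured)
    return "SELECT value FROM %s.%s" % (
        quote_identifier(database),
        quote_identifier(table),
    )


# No custom table configured: fall back to the default list.
default_query = build_stopword_query(None)

# Hypothetical custom table name, for illustration only.
custom_query = build_stopword_query("phabricator_search/stopwords")
```

The fallback keeps behavior unchanged for installs that never configured a custom table.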

Test Plan

Queried for words that stem to stopwords, saw them filtered:

Screen Shot 2017-04-19 at 8.50.18 AM.png (554×746 px, 52 KB)

Queried for the original problem query and saw "having" caught with "have" in the stopword list:

Screen Shot 2017-04-19 at 8.56.20 AM.png (670×903 px, 66 KB)

Fiddled with local InnoDB stopword table config and saw the stopword list get loaded correctly.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land. Apr 19 2017, 4:14 PM
This revision was automatically updated to reflect the committed changes.