
Stem fulltext tokens before filtering them for stopwords
ClosedPublic

Authored by epriestley on Apr 19 2017, 4:00 PM.
Tags
None
Subscribers
None

Details

Summary

Fixes T12596. A query for a token (like "having") which stems to a stopword (like "have") currently survives filtering. Stem it first so it gets caught.
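As a rough illustration of why the ordering matters, here is a minimal Python sketch (not the actual Phabricator code). The lookup-table stemmer, the stopword set, and the function names `filter_then_stem` and `stem_then_filter` are all hypothetical stand-ins chosen for the example.

```python
# Toy stand-in for a real stemmer (hypothetical table, for illustration only).
TOY_STEMS = {"having": "have", "has": "have", "tables": "table"}

# Hypothetical stopword list; a real install would load its own.
STOPWORDS = {"have", "the", "of"}


def stem(token):
    """Return the stem of a token, or the lowercased token if it is unknown."""
    token = token.lower()
    return TOY_STEMS.get(token, token)


def filter_then_stem(tokens):
    """Old order: compare raw tokens to the stopword list, then stem."""
    return [stem(t) for t in tokens if t.lower() not in STOPWORDS]


def stem_then_filter(tokens):
    """New order: stem first, then drop any stem that is a stopword."""
    stems = [stem(t) for t in tokens]
    return [s for s in stems if s not in STOPWORDS]
```

With the tokens `["having", "tables"]`, the old order lets "having" through because the raw token is not a stopword, while the new order stems it to "have" and drops it.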

Also, InnoDB allows a custom stopword table to be configured. If one is configured, read it instead of the default stopword list. (I configured one locally, but the default list is reasonable, so we never formally recommended that installs configure one.)
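A sketch of how the custom-table lookup might be expressed, assuming the configuration comes from MySQL's `innodb_ft_server_stopword_table` variable, which holds a `db_name/table_name` string and points at a table with a single `value` column. The function name `custom_stopword_query` is hypothetical, and how the revision actually reads the setting is not shown in this page.

```python
def quote_identifier(name):
    """Quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def custom_stopword_query(variable_value):
    """Build SQL to read a custom InnoDB stopword table.

    `variable_value` is assumed to be the value of
    innodb_ft_server_stopword_table ("db_name/table_name").
    Returns None when no custom table is configured, meaning the
    caller should fall back to its default stopword list.
    """
    if not variable_value:
        return None

    db, sep, table = variable_value.partition("/")
    if not sep or not db or not table:
        raise ValueError("expected 'db_name/table_name', got %r" % variable_value)

    return "SELECT value FROM %s.%s" % (
        quote_identifier(db),
        quote_identifier(table),
    )
```

Returning `None` for an unset variable keeps the fallback decision in the caller, so the default list can come from wherever the application keeps it.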

Test Plan

Queried for words that stem to stopwords, saw them filtered:

Screen Shot 2017-04-19 at 8.50.18 AM.png (554×746 px, 52 KB)

Queried for the original problem query and saw "having" caught with "have" in the stopword list:

Screen Shot 2017-04-19 at 8.56.20 AM.png (670×903 px, 66 KB)

Fiddled with local InnoDB stopword table config and saw the stopword list get loaded correctly.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land. Apr 19 2017, 4:14 PM
This revision was automatically updated to reflect the committed changes.