
Fix an issue with selecting the right stemmed ngrams with Ferret engine queries
ClosedPublic

Authored by epriestley on Sep 12 2017, 2:48 PM.
Tags
None
Referenced Files
F14056873: D18593.diff
Sat, Nov 16, 10:58 PM
F14003054: D18593.id44654.diff
Sat, Oct 26, 1:39 AM
F13983123: D18593.diff
Sun, Oct 20, 4:17 AM
F13976994: D18593.diff
Fri, Oct 18, 4:40 PM
Subscribers
None

Details

Summary

Ref T12819. In D18581, I corrected one bug (ngram selection for terms) but introduced a minor new bug. We now pass ' query ' (term corpus with boundary spaces) to the stemmer, but it bails out on this since English words don't start with spaces.

Trim these extra boundary spaces off before invoking the stemmer.

The practical effect of this is that searching for a non-stem variation of a word ("detection") now finds the stemmed variation ("detect") again. Before this fix, searching for the stem ("detect") could find longer variations ("detection"), but not the other way around.
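As a rough illustration of the failure mode described above, here is a minimal Python sketch. It is not the Phabricator code: `toy_stem`, `stem_untrimmed`, and `stem_trimmed` are hypothetical names, and the toy stemmer only mimics the "bail out on input that doesn't look like a word" behavior with a few hardcoded suffixes.

```python
def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer.

    Returns the input unchanged when it is not purely alphabetic
    (for example, when it has leading or trailing spaces); otherwise
    strips one of a few hardcoded suffixes.
    """
    if not word.isalpha():
        # Bail out: this doesn't look like an English word.
        return word
    for suffix in ("ion", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_untrimmed(term_corpus: str) -> str:
    # Behavior before the fix (as described in the summary): the term
    # corpus, including its boundary spaces, goes straight to the stemmer.
    return toy_stem(term_corpus)


def stem_trimmed(term_corpus: str) -> str:
    # Behavior after the fix: trim boundary spaces before stemming.
    return toy_stem(term_corpus.strip(" "))


if __name__ == "__main__":
    print(repr(stem_untrimmed(" detection ")))
    print(repr(stem_trimmed(" detection ")))
    print(repr(stem_trimmed(" detect ")))
```

With the untrimmed input the toy stemmer gives up and returns the term as-is, so "detection" and "detect" end up with different stems. After trimming, both reduce to the same stem, which matches the behavior reported in the test plan.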

Test Plan

Searched for "detection", found results matching "detect" after patch (and saw same results for "detect" and "detection").

Diff Detail

Repository
rP Phabricator
Branch
ferret8
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 18426
Build 24811: Run Core Tests
Build 24810: arc lint + arc unit