
Add a "terms" corpus to Ferret fields
ClosedPublic

Authored by epriestley on Aug 30 2017, 2:40 PM.
Tags
None
Subscribers
None

Details

Summary

Ref T12819. Ferret currently does substring search, but this is not the default mode users expect: when you search for the "RICO" act, you do not expect to find documents containing "apRICOt" even though "RICO" is a substring.

To support term search, index the corpus as a list of terms with punctuation removed and whitespace normalized so the engine can match against it.
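As a rough illustration of the idea (not the code in this change, which lives in Phabricator's PHP codebase), the sketch below builds a term corpus and contrasts substring matching with term matching. The function names, the regular expression, the choice to keep apostrophes, and the space padding around the corpus are all assumptions made for the example.

```python
import re


def build_term_corpus(raw: str) -> str:
    """Hypothetical helper: turn raw text into a space-delimited list of terms."""
    # Replace anything that is not a word character, whitespace, or an
    # apostrophe with a space. Keeping apostrophes is an assumption based on
    # the sample output in the test plan ("Whom'st'dve" survives intact).
    stripped = re.sub(r"[^\w\s']", " ", raw)
    # Collapse runs of whitespace (including newlines) to single spaces.
    normalized = " ".join(stripped.split())
    # Pad with spaces so every term, including the first and last, is
    # bounded by a space on both sides (an assumption for this sketch).
    return f" {normalized} "


def matches_substring(raw: str, query: str) -> bool:
    """Substring search: the query may appear inside a longer word."""
    return query.lower() in raw.lower()


def matches_term(term_corpus: str, query: str) -> bool:
    """Term search: the query must appear as a whole, space-bounded term."""
    return f" {query.lower()} " in term_corpus.lower()
```

With these definitions, `matches_substring("apricot jam", "RICO")` is true, while `matches_term(build_term_corpus("apricot jam"), "RICO")` is false and `matches_term(build_term_corpus('The "RICO" act.'), "RICO")` is true.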

Test Plan

Ran storage upgrade, ran search index, saw sensible database results:

   rawCorpus: This is the task description.

Hark! Whom'st'dve eaten this "food" shall surely ~perish~?? #blessed
normalCorpus: thi the task descript hark whom dve eaten food shall sure perish bless
  termCorpus:  This is the task description Hark Whom'st'dve eaten this food shall surely perish blessed

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land. Aug 30 2017, 2:59 PM
This revision was automatically updated to reflect the committed changes.