Add a "terms" corpus to Ferret fields
Closed, Public

Authored by epriestley on Aug 30 2017, 2:40 PM.

Details

Summary

Ref T12819. Ferret currently does substring search, but this is not the default mode users expect: when you search for the "RICO" act, you do not expect to find documents containing "apRICOt" even though "RICO" is a substring.
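Here is a minimal Python sketch that contrasts the two modes. The helper names `substring_match` and `term_match` are hypothetical and are not Ferret's API. The sketch also assumes case-insensitive matching and a term corpus whose terms are separated by single spaces. Padding both the query and the corpus with a space on each side is one simple way to restrict a match to whole terms.

```python
def substring_match(corpus: str, query: str) -> bool:
    """Substring mode: the query may occur anywhere, even inside a word."""
    return query.casefold() in corpus.casefold()


def term_match(term_corpus: str, query: str) -> bool:
    """Term mode (hypothetical helper): the query must line up with whole terms.

    Assumes term_corpus already has punctuation removed and its terms
    separated by single spaces. Padding both strings with a space on each
    side lets " rico " match a standalone term but not "apricot".
    """
    haystack = " " + term_corpus.casefold() + " "
    needle = " " + query.casefold().strip() + " "
    return needle in haystack


print(substring_match("I like apricot jam", "RICO"))     # True
print(term_match("I like apricot jam", "RICO"))          # False
print(term_match("Violations of the RICO act", "rico"))  # True
```

With these helpers, substring mode matches "RICO" inside "apricot", while term mode only finds the standalone term.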

To support term search, also index the corpus as a list of terms, with punctuation removed and whitespace normalized, so the engine can match queries against whole terms.
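The following is a minimal sketch of building such a term corpus. It is written in Python rather than Phabricator's PHP. `build_term_corpus` is a hypothetical name, and the exact rules are assumptions inferred from the test plan output:

- Punctuation becomes whitespace.
- Apostrophes inside a word (as in "Whom'st'dve") are kept.
- Apostrophes at a word boundary are dropped.
- Case is preserved.

```python
import re

# Assumed rule: any run of characters that is not a word character,
# whitespace, or an apostrophe counts as punctuation.
_PUNCTUATION = re.compile(r"[^\w\s']+")

# Assumed rule: apostrophes at a word boundary (not between two word
# characters) are treated as punctuation; runs inside a word are kept.
_EDGE_APOSTROPHES = re.compile(r"(?<![\w'])'+|'+(?![\w'])")


def build_term_corpus(raw_corpus: str) -> str:
    """Hypothetical helper: turn raw document text into a term corpus."""
    # Replace punctuation with spaces, so "description." -> "description ".
    text = _PUNCTUATION.sub(" ", raw_corpus)
    # Drop quote-like apostrophes but keep contractions like "whom'st'dve".
    text = _EDGE_APOSTROPHES.sub(" ", text)
    # Normalize all whitespace (including newlines) to single spaces.
    return " ".join(text.split())


raw = ("This is the task description.\n\n"
       "Hark! Whom'st'dve eaten this \"food\" shall surely ~perish~?? #blessed")
print(build_term_corpus(raw))
# This is the task description Hark Whom'st'dve eaten this food shall surely perish blessed
```

Applied to the raw corpus from the test plan, this produces the same terms as the termCorpus line shown there.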

Test Plan

Ran the storage upgrade, rebuilt the search index, and saw sensible results in the database:

   rawCorpus: This is the task description.

              Hark! Whom'st'dve eaten this "food" shall surely ~perish~?? #blessed
normalCorpus: thi the task descript hark whom dve eaten food shall sure perish bless
  termCorpus:  This is the task description Hark Whom'st'dve eaten this food shall surely perish blessed

Diff Detail

Repository: rP Phabricator
Lint: Not Applicable
Unit: Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land. (Aug 30 2017, 2:59 PM)
This revision was automatically updated to reflect the committed changes.