Use stemming in the MySQL fulltext search engine
ClosedPublic

Authored by epriestley on Nov 25 2016, 10:44 PM.
Tags
None
Subscribers
None

Details

Summary

Ref T6740. When we index a document, also save a copy of the stemmed version.

When querying, search the combined corpus for the terms.

(We may need to tune this a bit later since it's possible for literal, quoted terms to match in the stemmed section, but I think this will rarely cause issues in practice.)
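
To make the idea concrete, here is a minimal sketch of indexing a stemmed copy next to the raw text and querying the combined corpus. It is not the code in this diff: the suffix-stripping `stem()` is a hypothetical toy stand-in for a real stemmer, and `build_corpus()`, `matches()`, and `matches_literal()` are illustrative names, not Phabricator or MySQL APIs.

```python
import re

# Hypothetical toy stemmer: strips a few common English suffixes.
# A real engine would use a proper stemming algorithm instead.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_corpus(text):
    """Index step: the raw tokens followed by a stemmed copy of them."""
    raw = tokenize(text)
    return raw + [stem(token) for token in raw]


def matches(corpus, query):
    """Query step: each term may match as typed or in its stemmed form."""
    terms = set(corpus)
    return all(t in terms or stem(t) in terms for t in tokenize(query))


def matches_literal(corpus, term):
    """Quoted lookup: no query-side stemming, but it still searches
    the stemmed section of the combined corpus."""
    return term.lower() in set(corpus)


corpus = build_corpus("Rebuilding search indexes")
print(matches(corpus, "rebuilds indexing"))
print(matches_literal(corpus, "rebuild"))
```

The last call shows the caveat from the note above: the document only contains "rebuilding", but a literal search for "rebuild" still hits because the stemmed copy sits in the same corpus.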

A downside here is that search sort of breaks if you upgrade into this and don't reindex. I wasn't able to find a way to issue the query that remained compatible with older indexes and didn't have awful performance, so my plan is:

  • Put this on secure.
  • Rebuild the index.
  • If things look good after a couple of days, add a setup warning that tells people they need to rebuild the search index.

We might get some reports between now and then, but if this is super awful we should know by the end of the weekend.

Test Plan

WOW AMAZING

Screen Shot 2016-11-25 at 2.38.44 PM.png (1×1 px, 171 KB)

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision from to Use stemming in the MySQL fulltext search engine.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: chad.
chad edited edge metadata.
This revision is now accepted and ready to land. Nov 25 2016, 10:55 PM
This revision was automatically updated to reflect the committed changes.