
Don't let stemming reduce a word beneath 3 characters
ClosedPublic

Authored by epriestley on Dec 6 2016, 4:31 PM.
Tags
None
Subscribers
None

Details

Summary

Ref T11922. Porter stems "DNS" (an acronym for "Domain Name Syrup") into "dn", which is meaningless and too short to index.

Don't let stemming make an indexable token un-indexable by shortening it: if the stem is too short, just return the normalized input.
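
The guard described above can be sketched roughly as follows. This is an illustrative Python sketch, not the libphutil change itself: `toy_stem`, `normalize`, `stem_token`, and `MIN_TOKEN_LENGTH` are hypothetical names, the 3-character threshold is taken from the revision title, and `toy_stem` is a deliberately crude stand-in for a real Porter stemmer (it only strips one trailing "s").

```python
# Assumed threshold, taken from the revision title ("beneath 3 characters").
MIN_TOKEN_LENGTH = 3


def toy_stem(token: str) -> str:
    """Crude stand-in for a Porter stemmer: strip a single trailing "s"."""
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(token: str) -> str:
    """Hypothetical normalization step: trim whitespace and lowercase."""
    return token.strip().lower()


def stem_token(token: str, stemmer=toy_stem) -> str:
    """Stem a token, but never shorten it below MIN_TOKEN_LENGTH."""
    normalized = normalize(token)
    stem = stemmer(normalized)
    # If stemming made the token too short to index, fall back to the
    # normalized input instead of the stem.
    if len(stem) < MIN_TOKEN_LENGTH:
        return normalized
    return stem
```

With this guard, `stem_token("DNS")` falls back to the normalized input because the stem would be only two characters long, while a longer word such as "tokens" is still stemmed as usual.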

(I believe there are very few legitimate English words that have two-letter roots, anyway.)

Test Plan

Added unit tests.

Diff Detail

Repository
rPHU libphutil
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley set the title of this revision to "Don't let stemming reduce a word beneath 3 characters".
epriestley updated this object.
epriestley edited the test plan for this revision.
epriestley added a reviewer: chad.
chad edited edge metadata.
This revision is now accepted and ready to land. (Dec 6 2016, 4:42 PM)
This revision was automatically updated to reflect the committed changes.