
Don't let stemming reduce a word beneath 3 characters

Description


Summary:
Ref T11922. Porter stems "DNS" (an acronym for "Domain Name System") into "dn", which is meaningless and too short to index.

Don't let stemming make an indexable token un-indexable by shortening it: if the stem is too short, just return the normalized input.

(I believe very few legitimate English words have two-letter roots, anyway.)
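The fallback rule can be sketched roughly as follows. This is not the Phabricator implementation. The names `stem_token`, `normalize`, and `toy_stemmer` are hypothetical. Normalization is assumed to be lowercasing plus trimming. The toy stemmer only strips a trailing "s" and stands in for a real Porter stemmer.

```python
from typing import Callable

# Assumed minimum indexable token length, per this revision's title.
MIN_TOKEN_LENGTH = 3


def normalize(token: str) -> str:
    # Assumption: normalization is just trimming and lowercasing.
    return token.strip().lower()


def stem_token(token: str, stemmer: Callable[[str], str],
               min_length: int = MIN_TOKEN_LENGTH) -> str:
    """Stem a token, but never let stemming push it below min_length.

    If the stem would be too short to index, return the normalized
    input unchanged instead.
    """
    normalized = normalize(token)
    stem = stemmer(normalized)
    if len(stem) < min_length:
        return normalized
    return stem


def toy_stemmer(word: str) -> str:
    # Hypothetical stand-in for Porter: strips a single trailing "s".
    return word[:-1] if word.endswith("s") else word
```

With this sketch, `stem_token("DNS", toy_stemmer)` returns `"dns"` rather than `"dn"`, while `stem_token("Cats", toy_stemmer)` still stems to `"cat"`.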

Test Plan: Added unit tests.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11922

Differential Revision: https://secure.phabricator.com/D17001
