
Implement basic ngram search for Owners Package names

Authored by epriestley on Dec 21 2015, 9:07 PM.

Ref T9979. This uses ngrams (specifically, trigrams) to build a reasonably efficient index for substring matching. For a package named "Example" with ID 123, we store rows like this:

< ex, 123>
<exa, 123>
<xam, 123>
<amp, 123>
<mpl, 123>
<ple, 123>
<le , 123>
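
As a rough illustration of how rows like these can be produced, here is a minimal Python sketch (not the code in this change). It assumes the name is lowercased, split on whitespace, and that each word is padded with one space on either side before a three-character window slides across it. The function name `index_ngrams` is hypothetical.

```python
def index_ngrams(name, n=3):
    """Return the distinct n-grams for a name, in first-seen order.

    Assumption: each whitespace-separated word is lowercased and padded
    with a single space on both sides, which is what yields the leading
    " ex" and trailing "le " tokens for "Example".
    """
    grams = []
    for word in name.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            if gram not in grams:
                grams.append(gram)
    return grams


# Rows to store for package ID 123, as (ngram, package ID) pairs.
rows = [(gram, 123) for gram in index_ngrams("Example")]
```

The padding is what lets the index distinguish a match at the start or end of a word from a match in the middle of one.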

When the user searches for "exam", we join against this table to find packages that have both the "exa" and "xam" tokens. MySQL can do this much more efficiently than it can evaluate a LIKE "%exam%" query against a huge table.
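
The join can be sketched end to end with an in-memory SQLite database standing in for MySQL. This is an assumption-laden illustration rather than the actual query: the table names `packages` and `package_ngrams`, the column names, and the helper functions are all hypothetical. It adds one join per trigram of the search term.

```python
import sqlite3


def index_ngrams(name, n=3):
    """Trigrams stored for a name: each word lowercased and space-padded."""
    grams = set()
    for word in name.lower().split():
        padded = f" {word} "
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def query_ngrams(term, n=3):
    """Trigrams of the search term itself (unpadded: it may match mid-word)."""
    term = term.lower()
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def search(conn, term):
    """Find packages whose index rows include every trigram of `term`."""
    grams = query_ngrams(term)
    if not grams:
        raise ValueError("this sketch only handles terms of 3+ characters")
    # One join against the ngram table per trigram in the search term.
    joins = " ".join(
        f"JOIN package_ngrams n{i} "
        f"ON n{i}.package_id = p.id AND n{i}.ngram = ?"
        for i in range(len(grams))
    )
    sql = f"SELECT DISTINCT p.id, p.name FROM packages p {joins} ORDER BY p.id"
    return conn.execute(sql, grams).fetchall()


# Hypothetical schema and sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE package_ngrams ("
    "ngram TEXT, package_id INTEGER, PRIMARY KEY (ngram, package_id))"
)
for package_id, name in [(123, "Example"), (124, "Sample")]:
    conn.execute("INSERT INTO packages VALUES (?, ?)", (package_id, name))
    conn.executemany(
        "INSERT INTO package_ngrams VALUES (?, ?)",
        [(gram, package_id) for gram in index_ngrams(name)],
    )

print(search(conn, "exam"))  # only "Example" has both "exa" and "xam"
print(search(conn, "mpl"))   # both names contain "mpl"
```

Sharing every trigram is a necessary condition for containing the substring but not a sufficient one, so an implementation may want to re-check the candidates it gets back. Whether this change does so is not stated here.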

When the user searches for a one-letter or two-letter string, we only search the beginnings of words. This is probably what they want, the only thing we can do quickly, and a reasonable/expected behavior for typeaheads.
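
One plausible way to get this behavior from the same index, shown here as a sketch under assumptions rather than as what the change actually does: prefix the short term with a space and look for stored trigrams that start with that string. Because stored words are space-padded, only trigrams at the beginning of a word can match. The function names are hypothetical.

```python
def index_ngrams(name, n=3):
    """Trigrams stored for a name: each word lowercased and space-padded."""
    grams = set()
    for word in name.lower().split():
        padded = f" {word} "
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def short_query_prefix(term):
    """Turn a 1- or 2-character term into a word-start ngram prefix."""
    term = term.lower()
    if not 1 <= len(term) <= 2:
        raise ValueError("expected a one-letter or two-letter term")
    return " " + term


def matches_short_query(name, term):
    """True if some word in `name` begins with `term`.

    In SQL this would correspond to a prefix match on the ngram column
    (for example, ngram LIKE ' e%'), which an ordinary index can serve.
    """
    prefix = short_query_prefix(term)
    return any(gram.startswith(prefix) for gram in index_ngrams(name))


print(matches_short_query("Example", "ex"))  # word starts with "ex"
print(matches_short_query("Example", "xa"))  # "xa" only occurs mid-word
```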

Test Plan
  • Ran storage upgrades and search indexer.
  • Searched for stuff with "name contains".
  • Used typeahead and got sensible results.
  • Searched for aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz and saw only 16 joins.

Diff Detail

rP Phabricator
Lint Not Applicable
Tests Not Applicable