
Implement basic ngram search for Owners Package names
ClosedPublic

Authored by epriestley on Dec 21 2015, 9:07 PM.

Details

Summary

Ref T9979. This uses ngrams (specifically, trigrams) to build a reasonably efficient index for substring matching. For example, for a package named "Example" with ID 123, we store rows like this:

< ex, 123>
<exa, 123>
<xam, 123>
<amp, 123>
<mpl, 123>
<ple, 123>
<le , 123>
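As a rough illustration of how rows like these could be produced, here is a minimal Python sketch. It is not the code from this revision: the function name `index_trigrams` is hypothetical, and lowercasing plus padding each whitespace-separated word with one space on each side are assumptions inferred from the example rows above.

```python
def index_trigrams(name):
    """Return the padded trigrams for each word in a package name.

    Assumption: names are lowercased and each word is padded with a
    single space on both sides before being cut into 3-character windows.
    """
    grams = []
    for word in name.lower().split():
        padded = " " + word + " "
        # Slide a 3-character window across the padded word.
        for i in range(len(padded) - 2):
            grams.append(padded[i:i + 3])
    return grams


# Rows to store for package ID 123, as (ngram, package_id) pairs.
rows = [(gram, 123) for gram in index_trigrams("Example")]
```

The leading and trailing spaces are what let a token such as " ex" mark the start of a word and "le " mark the end of one.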

When the user searches for "exam", we join this table once per query trigram ("exa" and "xam") to find packages that have both tokens. MySQL can do this a lot more efficiently than it can process a LIKE "%exam%" query against a huge table.
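The query side might be sketched as follows. The table and column names (`packages`, `package_name_ngrams`, `ngram`, `package_id`) and the function names are hypothetical and not taken from this revision; the sketch only shows the shape of a query with one join per trigram.

```python
def query_trigrams(query):
    """Split a search string into its unpadded, overlapping trigrams."""
    q = query.lower()
    return [q[i:i + 3] for i in range(len(q) - 2)]


def build_search_sql(query):
    """Build a parameterized SELECT with one join per query trigram.

    Table and column names here are illustrative assumptions.
    """
    grams = query_trigrams(query)
    if not grams:
        raise ValueError("queries shorter than 3 characters need a different path")
    joins = []
    params = []
    for n, gram in enumerate(grams):
        alias = "g%d" % n
        joins.append(
            "JOIN package_name_ngrams {a} "
            "ON {a}.package_id = p.id AND {a}.ngram = %s".format(a=alias)
        )
        params.append(gram)
    sql = "SELECT DISTINCT p.id FROM packages p " + " ".join(joins)
    return sql, params


sql, params = build_search_sql("exam")
```

Matching every trigram is a necessary condition for a substring match but not a sufficient one, since the trigrams could occur in different places in the name, so a query like this would likely be paired with a final check on the candidate rows.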

When the user searches for a one-letter or two-letter string, we only search the beginnings of words. This is probably what they want, it is the only thing we can do quickly, and it is reasonable, expected behavior for typeaheads.
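One way to picture the short-query behavior is with an in-memory stand-in for the index. This is a sketch under assumptions, not the revision's implementation: it assumes word-start trigrams begin with a space (as in the " ex" row above), the names `build_index` and `search_short` are hypothetical, and the package "Complex" with ID 124 is invented for the example.

```python
from collections import defaultdict


def index_trigrams(name):
    """Padded trigrams per word (assumed indexing scheme)."""
    grams = []
    for word in name.lower().split():
        padded = " " + word + " "
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def build_index(packages):
    """Map each trigram to the set of package IDs whose name contains it."""
    index = defaultdict(set)
    for package_id, name in packages.items():
        for gram in index_trigrams(name):
            index[gram].add(package_id)
    return index


def search_short(index, query):
    """Match one- or two-letter queries against word beginnings only."""
    prefix = " " + query.lower()
    hits = set()
    for gram, ids in index.items():
        # Only trigrams that start a word begin with a space.
        if gram.startswith(prefix):
            hits |= ids
    return hits


index = build_index({123: "Example", 124: "Complex"})
```

Here `search_short(index, "ex")` finds only package 123, while `search_short(index, "pl")` finds nothing even though both names contain "pl", because neither name has a word beginning with it. In a database, the equivalent lookup would be a prefix match on the ngram column, which an ordinary index can serve.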

Test Plan
  • Ran storage upgrades and search indexer.
  • Searched for stuff with "name contains".
  • Used typeahead and got sensible results.
  • Searched for aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz and saw only 16 joins.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable