
Implement basic ngram search for Owners Package names
ClosedPublic

Authored by epriestley on Dec 21 2015, 9:07 PM.
Tags
None
Subscribers
None

Details

Summary

Ref T9979. This uses ngrams (specifically, trigrams) to build a reasonably efficient index for substring matching. For example, for a package named "Example" with ID 123, we store rows like this:

< ex, 123>
<exa, 123>
<xam, 123>
<amp, 123>
<mpl, 123>
<ple, 123>
<le , 123>
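
The rows above can be reproduced with a short sketch. This is illustrative Python, not the PHP in this change; the function name `index_trigrams` is hypothetical, and splitting the name into words before padding is an assumption (the summary only shows a single-word name).

```python
def index_trigrams(name: str) -> list[str]:
    """Return the distinct trigrams used to index a package name.

    Assumption: each word is lowercased and padded with one space on
    each side, which yields the " ex" and "le " tokens shown above.
    """
    grams: list[str] = []
    for word in name.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - 2):
            gram = padded[i:i + 3]
            if gram not in grams:
                grams.append(gram)
    return grams


# One (ngram, package ID) row per trigram, as in the listing above.
rows = [(gram, 123) for gram in index_trigrams("Example")]
for gram, package_id in rows:
    print(f"<{gram}, {package_id}>")
```

The leading and trailing spaces let the index distinguish trigrams that start or end a word from trigrams in the middle of one.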

When the user searches for exam, we join this table once per query trigram (exa and xam) to find packages that have both tokens. MySQL can do this a lot more efficiently than it can process a LIKE "%exam%" query against a huge table.
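
As a rough sketch of the join, the following Python uses an in-memory SQLite database in place of MySQL. The table names `package` and `package_ngram`, the column names, and the helper functions are all hypothetical and are not taken from this change.

```python
import sqlite3


def trigrams(text: str) -> list[str]:
    """Distinct 3-character windows of text, in first-seen order."""
    return list(dict.fromkeys(text[i:i + 3] for i in range(len(text) - 2)))


def index_rows(package_id: int, name: str) -> list[tuple[str, int]]:
    """Index rows for a name (assumed: words lowercased and space-padded)."""
    rows = []
    for word in name.lower().split():
        rows.extend((gram, package_id) for gram in trigrams(f" {word} "))
    return rows


def build_query(term: str) -> tuple[str, list[str]]:
    """Build one JOIN per query trigram; the term itself is not padded."""
    grams = trigrams(term.lower())
    if not grams:
        raise ValueError("term must be at least three characters long")
    joins = " ".join(
        f"JOIN package_ngram AS g{n} "
        f"ON g{n}.package_id = p.id AND g{n}.ngram = ?"
        for n in range(len(grams))
    )
    sql = f"SELECT DISTINCT p.id, p.name FROM package AS p {joins} ORDER BY p.id"
    return sql, grams


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE package_ngram (ngram TEXT, package_id INTEGER)")
conn.execute("CREATE INDEX key_ngram ON package_ngram (ngram, package_id)")

for package_id, name in [(123, "Example"), (124, "Sample")]:
    conn.execute("INSERT INTO package VALUES (?, ?)", (package_id, name))
    conn.executemany(
        "INSERT INTO package_ngram VALUES (?, ?)", index_rows(package_id, name)
    )

sql, params = build_query("exam")
results = conn.execute(sql, params).fetchall()
print(results)
```

Sharing every trigram is necessary for a substring match but not sufficient: a name could contain exa and xam in separate places. A final substring check on the candidate rows would remove such false positives; whether this change does that is not stated in the summary.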

When the user searches for a one-letter or two-letter string, we only search the beginnings of words. This is probably what they want, the only thing we can do quickly, and a reasonable/expected behavior for typeaheads.
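
One way to picture the short-query case: because indexed words are padded with a leading space, a one-letter or two-letter term can be matched against trigrams that begin with a space followed by the term. The sketch below is Python with SQLite standing in for MySQL; the `package_ngram` table, the `LIKE` prefix approach, and the function names are assumptions, not the code in this change.

```python
import sqlite3


def prefix_pattern(term: str) -> str:
    """LIKE pattern matching trigrams that start a word with this term."""
    term = term.lower()
    if not 1 <= len(term) <= 2:
        raise ValueError("expected a one-letter or two-letter term")
    # Escape LIKE wildcards so they are matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return f" {escaped}%"


def search_short(conn: sqlite3.Connection, term: str) -> list[int]:
    """Return IDs of packages with a word beginning with the term."""
    cursor = conn.execute(
        "SELECT DISTINCT package_id FROM package_ngram "
        "WHERE ngram LIKE ? ESCAPE '\\' ORDER BY package_id",
        (prefix_pattern(term),),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package_ngram (ngram TEXT, package_id INTEGER)")
conn.executemany(
    "INSERT INTO package_ngram VALUES (?, ?)",
    [(g, 123) for g in (" ex", "exa", "xam", "amp", "mpl", "ple", "le ")]
    + [(g, 124) for g in (" sa", "sam", "amp", "mpl", "ple", "le ")],
)

print(search_short(conn, "ex"))  # word starts with "ex"
print(search_short(conn, "am"))  # "am" only occurs mid-word
```

A prefix pattern like this can use an ordinary index on the ngram column, which is why word-beginning matches stay cheap while mid-word matches for short terms do not.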

Test Plan
  • Ran storage upgrades and search indexer.
  • Searched for stuff with "name contains".
  • Used typeahead and got sensible results.
  • Searched for aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz and saw only 16 joins.
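
The last test implies that the number of joins is capped, since that 52-character string contains 50 distinct trigrams. The summary does not say how the cap is applied; the sketch below shows one hypothetical strategy (evenly sampling the trigrams) in Python. The name `capped_trigrams` and the sampling rule are assumptions for illustration only.

```python
MAX_JOINS = 16  # the count observed in the test plan; the mechanism is assumed


def capped_trigrams(term: str, limit: int = MAX_JOINS) -> list[str]:
    """Return at most `limit` distinct trigrams, spread across the term."""
    term = term.lower()
    grams = list(dict.fromkeys(term[i:i + 3] for i in range(len(term) - 2)))
    if len(grams) <= limit:
        return grams
    # Hypothetical rule: pick evenly spaced trigrams so the whole term
    # still contributes to the match.
    step = len(grams) / limit
    return [grams[int(i * step)] for i in range(limit)]


long_term = "aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz"
selected = capped_trigrams(long_term)
print(len(selected), selected)
```

Dropping trigrams only loosens the filter, so the capped query can return extra candidates but cannot miss a package that contains the full term.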

Diff Detail

Repository
rP Phabricator
Branch
ngram1
Lint
Lint Passed
Severity | Location | Code | Message
Advice | src/applications/owners/query/PhabricatorOwnersPackageFulltextEngine.php:13 | XHP16 | TODO Comment
Unit
Tests Passed
Build Status
Buildable 9700
Build 11637: Run Core Tests
Build 11636: arc lint + arc unit