
Improve Ferret engine indexing performance for large blocks of text
ClosedPublic

Authored by epriestley on Sep 26 2017, 2:15 AM.
Tags
None
Subscribers
None

Details

Summary

See PHI87. Ref T12974. Currently, we do a lot more work here than we need to: we call phutil_utf8_strtolower() on each token, but we can call it once at the beginning, on the whole block, before splitting it into tokens.

Additionally, since ngrams don't care about order, we only need to convert unique tokens into ngrams. This saves us some phutil_utf8v() calls, which can be slow for large inputs.
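
The two optimizations can be sketched as follows. This is a minimal Python illustration, not the PHP code in this diff. The function names, the whitespace tokenizer, the space padding, and the trigram size are all assumptions made for the example. Python's `str.lower()` stands in for phutil_utf8_strtolower(), and string slicing stands in for the per-token phutil_utf8v() work.

```python
def token_ngrams(token, n=3):
    """Return the set of n-grams for one token.

    The token is padded with a space on each side (an assumption for this
    sketch) so that its prefix and suffix produce their own n-grams.
    """
    padded = " " + token + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngrams_naive(text):
    """Baseline: lowercase every token and expand every token, repeats included."""
    ngrams = set()
    for token in text.split():
        ngrams |= token_ngrams(token.lower())
    return ngrams


def ngrams_optimized(text):
    """Lowercase the whole block once, then expand only the unique tokens."""
    lowered = text.lower()                # one call for the whole block
    unique_tokens = set(lowered.split())  # order is irrelevant for n-grams
    ngrams = set()
    for token in unique_tokens:           # each distinct token is expanded once
        ngrams |= token_ngrams(token)
    return ngrams
```

For an input such as `"Cat cat CAT dog"`, both functions should return the same set of trigrams, but the optimized version lowercases once and expands two tokens instead of four. The saving grows with how often tokens repeat in a large block of text.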

Test Plan

Diff Detail

Repository
rP Phabricator
Branch
utf81
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 18544
Build 24982: Run Core Tests
Build 24981: arc lint + arc unit