(stable) Improve Ferret engine indexing performance for large blocks of text

Description

Summary:
See PHI87. Ref T12974. Currently, we do a lot more work here than we need to: we call phutil_utf8_strtolower() on each token, but we can call it once at the beginning on the whole block instead.

Additionally, since ngrams don't care about order, we only need to convert unique tokens into ngrams. This saves us some phutil_utf8v() calls, which can be slow for large inputs.
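
As a rough illustration of the two changes described in this summary, here is a minimal Python sketch. It is not the actual Ferret engine code: the function names, the whitespace tokenizer, and the space-padded trigram scheme are all assumptions made for the example. It only shows that lowercasing the whole block once and converting only unique tokens yields the same ngram set as the per-token approach.

```python
def trigrams(token):
    # Hypothetical ngram scheme: pad the token with spaces so that its
    # first and last characters also appear in a 3-character ngram.
    padded = f" {token} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def ngrams_per_token(text):
    # Slower shape: lowercase and convert every token, including repeats.
    result = set()
    for token in text.split():
        result |= trigrams(token.lower())
    return result


def ngrams_unique_tokens(text):
    # Faster shape: lowercase the whole block once, then convert each
    # distinct token a single time. Order does not matter because the
    # result is a set of ngrams.
    result = set()
    for token in set(text.lower().split()):
        result |= trigrams(token)
    return result
```

For a large block with many repeated words, the second function makes one lowercase call and does per-token work only once per distinct token, which is where the savings described above would come from.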

Test Plan:

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12974

Differential Revision: https://secure.phabricator.com/D18647

Details

Provenance
epriestley Authored on Sep 26 2017, 2:11 AM
epriestley Pushed on Sep 27 2017, 5:25 PM
Reviewer
amckinley
Differential Revision
D18647: Improve Ferret engine indexing performance for large blocks of text
Parents
rP7ae4d93043c8: (stable) Promote 2017 Week 38
Tasks
T12974: Upgrading: "Ferret" Fulltext Engine
Build Status
Buildable 18565
Build 25009: Run Core Tests