HomePhabricator

Improve performance of Ferret engine ngram extraction, particularly for large…

Description

Improve performance of Ferret engine ngram extraction, particularly for large input strings

Summary:
See PHI87. Ref T12974. The array_slice() method of splitting the string apart can perform poorly for large input strings. I think this is mostly just the large number of calls plus building and returning an array being not entirely trivial.

We can just use substr() instead, as long as we're a little bit careful about keeping track of where we're slicing the string if it has UTF8 characters.

Test Plan:

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12974

Differential Revision: https://secure.phabricator.com/D18649