Improve search stemmer performance for large inputs
Summary:
Ref T12974. See PHI87. As in D18647, we can improve the performance of some UTF8 operations here.
Instead of calling phutil_utf8_strtolower() on each token separately, call it once on the entire input up front. This produces the same tokens while avoiding the per-call overhead.
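A minimal sketch of the idea, in Python rather than the actual PHP: `str.lower()` stands in for phutil_utf8_strtolower(), and the whitespace tokenizer and both function names are hypothetical, not the real stemmer code.

```python
import re

def lower_per_token(text):
    # Old shape: tokenize first, then lowercase every token separately
    # (one lowercase call per token).
    return [token.lower() for token in re.split(r"\s+", text) if token]

def lower_once(text):
    # New shape: lowercase the whole input once up front, then tokenize.
    return [token for token in re.split(r"\s+", text.lower()) if token]

sample = "Improve Search STEMMER Performance"
# Both orderings yield the same tokens for this input.
assert lower_per_token(sample) == lower_once(sample)
```

The number of lowercase calls drops from one per token to one per input, which is where the saving on very large inputs would come from.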
Test Plan:
- See D18647; indexed a 4MB task description.
- Before: a bit over 9s. After: a little under 6s, about 3s of which is spent in 8 calls to mb_convert_case() that we probably can't easily improve.
- Before: https://secure.phabricator.com/xhprof/profile/PHID-FILE-efxv56q2hulr6kjrxbx6/
- After: https://secure.phabricator.com/xhprof/profile/PHID-FILE-pj4uv2pkdfoujxq44piq/
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T12974
Differential Revision: https://secure.phabricator.com/D18648