Differential D18649

Improve performance of Ferret engine ngram extraction, particularly for large input strings
Closed · Public

Authored by epriestley on Sep 26 2017, 4:21 PM.

Tags

None

Subscribers

None

Details

Reviewers

Maniphest Tasks

T12974: Upgrading: "Ferret" Fulltext Engine

Commits

rPc034306320b5: (stable) Improve performance of Ferret engine ngram extraction, particularly…
rP086a125ad5ee: Improve performance of Ferret engine ngram extraction, particularly for large…

Summary

See PHI87. Ref T12974. The array_slice() method of splitting the string apart can perform poorly for large input strings. I think the cost is mostly the large number of calls, plus the not entirely trivial work of building and returning an array on each call.

We can just use substr() instead, as long as we're careful to keep track of the byte offsets where we're slicing the string when it contains multibyte UTF-8 characters.
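The idea of slicing by tracked byte offsets can be sketched as follows. This is an illustration in Python, not the PHP patch itself (the diff contents are not shown on this page). The function name `trigrams_by_byte_offset` is hypothetical, slicing a `bytes` object stands in for substr(), and the sketch does not attempt to reproduce Ferret's tokenization or normalization.

```python
from typing import List


def trigrams_by_byte_offset(text: str, n: int = 3) -> List[str]:
    """Extract character n-grams by slicing the UTF-8 bytes directly.

    Hypothetical sketch: slicing `data` by byte offset plays the role of
    substr(), and `starts` is the bookkeeping that keeps each slice aligned
    to character boundaries.
    """
    data = text.encode("utf-8")

    # A UTF-8 continuation byte has the bit pattern 10xxxxxx. Any other byte
    # begins a new character, so record its offset.
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    starts.append(len(data))  # sentinel: offset just past the last character

    ngrams = []
    # Window i covers characters i .. i+n-1, i.e. bytes starts[i] .. starts[i+n].
    for i in range(len(starts) - n):
        ngrams.append(data[starts[i]:starts[i + n]].decode("utf-8"))
    return ngrams


if __name__ == "__main__":
    print(trigrams_by_byte_offset("abcd"))
    print(trigrams_by_byte_offset("h\u00e9llo"))  # contains a 2-byte character
```

Strings shorter than `n` characters yield an empty list, because the window loop never runs. Each window costs one slice rather than one array construction, which is the kind of saving the summary describes.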

Test Plan

Created a task with a single, unbroken blob of base64 encoded data as the description, roughly 100KB long.
Saw indexing performance improve from ~6s to ~1.5s after patch.
Before: https://secure.phabricator.com/xhprof/profile/PHID-FILE-nrxs4lwdvupbve5lhl6u/
After: https://secure.phabricator.com/xhprof/profile/PHID-FILE-6vs2akgjj5nbqt7yo7ul/
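A comparable input can be generated with a short script. The test plan does not say how the original blob was produced, so this Python sketch assumes random bytes, and the helper name `make_base64_blob` is hypothetical.

```python
import base64
import os


def make_base64_blob(target_chars: int = 100_000) -> str:
    """Return an unbroken base64 string of about `target_chars` characters.

    Assumption: the content is random bytes. Base64 turns every 3 raw bytes
    into 4 output characters, so a raw length that is a multiple of 3
    produces no "=" padding.
    """
    raw_len = (target_chars // 4) * 3
    # b64encode does not insert line breaks, so the result is one long token.
    return base64.b64encode(os.urandom(raw_len)).decode("ascii")


if __name__ == "__main__":
    blob = make_base64_blob()
    print(len(blob))
```

Pasting the resulting string into a task description gives a single token with no whitespace, which matches the large-input case described in the test plan.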

Diff Detail

Repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

epriestley created this revision. Sep 26 2017, 4:21 PM

Harbormaster completed remote builds in B18547: Diff 44769. Sep 26 2017, 4:22 PM

I've spent a lot of time staring at this and I'm pretty convinced it works. Maybe add a few more unit tests for strings of length {0,1,2}?

This revision is now accepted and ready to land. Sep 27 2017, 5:28 PM

Add a couple more test cases for short strings.

Closed by commit rP086a125ad5ee: Improve performance of Ferret engine ngram extraction, particularly for large… (authored by epriestley). Sep 27 2017, 5:41 PM

This revision was automatically updated to reflect the committed changes.

Harbormaster completed remote builds in B18566: Diff 44786. Sep 27 2017, 5:42 PM

Revision Contents
Changeset List

Path	Size
src/applications/search/ferret/PhabricatorFerretEngine.php	22 lines
src/applications/search/ferret/__tests__/PhabricatorFerretEngineTestCase.php	30 lines

Diff 44787