Differential D18647

Improve Ferret engine indexing performance for large blocks of text
ClosedPublic

Authored by epriestley on Sep 26 2017, 2:15 AM.

Tags

None

Subscribers

None

Details

Reviewers

Maniphest Tasks

T12974: Upgrading: "Ferret" Fulltext Engine

Commits

rP9288cad0edc8: (stable) Improve Ferret engine indexing performance for large blocks of text
rPa1d9a2389db4: Improve Ferret engine indexing performance for large blocks of text

Summary

See PHI87. Ref T12974. Currently, we do a lot more work here than we need to: we call phutil_utf8_strtolower() on each token, but can do it once at the beginning on the whole block.

Additionally, since ngrams don't care about order, we only need to convert unique tokens into ngrams. This saves us some phutil_utf8v() calls, which can be slow for large inputs.
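
The two optimizations described above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP code from this revision: the function names, the whitespace tokenization, the space padding around each token, and the trigram size are all assumptions made for the example. Python's str.lower() and character slicing stand in for phutil_utf8_strtolower() and phutil_utf8v().

```python
def ngrams_per_token(text, n=3):
    """Baseline: lowercase and split every token, including repeated ones."""
    result = set()
    for token in text.split():
        # Lowercasing happens once per token occurrence (the slow path).
        padded = " " + token.lower() + " "
        for i in range(len(padded) - n + 1):
            result.add(padded[i:i + n])
    return result


def ngrams_unique_tokens(text, n=3):
    """Optimized: lowercase the whole block once, then visit each unique token once."""
    result = set()
    # One lowercase call for the entire block; set() drops repeated tokens,
    # which is safe because the ngram set ignores order and multiplicity.
    for token in set(text.lower().split()):
        padded = " " + token + " "
        for i in range(len(padded) - n + 1):
            result.add(padded[i:i + n])
    return result


if __name__ == "__main__":
    sample = "The quick the QUICK fox"
    assert ngrams_per_token(sample) == ngrams_unique_tokens(sample)
    print(sorted(ngrams_unique_tokens(sample)))
```

Both functions return the same set of ngrams; the second does less work when the input repeats tokens, which is typical of large blocks of text.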

Test Plan

Created a ~4MB task description.
Ran bin/search index Txxx --profile ... to profile indexing performance before and after the change.
Saw total runtime drop from 38s to 9s.
Before: https://secure.phabricator.com/xhprof/profile/PHID-FILE-wiht5d7lkyazaywwxovw/
After: https://secure.phabricator.com/xhprof/profile/PHID-FILE-efxv56q2hulr6kjrxbx6/

Diff Detail

Repository

Branch

utf81

Lint

Lint Passed

Unit

Tests Passed

Build Status

Buildable 18544
Build 24982: Run Core Tests
Build 24981: arc lint + arc unit

Event Timeline

epriestley created this revision. (Sep 26 2017, 2:15 AM)

Harbormaster completed remote builds in B18544: Diff 44766. (Sep 26 2017, 2:16 AM)

epriestley mentioned this in D18648: Improve search stemmer performance for large inputs. (Sep 26 2017, 2:18 AM)

amckinley accepted this revision. (Sep 26 2017, 2:22 AM)

This revision is now accepted and ready to land. (Sep 26 2017, 2:22 AM)

Closed by commit rPa1d9a2389db4: Improve Ferret engine indexing performance for large blocks of text (authored by epriestley). (Sep 27 2017, 3:15 PM)

This revision was automatically updated to reflect the committed changes.

epriestley mentioned this in rPHU388d16d298a3: Improve search stemmer performance for large inputs. (Sep 27 2017, 5:24 PM)

epriestley mentioned this in rPHU5764bd8cafb8: (stable) Improve search stemmer performance for large inputs.

Revision Contents
Changeset List

Path: src/applications/search/ferret/PhabricatorFerretEngine.php
Size: 13 lines

Diff 44766

src/applications/search/ferret/PhabricatorFerretEngine.php
