Differential D18579

Split Ferret engine strings for tokenization on any sequence of whitespace
Closed, Public
Authored by epriestley on Sep 8 2017, 3:08 PM.

Tags

None

Subscribers

None

Details

Reviewers

Maniphest Tasks

T12819: InnoDB FULLTEXT appears to fail catastrophically once it reaches a moderate size

Commits

rP7ea6de6e9c9d: Split Ferret engine strings for tokenization on any sequence of whitespace

Summary

Ref T12819. Currently, strings are split only on spaces, but newlines (and, if they exist, tabs) should also split strings.

Without this, we can fail to get the proper term boundary tokens for words which begin at the start of a line or end at the end of a line.
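The change itself is in PHP (PhabricatorFerretEngine.php), and its diff body is not shown on this page. The following is only a minimal Python sketch of the behavioural difference the summary describes. Both function names are hypothetical: `split_on_spaces` models splitting only on the space character, and `split_on_whitespace` models splitting on any run of whitespace.

```python
import re


def split_on_spaces(text: str) -> list[str]:
    # Hypothetical model of the old behaviour: only " " separates words,
    # so a newline stays inside a token.
    return [t for t in text.split(" ") if t]


def split_on_whitespace(text: str) -> list[str]:
    # Hypothetical model of the new behaviour: any run of whitespace
    # (spaces, newlines, tabs) separates words.
    return [t for t in re.split(r"\s+", text) if t]


doc = "xyz\nabc def"
print(split_on_spaces(doc))      # ['xyz\nabc', 'def']
print(split_on_whitespace(doc))  # ['xyz', 'abc', 'def']
```

With space-only splitting, "xyz" and "abc" are treated as a single token because a newline joins them. That is the failure mode described above.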

Test Plan

Reindexed a document containing "xyz\nabc" and saw the "yz " and " ab" term boundary tokens generated properly.
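The term boundary tokens in the test plan ("yz " and " ab") are consistent with padding each word with one space on each side and taking trigrams. The sketch below assumes that scheme and does not reproduce the Ferret engine's actual code. The helper `term_ngrams` and its `spaces_only` flag are hypothetical; the flag switches between the old and new splitting rules so the two results can be compared.

```python
import re


def term_ngrams(text: str, n: int = 3, spaces_only: bool = False) -> set[str]:
    # Hypothetical sketch: split into words, pad each word with a single
    # space on both sides, and collect every n-gram of the padded word.
    # The first and last n-grams of each word carry the term boundary.
    words = text.split(" ") if spaces_only else re.split(r"\s+", text)
    grams: set[str] = set()
    for word in words:
        if not word:
            continue
        padded = " " + word + " "
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


new = term_ngrams("xyz\nabc")
old = term_ngrams("xyz\nabc", spaces_only=True)
print(sorted(new))                 # [' ab', ' xy', 'abc', 'bc ', 'xyz', 'yz ']
print("yz " in old, " ab" in old)  # False False
```

When only spaces split the text, "xyz\nabc" is padded as a single word. Its trigrams then include "yz\n" and "\nab" instead of the boundary tokens "yz " and " ab".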

Diff Detail

Repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

epriestley created this revision. · Sep 8 2017, 3:08 PM

Harbormaster completed remote builds in B18397: Diff 44616. · Sep 8 2017, 3:10 PM

chad accepted this revision. · Sep 8 2017, 4:07 PM

This revision is now accepted and ready to land. · Sep 8 2017, 4:07 PM

Closed by commit rP7ea6de6e9c9d: Split Ferret engine strings for tokenization on any sequence of whitespace (authored by epriestley). · Sep 8 2017, 4:40 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

Path                                                         Size
src/applications/search/ferret/PhabricatorFerretEngine.php   2 lines

Diff 44619
