Split Ferret engine strings for tokenization on any sequence of whitespace
Closed, Public

Authored by epriestley on Sep 8 2017, 3:08 PM.

Tags

None

Referenced Files

	F13239184: D18579.id44616.diff
	Wed, May 22, 12:54 AM

	F13236326: D18579.diff
	Tue, May 21, 9:13 AM

	F13217671: D18579.diff
	Sat, May 18, 6:58 AM

	F13180615: D18579.id44619.diff
	Thu, May 9, 1:30 AM

	F13178964: D18579.diff
	Wed, May 8, 8:49 PM

	F13174637: D18579.id.diff
	Wed, May 8, 12:21 AM

Subscribers

None

Details

Reviewers

chad

Maniphest Tasks

T12819: InnoDB FULLTEXT appears to fail catastrophically once it reaches a moderate size

Commits

rP7ea6de6e9c9d: Split Ferret engine strings for tokenization on any sequence of whitespace

Summary

Ref T12819. Currently, strings are split only on spaces, but newlines (and, if they exist, tabs) should also split strings.

Without this, words that begin at the start of a line or end at the end of a line can fail to get the proper term boundary tokens.

Test Plan

Reindexed a document with "xyz\nabc", saw the "yz " and " ab" term boundary tokens generated properly.
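
The behaviour described in the Summary, and checked in the Test Plan, can be sketched roughly as follows. This is an illustration in Python, not the PHP change made to PhabricatorFerretEngine.php. The function name `boundary_ngrams` is hypothetical, and the scheme of padding each token with one space on either side and taking three-character n-grams is an assumption inferred from the "yz " and " ab" tokens quoted in the Test Plan.

```python
import re


def boundary_ngrams(text, split_pattern):
    """Split text on split_pattern, pad each token with spaces, and
    return the set of 3-character n-grams (hypothetical helper)."""
    ngrams = set()
    for token in re.split(split_pattern, text):
        if not token:
            # Skip empty pieces produced by leading or trailing separators.
            continue
        padded = " " + token + " "
        for i in range(len(padded) - 2):
            ngrams.add(padded[i:i + 3])
    return ngrams


document = "xyz\nabc"

# Old behaviour (assumed): split only on runs of spaces.
spaces_only = boundary_ngrams(document, r" +")

# New behaviour (assumed): split on any run of whitespace.
any_whitespace = boundary_ngrams(document, r"\s+")
```

Under these assumptions, splitting only on spaces treats "xyz\nabc" as a single token, so the n-grams that straddle the newline contain "\n" and neither "yz " nor " ab" is produced. Splitting on any whitespace yields the two tokens "xyz" and "abc", and both boundary tokens appear.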

Diff Detail

Repository

rP Phabricator

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

epriestley created this revision. Sep 8 2017, 3:08 PM

Harbormaster completed remote builds in B18397: Diff 44616. Sep 8 2017, 3:10 PM

chad accepted this revision. Sep 8 2017, 4:07 PM

This revision is now accepted and ready to land. Sep 8 2017, 4:07 PM

Closed by commit rP7ea6de6e9c9d: Split Ferret engine strings for tokenization on any sequence of whitespace (authored by epriestley). Sep 8 2017, 4:40 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

Path: src/applications/search/ferret/PhabricatorFerretEngine.php
Size: 2 lines

Diff 44619

src/applications/search/ferret/PhabricatorFerretEngine.php