Differential D14441

Use unicode mode when tokenizing strings like user realnames
Closed · Public

Authored by epriestley on Nov 8 2015, 1:49 PM.

Tags

None

Details

Reviewers

chad

Maniphest Tasks

T9732: Can't register new account when "Real Name" contains some special unicode character

Commits

rP152ddf57092e: Use unicode mode when tokenizing strings like user realnames

Summary

Fixes T9732. We currently tokenize strings (like user realnames) in the default non-unicode mode, where patterns like \s are matched byte by byte and, depending on the locale, can match a byte in the middle of a multibyte UTF-8 character.

Add the /u modifier so tokenization is unicode-aware instead.
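
As a rough illustration of the approach, here is a minimal sketch of a whitespace tokenizer that uses the /u modifier. The function name tokenize_name() and its exact pattern are hypothetical and chosen for this example; this is not the code changed in PhabricatorTypeaheadDatasource.php.

```php
<?php

// Hypothetical helper for illustration only; not the code from this revision.
function tokenize_name($name) {
  $name = trim($name);
  if ($name === '') {
    return array();
  }

  // The "u" modifier makes PCRE treat the pattern and subject as UTF-8, so
  // "\s" is tested against whole characters rather than individual bytes.
  $tokens = preg_split('/\s+/u', $name, -1, PREG_SPLIT_NO_EMPTY);

  // preg_split() returns false on failure, for example when the subject is
  // not valid UTF-8 in "u" mode.
  if ($tokens === false) {
    return array();
  }

  return $tokens;
}

// "\xE5\xBF\xA0" is the UTF-8 encoding of "忠".
echo implode('|', tokenize_name("\xE5\xBF\xA0 Smith")), "\n";
```

One consequence of /u worth keeping in mind is that a subject which is not valid UTF-8 makes the split fail instead of producing tokens, which is why the sketch checks for false.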

Test Plan

The behavior of "\s" depends upon environmental settings like LC_ALL.

With LC_ALL set to "C", the byte \xA0 is not considered a whitespace character.
With LC_ALL set to "en_US", it is, so "\xE5\xBF\xA0" (the UTF-8 encoding of "忠") is split at its final byte:

$ php -r 'setlocale(LC_ALL, "C"); echo count(preg_split("/\s/", "\xE5\xBF\xA0")) . "\n";'
1
$ php -r 'setlocale(LC_ALL, "en_US"); echo count(preg_split("/\s/", "\xE5\xBF\xA0")) . "\n";'
2

To reproduce the original issue, I added an explicit:

setlocale(LC_ALL, "en_US");

...call before the preg_split() call. This caused "忠" to be improperly split.

I then added "/u", and observed proper tokenization.
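
To make the byte-level cause concrete, the following sketch inspects the bytes of "忠" and checks the split in /u mode. The variable names are illustrative, and the note about 0xA0 assumes that "en_US" resolves to a single-byte Latin-1 style locale, which depends on the system.

```php
<?php

// UTF-8 encoding of "忠" (U+5FE0), written as explicit bytes.
$name = "\xE5\xBF\xA0";

// The three bytes are E5 BF A0. In ISO-8859-1, 0xA0 is the no-break space,
// which is presumably why a byte-wise "\s" can match it under some locales.
$hex = bin2hex($name);
$last_byte = ord($name[2]);

// In "u" mode the three bytes form one character, so there is nothing for
// "\s" to split on and the string comes back as a single piece.
$pieces = preg_split('/\s/u', $name);

echo $hex, "\n";
echo count($pieces), "\n";
```

Unlike the non-unicode commands in the test plan, this result should not depend on setlocale(), since in /u mode the trailing 0xA0 byte is never examined on its own.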

Diff Detail

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

epriestley updated this revision to Diff 34894. Nov 8 2015, 1:49 PM

epriestley retitled this revision to "Use unicode mode when tokenizing strings like user realnames".

epriestley updated this object.

epriestley edited the test plan for this revision.

epriestley added a reviewer: chad.

epriestley added a task: T9732: Can't register new account when "Real Name" contains some special unicode character.

Perfect!

Thanks, @epriestley @chad.

chad accepted this revision. Nov 8 2015, 2:58 PM

chad edited edge metadata.

This revision is now accepted and ready to land. Nov 8 2015, 2:58 PM

Closed by commit rP152ddf57092e: Use unicode mode when tokenizing strings like user realnames (authored by epriestley, committed by epriestley). Nov 8 2015, 3:03 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

Path                                                                        Size
src/applications/typeahead/datasource/PhabricatorTypeaheadDatasource.php    2 lines

Diff 34897

src/applications/typeahead/datasource/PhabricatorTypeaheadDatasource.php
