Use unicode mode when tokenizing strings like user realnames
152ddf57092e

Description

Use unicode mode when tokenizing strings like user realnames

Summary:
Fixes T9732. We currently tokenize strings (like user realnames) in the default non-unicode mode, which can cause patterns like \s to match individual bytes inside multibyte UTF-8 characters and split them incorrectly.

Use the /u modifier so tokenization is unicode-aware instead.

Test Plan:
The behavior of "\s" depends on environmental settings like LC_ALL.

With LC_ALL set to "C", the byte \xA0 is not considered a whitespace character.
With LC_ALL set to "en_US", it is, so the final byte of "忠" (\xE5\xBF\xA0 in UTF-8) is treated as a separator:

$ php -r 'setlocale(LC_ALL, "C"); echo count(preg_split("/\s/", "\xE5\xBF\xA0")) . "\n";'
1
$ php -r 'setlocale(LC_ALL, "en_US"); echo count(preg_split("/\s/", "\xE5\xBF\xA0")) . "\n";'
2
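
For illustration, here is a minimal sketch of why the split can happen at the byte level. It only inspects the UTF-8 encoding of "忠"; the variable names are hypothetical and not taken from the Phabricator source. The final byte, 0xA0, is the same value that ISO-8859-1 assigns to the non-breaking space, which is why a locale-sensitive, byte-oriented "\s" may treat it as whitespace.

```php
<?php
// UTF-8 encoding of "忠" (U+5FE0), written as escaped bytes so the
// example does not depend on the encoding of this file.
$name = "\xE5\xBF\xA0";

// Without /u, PCRE sees three separate bytes rather than one character.
echo strlen($name), "\n";               // 3
echo bin2hex($name), "\n";              // e5bfa0

// The trailing byte is 0xA0, the ISO-8859-1 non-breaking space.
$last_byte = substr($name, -1);
echo bin2hex($last_byte), "\n";         // a0
```

Whether that byte actually matches "\s" depends on the locale in effect, as the two commands above show.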

To reproduce the original issue, I added an explicit:

setlocale(LC_ALL, "en_US");

...call before the preg_split() call. This caused "忠" to be improperly split.

I then added "/u", and observed proper tokenization.
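
The effect of the fix can be sketched roughly as follows. This is a simplified, hypothetical helper (tokenize_realname() is an invented name, and the pattern is an assumption); the actual change lives in PhabricatorTypeaheadDatasource.php and may differ in detail. With the /u modifier the subject is interpreted as UTF-8, so "忠" is one character and is never split on its trailing byte.

```php
<?php
// Hypothetical helper for illustration only; not Phabricator's actual code.
function tokenize_realname($string, $unicode = true) {
  // With /u, the pattern and subject are treated as UTF-8, so "\s" is
  // tested against whole characters instead of individual bytes.
  $pattern = $unicode ? '/\s+/u' : '/\s+/';
  return preg_split($pattern, $string, -1, PREG_SPLIT_NO_EMPTY);
}

// "忠 Smith", with the first character written as escaped UTF-8 bytes.
$name = "\xE5\xBF\xA0 Smith";

$tokens = tokenize_realname($name);
echo count($tokens), "\n";       // 2
echo bin2hex($tokens[0]), "\n";  // e5bfa0, i.e. "忠" stays intact
```

Passing false as the second argument falls back to the byte-oriented pattern, whose result depends on the active locale as described in the test plan.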

Reviewers: chad

Reviewed By: chad

Subscribers: qiu8310

Maniphest Tasks: T9732

Differential Revision: https://secure.phabricator.com/D14441

Details

Provenance

epriestley	Authored on
epriestley	Pushed on Nov 8 2015, 3:03 PM

Reviewer

Differential Revision

D14441: Use unicode mode when tokenizing strings like user realnames

Parents

rP37df419266d4: Add Can Create Policy Capability to Phame Blogs

Tasks

T9732: Can't register new account when "Real Name" contains some special unicode character

Build Status

Buildable 8706
Build 10106: Run Core Tests

Event Timeline

epriestley committed rP152ddf57092e: Use unicode mode when tokenizing strings like user realnames (authored by epriestley). Nov 8 2015, 3:03 PM

epriestley added a task: T9732: Can't register new account when "Real Name" contains some special unicode character.

Harbormaster completed building B8706: rP152ddf57092e: Use unicode mode when tokenizing strings like user realnames. Nov 8 2015, 3:03 PM

Changes (1)

src/applications/typeahead/datasource/PhabricatorTypeaheadDatasource.php
