Can't register new account when "Real Name" contains some special unicode character
Closed, ResolvedPublic

Assigned To
Authored By
qiu8310
Nov 8 2015, 5:36 AM
Tags
None
Referenced Files
F952877: QQ20151108-1.png
Nov 8 2015, 5:52 AM
F952844: phab-error-2.min.png
Nov 8 2015, 5:36 AM
F952841: phab-error-1.min.png
Nov 8 2015, 5:36 AM
Description

phab-error-1.min.png (728×978 px, 39 KB)

phab-error-2.min.png (1×1 px, 166 KB)

I traced the source code and found that UserName and RealName are tokenized. The tokenizing code is located in applications/typeahead/datasource/PhabricatorTypeaheadDatasource.php$110.

The problem is that the preg regexp "/\s+/" splits the single Unicode character "忠" into two pieces.

I created a gist to describe why the Unicode character "忠" gets split.

Is there a PHP setting that stops "/\s/" from matching code points in the range 128-255? If not, I think "\s" should be replaced with "[\t\n\f\r ]".
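
A minimal sketch of the replacement proposed above. The helper name `tokenize_ascii_whitespace` is hypothetical and is not Phabricator's actual tokenizer; it only illustrates that an explicit ASCII character class cannot match byte 0xA0, whatever the locale is.

```php
<?php

/**
 * Hypothetical helper: split a string on ASCII whitespace only
 * (tab, newline, form feed, carriage return, space).
 */
function tokenize_ascii_whitespace($value) {
  // PREG_SPLIT_NO_EMPTY drops empty tokens caused by leading or
  // trailing whitespace.
  return preg_split('/[\t\n\f\r ]+/', $value, -1, PREG_SPLIT_NO_EMPTY);
}

// "\xE5\xBF\xA0" is "忠" written as explicit UTF-8 bytes.
$tokens = tokenize_ascii_whitespace("\xE5\xBF\xA0 qiu8310");
echo count($tokens) . "\n"; // prints 2: "忠" and "qiu8310"
```

The trade-off is that multi-byte whitespace such as NBSP is no longer treated as a separator, which may or may not be acceptable for name tokenization.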

Event Timeline

qiu8310 updated the task description.
qiu8310 updated the task description.
qiu8310 added a subscriber: qiu8310.

Registering an account on secure.phabricator.com also works fine; I can't reproduce the problem there.

I think it is a PHP configuration problem. See the split result on my computer:

QQ20151108-1.png (120×654 px, 16 KB)

As the PHP manual says:

The "whitespace" characters are HT (9), LF (10), FF (12), CR (13), and space (32).
However, if locale-specific matching is happening, characters with code points in
the range 128-255 may also be considered as whitespace characters,
for instance, NBSP (A0).

In PHP:

"忠" === "\xE5\xBF\xA0"

Because "\xA0" is treated as whitespace, "忠" is split.
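
The byte-level reasoning above can be checked with a short sketch. It writes "忠" as explicit bytes so the result does not depend on the file encoding, and it simulates the split with `explode()` on byte 0xA0 rather than relying on any locale setting.

```php
<?php

// "忠" (U+5FE0) as its three UTF-8 bytes.
$char = "\xE5\xBF\xA0";

echo strlen($char) . "\n";          // prints 3 (bytes, not characters)
echo bin2hex($char) . "\n";         // prints e5bfa0
printf("0x%02X\n", ord($char[2]));  // prints 0xA0, the same byte value as NBSP in Latin-1

// Simulate a byte-wise split on 0xA0: the result is a broken two-byte
// fragment plus an empty string, which is why the count becomes 2.
$pieces = explode("\xA0", $char);
echo count($pieces) . "\n";         // prints 2
echo bin2hex($pieces[0]) . "\n";    // prints e5bf
```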

We need full reproduction steps. If it's a config issue, we need to know what it is so we can detect it and have you correct it. https://secure.phabricator.com/book/phabcontrib/article/bug_reports/

Unfortunately, I haven't been able to reproduce any issue on this server or on my local dev machines.

Maybe you can try this to see whether it is a PHP configuration issue.

Run this code on your server:

php -r 'echo count(preg_split("/\s/", "忠")) . "\n";'

// if the result is 1, the issue does not reproduce on your machine
// if the result is 2, the issue reproduces on your machine

Back to the PHP manual:

The "whitespace" characters are HT (9), LF (10), FF (12), CR (13), and space (32).
However, if locale-specific matching is happening, characters with code points in
the range 128-255 may also be considered as whitespace characters,
for instance, NBSP (A0).

If you can't reproduce the issue, it may mean that "locale-specific matching" is not happening on your machine.

On my machine, the result of the code above is 2, which means "locale-specific matching" is happening on my machine.

But I don't know how to disable this so-called "locale-specific matching".
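
Two possible ways around locale-specific matching, sketched under assumptions. The `u` modifier is a documented PCRE pattern modifier that makes the pattern and subject be treated as UTF-8. Resetting `LC_CTYPE` to "C" is an untested assumption about how PCRE picks its character tables on the affected machine, so no expected output is given for it.

```php
<?php

$name = "\xE5\xBF\xA0"; // "忠" as explicit UTF-8 bytes

// Option 1: with the "u" modifier the three bytes are matched as one
// character (U+5FE0), so the trailing 0xA0 byte is never tested alone.
$utf8_parts = preg_split('/\s+/u', $name);
echo count($utf8_parts) . "\n"; // prints 1

// Option 2 (assumption): switch the ctype locale to "C" before matching,
// then restore it. Whether this changes the result depends on the
// PHP/PCRE build and the system locale.
$previous = setlocale(LC_CTYPE, '0'); // '0' queries the current setting
setlocale(LC_CTYPE, 'C');
$c_parts = preg_split('/\s+/', $name);
echo count($c_parts) . "\n";
if ($previous !== false) {
  setlocale(LC_CTYPE, $previous);
}
```

Option 1 requires the subject to be valid UTF-8; `preg_split()` returns false on invalid input when `u` is set, so callers would need to handle that case.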

You still haven't told us anything we can use to reproduce the issue. Some things that might be useful:

PHP Version
Server OS
Phabricator Version
Server Locale
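
Most of the details listed above can be gathered with a small script. This is only a sketch: the function name `collect_environment_report` is hypothetical, and the Phabricator version is not covered because it comes from Config -> Versions rather than from PHP itself.

```php
<?php

/**
 * Hypothetical helper: collect environment details relevant to
 * locale-dependent regexp matching.
 */
function collect_environment_report() {
  return array(
    'php_version' => PHP_VERSION,
    'server_os'   => php_uname('s') . ' ' . php_uname('r'),
    // Passing '0' asks setlocale() for the current setting without changing it.
    'lc_ctype'    => setlocale(LC_CTYPE, '0'),
    'lang_env'    => getenv('LANG'),
    // 1 means "忠" stays whole; 2 means byte 0xA0 was matched as whitespace.
    'split_count' => count(preg_split('/\s/', "\xE5\xBF\xA0")),
  );
}

foreach (collect_environment_report() as $key => $value) {
  echo $key . ': ' . var_export($value, true) . "\n";
}
```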

Sorry

PHP Version:

PHP 5.5.27 (cli) (built: Jul 23 2015 00:21:59)
Copyright (c) 1997-2015 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2015 Zend Technologies

Server OS

OS X Yosemite v10.10.5

Phabricator Version (result of Config -> Versions; all source is up to date and the git branch is master)

Current Versions	
Phabricator Version Unknown 
Arcanist Version Unknown 
libphutil Version Unknown

Server Locale

zh_CN.UTF-8

Did you install php via homebrew? If so can you update to 5.6 or later?

No, I use xampp.

Tried it on Ubuntu 14 and CentOS 6.5; both are OK.

mora@mora:~$ php -v
PHP 5.5.9-1ubuntu4 (cli) (built: Apr  9 2014 17:11:57)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies

mora@mora:~$ php -r 'echo count(preg_split("/\s/u", "忠")) . "\n";'
1

mora@mora:~$ echo $LANG
en_US.UTF-8

[mora@ceph ~]$ php -v
PHP 5.3.3 (cli) (built: Jul  9 2015 17:39:00)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

[mora@ceph ~]$ php -r 'echo count(preg_split("/\s/", "忠")) . "\n";'
1

[mora@ceph ~]$ echo $LANG
en_US.UTF-8

This might be T7339; you can try some of the advice there and see if it helps.

epriestley triaged this task as Normal priority.