I traced through the source code and found that UserName and RealName are tokenized; the tokenizing code is at applications/typeahead/datasource/PhabricatorTypeaheadDatasource.php$110.
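The problem is that the preg pattern "/\s+/" splits the single Unicode character "忠" into two pieces. Its UTF-8 encoding is the three bytes E5 BF A0, and the last byte, 0xA0, appears to be matched by "\s".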
I created a gist describing why "忠" gets split.
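For reference, here is a minimal sketch of the byte-level picture. It does not depend on how "\s" behaves on a given machine (that seems to vary with the PCRE character tables and locale); instead it splits on the byte 0xA0 explicitly to imitate what happens when "\s" matches that byte. The variable names are only for illustration.

```php
<?php
// "忠" (U+5FE0) written as explicit UTF-8 bytes, so the file encoding does not matter.
$char = "\xE5\xBF\xA0";

// Hex dump of the UTF-8 encoding: three bytes, the last one is 0xA0.
$hex = bin2hex($char);

// Numeric value of the last byte; 160 falls inside the 128-255 range.
$last_byte = ord($char[2]);

// Imitate a "\s" that matches 0xA0 by splitting on that byte directly.
// The character is cut after its first two bytes, leaving an invalid
// UTF-8 fragment and an empty string.
$pieces = preg_split('/\xA0/', $char);

echo $hex, "\n";
echo $last_byte, "\n";
echo count($pieces), "\n";
```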
Is there a setting in PHP that stops "\s" from matching code points in the range 128-255? If not, I think "\s" should be replaced with "[\t\n\f\r ]".
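A rough sketch of the two options I can think of, not a patch against the actual Phabricator code. The function names below are hypothetical. The first uses the explicit ASCII class proposed above; the second uses the "u" modifier so that PCRE treats the input as UTF-8, with the caveat that preg_split() returns false when the input is not valid UTF-8.

```php
<?php
// Hypothetical helper: split only on ASCII whitespace, so bytes in the
// range 128-255 are never treated as separators.
function tokenize_ascii_whitespace($text) {
  return preg_split('/[\t\n\f\r ]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical alternative: the "u" modifier makes PCRE match whole
// UTF-8 characters, so "忠" is a single non-whitespace character.
// Note: preg_split() returns false if $text is not valid UTF-8.
function tokenize_utf8_whitespace($text) {
  return preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// "忠" as explicit UTF-8 bytes, followed by two ASCII tokens.
$input = "\xE5\xBF\xA0 alice\tbob";

$ascii_tokens = tokenize_ascii_whitespace($input);
$utf8_tokens = tokenize_utf8_whitespace($input);

// Both calls should keep "忠" intact as the first token.
echo implode('|', $ascii_tokens), "\n";
echo implode('|', $utf8_tokens), "\n";
```

The explicit class is the smaller change and cannot fail on malformed input. The "u" variant depends on the input always being valid UTF-8.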