The behavior of `\s` depends on locale settings such as LC_ALL.
With LC_ALL set to "C", `\xA0` is not considered a whitespace character;
with LC_ALL set to "en_US", it is:
```
$ php -r 'setlocale(LC_ALL, "C"); echo count(preg_split("/\s/", "\xE5\xBF\xA0")) . "\n";'
1
$ php -r 'setlocale(LC_ALL, "en_US"); echo count(preg_split("/\s/", "\xE5\xBF\xA0")) . "\n";'
2
```
To reproduce the original issue, I added an explicit:
```
setlocale(LC_ALL, "en_US");
```
...call before the `preg_split()` call. This caused "忠" to be improperly split, because the last byte of its UTF-8 encoding (`\xE5\xBF\xA0`) is `\xA0`, which the en_US locale classifies as whitespace (as shown above).
I then added the `/u` (UTF-8) modifier to the pattern and observed proper tokenization.
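For reference, a minimal self-contained sketch of the reproduction and the fix. It assumes the en_US locale is installed and that the build's PCRE character tables honor `setlocale()`, as in the output above; the variable name is illustrative:
```
<?php
// Hypothetical standalone reproduction; assumes the en_US locale is available.
setlocale(LC_ALL, "en_US");

$text = "\xE5\xBF\xA0"; // UTF-8 bytes for "忠" (U+5FE0); the last byte is \xA0

// Without /u the pattern operates on single bytes; under en_US, \xA0 is
// classified as whitespace, so the character is split apart.
echo count(preg_split("/\s/", $text)) . "\n";  // 2 on an affected build

// With /u the subject is interpreted as UTF-8; U+5FE0 is not whitespace,
// so the string is left intact.
echo count(preg_split("/\s/u", $text)) . "\n"; // 1
```
On an affected build this should print 2 followed by 1, matching the behavior described above.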