The behavior of `\s` depends on locale settings such as LC_ALL.
With LC_ALL set to "C", `\xA0` is not considered a whitespace character;
with LC_ALL set to "en_US", it is:
```
$ php -r 'setlocale(LC_ALL, "C"); echo count(preg_split("/\s/", "\xE5\xBF\xA0")) . "\n";'
1
$ php -r 'setlocale(LC_ALL, "en_US"); echo count(preg_split("/\s/", "\xE5\xBF\xA0")) . "\n";'
2
```
To reproduce the original issue, I added an explicit:
```
setlocale(LC_ALL, "en_US");
```
...call before the `preg_split()` call. This caused "忠" to be improperly split, because the last byte of its UTF-8 encoding (`\xE5\xBF\xA0`) is `\xA0`, which the en_US locale classifies as whitespace (as shown above).
I then added the `/u` (UTF-8) modifier to the pattern and observed proper tokenization.
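For reference, a minimal self-contained sketch of the reproduction and the fix. It assumes the en_US locale is installed and that the build's PCRE character tables honor `setlocale()`, as in the output above; the variable name is illustrative:
```
<?php
// Hypothetical standalone reproduction; assumes the en_US locale is available.
setlocale(LC_ALL, "en_US");

$text = "\xE5\xBF\xA0"; // UTF-8 bytes for "忠" (U+5FE0); the last byte is \xA0

// Without /u the pattern operates on single bytes; under en_US, \xA0 is
// classified as whitespace, so the character is split apart.
echo count(preg_split("/\s/", $text)) . "\n";  // 2 on an affected build

// With /u the subject is interpreted as UTF-8; U+5FE0 is not whitespace,
// so the string is left intact.
echo count(preg_split("/\s/u", $text)) . "\n"; // 1
```
On an affected build this should print 2 followed by 1, matching the behavior described above.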