Diffusion's "Grep File Content" doesn't work with UTF characters
Closed, Resolved (Public)

Description

Reproduction steps:

Try to search for Japanese, Cyrillic, emoji, or any other non-Latin content in any repository:

https://secure.phabricator.com/source/phabricator/browse/master/?grep=%E7%89%87%E4%BB%AE%E5%90%8D
https://secure.phabricator.com/source/phabricator/browse/master/?grep=%E2%98%9D%F0%9F%8F%BB
https://secure.phabricator.com/source/phabricator/browse/master/?grep=%D1%84%D0%B0%D0%B1%D1%80%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80
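
For readers who want to see what the three URLs above actually search for, here is a minimal sketch that decodes the percent-encoded `grep` parameter. The helper name `grep_query` is hypothetical and is not part of Phabricator; it only assumes the query string is UTF-8, which is the default for `urllib.parse.parse_qs`.

```python
from urllib.parse import parse_qs, urlsplit


def grep_query(url):
    """Return the decoded value of the `grep` query parameter (hypothetical helper)."""
    query = urlsplit(url).query
    # parse_qs percent-decodes values as UTF-8 by default.
    return parse_qs(query)["grep"][0]


if __name__ == "__main__":
    base = "https://secure.phabricator.com/source/phabricator/browse/master/?grep="
    for encoded in (
        "%E7%89%87%E4%BB%AE%E5%90%8D",
        "%E2%98%9D%F0%9F%8F%BB",
        "%D1%84%D0%B0%D0%B1%D1%80%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80",
    ):
        print(grep_query(base + encoded))
```

The three queries decode to a Japanese word, an emoji with a skin-tone modifier, and a Cyrillic word, so each one contains only non-ASCII characters.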

Version information: this install

Actual result: it appears to grep for an empty string, which matches every line of every file in the repository.

Event Timeline

avivey updated the task description.

This works properly on my local machine (OSX), so it's probably some kind of Ubuntu UTF-8 locale issue on the command line.

A possible workaround is to use this construction instead:

git grep -f - ...

...and then write the pattern to stdin.
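
The workaround above can be sketched roughly as follows. This is an illustration in Python, not the actual change made in Phabricator (which is written in PHP); the function names `build_grep_argv`, `encode_pattern`, and `grep_via_stdin` are hypothetical, and the `-n` and `-I` flags are only an assumption about what a caller might want. The point is that the pattern never appears in the argument list, so it is not subject to whatever the shell or locale does to command-line arguments.

```python
import subprocess


def build_grep_argv(ref):
    """Build the argument list; the pattern is deliberately NOT included."""
    # -n: show line numbers, -I: skip binary files (assumed flags),
    # -f -: read patterns from stdin, one per line.
    return ["git", "grep", "-n", "-I", "-f", "-", ref]


def encode_pattern(pattern):
    """Encode a single pattern as UTF-8 bytes for stdin."""
    if not pattern:
        # An empty line is an empty pattern, which matches every line.
        raise ValueError("pattern must not be empty")
    if "\n" in pattern:
        # -f treats each line as a separate pattern.
        raise ValueError("pattern must be a single line")
    return pattern.encode("utf-8") + b"\n"


def grep_via_stdin(repo_path, pattern, ref="HEAD"):
    """Run `git grep -f -` in repo_path and return the matching lines."""
    result = subprocess.run(
        build_grep_argv(ref),
        cwd=repo_path,
        input=encode_pattern(pattern),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # git grep exits with 1 when nothing matched; other codes are errors.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout.decode("utf-8", "replace").splitlines()
```

For example, `grep_via_stdin("/path/to/repo", "фабрикатор", "master")` would search the `master` tree for the Cyrillic pattern; the repository path here is a placeholder. Rejecting empty patterns guards against the "matches everything" behavior described in this report.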

See also T7339 and T5554. The approach we are generally taking there is "wherever possible, use a method where locale configuration is not relevant", which is consistent with using -f -.

(We also likely have a similar problem for hg that D18105 won't fix, and can't use the same strategy there since hg grep has no -f flag or equivalent.)

See T13060 and T7339 for followups and additional discussion.