Always setlocale() to en_US.UTF-8 for the main process

Depends on D18987. See PHI343. Fixes T13060. See also T7339.

When the main process starts up with LANG=POSIX (this is the default on Ubuntu) and we later try to run a subprocess with a UTF-8 character in its argument list (like git cat-file blob 👍.txt), the argument is not passed to the subprocess correctly.
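As a rough illustration of what is at stake (a Python sketch, not Phabricator code), the thumbs-up file name from the example is eight bytes in UTF-8, but it has no representation at all in ASCII, which is essentially the character set of the POSIX locale. A process that handles its arguments according to the POSIX locale therefore has no correct way to pass this name along.

```python
# The file name from the failing `git cat-file blob` call.
name = "👍.txt"

# In a UTF-8 locale, the emoji (U+1F44D) becomes four bytes, ".txt" four more.
utf8_bytes = name.encode("utf-8")

# In an ASCII-only locale such as POSIX, the emoji cannot be encoded at all.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(utf8_bytes, ascii_ok)
```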

We already set LANG=en_US.UTF-8 in the subprocess environment, but this only affects the subprocess itself. It appears that the argument list is encoded according to the parent process's locale setting before the subprocess starts, which makes some sense.

Calling putenv('LANG=en_US.UTF-8') has no effect on this. My guess is that the parent process reads its locale setting once at startup (rather than re-reading LANG every time), so later changes to LANG do not modify it.
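A small Python sketch of the behavior guessed at above, assuming PHP behaves like CPython here: CPython reads LC_CTYPE from the environment once during interpreter startup, and afterwards changing LANG (os.environ assignment calls putenv()) only alters the environment that future child processes inherit. The active locale stays the same until setlocale() is called again.

```python
import locale
import os

def active_ctype() -> str:
    # With no locale argument, setlocale() only queries the current setting.
    return locale.setlocale(locale.LC_CTYPE)

# CPython already applied LC_CTYPE from the environment during startup.
before = active_ctype()

# Roughly what putenv('LANG=en_US.UTF-8') does in PHP: only the environment
# inherited by future child processes changes.
os.environ["LANG"] = "en_US.UTF-8"

# The active locale is unchanged; nothing re-reads LANG on its own.
after = active_ctype()
print(before, after)
```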

Using setlocale(...) does appear to fix this.

Ideally, installs would probably set some UTF-8-compatible LANG setting as the default. However, this makes setup harder and I couldn't figure out how to do it on our production Ubuntu AMI after spending a reasonable amount of time at it (see T13060).

Since this setting rarely matters, just try to do the right thing. This may fail if "en_US.UTF-8" isn't available, but I think warnings about and remedies for that fall within the scope of T7339, since we want this locale to exist for other legitimate reasons anyway.
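A minimal Python sketch of the "try, and carry on if it fails" approach. It relies on the C-level guarantee that a failed setlocale() call leaves the current locale untouched. Where C returns NULL and PHP returns false, Python's locale.setlocale() raises locale.Error. try_set_locale is a hypothetical helper, not something from Phabricator.

```python
import locale

def try_set_locale(name: str = "en_US.UTF-8") -> bool:
    """Hypothetical helper: switch the whole process to `name` if possible.

    Returns False, with the previous locale still active, when `name` is
    not installed (or not valid) on this host.
    """
    try:
        locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        return False
    return True

# Best effort: a missing en_US.UTF-8 should not stop startup.
if not try_set_locale():
    print("en_US.UTF-8 unavailable; keeping", locale.setlocale(locale.LC_ALL))
```

The same helper covers the invalid-locale case from the test plan. try_set_locale("duck.quack") returns False, and the process keeps whatever locale it had.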

Test Plan:

  • Applied this fix in production, processed the failing worker task from PHI343 after kicking Apache hard enough.
  • Ran locally with setlocale(LC_ALL, 'duck.quack') to make sure an invalid or unavailable locale doesn't break anything; didn't hit any issues.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13060

Differential Revision: https://secure.phabricator.com/D18988