User Details
- User Since
- Aug 5 2015, 2:21 PM (489 w, 4 d)
Apr 30 2021
I don't think HGENCODING applies to the content of changesets, since that would invalidate the changeset hashes and the whole revision graph with it.
Apr 23 2021
Is this workaround now the recommended way to deal with this issue, or is a fix ever going to be merged into the code base?
Aug 6 2015
If having the rest of the code run under en_US.UTF-8 is not a problem or even desirable, then that is quite true. It would have implications for eventual localization, though.
Sure, parsing locale -a would be a possibility. It's also possible to stay in PHP, for example:
Configurability is probably not a good idea -- the comment in the code says that other code parses the output from external tools and expects them to give that output in a fixed manner (particularly in English). This looks like it is hard-coded for a reason.
A solution/workaround for this problem appears to be to generate the en_US.UTF-8 locale on the Phabricator host. Restarting Phabricator does not appear to be necessary for the change to take effect, but already-imported repositories will have to be reparsed with bin/repository reparse --all $repository --message --owners.
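As a rough sketch of how one might check for the locale before generating it: the helper below reads a `locale -a` style list on standard input and reports whether a given locale is present. The function names `normalize_locale` and `has_locale` are made up for illustration. The normalization step assumes that the system may list the locale as `en_US.utf8` rather than `en_US.UTF-8`, which is common on glibc systems.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Phabricator.

# normalize_locale NAME: lower-case NAME and drop hyphens so that
# "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# has_locale NAME: read locale names from stdin (one per line, as printed
# by `locale -a`) and succeed if NAME is among them.
has_locale() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r line; do
        if [ "$(normalize_locale "$line")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Possible usage (not executed here):
#   locale -a | has_locale en_US.UTF-8 || echo "en_US.UTF-8 is missing"
#
# How to generate a missing locale depends on the distribution. These are
# assumptions about the host, and both need root:
#   locale-gen en_US.UTF-8                        # Debian/Ubuntu
#   localedef -i en_US -f UTF-8 en_US.UTF-8       # generic glibc
```

Reading the list from standard input rather than calling `locale -a` inside the function keeps the check easy to try out against a fixed list.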
Aug 5 2015
Tarball of a Mercurial repository that contains a non-UTF-8-encoded file name. Can be used to reproduce T7260: EXCEPTION: (AphrontCharacterSetQueryException) Attempting to construct a query using a non-utf8 string when utf8 is expected.
I believe I have the same problem. The cause appears to be a filename that is not encoded in UTF-8 -- in my case, it contains an umlaut. I have uploaded a tarball of a Mercurial repository that can be used to reproduce the problem:
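To find out whether a repository has such filenames, something like the following sketch might help. It filters a list of names on standard input and prints those that are not valid UTF-8, using `iconv` as the validator. The function names `is_utf8` and `non_utf8_lines` are hypothetical, and feeding it the output of `hg manifest` is an assumption about how one would list the tracked files.

```shell
#!/bin/sh
# Hypothetical helpers for spotting filenames that are not valid UTF-8.

# is_utf8 STRING: succeed if STRING is a valid UTF-8 byte sequence.
# iconv exits non-zero when it hits an invalid input sequence.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# non_utf8_lines: read names from stdin, one per line, and print only
# those that fail the UTF-8 check.
non_utf8_lines() {
    while IFS= read -r name; do
        is_utf8 "$name" || printf '%s\n' "$name"
    done
}

# Possible usage inside a Mercurial working copy (not executed here):
#   hg manifest | non_utf8_lines
```

A Latin-1 umlaut such as the single byte 0xFC is rejected by this check, while the two-byte UTF-8 form of the same character passes.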