Force all mercurial commands to use UTF-8 encoding

Description

Summary:
When non-ASCII characters appear in revision titles or summaries, the patch and diff (to update) commands fail on Windows systems. This often happens because editors on "user-friendly" operating systems like macOS insert “smart quotes” or "em—dash" characters into commit messages.

This can be worked around by passing the global option --encoding utf-8 to every Mercurial command; the option is accepted by any Mercurial command. It was added in ~2006, so this should work across all supported versions of Mercurial.
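
As a rough illustration of the workaround (not the actual Arcanist change, which lives in Arcanist's own code), the same idea can be sketched as a small shell wrapper. The names hg_utf8 and HG_BIN are hypothetical and exist only for this example:

```shell
# Hypothetical wrapper: hg_utf8 and HG_BIN are illustrative names only,
# not part of Arcanist or Mercurial.
# HG_BIN lets the caller point at a different hg binary; it defaults to "hg".
HG_BIN="${HG_BIN:-hg}"

hg_utf8() {
  # Put the global option first, then pass the caller's arguments through
  # unchanged ("$@" preserves quoting of each argument).
  "$HG_BIN" --encoding utf-8 "$@"
}
```

With this sketch, hg_utf8 log -r tip would run hg --encoding utf-8 log -r tip. Because the forced option comes first, any --encoding the caller passes afterwards appears later on the command line.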

Refs T13649

Test Plan:
I created a diff on a Mercurial repository using smart quotes in the "Title" and "Summary" fields as well as in the content of a changed file. Then on macOS, Windows (PowerShell), and Windows (cmd.exe) I was able to patch down the revision, make a modification, diff the change back up to Phabricator, and land the change. I verified that the commit and content looked correct on macOS and on Windows using nvim, which seems to detect and render the encoding properly, whereas Mercurial displays the smart quotes and em-dashes as odd characters.

I grepped through the Arcanist codebase for other places where --encoding might be specified for Mercurial commands and could not find any. In case this argument is somehow added elsewhere, I verified that specifying --encoding more than once does not cause any issues and that the later specification of --encoding appears to "win".

$ hg --encoding utf-8 --encoding utf-8 log -r tip
# prints out results in UTF-8 without issue

$ hg --encoding utf-8 log --encoding latin-1 -r tip
# prints out results in latin-1 without issue

Reviewers: epriestley, Blessed Reviewers

Reviewed By: epriestley, Blessed Reviewers

Subscribers: Korvin

Maniphest Tasks: T13649

Differential Revision: https://secure.phabricator.com/D21676