Transcode the HTML part of incoming email into UTF-8 as well
Summary:
D1093 did this for just the text/plain part of incoming
email. Most text/html parts choose to either use entity encoding
or are already UTF-8, thus obviating the need to transcode the
HTML part. However, this is not always the case, and leads to dropped
messages, by way of:
EXCEPTION: (Exception) Failed to JSON encode value (#5: Malformed UTF-8 characters, possibly incorrectly encoded): Dictionary value at key "html" is not valid UTF8, and cannot be JSON encoded: [snip HTML part of message content]
Generalize the charset transcoding to not apply to just the text/plain part, but
both text/plain and text/html parts.
Test Plan:
Fed in a Windows-1252-encoded text/html part with 0x92
bytes in it; verified that $content only contained valid UTF-8 after
this change.
Reviewers: Blessed Reviewers, epriestley
Reviewed By: Blessed Reviewers, epriestley
Subscribers: Korvin, epriestley
Differential Revision: https://secure.phabricator.com/D18776