Transcode the HTML part of incoming email into UTF-8 as well

Authored by alexmv on Nov 16 2017, 8:43 AM.

Description

Summary:
D1093 did this only for the text/plain part of incoming
email. Most text/html parts either use entity encoding or are
already UTF-8, so they need no transcoding. This is not always
the case, however, and a text/html part in any other charset causes
the message to be dropped, by way of:

EXCEPTION: (Exception) Failed to JSON encode value (#5: Malformed UTF-8 characters, possibly incorrectly encoded): Dictionary value at key "html" is not valid UTF8, and cannot be JSON encoded: [snip HTML part of message content]

Generalize the charset transcoding so that it applies to both the text/plain
and text/html parts, rather than to the text/plain part alone.
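
The generalization described above can be sketched roughly as follows. This is
an illustrative Python sketch, not the PHP code changed in this revision: the
names to_utf8 and transcode_bodies, and the shape of the bodies dictionary, are
hypothetical and chosen only to show the same transcoding step applied to every
body part instead of to text/plain alone.

```python
from typing import Dict, Optional, Tuple


def to_utf8(raw: bytes, charset: Optional[str]) -> bytes:
    """Re-encode raw as UTF-8, assuming it is encoded in the declared charset.

    Hypothetical helper for illustration only.
    """
    try:
        text = raw.decode(charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # Unknown or incorrect charset label: substitute U+FFFD for bytes
        # that cannot be decoded, so the result is still valid UTF-8.
        text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


def transcode_bodies(
    bodies: Dict[str, Tuple[bytes, Optional[str]]]
) -> Dict[str, bytes]:
    """Transcode every body part (for example "text" and "html"), not just "text"."""
    return {key: to_utf8(raw, charset) for key, (raw, charset) in bodies.items()}


# Example input: both parts arrive in a non-UTF-8 charset.
bodies = {
    "text": (b"caf\xe9", "iso-8859-1"),
    "html": (b"<p>it\x92s</p>", "windows-1252"),
}
result = transcode_bodies(bodies)
```

Looping over all parts, rather than special-casing one key, means a later
addition of another body type would be transcoded without further changes.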

Test Plan:
Fed in a Windows-1252-encoded text/html part containing 0x92
bytes; verified that $content contained only valid UTF-8 after
this change.
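
The check in the test plan can be reproduced outside Phabricator along these
lines. This is a minimal Python sketch under stated assumptions: the sample
HTML string and the is_valid_utf8 helper are made up for illustration, and
Python's json module stands in for the JSON encoder that raised the exception
quoted in the summary.

```python
import json

# 0x92 is a right single quotation mark in Windows-1252, but on its own it is
# a stray continuation byte and therefore not valid UTF-8.
raw_html = b"<p>It\x92s here</p>"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


valid_before = is_valid_utf8(raw_html)

# Transcode from the declared charset to UTF-8.
transcoded = raw_html.decode("cp1252").encode("utf-8")
valid_after = is_valid_utf8(transcoded)

# Once the part is valid UTF-8, it can be JSON encoded without error.
encoded = json.dumps({"html": transcoded.decode("utf-8")})
```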

Reviewers: Blessed Reviewers, epriestley

Reviewed By: Blessed Reviewers, epriestley

Subscribers: Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D18776

Details

Committed
alexmv on Nov 16 2017, 6:08 PM
Pushed
alexmv on Nov 16 2017, 6:08 PM
Reviewer
Blessed Reviewers
Differential Revision
D18776: Transcode the HTML part of incoming email into UTF-8 as well
Parents
rPbea45e90d324: Add yaml files to differential.whitespace-matters
Branches
Unknown
Tags
Unknown
Build Status
Buildable 18837
Build 25391: Run Core Tests