
phutil_utf8ize() does not normalize overlong forms of 3-byte and 4-byte characters, but json_encode() refuses to encode them
Closed, Resolved, Public

Description

phutil_utf8ize() accepts and emits overlong 3-byte and 4-byte Unicode character sequences, like "\xE0\x83\x83". These are "sort of" valid, and normally this doesn't cause any real issues.
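To illustrate why "\xE0\x83\x83" counts as overlong, here is a minimal sketch that decodes the payload bits of a 3-byte sequence by hand. The helper name `sketch_decode_3byte()` is hypothetical and is not part of libphutil. A 3-byte sequence is only needed for code points at or above U+0800, so a decoded value below that is an overlong form.

```php
<?php

// Hypothetical helper for illustration only (not a libphutil function).
// Extracts the code point from a 3-byte UTF-8-style sequence without
// validating it: 4 payload bits from the lead byte, 6 from each
// continuation byte.
function sketch_decode_3byte($bytes) {
  $b = array_values(unpack('C3', $bytes));
  return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
}

$cp = sketch_decode_3byte("\xE0\x83\x83");
printf("U+%04X\n", $cp);   // prints "U+00C3"
var_dump($cp < 0x800);     // bool(true): fits in 2 bytes, so this form is overlong
```

The shortest encoding of U+00C3 is the 2-byte sequence "\xC3\x83".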

However, json_encode() refuses to accept these. Among other things, this can cause DarkConsole to fail when it tries to report that the page issued a query containing sequences like these, for example a blob insert for the file content of a profile, particularly if file encryption is enabled. I hit this, in particular, in D16432.

But, more generally, this fails:

json_encode(phutil_utf8ize("\xE0\x83\x83"));

Instead, it should succeed.
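
One possible approach, shown here only as a sketch and not as the actual libphutil change, is to accept only strictly valid UTF-8 byte sequences (the well-formed ranges from RFC 3629) and replace every other byte with U+FFFD. The function name `sketch_strict_utf8ize()` is hypothetical, and replacing overlong forms rather than re-encoding them in shortest form is an assumption.

```php
<?php

// Hypothetical sketch, not libphutil's implementation: keep strictly valid
// UTF-8 sequences and replace each byte that is not part of one with U+FFFD.
function sketch_strict_utf8ize($string) {
  $valid =
    '[\x00-\x7F]'.                         // 1-byte (ASCII)
    '|[\xC2-\xDF][\x80-\xBF]'.             // 2-byte, excluding overlong C0/C1
    '|\xE0[\xA0-\xBF][\x80-\xBF]'.         // 3-byte, excluding overlong E0 80..9F
    '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'.  // 3-byte
    '|\xED[\x80-\x9F][\x80-\xBF]'.         // 3-byte, excluding surrogates
    '|\xF0[\x90-\xBF][\x80-\xBF]{2}'.      // 4-byte, excluding overlong F0 80..8F
    '|[\xF1-\xF3][\x80-\xBF]{3}'.          // 4-byte
    '|\xF4[\x80-\x8F][\x80-\xBF]{2}';      // 4-byte, up to U+10FFFF

  // No /u modifier: the pattern works on raw bytes. Group 1 captures a valid
  // sequence; otherwise "." consumes exactly one invalid byte.
  return preg_replace_callback(
    '/('.$valid.')|./s',
    function ($m) {
      return isset($m[1]) ? $m[1] : "\xEF\xBF\xBD";
    },
    $string);
}

$clean = sketch_strict_utf8ize("\xE0\x83\x83");
var_dump(json_encode($clean) !== false);
```

With this sketch, each of the three bytes in "\xE0\x83\x83" is replaced individually, while valid text such as "caf\xC3\xA9" passes through unchanged.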