
Add phutil_is_utf8_with_only_bmp_characters()
Closed, Public

Authored by epriestley on Feb 23 2014, 7:41 PM.

Details

Summary

Ref T1191. We currently use `utf8` charsets in MySQL, which silently truncate data when a string contains characters outside the Basic Multilingual Plane (code points above U+FFFF), such as U+1D11E MUSICAL SYMBOL G CLEF.
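The cutoff follows from UTF-8 encoding lengths: MySQL's `utf8` charset stores at most 3 bytes per character, and only characters up to U+FFFF fit in 3 bytes. The Python snippet below (an illustration, not part of this diff; `utf8_length` is a hypothetical helper) shows the byte count for characters from each range.

```python
def utf8_length(ch: str) -> int:
    # Number of bytes needed to encode a single character in UTF-8.
    return len(ch.encode("utf-8"))

g_clef = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

print(utf8_length("A"))       # 1 byte  (U+0041)
print(utf8_length("\u00e9"))  # 2 bytes (U+00E9, e with acute)
print(utf8_length("\u20ac"))  # 3 bytes (U+20AC, euro sign)
print(utf8_length(g_clef))    # 4 bytes: too wide for MySQL's 3-byte utf8
```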

Add a method to detect whether a string contains characters outside this range, and therefore whether inserting it into a `utf8` column will silently truncate data.
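A minimal Python sketch of such a check follows. It is not the libphutil implementation (which is PHP); the function name mirrors `phutil_is_utf8_with_only_bmp_characters()` only for readability, and treating invalid UTF-8 as a failure is an assumption of this sketch.

```python
def is_utf8_with_only_bmp_characters(data: bytes) -> bool:
    """Return True if `data` is valid UTF-8 and every code point is <= U+FFFF.

    Assumption: invalid UTF-8 is reported as False rather than raising.
    """
    try:
        # Strict decoding rejects malformed sequences and encoded surrogates.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Any code point above U+FFFF needs 4 bytes in UTF-8 and would be
    # truncated by a MySQL utf8 (3-byte) column.
    return all(ord(ch) <= 0xFFFF for ch in text)
```

For example, `"caf\u00e9".encode("utf-8")` passes, while `"\U0001D11E".encode("utf-8")` (the G clef) and the invalid byte string `b"\xff"` both fail. A caller could run this before an insert and reject or flag the value instead of letting MySQL truncate it.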

Test Plan

Added and executed unit tests.

Diff Detail

Lint: Lint Skipped
Unit: Tests Skipped

Event Timeline

I don't see a minimum required mysql version running, so I assume running an alter table for utf8mb4 is out of the question?

https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html

English is hard.

The Phabricator installation guide doesn't define a minimum required MySQL version, so I assume running an ALTER TABLE for utf8mb4 is out of the question? (requires 5.5.3+)

Yeah, it doesn't exist until 5.5.3 (released 2010-03-24) which I think is too recent to set as a minimum version. The minimum version of PHP (5.2.3) is about twice as old (2007-05-31).

Even if we decided to move to 5.5.3, I'd probably want to take some time to set it up rather than landing a single big migration like we did for utf8 originally, since we have more installs with a larger amount of data now.