Add phutil_is_utf8_with_only_bmp_characters()
ClosedPublic

Authored by epriestley on Feb 23 2014, 7:41 PM.

Details

Summary

Ref T1191. We currently use utf8 charsets in MySQL, which silently truncate a string at the first character outside of the Basic Multilingual Plane (code points above U+FFFF), like "Musical G-Clef" (U+1D11E).

Add a method to detect that a string contains characters outside of this range, and thus that inserting it will silently truncate data.
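
As a rough illustration of the check described above, here is a minimal sketch in Python. It is not the libphutil implementation (which is PHP and is not shown on this page); the function name `is_utf8_with_only_bmp_characters` is a hypothetical analog chosen for this example. It assumes the input is a byte string and treats invalid UTF-8 as a failure.

```python
def is_utf8_with_only_bmp_characters(data: bytes) -> bool:
    """Return True if `data` is valid UTF-8 and every code point is <= U+FFFF.

    Hypothetical Python analog for illustration only; not the PHP
    function added in this revision.
    """
    try:
        # Strict decoding rejects malformed sequences, overlong forms,
        # and encoded surrogates.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Code points above U+FFFF lie outside the Basic Multilingual Plane
    # and need four bytes in UTF-8, which MySQL's utf8 charset cannot store.
    return all(ord(ch) <= 0xFFFF for ch in text)


g_clef = "\U0001D11E".encode("utf-8")   # "Musical G-Clef", four bytes in UTF-8
snowman = "\u2603".encode("utf-8")      # inside the BMP, three bytes in UTF-8

print(is_utf8_with_only_bmp_characters(snowman))  # True
print(is_utf8_with_only_bmp_characters(g_clef))   # False
```

A caller could run a check like this before writing to a utf8 column and reject or escape the value instead of letting the database truncate it.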

Test Plan

Added and executed unit tests.

Diff Detail

Repository
rPHU libphutil
Branch
bmp1
Lint
Lint Passed
Unit
Tests Passed

Event Timeline

I don't see a minimum required mysql version running, so I assume running an alter table for utf8mb4 is out of the question?

https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html

English is hard.

The Phabricator installation guide doesn't define a minimum required MySQL version, so I assume running an ALTER TABLE for utf8mb4 is out of the question? (requires 5.5+)

Yeah, utf8mb4 doesn't exist until MySQL 5.5.3 (released 2010-03-24), which I think is too recent to set as a minimum version. The minimum version of PHP (5.2.3) is about twice as old (2007-05-31).

Even if we decided to move to 5.5.3, I'd probably want to take some time to set it up rather than landing a single big migration like we did for utf8 originally, since we have more installs with a larger amount of data now.