Page MenuHomePhabricator

Collation of "sort" columns in MySQL prior to 5.5 causes uppercase/lowercase problems
Closed, ResolvedPublic

Description

(Copied associated projects from somewhat similar task to T3725)

For most users in my set up, @mholden and @epriestley mentions work just great. For a random handful of users it does not, and I can't discern a reason why.

See these two screenshots.

Works great:

Works less great

I can help run any queries you'd like to provide additional information -- I imagine this report isn't very helpful by itself.

Event Timeline

mholden raised the priority of this task from to Needs Triage.
mholden updated the task description. (Show Details)
mholden added subscribers: mholden, epriestley.

Do all the problematic users have uppercase letters in their names?

Actually yes, I believe they do

I am a master sleuth.

mholden added a comment.EditedFeb 16 2015, 9:45 PM

:p So is that a known bug, expected behavior?

can't type engrish

Just a guess. I actually can't reproduce this, though, at least immediately:

Two questions:

  • What version are you at?
  • Does bin/storage adjust report adjustments?

I updated about to the upstream 2 hours ago, I didn't notice any adjustments when I ran my "update.sh" script.

let me run bin/storage adjust manually...

Nope, no adjustments

And, my head commit is at 5a9d70707b9e16169b8a298135b36dd133e2ae

This has been happening for a few months.

And, my head commit is at 5a9d70707b9e16169b8a298135b36dd133e2ae

This has been happening for a few months.

Hmm, I'm not sure then. Can you show me the output of this?

mysql> SHOW CREATE TABLE phabricator_user.user;

I can't reproduce it on this install, either.

There is also a very small chance that running this will fix it:

phabricator/ $ ./bin/search index --type USER

That's a shot in the dark, although it definitely won't hurt anything.

./bin/search index --type USER had no effect.

Here's the mysql output:


mysql> SHOW CREATE TABLE phabricator_user.user;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| user  | CREATE TABLE `user` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `phid` varbinary(64) NOT NULL,
  `userName` varbinary(64) NOT NULL,
  `realName` varbinary(128) NOT NULL,
  `sex` varbinary(4) DEFAULT NULL,
  `translation` varbinary(64) DEFAULT NULL,
  `passwordSalt` varbinary(32) DEFAULT NULL,
  `passwordHash` varbinary(128) DEFAULT NULL,
  `dateCreated` int(10) unsigned NOT NULL,
  `dateModified` int(10) unsigned NOT NULL,
  `profileImagePHID` varbinary(64) DEFAULT NULL,
  `consoleEnabled` tinyint(1) NOT NULL,
  `consoleVisible` tinyint(1) NOT NULL,
  `consoleTab` varbinary(64) NOT NULL,
  `conduitCertificate` varbinary(255) NOT NULL,
  `isSystemAgent` tinyint(1) NOT NULL DEFAULT '0',
  `isDisabled` tinyint(1) NOT NULL,
  `isAdmin` tinyint(1) NOT NULL,
  `timezoneIdentifier` varbinary(255) NOT NULL,
  `isEmailVerified` int(10) unsigned NOT NULL,
  `isApproved` int(10) unsigned NOT NULL,
  `accountSecret` binary(64) NOT NULL,
  `isEnrolledInMultiFactor` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `userName` (`userName`),
  UNIQUE KEY `phid` (`phid`),
  KEY `realName` (`realName`),
  KEY `key_approved` (`isApproved`)
) ENGINE=InnoDB AUTO_INCREMENT=47 DEFAULT CHARSET=binary |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Aaah, okay. Upgrading to MySQL 5.5 or newer and running bin/storage adjust will fix this. I'll look into getting it fixed for older MySQL versions.

The simplest fix for this would be to make the pre-5.5 sorting collation utf8_general_ci. Most columns with sort collations have only ASCII text (email addresses, usernames, callsigns) but these do not:

PhrictionDocument->slug
PhabricatorProject->name
ManiphestNameIndex->indexedObjectName
PhamePost->phameTitle
PhabricatorRepository->name
PhabricatorUser (name token table)->token
ManiphestTask->title
PhrictionContent->title
CustomFieldStringIndex->indexValue

Some of these would be easy to fix/degrade but it's hard for others. But it's probably cleanest overall to restrict this set of fields to 3-byte characters until 5.5, since that seems less bad than not having these columns collate letter case properly.

mholden added a comment.EditedFeb 16 2015, 10:45 PM

Is upgrading as painless as a "sudo apt-get upgrade mysql" -- or am I asking for huge amounts of trouble with that level of naivete on my production install?

Sorry, I pretty much have no idea.

iiam

You can follow this to get a backup if you want to plunge into the abyss:

https://secure.phabricator.com/book/phabricator/article/configuring_backups/

storage adjust may take a long time to complete after upgrading, too.

Is upgrading as painless as a "sudo apt-get upgrade mysql" -- or am I asking for huge amounts of trouble with that level of naivete on my production install?

Ohai! It's not THAT simple, but nevertheless it's not something that is going to mess Your day if You do it right. It all depends from what You are upgrading etc.generally for mysql follow http://dev.mysql.com/doc/refman/5.5/en/upgrading.html and then https://secure.phabricator.com/book/phabricator/article/configuring_backups/ and You'll be fine.

The simplest fix for this would be to make the pre-5.5 sorting collation utf8_general_ci. Most columns with sort collations have only ASCII text (email addresses, usernames, callsigns) but these do not:

PhrictionDocument->slug
PhabricatorProject->name
ManiphestNameIndex->indexedObjectName
PhamePost->phameTitle
PhabricatorRepository->name
PhabricatorUser (name token table)->token
ManiphestTask->title
PhrictionContent->title
CustomFieldStringIndex->indexValue

Some of these would be easy to fix/degrade but it's hard for others. But it's probably cleanest overall to restrict this set of fields to 3-byte characters until 5.5, since that seems less bad than not having these columns collate letter case properly.

Alternatively / additionally, should we have a setup warning about pre 5.5 not being the best? Also, maybe the installation / configuration docs should get a massage?

I vote for an "unresolved setup issue" if you have an old MySQL version

We should, yes. Until this was reported I wasn't aware of this issue and believed we had comparable support across versions.

I think we should probably use a case-insensitive collation and sacrifice the ability to put utf8mb4 in these fields. That seems like the smaller price on the balance, by a wide margin.

epriestley renamed this task from @mention detection in comments broken for *some* users to Collation of "sort" columns in MySQL prior to 5.5 causes uppercase/lowercase problems.Feb 26 2015, 3:14 PM

This is relatively simple to fix, but we need to:

  • Make sure that ALTERing a binary collated column which contains utf8mb4 characters into utf8 column succeeds with --unsafe.
  • Make sure that the messaging in this case reasonably points users toward --unsafe.
  • Accept some possible breaks in Phriction documents, Project slugs, Phame posts, etc., which have utf8mb4 characters in them.
    • Even most of the "bad" cases here aren't major, but fields like slug and phameTitle might break URLs.
  • Write the setup warning about old MySQL.

Make sure that ALTERing a binary collated column which contains utf8mb4 characters into utf8 column succeeds with --unsafe.

It does.

Make sure that the messaging in this case reasonably points users toward --unsafe.

We hit error "1366 Incorrect String Value", which is already described in the documentation.

Accept some possible breaks in Phriction documents, Project slugs, Phame posts, etc., which have utf8mb4 characters in them.

Because --unsafe truncates data, this could create problems.

For example, if you have two Phriction documents named:

  • Animal ? Splish Splash
  • Animal ? Meow Meow

...they will probably have slugs like:

  • animal_?_splish_splash
  • animal_?_meow_meow

...and truncate to:

  • animal_
  • animal_

...when migrated. This will fail the unique key constraint, and stop the adjust with an error message. There's nothing we can do about this except point installs here and walk them through massaging the data so it can be migrated.

There's a similar problem if you have multiple values which are unique in the binary collation but nonunique in the utf8_general_ci collation, like "ALincoln" and "alincoln". They will collide after changing the collation, violate the unique key constraint, and fail to adjust. Again, we just have to walk installs through massaging the data. I don't think we can reasonably do anything automated here.

The good news is that these cases should be very rare, hopefully, and the failure just prevents the adjustment from moving forward (it does not stop the install from working).