Page MenuHomePhabricator

Comments are truncated at first non-base-plane character
Closed, ResolvedPublic

Subscribers
Tokens
"Mountain of Wealth" token, awarded by Krenair."Mountain of Wealth" token, awarded by joshuaspence."The World Burns" token, awarded by btrahan."Mountain of Wealth" token, awarded by staticshock."Mountain of Wealth" token, awarded by hach-que."Piece of Eight" token, awarded by chad.
Assigned To
None
Authored By
erling, May 6 2012

Description

Posting Unicode characters in Differential comments cause the string to be truncated.

If you do not see a G clef in brackets here, then the same bug applies here: [

Details

Commits
D10800 / rPa17a36869254: Apply storage adjustments as part of storage upgrade
D10799 / rPfbc175aa6e19: Force Differential draft uniqueness
D10798 / rPHU0135e57181a9: Assume utf8mb4 support
D10797 / rPdbef5660fccc: Update the quickstart.sql
D10806 / rPaa0ca6fe4c22: Make user emails case-insensitive
D10804 / rPe58259b4f52b: Retroactively populate Phriction mailKey column
D10786 / rPf7fbf9ccc174: Change usernames to `sort64` to make them case-insensitive
D10771 / rP000760b645b7: Do a better job of handling spec errors during schema adjustment
D10760 / rP18161d00a08d: Update some storage documentation for new adjustment workflows
D10758 / rPf5c426639ced: Document the adjustment workflow and warn users about adjusting old MySQL
D10757 / rP917da084176c: Fix various MySQL version issues with new charset stuff
D10632 / rP3463ce8a514f: Create new databases with appropriate collation
D10627 / rPd67b7f0f47c4: Correct column mutations for old versions of MySQL
D10621 / rP410c2ecbfdf7: Remove UTF8 BMP unit test and replace it with a general UTF8 test
D10620 / rP8fa8415c07d2: Automatically build all Lisk schemata
D10623 / rPHUb7c05df35726: Quiet utf8mb4 warnings when establish MySQL connections
D10619 / rPe4e5a2004fe0: Use `sort`, not `text`, for search index table
D10616 / rP0a6473138fd6: Purge readthrough caches before applying schema adjustments
D10615 / rP3a644cf6cc22: Truncate very old, overlong Maniphest mail keys
D10611 / rP3629ebebe9cb: Drop very old schema_version table if it exists
D10614 / rP4eb439964003: Convert SavedQuery / NamedQuery to text, not bytes
D10613 / rPf881b5dd7fe8: Convert DivinerLiveSymbol to text, not bytes
D10612 / rP5ce3575fb534: Fix `adjust` phases for keys
D10604 / rPHU29136aa2b421: Make libphutil compatible with utf8mb4
D10607 / rP300172e79991: Support AUTO_INCREMENT in `bin/storage adjust`
D10603 / rP0d7489da79bc: Provide `bin/storage quickstart` to automate generation of `quickstart.sql`
D10602 / rP1dfa94e57127: Use binary collations for most text
D10601 / rP4fcc634a99dc: Fix almost all remaining schemata issues
D10599 / rPa5ce56aa76e1: Allow `bin/storage adjust` to make key changes
D10598 / rP22ee8432d2bb: Allow `bin/storage adjust` to correct column types and collations
D10595 / rPf7ee2c7467c6: Add `bin/storage adjust`, for adjusting schemata
D10594 / rPab6c6836f4c2: Remove the "note" database issue status
D10593 / rP03519c53bbcc: Mark questionable column nullability for later
D10592 / rP4f87adc43846: Ignore keys with trailing index on table primary key for now
D10591 / rP8bf24f53b334: Destroy surplus columns
D10590 / rP943c62d1e9f6: Add missing expected keys and uniqueness
D10589 / rP2880732a49a8: Generate expected schemata for Search
D10588 / rPdc8b2ae6d226: Generate expected schemata for Fact, Owners, Herald and Diviner
D10587 / rPcfbcd69e9b74: Generate expected schemata for Pholio, Phortune, Phragment, Phriction and Policy
D10583 / rPe7b590a1cf0d: Generate expected schemata for Harbormaster
D10582 / rP152a62db7abb: Generate expected Ponder schemata
D10581 / rPac9182af5870: Generate expected Project schemata
D10580 / rP098d0d93d60c: Generate expected schemata for User/People tables
D10605 / rP855f752814eb: Support `:emoji:` in Remarkup
D10577 / rP93681fcdbc5c: Generate expected schemata for Differential
D10576 / rP9be2bf211904: Generate expected schemata for Releeph
D10575 / rPb149cb7e99e5: Generate expected schemata for Repository
D10528 / rP502d18ede400: Generate expected scheamta for Passphrase, Paste, Phlux, Phame
D10527 / rPd6639b68d5a3: Generate expected schemata for MetaMTA, Nuance, MetaData, OAuthServer
D10526 / rP84568eba8465: Generate expected schemata for Maniphest
D10525 / rP7dabc21154f9: Load all keys, support unique keys, and provide an "all issues" view
D10524 / rP6bfe8b598470: Generate expected schemata for Calendar
D10523 / rPa42e4a867ebc: Remove SlowvoteComment and storage
D10521 / rPe9ac3f436ace: Add expected schemata for Fund, Files, Flags and Legalpad
D10520 / rP263dbe7bfe8e: Drop old Audit tables; make markup cache binary
D10519 / rP67fbfe6ccc9f: Generate expected schemata for Doorkeeper, Draft, Drydock, Feed
D10518 / rP8d0f0d1391a3: Generate expected schemata for Dashboards and Conpherence
D10516 / rP1ead50c2cce3: Generate reasonable expected schemata for Chatlog, Conduit, Config, Countdown…
D10501 / rP9b63f84ff93f: Generate reasonable expected schemata for Cache tables
D10500 / rP0f73b15a704d: Generate reasonable expected schemata for Audit and Auth
D10499 / rPfb8da6f4afe5: Support key schemata and column nullability
D10497 / rPaa481dba570d: Begin generating meaningful expected schemata
D10496 / rPb24e36706d40: Generate expected and comparison schemata
D10494 / rP12b53e003b22: Add a UI for reviewing database schemata
D8316 / rPa298a79bdac6: Convert Phabricator to handle "%s" / "%B" properly
D8314 / rP70b008d18da5: Add test coverage that our definition of BMP agrees with MySQL
D8315 / rPHU5a9002ebed4f: Change "%s" in qsprintf to exclude non-BMP characters
D8313 / rPHU563e600eba85: Reject nonminmal representations of UTF8 at the beginning of the 3-byte BMP…
D8310 / rPHU0879582275a5: Add phutil_is_utf8_with_only_bmp_characters()

Related Objects

StatusAssignedTask
Resolvedbtrahan
ResolvedNone
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley
Resolvedepriestley

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fabe added a subscriber: fabe.Oct 2 2014, 4:53 PM

there seems to be some major changes between the mysql versions with the charsets and the syntax.
When running the adjust script against a mysql 5.1 i get:
#1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'binary NULL' at line 1

'Collation binary' seems not to be valid for 5.1
when i change just the charset to binary the column type goes to varbinary and the collation is set to 'NULL' (according to a show full columns)

Oh, that's just my fault -- I ran binary migrations but it was a few patches back and they worked, but didn't take them into account after more changes occurred. Let me sort them out locally and you can try again.

beng removed a subscriber: beng.Oct 2 2014, 5:33 PM
epriestley added subscribers: oles, hach-que.

Support Impact This has widespread impact, notably preventing import of repositories with an emoji anywhere in them with no workaround.
High Support Impact Massive data migrations.

Support Impact This has widespread impact, notably preventing import of repositories with an emoji anywhere in them with no workaround.

As a fun note, that now includes the Phabricator repository itself ever since you made rPfda0b086b565aa :)

While I remember:

  • quickstart will currently do the wrong thing for old MySQL.
  • adjust should warn you if you're about to adjust an old MySQL instance, and advise you that updating MySQL first may save you time in the long run.

And:

  • Databases also create with the wrong collation for old MySQL.

Other installs should NOT run bin/storage adjust yet: we have some things that still need cleanup.

What's the status on those things that need cleanup? Looking at the landed diffs, it seems like they have all been taken care of. We don't have old MySQL (5.5.38). Is it (relatively) safe to adjust?

aklapper added a subscriber: aklapper.

Perhaps I'm misunderstanding, but it seems like storage adjust could be made more efficient. Each time you alter a table, MySQL makes a copy of it, alters the copy then replaces the original with the modified one. If we perform several alters, this adds up to several table copies. On large tables this takes considerable time.

I believe that if we ran one alter statement per table it could speed up the process considerably.

Yes, the current implementation doesn't pursue that optimization because it's simpler to omit it and simpler to debug issues when the queries only do one thing each.

I'm getting the following error when trying to apply storage adjustments:

> ./bin/storage adjust
Verifying database schemata...


Database                 Table              Name       Issues     
phabricator_differential differential_draft key_unique Missing Key

Found 1 issues(s) with schemata, detailed above.

You can review issues in more detail from the web interface, in Config > Database Status. To better understand the adjustment workflow, see "Managing Storage Adjustments" in the documentation.

MySQL needs to copy table data to make some adjustments, so these migrations may take some time.


    Fix these schema issues? [y/N] y

Dropping caches, for faster migrations...
Purging remarkup cache...done.
Purging changeset cache...done.
Purging general cache...done.
Fixing schema issues...
Done.                                                                         

Target                                                 Error                                                                                                         
phabricator_differential.differential_draft.key_unique #1062: Duplicate entry 'PHID-DREV-qjg6xlf327vscfghhsy2-PHID-USER-hfenohlvucsbthy2su73-di' for key 'key_unique'

Failed to make some schema adjustments, detailed above.
For help troubleshooting adjustments, see "Managing Storage Adjustments" in the documentation.

I think some user has somehow managed to end up with two drafts for the same revision. The right fix is probably to remove one and then adjust again.

Also want to add that we had several duplicate differential_drafts on our install as well. About 15 objectPHID-authorPHID combinations had duplicates, ranging from 2-10 duplicates each. Cleaning them up was pretty straightforward.

Additionally, one of the adjustments failed for us. Specifically: phabricator_repository.repository_lintmessage.code Wrong Column Type. PHPCS creates codes that are rather silly in length, like "PHPCS.E.Generic.Functions.FunctionCallArgumentSpacing.NoSpaceAfterComma" so the suggest length of 32 would truncate data, so that adjust fails.

Lastly, from IRC, we couldn't save wiki pages, but applied the diff in P1415 and it resolved that problem.

locutus added a subscriber: locutus.Nov 6 2014, 5:56 PM

D10800 marks this as fixed. For more details (or if you run into issues) see T6485 (which will be updated shortly to reflect the current state of the world).

D10804 fixes the wiki issue directly. D10798 (which is the patch in P1415) will also fix it indirectly.

chad added a comment.Nov 7 2014, 10:28 PM

I feel we owe @epriestley many beers and chips for this change. Or a lifetime's worth.

If only beers could be purchased with mountain of wealth tokens... :3

Well, let's see if I destroyed everything before we celebrate too much.

I'm sure there are some lessons here, I'm just not sure what they are. Maybe:

If you're going to implement 3/4ths of utf8, a good name would be "utf8_but_only_somewhat".

There's also some kind of mess with MySQL 6.0 (which is 6 years old but doesn't exist since there was a transition to a "new release model" after Oracle bought MySQL AB?) and renaming utf8 to utf8mb3, and maybe renaming utf8mb4 to utf8. I haven't been able to figure out exactly what the story is here yet, but we should be well-prepared to deal with whatever happens there now that we have the storage adjust infrastructure.

Another thank you note. You guys are great.