
Importing the Linux git repository fails on one commit, claiming a non-UTF-8 string is used where UTF-8 is expected
Open, Needs Triage, Public

Description

Importing the Linux repository fails on one commit:

rLINUXf77621cc640a Message, Change, Owners, Herald

When I tried to reparse it with ./bin/repository, the following error occurred:

>>> [2] <connect> phabricator_repository
<<< [2] <connect> 988 us
>>> [3] <query> SELECT `r`.*, r.* FROM `repository` r  WHERE ((r.callsign IN ('LINUX')))   ORDER BY `r`.`id` DESC 
<<< [3] <query> 627 us
>>> [4] <query> SELECT `commit`.* FROM `repository_commit` commit  WHERE (((commit.repositoryID = 152
                  AND commit.commitIdentifier LIKE 'f77621cc640a%')))   ORDER BY `commit`.`id` DESC 
<<< [4] <query> 760 us
>>> [5] <query> SELECT `r`.*, r.* FROM `repository` r  WHERE (r.id IN (152))   ORDER BY `r`.`id` DESC 
<<< [5] <query> 592 us
>>> [6] <query> SELECT `commit`.* FROM `repository_commit` commit  WHERE (commit.id IN (1083744))   ORDER BY `commit`.`id` DESC 
<<< [6] <query> 562 us
>>> [7] <query> SELECT `r`.*, r.* FROM `repository` r  WHERE (r.id IN (152))   ORDER BY `r`.`id` DESC 
<<< [7] <query> 619 us
>>> [8] <conduit> diffusion.querycommits()
>>> [9] <query> SELECT `commit`.* FROM `repository_commit` commit  WHERE (commit.phid IN ('PHID-CMIT-hizezi6745j25mfxk4mk'))   ORDER BY `commit`.`id` DESC LIMIT 101
<<< [9] <query> 705 us
>>> [10] <query> SELECT `r`.*, r.* FROM `repository` r  WHERE (r.id IN (152))   ORDER BY `r`.`id` DESC 
<<< [10] <query> 607 us
>>> [11] <query> SELECT * FROM `repository_commitdata` WHERE commitID in (1083744) 
<<< [11] <query> 559 us
>>> [12] <exec> $ git log -n 1 --encoding='UTF-8' --format='%e%x00%cn%x00%ce%x00%an%x00%ae%x00%T%x00%at%x00%s%n%n%b' 'f77621cc640a7c50b3d8c5254ecc5d91eaa99d0d' --
<<< [12] <exec> 9,814 us
<<< [8] <conduit> 22,360 us
>>> [13] <query> SELECT * FROM `repository_commitdata` WHERE commitID = 1083744 
<<< [13] <query> 586 us
>>> [14] <connect> phabricator_user
<<< [14] <connect> 886 us
>>> [15] <query> SELECT * FROM `user` WHERE userName = 'Poddar, Sourav <sourav.poddar@ti.com>' 
<<< [15] <query> 526 us
>>> [16] <query> SELECT * FROM `user_email` WHERE address = 'Poddar, Sourav <sourav.poddar@ti.com>' 
<<< [16] <query> 460 us
>>> [17] <query> SELECT * FROM `user` WHERE realName = 'Poddar, Sourav <sourav.poddar@ti.com>' 
<<< [17] <query> 614 us
>>> [18] <query> SELECT * FROM `user_email` WHERE address = 'sourav.poddar@ti.com' 
<<< [18] <query> 433 us
>>> [19] <query> SELECT * FROM `user` WHERE userName = 'Poddar, Sourav' 
<<< [19] <query> 472 us
>>> [20] <query> SELECT * FROM `user` WHERE realName = 'Poddar, Sourav' 
<<< [20] <query> 508 us
>>> [21] <event> diffusion.lookupUser <listeners = 1>
<<< [21] <event> 168 us
>>> [22] <query> SELECT * FROM `user` WHERE userName = 'Dmitry Torokhov <dmitry.torokhov@gmail.com>' 
<<< [22] <query> 446 us
>>> [23] <query> SELECT * FROM `user_email` WHERE address = 'Dmitry Torokhov <dmitry.torokhov@gmail.com>' 
<<< [23] <query> 386 us
>>> [24] <query> SELECT * FROM `user` WHERE realName = 'Dmitry Torokhov <dmitry.torokhov@gmail.com>' 
<<< [24] <query> 459 us
>>> [25] <query> SELECT * FROM `user_email` WHERE address = 'dmitry.torokhov@gmail.com' 
<<< [25] <query> 340 us
>>> [26] <query> SELECT * FROM `user` WHERE userName = 'Dmitry Torokhov' 
<<< [26] <query> 358 us
>>> [27] <query> SELECT * FROM `user` WHERE realName = 'Dmitry Torokhov' 
<<< [27] <query> 374 us
>>> [28] <event> diffusion.lookupUser <listeners = 1>
<<< [28] <event> 134 us
>>> [29] <conduit> differential.parsecommitmessage()
>>> [30] <connect> phabricator_auth
<<< [30] <connect> 688 us
>>> [31] <query> SELECT * FROM `auth_providerconfig`  ORDER BY `id` DESC 
<<< [31] <query> 453 us
>>> [32] <query> SELECT `user`.* FROM `user` user  WHERE (user.userName IN ('Andrew', 'Morton', '<akpm@linux-foundation.org>', 'Signed-off-by:', 'Felipe', 'Balbi', '<balbi@ti.com>', 'G', 'Manjunath', 'Kondaiah', '<manjugk@ti.com>', 'Sourav', 'Poddar', '<sourav.poddar@ti.com>', 'Dmitry', 'Torokhov', '<dtor@mail.ru>'))   ORDER BY `user`.`id` DESC 
<<< [32] <query> 1,096 us
<<< [29] <conduit> 66,884 us
>>> [33] <connect> phabricator_differential
<<< [33] <connect> 688 us
>>> [34] <query> (SELECT `r`.* FROM `differential_revision` r JOIN `differential_revisionhash` hash_rel ON hash_rel.revisionID = r.id WHERE (((hash_rel.type = 'gtcm' AND hash_rel.hash = 'f77621cc640a7c50b3d8c5254ecc5d91eaa99d0d') OR (hash_rel.type = 'gttr' AND hash_rel.hash = 'e5674d3f114399e1d6cd855a15ed42f8e56a9795')))   ORDER BY `r`.`dateModified` DESC, `r`.`id` DESC )
<<< [34] <query> 1,626 us
>>> [35] <connect> phabricator_repository
<<< [35] <connect> 643 us
>>> [36] <query> UPDATE `repository_commit` SET `repositoryID` = '152', `phid` = 'PHID-CMIT-hizezi6745j25mfxk4mk', `commitIdentifier` = 'f77621cc640a7c50b3d8c5254ecc5d91eaa99d0d', `epoch` = '1336714438', `mailKey` = '676ee7druebk7ofriueb', `authorPHID` = NULL, `auditStatus` = '0', `summary` = 'Input: omap-keypad - dynamically handle register offsets', `id` = '1083744' WHERE `id` = '1083744'
<<< [36] <query> 4,353 us
[2016-02-01 09:51:00] EXCEPTION: (AphrontCharacterSetQueryException) Attempting to construct a query using a non-utf8 string when utf8 is expected. Use the `%B` conversion to escape binary strings data. at [<phutil>/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:362]
arcanist(head=master, ref.master=57f6fb59d739), phabricator(head=master, ref.master=8900f363266e), phutil(head=master, ref.master=f43291e99d36)
  #0 AphrontBaseMySQLDatabaseConnection::validateUTF8String(string) called at [<phutil>/src/aphront/storage/connection/mysql/AphrontMySQLiDatabaseConnection.php:10]
  #1 AphrontMySQLiDatabaseConnection::escapeUTF8String(string) called at [<phutil>/src/xsprintf/qsprintf.php:178]
  #2 xsprintf_query(AphrontMySQLiDatabaseConnection, string, integer, string, integer) called at [<phutil>/src/xsprintf/xsprintf.php:70]
  #3 xsprintf(string, AphrontMySQLiDatabaseConnection, array) called at [<phutil>/src/xsprintf/qsprintf.php:64]
  #4 qsprintf(AphrontMySQLiDatabaseConnection, string, string, string) called at [<phabricator>/src/infrastructure/storage/lisk/LiskDAO.php:1146]
  #5 LiskDAO::update() called at [<phabricator>/src/infrastructure/storage/lisk/LiskDAO.php:1077]
  #6 LiskDAO::save() called at [<phabricator>/src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryCommitMessageParserWorker.php:252]
  #7 PhabricatorRepositoryCommitMessageParserWorker::updateCommitData(DiffusionCommitRef) called at [<phabricator>/src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryGitCommitMessageParserWorker.php:11]
  #8 PhabricatorRepositoryGitCommitMessageParserWorker::parseCommitWithRef(PhabricatorRepository, PhabricatorRepositoryCommit, DiffusionCommitRef) called at [<phabricator>/src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryCommitMessageParserWorker.php:40]
  #9 PhabricatorRepositoryCommitMessageParserWorker::parseCommit(PhabricatorRepository, PhabricatorRepositoryCommit) called at [<phabricator>/src/applications/repository/worker/PhabricatorRepositoryCommitParserWorker.php:39]
  #10 PhabricatorRepositoryCommitParserWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:122]
  #11 PhabricatorWorker::executeTask() called at [<phabricator>/src/applications/repository/management/PhabricatorRepositoryManagementReparseWorkflow.php:325]
  #12 PhabricatorRepositoryManagementReparseWorkflow::execute(PhutilArgumentParser) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:408]
  #13 PhutilArgumentParser::parseWorkflowsFull(array) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:301]
  #14 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/repository/manage_repositories.php:22]

MySQL version:

[phab@phab phabricator]$ mysql --version
mysql  Ver 15.1 Distrib 5.5.44-MariaDB, for Linux (x86_64) using readline 5.1

Reproducing

  • Import the Linux repository into Diffusion (https://github.com/torvalds/linux.git), wait a couple of hours, and see that one commit is not imported.
  • Try reparsing it with ./bin/repository reparse --message --change --herald --owners rLINUXf77621cc640a --trace and see the failure.
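To narrow down which part of the commit trips the UTF-8 check, the `git log` call from the trace can be replayed outside Phabricator and each NUL-separated field tested on its own. The following is only a sketch: the field names and the helper functions are made up for illustration, and Python's UTF-8 decoder is a stand-in for the check in `validateUTF8String`, which may not accept exactly the same byte sequences.

```python
import subprocess

# Format string copied from the `git log` call in the trace.
GIT_FORMAT = "%e%x00%cn%x00%ce%x00%an%x00%ae%x00%T%x00%at%x00%s%n%n%b"

# Hypothetical labels for the eight fields, in the order the format emits them.
FIELD_NAMES = [
    "encoding", "committer_name", "committer_email",
    "author_name", "author_email", "tree", "author_time", "message",
]


def split_fields(raw: bytes) -> dict:
    """Split raw `git log` output on NUL bytes into named fields."""
    # Limit the split so NUL bytes inside the message (unlikely) stay in it.
    parts = raw.split(b"\x00", len(FIELD_NAMES) - 1)
    if len(parts) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d" % (len(FIELD_NAMES), len(parts)))
    return dict(zip(FIELD_NAMES, parts))


def invalid_utf8_fields(fields: dict) -> list:
    """Return the names of fields whose bytes do not decode as UTF-8."""
    bad = []
    for name, value in fields.items():
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def read_commit_fields(commit: str, repo_path: str = ".") -> dict:
    """Run the same `git log` query as the trace and return the raw fields."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-n", "1", "--encoding=UTF-8",
         "--format=" + GIT_FORMAT, commit, "--"],
        check=True, capture_output=True,
    )
    return split_fields(result.stdout)
```

Calling `invalid_utf8_fields(read_commit_fields("f77621cc640a7c50b3d8c5254ecc5d91eaa99d0d", "/path/to/linux"))` in a local clone would list the suspicious fields, if any; the clone path here is a placeholder.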

Event Timeline

I cloned Linux and examined the commit, but couldn't immediately see anything wrong with it. I'll do the full import when I have a chance.
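Invalid bytes are easy to miss when a commit is viewed in a terminal, since they are often rendered as a replacement character or dropped. A small helper like the one below (a sketch; the function name is hypothetical and it is not part of Phabricator) reports the offset of every byte sequence that fails to decode as UTF-8, so the offending spot can be inspected directly.

```python
def find_invalid_utf8(data: bytes) -> list:
    """Return (offset, bad_bytes) pairs for each undecodable sequence in data."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            # If the remainder decodes cleanly, there is nothing more to report.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice that was decoded.
            start = pos + err.start
            end = pos + err.end
            bad.append((start, data[start:end]))
            pos = end  # resume scanning just past the bad sequence
    return bad
```

The input could be the raw commit object, for example the bytes printed by `git cat-file commit f77621cc640a7c50b3d8c5254ecc5d91eaa99d0d`, captured without any decoding.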

OK, thanks. Note that this problem is probably the same as T7260, but it seems to be easily reproducible with the Linux repository.