Page MenuHomePhabricator

Git commits with no commit message fail to import/parse
Open, NormalPublic

Description

See PHI1739, which reports that Git commits with no message fail to parse.

[19-May-2020 00:52:13 UTC] [2020-05-19 00:52:13] EXCEPTION: (PhutilProxyException) Error while executing Task ID 256505576. {>} (AphrontQueryException) #1048: Column 'commitMessage' cannot be null at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:386]
...

These seem to be impossible to construct normally:

$ git commit --allow-empty -m ''
Aborting commit due to empty commit message.

Previously, see T11537 for commits with impossible timestamps. A similar method of production likely works here:

$ git cat-file commit HEAD > message.original
$ nano message.original # Delete all the useful text.
$ git hash-object -t commit --stdin -w < message.original
$ git reset --hard <hash>
$ git show
commit f6abfc515e98edaddadede599e640b4fbbad74e0 (HEAD -> blank1)
Author: epriestley <git@epriestley.com>
Date:   Fri May 8 13:50:07 2020 -0700

For completeness, note that we don't actually need an Author or Committer and can remove every line except tree. Here's the perfect commit:

epriestley@orbital ~/dev/spellbook $ (echo 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904'; echo) | git hash-object -t commit --stdin -w
8d7ff291d28b7f1109200d31f87a6f98fe7df90e
epriestley@orbital ~/dev/spellbook $ git show 8d7ff291d28b7f1109200d31f87a6f98fe7df90e
commit 8d7ff291d28b7f1109200d31f87a6f98fe7df90e

Event Timeline

epriestley triaged this task as Normal priority.May 19 2020, 2:37 AM
epriestley created this task.

Alas, you can't actually push the perfect commit:

$ git push origin
...
ref blank1:: Error in git rev-list --stdin --objects --not --remotes=origin --: exit status 128 fatal: missing tree object '4b825dc642cb6eb9a060e54bf8d69288fbee4904'

error: failed to push some refs to 'ssh://epriestley@local.phacility.com/source/spellbook.git'

First issue:

$ ./bin/worker execute --id 248007
Executing task 248007 (PhabricatorRepositoryGitCommitMessageParserWorker)...
[2020-05-18 19:41:34] EXCEPTION: (AphrontQueryException) #1048: Column 'identityNameRaw' cannot be null at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:386]
...
arcanist(head=stable, ref.master=e3030ebcad53, ref.stable=a5bfb968cd90), corgi(head=master, ref.master=72dc914db4d4), instances(head=stable, ref.master=b69a26a197c5, ref.stable=8b4667b1d59d), libcore(), phabricator(head=master, ref.master=f86d822a37ea, custom=11), services(head=master, ref.master=35f9c1919842)
  #0 AphrontBaseMySQLDatabaseConnection::throwQueryCodeException(integer, string) called at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:320]
  #1 AphrontBaseMySQLDatabaseConnection::throwQueryException(mysqli) called at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:216]
  #2 AphrontBaseMySQLDatabaseConnection::executeQuery(PhutilQueryString) called at [<phabricator>/src/infrastructure/storage/xsprintf/queryfx.php:8]
  #3 queryfx(AphrontMySQLiDatabaseConnection, string, PhutilQueryString, PhabricatorRepositoryIdentity, array, array) called at [<phabricator>/src/infrastructure/storage/connection/AphrontDatabaseConnection.php:58]
  #4 AphrontDatabaseConnection::query(string, PhutilQueryString, PhabricatorRepositoryIdentity, array, array) called at [<phabricator>/src/infrastructure/storage/lisk/LiskDAO.php:1103]
  #5 LiskDAO::insertRecordIntoDatabase(string) called at [<phabricator>/src/infrastructure/storage/lisk/LiskDAO.php:939]
  #6 LiskDAO::insert() called at [<phabricator>/src/infrastructure/storage/lisk/LiskDAO.php:908]
  #7 LiskDAO::save() called at [<phabricator>/src/applications/repository/storage/PhabricatorRepositoryIdentity.php:130]
  #8 PhabricatorRepositoryIdentity::save() called at [<phabricator>/src/applications/diffusion/identity/DiffusionRepositoryIdentityEngine.php:106]
  #9 DiffusionRepositoryIdentityEngine::updateIdentity(PhabricatorRepositoryIdentity) called at [<phabricator>/src/applications/diffusion/identity/DiffusionRepositoryIdentityEngine.php:48]
  #10 DiffusionRepositoryIdentityEngine::newResolvedIdentity(NULL) called at [<phabricator>/src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryCommitMessageParserWorker.php:76]
  #11 PhabricatorRepositoryCommitMessageParserWorker::updateCommitData(DiffusionCommitRef) called at [<phabricator>/src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryCommitMessageParserWorker.php:42]
  #12 PhabricatorRepositoryCommitMessageParserWorker::parseCommit(PhabricatorRepository, PhabricatorRepositoryCommit) called at [<phabricator>/src/applications/repository/worker/PhabricatorRepositoryCommitParserWorker.php:51]
  #13 PhabricatorRepositoryCommitParserWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:124]
  #14 PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:158]
  #15 PhabricatorWorkerActiveTask::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/management/PhabricatorWorkerManagementExecuteWorkflow.php:93]
  #16 PhabricatorWorkerManagementExecuteWorkflow::execute(PhutilArgumentParser) called at [<arcanist>/src/parser/argument/PhutilArgumentParser.php:492]
  #17 PhutilArgumentParser::parseWorkflowsFull(array) called at [<arcanist>/src/parser/argument/PhutilArgumentParser.php:377]
  #18 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/setup/manage_worker.php:21]

Constructing an Identity from null fails.

The most correct fix here is to treat null and '' as separate identities, because they are distinct values. However, Identity can't represent null easily, and these cases are both degenerate. An easier fix is to string-cast the author string.

Second issue:

$ ./bin/worker execute --id 248007
Executing task 248007 (PhabricatorRepositoryGitCommitMessageParserWorker)...
[2020-05-18 19:48:05] EXCEPTION: (AphrontQueryException) #1048: Column 'commitMessage' cannot be null at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:386]
...
arcanist(head=stable, ref.master=e3030ebcad53, ref.stable=a5bfb968cd90), corgi(head=master, ref.master=72dc914db4d4), instances(head=stable, ref.master=b69a26a197c5, ref.stable=8b4667b1d59d), libcore(), phabricator(head=master, ref.master=f86d822a37ea, custom=11), services(head=master, ref.master=35f9c1919842)
  #0 AphrontBaseMySQLDatabaseConnection::throwQueryCodeException(integer, string) called at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:320]
  #1 AphrontBaseMySQLDatabaseConnection::throwQueryException(mysqli) called at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:216]
  #2 AphrontBaseMySQLDatabaseConnection::executeQuery(PhutilQueryString) called at [<phabricator>/src/infrastructure/storage/xsprintf/queryfx.php:8]
  #3 queryfx(AphrontMySQLiDatabaseConnection, string, PhabricatorRepositoryCommitData, array, string, string) called at [<phabricator>/src/infrastructure/storage/connection/AphrontDatabaseConnection.php:58]
  #4 AphrontDatabaseConnection::query(string, PhabricatorRepositoryCommitData, array, string, string) called at [<phabricator>/src/infrastructure/storage/lisk/LiskDAO.php:985]
  #5 LiskDAO::update() called at [<phabricator>/src/infrastructure/storage/lisk/LiskDAO.php:910]
  #6 LiskDAO::save() called at [<phabricator>/src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryCommitMessageParserWorker.php:166]
  #7 PhabricatorRepositoryCommitMessageParserWorker::updateCommitData(DiffusionCommitRef) called at [<phabricator>/src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryCommitMessageParserWorker.php:42]
  #8 PhabricatorRepositoryCommitMessageParserWorker::parseCommit(PhabricatorRepository, PhabricatorRepositoryCommit) called at [<phabricator>/src/applications/repository/worker/PhabricatorRepositoryCommitParserWorker.php:51]
  #9 PhabricatorRepositoryCommitParserWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:124]
  #10 PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:158]
  #11 PhabricatorWorkerActiveTask::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/management/PhabricatorWorkerManagementExecuteWorkflow.php:93]
  #12 PhabricatorWorkerManagementExecuteWorkflow::execute(PhutilArgumentParser) called at [<arcanist>/src/parser/argument/PhutilArgumentParser.php:492]
  #13 PhutilArgumentParser::parseWorkflowsFull(array) called at [<arcanist>/src/parser/argument/PhutilArgumentParser.php:377]
  #14 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/setup/manage_worker.php:21]

This proceeds along roughly the same lines, and also seems most reasonable to resolve with a string cast.

From PHI1739, --allow-empty-message supports constructing these commits more easily (via a normal git commit rather than git hash-object).

These particular issues blame to D21027, which began extracting null in all cases where a field has no value for a given commit. This is not entirely faithful, as it can not distinguish between "field absent" and "field present, but with an empty value". At least for author, these are distinct states that can be independently constructed using git hash-object:

No Author
$ cat no-author.txt 
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904

$ git hash-object -t commit --stdin -w < no-author.txt 
8d7ff291d28b7f1109200d31f87a6f98fe7df90e
$ git show 8d7ff291d28b7f1109200d31f87a6f98fe7df90e
commit 8d7ff291d28b7f1109200d31f87a6f98fe7df90e
$
Empty Author
$ cat empty-author.txt 
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author

$ git hash-object -t commit --stdin -w < empty-author.txt 
466338b59e04fbec45f808efd6efcfcc600c1b45
$ git show 466338b59e04fbec45f808efd6efcfcc600c1b45
commit 466338b59e04fbec45f808efd6efcfcc600c1b45
$

These objects appear the same under git show. They also appear the same under any git log --format ... incantation I can come up with:

$ git log -n1 --format='<%an> (%ae)' 8d7ff291d28b7f1109200d31f87a6f98fe7df90e
<> ()
$ git log -n1 --format='<%an> (%ae)' 466338b59e04fbec45f808efd6efcfcc600c1b45
<> ()

But these objects are different, and the field-existence is preserved:

$ git cat-file commit 8d7ff291d28b7f1109200d31f87a6f98fe7df90e
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904

$ git cat-file commit 466338b59e04fbec45f808efd6efcfcc600c1b45
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author

Remaining work here is:

  • Diffusion should render "No commit message." or similar for these commits; it currently generates some links with no link text that can not be clicked.
  • Diffusion should also do something sensible with "No author".

Here's an example of one of the screens with issues:

Screen Shot 2020-05-18 at 7.59.36 PM.png (307×516 px, 22 KB)