Increase storage size of "summary" column on DiffusionRepositoryCommit
Closed, ResolvedPublic

Assigned To
None
Authored By
saggid
Aug 9 2016, 7:52 PM
Referenced Files
F1756246: pasted_file
Aug 9 2016, 7:52 PM
F1756248: pasted_file
Aug 9 2016, 7:52 PM

Description

Hello, and first I have to say thanks for this beautiful tool :)

Root of the problem: non-Latin commit messages are cut off too soon, which makes them difficult to analyze.

Let's compare English and Russian commits:

English:

pasted_file (895×857 px, 170 KB)

Russian:

pasted_file (808×785 px, 134 KB)

Because our project uses prefixes in commit messages, the commit description is cut off in many cases. I think this happens because you do not use multibyte PHP functions.

Maybe create a parameter in Phabricator that sets the cut-off length for commit messages?

Event Timeline

We truncate these summaries at 80 bytes because the underlying storage column is text80, which is only 80 bytes long when represented using binary collation in older MySQL without utf8mb4 support. We cannot store more text in the general case without changing this column size.

Because Cyrillic characters take 2 bytes each in UTF-8, this effectively gives you only 40 characters. We also attempt to truncate before a word boundary, so you often get significantly fewer characters than this.
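To make the arithmetic concrete, here is a minimal Python sketch of byte-limited truncation with a word-boundary fallback. It is not Phabricator's actual code (which is PHP); the function name `truncate_to_bytes` and its simplified boundary rule are assumptions made for illustration only.

```python
def truncate_to_bytes(text: str, max_bytes: int = 80) -> str:
    """Cut text so its UTF-8 encoding fits in max_bytes (hypothetical helper)."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slice the raw bytes, then drop any partial multibyte character
    # left dangling at the cut point.
    clipped = encoded[:max_bytes].decode("utf-8", errors="ignore")
    # Simplified boundary rule: back up to the last space, if there is one.
    # This can drop a complete final word when the cut lands exactly on a
    # boundary; a real implementation may be more careful.
    head, sep, _ = clipped.rpartition(" ")
    return head if sep else clipped


latin = truncate_to_bytes("a" * 100)       # 1 byte per character
cyrillic = truncate_to_bytes("я" * 100)    # 2 bytes per character
words = truncate_to_bytes("привет " * 20)  # Cyrillic words separated by spaces

print(len(latin), len(cyrillic), len(words))
```

With an 80-byte budget, the Latin string keeps 80 characters and the unbroken Cyrillic string keeps 40. The spaced Cyrillic text keeps 41 characters (six words plus five spaces), because the partially cut seventh word is discarded.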

I think Cyrillic is probably more affected by this than many other languages (Latin-script languages mostly use 1-byte characters, while languages like Chinese have more expressive characters and are probably less hurt by the word boundary rules), but I agree that this limit should probably be increased. 80 characters would probably be fine, but the column size was selected before we had a technical pathway forward through the MySQL utf8 / utf8mb4 issue, and we didn't realize that column sizes would eventually also imply a strict byte limit.

There is no real technical reason to keep this limit short, either (e.g., it is not part of a key), except that the migration may be somewhat expensive for installs with large amounts of data.

> Maybe create a parameter in Phabricator that sets the cut-off length for commit messages?

We are very unlikely to add options like this. See T8227 for discussion.

epriestley renamed this task from "Non-latin commit messages cut off too soon :(" to "Increase storage size of "summary" column on DiffusionRepositoryCommit". Aug 9 2016, 8:12 PM

D16385 increases the storage length of the underlying column. Summaries will now be cut to 255 bytes or 80 display glyphs, whichever is shorter.
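The "255 bytes or 80 display glyphs, whichever is shorter" rule can be sketched roughly as follows in Python. This is an illustration rather than the code in D16385: the function name `summarize` is hypothetical, and it assumes one Unicode code point equals one display glyph, which is not true for combining sequences.

```python
def summarize(text: str, max_bytes: int = 255, max_glyphs: int = 80) -> str:
    """Apply both limits and keep whichever result is shorter (hypothetical helper)."""
    # Assumption: one code point is treated as one display glyph.
    clipped = text[:max_glyphs]
    encoded = clipped.encode("utf-8")
    if len(encoded) > max_bytes:
        # The byte limit is stricter: slice bytes and discard any
        # incomplete multibyte character at the end.
        clipped = encoded[:max_bytes].decode("utf-8", errors="ignore")
    return clipped


print(len(summarize("a" * 300)))   # 1-byte characters: glyph limit applies
print(len(summarize("я" * 300)))   # 2-byte characters: glyph limit applies
print(len(summarize("😀" * 300)))  # 4-byte characters: byte limit applies
```

Under these assumptions, 1-, 2-, and 3-byte characters all reach the 80-glyph limit first (at most 240 bytes), so Cyrillic summaries are no longer shorter than Latin ones. Only 4-byte characters hit the 255-byte limit, at 63 characters.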

This change is not retroactive, so only new commits will receive longer summaries. You may be able to use bin/repository reparse --message ... to apply this change retroactively if you want.