Page MenuHomePhabricator

Truncate object fields in Herald transcripts
ClosedPublic

Authored by epriestley on Dec 18 2013, 12:35 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 4, 2:45 PM
Unknown Object (File)
Thu, Jan 2, 9:44 PM
Unknown Object (File)
Tue, Dec 31, 12:19 PM
Unknown Object (File)
Sat, Dec 28, 3:19 AM
Unknown Object (File)
Sat, Dec 21, 12:46 AM
Unknown Object (File)
Tue, Dec 17, 4:41 AM
Unknown Object (File)
Sun, Dec 8, 6:42 PM
Unknown Object (File)
Sun, Dec 8, 7:30 AM
Subscribers

Details

Reviewers
btrahan
Commits
Restricted Diffusion Commit
rP181bfffaa126: Truncate object fields in Herald transcripts
Summary

A few users have hit cases where Herald transcripts of large commits exceed the MySQL packet limit, because one of the fields in the transcript is an enomrous textual diff.

There's no value in saving these huge amounts of data. Transcripts are useful for understanding the action of Herald rules, but can be reconstructed later. Instead of saving all of the data, limit each field to 4KB of data.

For strings, we just truncate at 4KB. For arrays, we truncate after 4KB of values.

Test Plan

Ran unit tests. Artificially decreased limit and ran transcripts, saw them truncate properly.

Diff Detail

Branch
htrunc
Lint
No Lint Coverage
Unit
Tests Passed

Event Timeline

And here I thought python unicode support was bad...

Looks good to me, thanks!

In PHP, we get to write our own unicode support!

phutil_utf8_shorten() is just particularly bad for this sort of thing since it shortens to a display length, not a byte length. The resulting string may contain an arbitrarily large number of bytes, e.g. a single "e" followed by a billion combining characters. The function is appropriate when shortening text for inclusion in tables/buttons/UI elements, but not for truncating text.

We should write some phutil_utf8_truncate() which shortens to a byte length or something and has something closer to O(1) runtime (see T3307) but haven't gotten around to it yet, since most of the shortening we do cares about display length.