
Create `phutil_utf8_truncate()` or make `phutil_utf8_shorten()` more sophisticated
Closed, Resolved (Public)

Description

We recently fixed phutil_utf8_shorten() to account for combining characters, so it now does a generally reasonable job of shortening an input to a given number of characters and producing a valid UTF8 output string.

Shortening to a given number of characters is generally what we want, since we most often use this function to shorten titles or summaries and make things fit in a limited display area.

However, sometimes we want to shorten an input string to a given number of bytes. One example is D6118, where we want to reduce an email's length to under a certain size. phutil_utf8_shorten() does not guarantee a maximum byte size: theoretically, an input might consist of a single "x" followed by an arbitrarily large number of combining characters, which counts as one character but can occupy any number of bytes.
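To make the mismatch concrete, here is a minimal sketch that measures one such input in bytes, codepoints, and glyphs. The helper name sketch_measure_utf8() is hypothetical and not part of libphutil; it assumes PCRE with UTF-8 support, where \X matches a base character together with its combining characters.

```php
<?php

// Hypothetical helper for illustration only (not a libphutil function):
// measure the same UTF-8 string in three different units.
function sketch_measure_utf8($string) {
  return array(
    // Raw size in bytes.
    'bytes' => strlen($string),
    // Number of Unicode codepoints ("." with /u matches one codepoint).
    'codepoints' => preg_match_all('/./us', $string),
    // Number of glyphs (\X matches a base character plus combining marks).
    'glyphs' => preg_match_all('/\X/u', $string),
  );
}

// One "x" followed by 1,000 copies of U+0301 COMBINING ACUTE ACCENT,
// which is encoded as the two bytes 0xCC 0x81 in UTF-8.
$input = 'x'.str_repeat("\xCC\x81", 1000);
$sizes = sketch_measure_utf8($input);
```

Under these assumptions the input is a single glyph, yet it holds 1,001 codepoints and 2,001 bytes, so a limit of one character says nothing about the byte size.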

There are a couple of approaches here. We could introduce a second function (phutil_utf8_truncate() or phutil_utf8_shorten_bytes()), or we could add another parameter to phutil_utf8_shorten(), e.g. $byte_length.

The implementation of byte truncation is nearly identical to the existing implementation; we just need to count strlen($char) against the length instead of implicitly counting 1.
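A rough sketch of that idea follows. It is not the libphutil implementation: the function name sketch_truncate_to_bytes() is hypothetical, it splits glyphs with PCRE's \X rather than with libphutil's own helpers, and it omits the terminal ellipsis that a shortening function would normally append.

```php
<?php

// Hypothetical sketch (not libphutil code): keep whole glyphs while the
// running total stays within $max_bytes.
function sketch_truncate_to_bytes($string, $max_bytes) {
  // \X matches one glyph: a base character plus its combining characters.
  // With /u, preg_match_all() returns false if the input is not valid UTF-8.
  if (preg_match_all('/\X/u', $string, $matches) === false) {
    throw new InvalidArgumentException('Input is not valid UTF-8.');
  }

  $result = '';
  $used = 0;
  foreach ($matches[0] as $char) {
    // Count the glyph's byte length instead of implicitly counting 1.
    $cost = strlen($char);
    if ($used + $cost > $max_bytes) {
      break;
    }
    $result .= $char;
    $used += $cost;
  }

  return $result;
}
```

Because the loop only ever appends complete glyphs, the output stays valid UTF-8 and never separates a base character from its combining characters. The trade-off is that the result may be shorter than the byte budget allows.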

Event Timeline

epriestley added a subscriber: epriestley.

See https://github.com/phacility/phabricator/issues/632 for a specific issue: we're truncating commit summaries at 80 display characters, but this may be an arbitrary number of codepoints and not fit into a varchar(80) column.

  • Find phutil_utf8_shorten() callsites in Phabricator and Arcanist.
  • Replace them with PhutilUTF8StringTruncator. Use:
    • Byte limits for things like email.
    • Codepoint limits for things like database columns.
    • Glyph limits (current behavior) for display strings.
    • Multiple limits if there are weird special cases?
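As an illustration of how byte, codepoint, and glyph limits could be combined, here is a minimal sketch. It does not show the PhutilUTF8StringTruncator API: the function sketch_truncate_with_limits() and its 'bytes', 'codepoints', and 'glyphs' keys are hypothetical names invented for this example, and the sketch appends no ellipsis.

```php
<?php

// Hypothetical sketch (not the PhutilUTF8StringTruncator API): truncate at
// glyph boundaries so that every limit given in $limits is respected.
function sketch_truncate_with_limits($string, array $limits) {
  // Any limit that is not specified is treated as unlimited.
  $limits += array(
    'bytes' => PHP_INT_MAX,
    'codepoints' => PHP_INT_MAX,
    'glyphs' => PHP_INT_MAX,
  );

  // \X matches one glyph: a base character plus its combining characters.
  if (preg_match_all('/\X/u', $string, $matches) === false) {
    throw new InvalidArgumentException('Input is not valid UTF-8.');
  }

  $used = array('bytes' => 0, 'codepoints' => 0, 'glyphs' => 0);
  $result = '';
  foreach ($matches[0] as $glyph) {
    // Cost of this glyph in each unit.
    $cost = array(
      'bytes' => strlen($glyph),
      'codepoints' => preg_match_all('/./us', $glyph),
      'glyphs' => 1,
    );

    // Stop before the first glyph that would exceed any limit.
    foreach ($cost as $unit => $amount) {
      if ($used[$unit] + $amount > $limits[$unit]) {
        return $result;
      }
    }

    foreach ($cost as $unit => $amount) {
      $used[$unit] += $amount;
    }
    $result .= $glyph;
  }

  return $result;
}
```

For example, a commit summary could be passed with array('glyphs' => 80, 'codepoints' => 80) so that it fits both the display width and a varchar(80) column, while an email body would use only a 'bytes' limit.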
chad changed the visibility from "All Users" to "Public (No Login Required)". Jun 24 2016, 1:37 AM