Page MenuHomePhabricator

Added initial class for displaying invisible chars

Authored by jcox on Sep 12 2016, 1:04 PM.
Referenced Files
Unknown Object (File)
Sat, Nov 26, 9:38 PM
Unknown Object (File)
Mon, Nov 21, 6:22 AM
Unknown Object (File)
Fri, Nov 11, 3:32 AM
Unknown Object (File)
Thu, Nov 10, 8:11 PM
Unknown Object (File)
Wed, Nov 9, 4:43 PM
Unknown Object (File)
Nov 9 2022, 2:15 PM
Unknown Object (File)
Nov 7 2022, 6:36 AM
Unknown Object (File)
Nov 5 2022, 4:51 AM
"100" token, awarded by epriestley.



Fixes T11586. First pass at a class for displaying invisible characters. Still need to:

  • Write a couple unit tests
  • Add some styling to the .invisible-special spans
  • Actually start using the class when displaying form errors to users

Currently this makes the string "\nab\x00c\x01d\te\nf" look like:

pasted_file (100×573 px, 7 KB)

Test Plan

Unit tests all pass and run in <1ms:

pasted_file (150×542 px, 49 KB)

Diff Detail

rP Phabricator
Lint Not Applicable
Tests Not Applicable

Event Timeline

jcox retitled this revision from to Added initial class for displaying invisible chars.
jcox updated this object.
jcox edited the test plan for this revision. (Show Details)
jcox edited edge metadata.

I think some of these might cause more confusion if we convert them to the actual name vs. just rendering the codepoint representation (ie. \x11 should probably just be displayed as \x11 rather than <DEVICE CONTROL 1>).

That could be just because I'm more familiar with some characters than others. Consistency might be the best bet here. Any thoughts on which ones I should remove/keep?


I initially had this as a const but apparently you can't have const arrays until PHP 5.6. I currently have 5.5.9 installed. Is this the right way to do const arrays in phab's codebase or should I use the 5.6 way?

jcox planned changes to this revision.Sep 12 2016, 1:17 PM

Yeah, I agree on only spelling out the commonly-used ones. It's more clear while debugging to see something like <0x14> (which maybe you can guess at the origin of based on the value -- a bad string length or constant getting added to the string incorrectly or something) than "DEVICE CONTROL 4" which no one uses regularly.

I'd probably be fairly conservative: spell out NULL and stuff that has actual keys on a keyboard, and just leave most of the other stuff as hex escapes?


This is fine/correct. We support PHP way back to the stone ages so you can never use anything cool (well, cool for PHP, like "const"). Sorry.


This should probably be phutil_utf8v() ("turn a byte-string of UTF8 into a list/vector of UTF8 characters") to handle UTF8 correctly.

I'm also not really sure what we should have this method do for weird UTF8. There are a lot of bizarre things that users can type that won't be obvious when displayed -- T6003 has a couple of examples like all this garbage:


Probably fine to largely leave it as-is for now and we can deal with it once it becomes a real problem.


By convention, prefer $ii over $i -- it responds slightly better to "find / find next" in non-syntax-aware editors, but also brings fortune and good luck.

Added unit tests, pared down the named list, and added plaintext option

epriestley added a reviewer: epriestley.

This looks reasonable to me as written. Here's an alternate way to structure it which might be worth considering at some point:

  • Move the actual logic to a shared getStringParts() method, which returns a list of parts:
    'special' => false,
    'value' => 'a',
    'special' => true,
    'value' => '<0x01>',
  • Have both renderPlainText() and renderHTMLArray() call that method, then just decorate the results appropriately (e.g., wrap the "special" parts in a span so they get highlighted).

The advantages I see in this approach are:

  • If we do special stuff with UTF8 in the future, it only needs to go in one place. Right now, it would need to go in both renderPlaintextArray() and renderHTMLArray().
  • Because almost all the code would be shared, it would no longer be possible for HTML/text to have significantly different behavior, and the HTML tests could be stripped down (e.g., to one simple test to make sure the decoration works) or even eliminated completely. This is nice because the HTML tests are a big pain to write and relatively fragile (e.g., they'll break if we change CSS, even though they probably should not).

Consider making this a separate property with accessors (setPlainText(true)). This avoids new Thing($object, true, true, true, false, 3, -1, null); if we add more stuff later -- or, realistically, avoids needing to find/replace every constructor once we get to 2-3 parameters so we don't ever end up in 7-parameter territory.

(Or, make renderAsPlainText() a public method, and keep render() as the way to get HTML.)

This revision is now accepted and ready to land.Sep 12 2016, 4:39 PM
jcox marked 6 inline comments as done.
jcox edited edge metadata.

Shared some code between the 2 modes, simplified unit tests
Remove anon function because old PHP

Added some really basic CSS for invis chars

I'm also not sure where to actually start using this. I haven't been able to find any obvious error messages that echo back user input. I could land this independently, but presumably it should be used if it's actually going to fix the issue that was reported.


I pulled this from the remarkup styling. The resulting blocks look like this:

pasted_file (77×456 px, 5 KB)


  • AuthRegisterController.php near line 91:
if (!PhabricatorUserEmail::isAllowedAddress($default_email)) {
  return $this->renderError(
        'The account you are attempting to register with has an invalid '.
        'email address (%s). This Phabricator install only allows '.
        'registration with specific email addresses:',
  • Maybe make describeValidUsername() take a parameter (the username the user tried to use), then have it use this to show "The username X is not valid: ..."

Less-valuable, maybe we just fix these up as we get there:

  • Phurl aliases.
  • Project hashtags.
  • AlmanacNames has some of these.
  • Repository callsigns.
  • Repository short names.
  • (Phame and Phriction have slug-like names but I don't think there are any possible user-facing errors.)
  • Phame blog "Parent Site URI".
  • OAuth server "Redirect URI".
jcox edited edge metadata.

Added invisible character formatting to auth controller and phurl aliases

I'm going to land this and create a separate ticket to fix up the other places it can be used.

This revision was automatically updated to reflect the committed changes.