Display some invisible/nonprintable characters in diffs by default
Currently, we display various nonprintable characters (ZWS, nonbreaking space, various control characters) as themselves, so they're generally invisible.
In T12822, one user reports that all their engineers frequently type ZWS characters into source somehow? I don't really believe this (??), and this should be fixed in lint.
That said, the only real reason not to show these weird characters in a special way was that it would break copy/paste: if we render ZWS as "🐑", and a user copy-pastes the line including the ZWS, they'll get a sheep.
In particular, we can render any character X as <span data-copy-text="Y">X</span>, and then copy "Y" instead of "X" when the user copies the node. Limitations:
- If users select only "X", they'll get "X" on their clipboard. This seems fine. If you're selecting our ZWS marker *only*, you probably want to copy it?
- If "X" is more than one character long, users will get the full "Y" if they select any part of "X". At least here, this only matters when "X" is several spaces and "Y" is a tab. This also seems fine.
- We have to be kind of careful because this approach involves editing an HTML blob directly. However, we already do that elsewhere and this isn't really too hard to get right.
With those tools in hand:
- Replace "\t" (raw text / what gets copied) with the number of spaces to the next tab stop for display.
- Replace ZWS and NBSP (raw text) with a special marker for display.
- Replace control characters 0x00-0x19 and 0x7F, except for "\t", "\r", and "\n", with the special unicode "control character pictures" reserved for this purpose.
- Generated and viewed a file like this one:
- Copied text out of it, got authentic raw original source text instead of displayed text.
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D20194