We currently use Zero Width Space characters to "cleverly" make side-by-side diffs somewhat-copyable (we want to copy only one side of the text, and do not want to copy line numbers). This implementation has some rough edges:
- On the face of it, it's obviously a giant hack.
- When ZWS characters appear in actual text, they break things (see below).
- You can only copy the right-hand side of a diff (see PHI974 and PHI504 for users wanting to copy the left side of a diff).
- We copy the displayed text. We'd like to always copy the exact original text, but we'd sometimes like to display different text. We currently have to display the original text to make copy behavior authentic. (See PHI973 for \r.)
We can fix all of these with a great JavaScript ritual:
- When the user starts a selection, tag either the left-hand or right-hand side of the diff for visual selection based on where they clicked.
- When the user copies, walk the whole DOM and build the text manually.
- Allow nodes to have alternate copy-text which is used instead of the display text when the two differ.
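A rough sketch of the ritual above, to make the three steps concrete. The attribute names (data-copy-side, data-copy-text, data-copy-skip) and the el/text helpers are hypothetical and exist only for illustration; none of this is an existing Phabricator API. The helpers stand in for DOM nodes so the sketch runs outside a browser.

```javascript
// Hypothetical attribute names, chosen for this sketch only.
const SIDE_ATTR = 'data-copy-side'; // "left" or "right" on diff cells
const TEXT_ATTR = 'data-copy-text'; // exact original text when the display differs
const SKIP_ATTR = 'data-copy-skip'; // line numbers and other chrome

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Step 1: find which side a click landed on by walking up to the nearest tagged cell.
function sideForNode(node) {
  for (let cur = node; cur; cur = cur.parentNode) {
    if (cur.nodeType === ELEMENT_NODE && cur.getAttribute(SIDE_ATTR) !== null) {
      return cur.getAttribute(SIDE_ATTR);
    }
  }
  return null;
}

// Steps 2 and 3: walk the tree and build the clipboard text for one side,
// preferring a node's alternate copy-text over its displayed text.
function buildCopyText(node, side) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  if (node.nodeType !== ELEMENT_NODE || node.getAttribute(SKIP_ATTR) !== null) {
    return '';
  }
  const nodeSide = node.getAttribute(SIDE_ATTR);
  if (nodeSide !== null && nodeSide !== side) {
    return '';
  }
  const alternate = node.getAttribute(TEXT_ATTR);
  if (alternate !== null) {
    return alternate;
  }
  let result = '';
  for (const child of node.childNodes) {
    result += buildCopyText(child, side);
  }
  return result;
}

// Tiny stand-ins for DOM nodes (hypothetical helpers, not a real DOM).
function el(attrs, ...children) {
  const node = {
    nodeType: ELEMENT_NODE,
    childNodes: children,
    parentNode: null,
    getAttribute: (name) =>
      Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : null,
  };
  for (const child of children) {
    child.parentNode = node;
  }
  return node;
}

function text(value) {
  return { nodeType: TEXT_NODE, nodeValue: value, parentNode: null };
}

// One diff row: line number, left cell, line number, right cell.
// The right cell displays a visible marker but copies the real "\r\n".
const newText = text('echo "new"');
const row = el({},
  el({ 'data-copy-skip': '' }, text('12')),
  el({ 'data-copy-side': 'left' }, text('echo "old"\n')),
  el({ 'data-copy-skip': '' }, text('12')),
  el({ 'data-copy-side': 'right' },
    newText,
    el({ 'data-copy-text': '\r\n' }, text('\u240D\n'))));

const clickedSide = sideForNode(newText);
console.log(JSON.stringify(buildCopyText(row, clickedSide)));
```

In a browser, the same walk would presumably run inside a copy event handler and write its result to the clipboard. That wiring, and clipping the walk to the actual selection range, are left out here.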
I believe the ZWS implementation is an outgrowth of a previous, very ancient implementation that was motivated partly by concerns that node.textContent was wildly unreliable prior to 2011. Modern (non-IE) browsers seem less crazy about this, and ReviewBoard has a node.textContent-based JavaScript ritual, which suggests this probably isn't too awful nowadays.
Previously
Zero-width characters, e.g. <U+200B>, are usually accidental and unintended. They are also insidious in that they are hard to detect during code review, or later on once they've caused issues in production. In particular, Differential will highlight the affected line, but there's no individual character to highlight in dark green.
Differential should make this kind of change painfully obvious so that it can't slip through by accident. A good example is the output of git show b4409d8c95885b22277d30d031f405d7bfbc0618 (this is a valid commit in Phacility):
-echo "Test complete"
+<U+200B><U+200B>echo "Test complete"
This has caught out other users, for example:
https://phabricator.haskell.org/D3344?id=11749#96150
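One way to get output like the git show example is to swap invisible characters for a visible token before rendering a changed line. The sketch below is only an illustration: revealInvisible is a hypothetical helper, and the character set in INVISIBLE is an assumption, not a complete list.

```javascript
// Illustrative set of zero-width characters; an assumption, not exhaustive.
const INVISIBLE = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Replace each invisible character with a visible <U+XXXX> token.
function revealInvisible(line) {
  return line.replace(INVISIBLE, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
    return `<U+${hex}>`;
  });
}

console.log(revealInvisible('\u200B\u200Becho "Test complete"'));
```

Each token is then an individual thing the diff view could highlight in dark green.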
As an addition to this (possibly a separate feature), add a simple character-match check to the text linter. This could also be used to check for tabs or other undesirable characters. As a workaround, I configured a Herald rule. This works, but the check should really be done by the arc linter.
Can a lint check be easily added without the complexity of using the script-and-regex lint tool?
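To show how small the check itself is, here is a sketch of the matching logic. This is not Arcanist code and does not use any Arcanist API. The FORBIDDEN rule table and the findForbiddenCharacters name are made up for this example, and a real arc linter would be written in PHP.

```javascript
// Hypothetical rule table: a name for the message and a global regex to match.
const FORBIDDEN = [
  { name: 'zero-width space', pattern: /\u200B/g },
  { name: 'tab', pattern: /\t/g },
];

// Return one finding per forbidden character, with 1-based line and column.
function findForbiddenCharacters(source, rules = FORBIDDEN) {
  const findings = [];
  source.split('\n').forEach((line, index) => {
    for (const rule of rules) {
      for (const match of line.matchAll(rule.pattern)) {
        findings.push({
          line: index + 1,
          column: match.index + 1,
          name: rule.name,
        });
      }
    }
  });
  return findings;
}

console.log(findForbiddenCharacters('ok\n\u200Becho\tdone'));
```

Adding a rule for another undesirable character is one more entry in the table.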
Another reference to zero-width spaces:
https://secure.phabricator.com/D8727