
[Wilds] Sanitize UTF8 output in `tsprintf(...)` under Windows
Closed, Public

Authored by epriestley on Oct 2 2018, 5:57 PM.
Tags
None
Subscribers
None

Details

Summary

Ref T13209. In PHP, when you `echo` or `print` a string containing certain invalid UTF-8 sequences to the cmd.exe terminal under Windows 10, the entire string just vanishes into the ether.

I ran into this because `arc unit` was reporting "1 failing test" but not actually printing the test failure. That's because the failing test was the surrogate filtering test, and the failure message contained a reserved UTF-16 surrogate sequence ("Expected: <filtered result>; Actual: <unfiltered result>"). See D19724.

To limit the damage this can cause, explicitly `phutil_utf8ize(...)` the output under Windows. I'm not doing this unconditionally because, when we don't need to, I think it's slightly better not to: occasionally, the raw input might be useful in debugging or understanding something.
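The conditional sanitization described above can be sketched roughly as follows. This is a Python illustration of the idea, not the Arcanist PHP change itself: `utf8ize` is a hypothetical stand-in for `phutil_utf8ize(...)` (whose exact replacement behavior may differ), and `sanitize_for_terminal` and its `is_windows` parameter are names invented for this example.

```python
import os
from typing import Optional


def utf8ize(data: bytes) -> bytes:
    """Hypothetical stand-in for phutil_utf8ize(...).

    Decodes leniently, substituting U+FFFD for any bytes that are not
    valid UTF-8 (including encoded UTF-16 surrogates), then re-encodes.
    """
    return data.decode("utf-8", errors="replace").encode("utf-8")


def sanitize_for_terminal(data: bytes, is_windows: Optional[bool] = None) -> bytes:
    """Sanitize output only on Windows; pass raw bytes through elsewhere.

    The raw input is kept on other platforms because it can occasionally
    be useful when debugging.
    """
    if is_windows is None:
        # Assumption: os.name == "nt" is a good enough platform check here.
        is_windows = os.name == "nt"
    return utf8ize(data) if is_windows else data


# b"\xed\xa0\x80" is the three-byte encoding of the surrogate U+D800,
# which a strict UTF-8 decoder rejects.
message = b"Actual: \xed\xa0\x80"
windows_output = sanitize_for_terminal(message, is_windows=True)
other_output = sanitize_for_terminal(message, is_windows=False)
```

Passing `is_windows` explicitly makes both branches easy to exercise on any platform; leaving it as `None` falls back to detecting the current OS.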

Test Plan
  • Wrote a script which ran `echo tsprintf("%s", "<invalid surrogate sequence>");`.
  • On Windows 10 in cmd.exe, saw it print something instead of printing nothing.

Diff Detail

Repository
rARC Arcanist
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed. (Oct 2 2018, 5:57 PM)
Harbormaster failed remote builds in B20976: Diff 47134!
This revision is now accepted and ready to land. (Oct 2 2018, 7:18 PM)
This revision was automatically updated to reflect the committed changes.