
[Wilds] Sanitize UTF8 output in `tsprintf(...)` under Windows
Closed, Public

Authored by epriestley on Oct 2 2018, 5:57 PM.

Details

Summary

Ref T13209. In PHP, when you echo or print certain invalid sequences to the cmd.exe terminal under Windows 10, the entire string just vanishes into the ether.

I ran into this because arc unit was reporting "1 failing test" but not actually printing a test failure. That's because the failing test was the surrogate filtering test, and the test failure contained a reserved UTF16 surrogate sequence ("Expected: <filtered result>; Actual: <unfiltered result>"). See D19724.

To limit the damage this can cause, explicitly run the output through phutil_utf8ize(...) under Windows. I'm not doing this unconditionally because, when sanitizing isn't needed, I think it's slightly better to leave the output alone: occasionally, the raw input is useful for debugging or understanding something.
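
A minimal sketch of the "sanitize only under Windows" idea described above. This is not the code from the diff: the `sketch_*` functions are hypothetical stand-ins for illustration, `mb_scrub()` (PHP 7.2+, mbstring extension) is swapped in for `phutil_utf8ize(...)`, and the platform check is an assumption about how one might detect Windows.

```php
<?php

// Hypothetical stand-in for phutil_utf8ize(...); NOT the libphutil code.
// mb_scrub() replaces ill-formed byte sequences with a substitute character.
function sketch_utf8ize(string $input): string {
  return mb_scrub($input, 'UTF-8');
}

// Hypothetical platform check (assumption): Windows uses "\" as the
// directory separator.
function sketch_is_windows(): bool {
  return DIRECTORY_SEPARATOR === '\\';
}

// Sanitize only when running under Windows; elsewhere, return the raw
// bytes untouched so they stay available for debugging.
// $is_windows can be passed explicitly to exercise both paths.
function sketch_prepare_output(string $text, ?bool $is_windows = null): string {
  if ($is_windows === null) {
    $is_windows = sketch_is_windows();
  }

  if ($is_windows) {
    return sketch_utf8ize($text);
  }

  return $text;
}
```

Passing the platform flag as a parameter is only a convenience for trying both branches on one machine; the revision itself simply applies the sanitizer when running under Windows.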

Test Plan
  • Wrote a script which ran echo tsprintf("%s", "<invalid surrogate sequence>");
  • On Windows 10 in cmd.exe, saw it print something instead of printing nothing.
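
The exact byte sequence used in the test plan is not given. As a hedged illustration, the sketch below builds one possible input of that kind: the three bytes that result from encoding the lone surrogate U+D800 as if it were an ordinary code point. Both function names are hypothetical, and the call to `tsprintf(...)` is omitted because it requires libphutil.

```php
<?php

// Hypothetical helper: the bytes ED A0 80 are what you get by pushing the
// reserved surrogate U+D800 through the UTF-8 encoding scheme. Well-formed
// UTF-8 never contains this sequence.
function sketch_lone_surrogate_bytes(): string {
  return "\xED\xA0\x80";
}

// Hypothetical validity check: with the /u modifier, preg_match() returns
// false when the subject is not valid UTF-8, and 1 when the empty pattern
// matches a valid subject.
function sketch_is_valid_utf8(string $text): bool {
  return preg_match('//u', $text) === 1;
}

$payload = 'Expected: x; Actual: '.sketch_lone_surrogate_bytes();

// The author's script echoed a string like $payload via tsprintf("%s", ...);
// here we only confirm that the payload is ill-formed.
var_dump(sketch_is_valid_utf8($payload));
```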

Diff Detail

Repository
rARC Arcanist
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley created this revision. (Oct 2 2018, 5:57 PM)
Harbormaster returned this revision to the author for changes because remote builds failed. (Oct 2 2018, 5:57 PM)
Harbormaster failed remote builds in B20976: Diff 47134!
epriestley requested review of this revision. (Oct 2 2018, 6:04 PM)
amckinley accepted this revision. (Oct 2 2018, 7:18 PM)
This revision is now accepted and ready to land. (Oct 2 2018, 7:18 PM)
This revision was automatically updated to reflect the committed changes.