At the time of writing, arc does not escape all inputs to command line functions correctly on Windows. That is, for a snippet like this:
list($stdout) = execx('echo -n %s', $input);
...there is a set of possible values for $input where the code will execute without errors but $stdout and $input will have different values at the end of execution. Some examples of such input are:
%APPDATA%
\0
\n
As an aside, this is also true on Unix-like systems (the code can execute without errors but the input and output may differ), because there does not appear to be any way to make echo print the literal string -n, at least on OS X:
$ echo -n
$ echo -n -n
$ echo -n -- -n
-- -n
For the purposes of this discussion, assume echo is a platonic ideal of echo which can truly echo any string.
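The property being discussed can be stated as a round-trip check. Here is a minimal sketch in Python, assuming a small helper script stands in for the idealized echo. The names ECHO_HELPER and round_trips are hypothetical and are not part of arc. The sketch passes the argument as a list, without a shell, so it only shows what "input equals output" means; it does not exercise any shell escaping.

```python
import subprocess
import sys

# Stand-in for the idealized echo: writes argv[1] back byte-for-byte.
# (Hypothetical helper for this sketch; it is not part of arc.)
ECHO_HELPER = "import sys; sys.stdout.buffer.write(sys.argv[1].encode('utf-8'))"


def round_trips(value):
    """Return True if `value` comes back unchanged from the child process."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_HELPER, value],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout == value.encode("utf-8")


if __name__ == "__main__":
    for sample in ["%APPDATA%", "-n", "a\nb"]:
        print(repr(sample), round_trips(sample))
```

A correct escaping function would make the same check pass when the command is built as a string and run through each target shell.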
For prior discussion, see T8298 and connected tasks. This discussion is extensive and meandering, but roughly summarizes as:
- no one really has any idea how to escape things on Windows; and
- it appears to be impossible to escape a large set of inputs on Windows.
Environments
Part of the reason this is difficult is that Windows is not a single shell environment, but at least three: cmd.exe, MSYS, and Git Bash. (We may, today or in the future, also need to deal with PowerShell.) The three shells have different behavior, with cmd.exe generally having the most absurd behavior.
I currently believe that cmd.exe, at least, cannot escape all inputs, and that the behavior of csprintf() under cmd.exe must sometimes be to throw an exception saying "this input cannot be escaped, use a different input (or use Git Bash / some other shell / some obscure workaround)". This is wholly absurd, but at least an improvement over the current behavior.
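The "throw instead of silently mangling" behavior could look roughly like the sketch below. It is written in Python for illustration only; csprintf() itself is PHP, and nothing here reflects its actual implementation. The names UnescapableInputError and check_escapable_for_cmd are hypothetical, and the set of rejected characters is only the example inputs listed earlier, not a complete list for cmd.exe.

```python
class UnescapableInputError(ValueError):
    """Raised when an argument cannot be represented safely for the shell."""


# Assumption: this set is taken from the example inputs earlier in this
# document; it is illustrative, not a complete list for cmd.exe.
UNESCAPABLE_IN_CMD = {"%": "percent sign", "\0": "NUL byte", "\n": "newline"}


def check_escapable_for_cmd(arg):
    """Return `arg` unchanged, or raise if it contains a known-bad character."""
    for char, name in UNESCAPABLE_IN_CMD.items():
        if char in arg:
            raise UnescapableInputError(
                "argument %r contains a %s, which cannot be escaped for "
                "cmd.exe; use a different input or a different shell"
                % (arg, name)
            )
    return arg
```

The point of the design is that a caller gets a loud failure for inputs like %APPDATA% rather than a command that runs with a different argument than the one it was given.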
Executables vs Arguments
Windows escaping behavior in at least some shells appears to be different for the first argument (the binary/executable) than for other arguments. This may require separate escaping behavior.
For example, on Linux, ls, 'ls', and "ls" all do the same thing. In cmd.exe, echo and "echo" have different behavior: "echo" does not work. However, where, where.exe, "where.exe" and "where" all work.
Bypass Shell
Under Windows, proc_open() in PHP has a bypass_shell mode. It is broadly unclear what this option actually does or how it affects command escaping.
PHP
In PHP, command line escaping is handled by escapeshellarg(). This function is completely broken on Windows and makes no effort to escape inputs correctly. Its documentation explicitly describes producing incorrect output without raising an error:
On Windows, escapeshellarg() instead replaces percent signs, exclamation marks (delayed variable substitution) and double quotes with spaces and adds double quotes around the string.
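To make the consequence concrete, here is a small Python model of the behavior described in that quote. It is a sketch of the documented description only, not PHP's actual implementation, which may differ in details; the function name is hypothetical.

```python
def php_windows_escapeshellarg_model(arg):
    """Model the documented Windows behavior: lossy replacement, then quoting."""
    replaced = arg
    # Percent signs, exclamation marks, and double quotes become spaces.
    for char in ("%", "!", '"'):
        replaced = replaced.replace(char, " ")
    # The result is wrapped in double quotes.
    return '"' + replaced + '"'


if __name__ == "__main__":
    for sample in ["%APPDATA%", "!APPDATA!", 'say "hi"']:
        print(repr(sample), "->", php_windows_escapeshellarg_model(sample))
```

Under this model, distinct inputs such as %APPDATA% and !APPDATA! collapse to the same output, so the original value cannot be recovered by the receiving program.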
Python
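Python's subprocess module (in Python 3) has run(...) and list2cmdline(...), which seem generally more promising as models for escaping behavior. On Windows, run(...) ends up calling list2cmdline(...) to build the command line when it is given a list of arguments. run(...) was added in Python 3.5; under Python 2 the closest equivalent is call(...).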
By default, run(...) appears to use a bypass_shell-like mode and successfully escape most inputs. You can pass shell=True to get a shell mode where everything is broken, as one might expect.
For most of these experiments, I've written a version of echo as <?php echo $argv[1]; and am executing it as php -f echo.php -- .... When I say phpecho below I mean this construction, as distinct from the cmd.exe builtin echo.
When run with shell=True, Python does not escape %, so phpecho %APPDATA% prints the environment variable. phpecho a\nb prints only "a". To its credit, Python is at least somewhat better than PHP here and throws when passed a NUL byte, although this behavior (on OS X, under Python 2.7, with actual echo) is confusing to me:
>>> subprocess.call(["echo", "\0"]);
...
TypeError: execv() arg 2 must contain only strings
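For comparison with the Python 2.7 behavior above, Python 3 also refuses a NUL byte in an argument, and as far as I can tell it does so with a ValueError. A minimal sketch that checks this follows; the function name is hypothetical, and it uses the running interpreter in place of echo so that it does not depend on the platform's echo binary.

```python
import subprocess
import sys


def rejects_nul_byte():
    """Return True if subprocess refuses an argument containing a NUL byte."""
    try:
        # The child program is irrelevant; the argument should be rejected
        # before any process is started.
        subprocess.run([sys.executable, "-c", "pass", "\0"], check=True)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(rejects_nul_byte())
```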
With shell=False, arguments with spaces are quoted, with internal quotes and backslashes escaped with backslashes. From cursory examination, no other characters appear to receive special treatment.
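The quoting described here can be inspected directly by calling subprocess.list2cmdline(...), which is importable on every platform even though it is only used for Windows command lines. A short sketch follows; windows_quote is a hypothetical wrapper name, and as far as I can tell list2cmdline(...) is not a documented public function, so treat this as exploratory.

```python
import subprocess


def windows_quote(arg):
    """Quote one argument the way subprocess does for Windows command lines."""
    return subprocess.list2cmdline([arg])


if __name__ == "__main__":
    for sample in ["plain", "two words", 'say "hi"', "%APPDATA%"]:
        print(repr(sample), "->", windows_quote(sample))
```

Note that %APPDATA% passes through unchanged, which is consistent with the observation that no characters other than whitespace, quotes, and backslashes receive special treatment.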
The Python 3 documentation claims:
When using shell=True, the shlex.quote() function can be used to properly escape whitespace and shell metacharacters in strings that are going to be used to construct shell commands.
I think this is not correct. In particular, shlex.quote(...) does not quote %APPDATA% and this:
>>> subprocess.run([..., shlex.quote('%APPDATA%')], shell=True)
...prints the value of the variable, not the literal %APPDATA%. So this is perhaps a point to PHP, as being completely broken, while bad, is still better than documenting something unsafe as safe.
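The shlex.quote(...) half of this claim can be checked without a Windows machine, since the function is pure string manipulation. A minimal sketch follows; the helper name quoted_samples is hypothetical.

```python
import shlex


def quoted_samples(samples):
    """Map each sample string to the result of shlex.quote() on it."""
    return {sample: shlex.quote(sample) for sample in samples}


if __name__ == "__main__":
    for original, quoted in quoted_samples(["%APPDATA%", "a b"]).items():
        print(repr(original), "->", quoted)
```

%APPDATA% comes back unchanged because shlex.quote(...) treats % as a safe character, and strings that do get quoted are wrapped in single quotes, which follow POSIX shell rules rather than anything cmd.exe understands.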