At the time of writing, `arc` does not escape all inputs to command line functions correctly on Windows. That is, for a snippet like this:
```php
list($stdout) = execx('echo -n %s', $input);
```
...there is a set of possible values for `$input` for which the code executes without errors, but `$stdout` and `$input` end up with different values. Some examples of such input are:
As an aside, this is also true on Unix-like systems (the code can execute without errors while the input and output differ), because there does not appear to be any way to make `echo` print the string `-n`, at least on OS X:
```
$ echo -n
$ echo -n -n
$ echo -n -- -n
```
For the purposes of this discussion, assume `echo` is a platonic ideal of `echo` which can truly echo any string.
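As a concrete stand-in for this platonic `echo`, here is a minimal Python sketch, assuming a Unix-like host and using the current Python interpreter as the echo program. It passes each candidate input as a single argument with no shell involved and checks that the bytes printed are exactly the bytes passed in. `platonic_echo` and `round_trips` are hypothetical helper names for this example, not part of `arc`.

```python
import os
import subprocess
import sys

# Child program: write argv[1] back out byte-for-byte, with no trailing newline
# and no option parsing, so inputs like "-n" are echoed literally.
ECHO_SOURCE = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"


def platonic_echo(value: str) -> bytes:
    """Run the echo child with `value` as its only argument (no shell)."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_SOURCE, value],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def round_trips(value: str) -> bool:
    """True if the echoed bytes exactly match the bytes that were passed in."""
    return platonic_echo(value) == os.fsencode(value)


if __name__ == "__main__":
    samples = ["-n", "%APPDATA%", "a\nb", '"quoted"', "it's", "$HOME", "&|<>^", ""]
    for sample in samples:
        print(repr(sample), round_trips(sample))
```

Because no shell sits between the caller and the child here, every sample should round-trip; the failures this task is about appear once a shell such as `cmd.exe` is placed in between.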
For prior discussion, see T8298 and connected tasks. That discussion is extensive and meandering, but can be roughly summarized as:
- no one really has any idea how to escape things on Windows; and
- it appears to be impossible to escape a large set of inputs on Windows.
Part of the reason this is difficult is that Windows is not a single shell environment, but at least three: `cmd.exe`, MSYS, and Git Bash. (We may, today or in the future, also need to deal with PowerShell.) The three shells behave differently, with `cmd.exe` generally having the most absurd behavior.
I currently believe that `cmd.exe`, at least, cannot escape all inputs, and that `csprintf()` under `cmd.exe` must sometimes throw an exception saying "this input cannot be escaped; use a different input (or use Git Bash / some other shell / some obscure workaround)". This is wholly absurd, but at least an improvement over the current behavior.
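A minimal sketch of that policy, written in Python for illustration: quote an argument for `cmd.exe` only when it avoids every character we do not know how to make literal, and raise otherwise. The forbidden set (NUL, CR, LF, `%`, `!`, and `"`) and the assumption that wrapping in double quotes neutralizes `&`, `|`, `<`, `>`, `^`, `(` and `)` are unverified assumptions drawn from the behavior described in this task, not a specification. `quote_for_cmd` and `UnescapableArgumentError` are hypothetical names, not part of `arc` or `csprintf()`.

```python
class UnescapableArgumentError(ValueError):
    """Raised when an argument cannot (as far as we know) pass through cmd.exe intact."""


# ASSUMPTION: characters with no known reliable escape on a cmd.exe command line.
# NUL cannot appear in a command line at all; CR/LF end the command; "%" and "!"
# trigger (delayed) variable expansion even inside quotes; '"' toggles cmd.exe's
# own quoting state, so we refuse it rather than guess.
FORBIDDEN = {"\0", "\r", "\n", "%", "!", '"'}


def quote_for_cmd(arg: str) -> str:
    """Wrap `arg` in double quotes for cmd.exe, or raise if it looks unescapable.

    ASSUMPTION: inside double quotes, cmd.exe treats & | < > ^ ( ) literally,
    and the target program splits arguments using the Microsoft C runtime rules.
    """
    bad = sorted({c for c in arg if c in FORBIDDEN})
    if bad:
        raise UnescapableArgumentError(
            f"this input cannot be escaped for cmd.exe (contains {bad!r}); "
            "use a different input or a different shell"
        )
    # Under the C runtime rules, backslashes are literal unless they precede a
    # double quote. Only trailing backslashes precede our closing quote, so
    # double just those.
    body = arg.rstrip("\\")
    trailing = len(arg) - len(body)
    return '"' + body + "\\" * (2 * trailing) + '"'
```

For example, `quote_for_cmd("a&b")` returns `"a&b"`, while `quote_for_cmd("%APPDATA%")` raises. The design choice is to refuse loudly instead of silently producing a different string.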
**Executables vs Arguments**
Windows escaping behavior in at least some shells appears to be different for the first argument (the binary/executable) than for other arguments. This may require separate escaping behavior.
For example, on Linux, `ls`, `'ls'`, and `"ls"` all do the same thing. In `cmd.exe`, `echo` and `"echo"` behave differently: `"echo"` does not work (presumably because `echo` is a `cmd.exe` builtin, while `where.exe` is a real executable). However, `where`, `where.exe`, `"where.exe"`, and `"where"` all work.
Under Windows, `proc_open()` in PHP has a `bypass_shell` mode. It is broadly unclear what this option actually does or how it affects command escaping.
In PHP, command line escaping is handled by `escapeshellarg()`. This function is completely broken on Windows and makes no effort to escape inputs correctly. It explicitly produces incorrect output without raising an error:
> On Windows, escapeshellarg() instead replaces percent signs, exclamation marks (delayed variable substitution) and double quotes with spaces and adds double quotes around the string.
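To make the information loss concrete, here is a Python model of only the behavior described in that documentation sentence. It is not a port of PHP's C implementation, which may differ in details such as trailing backslashes; `documented_windows_escapeshellarg` is a hypothetical name. Because distinct inputs map to the same output, no consumer can recover the original string.

```python
def documented_windows_escapeshellarg(arg: str) -> str:
    """Model of the documented Windows behavior: %, ! and " become spaces,
    then the result is wrapped in double quotes."""
    for ch in ("%", "!", '"'):
        arg = arg.replace(ch, " ")
    return '"' + arg + '"'


# Four different inputs, one output: the mapping is not injective, so the
# original value cannot be reconstructed from the escaped form.
inputs = ["a b", "a%b", "a!b", 'a"b']
outputs = {documented_windows_escapeshellarg(s) for s in inputs}
print(outputs)  # {'"a b"'}
```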
Python's `subprocess` module (in Python 3) has `run(...)` and `list2cmdline(...)`. These seem generally more promising as models for escaping behavior. On Windows, `run(...)` (via `Popen`) calls `list2cmdline(...)` to turn an argument list into a command line. `run(...)` was added in Python 3.5; under Python 2, the closest equivalent is `call(...)`.
By default, `run(...)` appears to use a `bypass_shell`-like mode and successfully escape most inputs. You can pass `shell=True` to get a shell mode where everything is broken, as one might expect.
For most of these experiments, I've written a version of `echo` as `<?php echo $argv[1];` and am executing it as `php -f echo.php -- ...`. When I say `phpecho` below, I mean this construction, as distinct from the `cmd.exe` builtin `echo`.
When run with `shell=True`, Python does not escape `%`, so `phpecho %APPDATA%` prints the environment variable, and `phpecho a\nb` (with a literal newline) prints only "a". To its credit, Python is at least somewhat better than PHP here and throws when passed a NUL byte, although this behavior (on OS X, under Python 2.7, with the real `echo`) is confusing to me:
```
>>> subprocess.call(["echo", "\0"]);
TypeError: execv() arg 2 must contain only strings
```
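For comparison, recent Python 3 releases reject a NUL byte in an argument list with a `ValueError` rather than the `TypeError` shown above. The sketch below assumes Python 3 on a Unix-like host and uses the current interpreter in place of `echo`; it checks that the call fails instead of running with a truncated argument. Only the exception type is checked, since the message text varies between versions. `rejects_nul_argument` is a hypothetical helper name.

```python
import subprocess
import sys


def rejects_nul_argument() -> bool:
    """Return True if subprocess refuses an argument containing a NUL byte."""
    try:
        # The current interpreter stands in for echo; "-c pass" does nothing.
        subprocess.run([sys.executable, "-c", "pass", "a\0b"], check=True)
    except ValueError:
        return True
    return False


print(rejects_nul_argument())
```

This is the "refuse rather than mangle" behavior argued for above, applied to the one byte that can never appear in an argument.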
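With `shell=False`, arguments containing spaces or tabs (and empty arguments) are wrapped in double quotes. Literal double quotes are escaped with a backslash, and backslashes are doubled only where they precede a double quote (including the closing quote). From cursory examination, no other characters appear to receive special treatment.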
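The `shell=False` behavior described above is what `subprocess.list2cmdline()` implements, and it targets the Microsoft C runtime's argument splitting rather than `cmd.exe`. The sketch below pairs it with a simplified re-implementation of those splitting rules to check the round trip. `parse_msvcrt_cmdline` is a hypothetical helper written for this example: it applies the same rules to every token (the real runtime treats the program name specially) and ignores the `""`-inside-quotes quirk, which `list2cmdline()` never produces. Note that `%APPDATA%` and `a&b` pass through unchanged: that is fine for a program that parses its own command line, but `cmd.exe` would interpret both.

```python
import subprocess
from typing import List


def parse_msvcrt_cmdline(cmdline: str) -> List[str]:
    """Split a command line using (simplified) Microsoft C runtime rules."""
    args: List[str] = []
    buf: List[str] = []
    in_quotes = False
    have_arg = False  # distinguishes an empty quoted argument from no argument
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            # Count a run of backslashes; they only matter before a quote.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # 2n backslashes + quote -> n backslashes, quote toggles quoting.
                # 2n+1 backslashes + quote -> n backslashes and a literal quote.
                buf.append("\\" * (count // 2))
                if count % 2:
                    buf.append('"')
                    j += 1
            else:
                buf.append("\\" * count)
            i = j
            have_arg = True
        elif c == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(buf))
                buf, have_arg = [], False
            i += 1
        else:
            buf.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(buf))
    return args


samples = ["phpecho", "a b", 'c"d', "C:\\dir\\", 'x\\"y', "", "%APPDATA%", "a&b"]
line = subprocess.list2cmdline(samples)
print(line)
assert parse_msvcrt_cmdline(line) == samples
```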
The Python 3 documentation claims:
> When using shell=True, the shlex.quote() function can be used to properly escape whitespace and shell metacharacters in strings that are going to be used to construct shell commands.
I think this is not correct. In particular, `shlex.quote(...)` does not quote `%APPDATA%` and this:
```
>>> subprocess.run([..., shlex.quote('%APPDATA%')], shell=True)
```
...prints the value of the variable, not the literal `%APPDATA%`. So this is perhaps a point to PHP: being completely broken, while bad, is still better than documenting something unsafe as safe.
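The mismatch is visible without running a shell at all: `shlex.quote()` emits POSIX single-quoting, which `cmd.exe` does not recognize as quoting, and it treats `%` as a safe character, so nothing stops `%APPDATA%` from being expanded. The check below uses only the standard library and runs on any platform.

```python
import shlex

# shlex.quote() targets POSIX shells: "%" is in its safe-character set, so the
# string is returned unchanged and cmd.exe is free to expand it.
print(shlex.quote("%APPDATA%"))  # %APPDATA%

# Unsafe strings are wrapped in single quotes. Single quotes mean nothing to
# cmd.exe, so the "&" below would still act as a command separator there.
print(shlex.quote("a b"))  # 'a b'
print(shlex.quote("a&b"))  # 'a&b'
```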