Page MenuHomePhabricator

Fix Windows Escaping
Closed, ResolvedPublic

Description

Command escaping on Windows is a broken mess, we don't know how to fix it, and it's not clear that it's even fixable.

Revisions and Commits

Related Objects

Event Timeline

epriestley raised the priority of this task from to Low.
epriestley updated the task description. (Show Details)
epriestley added a project: Windows.
epriestley added a subscriber: epriestley.

Possibly a bug with PHP's implementation of escapeshellarg for Windows?

I think the proposed patch in that second link is even wrong though I've not bothered to look through full source. According to behaviors "replaces % with a space" it looks like his change would just replace the % with the escape character instead of inserting the ^.

It sounds like it's still not fixed, and according to the last comment from 2012, the escape character should not be used if the argument/command is wrapped in a quote.


Not sure if this is wholly related but through some of the other tasks referenced in this "windows hell" tree I don't think it isever explicitly stated (or I just missed it) but it sounds like there's also an issue with how to escape arguments when run from different environments -- CommandLine, PowerShell, GitBash, Cygwin, etc. Would a potential workaround for this issue be to change running git args into cmd.exe /C git args? That way we're always invoking one comand-line to interpret the arguments, regardless of running from other environments?

oops-cthulhu.jpg (99×81 px, 7 KB)

I think the way forward here is likely:

Implement all escaping ourselves, with selectable rulesets. We already do this to some degree because of Drydock. In the long run, we should ideally never use escapeshellarg() because we can't choose which behavior it will use. It always does escaping for the current system, you can't run it on a Linux system to build a command for execution on a Windows system or vice versa (which can legitimately occur in Drydock, at least some day).

We can add additional rulesets for all the shell flavors, and make the default behavior do some kind of smart autodetection of shell escaping rules for the current environment (hopefully just testing OS / environment, but possibly doing empirical tests if needbe).

Having selectable rulesets will also let us build a better test suite.

Build a test suite. Start building a test suite to cover escaping behavior on each system. We can run all the rules on all systems, and we can evaluate all the rules for the current shell environment (by running echo ...) and skip the rest.

Fail explicitly when parameters can't be escaped. Most of the mystery around Windows rules would be dispelled if we failed more aggressively. We'd be better off bailing out with an explicit error ("We don't know how to escape this properly and don't believe it is possible, see Txxx.") than doing things mysteriously / wrong.


I believe Windows escaping in the cmd environment is not possible for all inputs. In particular, I recall these three situations as problematic:

  • I was unable to find a way to escape newlines.
  • I was unable to find a way to interact with null bytes at all.
  • I was unable to find a way to escape strings like %PROGRAMFILES%, as a literal string.

The AutoHotkey link above discusses many cases, but I think it does not discuss these cases. For example, the table in section 7.2 lists %XYZ% where XYZ is not an environmental variable, but does not discuss the case where it is an environmental variable. Generally, the discussion and examples largely avoid environment variable handling.

It is possible that we can control the environment and reject only %...% sequences which will actually be environmental variables when the string is evaluated, but I'm not sure we have total control over the cmd environment. Getting this wrong is also potentially dangerous. Trivially, it is clearly bad if we do rm %s -> parameter is %FILENAME% -> rm %FILENAME% -> FILENAME is defined in the environment as "-rf /" -> rm -rf /. I think this is why PHP chooses a clearly broken behavior over an unsafe one.

I suspect that we will never be able to escape the string %PROGRAMFILES% \0 " \r \n (where %PROGRAMFILES is literal and \0, \r and \n are NULL and newlines) in cmd.


D6070 / T3272 discuss a vaguely related issue which might have some impact here.


It may be useful to look at what other languages (like Python and maybe Node) do here. I suspect the answer is either "they are also broken" or "they cheat", maybe by bypassing the shell entirely. If they cheat, perhaps we can find a way to cheat, too.


Generally, it's OK if we're never able to escape all inputs, we should just give users a clear error with an explanation and a pathway forward when we can't escape something, even if that pathway forward is cumbersome or complex ("remove the string '%PROGRAMFILES%' from your input, or switch to Git Bash, see Txxx for discussion").

bypass_shell (windows only): bypass cmd.exe shell when set to TRUE
5.2.1 Added the bypass_shell option to the other_options parameter.

So may be the answer is this?

D14246 / D11396 / T6966 are about a specific Windows issue where the command name is treated differently from other arguments. That is, these commands mean different things:

$ jshint
$ "jshint"

On Linux, these are the same command. In at least some Windows environments, the second variant does not work.

It is possible that we may need to introduce a new csprintf() conversion for "binary" to handle this. I'm not sure what the escaping rules under Windows are to run a binary named "binary with spaces in it"; it may not be possible. If so, this conversion would have some ruleset like:

  • If the input is a path (how can we tell?), escape the path using %s rules.
  • If the input is not a path and has no spaces, escape it using some other set of rules (what rules? What if the binary is %PROGRAMFILES%?).
  • If the input is a path and has spaces, or otherwise is something we don't know how to escape, throw an exception explaining that we don't know the rule to safely run a binary named %PROGRAMFILES WITH SPACES% in Windows.

I'm not sure if this is the actual best approach, but this is the general category of approach I expect we will need to pursue here.

Possibly. We've used it in PhutilExecPassthru since D6774.

Oh, that diff just moved it, so it's actually older than that.

D1938 is the actual origin, I think.

Oh god, and I thought Python's subprocess module was pretty bad.

Okay, proposal: how about we implement the equivalent of this: https://github.com/scopatz/xonsh/blob/fa356d728b2852274f8aa0260dc7da3b7b0f08ab/xonsh/built_ins.py#L268

And then default to bypass_shell everywhere? (this also means getting rid of PhutilExecPassthru since it will be obsolete)

How does that obsolete PhutilExecPassthru?

The major use case for "Passthru" is invoking an $EDITOR like vim or nano, where stdout/stderr need to be redirected to the subprocess. If you execx() it, it doesn't get access to stdout so it doesn't show up on your console, and it doesn't get access to stdin so you can't type anything.

How does that obsolete PhutilExecPassthru?

Because my proposal is to make execx pass bypass_shell=true at all times. execx would implement that function to always find the executable path itself so its current usage does not need to change. It will just be more robust.

I'm pretty sure that if execx() an editor like vim or nano, whether you use bypass_shell or not, it just executes it uselessly in the background and hangs forever. That is, the behavior of these two programs is very different on Linux, and I believe it will be very different on Windows no matter what bypass_shell is set to:

<?php

// This opens up `nano` as though you had typed "nano" from the CLI.
// You can use the program to edit documents, etc.
phutil_passthru('nano');
<?php

// This executes `nano` as a subprocess with no access to stdin or stdout.
// You can't use it, and the invocation is wholly pointless.
execx('nano');

@epriestley - okay, may be the name mislead me. Anyways, I'm basically proposing making bypass_shell on by default in execx and implementing a mechanism that automatically finds the full path of the executable on Windows by implementing that Python function I referred to. Does this make sense?

That seems tentatively fine, although it may only solve some of the problems here and may create new problems.

(Does bypass_shell definitely disable binary resolution with PATH? That is, are you sure we even need to do the resolution ourselves?)

I'm fairly certain since it seems to be using the CreateProcess API as others do. I'm also using xonsh (the project using that Python code) for quite some time now and it works without any issues.

can you please raise the severity for this issue ? It is really hard to work like that
when renaming a file 'svn mv'. for now we are just committing the code and using an Audit.

can you please raise the severity for this issue ? It is really hard to work like that
when renaming a file 'svn mv'. for now we are just committing the code and using an Audit.

see Planning for details about how to raise the severity of the issue

currently phabricator is POC on our team, if the company would like it they would purchase it. currently we don't have any budget for POCs at all.
The only thing I can do right now is look into the code and fix it (contribute code).

@eadler - I'd like to remind you that there are 3 patches waiting for review right now. They have been sitting there for a few months. That said we are not paying Phabricator. If the only way to get these patches which are already worked on and in pretty good shape merged in is to pay you or maintain a fork I'm really starting to think about crowdsourcing this on Kickstarter or Indiegogo to collect those few hundred bucks to get already working patches landed or simply create a fork and point people there loudly. I also think I almost qualify for "Rise to prominence" sub-section mentioned in https://secure.phabricator.com/book/phabcontrib/article/contributing_code/#alternatives but of course that also relies on you guys letting me contribute more. Making patches sit for months without any feedback and then making me rebase them multiple times is definitely not a great way to encourage people to "Rise to prominence" if you really value that.

@davidhodeffi - Feel free to apply the following patches to your local version: D15675, D15693 and D15676. I have been running on them for months and they work just fine.

I'm not sure if Phacility wants it to be used this way but it may be possible to setup a Fund Initiative for prioritizing this. I would likely be able to secure some funds for this feature.

In T8298#187157, @BYK wrote:

@eadler - I'd like to remind you that there are 3 patches waiting for review right now. They have been ...

I don't work for phacility. I, like you, occasionally contribute code as well as administer at this point 5 different copies of phabricator. Sorry for curt tone above.

@eadler - oops, sorry about that then :) I'm still behind what I said though ;)

@eadler We are all appreciate your work. I understand you have your own reasons why not applying those patches. But I also know that we cannot use

svn mv

Which is critical operation for development.
On the other side I don't want to miss a history for a file so I cannot use

svn rm
svn add

For now we don't have a hook on our svn (because it still POC) so we commit and then Audit but it really frustrate our team (5 people).

Let's be practical. For now we have 2 options :

  1. Merging those patches and then refactor /redesign the code.
  2. Designing and refactoring now.

We also need to estimate the refactoring , if it would take too long maybe applying the patches first can be a good idea.
Delaying this issue more is not a good idea.
Any other ideas? Can we work together on this issue? Can we start working on it tomorrow?

running arc patch locally to test the patches should take you less than a minute.

At this time if you want to use Phabricator with Windows development, you pretty much have to be prepared to have some support resources and patches internally because Windows isn't a platform that the upstream frequently uses.

For example, we have our own forked version of Arcanist and libphutil that fixes several issues that we run into (like D13685). We also got an optional feature upstreamed (use files for streams on Windows), which we have to enable for all of our unit testing and linting to avoid deadlocks when executing processes under Windows.

As I mentioned on D15675, it's unlikely in my experience that you'll see these things progress until it either gets prioritized or lands naturally on the upstream roadmap. Which is fine because upstream is free to do what they want, but you should be aware that if you're considering adopting Phabricator in a Windows environment that there is a significant increase in internal costs with regards to supporting your engineers using the software.

@hach-que I have understood that Windows is not the best OS for development, on the other hand you cannot ignore users that use this OS.
Anyway, what can be done in order to fix that ? what do you want me to do ? please be practical. I don't have any budget from the company right now so paying Phacility is not an option right now. I am very talented developer :)

Do you think there are any intentions to fix this issue for windows users? At least in far future?
Me and my colleges are still working with an old arcanist version 737f5c0d. If this workaround breaks, there would be no way for windows users to work with arcanist. Except using a fork containing D15675.

So I start thinking about creating a arcanist like cli for windows users based on conduit methods and powershell.

What do you think? Would it relax the situation?

@chad - How can I be involved with any progress on this front then since now it is planned?

Aside: unlike "feature requests" this diff and its companions actually fix an existing feature that you've committed to maintain anyways so it should reduce the maintenance load, not increase :)

Just a quick note to say that I have been using @BYK 's D15675, D15676 and D15693 for over a month now with no issues, so would be in favour of them being merged in.
Setup:

  • Windows 10 x64,
  • standalone cmd.exe
  • cmd.exe through ConEmu
  • git repos hosteded locally in phabricator (running on Ubuntu server)
  • relatively straightforward use of arc feature, arc diff and arc land

Thanks @BYK!

epriestley claimed this task.

Evolved into T13209.