Page MenuHomePhabricator

Shell completion doesn't work for filenames containing a space
Closed, ResolvedPublic

Description

The shell completion for arc upload doesn't work properly if the filename contains a space. For example, in an empty directory I ran touch 'foo bar' and then ran arc upload foo<TAB>. The output was:

> arc upload foo<TAB>
bar  foo

Event Timeline

joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description. (Show Details)
joshuaspence added a project: Arcanist.
joshuaspence added a subscriber: joshuaspence.
chad added a subscriber: chad.EditedAug 9 2015, 12:22 AM

Isn't that shell specific?

This is at least somewhat our fault, behavior on nano foo<tab> is different from arc upload foo<tab>. There's probably some shell-specific magic in there too, of course...

Shell completion needs an overhaul anyway (T7488) so that it can work outside of just arc.

eadler added a project: Restricted Project.Jun 17 2016, 6:58 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jun 17 2016, 7:09 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:05 PM
epriestley moved this task from Backlog to vNext on the Arcanist board.Nov 14 2016, 5:51 PM

...

https://stackoverflow.com/a/1146716

?????????????????

How did anyone get computers to do anything before 2005?

Some general notes on this after putting far more time than feels valuable into it:

The construction X=( $Y ) in Bash splits $Y on spaces (any whitespace?), turning it into an array, and assigns it to X. This ignores escapes, so:

Y="a\ b"
X=( $Y )

...results in an array with elements a\ and b. This means the construction is not useful if we want to handle filenames with spaces. The current approach uses this snippet:

COMPREPLY=( $(compgen -W "${OPTS}" -- ${CUR}) )

We can fix the behavior by defining IFS ("internal field separator") to be \n only, like this:

IFS=$'\n'
Y="a\ b"
X=( $Y )

This produces an array with one element, a\ b.

IFS is a global, so assigning it will change the behavior of everything. We can limit this by using local IFS inside the completion function.

compgen -W "..." has a similar issue, where the parameter is tokenized using IFS, so the behavior of compgen depends on the definition of IFS. The default behavior of compgen is to split on (any?) whitespace. Since the behavior of compgen appears to be simple (split the list and reject results which do not match the prefix) my current patch drops it in favor of doing our own prefix matching, emitting newline-delimited results, and then just using X=( $Y ) to return them.

The complete ... -o filenames ... flag is important for these behaviors:

  1. when you type src<tab>, completing it to src/ and not adding a space;
  2. letting you navigate subdirectories like src/path/to/thing.c;
  3. when you type src/thing/<tab><tab>, showing you the items in that subdirectory as completion suggestions (say, thing.c, stuff.py), not the full paths that will actually complete (src/thing/thing.c, src/thing/stuff.py).

We can emulate behavior (1) by using -o nospace and adding an explicit space after non-path completions.

We can emulate behavior (2) by doing filesystem inspection ourselves.

I don't see a way to emulate behavior (3), and the subdirectory completion behavior is quite bad without it. Thus, my current patch retains -o filenames.

A downside of this is that if you have a folder named --draft/ and type arc diff --draf<tab>, it will autocomplete as arc diff --draft/. This seems unavoidable, and like a very small price to pay. Don't name directories --thing/, mkay?

Overall, I think I nearly have this working locally, although some unusual cases, like arc weld foo\ <tab> aren't quite working yet. Nor is arc weld 'foo<tab> working yet, per the StackOverflow answer above, although I'm not terribly concerned about any of these cases if they prove unreasonably difficult.

I'm also retaining the use of compgen -f (to generate file completions), at least for now, although I've moved it into arc shell-complete to limit the amount of Bash magic we're relying on.

Dropping compgen -W ... lets us autocomplete arc WEL<tab> into arc weld, which seems nice, if a bit niche? Broadly, we can autocomplete things into completely different things if we have some reason to. A possibly useful behavior would be to expand flags, e.g. autocomplete -f into --force, interpreting "-f<tab>" to mean "remind me what this short flag stands for".

The current code also uses ${COMP_WORDS[@]}, but believe this needs to be "${COMP_WORDS[@]}" to preserve arguments correctly when they contain spaces.

In a directory with a file foo bar:

epriestley@orbital ~/scratch/tmp $ ls
foo bar

...I wrote this script which runs compgen -A file -- $1:

$ cat ../script.php 
<?php

$output = array();
$err = array();
exec("compgen -A file -- ".escapeshellarg($argv[1]), $output, $err);
print_r(
  array(
    'output' => $output,
    'err' => $err,
  ));

Here is what I get if I run compgen -A file -- foo in the directory:

$ compgen -A file -- foo
foo bar

Here is what I get if I run script foo in the directory:

$ php -f ../script.php foo
Array
(
    [output] => Array
        (
            [0] => foo bar
        )

    [err] => 0
)

So far, so good. The script and the direct invocation do exactly the same thing, so we'd expect them to exhibit exactly the same behavior.

Here is what I get if I run compgen -A file -- 'foo\ ' in the directory:

$ compgen -A file -- 'foo\ '
foo bar

Here is what I get if I run script 'foo\ ' in the directory:

$ php -f ../script.php 'foo\ '
Array
(
    [output] => Array
        (
        )

    [err] => 1
)

That is, compgen -A file -- ... has different behavior for arguments with spaces in them if it's invoked from the shell versus invoked from PHP. There's nothing on stderr, either, and this doesn't appear to be a consequence of any environmental variables.

I attempted to figure out what's going on by examining the Bash source. I think, perhaps, pcomp_filename_completion_function() is where things go sideways? It has about 50 lines of heuristics to figure out when to dequote arguments:

      /* There are roughly three paths we can follow to get here:
		1.  complete -f
		2.  compgen -f "$word" from a completion function
		3.  compgen -f "$word" from the command line
	 They all need to be handled.

...

      /* Intended to solve a mismatched assumption by bash-completion. ...

...

      /* Another mismatched assumption by bash-completion.  If compgen is being ...

...

      /* We could check whether gen_shell_function_matches is in the call
	 stack by checking whether the gen-shell-function-matches tag is in
	 the unwind-protect stack, but there's no function to do that yet.

...

	       sh_contains_quotes (text))	/* guess */

I'm just going to cheat my way out of this since it seems like absolute insanity to me.

Bash also ships with a completion example with this preamble:

# This function performs file and directory completion. It's better than
# simply using 'compgen -f', because it honours spaces in filenames. ...

It is deeply alarming to me that Bash works at all.

I've marked D19705 as fixing this. Some stuff still doesn't work, notably:

arc 'we<tab> -> does not complete
arc shell-complete '--gen<tab> -> does not complete

In these cases, we could dequote the argument, evaluate it, and complete it, in theory. That is, arc 'weld' is a valid command to type. But no one does this and I think our behavior is reasonable here.

We also get some things like arc --config x=y we<tab> less right than we could. The ones I'm aware of have comments in the code. I think these are a low priority to improve, and we can improve them seamlessly later without requiring an arc shell-complete --generate.