Page MenuHomePhabricator

Address higher-impact `arc` toolsets behaviors
Open, NormalPublic

Description

arc unit and arc lint no longer prompt you before continuing if there are failures.

In Mercurial, with certain character sets, some arc workflows may fail in arc-hg while trying to perform UTF8 operations.

Shell completion can be trivially simplified.

arc unit and arc lint should have more sensible ordering and summarization of results.

arc alias can't alias diff, etc.

arc alias doesn't list aliases, etc.

arc may leave stdout/stderr nonblocking.

arc no longer automatically provides "Depends on".

Event Timeline

epriestley triaged this task as Normal priority.Apr 6 2021, 3:54 PM
epriestley created this task.

arc may leave stdout/stderr nonblocking.

Ideally, I'd like arc to leave stdout in the state it started in. However, I can't figure out how to actually test if a stream is nonblocking in PHP.

The report in PHI1845 uses Python to invoke fcntl, but literally invoking Python seems a bit beyond the pale.

PHP has dio_fcntl() but this isn't a standard extension. socket_get_option() doesn't seem to be able to read O_NONBLOCK (and this isn't a socket anyway, although PHP is a bit fast-and-loose about this).

stream_get_meta_data() returns a blocked key, which is almost useful, but if that's false we can't distinguish between "nonblocking" and "blocking, but input is available".

I'm just going to assume stdin starts as blocking, since it seems like a reasonable assumption.

I'm also going to assume this method isn't reentrant. Today, there's exactly one callsite that can mess with blocking behavior and it's never-reentrant. If more of this crops up or the method becomes reentrant, this would need some kind of smarter scope guard or stack management to count how many layers of nonblockingness have been applied to the socket.

Actually, blocked is documented as:

blocked (bool) - true if the stream is in blocking IO mode.
https://www.php.net/manual/en/function.stream-get-meta-data

This appears to be the actual behavior of the key:

test.php
<?php

$stdin = fopen('php://stdin', 'r');
var_dump(stream_get_meta_data($stdin));
$ echo abcd | php -f test.php
array(9) {
  ["timed_out"]=>
  bool(false)
  ["blocked"]=>
  bool(true)
...
}

Here, blocked should be false if it really means "blocked" (a read won't block, since I piped text) but it's true, agreeing with the documentation, and the source looks like it literally just reads O_NONBLOCK:

plain_wrapper.c
add_assoc_bool((zval*)ptrparam, "timed_out", 0);
add_assoc_bool((zval*)ptrparam, "blocked", (flags & O_NONBLOCK)? 0 : 1);
add_assoc_bool((zval*)ptrparam, "eof", stream->eof);

So that's good.

I also can't get O_NONBLOCK to survive process exit on macOS. This is possibly because macOS is now zsh, and this RedHat bug suggests that zsh clears O_NONBLOCK:

https://bugzilla.redhat.com/show_bug.cgi?id=1068697

I can't get bash to hold O_NONBLOCK on macOS either, though (on "3.2.57(1)-release" which has a 2007 date), so the exact reproduction environment isn't entirely clear to me.

...this would need some kind of smarter scope guard...

If pcntl is not installed, the user can ^C and dodge a finally block, so this does need to be a guard in all cases.

In Mercurial, with certain character sets, some arc workflows may fail in arc-hg while trying to perform UTF8 operations.

I'm doing some testing to find an appropriate update to arcanist for this. I ran into another issue, when patching down a revision which has unicode characters in the summary on a windows machine. I could try to whack-a-mole these different issues but since Mercurial's --encoding argument is a global option do you think it would be wise to update ArcanistMercurialAPI::buildLocalFuture() with this?

diff --git a/src/repository/api/ArcanistMercurialAPI.php b/src/repository/api/ArcanistMercurialAPI.php
index e5c2078b..ac72c356 100644
--- a/src/repository/api/ArcanistMercurialAPI.php
+++ b/src/repository/api/ArcanistMercurialAPI.php
@@ -15,7 +15,7 @@ final class ArcanistMercurialAPI extends ArcanistRepositoryAPI {
   protected function buildLocalFuture(array $argv) {
     $env = $this->getMercurialEnvironmentVariables();

-    $argv[0] = 'hg '.$argv[0];
+    $argv[0] = 'hg --encoding utf-8 '.$argv[0];

     $future = newv('ExecFuture', $argv)
       ->setEnv($env)

I made that change, then testing with both powershell and cmd (they use different code pages by default~) I was able to both patch and diff to update the revision. Or do you think it would be wiser to only add --encoding utf-8 to the invocations which we uncover cause problems?

I think the global flag is reasonable.

The "most correct" approach is perhaps to make an effort to identify the actual encoding in use, but this adds piles and piles of complexity and I think no reasonable repository should use an encoding which doesn't map into UTF-8 nondestructively (or, really, any encoding except UTF-8 whatsoever). The Mercurial documentation also implies all this metadata is stored internally in UTF-8 anyway, so I don't really know what Mercurial is doing on Windows (destructively converting UTF-8 into a non-functional local Windows encoding?).

It's possible that chcp ("Change Codepage") on Windows would fix this without requiring --encoding, see also T13500. That is, if this breaks:

$ hg log <some commit with UTF-8>

...but this works:

$ chcp 65001 # "Switch the terminal to UTF-8?"
$ hg log <whatever>

...that might be cleaner in some sense. But I have little understanding of how chcp works, and I think --encoding is reasonable even if chcp (and, possibly, the other Linux locale stuff in T13500, for non-Windows systems) is ultimately a more general fix.

Interestingly this is the output I get when using hg log to print out the commit message containing smart quotes with different encodings -- this behavior is the the same for both PowerShell (chcp 65001) and cmd.exe (chcp 437) and no errors occur.

$ hg log -r tip --template "{desc}" # "default" encoding
A test

Summary:
A test of ôsmart quotesö

$ hg --encoding utf-8 log -r tip --template "{desc}"
A test

Summary:
A test of ΓÇ£smart quotesΓÇ¥

$ hg --encoding latin-1 log -r tip --template "{desc}"
A test

Summary:
A test of ?smart quotes?

Given that both PS and cmd.exe have different code pages but this behavior is the same across both it seems like Mercurial might be ignoring the code page? Though I'm not sure what it is doing that invoking it from PHP would cause these problems. I wasn't expecting this result.