Paths

Table of Contentst

Differential D18606

Allow setting file descriptors on ExecFutures
Needs ReviewPublic
Actions

Authored by alexmv on Sep 14 2017, 8:54 PM.

Details

Reviewers

None

Group Reviewers

Blessed Reviewers

Summary

Allow callers to more precisely set the file descriptors
inherited by the child process. This allows one to plumb STDERR
directly to the controlling TTY, let the process read from our STDIN,
attach filehandles directly, or pass data on higher-numbered file
descriptors.

Test Plan

Added tests

Diff Detail

Repository

rPHU libphutil

Branch

up-file-descriptors

Lint

Lint Passed

Unit

Tests Passed

Build Status

Buildable 18454
Build 24850: Run Core Tests
Build 24849: arc lint + arc unit

Event Timeline

alexmv created this revision.Sep 14 2017, 8:54 PM

Herald added a reviewer: Blessed Reviewers. · View Herald TranscriptSep 14 2017, 8:54 PM

Herald added a subscriber: epriestley. · View Herald Transcript

Harbormaster failed remote builds in B18454: Diff 44673!Sep 14 2017, 8:54 PM

For context, the underlying goal of this change is to allow us to have Future versions of phutil_console_prompt and phutil_console_confirm which can operate while work is proceeding in the background. Our specific use case is to prompt "Unit tests are expected to take xxx seconds; run them locally?" while sneakily starting to run them in the background while the user is considering their answer. We abort the prompt future if the tests complete before we see a response from the user, and abort the test-running Futures if the user responds 'n'.

These Futures obviously only work for not-Windows -- so they can't be used to replace phutil_console_prompt. As such, I'm not sure if they're suitable to upstream, but I can submit them if you'd like.

In the past, we ran lint and unit in the background while $EDITOR was open for the user to edit their commit message. This broadly seems more promising than "Run slow tests?", which seems like it will usually only get a second or two of "cheat" time most of the time? An easier approach to this might also be:

Running optional tests (ETA 32 seconds, ^C to skip remaining tests and continue diff)...

..and then catch the signal and kill the test subprocess.

I'm not terribly enthusiastic about this approach -- it seems hackier and less flexible (e.g., almost certainly won't work while $EDITOR is open, hopeless on Windows) than other possible approaches.

Strictly speaking, this stuff also already works:

$future = new ExecFuture('run-unit-tests');
$future->start();

if (phutil_console_prompt(...)) {
...

With this construction alone, tests will run in the background until they fill either the OS stderr or stdout buffer (4K I think?), which I expect will normally probably take longer than answering a prompt will.

We could make this more nuanced with, e.g., phutil_console_prompt(..., $futures) or some less-goofy construct without needing to pass actual fds.

This approach is only a problem when the subprocess actually needs to prompt the user interactively to continue. I think (?) arc unit does not. arc lint currently does, but if we were to pursue this again I'd be inclined to start by restructuring arc lint so it did not need to do this (I believe there is no technical reason that the prompts need to be blocking and serial, or need to be handled by the subprocess).

I'm not terribly enthusiastic about this approach

Er, specifically, about the "pass specific fds" approach.

In the past, we ran lint and unit in the background while $EDITOR was open for the user to edit their commit message.

Out of curiosity, why did that change?

This broadly seems more promising than "Run slow tests?", which seems like it will usually only get a second or two of "cheat" time most of the time?

Our metrics show that for us, about 50% of arc diff runs hit a prompt. Of those with a prompt, the timings look like this:

arc-diff-console-timings.png (593×898 px, 19 KB)

Where:

p50: 6s
p75: 21s
p90: 66s
p95: 136s

So at least 25% of arc diff runs which prompt end up waiting more than 21s, which is not insignificant. There's some fudge here because this is the sum of the prompt times in a single arc command, and there may be multiple prompts within this. However, we do also see a correlation between time to initial prompt and time spent waiting at it, with a rise around 15s -- meaning that users tend to tab away near that mark.

We saw statistically significant (p<0.01) improvements in this by adding a echo "\007" at prompts, and encouraging users to configure their terminals to make visual bells apparent:

Running optional tests (ETA 32 seconds, ^C to skip remaining tests and continue diff)...

Hm, that is one solution. For us the times may be in minutes, however, so this robs the user of the feeling of control. The illusion of control is an important one, I think.

I'm not terribly enthusiastic about this approach -- it seems hackier and less flexible (e.g., almost certainly won't work while $EDITOR is open, hopeless on Windows) than other possible approaches.

I think that this commit is of utility even if you don't believe in futures. For instance, it allows writing directly to a file without having to deal with the buffering.

I agree that prompt futures are hopeless on Windows, which is why I was not intending to send that commit upstream to you. Regardless, though, I don't understand the objection of "almost certainly won't work while $EDITOR is open" because that's not a place that one would try any sort of prompt, background or not.

With this construction alone, tests will run in the background until they fill either the OS stderr or stdout buffer (4K I think?), which I expect will normally probably take longer than answering a prompt will.

We can easily fill 4k of output buffer in 21s. Practically, we don't, because we write test output to a file on disk, but I am extremely waring of hanging logic on this assumption.

We could make this more nuanced with, e.g., phutil_console_prompt(..., $futures) or some less-goofy construct without needing to pass actual fds.

For a concrete point of comparison, P2074 is the change that builds on this one. We then use FutureIterator to run this in parallel with the multiple unit test futures.

P2074 is the change that builds on this one

Aaah, I misunderstood what you were trying to build here.

In the past, we built this feature this way:

arc lint and arc unit are run as subprocesses, with a --recon flag.
The --recon flag took over stdin, stdout and stderr and converted them all to protocol frames on stdout:

{"type": "stdout", "content": "Ran lint AAA"}\n
{"type": "stdout", "content": "Ran lint BBB"}\n
{"type": "prompt", "content": "Apply lint patch QQQ?"}\n

The parent process basically did client/server, and sent message frames back to the subprocess to indicate the user's answer.

I imagined you were trying to do something similar, not put the prompt itself in a subprocess.

Out of curiosity, why did that change?

It was really complicated, a weird, questionably-well-thought-through change out of Facebook and only clearly useful to Facebook, and it caused a lot of undiagnosable hangs (some were eventually diagnosed as an xdebug issue I think, D7748 may have resolved others). T4281 has some more. Partly, I think the approach accepted a large amount of unnecessary complexity to try to avoid any workflow disruption.

I think ConsoleConfirmFuture is reasonable, but it can be written without any of this and without executing a subprocess (just feof()/fread() stdin, I think?). The cost will be that history with "up" and "down" won't work for "y" and "n". This seems like a reasonable cost since this feature is silly and not used anyway.

Do you need ConsolePromptFuture to do subprocesses? It's only for "excuses", I think? Could we just move the excuse phase to the very end instead (so it doesn't need to be a future), or drop the history feature from it (or replace it with "press enter to use previous answer")? Or possibly figure out some way to turn off line buffering on php://input/STDIN and just implement ↑ and ↓ history ourselves (here be dragons, but it looks like stty <stuff> can swap the tty to character mode: http://php.net/manual/en/function.fgetc.php#103045)?

To the degree that output buffers filling up is an issue we could also add some kind of --buffer flag to the subprocesses that buffers output in the process until the parent tells it that it's ready via stdin or signal, or have the subprocesses use nonblocking I/O to write to stdout. I'm less eager about these approaches, but they'll work if we can't read streams while in $EDITOR.

I'm also not really married to (or particularly fond of) the "excuse" feature at all, and it has become more obsolete as we've added support for server-side tests. Draft mode (T2543, run server-side tests before notifying reviewers by default) may further erode its utility.

In D18606#227169, @epriestley wrote:

P2074 is the change that builds on this one

Aaah, I misunderstood what you were trying to build here.

Sorry for not being clearer!

In the past, we built this feature this way: [snip]

Aha -- interesting.

I think ConsoleConfirmFuture is reasonable, but it can be written without any of this and without executing a subprocess (just feof()/fread() stdin, I think?). The cost will be that history with "up" and "down" won't work for "y" and "n". This seems like a reasonable cost since this feature is silly and not used anyway.

Do you need ConsolePromptFuture to do subprocesses? It's only for "excuses", I think? Could we just move the excuse phase to the very end instead (so it doesn't need to be a future), or drop the history feature from it (or replace it with "press enter to use previous answer")? Or possibly figure out some way to turn off line buffering on php://input/STDIN and just implement ↑ and ↓ history ourselves (here be dragons, but it looks like stty <stuff> can swap the tty to character mode: http://php.net/manual/en/function.fgetc.php#103045)?

ConsolePromptFuture existed solely to be able to be the base class of ConsoleConfirmFuture -- we have no plans to use it directly. We can certainly do away with ConsolePromptFuture and go for a more straightforward ConsoleConfirmFuture using feof/fread, but in the short term I wanted as straightforward a translation as possible, since I wasn't certain if this was something you all were interested in at all.

I'm also not really married to (or particularly fond of) the "excuse" feature at all, and it has become more obsolete as we've added support for server-side tests. Draft mode (T2543, run server-side tests before notifying reviewers by default) may further erode its utility.

I agree that most of our users view it as actively detrimental ("I came back after an hour to find it at a prompt for something nobody cares about"), and since the reason is not terribly high-visibility in Differential, it adds little value. I think we would be in favor of removing it.

Cool, sounds like we're pretty much on the same page, then. Basically, I'm supportive of pursuing this but I'd rather do a "high workflow disruption, low cleverness" change than a "tons of cleverness so we don't have to change any behaviors" sort of change. I don't think the current behavior is sacrosanct and we can probably have a much larger impact by disrupting the current workflow a bit (or a lot).

Another possible thing here is that we currently run unit serially after lint. The reason for this is that lint may change code in a material way (patch + commit), and possibly change the output of unit. If you're willing to say "lint doesn't change the result of tests" or "lint changes the result of tests so rarely that we don't care", or "we'll mark specific lint messages as non-test-impacting, even when they change the code" we could parallelize lint and uint. We could also parallelize them optimistically, and throw the unit progress away only after finding that lint has actually made a change.

In the short term, I'm happy to bring a non-subprocess ConsoleConfirmFuture upstream if that's a useful building block for you. In the mid-term, we could plan an upstream pathway toward roughly "parallelize all steps of arc diff as much as possible at every stage of the workflow" (maybe file a Support issue)?

We could even parallelize "excuse" if we retained the feature: there's no reason arc couldn't send a revision to reviewers first, then prompt and upload an excuse later, especially once the "Draft" state is built. Or wait for, say, 15 seconds for an excuse and then move on in the background even if the prompt is still active.

Also, thanks for sharing those charts earlier -- they're pretty compelling evidence that this is more than a minor annoyance.

In D18606#227322, @epriestley wrote:

Cool, sounds like we're pretty much on the same page, then. Basically, I'm supportive of pursuing this but I'd rather do a "high workflow disruption, low cleverness" change than a "tons of cleverness so we don't have to change any behaviors" sort of change. I don't think the current behavior is sacrosanct and we can probably have a much larger impact by disrupting the current workflow a bit (or a lot).

:nod:

Another possible thing here is that we currently run unit serially after lint. The reason for this is that lint may change code in a material way (patch + commit), and possibly change the output of unit. If you're willing to say "lint doesn't change the result of tests" or "lint changes the result of tests so rarely that we don't care", or "we'll mark specific lint messages as non-test-impacting, even when they change the code" we could parallelize lint and uint. We could also parallelize them optimistically, and throw the unit progress away only after finding that lint has actually made a change.

lint can run in parallel with unit as long as it collects the autofixes and doesn't carry through on applying them until after unit has completed.

In the short term, I'm happy to bring a non-subprocess ConsoleConfirmFuture upstream if that's a useful building block for you.

Sure. I think I'd abandon this diff and open a support issue for "implement this interface this as thou wilt" since we're running this locally, and we have enough high-priority other work on our plates right now that it is likely more efficient to offload the upstream-maintainable solution to you all.

In the mid-term, we could plan an upstream pathway toward roughly "parallelize all steps of arc diff as much as possible at every stage of the workflow" (maybe file a Support issue)?

We'd probably be in support of such an issue; arc diff time is something that our developers feel a small-to-medium pain from. However, we may also be looking into building a LintEngine that essentially just passes control off to Python, which manages some paralleism itself -- mostly because folks are more comfortable writing simple linters in Python, and we have existing infrastructure to make those linters more testable, etc.

Also, thanks for sharing those charts earlier -- they're pretty compelling evidence that this is more than a minor annoyance.

For reference, we're currently logging metrics for every arc run on:

Time spent at a prompt
Time spent at an editor
Time spent doing neither ("compute time")
Time running git, with breakdowns by command
Time spent on Phabricator API calls
Time spent on calls to our own network endpoints
Time spent on each linter
Which linters are shown to users, and which of those are still unresolved when code is landed
Estimates of arc unit time
Actual arc unit time

With a few thousand runs of arc per day, there's some interesting data to be gleaned.

I should clean up and upstream our profiling hooks for prompt/editor timings, as those may be generally applicable. The metrics collection itself is fairly infrastructure-specific and thus not really generalizable, though.

lint can run in parallel with unit as long as it collects the autofixes and doesn't carry through on applying them until after unit has completed.

This is one issue, but just to underscore the other issue, I might write some code like this:

assert_equal(1 + 1, 2);

I run arc diff, and it runs arc lint and arc unit in parallel. They come back and arc unit says "all tests passed". arc lint says "one autofix". So far so good.

That autofix replaces all instances of 2 with "two" ("to improve readability"), so we apply it and the code now reads:

assert_equal(1 + 1, "two");

Then we send that to Differential. Now we've uploaded code with a failing test, alongside metadata that says "this code passes all tests".

If you're willing to say "all our suggested patches are actually sane, I double-dog-swear" we don't have to worry about this. But the space of all possible things that lint can do includes emitting patches which change code in a way that breaks tests.

epriestley mentioned this in T12996: Parallelize substages of `arc diff`.Sep 25 2017, 4:51 PM