Parallelize substages of `arc diff`
Open, Normal, Public

Assigned To

Authored By

	epriestley
	Sep 25 2017, 4:51 PM

Description

See PHI84. See T4281 for earlier context. See D18606 for some additional discussion.

When we run arc diff, we typically spend time doing these things, more or less sequentially:

Prompting the user to answer various questions.
Launching $EDITOR to ask the user for a commit message or update message.
Running linters (roughly arc lint).
Running unit tests (roughly arc unit).

Currently, these steps run sequentially. In theory, much of this work can happen in parallel instead.

T4281 was an earlier effort to parallelize some of this work. It relied on using the parent process as a sort of server, and the subprocesses as sort of clients, and letting the "server" give the "clients" access to the terminal when they needed to prompt the user. This was very "clever" and also very fragile and hard to debug (i.e., users reported unreproducible hangs until I ripped the whole thing out). Although we may have fixed some of the problems with this model in the meantime, some of the problems are also fairly fundamental.

Currently, there are some legitimate dependencies between these steps. Some of these we can likely extract by adjusting how the workflow works; others may require more finesse.

In particular, lint can emit any kind of patch, and may instruct arc to edit files in a way which changes their behavior (for example, you can write a linter which replaces every file with a haiku about pasta). If lint modifies files, unit tests which passed before the modifications may fail after the modifications. If we run lint and unit in parallel, then apply the lint fixes, and do not re-run unit tests, we may upload a bad change with metadata that says "tests pass". Realistically, it is generally safe to assume that lint does not break unit tests, but we can't be certain this is true in the general case.
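One way to keep the "tests pass" metadata honest while still running lint and unit in parallel is to fingerprint the affected files when unit starts and again after lint patches are applied. The sketch below is only an illustration of that check: `fingerprint_paths()` and `changed_paths()` are hypothetical helpers, not part of arc or libphutil.

```php
<?php

// Hypothetical helper: map each path to a content hash (null if missing).
function fingerprint_paths(array $paths) {
  $map = array();
  foreach ($paths as $path) {
    $map[$path] = is_file($path) ? hash_file('sha256', $path) : null;
  }
  return $map;
}

// Hypothetical helper: list paths whose content differs between snapshots.
function changed_paths(array $before, array $after) {
  $changed = array();
  foreach ($after as $path => $hash) {
    if (!array_key_exists($path, $before) || $before[$path] !== $hash) {
      $changed[] = $path;
    }
  }
  return $changed;
}
```

If `changed_paths()` returns anything after lint fixes are applied, the unit results were computed against different file contents, so they would need to be re-run or discarded rather than uploaded.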

A couple of general technical capability questions:

Can we actually write a PhutilConfirmFuture which doesn't block? (The answer should be "yes".)
Can we avoid filling the output buffer in subprocesses by testing if writes to stdout would block? (This should also be a "yes".)
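For the first question, the underlying primitive is a prompt read that can be polled instead of awaited. A minimal sketch using only core PHP stream functions, assuming a POSIX terminal or pipe; `poll_for_answer()` is a hypothetical name and this is not the libphutil Future API:

```php
<?php

// Hypothetical helper: check $stream for a complete line without blocking.
// Returns the line (newline stripped) once one is available, else null.
// $buffer carries partial input between calls.
function poll_for_answer($stream, &$buffer) {
  $read = array($stream);
  $write = array();
  $except = array();

  // A timeout of 0 makes stream_select() return immediately.
  $ready = stream_select($read, $write, $except, 0);
  if ($ready) {
    $chunk = fread($stream, 8192);
    if ($chunk !== false) {
      $buffer .= $chunk;
    }
  }

  $pos = strpos($buffer, "\n");
  if ($pos === false) {
    return null;
  }

  $line = substr($buffer, 0, $pos);
  $buffer = (string)substr($buffer, $pos + 1);
  return $line;
}
```

A caller would put the stream in nonblocking mode with `stream_set_blocking($stream, false)` and call this from its main loop, doing other work whenever it returns `null`.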

I broadly expect to:

Restructure lint and unit so they can operate in a --subprocess sort of mode which just hands back the results without acting on them.
Do stdout testing in those subprocesses to prevent stalling on the stdout buffer.
Keep all command-and-control logic in the main process. We can switch this to futures where it makes sense, but if the subprocesses don't block this doesn't really matter.
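The shape of this plan can be sketched with plain `proc_open()`: the main process starts workers, keeps the terminal to itself, and collects results whenever it is not busy. This is an illustration under assumptions, not arc's implementation: `start_worker()` and `pump_worker()` are hypothetical helpers, and nonblocking pipes from `proc_open()` are assumed to behave as they do on POSIX systems.

```php
<?php

// Hypothetical helper: launch $command with a nonblocking stdout pipe.
// stderr is not listed in the spec, so the worker inherits the parent's.
function start_worker($command) {
  $spec = array(
    0 => array('pipe', 'r'),
    1 => array('pipe', 'w'),
  );
  $pipes = array();
  $proc = proc_open($command, $spec, $pipes);
  if (!is_resource($proc)) {
    throw new Exception("Unable to start worker: {$command}");
  }

  // The worker gets no input; prompting stays in the main process.
  fclose($pipes[0]);
  stream_set_blocking($pipes[1], false);

  return array(
    'proc' => $proc,
    'out' => $pipes[1],
    'stdout' => '',
    'done' => false,
  );
}

// Hypothetical helper: collect whatever output is ready without blocking.
// Returns true once the worker has closed stdout and has been reaped.
function pump_worker(array &$worker) {
  if ($worker['done']) {
    return true;
  }

  while (true) {
    $chunk = fread($worker['out'], 8192);
    if ($chunk === false || $chunk === '') {
      break;
    }
    $worker['stdout'] .= $chunk;
  }

  if (feof($worker['out'])) {
    fclose($worker['out']);
    $worker['exit'] = proc_close($worker['proc']);
    $worker['done'] = true;
  }

  return $worker['done'];
}
```

While the main process is stuck in a blocking prompt or waiting on $EDITOR it cannot call `pump_worker()`, which is why the workers themselves must avoid stalling on a full stdout buffer.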

As with other infrastructure changes, this is probably 30 diffs that do nothing and then five lines of actual changes.

I think this is not directly adjacent to other planned arc refactoring, although at least some amount of cleanup is likely inevitable and that should further the cause of T10038, etc.

Revisions and Commits

rPHU libphutil
	D18646	rPHU63518a53a9bb Provide ConsoleView classes for "[ OKAY ] Good things happened." console lines
rARC Arcanist
	D19706	rARCdf2c1ba912d1 [Wilds] Provide a skeleton for prompt behaviors
	D18645	rARC7af60b07cb4d Modularize "arc lint" renderers
	D18644	rARCa053fcc4d51b Remove "arc lint --only-changed"
	D18643	rARCbe7987b25a82 Remove "arc lint --cache"
	D18642	rARC074dd8f3a6ec Remove "arc lint --only-new"
	D18641	rARC3453ef835800 Remove "async lint" from `arc lint`
	D18640	rARCd446517d5edd Remove "arc diff --no-diff"

Related Objects

Mentioned In:
T8971: Auto-fix patch is wrong when multiple issue affect the same line
T13098: Plans: Arcanist toolsets and extensions
D18654: Support automatic "Depends On" detection in Mercurial
T11343: Generate default "Depends on" line in commit message when multiple diffs are stacked
T4281: Restore background linting and unit tests for performance

Mentioned Here:
D18606: Allow setting file descriptors on ExecFutures
T4281: Restore background linting and unit tests for performance
T10038: Plan the mid-term pathway for unit-test/linter bindings

Event Timeline

epriestley created this task. Sep 25 2017, 4:51 PM

Herald added a subscriber: eadler. Sep 25 2017, 4:51 PM

epriestley mentioned this in T4281: Restore background linting and unit tests for performance. Sep 25 2017, 4:52 PM

epriestley added a revision: D18640: Remove "arc diff --no-diff". Sep 25 2017, 5:50 PM

Can we avoid filling the output buffer in subprocesses by testing if writes to stdout would block? (This should also be a "yes".)

This seems to work as expected. I wrote a block.php and a work.php:

block.php

<?php

// Never reads stdin, so the pipe from the upstream process fills up.
while (true) {
  sleep(1);
}

work.php

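<?php

// Load libphutil, which provides phutil_fwrite_nonblocking_stream().
require_once 'scripts/init/init-script.php';

// Any non-empty first argument enables the nonblocking variant.
$nonblocking = !empty($argv[1]);

if ($nonblocking) {
  $stdout = fopen('php://stdout', 'wb');
  stream_set_blocking($stdout, false);
}

while (true) {
  $work = str_repeat('x', 4096);
  if ($nonblocking) {
    // Returns immediately even when the pipe is full. The return value is
    // ignored here, so unwritten bytes are simply dropped in this test.
    phutil_fwrite_nonblocking_stream($stdout, $work);
  } else {
    // Blocks once the pipe buffer is full and nothing is reading it.
    echo $work;
  }

  // One dot per loop iteration, so a hang is visible on stderr.
  fprintf(STDERR, '.');
}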

When invoked as php -f work.php | php -f block.php, a few dots are printed out and then things hang forever, as expected: the stdout buffer fills up and the work.php process blocks in echo.

When invoked as php -f work.php -- 1 | php -f block.php to enable the nonblocking test, work does not hang.

When run standalone, both variants of work.php behave identically.

So we should be able to do something like this in lint and unit subprocesses to let them keep working even if the parent process is busy prompting the user in a blocking way or waiting for $EDITOR to exit, without needing a client/server level of cleverness.
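The test above discards whatever does not fit in the pipe; a real lint or unit subprocess would need to keep unwritten output and retry later. A minimal sketch of that queueing, using plain `fwrite()` on a nonblocking stream rather than `phutil_fwrite_nonblocking_stream()`; `flush_pending()` is a hypothetical helper, and treating a `false` return the same as "try again later" is a simplifying assumption:

```php
<?php

// Hypothetical helper: write as much of $pending as the stream accepts
// right now, and return the bytes that still need to be written.
// Assumes $stream is already in nonblocking mode.
function flush_pending($stream, $pending) {
  while (strlen($pending)) {
    $written = @fwrite($stream, $pending);

    // 0 (or false) means the buffer is full for now, or the write failed.
    // This sketch does not distinguish the two; it just stops and retries
    // on the next call.
    if ($written === false || $written === 0) {
      break;
    }

    $pending = (string)substr($pending, $written);
  }

  return $pending;
}
```

A worker would append new results to its pending buffer, call `flush_pending()` between units of work, and keep computing regardless of how much was actually written.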

epriestley added a revision: D18641: Remove "async lint" from `arc lint`. Sep 25 2017, 6:08 PM

epriestley added a revision: D18642: Remove "arc lint --only-new". Sep 25 2017, 6:17 PM

epriestley added a revision: D18643: Remove "arc lint --cache". Sep 25 2017, 6:22 PM

epriestley added a revision: D18644: Remove "arc lint --only-changed". Sep 25 2017, 6:39 PM

epriestley added a revision: D18645: Modularize "arc lint" renderers. Sep 25 2017, 9:48 PM

epriestley added a revision: D18646: Provide ConsoleView classes for "[ OKAY ] Good things happened." console lines. Sep 26 2017, 2:08 AM

epriestley added a commit: rPHU63518a53a9bb: Provide ConsoleView classes for "[ OKAY ] Good things happened." console lines. Sep 26 2017, 2:38 AM

joshuaspence added a subscriber: joshuaspence. Sep 26 2017, 8:54 AM

ftdysa added a subscriber: ftdysa. Sep 26 2017, 5:11 PM

I think I'm going to land this stuff on the experimental branch since there's also some other related work which is better targeted there, and things are probably going to get worse before they get better.

epriestley added a commit: rARCd446517d5edd: Remove "arc diff --no-diff". Sep 27 2017, 3:17 PM

epriestley added a commit: rARC3453ef835800: Remove "async lint" from `arc lint`.

epriestley added a commit: rARC074dd8f3a6ec: Remove "arc lint --only-new". Sep 27 2017, 3:21 PM

epriestley added a commit: rARCbe7987b25a82: Remove "arc lint --cache".

epriestley added a commit: rARCa053fcc4d51b: Remove "arc lint --only-changed".

epriestley mentioned this in T11343: Generate default "Depends on" line in commit message when multiple diffs are stacked. Sep 27 2017, 5:10 PM

epriestley added a commit: rARC7af60b07cb4d: Modularize "arc lint" renderers. Sep 27 2017, 5:24 PM

epriestley mentioned this in D18654: Support automatic "Depends On" detection in Mercurial. Oct 2 2017, 2:06 PM

sakura added a subscriber: sakura. Oct 2 2017, 8:09 PM

alexmv added a subscriber: alexmv. Jan 16 2018, 4:16 PM

epriestley mentioned this in T13098: Plans: Arcanist toolsets and extensions. Mar 5 2018, 2:04 PM

scp awarded a token. Jun 27 2018, 4:20 PM