Page MenuHomePhabricator

[Wilds] Make "arc" survive all tests with no failures

Authored by epriestley on Sep 27 2018, 9:38 PM.



Ref T13098. The lint part of this barely gets all the pieces cobbled together, but all the tests do actually run now, minus a handful of skipped tests. Major ideas here:

Lint, Unit and (soon) Build are becoming "Operations" where an "Overseer" manages a list of "Engines" that read configuration from a ".arc<operation>" file, operate on a working copy, may operate on a subset of files (possibly selected by examining recent changes in the working copy), produce some kind of result (test outcomes, lint messages, build artifacts) and then the results are sent to a list of one or more "Sinks" (console display, files, Harbormaster, patchers). All three workflows share a meaningful amount of code in terms of doing most of these things in roughly the same way.

A lot of Lint logic is currently based around passing paths around as strings, then calling a lot of $thing->getParentObject()->getInformationForPath($path_as_string). This tends to be pretty bulky and somewhat fragile. I'm moving toward passing an ArcanistWorkingCopyPath $path around instead, which has methods with a better defined scope like $path->getData(), $path->getMimeType(), $path->isBinary(), and so on. This requires us to build fewer objects to run tests.

arc lint itself won't run yet, and I don't plan to make it run for a bit (I have some of a patch, but it's not really there yet). Instead, I want to focus on getting coverage for existing flows (particularly alias) and then getting Windows support online now that it can have test coverage.

This change is less about any of what it actually does and more about powering through to make arc unit a useful tool for building other changes.

Test Plan


Diff Detail

rARC Arcanist
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley requested review of this revision.Sep 27 2018, 9:38 PM
epriestley created this revision.
epriestley marked 5 inline comments as done.Sep 27 2018, 9:45 PM
epriestley added inline comments.

This is a good example of the new approach vs the old approach -- I think the new one is quite a bit cleaner and easier to deal with.


Note that we no longer need any of: a working copy; or a configuration manager; or a "testable" engine to test linters. Although some other simplifications also help permit this, switching to ArcanistWorkingCopyPath as a standalone object is the biggest enabler since linters no longer need to rely on being able to call through chains of parent objects -- like $this->getEngine()->getWorkingCopy()->getRepositoryAPI()->getFileData(...) or whatever -- to pull data.


I'm saving the stuff I'm deleting in a file, but I'd generally like to get more documentation like this into the actual documentation and not in the source code later on.

The actual documentation is now wrong, e.g. I've removed --engine.


This moved to ArcanistWorkingCopyPath.

248–250 ↗(On Diff #47110)

For debugging a weird thing, I reverted this locally.

amckinley accepted this revision.Sep 27 2018, 10:22 PM
amckinley added inline comments.

I didn't know this was a default lint rule, but I'm so glad to see it.


This is gone because we're no longer subclassing ArcanistFutureLinter, right?


Why can't we just invoke Filesystem::getMimeType() on the real file instead of copying it? And if we do have to copy it, maybe we should only copy the first couple hundred bytes?

This revision is now accepted and ready to land.Sep 27 2018, 10:22 PM
epriestley marked 2 inline comments as done.Sep 28 2018, 2:45 PM
epriestley added inline comments.

Right, I basically replaced a whole big fancy Future mechanism with "run the futures one at a time" to get this working. This likely causes a significant performance regression in arc lint, but it doesn't run yet so that's moot for now. I expect to restore futures later, but hope to find a simpler implementation. It doesn't matter for unit tests since unit tests don't run in parallel with one another (which seems like a good thing), so the performance is the same either way.

The implementation was actually doing two different things:

  1. running multiple copies of the xhpast binary in parallel to improve performance;
  2. sharing the generated parse trees between different linters, so if a third-party linter uses XHPAST trees it doesn't need to re-analyze files.

But the way (2) got implemented was very messy and very "make it work without changing the architecture". Once I move my focus to arc lint I expect to revisit this stuff and try to find a cleaner approach for it.


I'd ideally like (most of?) the unit tests to be able to work without having to touch disk, or even have a real file on disk at all. This change moves us some of the way toward that: we're still writing files to disk but most of (none of?) the current linters actually touch them, and we could probably stop soon or in the future without too much effort.

Today, some other tests work by building an actual git repository in tmp/ and effectively running arc inside it. Although this is the only realistic way to do some things, I think way more tests than necessary currently rely on a lot more external state than they need.

Later, I expect WorkingCopyPath to also be able to represent a real file on disk, and obviously we need to actually write to a real file when patching working copies.

I think Filesystem::getMimeType() could probably also be changed to have a "$bytes" variant, although this might be somewhat tricky. It looks like echo ... | file - works when we fall back to the file binary, and it looks like finfo_file($resource, ...) might be able to take some kind of clever PHP stream reference for the second argument and not require a write to disk.

Historically, we had (and perhaps still have?) some linters which really acted on a whole repository state. These may be trickier, but maybe they really make more sense to decompose as "build + lint" or turn into beautifiers instead of linters.

This is kind of meandering but basically: I do expect to add some code to just use the real file on disk when we have one later on, but I'm also trying to make stuff need less state/context to run.