Page MenuHomePhabricator

Split large path lists into blocks when linting
ClosedPublic

Authored by epriestley on Apr 22 2015, 10:41 AM.

Details

Summary

Fixes T5097. When linting a large list of paths (e.g., with --everything), do this internally:

$chunks = array_chunk($paths, 32);
foreach ($chunks as $chunk) {
  $this->lintChunk($chunk);
}

This keeps the advantages of parallelism and artifact sharing for future-based linters, without having memory usage grow in an unbounded way.

These callbacks changed:

  • willLintPath(): Useless, no meaningful implementations. Internalized the required side effect and broke the hook.
  • didRunLinters(): Now useless, with no meaningful implementations. Broke the hook.
  • didLintPaths(): New hook which executes opposite willLintPaths().
  • lintPath(): Linters no longer need to implement this method.

XHPAST now has an explicit way to release shared futures.

These minor changes also happened:

  • Formalized the "linter ID", which is a semi-durable identifier for the cache.
  • Removed linter -> exception explicit mapping, which was unused. We now just collect exceptions.
  • We do the canRun() checks first (and separately) now.
  • Share more service call profiling code.
  • Fix an issue where the test harness would use the path on disk, even if configuration set a different path.
Test Plan
  • Ran arc lint --everything in arcanist/.
    • With no chunking, saw unstable memory usage with a peak at 941 MB.
    • With chunk size 32, saw stable memory usage with a peak at 269 MB.
    • With chunk size 8, saw stable memory usage with a peak at 180 MB.
  • Ran with --trace and saw profiling information.
  • Created this diff.

Diff Detail

Repository
rARC Arcanist
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley retitled this revision from to Split large path lists into blocks when linting.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: joshuaspence.
  • Small reorganization of new functions.
joshuaspence accepted this revision.Apr 22 2015, 11:59 AM
joshuaspence edited edge metadata.

A couple of minor inlines.

I tested this on rP and noticed one unfortunate performance issue.

> time arc lint --everything --never-apply-patches
real    19m26.750s
user    16m48.589s
sys 3m58.260s

> time git ls-tree -r --name-only HEAD | xargs -n128 arc lint --lintall --never-apply-patches
real    9m28.819s
user    9m20.757s
sys 1m7.596s
src/lint/engine/ArcanistLintEngine.php
256

This is essentially just mfilter($linters, 'canRun').

517

Maybe add assert_instances_of($runnable, 'ArcanistLinter')

519

We can worry about this later, but it might be nice to expose the chunk size as a parameter. Particularly for repository with very large files, or machines with very limited memory.

610

There could be multiple linters of the same class configured, so maybe it would be better to use $linter->getID()?

src/lint/linter/ArcanistFutureLinter.php
22–53

I'm not sure I understand this change (just adding return).

53

It probably doesn't provide much value since subclasses could override this method, but we could add assert_instances_of($futures, 'Future') here.

src/lint/linter/ArcanistLinter.php
118

Should be @{class:ArcanistLintEngine}

133

As above.

208

Missing final

This revision is now accepted and ready to land.Apr 22 2015, 11:59 AM
epriestley marked 3 inline comments as done.Apr 22 2015, 12:12 PM
epriestley added inline comments.
src/lint/engine/ArcanistLintEngine.php
519

Yeah, I think at some point we could add some getChunkSizeLimit() method to Linter and use the minimum value from all of the runnable linters, with the default returning null to mean "use whatever the default behavior is".

610

Linter IDs aren't always meaningful and --trace is primarily consumed by humans, but it's possible that there are reasons why that might be a better choice.

src/lint/linter/ArcanistFutureLinter.php
22–53

purely stylistic

epriestley edited edge metadata.
  • Fix typos/references/missing final.

Ghost inlines feel pretty good on this diff.

I can probably do something about the performance, but don't have an XHProf build for whatever version of PHP I have on the CLI at the moment so the --xprofile flag isn't active. I'll take a look at that when I get a chance. 2X isn't good but is a step up from "runs out of RAM and exits".

This revision was automatically updated to reflect the committed changes.

I'm playing around with profiling this now... is there a way to load data into the XHProf application?

Hmm... I uploaded my profile as F378852 but https://secure.phabricator.com/xhprof/profile/PHID-FILE-4jauqk6sgw6slmtsn3uy/ throws a "Failed to unserialize XHProf profile!" exception because its not valid JSON.

That profile just shows us choosing to run a different arc binary because you're running an arc command in an arc working copy, but not the arc from the local directory, so we run that one instead. :)

(But the profile itself is mechanically correct and would otherwise be useful.)