Split large path lists into blocks when linting
Closed · Public

Authored by epriestley on Apr 22 2015, 10:41 AM.

Details

Reviewers

joshuaspence

Maniphest Tasks

T5097: `arc lint --everything` uses a large amount of memory when executed in large repositories

Commits

rARCf4aadb96048b: Split large path lists into blocks when linting

Summary

Fixes T5097. When linting a large list of paths (e.g., with --everything), do this internally:

$chunks = array_chunk($paths, 32);
foreach ($chunks as $chunk) {
  $this->lintChunk($chunk);
}

This keeps the advantages of parallelism and artifact sharing for future-based linters, without having memory usage grow in an unbounded way.
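A minimal sketch of the chunked loop described above. The helper `lint_in_chunks()` and its callback are hypothetical names used for illustration; this is not the actual `ArcanistLintEngine` code, only the general shape of the technique.

```php
<?php

/**
 * Hypothetical helper: lint paths in fixed-size chunks.
 * $lint_chunk is an illustrative callback, not an Arcanist API. It receives
 * one chunk of paths and returns a list of messages for that chunk.
 */
function lint_in_chunks(array $paths, callable $lint_chunk, int $chunk_size = 32): array {
  $messages = array();
  foreach (array_chunk($paths, $chunk_size) as $chunk) {
    // Only the returned messages are retained across chunks; per-chunk
    // state held inside the callback can be freed once it returns.
    foreach ($lint_chunk($chunk) as $message) {
      $messages[] = $message;
    }
  }
  return $messages;
}

// Usage: 70 paths are processed as chunks of 32, 32 and 6.
$paths = array();
for ($i = 0; $i < 70; $i++) {
  $paths[] = "src/file{$i}.php";
}

$sizes = array();
$messages = lint_in_chunks($paths, function (array $chunk) use (&$sizes) {
  $sizes[] = count($chunk);
  return array(count($chunk).' paths linted');
});
```

Within a chunk, paths can still be linted in parallel and share artifacts; the bound on memory comes from dropping per-chunk state between iterations.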

These callbacks changed:

willLintPath(): Useless, no meaningful implementations. Internalized the required side effect and broke the hook.
didRunLinters(): Now useless, with no meaningful implementations. Broke the hook.
didLintPaths(): New hook which executes opposite willLintPaths().
lintPath(): Linters no longer need to implement this method.
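To illustrate how a paired `willLintPaths()` / `didLintPaths()` hook could be used, here is a rough sketch. The classes `SketchLinter` and `SketchCachingLinter` are hypothetical stand-ins, not the real `ArcanistLinter` hierarchy, and the method signatures are assumed.

```php
<?php

// Hypothetical stand-in for a linter base class; not the real ArcanistLinter.
abstract class SketchLinter {
  public function willLintPaths(array $paths) {}
  public function didLintPaths(array $paths) {}
}

// Builds per-path state before a chunk is linted and releases it afterwards.
final class SketchCachingLinter extends SketchLinter {
  private $cache = array();
  public $log = array();

  public function willLintPaths(array $paths) {
    foreach ($paths as $path) {
      $this->cache[$path] = "parsed:{$path}";
    }
    $this->log[] = 'will:'.count($this->cache);
  }

  public function didLintPaths(array $paths) {
    // Mirror of willLintPaths(): drop whatever was built for this chunk.
    foreach ($paths as $path) {
      unset($this->cache[$path]);
    }
    $this->log[] = 'did:'.count($this->cache);
  }
}

$linter = new SketchCachingLinter();
foreach (array_chunk(array('a.php', 'b.php', 'c.php'), 2) as $chunk) {
  $linter->willLintPaths($chunk);
  // ... linting of $chunk would happen here ...
  $linter->didLintPaths($chunk);
}
```

Because the cache is emptied after each chunk, its size is bounded by the chunk size rather than by the total number of paths.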

XHPAST now has an explicit way to release shared futures.

These minor changes also happened:

Formalized the "linter ID", which is a semi-durable identifier for the cache.
Removed linter -> exception explicit mapping, which was unused. We now just collect exceptions.
We do the canRun() checks first (and separately) now.
Share more service call profiling code.
Fix an issue where the test harness would use the path on disk, even if configuration set a different path.

Test Plan

Ran arc lint --everything in arcanist/.
- With no chunking, saw unstable memory usage with a peak at 941 MB.
- With chunk size 32, saw stable memory usage with a peak at 269 MB.
- With chunk size 8, saw stable memory usage with a peak at 180 MB.
Ran with --trace and saw profiling information.
Created this diff.

Diff Detail

Repository

rARC Arcanist

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

epriestley updated this revision to Diff 30015. Apr 22 2015, 10:41 AM

epriestley retitled this revision from to Split large path lists into blocks when linting.

epriestley updated this object.

epriestley edited the test plan for this revision.

epriestley added a reviewer: joshuaspence.

epriestley added a task: T5097: `arc lint --everything` uses a large amount of memory when executed in large repositories.

Herald added a subscriber: epriestley. Apr 22 2015, 10:41 AM

Small reorganization of new functions.

A couple of minor inlines.

I tested this on rP and noticed one unfortunate performance issue.

> time arc lint --everything --never-apply-patches
real    19m26.750s
user    16m48.589s
sys 3m58.260s

> time git ls-tree -r --name-only HEAD | xargs -n128 arc lint --lintall --never-apply-patches
real    9m28.819s
user    9m20.757s
sys     1m7.596s

src/lint/engine/ArcanistLintEngine.php
256	This is essentially just `mfilter($linters, 'canRun')`.
517	Maybe add `assert_instances_of($runnable, 'ArcanistLinter')`
519	We can worry about this later, but it might be nice to expose the chunk size as a parameter, particularly for repositories with very large files or machines with very limited memory.
610	There could be multiple linters of the same class configured, so maybe it would be better to use `$linter->getID()`?
src/lint/linter/ArcanistFutureLinter.php
22–53	I'm not sure I understand this change (just adding `return`).
53	It probably doesn't provide much value since subclasses could override this method, but we could add `assert_instances_of($futures, 'Future')` here.
src/lint/linter/ArcanistLinter.php
118	Should be `@{class:ArcanistLintEngine}`
133	As above.
208	Missing `final`
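For readers unfamiliar with the `mfilter($linters, 'canRun')` suggestion in the inline on line 256: the idea is to keep only the objects whose method returns a truthy value. Below is a plain-PHP sketch of that behavior using `array_filter()`. The names `SketchRunnableLinter` and `filter_by_method()` are hypothetical, and this is an approximation of the idea rather than the libphutil implementation.

```php
<?php

// Hypothetical stub exposing only canRun(); not a real Arcanist class.
final class SketchRunnableLinter {
  private $runnable;

  public function __construct(bool $runnable) {
    $this->runnable = $runnable;
  }

  public function canRun(): bool {
    return $this->runnable;
  }
}

// Keep the objects for which calling $method returns a truthy value.
// array_filter() preserves the original keys.
function filter_by_method(array $objects, string $method): array {
  return array_filter($objects, function ($object) use ($method) {
    return (bool)$object->$method();
  });
}

$linters = array(
  'xhpast' => new SketchRunnableLinter(true),
  'jshint' => new SketchRunnableLinter(false),
  'text' => new SketchRunnableLinter(true),
);

$runnable = filter_by_method($linters, 'canRun');
```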

This revision is now accepted and ready to land. Apr 22 2015, 11:59 AM

epriestley marked 3 inline comments as done. Apr 22 2015, 12:12 PM

epriestley added inline comments.

src/lint/engine/ArcanistLintEngine.php
519	Yeah, I think at some point we could add some `getChunkSizeLimit()` method to `Linter` and use the minimum value from all of the runnable linters, with the default returning `null` to mean "use whatever the default behavior is".
610	Linter IDs aren't always meaningful and `--trace` is primarily consumed by humans, but it's possible that there are reasons why that might be a better choice.
src/lint/linter/ArcanistFutureLinter.php
22–53	purely stylistic
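The `getChunkSizeLimit()` idea from the inline on line 519 could look roughly like the sketch below. No such method exists in this diff; `SketchLimitLinter`, `select_chunk_size()`, and the default of 32 are assumptions made for illustration.

```php
<?php

// Hypothetical linter stub. getChunkSizeLimit() is only the idea floated in
// the inline comment; it is not an existing Arcanist method.
final class SketchLimitLinter {
  private $limit;

  public function __construct($limit = null) {
    $this->limit = $limit;
  }

  // null means "use whatever the default behavior is".
  public function getChunkSizeLimit() {
    return $this->limit;
  }
}

// Use the smallest limit requested by any runnable linter, falling back to
// the default when no linter expresses a preference.
function select_chunk_size(array $linters, int $default = 32): int {
  $limits = array();
  foreach ($linters as $linter) {
    $limit = $linter->getChunkSizeLimit();
    if ($limit !== null) {
      $limits[] = $limit;
    }
  }
  return $limits ? min($limits) : $default;
}

$size = select_chunk_size(array(
  new SketchLimitLinter(),
  new SketchLimitLinter(8),
  new SketchLimitLinter(16),
));
```

Taking the minimum means the most memory-hungry linter sets the pace for the whole run, which is the conservative choice when the goal is to bound peak memory.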

Fix typos/references/missing final.

Ghost inlines feel pretty good on this diff.

I can probably do something about the performance, but don't have an XHProf build for whatever version of PHP I have on the CLI at the moment so the --xprofile flag isn't active. I'll take a look at that when I get a chance. 2X isn't good but is a step up from "runs out of RAM and exits".

Closed by commit rARCf4aadb96048b: Split large path lists into blocks when linting (authored by epriestley, committed by epriestley). Apr 22 2015, 12:16 PM

This revision was automatically updated to reflect the committed changes.

I'm playing around with profiling this now... is there a way to load data into the XHProf application?

Hmm... I uploaded my profile as F378852 but https://secure.phabricator.com/xhprof/profile/PHID-FILE-4jauqk6sgw6slmtsn3uy/ throws a "Failed to unserialize XHProf profile!" exception because it's not valid JSON.

OK. I had to unserialize the data and then JSON encode it (F378863). See https://secure.phabricator.com/xhprof/profile/PHID-FILE-wpoaazl2e6jisrepgadf/
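The conversion described above (unserialize the data, then JSON encode it) might be done along these lines. This is a sketch under assumptions: the helper name `serialized_profile_to_json()` is hypothetical, the sample data is made up, and the exact shape the XHProf application expects is not confirmed here.

```php
<?php

// Hypothetical helper: convert a PHP-serialized profile into JSON.
function serialized_profile_to_json(string $serialized): string {
  // allowed_classes => false avoids instantiating objects from the input.
  $data = unserialize($serialized, array('allowed_classes' => false));
  if (!is_array($data)) {
    throw new InvalidArgumentException('Expected a serialized array.');
  }

  $json = json_encode($data);
  if ($json === false) {
    throw new RuntimeException(json_last_error_msg());
  }
  return $json;
}

// Illustrative input only; a real profile would be read from a file.
$raw = serialize(array('main()' => array('ct' => 1, 'wt' => 1500)));
$json = serialized_profile_to_json($raw);
```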

That profile just shows us choosing to run a different arc binary because you're running an arc command in an arc working copy, but not the arc from the local directory, so we run that one instead. :)