Page MenuHomePhabricator

Split large path lists into blocks when linting

Authored by epriestley on Apr 22 2015, 10:41 AM.



Fixes T5097. When linting a large list of paths (e.g., with --everything), do this internally:

$chunks = array_chunk($paths, 32);
foreach ($chunks as $chunk) {

This keeps the advantages of parallelism and artifact sharing for future-based linters, without having memory usage grow in an unbounded way.

These callbacks changed:

  • willLintPath(): Useless, no meaningful implementations. Internalized the required side effect and broke the hook.
  • didRunLinters(): Now useless, with no meaningful implementations. Broke the hook.
  • didLintPaths(): New hook which executes opposite willLintPaths().
  • lintPath(): Linters no longer need to implement this method.

XHPAST now has an explicit way to release shared futures.

These minor changes also happened:

  • Formalized the "linter ID", which is a semi-durable identifier for the cache.
  • Removed linter -> exception explicit mapping, which was unused. We now just collect exceptions.
  • We do the canRun() checks first (and separately) now.
  • Share more service call profiling code.
  • Fix an issue where the test harness would use the path on disk, even if configuration set a different path.
Test Plan
  • Ran arc lint --everything in arcanist/.
    • With no chunking, saw unstable memory usage with a peak at 941 MB.
    • With chunk size 32, saw stable memory usage with a peak at 269 MB.
    • With chunk size 8, saw stable memory usage with a peak at 180 MB.
  • Ran with --trace and saw profiling information.
  • Created this diff.

Diff Detail

rARC Arcanist
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley retitled this revision from to Split large path lists into blocks when linting.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: joshuaspence.
  • Small reorganization of new functions.
joshuaspence accepted this revision.Apr 22 2015, 11:59 AM
joshuaspence edited edge metadata.

A couple of minor inlines.

I tested this on rP and noticed one unfortunate performance issue.

> time arc lint --everything --never-apply-patches
real    19m26.750s
user    16m48.589s
sys 3m58.260s

> time git ls-tree -r --name-only HEAD | xargs -n128 arc lint --lintall --never-apply-patches
real    9m28.819s
user    9m20.757s
sys 1m7.596s

This is essentially just mfilter($linters, 'canRun').


Maybe add assert_instances_of($runnable, 'ArcanistLinter')


We can worry about this later, but it might be nice to expose the chunk size as a parameter. Particularly for repository with very large files, or machines with very limited memory.


There could be multiple linters of the same class configured, so maybe it would be better to use $linter->getID()?


I'm not sure I understand this change (just adding return).


It probably doesn't provide much value since subclasses could override this method, but we could add assert_instances_of($futures, 'Future') here.


Should be @{class:ArcanistLintEngine}


As above.


Missing final

This revision is now accepted and ready to land.Apr 22 2015, 11:59 AM
epriestley marked 3 inline comments as done.Apr 22 2015, 12:12 PM
epriestley added inline comments.

Yeah, I think at some point we could add some getChunkSizeLimit() method to Linter and use the minimum value from all of the runnable linters, with the default returning null to mean "use whatever the default behavior is".


Linter IDs aren't always meaningful and --trace is primarily consumed by humans, but it's possible that there are reasons why that might be a better choice.


purely stylistic

epriestley edited edge metadata.
  • Fix typos/references/missing final.

Ghost inlines feel pretty good on this diff.

I can probably do something about the performance, but don't have an XHProf build for whatever version of PHP I have on the CLI at the moment so the --xprofile flag isn't active. I'll take a look at that when I get a chance. 2X isn't good but is a step up from "runs out of RAM and exits".

This revision was automatically updated to reflect the committed changes.

I'm playing around with profiling this now... is there a way to load data into the XHProf application?

Hmm... I uploaded my profile as F378852 but throws a "Failed to unserialize XHProf profile!" exception because its not valid JSON.

That profile just shows us choosing to run a different arc binary because you're running an arc command in an arc working copy, but not the arc from the local directory, so we run that one instead. :)

(But the profile itself is mechanically correct and would otherwise be useful.)