Page MenuHomePhabricator

Add a `didLintPaths` function to the `ArcanistLinter` class.
AbandonedPublic

Authored by joshuaspence on May 18 2014, 2:10 PM.

Details

Summary

Fixes T5097. Currently, running arc lint --everything over a large database could OOM if the ArcanistXHPASTLinter is used because the XHPAST parse tree for each file is stored in memory, and is not cleared from memory even after the affected paths have finished linting.

Test Plan

Ran arc lint --everything over rP. It is still running (it is very slow, currently it has been running for 27 minutes), but it has been executing for longer than it would have without this change. More importantly, the memory usage on my machine is reasonably stable.

Diff Detail

Repository
rARC Arcanist
Branch
did_lint_path
Lint
Lint OK
Unit
Unit Tests OK
Build Status
Buildable 625
Build 625: [Placeholder Plan] Wait for 30 Seconds

Event Timeline

joshuaspence retitled this revision from to Add a `didLintPaths` function to the `ArcanistLinter` class..
joshuaspence updated this object.
joshuaspence edited the test plan for this revision. (Show Details)
joshuaspence added a reviewer: epriestley.

Here are some plots showing the memory usage before and after. The memory usage was measured using memory_get_usage() in ArcanistFutureLinter::didRunLinters (immediately after $this->didLintPath($path)).

This comment was removed by joshuaspence.

@epriestley, what do you think? I'm bumping this because its blocking us from using arc lint as part of our CI process.

epriestley edited edge metadata.May 20 2014, 10:58 PM

That blue line looks like it ends up just as bad as the orange one / is not stable?

Let me pop open the code and convince myself this is correct, on casual examination it looks like it might free trees too quickly.

joshuaspence edited edge metadata.

Add some echo statements to help illustrate the effect of this change.

@epriestley, I think that you are partially right. I added in some echo calls for debugging (diff updated to show this) and I get the following output from running arc lint on this branch:

> arc lint
didLintPath: src/lint/engine/ArcanistLintEngine.php
didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
didLintPath: src/lint/linter/ArcanistFutureLinter.php
didLintPath: src/lint/linter/ArcanistLinter.php
didLintPath: src/lint/engine/ArcanistLintEngine.php
didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
didLintPath: src/lint/linter/ArcanistFutureLinter.php
didLintPath: src/lint/linter/ArcanistLinter.php
getXHPASTTreeForPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
getXHPASTTreeForPath: src/lint/engine/ArcanistLintEngine.php
didLintPath: src/lint/engine/ArcanistLintEngine.php
getXHPASTTreeForPath: src/lint/linter/ArcanistFutureLinter.php
didLintPath: src/lint/linter/ArcanistFutureLinter.php
getXHPASTTreeForPath: src/lint/linter/ArcanistLinter.php
didLintPath: src/lint/linter/ArcanistLinter.php
getXHPASTTreeForPath: src/lint/engine/ArcanistLintEngine.php
didLintPath: src/lint/engine/ArcanistLintEngine.php
getXHPASTTreeForPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
getXHPASTTreeForPath: src/lint/linter/ArcanistFutureLinter.php
didLintPath: src/lint/linter/ArcanistFutureLinter.php
getXHPASTTreeForPath: src/lint/linter/ArcanistLinter.php
didLintPath: src/lint/linter/ArcanistLinter.php
 OKAY  No lint warnings.

The call to didLintPath in ArcanistLintEngine doesn't seem quite right.

joshuaspence abandoned this revision.May 21 2014, 11:28 AM

Yeah ok... I don't think this will work. Currently, a linter is run on all of the paths before the next linter is invoked (sort of like a depth-first search). Since the XHPAST trees are reused amongst linters, it's not possible to free the trees until after all linters have run. To fix this, we would need to run all linters over a given path before moving to the next path (breadth first). I'm not sure if this would be easy to do.

Yeah, I think FutureLinters will still hold most of their results until the end. I think the best low-ish effort approach might be to handle the paths in chunks of, say, 128 paths. This will cause some unnecessary slowdown where we aren't running at full parallelism at the end of a chunk, but a number like 128 is high enough that normal, non---everything diffs won't hit it.

The LintWorkflow and LintEngine code is also really thorny right now. I started cleaning it up yesterday when looking at this, but didn't make much progress.