Page MenuHomePhabricator

Add a `didLintPaths` function to the `ArcanistLinter` class.
AbandonedPublic

Authored by joshuaspence on May 18 2014, 2:10 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 9, 5:44 PM
Unknown Object (File)
Sat, Nov 9, 7:34 AM
Unknown Object (File)
Oct 25 2024, 1:12 PM
Unknown Object (File)
Oct 16 2024, 12:58 PM
Unknown Object (File)
Sep 22 2024, 2:45 PM
Unknown Object (File)
Sep 15 2024, 10:42 AM
Unknown Object (File)
Sep 12 2024, 9:30 AM
Unknown Object (File)
Sep 12 2024, 9:30 AM
Subscribers

Details

Summary

Fixes T5097. Currently, running arc lint --everything over a large database could OOM if the ArcanistXHPASTLinter is used because the XHPAST parse tree for each file is stored in memory, and is not cleared from memory even after the affected paths have finished linting.

Test Plan

Ran arc lint --everything over rP. It is still running (it is very slow, currently it has been running for 27 minutes), but it has been executing for longer than it would have without this change. More importantly, the memory usage on my machine is reasonably stable.

Diff Detail

Repository
rARC Arcanist
Branch
did_lint_path
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 625
Build 625: [Placeholder Plan] Wait for 30 Seconds

Event Timeline

joshuaspence retitled this revision from to Add a `didLintPaths` function to the `ArcanistLinter` class..
joshuaspence updated this object.
joshuaspence edited the test plan for this revision. (Show Details)
joshuaspence added a reviewer: epriestley.

Here are some plots showing the memory usage before and after. The memory usage was measured using memory_get_usage() in ArcanistFutureLinter::didRunLinters (immediately after $this->didLintPath($path)).

didLintPaths.png (340×604 px, 26 KB)

@epriestley, what do you think? I'm bumping this because its blocking us from using arc lint as part of our CI process.

That blue line looks like it ends up just as bad as the orange one / is not stable?

Let me pop open the code and convince myself this is correct, on casual examination it looks like it might free trees too quickly.

joshuaspence edited edge metadata.

Add some echo statements to help illustrate the effect of this change.

@epriestley, I think that you are partially right. I added in some echo calls for debugging (diff updated to show this) and I get the following output from running arc lint on this branch:

> arc lint
didLintPath: src/lint/engine/ArcanistLintEngine.php
didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
didLintPath: src/lint/linter/ArcanistFutureLinter.php
didLintPath: src/lint/linter/ArcanistLinter.php
didLintPath: src/lint/engine/ArcanistLintEngine.php
didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
didLintPath: src/lint/linter/ArcanistFutureLinter.php
didLintPath: src/lint/linter/ArcanistLinter.php
getXHPASTTreeForPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
getXHPASTTreeForPath: src/lint/engine/ArcanistLintEngine.php
didLintPath: src/lint/engine/ArcanistLintEngine.php
getXHPASTTreeForPath: src/lint/linter/ArcanistFutureLinter.php
didLintPath: src/lint/linter/ArcanistFutureLinter.php
getXHPASTTreeForPath: src/lint/linter/ArcanistLinter.php
didLintPath: src/lint/linter/ArcanistLinter.php
getXHPASTTreeForPath: src/lint/engine/ArcanistLintEngine.php
didLintPath: src/lint/engine/ArcanistLintEngine.php
getXHPASTTreeForPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php
getXHPASTTreeForPath: src/lint/linter/ArcanistFutureLinter.php
didLintPath: src/lint/linter/ArcanistFutureLinter.php
getXHPASTTreeForPath: src/lint/linter/ArcanistLinter.php
didLintPath: src/lint/linter/ArcanistLinter.php
 OKAY  No lint warnings.

The call to didLintPath in ArcanistLintEngine doesn't seem quite right.

Yeah ok... I don't think this will work. Currently, a linter is run on all of the paths before the next linter is invoked (sort of like a depth-first search). Since the XHPAST trees are reused amongst linters, it's not possible to free the trees until after all linters have run. To fix this, we would need to run all linters over a given path before moving to the next path (breadth first). I'm not sure if this would be easy to do.

Yeah, I think FutureLinters will still hold most of their results until the end. I think the best low-ish effort approach might be to handle the paths in chunks of, say, 128 paths. This will cause some unnecessary slowdown where we aren't running at full parallelism at the end of a chunk, but a number like 128 is high enough that normal, non---everything diffs won't hit it.

The LintWorkflow and LintEngine code is also really thorny right now. I started cleaning it up yesterday when looking at this, but didn't make much progress.