Fixes T5097. Currently, running arc lint --everything over a large database could OOM if the ArcanistXHPASTLinter is used because the XHPAST parse tree for each file is stored in memory, and is not cleared from memory even after the affected paths have finished linting.
Details
- Reviewers
epriestley - Group Reviewers
Blessed Reviewers - Maniphest Tasks
- T5097: `arc lint --everything` uses a large amount of memory in when executed in large repositories
Ran arc lint --everything over rP. It is still running (it is very slow, currently it has been running for 27 minutes), but it has been executing for longer than it would have without this change. More importantly, the memory usage on my machine is reasonably stable.
Diff Detail
- Repository
- rARC Arcanist
- Branch
- did_lint_path
- Lint
Lint Passed - Unit
Tests Passed - Build Status
Buildable 625 Build 625: [Placeholder Plan] Wait for 30 Seconds
Event Timeline
Here are some plots showing the memory usage before and after. The memory usage was measured using memory_get_usage() in ArcanistFutureLinter::didRunLinters (immediately after $this->didLintPath($path)).
@epriestley, what do you think? I'm bumping this because its blocking us from using arc lint as part of our CI process.
That blue line looks like it ends up just as bad as the orange one / is not stable?
Let me pop open the code and convince myself this is correct, on casual examination it looks like it might free trees too quickly.
@epriestley, I think that you are partially right. I added in some echo calls for debugging (diff updated to show this) and I get the following output from running arc lint on this branch:
> arc lint didLintPath: src/lint/engine/ArcanistLintEngine.php didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php didLintPath: src/lint/linter/ArcanistFutureLinter.php didLintPath: src/lint/linter/ArcanistLinter.php didLintPath: src/lint/engine/ArcanistLintEngine.php didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php didLintPath: src/lint/linter/ArcanistFutureLinter.php didLintPath: src/lint/linter/ArcanistLinter.php getXHPASTTreeForPath: src/lint/linter/ArcanistBaseXHPASTLinter.php didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php getXHPASTTreeForPath: src/lint/engine/ArcanistLintEngine.php didLintPath: src/lint/engine/ArcanistLintEngine.php getXHPASTTreeForPath: src/lint/linter/ArcanistFutureLinter.php didLintPath: src/lint/linter/ArcanistFutureLinter.php getXHPASTTreeForPath: src/lint/linter/ArcanistLinter.php didLintPath: src/lint/linter/ArcanistLinter.php getXHPASTTreeForPath: src/lint/engine/ArcanistLintEngine.php didLintPath: src/lint/engine/ArcanistLintEngine.php getXHPASTTreeForPath: src/lint/linter/ArcanistBaseXHPASTLinter.php didLintPath: src/lint/linter/ArcanistBaseXHPASTLinter.php getXHPASTTreeForPath: src/lint/linter/ArcanistFutureLinter.php didLintPath: src/lint/linter/ArcanistFutureLinter.php getXHPASTTreeForPath: src/lint/linter/ArcanistLinter.php didLintPath: src/lint/linter/ArcanistLinter.php OKAY No lint warnings.
The call to didLintPath in ArcanistLintEngine doesn't seem quite right.
Yeah ok... I don't think this will work. Currently, a linter is run on all of the paths before the next linter is invoked (sort of like a depth-first search). Since the XHPAST trees are reused amongst linters, it's not possible to free the trees until after all linters have run. To fix this, we would need to run all linters over a given path before moving to the next path (breadth first). I'm not sure if this would be easy to do.
Yeah, I think FutureLinters will still hold most of their results until the end. I think the best low-ish effort approach might be to handle the paths in chunks of, say, 128 paths. This will cause some unnecessary slowdown where we aren't running at full parallelism at the end of a chunk, but a number like 128 is high enough that normal, non---everything diffs won't hit it.
The LintWorkflow and LintEngine code is also really thorny right now. I started cleaning it up yesterday when looking at this, but didn't make much progress.