Page MenuHomePhabricator

`arc lint --everything` uses a large amount of memory in when executed in large repositories
Closed, ResolvedPublic

Description

I was running arc lint --apply-patches --everything in rP and it eventually ran out of memory (my laptop has 8GB of RAM) and failed with:

Some linters failed:
    - ArcanistXHPASTLinter: Exception: Failed to proc_open(): proc_open(): fork failed - Cannot allocate memory
    - ArcanistPhutilXHPASTLinter: Exception: Failed to proc_open(): proc_open(): fork failed - Cannot allocate memory

This doesn't seem correct and I suspect that this is due to the way that the output is being rendered (since the output produced by arc lint seems to be buffered and output only once linting has completed).

Event Timeline

joshuaspence assigned this task to epriestley.
joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description. (Show Details)
joshuaspence added a project: Arcanist.
joshuaspence added a subscriber: joshuaspence.

I suspect buffering the message isn't a very big deal, but buffering the XHPAST parse trees probably is.

The quick fix here is probably:

  • after all linters finish operating on a path, invoke didFinishLinting($path) or similar;
  • In that method, throw away the parse tree if we're an XHPAST linter.

Just bringing in these related comments:

Yeah ok... I don't think this will work. Currently, a linter is run on all of the paths before the next linter is invoked (sort of like a depth-first search). Since the XHPAST trees are reused amongst linters, it's not possible to free the trees until after all linters have run. To fix this, we would need to run all linters over a given path before moving to the next path (breadth first). I'm not sure if this would be easy to do.

Yeah, I think FutureLinters will still hold most of their results until the end. I think the best low-ish effort approach might be to handle the paths in chunks of, say, 128 paths. This will cause some unnecessary slowdown where we aren't running at full parallelism at the end of a chunk, but a number like 128 is high enough that normal, non---everything diffs won't hit it.

The LintWorkflow and LintEngine code is also really thorny right now. I started cleaning it up yesterday when looking at this, but didn't make much progress.

@epriestley, I guess a way to describe the way things are currently working is as linter-first (lint all paths through a single linter and then progress to the next linter).

Would it be at all reasonable to change this to be path-first (lint a single path through all linters and then progress to the next linter)? It seems to me that this would help here.

In the case where you are running one linter (which is essentially true with Phabricator, since XHPAST dominates performance) that will essentially remove all parallelism, since we'll start one path, wait for it to finish, analyze it, start a second path, wait for it to finish, analyze it, etc. So normal arc lint will get 4x or 8x slower.

In the best case where you have multiple future-based linters they can all run in parallel for each path, which will save a bit of time, but you're still waiting for the slowest one to finish, so this is still probably much worse than the current approach.

The "do it in chunks" approach tries to balance these concerns without requiring us to rewrite everything.

Oh OK, that makes sense...

I guess that really this problem (memory usage) is only an issue for ArcanistBaseXHPASTLinter linters. Specifically, I think that this problem arose in D9059. Do you agree? If this is the case, then we need to find a better compromise between time and space. We are essentially just storing the XHPAST trees so that they don't have to be regenerated later, if I understand correctly.

I think this was an issue before D9059 -- prior to that we still held the tree in memory, the objects just accessed it differently. Particularly, I think T35 (the oldest open bug) is probably largely the same issue.

OK, quite possibly. Surely, however, paying a significant performance penalty is better than exhausting all available RAM?

Well, you can fake chunking with something like this, instead of running --everything directly:

git ls-tree -r --name-only HEAD | xargs -n128 arc lint --lintall --never-apply-patches

So I think this is mostly an ease-of-administration issue: the per-path approach would slow down every user to save administrators a little work in large repositories with memory-hungry linters. That doesn't seem like a great tradeoff. I do think the current behavior is bad, but don't want to make the common behavior way worse for everyone in order to fix it.

Well, you can fake chunking with something like this, instead of running --everything directly:

git ls-tree -r --name-only HEAD | xargs -n128 arc lint --lintall --never-apply-patches

Almost, but not quite. In my particular application, the output from arc lint will be valid XML (i.e. arc lint --output xml). In any case (but specifically in my case), the output from arc lint --lintall --never-apply-patches {subset 1 of $files} + arc lint --lintall --never-apply-patches {subset ... of $files} + arc lint --lintall --never-apply-patches {subset N of $files} is not identical to the output of arc lint --lintall --never-apply-patches {$files}.

joshuaspence renamed this task from `arc lint --everything` uses a large amount of memory in when executed in large repositories. to `arc lint --everything` uses a large amount of memory in when executed in large repositories.Jul 10 2014, 9:02 PM

Would this task be available for paid prioritization?

Yeah, but I'd have to take another look at the code to give you an estimate since I don't remember exactly how much stuff is standing in our way. I'm headed to bed pretty soon but can take a gander tomorrow morning.

Can you remind me why we lint by linter instead of by path? As in, we lint all paths for a given linter before moving to the next linter.

I feel that the memory usage could be further improved if we linted by path, because we would only need to store a single XHPAST tree in memory. This means that we wouldn't need chunking.

I feel that the memory usage could be further improved if we linted by path, because we would only need to store a single XHPAST tree in memory. This means that we wouldn't need chunking.

This would also drop parallelism from 8 (or whatever) to 1, making the process 8 times slower.

You can change the chunk size constant to 1 to see the effect -- I expect it will lower memory usage (but not provide a huge reduction in peak, which I think is approximately a function of the largest file in the repository at these chunk sizes) but be much slower overall.

@epriestley, would you be open to making the chunk size configurable? Changing the chunk size from 32 to 512 reduced the amount of time required for us to lint ~1500 TypeScript files from 370 seconds to 66 seconds.