See D12501#125742 for some context. I think that the linter workflow could be made much more performant.
Description
Revisions and Commits
rARC Arcanist | |||
D12520 | rARC46ce8a5a35f9 Memoize paths | ||
D12519 | rARC9ddf37b9ee3e Improve performance of LINT_UNNECESSARY_SEMICOLON | ||
D12518 | rARCd6f4bded25ee Memoize PHP compatibility info |
Related Objects
- Mentioned In
- D10541: Split the `ArcanistXHPASTLinter` into modular rules
- Mentioned Here
- D12501: Split large path lists into blocks when linting
Event Timeline
Here is a better profile (F379444), see https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/.
I don't see any huuuuge wins, but:
This is only ~3% of runtime, but almost entirely memoizable:
This might be getting hit enough to make turning it into a loop worthwhile, although this is pretty nitpicky:
https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/?symbol=mpull
This is ~8ish% of runtime and probably has a faster implementation available (we can probably do a faster check than getSemanticString() to see if the statement is only a semicolon), or might not be worth spending 8% of runtime checking:
This is decoding php_compat_info.json for every file, but could memoize it, and does 50K calls to idx() which might be worth turning into isset():
This could do a faster check than getTokens() to abort, not sure if it would do much:
This should be memoized:
The profile likely overstates the cost of functions like idx() and mpull(), but occasionally they're worth reducing to isset() or similar if they're really getting hit tens of thousands of times.
I think php_compat_info.json is the biggest easy win.
All the file/path memoization is probably a close second.
This can probably build a single huge regexp and drop time to nearly zero:
That is, instead of:
foreach ($words as $word) { preg_match($word); }
...do:
preg_match("$word1|$word2|$word3|$word4")
Maybe with array_chunk() to do 1,000 at a time or something if preg_match() get upset with gigantic regexps.
I think php_compat_info.json is the biggest easy win.
Unless you can somehow make XHPAST incredibly cheap, of course, but I don't think there's too much water left in that stone.
Here's an updated profile
Direct link: https://secure.phabricator.com/xhprof/profile/PHID-FILE-qka36vkinukairqy7gk6/
Couple things offhand:
Looks like the spelling linter is now the most expensive linter (~5% of runtime) -- the preg_match() stuff can probably reduce that a lot:
This is probably memoizable, I think we're rebuilding the whole library map for each file chunk: