
Investigate lint performance issues
Open, Wishlist, Public

Description

See D12501#125742 for some context. I think that the linter workflow could be made much more performant.

Revisions and Commits

Event Timeline

joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description.
joshuaspence added a project: Lint.
joshuaspence added a subscriber: joshuaspence.

It looks like the most significant function is AASTNode::getTokens.

I don't see any huuuuge wins, but:

This is only ~3% of runtime, but almost entirely memoizable:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/?symbol=Filesystem::resolvePath
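A minimal sketch of the memoization idea, using a hypothetical wrapper function rather than Filesystem itself (the real fix would live inside Filesystem::resolvePath):

```php
// Hypothetical wrapper illustrating the memoization: cache resolved paths
// per (cwd, path) pair so repeated lookups skip the work entirely.
function resolve_path_cached($path, $cwd = null) {
  static $cache = array();
  $key = $cwd.'|'.$path;
  if (!isset($cache[$key])) {
    // Stand-in for the real resolution logic in Filesystem::resolvePath.
    $base = ($cwd === null) ? getcwd() : $cwd;
    $cache[$key] = (strlen($path) && $path[0] === '/')
      ? $path
      : rtrim($base, '/').'/'.$path;
  }
  return $cache[$key];
}
```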

This might be getting hit enough to make turning it into a loop worthwhile, although this is pretty nitpicky:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/?symbol=mpull

This is ~8% of runtime and probably has a faster implementation available (we could probably use a faster check than getSemanticString() to see whether the statement is only a semicolon), or it might not be worth spending 8% of runtime on this check:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/?symbol=ArcanistXHPASTLinter::lintUnnecessarySemicolons

This decodes php_compat_info.json for every file, but could memoize it. It also makes ~50K calls to idx(), which might be worth turning into isset():

https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/?symbol=ArcanistXHPASTLinter::lintPHPCompatibility
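A sketch of both fixes using hypothetical helpers (the real code lives in ArcanistXHPASTLinter::lintPHPCompatibility): decode php_compat_info.json once per process instead of once per file, and use isset() instead of idx() on the hot lookup path. The cache here assumes a single compat file per process, so it ignores the path argument after the first call.

```php
// Hypothetical helper: decode the compatibility map once per process.
function get_compat_info($json_path) {
  static $info = null;
  if ($info === null) {
    $info = json_decode(file_get_contents($json_path), true);
  }
  return $info;
}

// Hot-path lookup: an inline isset() avoids tens of thousands of idx()
// function-call overheads.
function min_php_version($json_path, $function_name) {
  $info = get_compat_info($json_path);
  return isset($info['functions'][$function_name])
    ? $info['functions'][$function_name]
    : null;
}
```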

This could use a faster check than getTokens() to abort early; I'm not sure it would do much:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/?symbol=ArcanistXHPASTLinter::lintBraceFormatting

This should be memoized:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/?symbol=ArcanistLinter::getPaths

The profile likely overstates the cost of functions like idx() and mpull(), but occasionally they're worth reducing to isset() or similar if they're really getting hit tens of thousands of times.

I think php_compat_info.json is the biggest easy win.

All the file/path memoization is probably a close second.

This can probably build a single huge regexp and drop time to nearly zero:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-k7tcfv6t7e7o3itxfwns/?symbol=ArcanistSpellingLinter::checkExactWord

That is, instead of:

foreach ($words as $word) {
  preg_match("/$word/", $text);
}

...do:

preg_match("/$word1|$word2|$word3|$word4/", $text)

Maybe with array_chunk() to do 1,000 at a time or something if preg_match() gets upset with gigantic regexps.

Those regexps can also be memoized.
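Putting the pieces together, a hedged sketch of the chunked, memoized combined regexp; the function names here are hypothetical, not the actual ArcanistSpellingLinter API:

```php
// Build one alternation regexp per chunk of words, memoized per word list,
// instead of running one preg_match() per word.
function build_word_regexps(array $words, $chunk_size = 1000) {
  static $cache = array();
  $key = md5(implode("\0", $words));
  if (!isset($cache[$key])) {
    $regexps = array();
    foreach (array_chunk($words, $chunk_size) as $chunk) {
      $quoted = array_map('preg_quote', $chunk);
      $regexps[] = '/\b(?:'.implode('|', $quoted).')\b/i';
    }
    $cache[$key] = $regexps;
  }
  return $cache[$key];
}

// One preg_match() per chunk (usually one total) instead of per word.
function match_any_word(array $words, $text) {
  foreach (build_word_regexps($words) as $regexp) {
    if (preg_match($regexp, $text)) {
      return true;
    }
  }
  return false;
}
```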

I think php_compat_info.json is the biggest easy win.

Unless you can somehow make XHPAST incredibly cheap, of course, but I don't think there's too much water left in that stone.

Here's an updated profile, this time from linting rP:

Couple things offhand:

Looks like the spelling linter is now the most expensive linter (~5% of runtime) -- the preg_match() stuff can probably reduce that a lot:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-qka36vkinukairqy7gk6/?symbol=ArcanistSpellingLinter::lintPath

This is probably memoizable; I think we're rebuilding the whole library map for each file chunk:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-qka36vkinukairqy7gk6/?symbol=PhutilLibraryMapBuilder::buildFileSymbolMap

btrahan triaged this task as Wishlist priority. Apr 23 2015, 5:39 PM
btrahan added a subscriber: btrahan.