Some profiling using XHProf in the Dark Console showed me that Diffusion was wasting a ton of time on array_merge. This change sped up the loading of a large file in Diffusion from 16.8 seconds to 2.4 seconds.
Details
- Reviewers
epriestley - Group Reviewers
Blessed Reviewers - Commits
- Restricted Diffusion Commit
rP9d0d1ac42f8a: Speed up DiffusionBrowseFileController by removing call to array_merge
Load files in Diffusion. They all look good. Also, use a PHP shell to try to manually verify that I still kinda remember some PHP and, yes, this is functionally equivalent to what was there before.
Diff Detail
- Repository
- rP Phabricator
- Branch
- array_merge
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 334 Build 334: [Placeholder Plan] Wait for 30 Seconds
Event Timeline
Really nice catch!
Update libphutil/ to clear lint. Those symbols were introduced in a more recent version. If you can't update /usr/local/libphutil, you can put development copies of arcanist/ and libphutil/ next to one another somewhere, then run that version of arc to get clean lint.
This will probably perform even faster:
$rows = array(); foreach ($stuff as $thing) { // ... $rows[] = $this->renderInlines(...); // .. } return array_mergev($rows);
Can you try that?
Oh look at this helpful documentation that some kind soul (who must spend all his time frustrated with PHP) wrote! https://secure.phabricator.com/book/phabflavor/article/php_pitfalls/
Thanks, will update this diff.
Wait, how does your suggestion work? This line means that $rows would be a mix of array objects and phutil_tag objects, so (I think that) you can't pass that into array_mergev.
I also don't know anything about the internals of PHP, but I wouldn't be surprised if no-op'ing on empty arrays is faster than building a giant array where half the entries are empty arrays, and then calling array_mergev. (In general, I'd expect very few lines to have inlines in Diffusion -- those should just be stuff like lint warnings, which should hopefully be rare.)
Oh, you're right -- a cleaner construction is probably:
foreach ($cur_inlines as $cur_inline) { $rows[] = $cur_inline; } // .. } return $rows;
The problem with this "merge rarely" is that array_merge() copies both arguments, so if we have a pathological case (say, 100 inlines) we'll copy every row 100 times (or maybe 50 times on average or whatever). Not as bad as copying every row 5000 times (or however many lines are in the file) but still not good.
Closed by commit rP9d0d1ac42f8a (authored by Jacob Hurwitz <jhurwitz@dropbox.com>, committed by @epriestley).