Page MenuHomePhabricator

Fix LiskMigrationIterator scaling in O(n^2)
ClosedPublic

Authored by michaeljs1990 on May 9 2016, 5:34 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Nov 20, 8:25 PM
Unknown Object (File)
Sat, Nov 16, 12:36 PM
Unknown Object (File)
Mon, Nov 11, 11:52 PM
Unknown Object (File)
Sat, Nov 9, 11:50 AM
Unknown Object (File)
Sat, Nov 9, 11:50 AM
Unknown Object (File)
Sat, Nov 9, 11:50 AM
Unknown Object (File)
Fri, Nov 8, 9:06 AM
Unknown Object (File)
Tue, Nov 5, 10:54 PM
Subscribers
Tokens
"The World Burns" token, awarded by yelirekim."Doubloon" token, awarded by madmik3."Pterodactyl" token, awarded by epriestley.

Details

Summary

When running migrations that insert more than 1000 rows at a time the PhutilChunkedIterator becomes exponentially slower at generating chunks. The code below was where I encountered this issue originally.

<?php

$message_table = new HarbormasterBuildUnitMessage();

$unit_test_table = new CIUnitTestHarbormasterBuildUnitMessage();
$conn_w = $unit_test_table->establishConnection('w');

$size = 5000;
$row_iter = id(new LiskMigrationIterator($message_table))->setPageSize($size);
$chunk_iter = new PhutilChunkedIterator($row_iter, $size);

// point 1
foreach ($chunk_iter as $chunk) {
  // point 2
  $sql = array();
  ...
}

where the time between the point 1 and point 2 comment exceed 1 second around the 5k mark and become unresponsive around 15k. PHP seems to do a very poor job at converting large arrays to boolean values as can be seen with this code sample. As I was testing this I noticed it's completely data dependent so it's possibly for arrays containg scalar values this is much more performant.

<?php

$root = dirname(__FILE__);
require_once $root.'/integrator/scripts/__init_script__.php';

$bigbig = $build_unit_messages = id(new HarbormasterBuildUnitMessage())
  ->loadAllWhere('id > %s LIMIT 1000000', 0);

$start = microtime(true);
(bool)$bigbig;
echo microtime(true) - $start;

This code on my computer causes 1.88 seconds to be output as opposed to 0.0000006 seconds when wrapping it in count.

My title claim is somewhat bold. In my case that was the issue but your millage may vary.

Test Plan

As seen above.

Diff Detail

Repository
rPHU libphutil
Branch
strange (branched from master)
Lint
Lint Passed
Unit
Tests Skipped
Build Status
Buildable 12099
Build 15248: arc lint + arc unit

Event Timeline

michaeljs1990 retitled this revision from to Fix LiskMigrationIterator scaling in O(n^2).
michaeljs1990 updated this object.
michaeljs1990 edited the test plan for this revision. (Show Details)

It looks like this problem is gone on php7 and hhvm based on this output https://3v4l.org/dP8LY#v5619

epriestley added a reviewer: epriestley.

My best guess at why this happens, which is completely not meaningfully backed up by any data, is that this is because casting creates a copy of the value first in older PHP:

zend_vm_def.h
	if (opline->extended_value != IS_STRING) {
		ZVAL_COPY_VALUE(result, expr);
		if (!IS_OP1_TMP_FREE()) {
			zendi_zval_copy_ctor(*result);
		}
	}

This isn't necessary for most casts, and appears to have a cost that it somewhat proportional to the size of the array. So PHP is actually doing something like this:

$result = "make a copy of" $x;
$result = (bool)$result;

Where "make a copy of" isn't nearly as expensive as clone but isn't O(1) either.

It looks like this is the commit that improved the behavior:

https://github.com/php/php-src/commit/9263d18bd9655a5f64f1956d9536e76e9ec22144

The if ($array) variation doesn't go through the same code and doesn't appear to be affected:

https://3v4l.org/kH5bc

We currently recommend if ($array) { over if (count($array)) { for simplicity/consistency, and it looks like that recommendation is fine to continue providing because it doesn't have this degenerate case. But we may want to be more careful about explicitly casting (bool)$array in the future.

Thanks for hunting this down!

This revision is now accepted and ready to land.May 9 2016, 7:16 PM

I added you to Blessed Committers so you should be able to land this yourself, see that project description for tips.

(I also added you to Community, granting you sweeping new powers to abuse.)

This revision was automatically updated to reflect the committed changes.