Page MenuHomePhabricator

Fix LiskMigrationIterator scaling in O(n^2)
ClosedPublic

Authored by michaeljs1990 on May 9 2016, 5:34 PM.
Tags
None
Referenced Files
F14019605: D15869.diff
Tue, Nov 5, 10:54 PM
F13992811: D15869.diff
Tue, Oct 22, 6:57 PM
F13981908: D15869.id38225.diff
Sat, Oct 19, 7:52 PM
F13959837: D15869.id38224.diff
Mon, Oct 14, 8:54 PM
Unknown Object (File)
Sep 12 2024, 8:30 AM
Unknown Object (File)
Sep 5 2024, 8:13 PM
Unknown Object (File)
Sep 4 2024, 5:09 PM
Unknown Object (File)
Aug 30 2024, 9:14 AM
Subscribers
Tokens
"The World Burns" token, awarded by yelirekim."Doubloon" token, awarded by madmik3."Pterodactyl" token, awarded by epriestley.

Details

Summary

When running migrations that insert more than 1000 rows at a time the PhutilChunkedIterator becomes exponentially slower at generating chunks. The code below was where I encountered this issue originally.

<?php

$message_table = new HarbormasterBuildUnitMessage();

$unit_test_table = new CIUnitTestHarbormasterBuildUnitMessage();
$conn_w = $unit_test_table->establishConnection('w');

$size = 5000;
$row_iter = id(new LiskMigrationIterator($message_table))->setPageSize($size);
$chunk_iter = new PhutilChunkedIterator($row_iter, $size);

// point 1
foreach ($chunk_iter as $chunk) {
  // point 2
  $sql = array();
  ...
}

where the time between the point 1 and point 2 comment exceed 1 second around the 5k mark and become unresponsive around 15k. PHP seems to do a very poor job at converting large arrays to boolean values as can be seen with this code sample. As I was testing this I noticed it's completely data dependent so it's possibly for arrays containg scalar values this is much more performant.

<?php

$root = dirname(__FILE__);
require_once $root.'/integrator/scripts/__init_script__.php';

$bigbig = $build_unit_messages = id(new HarbormasterBuildUnitMessage())
  ->loadAllWhere('id > %s LIMIT 1000000', 0);

$start = microtime(true);
(bool)$bigbig;
echo microtime(true) - $start;

This code on my computer causes 1.88 seconds to be output as opposed to 0.0000006 seconds when wrapping it in count.

My title claim is somewhat bold. In my case that was the issue but your millage may vary.

Test Plan

As seen above.

Diff Detail

Repository
rPHU libphutil
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

michaeljs1990 retitled this revision from to Fix LiskMigrationIterator scaling in O(n^2).
michaeljs1990 updated this object.
michaeljs1990 edited the test plan for this revision. (Show Details)

It looks like this problem is gone on php7 and hhvm based on this output https://3v4l.org/dP8LY#v5619

epriestley added a reviewer: epriestley.

My best guess at why this happens, which is completely not meaningfully backed up by any data, is that this is because casting creates a copy of the value first in older PHP:

zend_vm_def.h
	if (opline->extended_value != IS_STRING) {
		ZVAL_COPY_VALUE(result, expr);
		if (!IS_OP1_TMP_FREE()) {
			zendi_zval_copy_ctor(*result);
		}
	}

This isn't necessary for most casts, and appears to have a cost that it somewhat proportional to the size of the array. So PHP is actually doing something like this:

$result = "make a copy of" $x;
$result = (bool)$result;

Where "make a copy of" isn't nearly as expensive as clone but isn't O(1) either.

It looks like this is the commit that improved the behavior:

https://github.com/php/php-src/commit/9263d18bd9655a5f64f1956d9536e76e9ec22144

The if ($array) variation doesn't go through the same code and doesn't appear to be affected:

https://3v4l.org/kH5bc

We currently recommend if ($array) { over if (count($array)) { for simplicity/consistency, and it looks like that recommendation is fine to continue providing because it doesn't have this degenerate case. But we may want to be more careful about explicitly casting (bool)$array in the future.

Thanks for hunting this down!

This revision is now accepted and ready to land.May 9 2016, 7:16 PM

I added you to Blessed Committers so you should be able to land this yourself, see that project description for tips.

(I also added you to Community, granting you sweeping new powers to abuse.)

This revision was automatically updated to reflect the committed changes.