
phutil_utf8v_combined consumes great gobs of memory, changeset parsing fails
Closed, Resolved · Public

Description

We have one lingering, repeatedly failing task that, when it runs, burns a CPU for a long time and consumes memory until it eventually fails with ENOMEM.

Symptoms are similar to T11743, though the exact pattern differs, and it seems to be in a different part of the code. (And D16682 doesn't fix this one.) Crucially for my personal immediate happiness, this one doesn't cause the system to go unavailable, or not as conspicuously so -- either because of the subtle differences in the symptoms, or just because we happen to have only one affected task instead of a posse of them.

In brief: the task consumes tons of memory while computing various diffs in a Differential revision. Ultimately it fails when it tries to fork a process (a simple diff, it looks like) after already getting up to about 16GB of memory. The most painful case for the code appears to be computing a nice intraline diff on a giant, multi-MB line.

I should go home and eat dinner and things, but I may come back tomorrow and attempt a repro. Or perhaps these traces will be enough already to debug the issue and make a fix.


TRACES

The backtrace from SIGHUP, in the middle of the hang while memory use is growing, looks like this (full copy below):

#2 /phab_path/libphutil/src/utils/utf8.php(789): phutil_utf8v_combine_characters(Array)
#3 /phab_path/arcanist/src/difference/ArcanistDiffUtils.php(160): phutil_utf8v_combined('{"version":3,"s...')
#4 /phab_path/arcanist/src/difference/ArcanistDiffUtils.php(59): ArcanistDiffUtils::computeIntralineEdits('{"version":3,"s...', '{"version":3,"s...')
#5 /phab_path/phabricator/src/applications/differential/parser/DifferentialHunkParser.php(347): ArcanistDiffUtils::generateIntralineDiff('{"version":3,"s...', '{"version":3,"s...')
#6 /phab_path/phabricator/src/applications/differential/parser/DifferentialChangesetParser.php(745): DifferentialHunkParser->generateIntraLineDiffs()
#7 /phab_path/phabricator/src/applications/differential/parser/DifferentialChangesetParser.php(645): DifferentialChangesetParser->process()

The output from --trace has several iterations like this:

>>> [75] <exec> $ pygmentize -O encoding=utf-8 -O stripnl=False -f html -l 'html+evoque'
>>> [76] <exec> $ pygmentize -O encoding=utf-8 -O stripnl=False -f html -l 'html+evoque'
<<< [75] <exec> 107,102 us
<<< [76] <exec> 104,431 us
>>> [77] <exec> $ diff '-bw' '-U65535' '-L' '/dev/universe 9999-99-99' '-L' '/dev/universe 9999-99-99' '/tmp/beqle54vuq88g8so/21297-Q4vlwR' '/tmp/dhoisyi69zk84csk/21297-c2v0aA'
<<< [77] <exec> 8,708 us
>>> [78] <event> differential.willMarkGenerated <listeners = 1>
<<< [78] <event> 266 us

Then it does this, and then hangs for a while, spinning the CPU:

>>> [79] <exec> $ pygmentize -O encoding=utf-8 -O stripnl=False -f html -l 'ts'
>>> [80] <exec> $ pygmentize -O encoding=utf-8 -O stripnl=False -f html -l 'ts'
<<< [79] <exec> 111,197 us
<<< [80] <exec> 107,945 us
>>> [81] <event> differential.willMarkGenerated <listeners = 1>
<<< [81] <event> 257 us
>>> [82] <exec> $ diff '-bw' '-U65535' '-L' '/dev/universe 9999-99-99' '-L' '/dev/universe 9999-99-99' '/tmp/4rxajs43r7acscsg/21297-pSmAQk' '/tmp/10x24j128vjkoc8k/21297-a0nqz5'
<<< [82] <exec> 161,331 us

At this point it's using about 3GB of memory, and the number isn't growing.

Then after a few minutes it actually prints another pair of lines, and after a few more minutes another:

>>> [83] <event> differential.willMarkGenerated <listeners = 1>
<<< [83] <event> 420 us

During this time there are bursts of growing memory use, and periods of stable memory use. It's continuously burning one core of CPU.

And a few more minutes later, we get to the end, including an exception where fork fails for lack of memory. At this point the process has had stable memory use for a while -- at nearly 16GiB, nearly half the memory on the machine, which is presumably why the kernel says "no thanks" to forking that.

Handily, this exception comes with its own backtrace:

>>> [84] <event> differential.willMarkGenerated <listeners = 1>
<<< [84] <event> 420 us
>>> [85] <exec> $ diff '-bw' '-U65535' '-L' '/dev/universe 9999-99-99' '-L' '/dev/universe 9999-99-99' '/tmp/a0kgzfdxruo0k4wg/21297-dDSlRR' '/tmp/diolz5sd7e8sk8o8/21297-rZbt9D'
>>> [86] <query> UPDATE `worker_activetask` SET `failureTime` = '1475805673', `taskClass` = 'PhabricatorApplicationTransactionPublishWorker', `leaseOwner` = NULL, `leaseExpires` = '1475805400', `failureCount` = '1754', `dataID` = '29804072', `priority` = '1000', `objectPHID` = 'PHID-DREV-olkoii564vunfq2hdkmw', `id` = '29804024' WHERE `id` = '29804024'
<<< [86] <query> 4,992 us
[2016-10-07 02:01:13] EXCEPTION: (Exception) Failed to `proc_open()`: proc_open(): fork failed - Cannot allocate memory at [<phutil>/src/future/exec/ExecFuture.php:675]
arcanist(head=master, ref.master=d455ddde6a9c), phabricator(head=master, ref.master=fd5825fafdcb, custom=1), phutil(head=master, ref.master=6c65998b0b1c), pytest-phabricator(head=1550854ff89dca474176d307f6a485cd5eff0452, ref.master=1939f1e4f656)
  #0 ExecFuture::isReady() called at [<phutil>/src/future/Future.php:37]
  #1 Future::resolve() called at [<phutil>/src/future/exec/execx.php:37]
  #2 exec_manual(string, array, TempFile, TempFile) called at [<phabricator>/src/infrastructure/diff/PhabricatorDifferenceEngine.php:100]
  #3 PhabricatorDifferenceEngine::generateRawDiffFromFileContent(string, string) called at [<phabricator>/src/infrastructure/diff/PhabricatorDifferenceEngine.php:163]
  #4 PhabricatorDifferenceEngine::generateChangesetFromFileContent(string, string) called at [<phabricator>/src/applications/differential/parser/DifferentialChangesetParser.php:710]
  #5 DifferentialChangesetParser::process() called at [<phabricator>/src/applications/differential/parser/DifferentialChangesetParser.php:645]
[... and then the same as from SIGHUP below.]

By my reading of these traces, the point where the exception is thrown isn't the main code at fault -- the real problem is that we're allocating such a huge amount of memory in the earlier phases.

Looking on the web at the diff the --trace output seems to refer to, it contains several large files that look like generated, minified CSS and JS. In particular, these have some gigantic lines which have changed. On the web we do actually seem to render most of the changesets successfully, though I notice some busy apache2 processes for a while when I load the page... aha, in fact there's one file it tries to render and times out on, with the "Loading..." box sticking around forever after the apache2 process stops.

That file is a single line of 7,387,356 bytes (in both the old and new versions): a giant JSON document with a .js.map extension that looks like a generated sourcemap. There are a few diffs scattered through it.


YET MORE DETAIL

Full backtrace from SIGHUP:

#0 /phab_path/libphutil/src/future/exec/PhutilSignalRouter.php(61): PhutilBacktraceSignalHandler->handleSignal(Object(PhutilSignalRouter), 1)
#1 /phab_path/libphutil/src/utils/utf8.php(832): PhutilSignalRouter->routeSignal(1)
#2 /phab_path/libphutil/src/utils/utf8.php(789): phutil_utf8v_combine_characters(Array)
#3 /phab_path/arcanist/src/difference/ArcanistDiffUtils.php(160): phutil_utf8v_combined('{"version":3,"s...')
#4 /phab_path/arcanist/src/difference/ArcanistDiffUtils.php(59): ArcanistDiffUtils::computeIntralineEdits('{"version":3,"s...', '{"version":3,"s...')
#5 /phab_path/phabricator/src/applications/differential/parser/DifferentialHunkParser.php(347): ArcanistDiffUtils::generateIntralineDiff('{"version":3,"s...', '{"version":3,"s...')
#6 /phab_path/phabricator/src/applications/differential/parser/DifferentialChangesetParser.php(745): DifferentialHunkParser->generateIntraLineDiffs()
#7 /phab_path/phabricator/src/applications/differential/parser/DifferentialChangesetParser.php(645): DifferentialChangesetParser->process()
#8 /phab_path/phabricator/src/applications/differential/parser/DifferentialChangesetParser.php(859): DifferentialChangesetParser->tryCacheStuff()
#9 /phab_path/phabricator/src/applications/differential/mail/DifferentialChangeDetailMailView.php(53): DifferentialChangesetParser->render()
#10 /phab_path/phabricator/src/applications/differential/editor/DifferentialTransactionEditor.php(1424): DifferentialChangeDetailMailView->buildMailSection()
#11 /phab_path/phabricator/src/applications/differential/editor/DifferentialTransactionEditor.php(1271): DifferentialTransactionEditor->appendChangeDetailsForMail(Object(DifferentialRevision), Object(DifferentialDiff), 'Index: arp/arp/...', Object(PhabricatorMetaMTAMailBody))
#12 /phab_path/phabricator/src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php(2511): DifferentialTransactionEditor->buildMailBody(Object(DifferentialRevision), Array)
#13 /phab_path/phabricator/src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php(2468): PhabricatorApplicationTransactionEditor->buildMailForTarget(Object(DifferentialRevision), Array, Object(PhabricatorMailTarget))
#14 /phab_path/phabricator/src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php(1150): PhabricatorApplicationTransactionEditor->buildMail(Object(DifferentialRevision), Array)
#15 /phab_path/phabricator/src/applications/transactions/worker/PhabricatorApplicationTransactionPublishWorker.php(21): PhabricatorApplicationTransactionEditor->publishTransactions(Object(DifferentialRevision), Array)
#16 /phab_path/phabricator/src/infrastructure/daemon/workers/PhabricatorWorker.php(123): PhabricatorApplicationTransactionPublishWorker->doWork()
#17 /phab_path/phabricator/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php(171): PhabricatorWorker->executeTask()
#18 /phab_path/phabricator/src/infrastructure/daemon/workers/management/PhabricatorWorkerManagementExecuteWorkflow.php(52): PhabricatorWorkerActiveTask->executeTask()
#19 /phab_path/libphutil/src/parser/argument/PhutilArgumentParser.php(438): PhabricatorWorkerManagementExecuteWorkflow->execute(Object(PhutilArgumentParser))
#20 /phab_path/libphutil/src/parser/argument/PhutilArgumentParser.php(330): PhutilArgumentParser->parseWorkflowsFull(Array)
#21 /phab_path/phabricator/scripts/setup/manage_worker.php(21): PhutilArgumentParser->parseWorkflows(Array)
#22 {main}

Event Timeline

I'll look at this tomorrow but can probably figure it out from this. Initial guess is that we're doing this (which is O(big constant * string length)):

$amount_of_stuff = count(phutil_utf8v_combined($input));
if ($amount_of_stuff > $huge) {
  return; // Give up.
}

Instead, we can do this first (which is O(1)):

$rough_amount_of_stuff = strlen($input);
if ($rough_amount_of_stuff > $some_number_like_four * $huge) {
  return; // Give up.
}

...

That said, I'm surprised a ~7MB input requires 16GB of memory to process -- PHP is bad with this stuff, but not normally that bad -- so there may be something else afoot here.

(Is the line in question all normal boring ASCII text?)

It looks like this is largely just PHP being PHP. To handle UTF8 combining characters correctly, we're producing a list of strings like this internally as the atoms of comparison:

array(
  "<latin letter a> <combining umlaut> <combining circumflex>",
  "<latin letter q>",
  "<picture of a snowflake>",
  ...
);

For an 8MB input of ASCII text (that is, all single-byte UTF8 characters) this list has 8,388,608 elements and requires approximately 1.5GB of RAM. That works out to roughly 177 bytes per element on a 64-bit system, which is in the realm of reasonable for PHP (each zval carries reference counting and type information, and each array element adds key storage and hash bucket metadata on top of that).
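
For a tiny concrete example of that combining behavior (assuming phutil_utf8v_combined() behaves as just described; this snippet is illustrative only and needs libphutil loaded):

// "a" followed by U+0308 COMBINING DIAERESIS, then a plain "q".
$atoms = phutil_utf8v_combined("a\xCC\x88q");
// Expected shape: array("a\xCC\x88", "q") -- the combining mark stays
// glued to its base character, and "q" is its own atom.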

This conversion also takes 11 seconds, even when the entire input is the letter "q" -- and phutil_utf8v() already has a fast path for ASCII text (from cursory prodding, allocating and building the result seems like the slow part, not inspection of the input).
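
As a rough way to reproduce the shape of that cost outside the real code path (a standalone sketch, not the actual phutil_utf8v() internals; the exact numbers vary a lot by PHP version):

// Build something like the per-character vector for an 8MB ASCII input
// and see what it costs. str_split() stands in for the ASCII fast path:
// one array element per byte.
$input = str_repeat('q', 8 * 1024 * 1024);

$memory_before = memory_get_usage(true);
$time_before = microtime(true);

$vector = str_split($input, 1);

printf(
  "%d elements, ~%.0f MB, %.1f seconds\n",
  count($vector),
  (memory_get_usage(true) - $memory_before) / (1024 * 1024),
  microtime(true) - $time_before);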

Per T2312, UTF8 handling is one of the best candidates for porting to C at some point, where even the sloppiest implementation would be many hundreds of times faster and less memory-hungry than the implementation in PHP.

Short of porting to C, we could try to improve this in PHP in a couple of ways:

Iterator: We could expose phutil_utf8v() as an iterator instead of having it return the entire result. This would reduce memory usage to something reasonable, but impose a performance penalty. This penalty is probably mild, but even if it's zero it seems reasonable to add the early bailout check to avoid spending 11 seconds churning before hitting the late bailout check.
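
A very rough sketch of what an iterator flavor could look like (hypothetical; this is not an existing libphutil API, it assumes well-formed UTF8, and it skips the validation and byte-limit handling the real phutil_utf8v() does):

// Hypothetical generator-based variant: yields one character at a time
// instead of materializing the whole character array.
function phutil_utf8v_iterator($string) {
  $len = strlen($string);
  $ii = 0;
  while ($ii < $len) {
    $byte = $string[$ii];
    if ($byte <= "\x7F") {
      // Single-byte (ASCII) character.
      yield $byte;
      $ii += 1;
      continue;
    }
    // Multibyte character: the leading byte encodes the sequence length.
    if ($byte >= "\xF0") {
      $seq_len = 4;
    } else if ($byte >= "\xE0") {
      $seq_len = 3;
    } else {
      $seq_len = 2;
    }
    yield substr($string, $ii, $seq_len);
    $ii += $seq_len;
  }
}

Callers would foreach over the characters (and could stop early) instead of holding the entire multimillion-element array in memory at once.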

Intern Bytes: This is silly, but I changed phutil_utf8v() to intern byte strings:

diff --git a/src/utils/utf8.php b/src/utils/utf8.php
index fb51c90..e60b207 100644
--- a/src/utils/utf8.php
+++ b/src/utils/utf8.php
@@ -336,11 +336,16 @@ function phutil_utf8v($string, $byte_limit = null) {
   $res = array();
   $len = strlen($string);
 
+  static $intern = array();
+
   $ii = 0;
   while ($ii < $len) {
     $byte = $string[$ii];
     if ($byte <= "\x7F") {
-      $res[] = $byte;
+      if (empty($intern[$byte])) {
+        $intern[$byte] = $byte;
+      }
+      $res[] = $intern[$byte];
       $ii += 1;
 
       if ($byte_limit && ($ii >= $byte_limit)) {

Strings are copy-on-write in PHP so this is "probably safe". This reduces memory usage by about 50%.

Fast Path for Byte Strings: We could write a fast path for ASCII text, but we're back in the same spot as soon as someone has an 8MB input with one 💯 emoji in the middle, and the code gets significantly more complex.


None of these fix-by-optimizing approaches are too attractive to me, so I'm just going to pursue the original fix (bailing early based on a coarse string byte length check) instead.

This possibly dovetails with T11713. Broadly, writing something like a PhutilUTF8StringMeasurer -- along the lines of PhutilUTF8StringTruncator -- that can do glyph/byte/character measurements and answer "is this string too long?" (which I think is usually the question we're asking) efficiently would probably clean up a number of these edge cases. I probably won't pursue that yet, but I'll make a note on T11713 about this.
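
For illustration, the shape such a measurer might take (hypothetical; no such class exists, a real version would also need glyph-level measurement, and this sketch leans on phutil_utf8_strlen() for the precise count):

// Hypothetical sketch only. The useful property is answering "is this
// string too long?" from cheap byte-length bounds, paying for a real
// character count only in the ambiguous middle range.
final class PhutilUTF8StringMeasurer {

  private $maximumCharacters;

  public function setMaximumCharacters($characters) {
    $this->maximumCharacters = $characters;
    return $this;
  }

  public function isLongerThanMaximum($string) {
    $bytes = strlen($string);

    // A UTF8 string has at most strlen() characters...
    if ($bytes <= $this->maximumCharacters) {
      return false;
    }

    // ...and at least strlen() / 4 characters, since a character is
    // never longer than 4 bytes.
    if (($bytes / 4) > $this->maximumCharacters) {
      return true;
    }

    return phutil_utf8_strlen($string) > $this->maximumCharacters;
  }

}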

D16683 makes us bail early if the combined length of the old and new lines is larger than 1280 bytes, which is a crude test for approximately two reasonable-length lines of emoji source code.
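
The shape of that check is roughly the following (an illustrative sketch only; the function name and placement are made up, and only the 1280-byte combined-length cutoff comes from the description above):

// Illustrative sketch, not the actual D16683 change.
function is_line_too_large_for_intraline_diff($old, $new) {
  // About two reasonable-length lines of worst-case 4-byte characters.
  $max_bytes = 1280;
  return (strlen($old) + strlen($new)) > $max_bytes;
}

When the check trips, the idea is to give up on character-level edits for that line (presumably marking the whole line as changed) rather than building the huge character vectors first.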

epriestley triaged this task as Normal priority.
epriestley added a project: Performance.

I believe this should now be fixed in HEAD of master. It should promote to stable within 24 hours.

I made a note about this in T11713 for eventual followup on broader UTF8 splitting/counting issues.

Thanks for the thorough report. Let us know if you're still seeing issues after updating or run into anything else.

This helped! Thanks for another fast fix.

The task in question still doesn't quite succeed. I'll file a new bug, because I think the one described in the original report here is in fact squashed.