
Meltdown and Spectre Speculative Branch Prediction Attacks
Closed, ResolvedPublic

Authored By
epriestley
Jan 5 2018, 9:13 PM

Description

The "Meltdown" and "Spectre" attacks have recently been widely disclosed. These are attacks against a large family of CPUs that use speculative execution features to read memory by observing cache timing side effects.

Phabricator is not affected because it is written in PHP, a great language with excellent security features which protect it against this kind of attack. Among other advanced capabilities, PHP instructions execute too slowly to allow a runtime program to distinguish between L1 cache access and main memory access.

This is also a local privilege escalation attack. At least today, Herald rules are insufficiently expressive to allow an attacker to encode a speculative execution cache timing side channel attack.

Event Timeline

epriestley triaged this task as Normal priority. Jan 5 2018, 9:13 PM
epriestley created this task.
epriestley updated the task description. (Show Details) Jan 5 2018, 9:13 PM
epriestley closed this task as Resolved. Jan 5 2018, 9:24 PM
epriestley claimed this task.

Another quality Security update from Phabricator! 🐕

ftdysa awarded a token. Jan 5 2018, 9:46 PM
jcox awarded a token. Jan 9 2018, 4:50 PM

Among other advanced capabilities, PHP instructions execute too slowly to allow a runtime program to distinguish between L1 cache access and main memory access.

That just means the attacker needs a bigger sample size!

epriestley added a comment. Edited Jan 18 2018, 2:10 AM

In theory, yes. But I suspect the actual required sample size may be on the scale of "heat death of the universe", potentially making this attack less practical than just brute forcing whatever secret you're trying to extract.

We get occasional reports via HackerOne about not using constant-time functions for various things. (The highest quality one is https://hackerone.com/reports/220246, although it's internal. There, the concern is that the PHP implementation of bin2hex() is not constant-time.)

Given that we've received multiple reports like this, a significant amount of effort has gone into thinking about timing attacks in PHP. Someone has even written this library, which provides constant-time implementations of a number of standard PHP library functions in PHP.
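
For illustration, the usual trick such implementations rely on is computing each output character arithmetically, so that no branch or table lookup depends on the secret data. Here's a hypothetical sketch of a branch-free hex encoder (hex_encode_ct is my name for it, not the library's actual API):

```php
<?php

// Sketch of a branch-free hex encoder: each nibble is mapped to its ASCII
// digit using arithmetic only, so no code path depends on the secret bytes.
function hex_encode_ct($binary) {
  $hex = '';
  $len = strlen($binary);
  for ($ii = 0; $ii < $len; $ii++) {
    $byte = ord($binary[$ii]);
    foreach (array($byte >> 4, $byte & 0xF) as $nibble) {
      // 0-9 maps to '0'-'9' (ASCII 48 + n); 10-15 maps to 'a'-'f'
      // (ASCII 87 + n). The mask (($nibble - 10) >> 8) is all-ones exactly
      // when $nibble < 10, selecting the -39 correction without a branch.
      $hex .= chr(87 + $nibble + ((($nibble - 10) >> 8) & ~38));
    }
  }
  return $hex;
}

// Should agree with bin2hex() on any input.
```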

So: how do we know that this library works? How can we attack an application, rewrite it to use this library, and then show that we can no longer attack it?

It has some unit tests, but they only verify that the reimplementations are correct. They don't actually demonstrate that, for some input, bin2hex() has a detectable timing difference while Hex::encode() doesn't.

The README and references all ultimately boil down to papers talking about some proper programming language like C. As far as I can tell, no one has ever written a proof-of-concept of any timing attack in PHP.

(The php-internals response -- from Rasmus, no less -- is, of course, concern that constant-time implementations will slow down PHP too much. This is the language I work in every day, mostly by choice!)

Okay, so let's just write a timing attack ourselves. Everyone seems convinced that this is easy, so it's probably trivial, which is why no one bothers to demonstrate it. Right? We'll just show that this attack is realistic in a trivial way ourselves, and then we can at least establish a ceiling and have some basis for discussing security policy.

How about this: the classic constant-time hash comparison attack? Since === isn't constant-time, we expect "abbb" === "aaaa" to execute faster than "aaab" === "aaaa", since the latter needs to examine more bytes before it can determine that the strings are unequal.

<?php

$N = 1000000000;

$test_cases = array(
  'abbb',
  'aabb',
  'aaab',
);

$timings = array();
foreach ($test_cases as $test_case) {
  $ii = $N;
  $t_start = microtime(true);
  while (--$ii) {
    if ($test_case === 'aaaa') {
      $result = true;
    } else {
      $result = false;
    }
  }
  $t_end = microtime(true);

  $timings[$test_case] = ($t_end - $t_start);
}

foreach ($timings as $test_case => $timing) {
  printf(
    'Test "%s" took %.3f ns per run.'."\n",
    $test_case,
    ($timing * 1000000000) / $N);
}

  • We're going to run A BILLION iterations.
  • We're running all the iterations in the tightest reasonable loop and taking a local timing measurement.

This is completely unrealistic in an actual attack scenario, so this will let us guess the secret 'aaaa' no problem, right?

epriestley@orbital ~/Desktop $ php -f timing.php 
Test "abbb" took 20.212 ns per run.
Test "aabb" took 19.711 ns per run.
Test "aaab" took 21.371 ns per run.

That's not quite right.

Okay, well maybe we're just doing something wrong. Most of the literature ultimately refers to Opportunities and Limits of Remote Timing Attacks (Crosby 2009) as a reference to establish a ceiling here. What does that paper do?

It has a whole lot of really smart math, so we might be in trouble. But then all the math leads to this:

The best filter of each type was identified through brute force.

Ah, this paper really speaks directly to my soul.

It looks like on their dataset a 5th percentile filter was pretty good. Let's collect all the timing samples and just spot check some percentiles and see if we can get a similar result.

<?php

$N = 1000000000;
$percentile = 50;

$test_cases = array(
  'abbb',
  'aabb',
  'aaab',
);

$timings = array();
foreach ($test_cases as $test_case) {
  $ii = $N;
  $timings[$test_case] = array();

  $test_timings = array_fill(0, $N, 0);
  while ($ii--) {
    $t_start = microtime(true);
    if ($test_case === 'aaaa') {
      $result = true;
    } else {
      $result = false;
    }
    $test_timings[$ii] = (microtime(true) - $t_start);
  }

  $timings[$test_case] = $test_timings;
}

foreach ($timings as $test_case => $set) {
  sort($set);

  $index = (int)floor(($percentile / 100) * $N);
  $measurement = $set[$index];

  printf(
    'Test "%s" took %.3f ns at percentile %s.'."\n",
    $test_case,
    ($measurement * 1000000000),
    $percentile);
}

Okay, let's run it...

PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 34359738376 bytes) in /Users/epriestley/Desktop/timing.php on line 17

Storing a billion integers in PHP requires 34GB of memory. There's probably some smart lossy data structure that could approximate the Nth percentile of a dataset reasonably well without storing every value, but let's just do A MILLION runs instead:

ini_set('memory_limit', '16G');
...
$N = 1000000;
$ php -f timing.php 
Test "abbb" took 0.000 ns at percentile 50.
Test "aabb" took 0.000 ns at percentile 50.
Test "aaab" took 0.000 ns at percentile 50.

This doesn't work, of course: the operations we want to time all take way less than 1us to execute, and the most accurate timing function we have in PHP only gives us microseconds.
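
(For completeness: the "smart lossy data structure" alluded to above could be as simple as a fixed-size reservoir sample -- keep k uniformly random samples from the stream and take the percentile of the reservoir. A sketch, with approximate_percentile() being a hypothetical name, not anything we actually use:)

```php
<?php

// Sketch: approximate the P-th percentile of a stream of timings with a
// fixed-size reservoir sample instead of storing all the values. The
// first $k values fill the reservoir; thereafter, the i-th value replaces
// a random slot with probability $k / i ("Algorithm R").
function approximate_percentile($values, $percentile, $k = 10000) {
  $reservoir = array();
  $seen = 0;
  foreach ($values as $value) {
    $seen++;
    if (count($reservoir) < $k) {
      $reservoir[] = $value;
    } else {
      $slot = mt_rand(0, $seen - 1);
      if ($slot < $k) {
        $reservoir[$slot] = $value;
      }
    }
  }
  sort($reservoir);
  $index = (int)floor(($percentile / 100) * count($reservoir));
  return $reservoir[min($index, count($reservoir) - 1)];
}
```

With k = 10,000 slots this holds memory constant no matter how many samples we take, at the cost of some noise in the estimate -- which, given everything above, is the least of our problems.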

No problem: Crosby didn't need a cooperating runtime to execute a timing attack and just did it over a network. We can do the same thing: we'll just use a C client on the attacking machine to do the timing, and it can time with nanosecond precision. We'll make requests to PHP, time them, then use smart statistical methods to filter out the jitter.

Crosby discusses jitter in Apache:

We can see that a realistic application like Apache introduces only a microsecond of jitter at 1000 samples even when highly loaded...
Of course, different application structures, on different operating systems, and on different hardware may yield more or less jitter.

Could be different, huh? Here's an un-fancy way to test our jitter:

epriestley@orbital ~ $ time curl https://secure.phabricator.com/ >/dev/null
real	0m0.288s
epriestley@orbital ~ $ time curl https://secure.phabricator.com/ >/dev/null
real	0m0.375s

It looks like (when not highly loaded) a realistic PHP application like Phabricator has about 100ms of jitter -- roughly 100,000 times more jitter than Apache as measured by Crosby. And they didn't attack Apache. And they used UDP. And they spent 14 days making the requests so they wouldn't cause load on the server. And they resolved a timing difference of ~30us over the internet, similar to the timing attack in Brumley and Boneh 2004 -- while even PHP's string comparisons show only ~1ns of timing difference.

Maybe I'm just not clever enough to figure out how to execute this attack, but there are a whole lot of orders of magnitude being hand-waved away there, and it looks a lot like the attack we're describing might just be a DoS attack. Or a slow way to guess the secret by brute force. And no one else appears to be able to execute this attack, either: they all just refer to Crosby 2009 to show that this class of attacks is possible in theory, without actually establishing a ceiling for executing them against a realistic PHP application.

I agree that these attacks are completely possible in theory. With enough samples, you can always filter out any amount of random noise that has been added to a signal. But I'm not quite sure how theory translates into practice here.

Of course, we use phutil_hashes_are_identical() to compare hashes: I have zero evidence that it actually defuses any attack or even has a constant execution time across all inputs, but it stops researchers from reporting that we're vulnerable to this attack.
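
(For reference, the standard shape of a constant-time comparison -- also how PHP's built-in hash_equals(), available since PHP 5.6, behaves -- is an XOR accumulator that inspects every byte pair no matter where the strings first differ. A sketch, not necessarily phutil's actual implementation:)

```php
<?php

// Sketch of a constant-time string comparison: XOR every byte pair into
// an accumulator so the loop always runs to completion, instead of
// returning at the first mismatch the way === may.
function hashes_are_identical_sketch($u, $v) {
  // Comparing lengths first is fine: lengths of hashes aren't secret.
  if (strlen($u) !== strlen($v)) {
    return false;
  }
  $len = strlen($u);
  $diff = 0;
  for ($ii = 0; $ii < $len; $ii++) {
    $diff |= (ord($u[$ii]) ^ ord($v[$ii]));
  }
  return ($diff === 0);
}
```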

Wow, I didn't mean for my two cents' worth of snark to cause such a stir! I do buy the argument that Phabricator isn't impacted by Meltdown/Spectre ("At least today, Herald rules are insufficiently expressive to allow an attacker to encode a speculative execution cache timing side channel attack.").


Just for the sake of distracting you furth^W^W^Wscientific discourse, here's a variant of the local timing code that runs with fairly consistent success for me:

<?php

$N = 1000000;
$batches = 100;

$test_cases = array(
  'abbb',
  'aabb',
  'aaab',
);

$best_timings = array();
foreach ($test_cases as $test_case) {
  for ($try = 0; $try < $batches; $try++) {
    $ii = $N;
    $t_start = microtime(true);
    while (--$ii) {
      if ($test_case === 'aaaa') {
        $result = true;
      } else {
        $result = false;
      }
    }
    $t_end = microtime(true);

    $time = $t_end - $t_start;
    if (isset($best_timings[$test_case])) {
      $best_timings[$test_case] = min($best_timings[$test_case], $time);
    } else {
      $best_timings[$test_case] = $time;
    }
  }
}

foreach ($best_timings as $test_case => $timing) {
  printf(
    'Test "%s" took %.3f ns per run in best batch.'."\n",
    $test_case,
    ($timing * 1000000000) / $N);
}
josiah@josiah-mbp:~$ php -f timing.php 
Test "abbb" took 28.079 ns per run in best batch.
Test "aabb" took 28.817 ns per run in best batch.
Test "aaab" took 29.763 ns per run in best batch.
josiah@josiah-mbp:~$ php -f timing.php 
Test "abbb" took 28.692 ns per run in best batch.
Test "aabb" took 28.744 ns per run in best batch.
Test "aaab" took 29.638 ns per run in best batch.
josiah@josiah-mbp:~$ php -f timing.php 
Test "abbb" took 28.676 ns per run in best batch.
Test "aabb" took 28.996 ns per run in best batch.
Test "aaab" took 29.880 ns per run in best batch.
josiah@josiah-mbp:~$ php -f timing.php 
Test "abbb" took 28.266 ns per run in best batch.
Test "aabb" took 28.621 ns per run in best batch.
Test "aaab" took 29.733 ns per run in best batch.
josiah@josiah-mbp:~$ php -f timing.php 
Test "abbb" took 28.407 ns per run in best batch.
Test "aabb" took 28.570 ns per run in best batch.
Test "aaab" took 30.326 ns per run in best batch.
josiah@josiah-mbp:~$ php --version
PHP 5.6.30 (cli) (built: Oct 29 2017 20:30:32) 
Copyright (c) 1997-2016 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2016 Zend Technologies

Ah, nice!

The other thought I had was that using a cooperating subprocess and sending it signals to tell it to click a nanosecond-precision stopwatch might also make the attack more practical (or use pipes or domain sockets -- however you can get out of PHP with the lowest cost). They're probably all somewhat slow, but likely better than microsecond precision. Then you "just" need to get a cooperating binary onto the target host.
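
For what it's worth, the plumbing could look something like this. This is only a sketch of the idea: a PHP child process stands in for the hypothetical nanosecond-precision C helper (which would use clock_gettime() instead of microtime()), just to show the pipe-based "stopwatch clicks":

```php
<?php

// Sketch of the "cooperating stopwatch": the parent writes one-byte
// start/stop markers down a pipe, and a helper process timestamps each
// marker on arrival. A real attack would want a C helper for nanosecond
// precision; a PHP child stands in here just to demonstrate the plumbing.
function measure_with_helper($callable) {
  $helper = '$t = array(); '
          . 'while (fgetc(STDIN) !== false) { $t[] = microtime(true); } '
          . 'fwrite(STDOUT, sprintf("%.9f", $t[1] - $t[0]));';

  $pipes = array();
  $proc = proc_open(
    'php -r '.escapeshellarg($helper),
    array(0 => array('pipe', 'r'), 1 => array('pipe', 'w')),
    $pipes);

  fwrite($pipes[0], 'A'); // Click: start.
  fflush($pipes[0]);
  $callable();            // The operation under measurement.
  fwrite($pipes[0], 'B'); // Click: stop.
  fclose($pipes[0]);      // EOF lets the helper report and exit.

  $elapsed = (float)stream_get_contents($pipes[1]);
  fclose($pipes[1]);
  proc_close($proc);

  return $elapsed;
}
```

The marker bytes themselves cost a syscall or two each way, so this mostly moves the precision problem around rather than solving it, but it at least gets the clock out of the PHP runtime.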