HomePhabricator

Prevent backtracking on long JSON strings with escape codes

Description

Prevent backtracking on long JSON strings with escape codes

Summary:
The JSONLintLexer class uses a series of regular expression rules to verify that
lexed JSON tokens are formatted correctly. The string check attempts to express
that strings are well-formed by matching:

  • Start of token
  • A double-quote
  • A repetition of zero or more of:
    • A two-character escape sequence (i.e. \n)
    • A six-character unicode codepoint (i.e. \u1234)
    • Any number of printable characters that are not \ or ".
  • A closing quote

Key to this regex is that the three alternatives that make up the core of the
string are non-overlapping sets. The PCRE engine which backs PHP, however, does
not note this, and assumes that it must thus mark each run though this
alternation as a possible point to backtrack to. To do so, internally it uses a
stack frame. Long strings can thus cause the process to run out of stack space;
this can be replicated via:

preg_match('{^(?:x|y)*}', str_repeat("x", 10000));

PCRE does not consider this type of stack exhaustion a bug -- see
https://bugs.exim.org/show_bug.cgi?id=979

Specifically, long JSON strings with many escape codes (each of which
necessitates a stack frame) can trigger this stack exhaustion in the PCRE
engine, and thus segfault the PHP process. This failure mode does not trigger
on PHP 7.0 and up, which default to using PCRE's JIT mode.

Force the PCRE engine to realize that the alternation cases when parsing the
JSON string are non-overlapping, and that backtracking across them is never an
option, by using the non-backtracking (?>x|y) capture form, along with the
non-backtracking repetition symbols ++ and *+. These allow PCRE to use a
tail-recursive form, thus preventing stack exhaustion.

Test Plan:
In a repository configured to lint JSON files:

perl -le 'print q|{"k":"|.(q|a\\n|x10000).q|"}|' > test.json
arc lint test.json

...no longer segfaults.

Reviewers: Blessed Reviewers, epriestley

Reviewed By: Blessed Reviewers, epriestley

Subscribers: Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D17363

Details

Provenance
alexmvAuthored on Feb 15 2017, 10:11 PM
alexmvPushed on Feb 16 2017, 10:42 PM
Reviewer
Blessed Reviewers
Differential Revision
D17363: Prevent backtracking on long JSON strings with escape codes
Parents
rPHUe78b0df10881: Remove unused class "PhutilConsoleConcatenatedView"
Branches
Unknown
Tags
Unknown
Build Status
Buildable 15668
Build 20691: Run Core Tests