Page MenuHomePhabricator

Pygments Bash lexer has explosive complexity on unterminated strings with many backslashes
Open, LowPublic

Description

See https://discourse.phabricator-community.org/t/pygmentize-causing-high-cpu-indefinitely/1602/12.

Pygments has a bad regex in the Bash lexer which can backtrack explosively on input in the form "\\\\\\\\....

This should be reported to the Pygments upstream.

Some steps for mitigating it in Phabricator might include:

  • Fork and distribute a copy of Pygments as an external? I'm not sure how responsive the Pygments upstream is. This would simplify setup somewhat. Major downside is that the external is very large (~100K lines). Maybe more attractive after T5055?
  • Swap to a PHP lexer. We'd generally like to do this anyway but converting these is a big pain.
  • Put a hard timelimit on pygmentize evaluation. I'm not thrilled about adding this layer of complexity. It also tends to mean that we're sweeping a potentially serious problem under the rug. I'd like to have some sort of feedback mechanism that lets us identify inputs where runtime is not approximately O(N) on input size so we can fix the lexers. At a minimum, this should clearly surface the failure to the UI ("Syntax highlighting for this file took an unreasonably long time...")

Event Timeline

epriestley triaged this task as Low priority.
nochum added a subscriber: nochum.Fri, Nov 30, 7:24 PM
nochum added a comment.Fri, Dec 7, 3:06 PM

FYI, I think this upstream issue with pygments addresses this issue (was open since 2016, but only recently had a commit made to fix it): https://bitbucket.org/birkenfeld/pygments-main/issues/1257/infinite-loops-in-specific-files

Ah, thanks! Yeah, that looks like the same thing. I think the bug title isn't quite correct (runtime is dramatically worse than O(input), but not infinite) and the analysis is slightly misleading (escaping the $ doesn't matter, I think) but the patch looks like it will get the job done (although, offhand, I'm unsure about the ($|(?=")|(?=\$) groups).

Since the upstream has had a detailed reproduction case and a patch for this for ~2 years without fixing it, that makes the "drop the dependency" approaches look more attractive to me. D19847 should at least limit the amount of damage that bad behavior in Pygments can cause to Phabricator.