Page MenuHomePhabricator

Pygments Bash lexer has explosive complexity on unterminated strings with many backslashes
Open, LowPublic

Description

See https://discourse.phabricator-community.org/t/pygmentize-causing-high-cpu-indefinitely/1602/12.

Pygments has a bad regex in the Bash lexer which can backtrack explosively on input in the form "\\\\\\\\....

This should be reported to the Pygments upstream.

Some steps for mitigating it in Phabricator might include:

  • Fork and distribute a copy of Pygments as an external? I'm not sure how responsive the Pygments upstream is. This would simplify setup somewhat. Major downside is that the external is very large (~100K lines). Maybe more attractive after T5055?
  • Swap to a PHP lexer. We'd generally like to do this anyway but converting these is a big pain.
  • Put a hard timelimit on pygmentize evaluation. I'm not thrilled about adding this layer of complexity. It also tends to mean that we're sweeping a potentially serious problem under the rug. I'd like to have some sort of feedback mechanism that lets us identify inputs where runtime is not approximately O(N) on input size so we can fix the lexers. At a minimum, this should clearly surface the failure to the UI ("Syntax highlighting for this file took an unreasonably long time...")

Event Timeline

epriestley created this task.

FYI, I think this upstream issue with pygments addresses this issue (was open since 2016, but only recently had a commit made to fix it): https://bitbucket.org/birkenfeld/pygments-main/issues/1257/infinite-loops-in-specific-files

Ah, thanks! Yeah, that looks like the same thing. I think the bug title isn't quite correct (runtime is dramatically worse than O(input), but not infinite) and the analysis is slightly misleading (escaping the $ doesn't matter, I think) but the patch looks like it will get the job done (although, offhand, I'm unsure about the ($|(?=")|(?=\$) groups).

Since the upstream has had a detailed reproduction case and a patch for this for ~2 years without fixing it, that makes the "drop the dependency" approaches look more attractive to me. D19847 should at least limit the amount of damage that bad behavior in Pygments can cause to Phabricator.

It seems the commit didn't fully fix the issue. When pygmentize runs for phabricator, there are processed that are forked from the initial process that are still never killed and they cause the cpu to spike. For example, when running into this issue on our install, we see something like:
#ps aux | grep pygm
user 13190 0.0 0.0 4456 680 ? S 16:46 0:00 sh -c pygmentize -O encoding=utf-8 -O stripnl=False -f html -l 'bash'
user 13191 0.0 0.0 4456 784 ? S 16:46 0:00 sh -c pygmentize -O encoding=utf-8 -O stripnl=False -f html -l 'bash'
user 13192 74.7 0.0 23996 9336 ? R 16:46 0:05 /usr/bin/python /usr/local/bin/pygmentize -O encoding=utf-8 -O stripnl=False -f html -l bash
user 13193 76.7 0.0 23636 9028 ? R 16:46 0:06 /usr/bin/python /usr/local/bin/pygmentize -O encoding=utf-8 -O stripnl=False -f html -l bash

Then when the 15 minute limit kicks in the top 2 go away, but the bottom 2 remain, spiking the cpu.

@nochum, if you want to try replacing 'pygmentize ... with 'exec pygmentize ... near line 25 of PhutilPygmentsSyntaxHighlighter.php and that fixes it, I'll upstream it. exec will stop dash from leaving an sh -c process running.

(Also: the time limit should be 15 seconds, not 15 minutes! If you're seeing the sh processes die after 15 minutes, something else is likely afoot.)

@nochum, if you want to try replacing 'pygmentize ... with 'exec pygmentize ... near line 25 of PhutilPygmentsSyntaxHighlighter.php and that fixes it, I'll upstream it. exec will stop dash from leaving an sh -c process running.

Worked perfectly!

(Also: the time limit should be 15 seconds, not 15 minutes! If you're seeing the sh processes die after 15 minutes, something else is likely afoot.)

Yes, it's 15 seconds, I misspoke.

epriestley added a revision: Restricted Differential Revision.Feb 5 2019, 3:09 AM
amckinley removed a revision: Restricted Differential Revision.Feb 5 2019, 11:43 PM