See https://discourse.phabricator-community.org/t/pygmentize-causing-high-cpu-indefinitely/1602/12.
Pygments has a bad regex in the Bash lexer that can backtrack explosively on input of the form "\\\\\\\\...".
This should be reported to the Pygments upstream.
Some steps for mitigating it in Phabricator might include:
- Fork and distribute a copy of Pygments as an external? I'm not sure how responsive the Pygments upstream is. This would simplify setup somewhat. Major downside is that the external is very large (~100K lines). Maybe more attractive after T5055?
- Swap to a PHP lexer. We'd generally like to do this anyway but converting these is a big pain.
- Put a hard time limit on pygmentize evaluation. I'm not thrilled about adding this layer of complexity. It also tends to mean that we're sweeping a potentially serious problem under the rug. I'd like to have some sort of feedback mechanism that lets us identify inputs where runtime is not approximately O(N) on input size so we can fix the lexers. At a minimum, this should clearly surface the failure in the UI ("Syntax highlighting for this file took an unreasonably long time...").
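
As a rough illustration of the time limit option, here is a minimal Python sketch of a wrapper that runs an external highlighter with a hard timeout and reports the input size and elapsed time so that slow inputs can be identified later. This is not how Phabricator invokes Pygments today; the function names (`highlight_with_timeout`, `render`), the 5-second default, and the log format are all hypothetical, and the `pygmentize` command line in the final comment is just one plausible invocation.

```python
import subprocess
import sys
import time


def highlight_with_timeout(source, command, timeout_seconds=5.0):
    """Pipe `source` through `command`, giving up after `timeout_seconds`.

    Returns (output_or_None, elapsed_seconds, timed_out).
    All names and defaults here are hypothetical.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            command,
            input=source.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,  # the child is killed when this expires
            check=True,               # nonzero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return None, time.monotonic() - start, True
    return result.stdout.decode("utf-8"), time.monotonic() - start, False


def render(source, command, timeout_seconds=5.0):
    """Return (text_to_display, notice_or_None) instead of failing silently."""
    output, elapsed, timed_out = highlight_with_timeout(
        source, command, timeout_seconds
    )
    if timed_out:
        # Feedback hook: record size and runtime so the bad lexer can be found.
        print(
            "highlight timeout: %d bytes, %.2fs" % (len(source), elapsed),
            file=sys.stderr,
        )
        notice = "Syntax highlighting for this file took an unreasonably long time..."
        return source, notice  # fall back to the unhighlighted text
    return output, None


# Assumed invocation; pygmentize reads stdin when no file is given:
# render(text, ["pygmentize", "-l", "bash", "-f", "html"])
```

The point of returning a notice rather than just the plain text is that the failure stays visible to the user, and the logged size and runtime give a starting point for finding inputs where the lexer is not roughly linear.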