I ran across this in the context of T13586. PhutilRemarkupHyperlinkRule uses a (slightly more complex) version of this regular expression to find links:
(\w{3,}://[^\s]+)
When matched against an input like:
AAAAAA...
...this expression executes very slowly:
$ cat example.php
<?php
$pattern = '(\w{3,}://)';
$corpus = str_repeat('A', 1024 * 512);
$result = preg_match($pattern, $corpus);
var_dump($result);
$ time php -f example.php
int(0)
php -f example.php  64.72s user 0.74s system 99% cpu 1:05.48 total
Here, it took about 64s of CPU time to report "no match" on a 512KB input. This may be specific to particular PHP and/or PCRE versions: otherwise, I'd naively expect the problem to have surfaced earlier.
I think the issue is: since this regex has no concrete/anchoring prefix, the PCRE engine may backtrack explosively trying to match \w{3,}. Replacing it with \w+ doesn't fix the problem.
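A rough way to probe this hypothesis is to time the same pattern on inputs of doubling size. If the engine rescans the remaining run of word characters from every starting offset, the time should roughly quadruple each time the input doubles. This is only a sketch: `time_pattern` is a hypothetical helper, the sizes are arbitrary, and the numbers will vary with the PHP and PCRE versions and settings in use.

```php
<?php

// Hypothetical helper: time one preg_match() call per input size.
// Returns a map of input size (bytes) => elapsed wall-clock seconds.
function time_pattern(string $pattern, array $sizes): array {
  $timings = array();
  foreach ($sizes as $size) {
    // Same shape of input as the report: one long run of word characters.
    $corpus = str_repeat('A', $size);
    $start = microtime(true);
    preg_match($pattern, $corpus);
    $timings[$size] = microtime(true) - $start;
  }
  return $timings;
}

// Sizes are kept small so the script finishes quickly.
$timings = time_pattern('(\w{3,}://)', array(8192, 16384, 32768));
foreach ($timings as $size => $seconds) {
  printf("%6d bytes: %.3fs\n", $size, $seconds);
}
```

If the timings grow linearly instead, the slowdown is probably specific to the engine version or configuration rather than inherent to the pattern.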
We'd like this pattern to anchor on ://. We can't use a lookbehind to match the protocol portion because the lookbehind pattern is not fixed-length.
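One possible way to get the anchoring behavior without a lookbehind is to do it by hand: find each literal `://` with `strpos()`, walk backward over word characters to recover the protocol, then match the remainder forward with a `\G`-anchored pattern. The sketch below is an assumption about how this could look, not how PhutilRemarkupHyperlinkRule is implemented; `find_links_anchored` is a hypothetical name, and the backward walk only approximates `\w` with ASCII letters, digits, and underscore.

```php
<?php

// Hypothetical helper, not part of PhutilRemarkupHyperlinkRule.
// Finds "proto://rest" links by locating the literal "://" first.
function find_links_anchored(string $corpus): array {
  $links = array();
  $offset = 0;

  while (($pos = strpos($corpus, '://', $offset)) !== false) {
    // Walk backward over ASCII word bytes (an approximation of \w).
    $start = $pos;
    while ($start > 0) {
      $c = $corpus[$start - 1];
      if (!ctype_alnum($c) && $c !== '_') {
        break;
      }
      $start--;
    }

    // \G pins the match to the offset just past "://".
    $matches = array();
    $has_rest = preg_match('(\G[^\s]+)', $corpus, $matches, 0, $pos + 3);

    if ($pos - $start >= 3 && $has_rest === 1) {
      // Protocol has at least 3 characters and a non-empty remainder.
      $links[] = substr($corpus, $start, $pos - $start).'://'.$matches[0];
      // Resume after the link so it cannot be matched twice.
      $offset = $pos + 3 + strlen($matches[0]);
    } else {
      $offset = $pos + 3;
    }
  }

  return $links;
}

var_dump(find_links_anchored('see http://example.com/x for details'));
```

On the all-`A` corpus from the report, `strpos()` finds no `://` at all, so the function returns an empty list without ever invoking the regex engine.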