Page MenuHomePhabricator

PREG_BACKTRACK_LIMIT_ERROR parsing commit under PhabricatorRepositoryCommitHeraldWorker/PhabricatorCustomFieldMonogramParser
Closed, ResolvedPublic

Description

I've added a new repository to Phabricator, and all of my commits have been parsed except for one. Whenever the daemon tries to parse the commit, it fails with the following exception:

EXCEPTION: (PhutilProxyException) Error while executing Task ID 1178866. {>} (Exception) Regular expression "/(?:^|\b)(?i:(revert|reverts|reverted|backout|backsout|backedout|back out|backs out|backed out|undo|undoes)\s+)(?i:(commit|commits|change|changes|changeset|changesets|rev|revs|revision|revisions|diff|diffs)\s+)?((?:[rA-Z0-9a-f]+[,\s]*)+)(?:\band\s+([rA-Z0-9a-f]+))?(?i:())?(?:$|\b)/" is invalid! at [<phabricator>/src/infrastructure/customfield/parser/PhabricatorCustomFieldMonogramParser.php:41]
arcanist(head=master, ref.master=4c3d75401f81), phabricator(head=master, ref.master=779a612e4172), phutil(head=master, ref.master=6e8db556dbb9)
   #0 <#2> PhabricatorCustomFieldMonogramParser::parseCorpus(string) called at [<phabricator>/src/applications/repository/worker/PhabricatorRepositoryCommitHeraldWorker.php:65]
   #1 <#2> PhabricatorRepositoryCommitHeraldWorker::parseCommit(PhabricatorRepository, PhabricatorRepositoryCommit) called at [<phabricator>/src/applications/repository/worker/PhabricatorRepositoryCommitParserWorker.php:44]
   #2 <#2> PhabricatorRepositoryCommitParserWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:91]
   #3 <#2> PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:162]
   #4 <#2> PhabricatorWorkerActiveTask::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php:22]
   #5 PhabricatorTaskmasterDaemon::run() called at [<phutil>/src/daemon/PhutilDaemon.php:183]
   #6 PhutilDaemon::execute() called at [<phutil>/scripts/daemon/exec/exec_daemon.php:125]

I wrote a small test script that reproduces the issue, using the code and regular expression from Phabricator and a sample commit message:

<?php
    $pattern = '/(?:^|\b)(?i:(revert|reverts|reverted|backout|backsout|backedout|back out|backs out|backed out|undo|undoes)\s+)(?i:(commit|commits|change|changes|changeset|changesets|rev|revs|revision|revisions|diff|diffs)\s+)?((?:[rA-Z0-9a-f]+[,\s]*)+)(?:\band\s+([rA-Z0-9a-f]+))?(?i:())?(?:$|\b)/';
    $corpus = "Revert my bug.\n\nThis reverts commit 2db7ea8d243404f73aee00cc2804bd7779aee762o.";
    $ok = preg_match_all(
      $pattern,
      $corpus,
      $matches,
      PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    if ($ok === false) {
      throw new Exception(preg_last_error());
    } else {
        print_r($matches);
    }
?>

When I run it, it throws an exception with message "2", which I believe corresponds to PREG_BACKTRACK_LIMIT_ERROR. I've tried increasing the pcre.backtrack_limit setting to 1000000000, but that just delays the time until the exception is thrown. When I increase the limit even higher, to 1000000000000, the script runs for longer but does not complete within the 4 minute window allocated to the task in Phabricator. (My machine is a Mac Pro with a 3 GHz 8-core processor and 32 GB of RAM.)

Event Timeline

edibiase updated the task description. (Show Details)
edibiase added a project: Herald.
edibiase added a subscriber: edibiase.

I've heard rumors the watch team uses Phabricator.

Thanks! This is an extremely thorough and helpful bug report.

I think this is where things are tripping up:

2db7ea8d243404f73aee00cc2804bd7779aee762o.
                                        ^
                                        |
Unexpected lowercase "Oh" in hash  -----+

The regular expression is backtracking explosively and trying to match substrings of the hash as hashes.

I think we can fix this by requiring a word boundary (or end-of-input) after a match.

Note that this particular "reverts" (with a commit hash followed by some noise) won't be recognized after this patch: the regexp will simply fail to match. I think this is correct, although we could plausibly be more aggressive about it. But I think most cases where we'd hit this are probably silly nonsense (say, in copy/pasted output from other tools that describes the context of a change) that users would expect not to match, and that sneaky "o" characters are probably a rare case.

We don't do all that much with these matches today anyway, but could look at this again in the future if we start doing more with them and/or users run into similar issues with typos or other cases where the input isn't really a hash but users reasonably expect it to match.

This should now be fixed in HEAD. Thanks for the report! Let us know if you run into anything else.

Thank you very much! I've updated to HEAD and everything seems to be running smoothly now.