This appears to be ~1.5-2.0x faster than pygments on my machine, which isn't as good as I was hoping for but will help. (This is ~330 ms for a 4400-line file I chose at random.)
- rPHU9b2f35480dc0: Implementation of PhutilLexer for Python
Loaded a Python paste and saw highlights that approximately (or maybe exactly) match the pygments version.
This stuff is pretty hard to review comprehensively, but we can tweak it and start adding unit tests or whatever if there are issues. It looks structurally correct to me, and I didn't catch anything suspicious looking.
Thanks for putting this together!
I was just looking over this due to a recent mention in a chat, and while I don't understand what's going on here, I can't help but wonder if this should be \\\\" instead of \\\\\'.
I'm not entirely sure how to test the rule, but the upstream seems to use ":
If anyone can show me a Python file where this matters, I'm happy to fix it and put some test coverage on it.
This was my best guess:
print r"Tom \"Maverick\" Cruise"
print r'Tom \'Maverick\' Cruise'
...but neither string highlights specially and the runtime behavior just makes me confused about why Python has this feature:
$ python maverick.py
Tom \"Maverick\" Cruise
Tom \'Maverick\' Cruise
Try taking out the r prefix. It marks a "raw string", in which backslashes are interpreted literally. This is mostly used for regexps (Python does not support /.../ strings, which are how "raw" strings are implemented in most languages).
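For reference, here is a minimal sketch of the difference, written for Python 3 (the snippet above uses the Python 2 print statement). The variable names are just for illustration. In the raw string the backslash still keeps the quote from ending the string, but it also stays in the value:

```python
# Same literal text, with and without the r prefix.
raw_double = r"Tom \"Maverick\" Cruise"
plain_double = "Tom \"Maverick\" Cruise"

print(raw_double)    # Tom \"Maverick\" Cruise
print(plain_double)  # Tom "Maverick" Cruise

# The raw string is two characters longer: it keeps both backslashes.
print(len(raw_double) - len(plain_double))  # 2
```

So the escaped quote matters to a lexer in both cases, since it decides where the string token ends.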
Oh, my assumption was that the raw part was important because of the comment ("// included here for raw strings").
Non-raw strings seem to work correctly without changes (see inline about why):
Additional normal escaping rules get merged in here, and seem to handle things in normal strings.
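To illustrate why a string rule needs to know about escaped quotes at all, here is a rough sketch using Python's re module. These two patterns are hypothetical and are not the rules from PhutilLexer or pygments; they only show how a pattern without an escape rule ends the token too early:

```python
import re

# Hypothetical patterns for a double-quoted string with an optional r prefix.
# NAIVE stops at the first double quote it sees, escaped or not.
NAIVE = re.compile(r'[rR]?"[^"\n]*"')
# ESCAPE_AWARE treats a backslash plus the following character as one unit.
ESCAPE_AWARE = re.compile(r'[rR]?"(?:[^"\\\n]|\\.)*"')

# A line of Python source, held in a single-quoted raw string.
source = r'print(r"Tom \"Maverick\" Cruise")'

naive_token = NAIVE.search(source).group(0)
aware_token = ESCAPE_AWARE.search(source).group(0)

print(naive_token)  # r"Tom \"
print(aware_token)  # r"Tom \"Maverick\" Cruise"
```

A file where the two disagree like this is probably the kind of test case that would show whether the rule in question matters.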