Differential D9370

Implementation of PhutilLexer for Python
Closed · Public
Authored by sophiebits on Jun 4 2014, 3:10 AM.

Details

Reviewers

epriestley

Group Reviewers

Blessed Reviewers

Commits

rPHU9b2f35480dc0: Implementation of PhutilLexer for Python

Summary

This appears to be ~1.5-2.0x faster than pygments on my machine, which isn't as good as I was hoping for but will help. (This is ~330 ms for a 4400-line file I chose at random.)

Test Plan

Loaded a Python paste and saw highlights that approximately (or maybe exactly) match the pygments version.

Diff Detail

Repository

rPHU libphutil

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

sophiebits updated this revision to Diff 22333. Jun 4 2014, 3:10 AM

sophiebits retitled this revision to Implementation of PhutilLexer for Python.

sophiebits updated this object.

sophiebits edited the test plan for this revision.

sophiebits added a reviewer: epriestley.

Herald added a reviewer: Blessed Reviewers. Jun 4 2014, 3:10 AM

Herald added subscribers: Korvin, epriestley.

Harbormaster completed remote builds in B840: Diff 22333. Jun 4 2014, 3:11 AM

sophiebits added inline comments. Jun 4 2014, 7:37 PM

src/lexer/PhutilPythonFragmentLexer.php
2	test comment, please ignore

Oops, don't allow newlines in non-triple-quoted strings
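
As a rough illustration of the distinction this update draws, here is a minimal Python sketch. It is not taken from PhutilPythonFragmentLexer; the pattern and function names are hypothetical, and the patterns only cover double-quoted strings. The single-line pattern rejects a bare newline (while still allowing a backslash-newline continuation), and the triple-quoted pattern accepts one.

```python
import re

# Hypothetical patterns for illustration only; not the rules from this diff.
# Body of a "..." string: an escape (backslash plus any character, including
# a newline for line continuation), or any character other than a quote,
# a backslash, or a bare newline.
SINGLE_LINE = re.compile(r'"(?:\\(?:.|\n)|[^"\\\n])*"')

# Body of a """...""" string: bare newlines are allowed.
TRIPLE = re.compile(r'"""(?:\\(?:.|\n)|[^\\])*?"""')


def is_single_line_string(src):
    """True if src is exactly one well-formed "..." string."""
    return SINGLE_LINE.fullmatch(src) is not None


def is_triple_quoted_string(src):
    """True if src is exactly one well-formed triple-quoted string."""
    return TRIPLE.fullmatch(src) is not None
```

With these assumptions, `is_single_line_string('"abc\ndef"')` is false because of the bare newline, while `is_triple_quoted_string('"""abc\ndef"""')` is true.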

Harbormaster completed remote builds in B846: Diff 22340. Jun 4 2014, 7:51 PM

This stuff is pretty hard to review comprehensively, but we can tweak it and start adding unit tests or whatever if there are issues. It looks structurally correct to me, and I didn't catch anything suspicious looking.

Thanks for putting this together!

This revision is now accepted and ready to land. Jun 5 2014, 6:39 PM

Closed by commit rPHU9b2f35480dc0 (authored by @spicyj, committed by @epriestley).

epriestley mentioned this in D14273: Skip pygmentize for large source and too long lines. Oct 15 2015, 1:58 PM

csilvers added a subscriber: csilvers. May 19 2017, 12:01 AM

csilvers added inline comments.

src/lexer/PhutilPythonFragmentLexer.php
219	I was just looking over this due to a recent mention in a chat, and while I don't understand what's going on here, I can't help but wonder if this should be `\\\\"` instead of `\\\\\'`.

I'm not entirely sure how to test the rule, but the upstream seems to use `"`:

https://bitbucket.org/birkenfeld/pygments-main/src/7941677dc77d4f2bf0bbd6140ade85a9454b8b80/pygments/lexers/python.py?at=default&fileviewer=file-view-default#python.py-220

If anyone can show me a Python file where this matters, I'm happy to fix it and put some test coverage on it.

This was my best guess:

maverick.py

print r"Tom \"Maverick\" Cruise"
print r'Tom \'Maverick\' Cruise'

...but neither string highlights specially and the runtime behavior just makes me confused about why Python has this feature:

$ python maverick.py
Tom \"Maverick\" Cruise
Tom \'Maverick\' Cruise

...what? Why?

In D9370#217514, @epriestley wrote:
This was my best guess:
maverick.py
print r"Tom \"Maverick\" Cruise"
print r'Tom \'Maverick\' Cruise'
...but neither string highlights specially and the runtime behavior just makes me confused about why Python has this feature:
$ python maverick.py
Tom \"Maverick\" Cruise
Tom \'Maverick\' Cruise
...what? Why?

Try taking out the r prefix. It's for "raw strings", in which backslashes are interpreted literally. This is mostly used for regexps (Python does not support /.../ literals, which are how "raw" strings are implemented in most languages).
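
A small sketch of the difference, written for Python 3 (so `print` is a function, unlike in maverick.py). The variable names are only for illustration.

```python
import re

raw = r"Tom \"Maverick\" Cruise"     # raw: the backslashes stay in the value
cooked = "Tom \"Maverick\" Cruise"   # non-raw: the backslashes only escape the quotes

print(raw)     # Tom \"Maverick\" Cruise
print(cooked)  # Tom "Maverick" Cruise

# Typical use of a raw string: a regex pattern, where the backslash has to
# reach the re module untouched.
digits = re.compile(r"\d+")
print(digits.findall("D9370 and D14273"))  # ['9370', '14273']
```

In the raw string, `\"` still keeps the quote from ending the literal, but both characters remain in the value, which matches the output seen from maverick.py.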

Oh, my assumption was that the raw part was important because of the comment ("// included here for raw strings").

Non-raw strings seem to work correctly without changes (see inline about why):

Screen Shot 2017-05-18 at 5.32.50 PM.png (153×328 px, 17 KB)

src/lexer/PhutilPythonFragmentLexer.php
306	Additional normal escaping rules get merged in here, and seem to handle things in normal strings.

In D9370#217516, @epriestley wrote:

Oh, my assumption was that the raw part was important because of the comment ("// included here for raw strings").

Non-raw strings seem to work correctly without changes (see inline about why):

OK, as I said, I have no idea what the line in question is doing. But I do think it's probably wrong. It may just be wrong in a way that doesn't matter.

Yeah, I think it's wrong but I'm not sure if the right version is to use a " or to remove it completely, since I can't find an input which it does anything for.
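
For what it's worth, here is a simplified Python sketch of when the quote character in such an escape rule would matter. It is not the PhutilPythonFragmentLexer state machine: `scan_string_body` is a hypothetical helper, and it assumes a string state with a single escape rule and no other rules merged in. Since the real lexer merges additional escaping rules into the state, the mismatch may have no visible effect there.

```python
import re


def scan_string_body(text, quote, escape_rule):
    """Scan text that follows an opening quote; return the string body.

    escape_rule is a regex for escape sequences that are consumed as one
    unit. Any other character is consumed singly until an unescaped quote.
    Returns None if the string is never closed.
    """
    escape = re.compile(escape_rule)
    pos = 0
    while pos < len(text):
        match = escape.match(text, pos)
        if match:
            pos = match.end()      # skip the whole escape sequence
        elif text[pos] == quote:
            return text[:pos]      # unescaped quote closes the string
        else:
            pos += 1
    return None


# Text after the opening " of: r"Tom \"Maverick\" Cruise"
rest = 'Tom \\"Maverick\\" Cruise" # trailing code'

matching = scan_string_body(rest, '"', r'\\\\|\\"')     # rule uses "
mismatched = scan_string_body(rest, '"', r"\\\\|\\'")   # rule uses '
```

Under these assumptions, the rule that matches the state's delimiter reads the whole body, while the mismatched rule ends the string at the first `\"`. That is consistent with the rule being wrong in a way that only shows up if nothing else in the state handles the escape.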