Page MenuHomePhabricator

Support PHP5.4+ syntax in XHPAST
Open, NormalPublic


I'm aware that this isn't on the roadmap (T4157, et al), but I'd find it incredibly helpful as we liberally use 5.4 [1] syntax at WePay, and attempting to roll our own parse tree using token_get_all would be... not a great idea. This makes both linters and symbol importing tedious at best (non-reusable state machines, manual brace matching, and plenty of other not-fun-stuff). Granted the symbol stuff still gets a little confused when it comes to namespaces, but that's unrelated to this.

I'm quite happy to take this on myself if will actually be accepted, but I don't want to waste time trying if it's something that won't actually be supported, i.e. D1207, so this mostly serves to gauge interest (is there a better forum for this?)

[1] short-array syntax and traits are problems; it seems to handle namespaces and closures just fine. We're not on 5.5 yet due to an apc change, so I'll leave yield and finally alone for now.


Differential Revisions
D19741: Let lint rules support anonymous classes
D19740: Fix class attribute node in anonymous class
D17820: Replace all the PHPAST JSON test data with readable test data
D17819: Make PHPAST parser tests stable and human-readable
D17817: Add support for exponentiation
D17816: Add support for constant scalar expressions
D17803: (WIP) Add anonymous classes support to xhpast
D17802: Add nullable type hint support to xhpast
D13967: Improve handling of PHP 5.4 features
D13968: Improve XHPAST handling of traits
rARC82445bb60534: Let lint rules support anonymous classes
rPHU6c83b169bc18: Fix class attribute node in anonymous class
D19736 / rPHU603209bb1756: Add support for anonymous class
D18770 / rPHU32746e8b0793: Add support for nullable parameter types
D18639 / rPHUae80e72461d2: Fix node for nullable return value
D18621 / rPHU2596aecadc53: Add support for nullable return types
D16138 / rPHU51c179b4c000: Support class const dereferencing in xhpast
D10337 / rPHU5f573d2b7f27: Correctly support try..catch..finally in xhpast
D8611 / rPHU904d39effd20: Add array method dereference support to xhpast.
D8150 / rPHUe719f331aa1a: Add support in XHPAST for "(new X)->y" syntax (added in 5.4)
D8033 / rPHU86d651f9c929: Mostly align parser to PHP 5.5.8
D8032 / rPHU93a29d837130: Align PHPAST lexer to PHP 5.5.8
D8031 / rPHU2a1a4b020a76: Add PHPAST test for remaining language features
D8030 / rPHUc2e32dcf8f29: Add PHPAST tests for static class expressions, callable, traits, finally and…
D8029 / rPHU20aa61e6c9fc: Add PHPAST test for empty heredoc
D8028 / rPHU74999552f0cc: Add a PHPAST test case for most of the PHP grammar
D8027 / rPHU65f813e7ab78: Add failing PHPAST test case for binary literals
D8026 / rPHU75feb3d50b23: Add PHPAST test cases for most lexer patterns
D8025 / rPHUcded4338fe33: Add unit test scaffolding for XHPAST/PHPAST

Event Timeline

Firehed created this task.Jan 21 2014, 7:59 AM
Firehed claimed this task.
Firehed raised the priority of this task from to Needs Triage.
Firehed updated the task description. (Show Details)
Firehed added a project: XHPAST.
Firehed added subscribers: Firehed, epriestley.

We're happy to put this in the upstream, it's just very complex to implement. Some general thoughts:

  • There's a known bug with empty HEREDOC with a partial fix in T3512.
  • It would be nice to get at least some basic test coverage on the AST while we're in here.
  • Since we've moved away from XHP, moving the name toward "PHPAST" is ideally desirable.
  • We may need grammar changes to support Class::{expr}()?
  • Binary literal is new, and I think has no syntax support.
  • callable is now a typehint (probably not relevant)?
  • Traits syntax.
  • Finally syntax.
  • Yield syntax.
  • Short array syntax.
  • Make sure we're actually parsing namespaces, not completely faking it.
  • foreach as list(...) syntax, old grammar may restrict possible expressions.
  • [1, 2][0] and "asdf"[0] syntax (these were probably runtime errors).
  • The AST is fast enough for use, but isn't fast. The core parsing task is fast, but much of the time is spent marshalling/unmarshalling trees to/from JSON and searching trees in PHP. It would be nice to provide an optional PHP extension to improve performance of one or both of these tasks (T2312).
  • We have a parsing bug in that we do not correctly parse <?php "{$v[""]}"; (see T635).
  • We have a parsing bug in that we do not correctly parse ?><span><?=1,2;?></span><?php (see T635).
  • Generally, we fake our way through expression parsing inside of strings because we don't use these syntaxes and don't have applications which need them to parse, but should probably do it correctly.

I'll see about making some headway here. Plan of attack is probably:

  • Add test coverage for all language features, insofar as we reasonably can.
  • Add known-failing tests for all the broken stuff.
  • Add known-failing tests for all the new/missing stuff.
  • Then, align the grammar with PHP 5.5 aggressively.
epriestley edited this Maniphest Task.Jan 21 2014, 5:18 PM
epriestley edited this Maniphest Task.Jan 21 2014, 5:37 PM
epriestley edited this Maniphest Task.Jan 21 2014, 5:46 PM
epriestley edited this Maniphest Task.Jan 21 2014, 6:19 PM
epriestley edited this Maniphest Task.Jan 21 2014, 6:42 PM
epriestley edited this Maniphest Task.Jan 21 2014, 7:01 PM

Stuff not yet covered:

  • ...because my PHP v5.5.0-dev was built before they existed so I need to recompile:
    • Literal dereference syntax
    • foreach as list
    • yield
  • ...other:
    • HEREDOC with EOF
    • string-index-in-string
    • weird span thing
epriestley edited this Maniphest Task.Jan 21 2014, 7:40 PM
epriestley edited this Maniphest Task.Jan 21 2014, 8:18 PM
epriestley edited this Maniphest Task.Jan 21 2014, 9:35 PM

Okay, this diff series does not address:

  • Callable static expressions (C::{$x.$y}()).
  • foreach as list() -- proably easy to add but I forgot about it.
  • Not 100% sure static array indexes parse into the tree reasonably.
  • Heredoc + EOF parses wrong.
  • I didn't fix the two weird edge cases (<?php "{$v[""]}";, ?><span><?=1,2;?></span><?php).
  • Generally, parsing of magic inside string literals isn't as sophisticated as it ideally should be.

There's also related cleanup:

  • The PHPFragmentLexer should recognize new keywords (finally, trait) and probably needs some grammar updates.
  • evalStaticScalar() in PHP probably needs to be updated for 0b1111.

I'm more-or-less not really planning to pursue any of that for now since I think most of it isn't especially valuable, but yell at me if you need it. Also if D8033, etc., did anything weird in building the new parse trees -- I haven't actually used them to write any rules, so I'm not sure if I, e.g., dropped parts of the tree in constructing it. Cursory inspection suggested they're OK, but it's hard to get right by just looking it over.

epriestley edited this Maniphest Task.Jan 21 2014, 10:32 PM
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley triaged this task as Normal priority.Jan 21 2014, 10:34 PM

Okay, let me know how much luck you have with stuff in HEAD? This probably has some remaining rough edges, but I think we're in much better shape than we were yesterday.

This is incredible, thanks! Our devops team is going to comb through all of this and the reviews and see if we need to take a crack at anything else.

Just as an update - so far all of the changes I've tested have been working really well for 5.4 source. The stock symbol generation script seems to not mind files with more modern grammar ($x = [];, trait, use, closures, etc), and the XHPAST linter seems ok too. As you'd expect we have a massive number of files in our codebase seeing that it started close to five years ago so naturally I haven't been able to run it on everything but I'm feeling pretty good about this :)

Awesome! One thing I'm vaguely aware of with the symbol script is that I think it emits an empty symbol for closures, but that should be easy to filter out if you haven't dealt with it already.

Update here - did spot one issue: (new Foo)->bar (added in 5.4) fails:

[2014-02-04 11:03:32] EXCEPTION: (XHPASTSyntaxErrorException) XHPAST Parse Error: syntax error, unexpected T_OBJECT_OPERATOR on line 2531 at [/home/sites/phabricator/libphutil/src/parser/xhpast/api/XHPASTTree.php:62]

The following change fixes the file:

-        return (new Shop_Item_Model())->object_name;
+       $SI = new Shop_Item_Model;
+       return $SI->object_name;

Also, unsurprisingly, this fails too:

$model = 'some_string';
return (new $model)->table_name;
Firehed edited this Maniphest Task.Feb 5 2014, 8:56 PM

Found another issue with some array dereferencing: if ($state && !isset(self::getSearchStates()[$state])) { still fails. I'll look into that one a bit later - once I saw what the zend grammar was doing... ick.

Firehed edited this Maniphest Task.Feb 5 2014, 9:33 PM

This particular construction isn't supported by our version of the grammar, but is part of Zend:

<?php $o->m()[0];

This causes php54.lint-test to fail in arcanist/, but has no other effects, so I'll probably won't get around to fixing it for a bit unless someone hits a real issue with it.

I just ran into that one in practice today, so I'll probably take a crack at another grammar update to handle it in the next 48 hours or so.

Firehed edited this Maniphest Task.Mar 25 2014, 7:06 AM
Firehed edited this Maniphest Task.Mar 25 2014, 8:36 PM
richardvanvelzen added a comment.EditedJul 18 2014, 7:27 AM
try {
} finally {

Catches are optional when you have a while. This small diff fixes that:

diff --git a/support/xhpast/parser.y b/support/xhpast/parser.y
index 7047dab..817beff 100644
--- a/support/xhpast/parser.y
+++ b/support/xhpast/parser.y
@@ -561,17 +561,12 @@ unticked_statement:
     $$ = NNEW(n_STATEMENT)->appendChild(NNEW(n_EMPTY));
     NMORE($$, $1);
-| T_TRY '{' inner_statement_list '}' T_CATCH '(' fully_qualified_class_name T_VARIABLE ')' '{' inner_statement_list '}' additional_catches finally_statement {
+| T_TRY '{' inner_statement_list '}' additional_catches finally_statement {
     NTYPE($1, n_TRY);
     $1->appendChild(NEXPAND($2, $3, $4));
-    NTYPE($5, n_CATCH);
-    $5->appendChild($7);
-    $5->appendChild(NTYPE($8, n_VARIABLE));
-    $5->appendChild(NEXPAND($10, $11, $12));
-    $1->appendChild(NNEW(n_CATCH_LIST)->appendChild($5)->appendChildren($13));
-    $1->appendChild($14);
+    $1->appendChild(NNEW(n_CATCH_LIST)->appendChildren($5));
+    $1->appendChild($6);
     $$ = NNEW(n_STATEMENT)->appendChild($1);

(note: PHP checks for presence of at least one catch or finally at runtime, it seems - the syntax allows for try {} only)

@epriestley: could you take a look at the comment above? I'd push out a diff but for some reason the values of the tokens change when I rebuild xhpast.

Mnkras added a subscriber: Mnkras.Apr 16 2015, 4:38 PM
Korvin added a subscriber: Korvin.Apr 16 2015, 4:48 PM

Missing finally support is now starting to affect a large portion of my colleagues.

I'm about to start implement full dereferencing support. Cases that are missing now:


… and probably more. Dereferencing is now implemented via a whole number of separate parser rules that (according to PHP's own grammar) can be combined in a simpler rule: dereferencable '[' optional_expr ']'.

To make everyone's life easier, I'd like to split this up in a few separate changes:

  • Remove the grammar restrictions for constant definitions (private $foo = foo(); is permitted by PHP grammar and raises a runtime error)
  • Bring definitions for scalars in line with the PHP grammar
  • Introduce dereference support.

@epriestley Does this sound reasonable? I do not suspect that this will break any standing tests.

That seems fine to me.

@richardvanvelzen / @joshuaspence -

Changes in D17802, D17803, D17816 and D17817 all seem generally fine to me from cursory inspection. If D17819 seems sane for fixing the "huge blobs of unstable JSON" problem, let's stick them all on that, I'll take a closer look at the nodes, and we can get everything in? Good luck racing each other on minor version numbers.

Also, just so I don't completely forget about it:

  • D17820 likely identified a bug exposed by the php-traits test case -- we end up with a node with no type.