Page MenuHomePhabricator

Support PHP5.4+ syntax in XHPAST
Closed, ResolvedPublic


I'm aware that this isn't on the roadmap (T4157, et al), but I'd find it incredibly helpful as we liberally use 5.4 [1] syntax at WePay, and attempting to roll our own parse tree using token_get_all would be... not a great idea. This makes both linters and symbol importing tedious at best (non-reusable state machines, manual brace matching, and plenty of other not-fun-stuff). Granted the symbol stuff still gets a little confused when it comes to namespaces, but that's unrelated to this.

I'm quite happy to take this on myself if will actually be accepted, but I don't want to waste time trying if it's something that won't actually be supported, i.e. D1207, so this mostly serves to gauge interest (is there a better forum for this?)

[1] short-array syntax and traits are problems; it seems to handle namespaces and closures just fine. We're not on 5.5 yet due to an apc change, so I'll leave yield and finally alone for now.

Revisions and Commits

rPHU libphutil
Needs Review
Needs Review
Needs Review
Needs Review
Changes Planned
rARC Arcanist
Needs Revision

Event Timeline

Firehed claimed this task.
Firehed raised the priority of this task from to Needs Triage.
Firehed updated the task description. (Show Details)
Firehed added a project: XHPAST.
Firehed added subscribers: Firehed, epriestley.

We're happy to put this in the upstream, it's just very complex to implement. Some general thoughts:

  • There's a known bug with empty HEREDOC with a partial fix in T3512.
  • It would be nice to get at least some basic test coverage on the AST while we're in here.
  • Since we've moved away from XHP, moving the name toward "PHPAST" is ideally desirable.
  • We may need grammar changes to support Class::{expr}()?
  • Binary literal is new, and I think has no syntax support.
  • callable is now a typehint (probably not relevant)?
  • Traits syntax.
  • Finally syntax.
  • Yield syntax.
  • Short array syntax.
  • Make sure we're actually parsing namespaces, not completely faking it.
  • foreach as list(...) syntax, old grammar may restrict possible expressions.
  • [1, 2][0] and "asdf"[0] syntax (these were probably runtime errors).
  • The AST is fast enough for use, but isn't fast. The core parsing task is fast, but much of the time is spent marshalling/unmarshalling trees to/from JSON and searching trees in PHP. It would be nice to provide an optional PHP extension to improve performance of one or both of these tasks (T2312).
  • We have a parsing bug in that we do not correctly parse <?php "{$v[""]}"; (see T635).
  • We have a parsing bug in that we do not correctly parse ?><span><?=1,2;?></span><?php (see T635).
  • Generally, we fake our way through expression parsing inside of strings because we don't use these syntaxes and don't have applications which need them to parse, but should probably do it correctly.

I'll see about making some headway here. Plan of attack is probably:

  • Add test coverage for all language features, insofar as we reasonably can.
  • Add known-failing tests for all the broken stuff.
  • Add known-failing tests for all the new/missing stuff.
  • Then, align the grammar with PHP 5.5 aggressively.

Stuff not yet covered:

  • ...because my PHP v5.5.0-dev was built before they existed so I need to recompile:
    • Literal dereference syntax
    • foreach as list
    • yield
  • ...other:
    • HEREDOC with EOF
    • string-index-in-string
    • weird span thing

Okay, this diff series does not address:

  • Callable static expressions (C::{$x.$y}()).
  • foreach as list() -- proably easy to add but I forgot about it.
  • Not 100% sure static array indexes parse into the tree reasonably.
  • Heredoc + EOF parses wrong.
  • I didn't fix the two weird edge cases (<?php "{$v[""]}";, ?><span><?=1,2;?></span><?php).
  • Generally, parsing of magic inside string literals isn't as sophisticated as it ideally should be.

There's also related cleanup:

  • The PHPFragmentLexer should recognize new keywords (finally, trait) and probably needs some grammar updates.
  • evalStaticScalar() in PHP probably needs to be updated for 0b1111.

I'm more-or-less not really planning to pursue any of that for now since I think most of it isn't especially valuable, but yell at me if you need it. Also if D8033, etc., did anything weird in building the new parse trees -- I haven't actually used them to write any rules, so I'm not sure if I, e.g., dropped parts of the tree in constructing it. Cursory inspection suggested they're OK, but it's hard to get right by just looking it over.

epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.
epriestley edited this Maniphest Task.

Okay, let me know how much luck you have with stuff in HEAD? This probably has some remaining rough edges, but I think we're in much better shape than we were yesterday.

This is incredible, thanks! Our devops team is going to comb through all of this and the reviews and see if we need to take a crack at anything else.

Just as an update - so far all of the changes I've tested have been working really well for 5.4 source. The stock symbol generation script seems to not mind files with more modern grammar ($x = [];, trait, use, closures, etc), and the XHPAST linter seems ok too. As you'd expect we have a massive number of files in our codebase seeing that it started close to five years ago so naturally I haven't been able to run it on everything but I'm feeling pretty good about this :)

Awesome! One thing I'm vaguely aware of with the symbol script is that I think it emits an empty symbol for closures, but that should be easy to filter out if you haven't dealt with it already.

Update here - did spot one issue: (new Foo)->bar (added in 5.4) fails:

[2014-02-04 11:03:32] EXCEPTION: (XHPASTSyntaxErrorException) XHPAST Parse Error: syntax error, unexpected T_OBJECT_OPERATOR on line 2531 at [/home/sites/phabricator/libphutil/src/parser/xhpast/api/XHPASTTree.php:62]

The following change fixes the file:

-        return (new Shop_Item_Model())->object_name;
+       $SI = new Shop_Item_Model;
+       return $SI->object_name;

Also, unsurprisingly, this fails too:

$model = 'some_string';
return (new $model)->table_name;

Found another issue with some array dereferencing: if ($state && !isset(self::getSearchStates()[$state])) { still fails. I'll look into that one a bit later - once I saw what the zend grammar was doing... ick.

This particular construction isn't supported by our version of the grammar, but is part of Zend:

<?php $o->m()[0];

This causes php54.lint-test to fail in arcanist/, but has no other effects, so I'll probably won't get around to fixing it for a bit unless someone hits a real issue with it.

I just ran into that one in practice today, so I'll probably take a crack at another grammar update to handle it in the next 48 hours or so.

try {
} finally {

Catches are optional when you have a while. This small diff fixes that:

diff --git a/support/xhpast/parser.y b/support/xhpast/parser.y
index 7047dab..817beff 100644
--- a/support/xhpast/parser.y
+++ b/support/xhpast/parser.y
@@ -561,17 +561,12 @@ unticked_statement:
     $$ = NNEW(n_STATEMENT)->appendChild(NNEW(n_EMPTY));
     NMORE($$, $1);
-| T_TRY '{' inner_statement_list '}' T_CATCH '(' fully_qualified_class_name T_VARIABLE ')' '{' inner_statement_list '}' additional_catches finally_statement {
+| T_TRY '{' inner_statement_list '}' additional_catches finally_statement {
     NTYPE($1, n_TRY);
     $1->appendChild(NEXPAND($2, $3, $4));
-    NTYPE($5, n_CATCH);
-    $5->appendChild($7);
-    $5->appendChild(NTYPE($8, n_VARIABLE));
-    $5->appendChild(NEXPAND($10, $11, $12));
-    $1->appendChild(NNEW(n_CATCH_LIST)->appendChild($5)->appendChildren($13));
-    $1->appendChild($14);
+    $1->appendChild(NNEW(n_CATCH_LIST)->appendChildren($5));
+    $1->appendChild($6);
     $$ = NNEW(n_STATEMENT)->appendChild($1);

(note: PHP checks for presence of at least one catch or finally at runtime, it seems - the syntax allows for try {} only)

@epriestley: could you take a look at the comment above? I'd push out a diff but for some reason the values of the tokens change when I rebuild xhpast.

Missing finally support is now starting to affect a large portion of my colleagues.

I'm about to start implement full dereferencing support. Cases that are missing now:


… and probably more. Dereferencing is now implemented via a whole number of separate parser rules that (according to PHP's own grammar) can be combined in a simpler rule: dereferencable '[' optional_expr ']'.

To make everyone's life easier, I'd like to split this up in a few separate changes:

  • Remove the grammar restrictions for constant definitions (private $foo = foo(); is permitted by PHP grammar and raises a runtime error)
  • Bring definitions for scalars in line with the PHP grammar
  • Introduce dereference support.

@epriestley Does this sound reasonable? I do not suspect that this will break any standing tests.

@richardvanvelzen / @joshuaspence -

Changes in D17802, D17803, D17816 and D17817 all seem generally fine to me from cursory inspection. If D17819 seems sane for fixing the "huge blobs of unstable JSON" problem, let's stick them all on that, I'll take a closer look at the nodes, and we can get everything in? Good luck racing each other on minor version numbers.

Also, just so I don't completely forget about it:

  • D17820 likely identified a bug exposed by the php-traits test case -- we end up with a node with no type.

Much of this is supported now; T13492 has some general followup.