This doesn't actually expose them in the AST.
Details
- Reviewers
epriestley - Group Reviewers
Blessed Reviewers - Commits
- rPHUb20064a108f7: Add basic support for return type hints to xhpast
Added test cases for all available added variants.
Diff Detail
- Repository
- rPHU libphutil
- Branch
- xhpast-return-types
- Lint
Lint Passed - Unit
Tests Passed - Build Status
Buildable 11967 Build 15049: arc lint + arc unit
Event Timeline
One particular issue is that there is no room in the AST to add the return types. I've added TODOs at the relevant locations.
We've moved to fully utilizing PHP 7 features, and specifically miss this (regular scalar type hints are covered by normal type hints already)
Is there anything specific I can do to make this land-able?
I think the pathway forward is probably just to break the AST format, and maybe bump this across a bigger version upgrade (maybe to 7.0.0)? It would be good to do any other AST changes at the same time so they don't get spread out over months and months, not sure if we really have any.
So to start with:
- Expose return types in the AST.
- Run arc unit --everything in arc.
- See how bad that looks.
It's probably not particularly difficult to fix all the callsites. If that looks manageable, make some sort of effort to review what else we're missing for PHP7 that might trigger positional shifts in the AST and we can plan around that.
We could also consider layering some sort of key-based aliases on top of the positional AST, so you'd $node->getChildWithName('body') instead of $node->getChildAtPosition(5) and callers could survive changes like this, but I think these kinds of changes happen once every four years or so and it's probably not worthwhile. It wouldn't help us here, only next time. This just breaks existing code in somewhat-hard-to-understand ways and it's unfortunate that the API isn't more resistant to that.
Maybe worth thinking about anyway since we could implement it by just adding a big map like this somewhere, I think:
array( 'n_NODE_TYPE' => array( 'attributes' => 1, 'reference' => 2, 'name' => 3, 'parameters' => 4, ... ), ... )
That would give us a more readable way to walk the AST today and make the eventual PHP8 upgrade smoother, or at least make any AST changes break in obvious, easy-to-understand ways.
I think the only other noteworthy change in PHP 7 that needs an additional node is variadic arguments:
function x(...$args) { echo implode(', ', $args); } x(1, 2, 3); // prints "1, 2, 3"
Would you prefer such an addition to happen in this revision (and subsequently rename to "Add PHP 7 syntax to xhpast) or should it just be a follow-up?
If it's not too much trouble, let's just do it here? Happy to take it as a followup if that's easier, too, though.
Possibly we can squeeze ... into the same position as &, since the two are presumably exclusive? Not sure if that's desirable or ...&$x is valid.
I was hoping the same, [[ https://3v4l.org/0Nl5s | but &...$x is valid ]].
A bit short on time now, but I'll probably have it in there tomorrow. Thanks for the feedback!
Related: it may be worth looking into leveraging the native AST going forward instead of trying to keep xhpast in sync with PHP with every new minor release. I'd wager it wouldn't take too much voodoo to convert between them so that the existing linter rules continue to work.
I only looked at it briefly, but it looks like it doesn't capture whitespace/tokens in the AST, like most other ASTs. I might have missed something, but if I didn't this probably makes it unsuitable for any sort of lint operation: you may be able to detect some issues, but you have no chance of correcting them if the AST has discarded nonsemantic tokens.
As a basic starting point, I think there exists no unast() function which will reproduce the input file:
<?php $ast = ast\parse_file('file.php'); echo unast($ast); // Desired: unast() reproduces the content of `file.php`.
XHPAST has an very unusual input-preseving AST structure, specifically to support lint operations. Even though it's ultimately quite simple, it took me quite a few tries to arrive at it and it's more work to build than a normal AST, so I'd expect ASTs which aren't focused on lint to be unlikely to produce a structure which is usable in this role by luck. At least so far, I haven't seen other ASTs adopting anything similar.
(That sounded really self-congraulatory, but it took me a lot of work to come up with the node tree + token list approach!)
:D
But that is a very good point that I had completely overlooked. For better or worse, the linked ast implementation can be used fairly easily to turn parsed source code into something perfectly following a spec, but doesn't give the granular approach that arc lint uses in order to only deal with changed lines. And it really is something I dearly miss when running other linters with arc lint (pep8 being the one I have the most experience with)
Still, as an early adopter of new PHP features, it's fairly frustrating to (at best) have a lag time before xhpast doesn't just throw up all over the place. I'd love to see (and would help build) something where it's easier to keep zend_language_parser.y and xhpast/parser.y in sync without all the manual fiddling. I looked into the variadics you mentioned once I spotted this diff and it gets fairly gross (between &... and being used in both function calls and definitions). A short list syntax is almost certainly going to pass for 7.1 which will probably also break in a not so fun way.
Very much open to further discussion on this elsewhere if we don't want to pollute this review with it
Add variadic parameter support
Instead of adding another node, do the same trick as for by-ref parameters.
Note that I've called the node n_UNPACK to mirror unpack expressions (array_merge(...$arrays)). Easy to change if required.
That tree construction seems reasonable to me. Mirroring the n_UNPACK structure makes sense to me, too. Thanks!
Awesome! I'll follow up with some other features next week. Those should be a lot less tricky.
I'd love to see (and would help build) something where it's easier to keep zend_language_parser.y and xhpast/parser.y in sync without all the manual fiddling.
I think keeping XHPAST up to date is more blocked on upstream priorities than the mechanical difficulty of the fiddling today -- we still use stone age PHP, so we have weak incentives to support all the fancy new features.
Even if all the AST changes could be accomplished automatically via script (and I don't think they can be, because the mainline PHP parser discards tokens and nodes we need, I believe?) someone would still have to go run the script, test the new binary, bump the version, pick some new node names, organize the tree structure, change anything affected in first-party linters, write a changelog about what's different now, etc. I think the parser fiddling is usually not hugely difficult, and suspect we'd be reducing the work by something like ~50% by magic removal of any need to touch parser.y, not ~95%, and new stuff probably wouldn't really happen much faster as a result.
Two possible smallish changes that would make this a little easier to work with:
- The unit test format (giant blobs of JSON) currently in use is only slightly better than nothing. In this change, I have virtually zero ability to visually assess how these changes actually affected the tests, for example. Converting this into some kind of readable text file that looks more like the output of the /xhpast/ tool in the web UI would make it easier to verify changes by examining the tests.
- Adding a big map of the tree schema and allowing nodes to be addressed by key (getChildByKey('parameters'), per above, as an alternative to getChildByIndex(3)) would make callers more resistant to structural changes.
These are both probably a lot of work relative to their benefit, but computers can do them and I'm not sure they can do convert-ast.php --in php8.y --out xhpast-parser.y.
But overall, the shortest path to this stuff is still just "fiddle with parser.y, send me a diff", when you want something to work, I think. This fixes stuff much faster than the PHP upstream introduces it, we just spend very little time doing it since it's hard for anyone to prioritize.