Page MenuHomePhabricator

Fix ArcanistFormattedStringXHPASTLinterRule for PHP 8
ClosedPublic

Authored by jrtc27 on Jan 11 2021, 2:04 AM.

Details

Summary

PHP 8's sprintf raises a ValueError when encountering unknown format
specifiers (previously it would eat the argument and print nothing), so
linting format strings like %Ls dies with an uncaught ValueError.

Fix this by using a custom callback during linting to turn all format
specifiers into %s and replace the dummy null argument with the original
format specifier, ensuring we always end up providing valid input to the
sprintf at the end. This has the nice property that the output of the
call to xsprintf is the original format string, though any
transformation into valid input would do.

Test Plan

Ran arc lint

Diff Detail

Repository
rARC Arcanist
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

In the case of hypothetical zsprintf("A %XYZ B", ...), where %XYZ is some multi-character conversion like %Ls, the real zsprintf() would call sprintf("A %s B") internally, while this will call sprintf("A %sYZ B") -- that is, this callback can't know that %XYZ is a single conversion, rather than %X + YZ.

I can't think of any problems this will cause today or any theoretical problems it will cause in the future, and there's no easy way to future-proof it anyway, so I think this is the most reasonable fix.

This revision is now accepted and ready to land.Jan 11 2021, 3:29 AM

In the case of hypothetical zsprintf("A %XYZ B", ...), where %XYZ is some multi-character conversion like %Ls, the real zsprintf() would call sprintf("A %s B") internally, while this will call sprintf("A %sYZ B") -- that is, this callback can't know that %XYZ is a single conversion, rather than %X + YZ.

I can't think of any problems this will cause today or any theoretical problems it will cause in the future, and there's no easy way to future-proof it anyway, so I think this is the most reasonable fix.

That's what would happen before anyway, the normal sprintf when linting would eat the %X and a single argument, leaving a literal YZ behind because it too does not have that knowledge. That is:

Before: Calls xsprintf(null, null, array("A %XYZ B", null)) which calls sprintf("A %XYZ B", null) which results in "A YZ B"
After: Calls xsprintf('...Callback', null, array("A %XYZ B", null)) which calls sprintf("A %sYZ B", "%X") which results in "A %sYZ B"

Well, actually, %X is valid and prints an uppercase hex literal, but let's pretend it's not for this example.

If you wanted to preserve the old behaviour whilst still making it work on PHP 8 you could leave the value as null or make it the empty string; I just figured trying to print null as a string might start becoming an error some day (and I'm somewhat surprised they didn't go as far as making that a ValueError in PHP 8...), and passing the original specifier in for the argument is useful when debugging to see what the original format string was.

Oh, I'm totally onboard with this change -- I think this behavior is generally better than the old behavior and the property of getting the input string out is useful/clever, just trying to save future-me a minute or two if this breaks and I end up here via git blame.

Hm, Harbormaster is failing with:

Fatal error: Call to undefined function ArcanistFormattedStringXHPASTLinterRule::processXsprintfCallback() in /core/data/drydock/workingcopy-70/repo/arcanist/src/xsprintf/xsprintf.php on line 70

Does referencing static functions not work in old PHP versions or something? I tested with 7.4 and it was fine.

Ah, I think the issue is that xsprintf() internally does this :

xsprintf.php:70
$callback($userdata, $pattern, $pos, $argv[$arg], $len);

I think it's doing that only because I wrote it a very long time ago (or perhaps because it saved a few microseconds on some synthetic benchmark at Facebook -- this pathway was a moderate hotspot at one point long ago), not for any really valid reason. Try replacing that with this?

call_user_func(
  $callback,
  $userdata,
  $pattern,
  $pos,
  $argv[$arg],
  $len);

Purely for consistency, the rest of the codebase uses the array('X', 'y') form for object callbacks instead of X::y, but both forms should have exactly the same behavior when passed to call_user_func(...).

Here's some evidence to support that theory:

https://3v4l.org/0dsKU

The build server (saux001.phacility.net) is currently running PHP 5.5.9.

Oh, I think the reason to use $callback(...) is the behavior of the reference parameters. Yikes.

https://3v4l.org/PbM0Z

It's reasonable to define processXsprintfCallback() as xsprintf_test_callback() or something in that vein in xsprintf.php to avoid this whole mess. I'll document what's going on with $callback.

https://3v4l.org/eYFoc does work for 5.2.2+ to pass by reference, provided which arguments are passed by reference is part of the contract. But probably best to keep it simple and define xsprintf_test_callback as you say.

Oh, or the array('C', 'm') version appears to work properly when invoked as $callback in all versions of PHP since PHP 5.4:

https://3v4l.org/EmVRu

The formal minimum version is PHP 5.5 now (since T13492) so this would also be fine.