Page MenuHomePhabricator

Add a `pregsprintf` function
ClosedPublic

Authored by joshuaspence on Apr 7 2015, 8:02 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 6, 8:46 AM
Unknown Object (File)
Fri, Dec 20, 12:31 PM
Unknown Object (File)
Wed, Dec 11, 5:11 AM
Unknown Object (File)
Nov 26 2024, 11:34 AM
Unknown Object (File)
Nov 26 2024, 11:34 AM
Unknown Object (File)
Nov 26 2024, 11:33 AM
Unknown Object (File)
Nov 25 2024, 2:01 PM
Unknown Object (File)
Nov 22 2024, 10:39 AM
Subscribers

Details

Diff Detail

Repository
rPHU libphutil
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

joshuaspence retitled this revision from to Add a `pregsprintf` function.
joshuaspence updated this object.
joshuaspence edited the test plan for this revision. (Show Details)
joshuaspence added a reviewer: epriestley.
epriestley edited edge metadata.

Two suggestions:

First, I think it should also be impossible (or, at least, very difficult) to misuse, but currently relies on knowing that the delimiter is "@". I think we can make it stronger like this:

pregsprintf($pattern, $flags, $arg, $arg, ...);

So you call:

pregsprintf('^abc%sxyz%R$', 'siU', $string, $regexp);

...and it returns:

$pattern = xsprintf('xpsprintf_regex', $pattern, $args);
return $delim.$pattern.$delim.$flags;

Then, the caller doesn't have to know that "@" is magical (but see below for selection of a delimiter).

Second, I expect to get the wrong result for this, as written:

pregsprintf('%R', "\@");

That will be escaped into \\@, I think, which will terminate the pattern and break the regexp, somewhat-plausibly in a dangerous way.

Broadly, there are two possible levels of bad input here:

  • patterns which are well-intentioned but badly constructed;
  • patterns which are maliciously constructed and trying to use the e flag to run code.

As written, this is probably fine against the first class of input (where failure just produces bugs), but possibly not against the second class of input (where failure might be a security vulnerability). It would be nice to protect against even malicious inputs.

One possible approach for that which doesn't require a lot of logic is:

  • Use chr(7) as a delimiter (this is a valid delimiter).
  • If the subpattern contains chr(7), throw ("BEL must be escaped as \a").

(I chose BEL just because it's very unlikely to appear unescaped in any subpattern -- even in a character set range -- and has a convenient escape. Although maybe some other control character is a better choice since BEL might make using tail on error logs a bit of a mess.)

This is cheating, but should be safe, and this seems like a very small correctness price to pay for safety. Or, put another way, making the method 100% correct and 100% safe is desirable but hard (you have to deal with a lot of escaping stuff). Cheating with BEL makes it 99.99% correct and 100% safe. That's satisfactory to me until we hit some case where we care about 100% correctness.

The same rules also need to be applied to the $pattern template before it is printed. If we cheat with BEL, this is pretty easy.


Technically, even with all that, this does the wrong thing still, I think:

pregsprintf('\\%s', '@');

This is theoretically an issue in all xsprintf() calls though, and continuing to trust the pattern not to do this seems reasonable for now. It would be nice to actively prevent this eventually, though.

This revision now requires changes to proceed.Apr 7 2015, 1:41 PM
joshuaspence edited edge metadata.

Changes as requested

epriestley edited edge metadata.
This revision is now accepted and ready to land.Apr 8 2015, 7:28 PM
This revision was automatically updated to reflect the committed changes.