Details
- Reviewers
epriestley - Group Reviewers
Blessed Reviewers - Commits
- rPHUd8a63d8c2907: Add a `pregsprintf` function
Added unit tests.
Diff Detail
- Repository
- rPHU libphutil
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
Two suggestions:
First, I think it should also be impossible (or, at least, very difficult) to misuse, but currently relies on knowing that the delimiter is "@". I think we can make it stronger like this:
pregsprintf($pattern, $flags, $arg, $arg, ...);
So you call:
pregsprintf('^abc%sxyz%R$', 'siU', $string, $regexp);
...and it returns:
$pattern = xsprintf('xpsprintf_regex', $pattern, $args); return $delim.$pattern.$delim.$flags;
Then, the caller doesn't have to know that "@" is magical (but see below for selection of a delimiter).
Second, I expect to get the wrong result for this, as written:
pregsprintf('%R', "\@");
That will be escaped into \\@, I think, which will terminate the pattern and break the regexp, somewhat-plausibly in a dangerous way.
Broadly, there are two possible levels of bad input here:
- patterns which are well-intentioned but badly constructed;
- patterns which are maliciously constructed and trying to use the e flag to run code.
As written, this is probably fine against the first class of input (where failure just produces bugs), but possibly not against the second class of input (where failure might be a security vulnerability). It would be nice to protect against even malicious inputs.
One possible approach for that which doesn't require a lot of logic is:
- Use chr(7) as a delimiter (this is a valid delimiter).
- If the subpattern contains chr(7), throw ("BEL must be escaped as \a").
(I chose BEL just because it's very unlikely to appear unescaped in any subpattern -- even in a character set range -- and has a convenient escape. Although maybe some other control character is a better choice since BEL might make using tail on error logs a bit of a mess.)
This is cheating, but should be safe, and this seems like a very small correctness price to pay for safety. Or, put another way, making the method 100% correct and 100% safe is desirable but hard (you have to deal with a lot of escaping stuff). Cheating with BEL makes it 99.99% correct and 100% safe. That's satisfactory to me until we hit some case where we care about 100% correctness.
The same rules also need to be applied to the $pattern template before it is printed. If we cheat with BEL, this is pretty easy.
Technically, even with all that, this does the wrong thing still, I think:
pregsprintf('\\%s', '@');
This is theoretically an issue in all xsprintf() calls though, and continuing to trust the pattern not to do this seems reasonable for now. It would be nice to actively prevent this eventually, though.