Page MenuHomePhabricator

Fix destructive representation of "?x=1&x=2" in PhutilURI so it can raise errors immediately
ClosedPublic

Authored by epriestley on Feb 11 2019, 5:17 PM.

Details

Summary

Ref T13250. Currently, PhutilURI uses a map to represent parameters internally. This is destructive for query strings in the form x=1&x=2, and has PHP-flavored behavior for query strings in the form x[]=1&x[]=2.

In the case of T13250, it would be better for PhutilURI to raise an exception earlier (when a nonscalar parameter is set), rather than later (in __toString()). Partly, handling exceptions from __toString() in PHP is a mess, since PHP fatals. Partly, raising errors sooner is generally better.

Since we use query strings in the x[]=... form, and may possibly rely on PhutilURI being able to parse and represent them, we probably can't make PhutilURI reject URIs in this form. The destructive behavior on x=1&x=2 also "feels bad", even if it does not concretely cause any specific problems.

To fix this stuff:

  • Change the internal representation from a map to a list of pairs.
  • In new PhutilURI(X), parse X to a list of pairs.
  • Retain the same semantics for setQueryParam() (replace keys).
  • Add a new appendQueryParam() with new semantics (add keys, even if they are duplicates).
  • Then, add strict error checking on parse/set so errors are raised immediately.

The behavioral changes are:

  • set/append/QueryParam(X, Y) now raises an exception if X or Y are nonscalar, and setQueryParams(M) now raises an exception if any key or value in M is nonscalar. This is good, and the primary goal of the change.
  • ?x=1&x=2 is no longer parsed destructively. This is good.
  • getQueryParams() no longer contains an 'x' => array(1, 2) for ?x[]=1&x[]=2. Instead, it will return 'x[]' => '2'. That is, it has lost the PHP-specific notion of what x[]=1&x[]=2 means. I claim we never use this behavior and never expect PhutilURI to have it, and I think this is generally a good change, although it's possible it breaks something.
    • You can still get the full list with getQueryParamsAsPairList().
    • You can use PhutilQueryStringParser to explicitly say "I want PHP behavior". I don't think we ever want it, and it's good if opting into PHP behavior is explicit.
Test Plan
  • Added unit tests, ran unit tests.
  • Without D20134, loaded one of the affected interfaces. Got a more useful exception sooner (when building the URI) rather than a less useful exception later (when converting the URI into a string).

Diff Detail

Repository
rPHU libphutil
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley created this revision.Feb 11 2019, 5:17 PM
epriestley requested review of this revision.Feb 11 2019, 5:18 PM

Instead, it will return 'x[]' => '2'. That is, it has lost the PHP-specific notion of what x[]=1&x[]=2 means.

This might be a good thing to put in the logs? "Warning: you are potentially relying on PHP-style query parameter semantics and have thus brought shame on your whole family".

I can't find any callsites which appear to expect the x[] behavior.

There are some which expect a single value, like v. We could log or fatal when the query string actually includes two or more copies of this value. However, almost all of these are fluff: for example, if you type http://youtube.com/?v=1&v=2 into Remarkup, we'll hit this condition.

Let me see if I can just remove getQueryParams() completely to moot this. I think it's an inherently bad method because returning a map always means there's a risk of data destruction.

epriestley updated this revision to Diff 48083.Feb 11 2019, 7:19 PM
  • Remove "getQueryParams()".
  • Add "getQueryParamsAsMap()". This method fatals throws if there are duplicates.

We need "getQueryParamsAsMap()" because some API signature methods work like this, roughly:

  • Turn the query parameters into a map.
  • Sort the keys.
  • Build the query string.
  • Hash the query string.

So we need to produce a map. These API calls don't support duplicate keys (their sort order would be ambiguous) so throwing in their presence seems reasonable.

There are no getQueryParams() callsites in arcanist/.

See followup for getQueryParams() callsites in phabricator/.

epriestley added inline comments.Feb 11 2019, 7:21 PM
src/future/aws/PhutilAWSv4Signature.php
248

I tested this by calling bin/provision events in core/, which is just a simple way to make a modern AWS call.

src/future/oauth/PhutilOAuth1Future.php
97

This is only JIRA, Twitter, and Bitbucket. I don't currently have any set up so I didn't actually test this, but it's hard to imagine it breaks anything.

amckinley accepted this revision.Feb 11 2019, 8:13 PM

It has taken some soul searching on my part to be ok with the fact that interleavings of appendQueryParam() and setQueryParam() are effectively changing the "type" of the key in question. I can't decide if it would make sense to change the code to do something like "if you're doing a set and you find some duplicate key names that you're going to blow away, decide that now is a good time to go through the rest of the params and blow away all the other dupes, except for the first one of each". That's probably even even more confusing though. Maybe the class could have a variable that can only be set on construction called $duplicate_keys_allowed or something, that would make append throw on duplicate key names? This is all probably overthinking things and the current implementation is fine, even though it doesn't feel like it's totally eliminating surprises.

src/parser/PhutilURI.php
212

Wait, what does list($key) do? I was expecting this to be a no-op, but it appears to just drop the 2nd element of the array:

$foo = array('bang', 'bar');
list($x) = $foo;

var_dump($x);
// string(4) "bang"

Any reason to prefer this construction over $key = head(phutil_http_parameter_pair($key, ''))?

215

$list_key is more like $idx, right? And should always be naturally-ordered keys?

236

And shouldn't this be redundant unless someone has screwed up and tried to invoke insertQueryParam("key", "value", new Banana())?

241

And if the above line isn't redundant, shouldn't we be doing it after appendQueryParam()?

249–254

Maybe throw if $this->query isn't still equal to array()? There's nothing wrong with implementing it like this, other than potentially surprising a caller who has a URL with existing embedded query params, and then a list of params they intend to overwrite. (Or add a second $replace_existing_params = true argument)?

This revision is now accepted and ready to land.Feb 11 2019, 8:13 PM
epriestley added inline comments.Feb 11 2019, 8:20 PM
src/parser/PhutilURI.php
236

This may have an effect if you have a list like this:

a=a
b=1
a=a
b=2
a=a

...then set(b, 3). It will replace b=1 and leave a hole where b=2 used to be.

241

(It's redundant after appending.)

249–254

This is mostly about preserving the existing behavior, since we have a fair number of callers assuming they can do stuff like setQueryParams(array()) to remove the query parameters.

I agree that set isn't a great name in the abstract, and doesn't make it obvious whether it means "clear, then append" or "overwrite keys without clearing".

I think we can make setQueryParam() throw if the query has multiple parameters with the same name to improve this, but let me try to get a sense of how much of a mess that is.

There are only like 100 combined setQueryParam() + setQueryParams() callsites, so maybe both API methods can just be removed.

I think we can also almost certainly get rid of the null magic and all the complexity around ordering.

epriestley added a comment.EditedFeb 12 2019, 1:13 PM

Since PHI1069 still has a broken-in-stable issue with hovercards that needs to cherry-pick to stable, I think I should be a little less ambitious with this change. I'm going to do this instead:

  • Keep the new getQueryParamsAsMap() method.
  • For compatibility for now, also keep the getQueryParams() method, which works like getQueryParamsAsMap() but does not throw on duplicate keys.
  • Keep the setQueryParam() and setQueryParams() methods and semantics for now, since fixing them is a larger change.

Once stable is actually stable we can plot a slower course away from getQueryParams(), setQueryParam(), and setQueryParams(). The set...() methods have too many callsites to fix/test in one diff anyway so this would more or less have needed to happen no matter what.

epriestley added inline comments.Feb 12 2019, 1:15 PM
src/parser/PhutilURI.php
212

list($key) = $x is the same as $key = head($x), it's just for consistency with list($key, $value) = $x.

215

Yeah.

epriestley updated this revision to Diff 48105.Feb 12 2019, 1:19 PM
  • For now, restore "getQueryParams()" with the nearly-the-same-as-the-old behavior: return as a map, overwriting when we find duplicate keys.
This revision was automatically updated to reflect the committed changes.