Page MenuHomePhabricator

During startup, guarantee "(string)1.23" is "1.23", not "1,23"
Open, NormalPublic

Description

See T7339. A specific, chaotic locale behavior is that (string)1.23 is 1,23 in some locales. This can create issues when PHP code interacts with essentially any other system. For example:

$ cat ltest.php 
<?php

setlocale(LC_NUMERIC, 'de_DE');
echo http_build_query(array('n' => 1.23));
echo "\n";
$ php -f ltest.php 
n=1,23

So if you're building an API call this way, the call behavior will change if the locale changes.

It also affects the behavior of (double)(string)$x: the result of that expression depends on the current locale (for $x = 1.23, the result is 1.23 in locales that use decimal point as a separator, but the result is 1 in locales that use a comma).

In T7339, the particular point where this broke was INSERT INTO ... VALUES ('1,23'). In sensibly strict modes, MySQL raises a truncation warning about this. An argument can be made that the query layer's use of %ns ("nullable string") to build all the values could be improved, but this is still a bizarre behavior.

Display code already uses PhutilNumber and related format-aware code to format numbers correctly for display, so our choice of what (string) should do at runtime has (in theory) no impact on how users see numbers printed. In practice, since we provide a small set of locales and don't include rich number formatting locale data there may be some actual impact, but this can be remedied by updating locale definitions.

During other startup checks, we should guarantee that (string)1.23 is "1.23" and refuse to run if we can't satisfy this guarantee. T7339 has more nuanced variations on this and it isn't a complete description of desired locale behavior, but this should be a minimum property of the runtime environment we provide.

Event Timeline

epriestley triaged this task as Normal priority.Mar 6 2020, 5:46 PM
epriestley created this task.
cmmata added a subscriber: cmmata.Mar 6 2020, 6:04 PM

I made some effort to find an approach here, but I think it needs to wait until Toolsets define our config/caching better. In particular, guaranteeing this naively in arc requires running a subprocess (to execute locale -a, to enumerate available locales on the system) and possibly functionally-testing hundreds of locales (or, in the general case, an arbitrarily enormous number of locales). The functional test for UTF8 behavior requires running another subprocess, to echo 🐑.

The C locale does not reliably have usable UTF8 behavior (on Ubuntu 14, it fails; the C.UTF-8 locale has satisfactory behavior).

Although this isn't incredibly expensive, it's expensive enough that I don't want to put this on the mandatory path during arc startup for every request, and we currently have no infrastructure to let me cache the output or reliably read configuration to override it.

See also T13209.

The Windows equivalent of this seems to approximately be chcp ("change codepage").