Page MenuHomePhabricator

Add a `phutil_person()` wrapper for the string extractor
ClosedPublic

Authored by epriestley on Nov 8 2016, 3:31 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Dec 19, 8:07 AM
Unknown Object (File)
Tue, Dec 17, 8:42 PM
Unknown Object (File)
Fri, Dec 13, 8:56 PM
Unknown Object (File)
Sat, Dec 7, 3:35 PM
Unknown Object (File)
Sat, Dec 7, 3:35 PM
Unknown Object (File)
Sat, Dec 7, 3:35 PM
Unknown Object (File)
Thu, Nov 28, 3:00 AM
Unknown Object (File)
Wed, Nov 27, 3:52 AM
Subscribers

Details

Summary

Ref T5267. When extracting strings, we can currently statically infer information about plurality (via PhutilNumber and phutil_count) but not about gendered variables.

This adds a marker for gendered variables. Usage will be:

pht(
  '%s stood silently.',
  phutil_person($variable));

The string extractor can then annotate this as gendered during extraction.

Also, use more consistent terms for grammatical gender.

Test Plan

arc unit, see next diff.

Diff Detail

Repository
rPHU libphutil
Branch
i18n2
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 14399
Build 18755: Run Core Tests
Build 18754: arc lint + arc unit

Event Timeline

epriestley retitled this revision from to Add a `phutil_person()` wrapper for the string extractor.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: chad.
chad edited edge metadata.
This revision is now accepted and ready to land.Nov 8 2016, 3:36 PM
This revision was automatically updated to reflect the committed changes.

Wouldn't it be better to add some sort of %g format string? If I understand correctly, the current approach doesn't work in cases like this:

$x = new PhutilPerson();
echo pht('Hello, %s', $x);

I guess the problem with %g would be the need for some translation layer which maps %g to %s, because presumably TranslateWiki doesn't understand %g.

Here are the approaches I can come up with:

  1. Annotate types in the string itself (%g).
  2. Annotate types on the parameter.
    • Use phutil_person().
    • Use some /* @person */ annotation.
    • Use a variable naming convention like $person_.
  3. Annotate types in separate metadata: directly write a map of "Hello, %s" -> array('person') next to the translation files.
    • Build this map manually.
    • Build this map by sampling which arguments are Person objects (via D16821) at runtime and compiling it through some offline process.

These approaches are not mutually exclusive, and we could theoretically implement all of them at the same time.

I think all approaches in category (1) create an immediate performance cost because we can't pass the strings directly to sprintf() anymore. We have to mangle the string before we can pass it to sprintf(), or we have to implement our own sprintf(). Either of these will be slower -- and, potentially, much slower -- than what we do today, as PHP tends to perform more poorly on these types of tasks than C does. There are some ways we can mitigate this (PHP extensions, prebuilding a map of all the manglings) but they seem not-great and very complex to me.

I think approaches in category (2) are slightly more annoying to write, but that's about the extent of the downside.

I think approaches in category (3) are more complex to build initially, and if the map is manual to some degree I'd guess it probably has a high long-term maintenance cost.

Overall, it isn't really clear how valuable this data is. I think phutil_person() is the shortest reasonable pathway to being able to export the data in some form without immediately paying a big performance or complexity price. We can see how well translators do with and without it and get a better idea of how much effort we should be spending on more exhaustive annotations.

If this data is valuable, it would almost certainly make sense to fill it in from runtime sampling, at least in the short term, since we plan to do that sampling anyway to collect frequency data. That might be all we need to do, but I think the cost to get a minimum version of that is higher than the cost to implement a function that does nothing and add a check for it in the extractor.

We could possibly do %g stuff eventually, but I think the instant we do that we just wrote a big performance regression, and I want to be more confident that we're getting a ton of value out of it before opening up that can of worms.