Although we haven't had many requests for internationalization, it would be good to come up with a plan of action and maybe lay some groundwork for it, since I think we'll want to offer it eventually. At the very least, we effectively support two text corpora right now anyway, "English" and "English (Serious Business)". It would be nice to support these formally through a translation system, so the choice can be a per-user setting instead of a global setting, and so I can answer claims that some of the buttons labeled "Submit" aren't serious enough more directly.
General context:
- I think we have sufficiently few text strings and a sufficiently technical audience that we can do contributor-translation rather than user-translation.
- I don't think we need description/context strings, since the audience is technical and the source is available -- we can do string extraction reliably via XHPAST and annotate string tables with file/line information for translators to provide more context. For example, one could imagine the primary source of translation data being a text file that looks like this:
  ~
  // Translate the 'English' string into 'English (Serious Business)' on the line underneath.
  // This string is used in these places:
  //   src/applications/differential/view/addcomment/DifferentialAddCommentView.php:112
  ~
  Clowncopterize
  ~
  Submit
  ~
- I don't think we need fbtc() (common strings) since it was mostly a community-sanity thing?
- We can probably push RTL languages to v2?
- We should be well-situated technically to implement this since we do everything in utf8 already and mostly use utf8-aware methods to manipulate text. The View classes should generally handle rendering text-like objects gracefully. There may be some exceptions with collation, but I think this is generally not too important.
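To make the translation text file idea above a bit more concrete, here is a minimal sketch of reading such a file and building a machine-readable table from it. It is written in Python purely for illustration (the real thing would presumably be PHP), and it assumes a layout the task does not actually pin down: a line containing only "~" delimits fields, each record is a comment block, then the source string, then the translation, and the closing "~" of one record doubles as the opening "~" of the next. The names `parse_translation_file` and `build_machine_readable` are hypothetical.

```python
import json
import re

# Sample input in the assumed layout (the layout itself is an assumption).
SAMPLE = """\
~
// Translate the 'English' string into 'English (Serious Business)' on the line underneath.
// This string is used in these places:
//   src/applications/differential/view/addcomment/DifferentialAddCommentView.php:112
~
Clowncopterize
~
Submit
~
"""

# Matches comment lines of the form "//   some/path.php:123".
USE_LINE = re.compile(r"^//\s*(\S+):(\d+)\s*$")


def parse_translation_file(text):
    """Return {source: {"translation": str, "uses": [(path, line), ...]}}."""
    table = {}
    fields = []     # completed fields of the record being read
    current = None  # lines of the field being read; None before the first "~"
    for line in text.splitlines():
        if line.strip() == "~":
            if current is not None:
                fields.append("\n".join(current))
            current = []
            if len(fields) == 3:
                comment, source, translation = fields
                uses = []
                for comment_line in comment.splitlines():
                    match = USE_LINE.match(comment_line)
                    if match:
                        uses.append((match.group(1), int(match.group(2))))
                table[source] = {"translation": translation, "uses": uses}
                fields = []
        elif current is not None:
            current.append(line)
    return table


def build_machine_readable(table):
    """Flatten the parsed table to a source -> translation JSON object."""
    flat = {source: record["translation"] for source, record in table.items()}
    return json.dumps(flat, sort_keys=True)


table = parse_translation_file(SAMPLE)
blob = build_machine_readable(table)
```

The file/line annotations are kept in the parsed table for translators and tooling, but dropped from the machine-readable output, since the runtime only needs the source-to-translation mapping.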
Stuff I'm less sure about:
- Do we need all the magic around people and numbers that Facebook had? Can we accept lower-quality translations for the gender/plural breakouts? As I understand it, a made-up example is a string like "{user} accepted this revision.", which might use a different verb for "accepted" in some languages depending on the gender of "{user}". Maybe we can just make sure the pht()-style function accepts the most-general sorts of objects (e.g., PhabricatorUser, PhabricatorObjectHandle) and deal with this later/never.
- I think we can use the English text as the translation key in all cases? Are there cases where a single English string like "Cancel" should translate into different things within one language, depending on where it appears? The UI is generally so straightforward that I think we'll have fairly few cases of this. We can fake it by writing the source in some proto-language that carries the context and providing an English translation that drops the contextual information. Or we could actually implement descriptions.
- Do we need to support any HTML? Really hoping we can get away without this, or with a complete punt like supporting only a tiny micro-language with bold.
- Do we actually need to do this? We have a fair number of non-English tweets about Phabricator and seem to be getting some adoption in Europe, but have been able to fake our way through this issue so far. Programming languages also tend to be implemented in English and some things won't reasonably be translatable (like URIs and the API), so it's not clear there's a huge amount of motivation for this.
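One way the gender question could be deferred without closing the door on it is to let a translation table entry be either a plain string or a small map of variants, and to pick a variant from whatever object was passed in. The sketch below is in Python for illustration only, and everything in it is an assumption: the `User` class is a stand-in for something like PhabricatorUser, the `gender` attribute and its values are invented, and the "[masc]"/"[fem]" strings are placeholders rather than real translations.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical stand-in for PhabricatorUser."""
    name: str
    gender: str = "unknown"  # invented values: "m", "f", "unknown"


# Invented table: an entry is either a string or a dict of variants,
# where "*" is the lower-quality fallback used when gender is unknown.
TABLE = {
    "{user} accepted this revision.": {
        "m": "{user} accepted[masc] this revision.",
        "f": "{user} accepted[fem] this revision.",
        "*": "{user} accepted[any] this revision.",
    },
    "Cancel": "Cancel[translated]",
}


def choose_variant(entry, subject):
    """Pick the template for `subject`, falling back to the "*" variant."""
    if isinstance(entry, str):
        return entry
    # Objects with no gender attribute (handles, plain strings) get "*".
    gender = getattr(subject, "gender", None)
    return entry.get(gender, entry["*"])


key = "{user} accepted this revision."
for_alice = choose_variant(TABLE[key], User("alice", "f"))
for_handle = choose_variant(TABLE[key], "some-object-handle")
```

With this shape, translators who don't care can supply a single string, and the breakouts only cost anything for strings where someone bothers to fill them in.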
Rough implementation plan:
- Write some pht()-style function and wrap all user-displayed text in it, e.g. pht("{user} accepted this revision.", $user).
- This returns some AphrontTranslatedStringView object which behaves like a normal view and can be magicked up later as necessary.
- Use XHPAST to extract all the strings into some kind of translation file (this file should have the most-human-readable format possible).
- Build the human-readable file into a machine-readable file.
- At runtime, check the machine-readable version for strings.
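The runtime half of the plan above can be sketched roughly as follows. This is Python for illustration rather than the eventual PHP, and the details are assumptions: `TranslatedString` is a hypothetical stand-in for the AphrontTranslatedStringView idea, the machine-readable format is assumed to be a flat JSON object, and parameters are passed by keyword instead of positionally as in the pht() example.

```python
import json

# Assumed machine-readable form: a flat JSON object, source -> translation.
SERIOUS_BUSINESS = json.dumps({"Clowncopterize": "Submit"})


def load_table(blob):
    """Load one language's machine-readable translation table."""
    return json.loads(blob)


class TranslatedString:
    """Hypothetical deferred string: holds the key and parameters and only
    resolves them when rendered against a specific translation table."""

    def __init__(self, key, params):
        self.key = key
        self.params = params

    def render(self, table):
        # Fall back to the English source text when no translation exists.
        template = table.get(self.key, self.key)
        return template.format(**{k: str(v) for k, v in self.params.items()})


def pht(key, **params):
    """Wrap user-displayed text; nothing is looked up until render time."""
    return TranslatedString(key, params)


serious = load_table(SERIOUS_BUSINESS)
button = pht("Clowncopterize")
story = pht("{user} accepted this revision.", user="alice")
```

Because lookup happens at render time, the same wrapped string can be rendered against a different table per viewer, which is what makes a per-user language setting straightforward: `button.render(serious)` gives the serious label, while `button.render({})` falls back to the English source.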