The Social Aspects of Internationalization

by Adam Trachtenberg


In my article Internationalization and Localization with PHP, I outline an all-PHP method of adding multi-language support to web sites. What I didn't discuss, however, is that the technical side of translation support is only part of the process. There's also the human side: going back and forth with your team of translators to get the translated text, then double-checking that it accurately conveys the original message.




A translator doesn't want to wade through pages of code just to translate your phrases. It's a pain in the ass; plus, it's possible they'll accidentally insert a stray character into a line and break the site. In the article, to keep the logic behind the code clear, all the classes appear in the same file as the page that uses them.




But, unless your site is only one page, this isn't a good idea. Instead, you should break each class out into a separate file and include those files at the top of the page (maybe even using the auto_prepend_file configuration directive). This allows you to pass an individual translation file back and forth without fear. (Of course, you're using a version control system, like CVS, so it's easy to compare file revisions and back out breakages, right?)




To use the examples from the article, put the base class, pc_MC_Base, in a file with other common classes. Then, the US English and US Spanish classes go in their own files: pc_MC_en_US.php and pc_MC_es_US.php. Here's what belongs in the US Spanish file:




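class pc_MC_es_US extends pc_MC_Base {
    // Constructor: sets the locale code and the word-for-word translations.
    function __construct() {
        $this->lang = 'es_US';
        $this->messages = array(
            'chicken' => 'pollo',
            'cow'     => 'vaca',
            'horse'   => 'caballo'
        );
    }

    // Phrases that need a value dropped into them get their own method.
    function i_am_X_years_old($age) {
        return "Tengo $age años";
    }
}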



From this, it's pretty easy for a person to go through the pc_MC_en_US::messages array, typing the translated words in as the array values. The same applies to the methods at the bottom of the class, like i_am_X_years_old().




But, even this can be asking for trouble. On the projects where I used this code, developers would only send the translator the portion of the file with the text and then manually integrate the returned document into the class. This not only simplifies the process, but also allows you to verify everything works as needed, like pluralized words and HTML entities.




While we covered the difficult topic of pluralization in PHP Cookbook, I omitted it from the article. On the face of it, it's easy to think "How hard can it be to pluralize a word? You just add an 's' to the end." But, what about "fish" or "person"? You need to carve out special cases for those words. And, as if the exceptions in English weren't bad enough, different languages pluralize words using a whole host of rules and exceptions. (And, in the case of languages like Chinese, the singular and plural forms are written exactly the same!)
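To make the special-case problem concrete, here is a minimal sketch for English only. The function name pc_pluralize_en() and its table of irregular words are hypothetical, made up for illustration; this is not the code from PHP Cookbook, and a real site would need a much longer exception list and separate rules for each language.

```php
<?php
// Hypothetical helper, not from the article: pluralize an English noun,
// checking a table of irregular forms before falling back to adding "s".
function pc_pluralize_en($count, $word) {
    // Irregular plurals that the simple "add an s" rule gets wrong.
    $irregular = array(
        'fish'   => 'fish',
        'person' => 'people',
    );

    if ($count == 1) {
        return "$count $word";
    }
    if (isset($irregular[$word])) {
        return "$count " . $irregular[$word];
    }
    return "$count {$word}s";
}

echo pc_pluralize_en(1, 'cow'), "\n";    // 1 cow
echo pc_pluralize_en(3, 'cow'), "\n";    // 3 cows
echo pc_pluralize_en(2, 'person'), "\n"; // 2 people
echo pc_pluralize_en(5, 'fish'), "\n";   // 5 fish
```

Even this tiny version shows why the logic belongs with the developer: the translator only needs to tell you that "person" becomes "people", not write the branch that handles it.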




So, instead of forcing all this on your poor translator, it can be easier to have her alert you to this situation and then write the code yourself to make everything work out programmatically. Additionally, don't make the translator type &ntilde; instead of ñ. Do this yourself.
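One way to handle the entity conversion yourself is PHP's built-in htmlentities() function. The sketch below assumes the translator's text arrives as UTF-8; the variable names are only for illustration.

```php
<?php
// Text as the translator typed it, assumed here to be UTF-8.
$translated = 'Tengo 30 años';

// ENT_QUOTES also converts single and double quotes; the third argument
// tells PHP which encoding the input string uses.
$safe = htmlentities($translated, ENT_QUOTES, 'UTF-8');

echo $safe, "\n"; // Tengo 30 a&ntilde;os
```

Keep in mind that htmlentities() also escapes angle brackets and ampersands, so run it only on plain text, not on strings where the translator was meant to leave HTML tags in place.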




Another alternative is to use a specially formatted plain text file and write code to convert the document into PHP code. This technique is similar to how the GNU gettext utility operates. But, if you're going to go that route, I advise actually using gettext itself. PHP supports gettext, so while it adds another dependency to your project, it's worthwhile if you're doing many translations with people who aren't technically savvy. gettext still exposes translators to the problems of escaping quotation marks and printf()-style placeholders, but it already supports a method for handling pluralized words.
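For a sense of what the home-grown plain text route involves, here is a rough sketch. The file format (one "key = translation" pair per line, with "#" starting a comment) and the function name pc_parse_messages() are assumptions invented for this example; they are not the gettext format and not code from the article.

```php
<?php
// Hypothetical format, not from the article: one "key = translation" pair
// per line; blank lines and lines starting with "#" are ignored.
function pc_parse_messages($text) {
    $messages = array();
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Split on the first "=" only, so translations may contain "=".
        $parts = explode('=', $line, 2);
        if (count($parts) < 2) {
            continue; // no "=" on this line; skip rather than guess
        }
        $messages[trim($parts[0])] = trim($parts[1]);
    }
    return $messages;
}

// In practice this text would come from the file the translator sends back.
$text = "# US Spanish\nchicken = pollo\ncow = vaca\nhorse = caballo\n";
$messages = pc_parse_messages($text);

// var_export() emits valid PHP source, escaping any quotes in the values.
echo '$this->messages = ' . var_export($messages, true) . ";\n";
```

The translator never sees a quotation mark or a semicolon, and var_export() takes care of escaping when the array is written out as PHP. It does nothing for plurals or placeholders, though, which is where gettext earns its keep.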


