Published on O'Reilly (http://oreilly.com/)

The Worldwide Lexicon: Adding Collaborative Translation to Your Site

by Brian McConnell

The Worldwide Lexicon, an open source project I have led for several years, recently published a suite of collaborative translation tools that enable you or your readers to create, edit and share translations to and from almost any human language. We have been testing the system throughout the summer, and in this article I explain how you can use WWL to make your site or content accessible in many languages.

WWL applies the concept of user-generated content, familiar from systems like Wikipedia, to the task of creating, improving, and sharing translations of texts. The system relies on people rather than on machine translation. It takes people to fully comprehend human language, and while machine translation has improved, even accurate machine translations are rarely enjoyable to read. The key insight in WWL is that a web site with an audience will have bilingual readers, often without knowing it. These readers are both interested in the subject matter and knowledgeable about it, so some of them will be willing to translate it, whether for goodwill or for money. WWL gives a web site's readers a simple way to contribute, edit, and share translations.

We began testing the system with a WordPress plug-in this summer, and have since released PHP libraries as well as a Firefox extension. More tools are planned for release soon. The system is open source, and we are encouraging developers to embed this process in a wide range of platforms. The ultimate goal is to make collaborative translation a checkbox option on most publishing platforms, so that anyone who wants to be accessible can be. Since the release of the WordPress plug-in, WWL has logged users in 107 countries representing some 50 languages, with over two-thirds of the users coming from outside the United States, which suggests a pent-up demand for multilingual publishing tools.

WWL is easy to incorporate into a wide variety of web services. In this article, I'll describe how to use the different tools we've created so far, and how they can be adapted for custom use.

System Overview

WWL is designed as a client-server system that stores translated texts, revision histories, and related data on a central translation server. That server can deliver translations to a wide variety of client applications, ranging from a browser plug-in to an extension for a content management system. One of our design goals with this version of WWL was to make it platform neutral, so that it would work well with many different content management systems, including older ones.

In a typical implementation, the translation server receives a request to display a translation for a document, which is identified by an MD5 hash derived from its URL. If a translation is available, the translation server sends it back along with metadata. The client application, a WordPress plug-in for example, displays this as an overlay on the source document. Document management services (edits, revision history, etc.) are hosted on the translation server, so that client applications can be simple and lightweight and do not require frequent updates. While this approach has some drawbacks, they are offset by the translation server's ability to serve many different front-end applications and content management systems.
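As a rough illustration of the document identifier, here is a minimal sketch in Python (chosen for brevity; WWL itself is PHP based). It assumes the key is simply the hexadecimal MD5 digest of the UTF-8 encoded URL. The function name `document_key` is hypothetical, and any URL normalization WWL may perform before hashing is not covered here.

```python
import hashlib


def document_key(url):
    """Return the hex MD5 digest of a URL.

    Assumption: the URL is hashed exactly as given, with no normalization.
    The real WWL client may prepare the URL differently.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# The same URL always yields the same 32-character key, so any client
# can ask the translation server about a document without registering it first.
print(document_key("http://example.com/2008/09/my-post/"))
```

A hash makes a convenient fixed-length key, but note that two spellings of the same address (with and without a trailing slash, for instance) would produce different keys under this assumption.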

Getting Started with WordPress

The best way to demo WWL is to use our WordPress plug-in. It is easy to install and will give you an idea of how WWL works from end to end. Just download the plug-in, extract the files into your WordPress server's plug-ins directory, and then activate the WWL plug-in. There is nothing to configure; just turn it on.

Once the plug-in is active, you should see a list of languages beneath each headline on your WP blog. The list will vary depending on each visitor's location and browser language preferences. If you're visiting from Brazil, for example, Portuguese should show up at the head of the list. If you have Norwegian set as a preferred language, you'll see Norwegian, regardless of where you visit from.
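The ordering described above can be sketched as follows. This is not the plug-in's actual code: it is a Python illustration that assumes browser preferences arrive as a standard `Accept-Language` header, that the visitor's location has already been mapped to a language code by some other means, and that browser preferences rank ahead of location. The function names are hypothetical.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into primary language codes, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        params = params.strip()
        quality = 1.0  # a missing q-value means full preference
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        primary = tag.strip().split("-")[0].lower()  # "pt-BR" -> "pt"
        if primary and primary != "*":
            weighted.append((-quality, position, primary))
    ordered = []
    for _, _, code in sorted(weighted):
        if code not in ordered:
            ordered.append(code)
    return ordered


def order_languages(available, accept_language="", location_language=None):
    """Put the visitor's likely languages at the head of the available list."""
    preferred = parse_accept_language(accept_language)
    if location_language:
        preferred.append(location_language.lower())
    head = []
    for code in preferred:
        if code in available and code not in head:
            head.append(code)
    tail = [code for code in available if code not in head]
    return head + tail


# A visitor in Brazil whose browser prefers Norwegian, then English.
print(order_languages(["en", "es", "no", "pt"], "no,en;q=0.5", "pt"))
```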

If there are translations for a text, you'll see the first few words of each translation. When you click on a translation, the plug-in replaces the original text with the translated text. If no translation is available, you can 1) contribute one, 2) subscribe to an RSS feed for translations to that language, or 3) fetch a machine translation, if one is available for the desired language. If you decide to contribute a translation, you'll jump to a split-screen editor whose text box floats on top of the original text, making it easy to compose the first translation.
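The choices offered to a reader can be summarized in a few lines of code. The sketch below is only an illustration in Python, not the plug-in's implementation; the function names, the action labels, and the five-word preview length are all assumptions.

```python
def preview(text, words=5):
    """Return the first few words of a translation, marking any truncation."""
    parts = text.split()
    if len(parts) <= words:
        return " ".join(parts)
    return " ".join(parts[:words]) + " ..."


def reader_options(translations, language, machine_languages):
    """List what a reader can do for one text in one language.

    translations: dict mapping a language code to a list of translated texts.
    machine_languages: set of codes for which machine translation is possible.
    """
    existing = translations.get(language, [])
    if existing:
        # Show a short preview of each translation the reader can click on.
        return [("view", preview(text)) for text in existing]
    options = [("contribute", None), ("subscribe_rss", None)]
    if language in machine_languages:
        options.append(("machine_translate", None))
    return options


print(reader_options({"es": ["Hola a todos y bienvenidos a mi blog"]}, "es", {"es"}))
print(reader_options({}, "no", {"es"}))
```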

Translations are published in several ways: 1) as an overlay to your WordPress blog (the reader does not know the translations reside on another server), 2) via static HTML pages on www.worldwidelexicon.org (for sharing and search engine discovery), and 3) as RSS feeds. Search engine optimization is an important aspect of the project, as the system makes original and translated texts searchable. Thus, someone searching in their own language may find a document that was originally published in another.

The WordPress plug-in, while easy to use, is designed for authors, not programmers, and it does not have many access control or workflow management features. If you want to customize WWL or incorporate it into another system, you'll want to look at our PHP libraries. Most of WWL is PHP based, which makes it easy to adapt to other systems.

PHP Libraries

We have published a PHP tool that presents a user interface similar to that of the WordPress plug-in. It can be embedded in any PHP-based site. From the user's standpoint, it works just like the WordPress plug-in, but because the source is available, you as a developer can customize its appearance and behavior, add access control features, and so on.

Integrating it into a site is straightforward. The PHP script expects a small set of variables that contain the source document's title, language code, and source text, so you only need to fetch this information from your site and pass it to the hosted WWL service. Visit blog.worldwidelexicon.org for more information.

Hosting Your Own Translation Server

If you don't want to use worldwidelexicon.org servers to host your translated texts, you can host your own, either by mimicking the client/server communication interface we've implemented, or by hosting a copy of our translation server. We have not yet released this as a turnkey system, as we are still making frequent changes to the servers, but we are glad to provide the source on request. The WWL servers are PHP based, so if you're knowledgeable about PHP and MySQL, you should have an easy enough time with this.
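To give a feel for what a self-hosted server has to keep track of, here is a deliberately small sketch. It is written in Python with an in-memory dictionary rather than PHP and MySQL, and it does not reproduce the actual WWL interface: the class, its method names, and the fields in the returned metadata are all hypothetical. It assumes documents are keyed by the MD5 hash of their URL and that each submitted edit is kept as a revision.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Revision:
    text: str
    author: str


@dataclass
class TranslationStore:
    # Maps (document key, language code) to a list of revisions, oldest first.
    _revisions: dict = field(default_factory=dict)

    @staticmethod
    def key_for(url):
        """Assumed document key: hex MD5 digest of the URL as given."""
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def submit(self, url, language, text, author):
        """Store a new translation or edit; return its revision number."""
        revisions = self._revisions.setdefault((self.key_for(url), language), [])
        revisions.append(Revision(text, author))
        return len(revisions)

    def latest(self, url, language):
        """Return the newest translation with simple metadata, or None."""
        revisions = self._revisions.get((self.key_for(url), language))
        if not revisions:
            return None
        newest = revisions[-1]
        return {"text": newest.text, "author": newest.author, "revision": len(revisions)}

    def history(self, url, language):
        """Return a copy of all revisions for a document, oldest first."""
        return list(self._revisions.get((self.key_for(url), language), []))


store = TranslationStore()
store.submit("http://example.com/post", "es", "Hola mundo", "ana")
store.submit("http://example.com/post", "es", "Hola, mundo", "luis")
print(store.latest("http://example.com/post", "es"))
```

A real server would add persistence, authentication, and the HTTP layer on top of this, which is where the PHP and MySQL knowledge mentioned above comes in.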

Later this fall, we will publish a turnkey package that allows users to run a WWL translation management system out of the box. Users will be able to run this as is, if they simply want to host documents on their own system, but also to customize the system to add more access control or workflow management features, localize the user interface, etc. One of the reasons we decided early on to make WWL an open source project was to enable developers to embed it in a wide variety of existing systems. We're glad to host translations for users, but we also want to see this become part of both mainstream and niche publishing platforms. We do this in our spare time, and as such, don't have time to build a version of WWL for every platform and locale in existence.

Firefox Plug-in

For independent translators, we recently published a Firefox plug-in that enables users to create and share translations for any web page that has a permanent URL. This works like a transparent overlay. If you are reading a page you'd like to translate for a friend, or for the world at large, you simply right-click on a globe icon in the lower right corner of your browser window, type your translation, and publish it. The translation will appear to anyone else who visits this page, and will also be republished for search engines, RSS readers, etc. The Firefox plug-in is still fairly new, and will be coupled with tools for translation communities that enable roaming groups of bilingual Internet users to discover, translate, and share interesting content.

Results So Far

We have been testing WWL throughout the summer, primarily via WordPress blogs, and have noticed some interesting trends so far. While WWL is still fairly low profile, it has attracted users in more than 100 countries, representing about 50 languages, with the majority of users coming from outside the United States.

Although the project is still at an experimental stage, we have noticed a few early trends that may point toward future uses, among them:

Brian McConnell is an inventor, author, and serial telecom entrepreneur. He has founded three telecom startups since moving to California. The most recent, Open Communication Systems, designs cutting-edge telecom applications based on open standards telephony technology.


Copyright © 2009 O'Reilly Media, Inc.