ONJava.com -- The Independent Source for Enterprise Java

Andy and David's Top Ten Internationalization Tips

by Andy Deitsch, David Czarnecki
03/06/2001

Software internationalization is best described as the process of engineering a piece of software so that a single application binary can accommodate all the languages and regions in which the software will run. Localization is the process of adapting an internationalized piece of software to a specific language and/or region. Only in recent years have software companies realized that, to conduct business on a global scale, they need to engineer their software with internationalization and localization in mind from the start. Otherwise, they can face costly reengineering efforts to retrofit existing software packages for internationalization.

If you are writing software in Java, much of the internationalization support you need is already built into the Java platform and its APIs. However, you need to be aware of proper usage patterns and common pitfalls to make your software truly internationalized. In our book, Java Internationalization, we try to cover every aspect of the Java API classes related to internationalization. Topics include using resource bundles to isolate locale-specific data, formatting messages, and building input methods for entering non-Latin text.

In this article, we share our Top Ten Internationalization Tips. Enjoy!

  1. Don't assume all letters of the alphabet fall between A and Z.

    The code in the following example would be quite acceptable to an American programmer unaware of the global marketplace.

    char c;
    
    // Get user input
    
    if ((c >= 'A' && c <= 'Z') 
           || (c >= 'a' && c <= 'z')) {
      // accept the input
    } else {
      // handle error case
    }
    

    This code would not work correctly, however, if it were used to process Danish text. In addition to the 26 letters that exist in the English alphabet, the Danish alphabet has three additional letters (æ, ø, and å), which appear after the letter Z. As you can imagine, entering text into a system with this piece of code embedded in it might frustrate Danish users. Now imagine what would happen if a Korean user tried to enter data into this system!

    The correct way to handle character input verification in Java is to use the static method Character.isLetter(). The following code shows how this is done:

    char c;
    
    // Get user input
    
    if (Character.isLetter(c)) {
      // accept the input
    } else {
      // handle error case
    }
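    To see the difference, here is a minimal, self-contained sketch (the class LetterCheck and its sample characters are hypothetical choices for this illustration, not code from the book) that runs both tests over a few characters. The range test accepts only the ASCII letter, while Character.isLetter() also accepts the Danish å and a Korean Hangul syllable; both reject the digit.

```java
// A side-by-side check: the ASCII range test versus Character.isLetter().
class LetterCheck {
    // The range test from the first snippet: accepts only the 52 ASCII letters.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        // 'z', Danish a-ring (U+00E5), a Korean Hangul syllable (U+D55C), and the digit '7'
        char[] samples = { 'z', '\u00E5', '\uD55C', '7' };
        for (char c : samples) {
            System.out.printf("U+%04X  range test: %-5b  Character.isLetter: %b%n",
                (int) c, isAsciiLetter(c), Character.isLetter(c));
        }
    }
}
```

    Character.isLetter(char) looks at one UTF-16 code unit at a time. For characters outside the Basic Multilingual Plane, the Character.isLetter(int) overload (added in J2SE 5.0) should be applied to code points obtained with String.codePointAt().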
    
  2. Don't hardcode strings.

    One of the easiest problems to solve in non-localizable software is hardcoded text, and the solution is text-resource externalization. This means moving any strings previously hardcoded into the application binary into an external collection of elements known as a resource bundle. The resource bundle contains all the strings that will be displayed to the user. The canonical "Hello World" application best illustrates the problem with hardcoded strings.

    public class HelloWorld {
      public static void main(String [] argv) {
        System.out.println("Hello World!");
      }
    }
    

    As you can see, the string "Hello World!" is hardcoded in the application. To make localization of this simple application possible, we use the java.util.ResourceBundle class. The following program is a rewrite of the "Hello World" application above, with two resource files for English and French:

    HelloWorld.java

    import java.util.*;
    
    public class HelloWorld {
      public static void main(String [] argv) {
        ResourceBundle resources;
        try {
          resources = ResourceBundle.
               getBundle("HelloWorldText");
          System.out.println(
               resources.getString("Hi"));
        } catch (MissingResourceException mre) {
          // An error has occurred loading the resource bundle
        }
      }
    }
    

    HelloWorldText_en.properties

    Hi = Hello World!
    

    HelloWorldText_fr.properties

    Hi = Salut tout le monde!
    

    If you want to run this program using the English language use:

    C:\>java -Duser.language=en HelloWorld
    Hello World!
    

    If you want to run this program using the French language use:

    C:\>java -Duser.language=fr HelloWorld
    Salut tout le monde!
    

    In our book, we devote Chapter 4, "Isolating Locale-Specific Data with Resource Bundles," to describing how to take locale-specific data and externalize it in resource bundles. We also show you how to externalize other data types, such as images or sound files.
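    Why does -Duser.language=fr select HelloWorldText_fr.properties? ResourceBundle.getBundle() derives a list of candidate bundle names from the requested locale, loads the most specific bundle that exists (with the less specific ones as its parents), and, if none exists, repeats the search for the default locale before giving up. The sketch below (class and method names are hypothetical) asks the standard ResourceBundle.Control, available since J2SE 6, for that candidate list without loading any files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Lists the bundle names getBundle() searches for a locale, most specific first.
// Only names are computed, so no .properties files are needed to run it.
class BundleLookup {
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        // Canadian French: fr_CA, then fr, then the base bundle
        System.out.println(candidateNames("HelloWorldText", Locale.CANADA_FRENCH));
        // English: en, then the base bundle
        System.out.println(candidateNames("HelloWorldText", Locale.ENGLISH));
    }
}
```

    Because the example ships no base HelloWorldText.properties, running it for a language without its own bundle (say, German on a machine whose default locale is also German) ends in a MissingResourceException, which is the case the catch block in HelloWorld.java guards against. Providing a base file with default strings avoids that.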


  3. Sample Chapter 4, Isolating Locale-Specific Data with Resource Bundles, is available online in PDF format.


  4. String concatenation is evil ... particularly evil!

    A common pitfall when forming user messages is to concatenate fragments of text together to build a complete text string. For example, you might see code as follows:

    int numErrors = 3;
    String filename = "foo";
    StringBuffer str = new StringBuffer();
    
    str.append("There were ");
    str.append(numErrors);
    str.append(" spelling mistakes in file ");
    str.append(filename);
    
    System.out.println(str.toString());
    

    The problem with this approach is that different languages form sentence structure differently. This problem becomes clear when you see the same sentence shown in both English and German below:

    English                                        German
    There were 3 spelling mistakes in file foo.    Datei foo enthält 3 Rechtschreibfehler.

    The number of errors (3) and the filename (foo) are the variable arguments that would be inserted dynamically at run time. Note that these two arguments appear in opposite orders in the two translations. Concatenating text fragments in a hardcoded order, as we did above, causes all sorts of localization problems down the road.

    Instead, you should use the java.text.MessageFormat class. This class inserts arguments into a string (called a pattern) by index, independent of the order in which the arguments appear to the user. The following code snippet demonstrates this capability:

    import java.text.MessageFormat;
    
    String englishMsg = "There were {0} spelling mistakes in file {1}.";
    String germanMsg = "Datei {1} enthält {0} Rechtschreibfehler.";
    
    int numErrors = 3;
    String filename = "foo";
    
    // {0} and {1} refer to positions in the arguments array,
    // wherever they appear in the pattern
    Object[] arguments = 
       { Integer.valueOf(numErrors), filename };
    
    System.out.println(
       MessageFormat.format(englishMsg, arguments));
    System.out.println(
       MessageFormat.format(germanMsg, arguments));
    

    Typically, you would externalize the pattern strings into localized resource bundles, which would then be loaded dynamically at run-time.
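    Here is a self-contained sketch of that arrangement. A Map stands in for the two localized resource bundles (an assumption made only to keep the example in one file; the class and method names are hypothetical), and each pattern is applied with a MessageFormat created for the matching locale, so the numeric argument is also formatted by that locale's rules.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class SpellingReport {
    // Hypothetical stand-in for two localized resource bundles: one pattern per locale.
    static final Map<Locale, String> PATTERNS = new HashMap<>();
    static {
        PATTERNS.put(Locale.ENGLISH,
            "There were {0,number,integer} spelling mistakes in file {1}.");
        PATTERNS.put(Locale.GERMAN,
            "Datei {1} enth\u00E4lt {0,number,integer} Rechtschreibfehler.");
    }

    static String report(Locale locale, int numErrors, String filename) {
        // The MessageFormat's locale controls how {0} is formatted (grouping separator, etc.)
        MessageFormat mf = new MessageFormat(PATTERNS.get(locale), locale);
        return mf.format(new Object[] { numErrors, filename });
    }

    public static void main(String[] args) {
        System.out.println(report(Locale.ENGLISH, 1234, "foo"));
        System.out.println(report(Locale.GERMAN, 1234, "foo"));
    }
}
```

    Because {0} is declared as {0,number,integer}, the count is formatted by the same locale as the sentence: 1234 becomes 1,234 in the English message and 1.234 in the German one.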

  5. Reading is fundamental.

    Do you need to get international text into your application to be displayed or manipulated? Wait! Did you consider that the character encoding of the text might not be the same as the character encoding of the platform on which your application is running? Fortunately, Java provides the facilities for you to do this correctly. Using the java.io.InputStreamReader class, you can specify the character encoding of the text that you will be reading from a file.

    The following code snippet shows how to use InputStreamReader to accomplish this task. We are assuming that the file is called foo.in and is encoded in Shift-JIS, a character encoding used to encode Japanese text.

    FileInputStream fis = 
       new FileInputStream(new File("foo.in"));
    BufferedReader in =
      new BufferedReader(
      new InputStreamReader(fis, "SJIS"));
    

    Reading is good, but writing is just as fundamental to application development. Writing international text is also covered at the end of Chapter 6, "Character Sets and Unicode." Here we use the java.io.OutputStreamWriter class to write international text with a specific character encoding. We are writing to a file called foo.out in Shift-JIS.

    FileOutputStream fos = 
       new FileOutputStream(new File("foo.out"));
    BufferedWriter out =
       new BufferedWriter(
       new OutputStreamWriter(fos, "SJIS"));
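    The snippets above only open the streams. As a fuller sketch, assuming Java 7 or later (for try-with-resources) and a runtime that includes the Shift_JIS charset (standard JDKs do, although Java does not require every platform to support it), the class below writes a Japanese word to a temporary file in Shift-JIS and reads it back twice: once with the correct encoding and once, deliberately, as ISO-8859-1. The temporary file and the helper names are hypothetical stand-ins for foo.in and foo.out.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Writes Japanese text in Shift-JIS, then reads it back with the right and the wrong charset.
class ShiftJisRoundTrip {
    static final Charset SJIS = Charset.forName("Shift_JIS");

    static void writeLine(File file, String text) throws IOException {
        // try-with-resources closes the writer, which flushes the encoded bytes to disk
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), SJIS))) {
            out.write(text);
            out.newLine();
        }
    }

    static String readLine(File file, Charset charset) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("foo", ".out");
        file.deleteOnExit();
        String japanese = "\u65E5\u672C\u8A9E"; // the word Nihongo (Japanese language)
        writeLine(file, japanese);
        System.out.println("bytes on disk: " + file.length());
        System.out.println("read as Shift_JIS matches: "
            + readLine(file, SJIS).equals(japanese));
        System.out.println("read as ISO-8859-1 matches: "
            + readLine(file, Charset.forName("ISO-8859-1")).equals(japanese));
    }
}
```

    Reading with the wrong charset does not throw an exception; it silently produces the wrong characters. That is why the encoding should be passed explicitly rather than left to the platform default.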
    
  6. Sort using the Collator class.

    Almost every introductory class in computer science teaches a section on sorting algorithms. Typically, students are taught how to sort a collection of English words using a number of algorithms, such as bubble sort, heap sort, merge sort, or quick sort. These algorithms are fine, except that at the heart of all of them is some comparison logic that checks if one word should appear before or after another in the list.

    Typically, students are taught that it is safe to assume that the encoded values of the letters in the alphabet are in numerical order. For example, a < b < c holds because the encoded value of a is less than the encoded value of b, which in turn is less than the encoded value of c. Unfortunately, this assumption falls apart when sorting text in other languages.

    Let's look at the following three strings: äpple, banan, and orange. The order shown is the correct order if we were to sort these strings using German collation rules. An uninformed programmer might try to sort these strings using the following program:

    public class IncorrectSort {
      public static void main(String [] argv) {
        String fruit[] = { "orange", "äpple", "banan" };
        String tmp;
    
        for (int i = 0; i < fruit.length; i++) {
          for (int j = i + 1; j < fruit.length; j++) {
            if ( fruit[i].compareTo( fruit[j] ) > 0 ) {
              // Swap fruit[i] and fruit[j]
              tmp = fruit[i];
              fruit[i] = fruit[j];
              fruit[j] = tmp;
            }
          }
        }
        
        for (int k = 0; k < fruit.length; k++)
          System.out.println(fruit[k]);
      }
    }
    

    The program sorts the strings incorrectly as banan, orange, äpple, because the encoded value of "ä" (U+00E4) is greater than those of "b" and "o". Now let's look at the correct way to sort these strings. The new or modified lines are the two import statements, the declaration of the Collator, and the call to collate.compare():

    import java.util.Locale;
    import java.text.Collator;
    
    public class CorrectSort {
      public static void main(String [] argv) {
        String fruit[] = { "orange", "äpple", "banan" };
        String tmp;
        Collator collate = 
           Collator.getInstance(Locale.GERMAN);
    
        for (int i = 0; i < fruit.length; i++) {
          for (int j = i + 1; j < fruit.length; j++) {
            if ( collate.compare( fruit[i], fruit[j] ) > 0 ) {
              // Swap fruit[i] and fruit[j]
              tmp = fruit[i];
              fruit[i] = fruit[j];
              fruit[j] = tmp;
            }
          }
        }
        
        for (int k = 0; k < fruit.length; k++)
          System.out.println(fruit[k]);
      }
    }
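    A hand-written exchange sort is fine for illustration, but because java.text.Collator implements java.util.Comparator, it can also be passed straight to the library sort. A minimal sketch (class and method names are hypothetical):

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorSort {
    // Returns a sorted copy of words, ordered by the collation rules of locale.
    static String[] sortFor(Locale locale, String... words) {
        String[] copy = words.clone();
        // Collator implements Comparator<Object>, so it can drive Arrays.sort directly
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] fruit = { "orange", "\u00E4pple", "banan" }; // the second word is äpple
        String[] byCodeUnit = fruit.clone();
        Arrays.sort(byCodeUnit); // String.compareTo compares UTF-16 code unit values
        System.out.println(Arrays.toString(byCodeUnit));
        System.out.println(Arrays.toString(sortFor(Locale.GERMAN, fruit)));
    }
}
```

    The first line printed is the code-unit order [banan, orange, äpple], the same wrong result as IncorrectSort; the second is the German collation order [äpple, banan, orange].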
    
  7. Displaying complex text.

    Latin-based text is relatively simple to display. Several non-Latin writing systems, however, have features that make them complex: characters that change shape depending on their context, bidirectional text, and mandatory ligatures, to name a few. There are a number of ways to display complex text in your applications. In the following example, we show you how to use the Graphics2D.drawString method to render a string containing both English and Hebrew text. We use one of the Lucida fonts included in every Sun Java Runtime to display both scripts:

    import java.awt.*;
    import java.awt.event.*;
    import javax.swing.*;
    
    public class DrawStringDemo extends JFrame {
    
      // "David says" followed by the Hebrew for "hello world"
      String message =
        "David says, \"\u05E9\u05DC\u05D5\u05DD " +
        "\u05E2\u05D5\u05DC\u05DD\"";
    
      public DrawStringDemo() {
        super("DrawStringDemo");
      }
    
      public void paint(Graphics g) {
        super.paint(g);  // paint the frame's background first
        Graphics2D graphics2D = (Graphics2D)g;
        // If Lucida Sans is not installed, Java falls back to a default font
        Font font = new Font("Lucida Sans", Font.PLAIN, 40);
        graphics2D.setFont(font);
        // drawString lays out the Hebrew portion right to left
        graphics2D.drawString(message, 50, 75);
      }
    
      public static void main(String[] args) {
        JFrame frame = new DrawStringDemo();
        frame.addWindowListener(new WindowAdapter() {
          public void windowClosing(WindowEvent e) {
            System.exit(0);
          }
        });
        // The frame has no child components, so pack() would shrink it
        // to almost nothing; give it an explicit size instead
        frame.setSize(600, 150);
        frame.setVisible(true);
      }
    }
    

    drawString is useful when you have to render a single line of text. javax.swing.JTextComponent and its subclasses also support rendering complex text. If you need more sophisticated control over the layout of complex text, look into java.awt.font.TextLayout. This class is most useful if you are writing something equivalent to a multilingual word processor.
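    drawString and TextLayout handle text direction for you, but it can help to see what they are working with. As an illustrative sketch that goes slightly beyond the text above, java.text.Bidi (in the standard library since J2SE 1.4) applies the Unicode Bidirectional Algorithm to the same message and reports its directional runs. The class name is hypothetical; no window or font is needed, so it also runs headless.

```java
import java.text.Bidi;

// Splits the mixed English/Hebrew message from DrawStringDemo into directional runs.
class BidiRuns {
    static final String MESSAGE =
        "David says, \"\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD\"";

    static Bidi analyze(String text) {
        // Base direction comes from the first strong character (left to right if none)
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
    }

    public static void main(String[] args) {
        Bidi bidi = analyze(MESSAGE);
        System.out.println("mixed directions: " + bidi.isMixed());
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Odd embedding levels are right to left, even levels left to right
            String direction = (bidi.getRunLevel(i) % 2 == 1) ? "RTL" : "LTR";
            System.out.println(direction + " chars " + bidi.getRunStart(i)
                + "-" + (bidi.getRunLimit(i) - 1));
        }
    }
}
```

    The analysis finds three runs: the English text and opening quotation mark (left to right), the two Hebrew words together with the space between them (right to left), and the closing quotation mark (left to right).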

  8. Make use of ComponentOrientation for GUI layout.

    In the java.awt package, you will find a class called ComponentOrientation. It is one of the first classes we describe in Chapter 9 of our book, "Internationalized Graphical User Interfaces." This class tells graphical user interface (GUI) components how to lay out their contents, whether text or a collection of other GUI components, when rendered. Several languages, most notably Arabic and Hebrew, are written not left to right, as English is, but right to left.[1]

    A graphical layout should therefore render not only the text but the entire user interface (buttons, menus, and so on) in the proper format for a given language and locale. The java.awt.Component class, the base class for all graphical components (both AWT and Swing), provides getComponentOrientation() and setComponentOrientation() methods for reading and applying an orientation. Each component then renders its contents according to the orientation set on it.
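    A minimal sketch of applying an orientation, assuming a newer platform than the Java 2 v1.2 discussed in the book: Component.applyComponentOrientation() arrived in J2SE 1.4 and Locale.forLanguageTag() in Java 7. ComponentOrientation.getOrientation(Locale) chooses the orientation from the locale's language, and applyComponentOrientation() sets it on a container and all of its children. The panel, button labels, and class name are hypothetical; Swing components can be created without a display, so the sketch runs headless.

```java
import java.awt.ComponentOrientation;
import java.awt.FlowLayout;
import java.util.Locale;
import javax.swing.JButton;
import javax.swing.JPanel;

// Builds a small button panel and orients it, and its children, for a given locale.
class OrientationDemo {
    static JPanel buildPanel(Locale locale) {
        // LEADING alignment follows the orientation: left edge for LTR, right edge for RTL
        JPanel panel = new JPanel(new FlowLayout(FlowLayout.LEADING));
        panel.add(new JButton("OK"));      // in practice, labels come from a resource bundle
        panel.add(new JButton("Cancel"));
        // getOrientation maps the locale's language to LEFT_TO_RIGHT or RIGHT_TO_LEFT;
        // applyComponentOrientation sets it on the panel and every component inside it
        panel.applyComponentOrientation(ComponentOrientation.getOrientation(locale));
        return panel;
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.forLanguageTag("ar-EG") };
        for (Locale locale : locales) {
            JPanel panel = buildPanel(locale);
            System.out.println(locale + " left to right: "
                + panel.getComponentOrientation().isLeftToRight()
                + ", first button left to right: "
                + panel.getComponent(0).getComponentOrientation().isLeftToRight());
        }
    }
}
```

    With FlowLayout.LEADING, the buttons are laid out from the right edge in the Arabic case without any change to the layout code.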

  9. Don't hardcode fonts in your application.

    Starting with the Java 2 SDK, Standard Edition, version 1.2, you can use the TrueType fonts installed on the system on which your application is deployed. However, you should not assume that the system that eventually runs your application contains a particular font. The current Sun Java Runtime ships with 12 different Lucida TrueType fonts that can be used in applications. These fonts can display characters from a number of languages, including Arabic and Hebrew.

    It is useful to give users of your application a list of the available system fonts to choose from, so that they can select the font most appropriate for rendering their text. Also, a given font may not contain glyphs for all the text you want to display. If you hardcode the list of fonts in your application, your users may eventually find that none of the fonts on that list can render one of their documents.
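    One way to follow this advice, sketched here with hypothetical names: ask the GraphicsEnvironment for the installed font families, then keep only those that can display the text the user is working with. Font.canDisplayUpTo() returns -1 when a font has glyphs for every character in a string. This assumes a Java runtime with working font support; on stripped-down servers, font lookup can fail.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

class FontChooser {
    // Returns the families (from the given list) whose plain style can display all of text.
    static List<String> familiesThatCanDisplay(String text, String[] families) {
        List<String> usable = new ArrayList<>();
        for (String family : families) {
            Font font = new Font(family, Font.PLAIN, 12);
            // canDisplayUpTo returns -1 when the font has a glyph for every character
            if (font.canDisplayUpTo(text) == -1) {
                usable.add(family);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        String[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment()
                                                .getAvailableFontFamilyNames();
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // the word shalom
        List<String> candidates = familiesThatCanDisplay(hebrew, installed);
        System.out.println(candidates.size() + " of " + installed.length
            + " installed families can display the Hebrew sample");
        // An application would offer these candidates in its font menu instead of a fixed list.
    }
}
```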

  10. Use NumberFormat to format numbers correctly.

    In the United States, the decimal separator (the delimiter that separates the whole and fractional parts of a number) is a period (.), and the grouping character is a comma (,). An American reads 1,234.56 as one thousand two hundred thirty-four and fifty-six hundredths. In Germany, however, the decimal separator is a comma (,) and the grouping character is a period (.), so a German would expect the same number to be formatted as 1.234,56. In France and Russia, the grouping character is a space and the decimal separator is a comma, giving 1 234,56. These are just a few of the issues you must deal with when formatting numbers for different regions of the world. Related number-formatting issues include currency formats, percentage formats, and numeric shapes, all of which must also be handled appropriately for each locale.

    Clearly, the following code won't cut it for internationalized applications:

    double theNumber = 1234.56;
    System.out.println("The number is " + theNumber);
    

    There are a couple of things wrong with this code. First, as we mentioned in tip number 2, you should not hardcode strings into your application. (We did it here just to make things easy.) The second problem, and the main point of this tip, is that theNumber will simply be printed out as 1234.56, ignoring any locale formatting issues. The solution is to use the NumberFormat class as follows:

    import java.text.NumberFormat;
    import java.util.Locale;
    
    double theNumber = 1234.56;
    NumberFormat nf = NumberFormat
       .getInstance(Locale.GERMAN);
    System.out.println("The number is " 
       + nf.format(theNumber));
    

    This code generates the following output:

    The number is 1.234,56
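    Extending the snippet above, this sketch (class and method names are hypothetical) formats the same value for each locale mentioned in this tip. getNumberInstance() is the explicit name for the general-purpose number format that getInstance() returns.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Formats one value with the number conventions of several locales.
class NumberFormatDemo {
    static String format(Locale locale, double value) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double theNumber = 1234.56;
        Locale[] locales = {
            Locale.US, Locale.GERMANY, Locale.FRANCE, Locale.forLanguageTag("ru-RU")
        };
        for (Locale locale : locales) {
            System.out.println(locale + ": " + format(locale, theNumber));
        }
    }
}
```

    Expect the French and Russian output to use a no-break space rather than an ordinary space as the grouping character. Exactly which space character appears depends on the locale data shipped with the JDK, so comparing formatted output against hand-typed strings can fail for those locales.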
    
  11. Use DateFormat to display dates.

    The format for displaying date and time information varies according to local conventions. The names of weekdays and months, the ordering of fields, and the delimiters used between the fields differ around the world. An American would interpret the date 03/10/2001 as March 10, 2001. A British reader, however, would interpret the same date as October 3, 2001. Additionally, some cultures use a non-Gregorian calendar system. Look at the following list of dates as an example of how they differ:

    [Table: example dates formatted according to the conventions of different locales]

    Many programmers try to build a date string from the get methods of java.util.Date, such as Date.getDate(), Date.getMonth(), and Date.getYear(). For example, the following code snippet shows the wrong way to display dates:

    Date today = new Date();
    int month = today.getMonth() + 1;
    int year = today.getYear() + 1900;
    int day = today.getDate();
    
    System.out.println(month + "/" + day + "/" + year);
    

    Besides relying on methods that have been deprecated since JDK 1.1, this approach hardcodes the display format of the date into the program. To display dates correctly, use the java.text.DateFormat class. The following code snippet shows the correct way to display a date for the default locale:

    import java.text.DateFormat;
    import java.util.Date;
    
    Date today = new Date();
    DateFormat df = DateFormat.getDateInstance();
    
    System.out.println(df.format(today));
    

    There are several ways to get a DateFormat instance, depending on your needs. In this example, we retrieve a DateFormat object using the default locale and a default formatting style. The output from this code with a default locale of English in the United States is:

    Feb 18, 2001
    

    The same code run with the default locale set to German in Germany is:

    18.02.2001
    

    and the code run with the default locale set to French in France is:

    18 févr. 01
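    To make the locale dependence visible without changing the default locale, this sketch (class and method names are hypothetical) passes an explicit style and Locale to DateFormat.getDateInstance() and formats one fixed date, 18 February 2001. The SHORT style shows the month/day ordering problem from the 03/10/2001 example directly. Exact output varies between JDK releases (newer JDKs use CLDR locale data), so the French output, for instance, may not match the one shown above.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Formats one fixed date in two styles for several locales.
class DateFormatDemo {
    static String format(Date date, int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    public static void main(String[] args) {
        // Midnight on 18 February 2001 in the default time zone,
        // which is also the zone DateFormat uses by default
        Date date = new GregorianCalendar(2001, Calendar.FEBRUARY, 18).getTime();
        Locale[] locales = { Locale.US, Locale.UK, Locale.GERMANY, Locale.FRANCE };
        for (Locale locale : locales) {
            System.out.println(locale
                + "  SHORT: " + format(date, DateFormat.SHORT, locale)
                + "  MEDIUM: " + format(date, DateFormat.MEDIUM, locale));
        }
    }
}
```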
    

Writing truly internationalized software in Java requires software engineers to know how to use the Java APIs properly. In this list, we have only scratched the surface of the issues you may face when internationalizing a piece of software.

Footnote:

1. Actually, Arabic and Hebrew are bidirectional languages in that text is written from right to left, while numbers are written from left to right.


O'Reilly & Associates will release Java Internationalization in March 2001.