Catching up with Unicode 5.0

by Rick Jelliffe

Unicode 5.0 was released a week ago: congratulations to all concerned. Unicode now defines about 99,000 characters, though many of the improvements in Unicode 5.0 concern how characters are used (their properties and display algorithms) rather than new additions. There are only 1,369 new characters compared to Unicode 4.1, and there is no milestone for implementations like Unicode 3.1 in 2001, when the number of characters first broke out of the 16-bit range.
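Breaking the 16-bit range mattered to implementers because UTF-16 holds only code points up to U+FFFF in a single 16-bit unit. Anything higher, such as the Phoenician letters added in 5.0 (starting at U+10900), needs a surrogate pair. Below is a minimal Python sketch of the standard surrogate arithmetic. The function name `utf16_code_units` is hypothetical, and the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for a single Unicode code point (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: fits in one 16-bit code unit
        return [cp]
    # Supplementary planes: subtract 0x10000, leaving a 20-bit value,
    # then split it into two 10-bit halves (high and low surrogates)
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    for cp in (0x0041, 0x10900):  # LATIN CAPITAL LETTER A, PHOENICIAN LETTER ALF
        units = utf16_code_units(cp)
        # Cross-check against Python's own big-endian UTF-16 encoder
        encoded = chr(cp).encode("utf-16-be")
        expected = [int.from_bytes(encoded[i:i + 2], "big") for i in range(0, len(encoded), 2)]
        assert units == expected
        print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```

Running it prints `U+0041 -> 0041` and `U+10900 -> D802 DD00`. Code that assumed one 16-bit unit per character breaks on the second case.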

I find Unicode very inspirational. Of course, the mad scripts like Tifinagh, not to mention the beautiful Burmese, have their own fascination. But the diligence and effort in Unicode demonstrate a community with a love of communication and a refined respect for culture. There are three main drivers for enhancements:

  • For Western text, the basics have long been in place, and the emphasis is on additions for specialist publishing, academic use and historical scripts: maths characters and Phoenician, for example.

  • For text from the industrializing nations, the emphasis is on completeness and coping with national variation: variant glyphs between China, Korea, Vietnam and Japan; the pronunciations used by Koreans; and an improved bidirectional algorithm for Arabic, for example.

  • As the codes, algorithms and properties for national languages sort themselves out, it becomes politically possible to address the requirements for minority scripts: Balinese, for example.
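The Unicode Bidirectional Algorithm (UAX #9) works from a per-character property, Bidi_Class. Arabic letters get their own class, AL, distinct from the R class of Hebrew. The algorithm uses that distinction: for example, European digits that follow AL text are treated as Arabic numbers. The short Python sketch below inspects these property values with the standard `unicodedata` module. It reflects whatever Unicode version the running Python ships (`unicodedata.unidata_version`), not 5.0 specifically, and it does not show the particular changes made in 5.0.

```python
import unicodedata

# Sample characters: Latin, Hebrew and Arabic letters, plus two kinds of digits
samples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "HEBREW LETTER ALEF": "\u05D0",
    "ARABIC LETTER ALEF": "\u0627",
    "DIGIT ONE": "\u0031",
    "ARABIC-INDIC DIGIT ONE": "\u0661",
}

for name, ch in samples.items():
    # Sanity check: the character really is the one named
    assert unicodedata.name(ch) == name
    # unicodedata.bidirectional returns the Bidi_Class value as a short string
    print(f"U+{ord(ch):04X} {name}: {unicodedata.bidirectional(ch)}")
```

The classes come out as L, R, AL, EN and AN respectively. The AL/R split and the EN/AN split are the inputs that let the algorithm handle Arabic text and numbers differently from Hebrew.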