Bigger than Unicode

by Rick Jelliffe

Embedding Glyph Identifiers in XML Documents (EGIX) is a good PDF article by Christian Wittern that helps explain some approaches to overcoming Unicode's limitations for Han ideographs (kanji). (Tim Bray's blog also has an item this week on an unrelated set of problems with non-ASCII characters.)

Christian also has a general introduction to writing systems and Unicode, written in preparation for the Text Encoding Initiative (TEI) P5 chapter 25, Representation of non-standard characters and glyphs.

I am not sure what the equivalent markup in Office Open XML or ODF is. I presume OOX has it, because the ability to define your own private characters (or, at least, glyphs) has long been a feature of East Asian word processors. ODF is still immature in several areas of internationalization: the Egyptian standards body commented on the lack of Arabic support in the voting at ISO. But internationalization is a slow business. It will take years before support for privately defined characters, such as that allowed by EGIX or TEI or even Unicode Ideographic Description Sequences, becomes ubiquitous, because it really requires platform support: Java, .NET, libxml and the dynamic languages.
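To make Ideographic Description Sequences a little more concrete: an IDS describes an ideograph's layout with a prefix operator from the Ideographic Description Characters block followed by its components, so ⿰木木 (U+2FF0 "left to right", then 木 twice) describes 林. The Python sketch below only checks whether a string is a well-formed IDS. It is a simplified reading of the grammar, assuming just the original twelve operators U+2FF0 to U+2FFB (later Unicode versions add more) and treating any other character as an acceptable leaf component; the function names are hypothetical, not part of any library.

```python
# Simplified IDS well-formedness check (a sketch, not the full Unicode grammar).
# Assumption: only the original Ideographic Description Characters U+2FF0..U+2FFB.
# U+2FF2 (left to middle and right) and U+2FF3 (above to middle and below)
# take three operands; the other ten take two.
TERNARY = {"\u2FF2", "\u2FF3"}
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY


def _parse_component(s: str, i: int) -> int:
    """Parse one IDS component starting at index i and return the index just past it.

    Raises ValueError if the string ends before all operands are supplied.
    """
    if i >= len(s):
        raise ValueError("unexpected end of sequence")
    ch = s[i]
    if ch in BINARY:
        arity = 2
    elif ch in TERNARY:
        arity = 3
    else:
        # Simplification: any non-operator character counts as a leaf component.
        return i + 1
    i += 1
    for _ in range(arity):
        # Each operand may itself be a nested IDS.
        i = _parse_component(s, i)
    return i


def is_well_formed_ids(s: str) -> bool:
    """True if the whole string parses as exactly one IDS component."""
    try:
        return _parse_component(s, 0) == len(s)
    except ValueError:
        return False


print(is_well_formed_ids("⿰木木"))      # True: describes 林
print(is_well_formed_ids("⿱⿰木木土"))  # True: nested, describes 埜
print(is_well_formed_ids("⿰木"))        # False: missing an operand
```

Because every operator is prefix with a fixed arity, a sequence can be parsed without brackets. Checking the syntax is the easy part, though; rendering ⿰木木 as one glyph is exactly the kind of thing that needs the platform support discussed above.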