O'Reilly Hacks

XML Hacks
By Michael Fitzgerald
July 2004

Edit XML Documents with Emacs and nXML
nXML mode for GNU Emacs provides a powerful environment for creating valid XML documents

If you've been editing XML in GNU Emacs with PSGML, here's a tip: get rid of it. That's right, tear it out, dump it, make it disappear, because there's a much better tool available: nXML. (Grab the latest nxml-mode-200nnnnn.tar.gz file from http://www.thaiopensource.com/download/.) nXML was developed by James Clark, the man who brought us groff, expat, sgmls, SP, and Jade, and who was a driving force behind the development of XPath, XSLT (and before that, DSSSL), and, along with Murata Makoto, RELAX NG (http://www.relaxng.org/).

Which brings us back to what nXML is all about: nXML is a very clever mechanism for doing RELAX NG-driven, context-sensitive, validated editing. What's particularly clever about it is that, unlike PSGML and unlike virtually every other XML editing application available—with the exception of the Topologi Collaborative Markup Editor (http://www.topologi.com/products/tme/)—it provides real-time, automatic visual identification of validity errors.

This hack assumes that you are familiar with Emacs. The README file that comes with nXML states that you must use Emacs version 21.x (preferably 21.3 or later) in order to use nXML. To get nXML to run in Emacs, you must first load the rng-auto.el file. In Emacs, type:

M-x load-file

Then load the file rng-auto.el from the location where you downloaded and extracted the latest version of nXML. This file defines the autoloads for nXML. Now open an XML document (C-x C-f) and enter:

M-x nxml-mode

You are good to go! For help, type:

C-h m

Spotting Validity Errors in Real Time

What "automatic visual identification of validity errors" means is that when you create and edit documents with nXML, you never need to run a separate validation step to find out whether a document is valid. If a document contains a validity error, you will know as you edit, because the error is visually flagged. Here's how it works. As you're editing a document:

  • nXML incrementally reparses and revalidates the document in the background whenever you pause between keystrokes. You can wait for nXML to finish validating the entire document (which usually takes only a matter of seconds), but you don't have to, even with a large document: the moment you start typing again, nXML suspends its background parsing and validation until you are idle once more.

  • nXML describes the current validity state in the mode line at the bottom of the Emacs interface; at any point while you're editing a document, the mode line will say either Valid, Invalid, or Validated nn%, where nn is a number indicating what percentage of the document has been validated so far.

  • nXML visually highlights every instance of invalidity it finds in the part of the document it has validated so far. By default, the Emacs face it uses draws a red underline, but you can change the highlighting by customizing that face.

If you mouse over or move your cursor over one of the points that nXML has highlighted as invalid, text appears describing the validity error, either as popup text or in the minibuffer echo area at the bottom of the Emacs interface.

Figure 1. nXML validation error message

Entering and Displaying Special Characters

Another area where nXML is very clever is the way in which it enables you to enter and display special characters. To enter a special character, such as a copyright sign:

  1. Type C-c C-u. nXML then prompts you for the name of the character to enter.

  2. Type the first few letters of the character name and then press Tab. nXML then does completion, presenting you with a list of all character names that start with the letters you typed. For example, if you enter cop, nXML will present you with a list of several character names that start with COPTIC, along with the name of the character that's probably the one you're looking for: COPYRIGHT SIGN.

  3. Either use your mouse to select one of the choices from the completion buffer, or type more letters and press Tab again to narrow down the choices to the character you need. Or, if you just type copy to begin with, you'll get straight to the copyright sign (because it's the only character name that begins with COPY).
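
The completion step can be approximated outside Emacs. The following is a minimal Python sketch, not nXML's implementation: it assumes that Python's unicodedata tables can stand in for the character-name list nXML consults, and complete_character_name is a hypothetical helper name. Newer Unicode versions define more names than were available when this hack was written, so a prefix may match more characters than the steps above describe.

```python
import sys
import unicodedata


def complete_character_name(prefix):
    """Return (name, character) pairs whose Unicode name starts with prefix.

    The comparison is case-insensitive, mirroring the way you can type
    "cop" and still be offered COPYRIGHT SIGN.
    """
    prefix = prefix.upper()
    matches = []
    for code_point in range(sys.maxunicode + 1):
        # Code points without a name (unassigned, control, surrogate)
        # fall back to the empty string and never match.
        name = unicodedata.name(chr(code_point), "")
        if name.startswith(prefix):
            matches.append((name, chr(code_point)))
    return matches


if __name__ == "__main__":
    # Print every candidate for the prefix, in code point order.
    for name, char in complete_character_name("cop"):
        print(f"U+{ord(char):04X}  {name}")
```

Calling complete_character_name("cop") should return the COPTIC names together with COPYRIGHT SIGN, and a longer prefix narrows the list in the same way that typing more letters does in the completion buffer.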

Note that, by default, nXML inserts a hexadecimal numeric character reference, not the actual character; e.g., for the copyright sign, nXML inserts the character reference &#xA9;. This ensures that you will be able to interpret what the character is if it is displayed by software that does not understand Unicode.

But this is where things get interesting: even though nXML writes only the numeric character reference to the file, it displays the glyph for the character (along with the character reference itself). And if you mouse over the character reference, nXML displays the full name of the character, either as pop-up text or in the minibuffer echo area at the bottom of the Emacs interface (Figure 3).

Figure 3. nXML display of special characters
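
To make the relationship between a character name, its hexadecimal character reference, and the displayed glyph concrete, here is a small Python sketch. It is an illustration only, not how nXML works internally; hex_character_reference and describe_references are hypothetical helper names, and the name lookup relies on Python's unicodedata module rather than on nXML's own tables.

```python
import re
import unicodedata

# Matches hexadecimal character references such as &#xA9;
_HEX_REFERENCE = re.compile(r"&#x([0-9A-Fa-f]+);")


def hex_character_reference(name):
    """Build the hexadecimal character reference for a Unicode character name."""
    char = unicodedata.lookup(name)  # raises KeyError for an unknown name
    return f"&#x{ord(char):X};"


def describe_references(text):
    """Return (reference, glyph, name) for each hex character reference in text."""
    described = []
    for match in _HEX_REFERENCE.finditer(text):
        char = chr(int(match.group(1), 16))
        name = unicodedata.name(char, "UNKNOWN")
        described.append((match.group(0), char, name))
    return described


if __name__ == "__main__":
    reference = hex_character_reference("COPYRIGHT SIGN")
    print(reference)  # &#xA9;
    # Recover the glyph and full name from the ASCII-only text.
    for ref, glyph, name in describe_references(f"{reference} 2004"):
        print(ref, glyph, name)
```

The file itself contains only ASCII, while the glyph and the full name are recovered from the reference when the text is displayed, which is the trade-off the hack describes.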

As far as special characters go, nXML lets you have your cake and eat it too. You get:

  • An easy way to enter special characters as character references, without needing to memorize or look up their numeric values or ISO entity names.

  • The ability to see glyphs and full names for all the character references in your documents, while still being able to distribute them to others as ASCII-encoded files (so you're not depending on others having editors that support Unicode or some other encoding).

