Word's giant leap forward

by Andrew Savikas

Related link: http://www.oreilly.com/catalog/officexml/



One of the finest features of Adobe FrameMaker (long a favorite among tech publishers) is MIF, the Maker Interchange Format.

MIF is a lossless ASCII representation of a FrameMaker file, and it's well documented and easily parsed. Things that would be impossible or nearly so from the Frame UI are a walk in the park from MIF.

Though RTF can occasionally serve as a nearly lossless ASCII format for Word files, it's rather abstruse and notoriously difficult to parse.

But with Word 2003, Microsoft has finally opened up its books, so to speak, providing a truly lossless ASCII file format that even bests MIF by being XML. WordprocessingML is cumbersome, to say the least, but Word documents are complicated, cumbersome things that need a lot of description. And while WordprocessingML won't ever be anyone's (except Redmond's) idea of a standard document format, it opens the floodgates for those of us who traffic in Word files.

For example, as I was reading through the excellent Office 2003 XML, I noticed a 7-line XSL stylesheet that removes all direct (not style-based) formatting from a Word document. Seven lines. No macros. No need even to open up Word. That's (finally) true batch processing in Word. For what I do, this is a much bigger deal than the Task Pane.

Though much of the attention paid to the XML features in Word 2003 has been about data exchange between Word and the rest of the world, I for one am more excited about what I can do to (and with) Word XML without ever leaving Office.

1 Comment

teejay
2004-06-11 07:14:22
great if people use the format
This sounds good, but it requires that users actually save their files in that format. I don't see MS moving their default native save format to this in the near future.

Also, what about embedded content - does it use XLink, or does it embed binary or COM trash that requires you to be running Windows (or a Win32 API) in order to access it?