Bad XML

by Jeni Tennison

Markup design fascinates me. What is it that makes one format easier to use than another? Why, even within that subset of markup that uses XML syntax, are some markup languages elegant and others unreadable? When is it best to use XML, when YAML, when a custom format?



Not all XML is created equal, and I think the biggest distinction between a good markup language and a bad one comes down to whether the XML was designed as a markup language or whether it’s a serialisation of a completely different model. Practically all the XML serialisations that I’ve seen of object-oriented models, or relational models, or graph models, have been dreadful as markup languages.


14 Comments

M. David Peterson
2008-05-17 15:44:01
Jeni Tennison's on XML.com!!!


w00t! :D


(That's all I can comment on at the moment... Haven't read your post. That's next. Just was excited to see your first post so felt the need to say so :D)

M. David Peterson
2008-05-17 15:53:15
Ugg! I'd never noticed the config file format in Oxygen before now. Yikes!


Fortunately the tool *ROCKS*! :D

Anup
2008-05-17 19:59:13
Good post. I agree with the comments about the office xml format. Trying to write an XSLT that will produce clean (X)HTML from Word is a pain! That being said, I don't find Open Office/Open Document Format XML that much better, either....

Jean K.
2008-05-17 20:34:58
Well said! I've wrapped myself around a tree a few times over WordprocessingML 2003, OOXML, and InDesign's serialized INX files.


Beyond the terse markup, there's a whole other issue in that most serialized XML is not necessarily valid XML. This is especially true of the INX format - you'll get a well-formed document, but not necessarily the consistency of valid XML required for downstream processes.


Thank you for clarifying the difference between "bad" and "used badly."

Andrew Welch
2008-05-18 03:07:25

I wonder if OOXML's lack of mixed content is a hangover from MSXML's policy of stripping insignificant whitespace...

fauigerzigerk
2008-05-18 03:57:27
I can see the usefulness of these rules for document data, but if I need to store something that is inherently a graph of structured data, the rules make no sense at all. So, basically, your advice is that people who store or exchange anything but tree structured documents should just go away and leave XML alone.


But why? Because representations of graphs in XML don't look pretty and are hard to read and edit by hand? That's a bad reason because graph structured data is hard to read and edit by hand in any format I know of. I believe this problem is inherent to the way the human brain works. Physical containment (like element subtrees) is easier to grasp than networks, because networks lack the locality our senses need.


I see no reason why I should not make use of the other strengths of XML beyond those resulting from tree structure and mixed content, like Unicode and existing mature parsers.

M. David Peterson
2008-05-18 04:21:32
@fauigerzigerk,


>> I see no reason why I should not make use of the other strengths of XML beyond those resulting from tree structure and mixed content, like Unicode and existing mature parsers.


This is a fair point. And you're right, there are times when there isn't a pretty way to move your data into XML. I think the problem occurs when people are forced to look at the XML and attempt to make sense of it, at which point you get the classic "Oh, that's ugly! Isn't there a better way?", which then leads to rants similar to Jeff Atwood's. Of course we all know what happens after such rants: a global "conversation" which ultimately ends up leading back to your point,


>> I see no reason why I should not make use of the other strengths of XML beyond those resulting from tree structure and mixed content, like Unicode and existing mature parsers.


Of course I can't help but agree.


M. David Peterson
2008-05-18 04:32:02
@Andrew,


>> I wonder if OOXML's lack of mixed content is a hangover from MSXML's policy of stripping insignificant whitespace...


Hmmm.... Interesting point. As far as XSLT is concerned, while it's impossible for it to become as complicated given <xsl:text>foo</xsl:text> doesn't allow anything other than plain text with escaped markup, I certainly have a tendency to put all text that isn't generated into an xsl:text element. And as Anup points out, ODF isn't any prettier (okay, maybe it's a little prettier ;-)), so maybe this is really a simple matter of guaranteeing a lossless document format when moving from one application to another?


Food for thought...

Jeni Tennison
2008-05-18 06:19:15
@fauigerzigerk,


I certainly don't think that people who want to use XML for graph structures should go away and use something else! All I'm arguing is that everyone should think about the way the XML they use is designed as a markup language rather than simply dumping out a graph or other data-oriented structure in a generic serialisation.


Yes, graph structures are inherently harder for humans to read because they aren't linear; that's when you have to work particularly hard on the design of your XML to make it as usable as possible (for programmers as well as authors).

Theo
2008-05-19 01:16:44
> There are some things that OOXML does right. It uses meaningful element names


I fail to see how "r", "pPr", "i" and "t" are "meaningful element names".

Jeni Tennison
2008-05-19 02:12:16
@Theo,


They're more meaningful than 'element' :) They may be short, but at least they stand for something that reflects their semantics (r = run, pPr = paragraph properties, i = italic, t = text).

Roger
2008-05-19 04:50:03
I don't agree about ODF not being better than OOXML. I think it's a lot cleaner and easier to understand.


I just did a simple test with a one-page document containing a table, footnotes, and several styles. It was created in OpenOffice. I saved it as Word 2003 XML, which resulted in a 63 KB file. I unzipped the ODT file, and the content file is 34 KB, about half the size. I know this is Word 2003, which is not OOXML, but I suppose it's very similar, looking at the examples here.

Kurt Cagle
2008-05-19 17:06:00
Jeni,


I want to publicly second David's enthusiastic response. Welcome to XML.com - I for one am eagerly looking forward to your posts!


-- Kurt Cagle

M. David Peterson
2008-05-20 16:55:35
One post: #1 on the Hot 25: http://weblogs.oreillynet.com/


/me is looking forward to the next #1 post. :D