Is XML human readable?

by Rick Jelliffe

One of the oddest comments coming up on DIS 29500 is that plain old XML is not human readable. I would love to hear an explanation of this. A string of characters saved in a text file with a .xsd extension is not human readable, but exactly the same string when cut and pasted into a word processor is human-readable?

(To forestall talking in circles, this is not about whether XSD is baroque, nor whether a human who can read XML can then necessarily understand the intended semantics of the markup.)


2007-08-08 09:45:31
Well-formatted XML is a lot more readable than badly formatted XML. Otherwise, it is a lot more readable than CSV for the same data without a template. XML has an internal template and that is why they call it structured data.

We've been hearing that same canard about human readability since the earliest days of SGML. What is more useful is to know how cut and pastable a particular application language is. I can show anyone a few thousand lines of X3D with those indexed face sets and if they can read those in XML or Classic VRML encoding, they are an expert, but for cut-and-paste, Classic VRML is a lot better as long as one has a brace matching editor and better still if one has a JIT validator in an editor.

I don't get this question anymore. Who asks that sort of thing these days?

2007-08-08 11:16:40
Sure it's human readable. Consumable is another question completely.

2007-08-08 11:34:22
I've heard this too many times. Of course it's readable. But "Readable" and "Understandable" are different things. Read some Kant (philosopher). It's plain English prose. Readable? Yes. Understandable....

2007-08-08 12:00:35
Everything is human readable. It's just that some things take more time to grok than others.

Is SGML/XML more readable than roff for document markup? Definitely. Is it more readable than .ini for config files? Definitely not.

2007-08-08 13:04:32
So by your definition, machine code is human readable. As long as it's formatted as hex or ASCII "1" and "0" characters. That's nonsense.

"Readability" doesn't mean the ability for humans to see things any more than an illiterate person staring at a page of text is "reading".

But the larger point is that readability is a spectrum. XML has a much lower signal-to-noise ratio than a great many other formats. YAML, anyone?

Rick Jelliffe
2007-08-08 14:57:07
Rob: But the comments are specifically that the XML files supplied with DIS 29500 are not human-readable because they are in plain XML files. And the remedy: put the same text in a word processor file.

So the people making this claim are simply not talking about "understandability": an angle-bracket looks the same whether it is a printed ASCII file or a printed word processor file. It cannot be an issue of grokking.

It seems like the kind of claim that only someone who has never actually seen an XML file would make. Like I said, I would love an explanation.

M. David Peterson
2007-08-08 18:55:33
>> A string of characters saved in a text file with a .xsd extension is not human readable, but exactly the same string when cut and pasted into a word processor is human-readable? <<

Depends on whether or not the word processor in question has the ability to render the XML into a tree/table-like structure. If yes, then sure, it's definitely more readable in the same way that a CSV file is more readable when loaded into a spreadsheet table.

Could it be that those who are claiming that XML is more readable inside of a word processor are doing so based on their experience of a copy/paste operation from raw XML originating from say, Notepad, into something like Word that might potentially convert from raw text into a structured table view?

I have no clue if this is even normal behavior, to be honest... I've never even considered the idea of using Word to edit raw XML, though I guess I have used it enough times when writing technical documents that contained snippets of XML, and I've never noticed any attempt at reformatting the view automagically. Then again it would certainly be easy enough to write a VBA application that sniffed for XML, CSV, or any other common raw data container, transforming the data into an Excel table, so I can only assume that someone, somewhere has done just that to make the task of editing raw data that much easier for the remaining 99% of the world who can easily grok table views but get all blurry eyed at the sign of anything remotely resembling an angle bracket or any type of data delimiter.
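The "sniff and tabulate" idea above can be sketched in a few lines. This is only an illustration, written in Python rather than the VBA mentioned in the comment; the function names (sniff_format, to_rows, render_table) and the sample data are hypothetical, and the sniffing heuristic is a crude assumption that would not survive real-world input.

```python
import csv
import io
import xml.etree.ElementTree as ET


def sniff_format(text):
    """Guess whether raw text is XML or CSV (crude heuristic, an assumption)."""
    stripped = text.lstrip()
    if stripped.startswith("<"):
        try:
            ET.fromstring(stripped)
            return "xml"
        except ET.ParseError:
            pass
    return "csv"


def to_rows(text):
    """Flatten the data into a header row followed by data rows."""
    if sniff_format(text) == "xml":
        # Assumes a flat layout: root > record elements > one element per field.
        records = list(ET.fromstring(text.lstrip()))
        header = []
        for record in records:
            for field in record:
                if field.tag not in header:
                    header.append(field.tag)
        rows = [header]
        for record in records:
            rows.append([record.findtext(name) or "" for name in header])
        return rows
    return [row for row in csv.reader(io.StringIO(text)) if row]


def render_table(rows):
    """Render rows as a plain-text table with aligned columns."""
    ncols = max(len(row) for row in rows)
    padded = [row + [""] * (ncols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(ncols)]
    lines = []
    for row in padded:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append(" | ".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    xml_text = (
        "<people>"
        "<person><name>Ann</name><age>30</age></person>"
        "<person><name>Bob</name><age>25</age></person>"
        "</people>"
    )
    csv_text = "name,age\nAnn,30\nBob,25\n"
    print(render_table(to_rows(xml_text)))
    print(render_table(to_rows(csv_text)))
```

Both inputs come out as the same table, which is the point: the table view is a rendering of the text, not a different kind of data.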

J David Eisenberg
2007-08-08 22:09:12
For once, I agree with you -- I can't figure out what on earth they mean by this. Grasping at straws, maybe they are thinking of the fact that the XML, packed into a ZIP format, is certainly not readable. To paraphrase Jef Raskin, "If you can read the ZIP file directly, you are a mutant, and will go far in the world of computers."

Is there anyone claiming that XSD is simple and straightforward, BTW?

2007-08-09 00:24:04
XML is more human readable than JSON. Let the arguments commence!

Rick Jelliffe
2007-08-09 02:00:30
David: I think they would be asking for diagrams or tables or trees, if that was what they wanted. But there would be no value, because the schemas have to be in text in order to be distributed and understood.

For some people (let's call them "professionals"), the schema is the first port of call (or at least the first port as soon as the introduction gets boring.) That is why having nice terse RELAX NG compact schema is really useful for scoping.

David: XSD simple and straightforward? Mutant indeed. (As I said, there are versions of the XSD schemas translated into RELAX NG syntax as part of DIS 29500.)

Ryan Lang
2007-08-09 06:42:35
Yes, I think that XML is very readable. On the other hand, the XML output by MS Word is a bit confusing the way they mark it up, but if you read XML created by human hands, it makes a little more sense. I suppose being familiar with coding syntax helps when reading raw XML.

Rick Jelliffe
2007-08-09 06:49:33
Ryan: Open XML adopts what my 1998 book The XML & SGML Cookbook calls the "skeleton" (head/body) approach on its container elements. This is a little odd for the HTML generation when they see it, but is not at all unusual. When you have a lot of attributes, or the attributes may need to be further structured or have composite values, it becomes appropriate to put them into a separate head section.

In Open XML, these head sections are named using the container name plus "Pr" for "properties". So <pPr> is the container for the property elements for a <p> (paragraph) element.

In some cases, the head/body pattern became quite elaborate, with feet (CALS tables), with titles (as in CALS formal paragraphs) and so on. But odd does not mean bad, vicar.
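To make the head/body naming rule concrete, here is a minimal sketch in Python. The sample fragment and the helper names (local_name, split_head_body) are my own illustration and not taken from the post; the fragment uses the WordprocessingML namespace with the conventional "w" prefix, and the helper simply assumes the head is the child named after its container plus "Pr".

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: a centred paragraph containing one bold run.
FRAGMENT = """
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <w:jc w:val="center"/>
  </w:pPr>
  <w:r>
    <w:rPr><w:b/></w:rPr>
    <w:t>Hello</w:t>
  </w:r>
</w:p>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def split_head_body(element):
    """Split children into the 'Pr' head (or None) and the remaining body."""
    head_name = local_name(element.tag) + "Pr"
    head = None
    body = []
    for child in element:
        if head is None and local_name(child.tag) == head_name:
            head = child
        else:
            body.append(child)
    return head, body


if __name__ == "__main__":
    paragraph = ET.fromstring(FRAGMENT)
    head, body = split_head_body(paragraph)
    print(local_name(head.tag), [local_name(child.tag) for child in body])
    run_head, run_body = split_head_body(body[0])
    print(local_name(run_head.tag), [local_name(child.tag) for child in run_body])
```

The same rule applies one level down: the run element has its own "rPr" head, followed by the text it carries.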

Rick Jelliffe
2007-08-14 07:42:58
Jesper: The best patterns for expressing information in XML are indeed an interesting topic. One of Christopher Alexander's problems with his architectural patterns movement was that, counter to what he had expected, when normal people build on patterns extracted from the architecture they know, they end up building the same things as before. XML patterns are quite similar: people make all sorts of extravagant claims that A is better than B, but often if you look at it HTML people end up with patterns that look like HTML, CALS people end up with documents that look like CALS, and so on.

But it is certainly a discussion that is worth having. It is the kind of thing that Roger Costello of spooky US thinktank MITRE does :-) ODF's goals included easy transformability with XSLT, so it does make sense that ODF has nested structures while Open XML zags rather than zigs there, to fit in with its goals.

However, ODF as I understand it makes up an "automatic" style rather than having inline overrides. If you have a document with no styles, the XPaths in Open XML are simpler; if you have a document with only styles, ODF and Open XML are the same; if you have a document with styles and overrides, ODF is simpler because it only has one mechanism. All other things being equal.

2007-08-18 04:43:12
There is beauty and there is kludge.

Open XML is not beautiful at all, you don't want to read it.

Now: XML is like C or C++: kludge is made possible. So you need to rely on good design. Design by humans. Open XML looks machine designed.

Rick Jelliffe
2007-08-18 06:01:49
Andre: If you want to attack Open XML, you need to attack its goals: you need to prove that it would not be useful for *any* niche industry sector to have standard access to a thorough description of the format of the world's most popular application, in a form that has been vetted through 2 years of intensive standards work involving thousands of participants, and with its IP issues clearly sorted out. That OOXML is less than optimal in area X or Y is a given: its value is in being a snapshot, not an exemplar.

2007-08-20 11:05:04
As previous commenters have said, XML can be read by humans. But, so can a core dump...if you know how to read it. The crux of the issue is "whether a human can...understand the intended semantics." What good is it to read the text, if you can't interpret it?

In a truly global computing environment that has a lot of different languages and even character sets, it is impossible for all files to be human readable by all people. Kanji may as well be encrypted machine code to me.

CSV is really easy to read. Just load the files into Excel or any number of other software packages that can import CSV and it is often much easier than defining how to interpret the XML.

Say what you want, but most XML of any complexity quickly becomes unreadable by humans. Even defining column based input becomes more readable.

I transferred data around long before XML became the fad. All that really matters is that the format of the data are well documented. XML is not the most efficient way to handle large data, and it is often less human readable in practice than many claim. Especially, when the data are complex.

Maybe if people would normalize their overly complex XML formats into multiple XML files, with a small XML file that defines the relationships. These could all be placed in one ZIP file.

Taking this one step further, the relationships file might also define the formats of the files that the actual data files are in. If you want to use JSON, XML, CSV, etc., or even a combination of different formats. This way you could cover the latest geek fads.
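The packaging idea in the last two paragraphs can be sketched roughly as follows. Everything here is hypothetical: the file names, the layout of relationships.xml, and the helper names (build_package, read_related) are made up for illustration and do not follow any existing packaging standard.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package(parts, relationships):
    """Zip several small data files plus one file describing how they relate.

    parts: dict mapping file name to text content.
    relationships: list of (source, type, target) tuples.
    """
    rels = ET.Element("relationships")
    for source, rel_type, target in relationships:
        ET.SubElement(rels, "relationship",
                      source=source, type=rel_type, target=target)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("relationships.xml",
                         ET.tostring(rels, encoding="unicode"))
        for name, text in parts.items():
            package.writestr(name, text)
    return buffer.getvalue()


def read_related(package_bytes, source, rel_type):
    """Return {target name: content} for one source file and relationship type."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        rels = ET.fromstring(package.read("relationships.xml"))
        return {
            rel.get("target"): package.read(rel.get("target")).decode("utf-8")
            for rel in rels.iter("relationship")
            if rel.get("source") == source and rel.get("type") == rel_type
        }


if __name__ == "__main__":
    # The parts deliberately mix formats, as the comment suggests.
    parts = {
        "orders.xml": "<orders><order id='1' customer='c1'/></orders>",
        "customers.csv": "id,name\nc1,Ann\n",
    }
    package = build_package(
        parts, [("orders.xml", "customers", "customers.csv")])
    print(read_related(package, "orders.xml", "customers"))
```

Each part stays small enough to read on its own, and only the relationships file needs to be understood to see how the pieces fit together.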

Rick Jelliffe
2007-08-20 22:08:02
Chuck: All good points. (In fact, OPC --part of OOXML-- does provide a ZIP format modularized with multiple smaller XML files, tied together with a relationship system, but I guess that is what you were referring to.)

To provide context for the blog though, see BSi comments on DIS 29500 Wiki where somehow putting a text file into a word processing or PDF document and adding line numbers magically makes it "human readable". Hrumph.