ECS - Moving from Well-Formed XML to Amply-Tagged XML

by Rick Jelliffe

Opera's Anne van Kesteren has blogged that HTML browser makers should "define graceful error handling for XML, put some IETF, W3C or WHATWG sticker on it, label it XML 2.0 and ship it." It has been widely reported.

My company, Topologi, has been using exactly such a grammar for five years in some of our products. We use it as our HTML and SGML editing mode; it's not perfect, but it is perfectly workable in most cases. It seems to be in Anne's ballpark for an "XML 2.0".

ECS ("Editor's Concrete Syntax") takes XML and puts back the forms of end-tag minimization and close-delimiter omission (which is what Anne is calling "graceful error handling" AFAICS) that XML removed from SGML under the mantra "terseness is of minimal importance." This moves it much closer to idiomatic HTML; it is Forgiving XML rather than Superbitch XML, and this is far more suitable for just folks to use.


2007-02-05 21:32:00
What's the value of that? Just that some lazy human does not have to write those end tags? That cannot be that important... Or to save some bytes over the wire? We are in the age of broadband Internet... Webpages should be valid XHTML. We do not need yet another (useless) standard.
Rick Jelliffe
2007-02-05 22:08:29
Simon: Yes, I think you are right. If we could only eliminate those lazy humans, there would be nothing to stop XHTML.
2007-02-05 22:09:51
Producing well-formed XML ain't that hard. I'm mystified as to wtf the point of "graceful" XML error handling is.

Rick, do you think this will result in there being LESS or MORE not-well-formed XML documents in the world?

As a consumer of XML, I must say it seems to me it will be MORE, which seems like it will make XML less useful. I must be missing something. (?)

2007-02-05 22:23:34
s/not-well-formed/unintentionally malformed/
Rick Jelliffe
2007-02-05 22:57:17
Finite: Exactly the same amount of WF XML as would exist without ECS.

One approach might be to define it as a text transform that generates WF XML. So documents that are expected to be XML could, if non-WF and at user option, be then sent through the transform to generate XML as a method of error recovery. And then to make it clear that this transform is only suitable for casual data where access is more important than accuracy, such as EcsHTML.
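
The "strict first, transform only on failure and at user option" flow described above could be sketched roughly as follows. This is only an illustration, not the ECS grammar: the recovery step here is a stand-in built on Python's standard html.parser, which merely closes unclosed elements, and the names parse_with_recovery, allow_recovery and _Recoverer are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class _Recoverer(HTMLParser):
    """Stand-in recovery transform (NOT ECS): re-emits tags and closes open elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the regenerated document
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        # dict() drops duplicate attributes; valueless attributes get their own name.
        rendered = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in dict(attrs).items()
        )
        self.out.append(f"<{tag}{rendered}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it
        # Close every element opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def result(self):
        # Close whatever is still open at end of input.
        closing = "".join(f"</{t}>" for t in reversed(self.stack))
        return "".join(self.out) + closing


def parse_with_recovery(text, allow_recovery=False):
    """Return (root, recovered). Recovery runs only if strict parsing fails
    and the caller has opted in; otherwise the ParseError propagates."""
    try:
        return ET.fromstring(text), False
    except ET.ParseError:
        if not allow_recovery:
            raise
    recoverer = _Recoverer()
    recoverer.feed(text)
    recoverer.close()
    return ET.fromstring(recoverer.result()), True
```

A caller that has not opted in sees exactly the strict behaviour it sees today, so well-formed input is never touched. Note that this stand-in lowercases element names and does not handle input with several top-level elements, which is one more reason it is only suitable for casual data.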

I don't think XML is rocket science either, but if there is a credible community of users who want something difficult, they have a right to get a technology that is suited to them without being told by outsiders that their needs don't count.

Democracy! Pluralism! Motherhood!

2007-02-05 23:11:11
What if an XML producing human uses a tool that tolerates malformed XML, so they don't realize or do realize but don't care that they're able to produce malformed XML, and I'm using a circa-present day XML consuming program that doesn't have the magical "transform" you speak of... hasn't the producer's application's tolerance of malformed data just made XML a lot less useful for me? I think it has!
Philip Fennell
2007-02-06 01:52:18
I'm afraid you have me stumped. Being a dyed-in-the-wool XML proponent, you will never be able to convince me that the ECS example has any merit. To be honest it made my stomach clench when I first read it. From a human-readable perspective I don't think it scans as well as the XML example. That may appear trivial but no more so than the fewer keystrokes it took to write. Sorry, but I'm not interested.
J David Eisenberg
2007-02-06 10:01:36
I believe Mr. van Kesteren spells his name "Anne."
J David Eisenberg
2007-02-06 14:35:10
First, in the ECS spec, you say "the delimiters "<" and "&" do not need to be converted to entity references if followed by a name start character,..." Should this be "if NOT followed by a name start character"?

Second, the restriction on an element not being able to contain itself directly would forbid directly nested <div> elements in HTML, and those can be useful.

Third, from the standpoint of teaching "just folks," I've found that minimization makes things harder, not easier. Consider:

<p id=t1> is fine, but <hr width=25%> isn't. You know that 25% is two tokens. I know that. But Mr. Joe Novice doesn't know it, and is in a constant state of paranoia over whether he needs those quote marks or not.

Samantha Beginner will write <p this is my text</> and expect it to work. After all, you left out the closing angle bracket on <b<i>, didn't you? She will also be wasting neurons deciding whether she needs that angle bracket or not, and whether she should (or even can) use a minimization or not.

Your mileage, of course, may vary.

Rick Jelliffe
2007-02-06 20:13:33
David: Thanks for picking up the typo in the name, I'll get it corrected. (I don't have access to change the PDF from where I am, but you are right that there is a missing "not".)

Yes, certainly it is impossible to have a markup language in which no mistakes are possible. For example, I was training once and using examples like

<p id="p3" class="doggy">

and one participant thought that this was an element with two attributes: one called "p id" and one called "class", because they hadn't grasped that an element has a name or that attribute names cannot contain spaces.

There are lots of other problems with trying to define HTML minimization strictly in terms of SGML minimization. For example, the token issue you mention.

But van Kesteren was mentioning "graceful error handling" and I was just showing an example of one in use, and to mention that there would be no problems with its status wrt the ISO standard, since that seems to be not uninteresting to some people at the moment.

Leigh Klotz
2007-02-06 22:53:20
Why does nobody ever talk about the desperate need to make JavaScript syntax so much more forgiving?
Daniel Billotte
2007-02-07 00:04:42
Hogwash! Over the last handful of years I've had to digest all kinds of user/client generated crap XML with parsers that I have written in a number of different languages with a number of different XML libraries or without. I think that it is crazy to not just allow, but encourage, more crap. As some other posters have mentioned, most XML is machine generated so it's not like using this ECS is going to save anybody from CTS. Creating more levels of vagueness about what is "legal" and what is not will only produce more crap XML. If people are going to use it, let's keep a simple rule set and make them stick to it!

Do I love XML? In one word, "NO". But please don't go and try to replace it with some syntactically unbalanced mess like has been demonstrated here. JSON is nice, YAML is nice, I'd rather use Perl style data-structs than ECS. I think we can find a better representation for data than XML, but it should be a move forward instead of a move backwards.

Kurt Cagle
2007-02-08 14:48:19
I'd also be inclined to question this. I see a far more compelling case for treating JSON as a "compact" infoset model that can be mapped to (and used by) an XML process than I do in seeing ECS gain hold. Anne's also been trying to push "forgiving" HTML for years and bypassing the W3C XML efforts as being too complicated for the average Joe, despite the fact that most HTML in this day and age is either written by people who are perfectly comfortable with XML or is autogenerated by systems that can just as easily be refactored to handle XML generation. Like much of the WhatWG "standard" (with the possible exception of Canvas) this strikes me as being a rear-guard action by a handful of developers who live in something of a fantasy world where everyone is still hand-coding their own web pages and just can't get the hang of (or are too lazy to get the hang of) making their notation well-formed and self-contained.
John Cowan
2008-01-07 14:38:20
FWIW, TagSoup pretty much supports ECS. At some point I'll probably extend it to do so in full. (Plug: TagSoup 1.2 is out.)
Rick Jelliffe
2008-01-07 22:56:59
John: Thanks for that (and thanks for all your effort in developing and maintaining TagSoup, while I am at it!)

Yes, I think ECS does correspond much more to how people do HTML than XML does, so I would expect it to be much closer to TagSoup's effective syntax.
