Ten years later, time to repeat the trimming?

by Simon St. Laurent

In celebration of XML 1.0's tenth anniversary, I signed back on to XML-DEV to suggest that it's time to do to XML - just the core of it, please - what XML did to SGML around SGML's tenth anniversary.


2008-02-12 19:21:51
Clean up XML? I'd love that. I've been a big fan of it since '99. I think the fat should be trimmed, and HTML 5-style error handling needs to be implemented to replace the draconian error handling.
2008-02-14 06:03:04
Can you clean up XML without cleaning up HTML?
Simon St.Laurent
2008-02-14 07:12:32
"Can you clean up XML without cleaning up HTML?"

Uh, I guess I'd have to ask whether you could clean up SGML without cleaning up HTML - and that part seemed to go fine ten years ago.

I'm not sure why that's even a relevant question though.

2008-02-14 08:36:41
XML IS the cleanup of SGML. I don't expect any more progress there. Too few care.

Ten years ago was a very different time. The resistance to SGML was fierce, the knowledge base was shallow, there were dreams of sugar plums (say, LOTS of money) driving markup to the web, and the existing applications, with the exception of print publishing, had already done the experiments to show that the XML design could work (XML is SGML As Practiced in many shops, particularly the advanced hypertext shops such as USAMICOM IADS). There was no doubt in the relatively small cadre of SGML experts that this was needed and would work. It came down to a political burglary over who, where and what. Once Berners-Lee bought in, resistance was futile. The Web hive-mind was Borg then. I don't think that is true today.

The same conditions don't exist today. Anyone taking this on has to get very big buy-ins outside the W3C. One of the problems of the SGML cleanup was the mess that HTML had become. XML captured the mess but didn't really clean it up, because the kudzu was already in the fields and expanding rapidly. Whereas beating the mean drum against SGML was a political tactic that worked in 1996, today it is meaningless. On the other hand, saying you can 'clean up' XML rattles a lot more cages, even if, for comp-sci, it is a reasonable request, and yes, it is time to talk about it. I don't disagree; I just don't like quixotic quests for their own sake.

If I understand your initiative, you want to create a sanctioned subset/profile of XML for data binding applications. IOW, this profile would never be applied to HTML. Cleaning up HTML is hopeless IMO despite all efforts made. Kudzu is kudzu. But the other applications of XML rely on the intertwined application languages (XSLT, XSD, XPath, XQuery, etc.).

So briefly, what would be the benefits of an XML profile?

Simon St.Laurent
2008-02-14 09:06:02
I don't think you answered my question, which was asking why you thought HTML was relevant to this conversation.

I'm also not interested in a reduced profile for data-binding purposes, though sure, they can tag along if they want. JSON seems to me to have done a better job of addressing that space than XML could, and if anyone's paying attention and needs more, there's always YAML. Data-oriented folks likely wouldn't lose anything in the profiling - I don't think a lot of them are using NOTATION anyway.

So why bother? Mostly, I'd suggest, to clear out the debris and open new possibilities. A lot of the material in XML 1.0 is now deadwood, but the mere fact of its presence has been a barrier to solving the problems it was meant to address (think entity processing) in ways that fit modern use.

It'd also give a lot of additional credibility to the various patches to XML to have them incorporated into a single standard, instead of having to wonder whether this "XML" processor handles XIncludes and xml:id or whether I have to do it manually or geez, is there a way to switch that off?

It doesn't seem like an incredibly difficult project. I don't expect that it would sweep the world immediately, but I do suspect that a lot of people would find it useful over time.

2008-02-14 11:01:03
Because any change to XML has to cope with the legacies of its applications, just as the move from SGML to XML had to cope with HTML. That's why I ask.

So, cost-benefit: say we take out general entities (assuming you want to keep character entities), DTDs, namespaces and attribute normalization. Say xml:id is made part of the standard. Say someone has a solid proposal for adding namespaces. Which kinds of applications benefit? What legacy HAS to be grandfathered? The reason for asking about HTML: rather than fix it, we grandfathered it, making everything suck more just to get buy-in. What are you willing to bargain?

It's more of a 'look-ahead' question than a 'must be' question. XML-Dev could resume an active role if the presentation is well thought through instead of chewing on theoreticals day in and day out.

I think it can be difficult in several dimensions. What I won't repeat (fwiw) is another SGML On The Web where the agendas became apparent only after some had signed up for it. The rosy memory thing is too much a means of wish-fulfillment. Those arguments were bitter and long in the working group regardless of how often and when the self-selected ERB went to dinner together. Companies lost business, people's careers took a hard hit, and in the end, what we got was a metalanguage that simply sucked equally for almost everyone except the HTMLers.

So do the cost-benefit analysis feature by feature, instead of writing 10 Precepts that create something small, throwing it to the floor, and claiming victory while the real work is yet to be done. My advice is to continue the XML-Dev thread. Make a list of the suggested changes, then get feedback on them one at a time. If nothing else, it is a very worthwhile review for the lurker gallery.

Simon St.Laurent
2008-02-14 11:12:51
Len - I think you've deeply overread what I'm hoping to do, as well as my personal level of commitment to seeing it get done.

Most of this work has actually been done before. There's Common XML, SkunkWorks XML, and a lot of other past discussion. About the only parts that might be new would be integrating xml:id and maybe XInclude, and finding somebody to stick a label on it.

I don't understand at all why you keep going on about what a drastic business this would be. I don't, in fact, expect much of the world to notice. The revolution already blew through when XML came out.

This is just cleanup - and even cleanup preferably done by other people. I've already fought these battles, but thought they were worth reconsidering amid the 10th anniversary gala celebration. Folks including you spent a lot of time telling me it was too early to do this, or a bad idea to do this. The 10th anniversary of a process that did exactly this to a 10-year-old spec is way too tempting a point in time not to ask if it's time yet.

2008-02-14 15:09:17
You asked. :-)

There is no cleanup possible as long as different constituencies rely on different pieces that other constituencies consider dross. Again, we could succeed with XML because of the numbers and the constituency. SGML was sold to institutions and publishers. XML was sold to programmers. HTML is just kudzu.

It is precisely because I've read all of the earlier attempts that I am asking these questions. It comes down to common practice: is there a large enough body of practitioners to endorse a cleanup? So if there is a list of suggestions and some sort of straw poll for each, that could answer the question of need. Ten years is an opportune time to ask if there is a need. Last time, the need was clear.

As to who does it: I didn't think you were going to do it. My guess is some pretty big guns/names would have to want it.

Rick Jelliffe
2008-02-15 04:08:31
I guess some reasons to have a smaller XML would be:

1) To make it more popular... But it is already used in places that stretch it. So that won't work.

2) To make it useful for small data interchange... But JSON does that better, because it has typing and structures and fits into programming languages. So that won't work.

3) To make it more useful for XHTML... But XHTML systems already are not supposed to use the DTD, and if you look at HTML 5's rules, user-friendliness = programmer sweat. So that won't work.

4) To make it faster for parsing... But the extra syntax doesn't add to parse time and there is the Fast Infoset etc stuff out now for that niche. So that won't work.

5) To get more implementations... But it is ubiquitous now anyway. And, as was found with XML 1.1, alterations without general *user* benefit are not attractive to programmers. So that won't work either.

2008-02-15 06:03:11

True. So it comes down to elegance for the programmers, who were, in fact, the audience it was sold to originally.

I'm leery of losing a standard means to validate at the trust boundaries, particularly when most programmers I interview do understand DTDs and there is little need for new implementations. OTOH, keeping them keeps in place most of the worms that the cognoscenti object to, so once again, that won't work.