XML Versus TAP

by Curtis Poe

Preface: if you love XML, that's fine. I've nothing against the technology per se, but it's not always the best tool for the job.

I'll be in Copenhagen next weekend for the Nordic Perl Workshop giving a talk about multi-language test suites. This will be based on the work done with TAP::Parser and it will contain a brief discussion of TAP (the Test Anything Protocol), a protocol which is almost 20 years old and is gaining in popularity.

One question I'm sometimes asked by those not involved with TAP is why we don't use XML for our test results. This is a brief attempt to answer that.


23 Comments

Andy Armstrong
2007-04-22 06:50:08
I wrote a basic TAP implementation in GNU Forth for giggles last night :)
Ovid
2007-04-22 08:03:34

Andy: I can believe it! TAP is ridiculously easy to implement. That's part of the reason why it's available in so many languages. So far I've heard of implementations in:



  • C

  • C++

  • Forth

  • JavaScript

  • PHP

  • Perl

  • PostgreSQL (!)

  • Python

  • Ruby


Outside of Perl, it's most popular in the PHP world, though I've seen it used heavily in C and C++.
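To give a sense of how little a TAP producer needs, here is a minimal sketch in Python. It covers only the plan line (such as `1..7`) and the `ok N - description` / `not ok N - description` test lines; the function name `tap_report` and the sample test descriptions are made up for illustration and do not come from any existing TAP library.

```python
def tap_report(results):
    """Render (passed, description) pairs as a block of TAP text."""
    # The plan line says how many test lines the consumer should expect.
    lines = [f"1..{len(results)}"]
    for number, (passed, description) in enumerate(results, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} - {description}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical results, purely for demonstration.
    print(tap_report([
        (True, "input file opened"),
        (False, "first line parsed"),
    ]))
```

Because the output is plain lines on standard output, the same idea ports to almost any language, which may be part of why so many implementations exist.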

M. David Peterson
2007-04-22 10:02:20
>> "how many of you know what an unbound prefix is and how to fix it?"


Isn't that a bit of a loaded question? I mean, as soon as someone states "an unbound prefix is a prefix without a matching namespace declaration and to fix it you simply add an xmlns:prefix="uri:pick-a-namespace-any-namespace" within proper document context in regards to the first use of the prefix" I can only guess what your response is going to be,


Q: "So what is the proper document context?"
A: "prefix and namespace declared on or before the first prefixed element in the document."


Q: "Okay, so then what about the processor? What happens if the namespace of the prefix referenced in the XML document doesn't match the prefix of the same referenced namespace inside of the processor?"
A: "The namespace is what matters, and is what is used to bind elements to their matching processing instruction inside of the processor."


Q: "So what if there are no matching namespaces inside of the processor?"
A: "Depends on the processor. Anything from nothing to the text nodes being output from the elements that didn't match any of the rules/instructions and therefore defaulted to built in rules/instructions that output the text nodes contained inside of the element"


Q: "So then you would get the incorrect and/or unexpected result?"
A: "Yep. Just like with TAP if you haven't spent the time to understand what it is you are doing in the first place."
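For readers who have not met an unbound prefix, the exchange above can be sketched with Python's standard `xml.etree.ElementTree` parser. The element names and the `uri:example-tests` namespace are invented for the example; the helper `try_parse` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# The prefix "t" is used but never declared: an unbound prefix.
UNBOUND = "<t:suite><t:test>one is one</t:test></t:suite>"

# Declaring the prefix on (or before) the first prefixed element fixes it.
BOUND = '<t:suite xmlns:t="uri:example-tests"><t:test>one is one</t:test></t:suite>'

# A different prefix bound to the same namespace names the same element.
OTHER_PREFIX = '<x:suite xmlns:x="uri:example-tests"/>'


def try_parse(document):
    """Return the root tag, or the parser's error message if parsing fails."""
    try:
        return ET.fromstring(document).tag
    except ET.ParseError as error:
        return f"parse error: {error}"


if __name__ == "__main__":
    for doc in (UNBOUND, BOUND, OTHER_PREFIX):
        print(try_parse(doc))
```

The first document is rejected outright, while the second and third resolve to the same namespace-qualified tag, which matches the point that the namespace, not the prefix, is what the processor binds on.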

Dominic Mitchell
2007-04-22 10:14:44

In fairness to XML, it's often better used for documents rather than data. And namespaces trip up everybody, which is really scary. In a recent training course for an XML database, the namespaces section consistently tripped up everybody in the class, even those with 5+ years of continuous XML experience. Quite depressing.


Anyway, regarding digests of XML. That's what Canonical XML was invented for. My XML::Genx module will output in this format.

M. David Peterson
2007-04-22 10:35:35
Also,


>> And did you see the mistake in the XML?


For those who understand how an XML writer actually works, they would understand that to get,


<descriptin>one is one</description>


as the misspelled output would not be possible unless you had a faulty XML writer, as the opening and closing element are written from the same memory location which has been held in that location until such time as the signal to close the tag has been given. And given that there are literally hundreds of XML writers out there that have been tested within an inch of their life to ensure proper conformance, to get the above output would mean,


1) You chose to write your own XML writer instead of using the XML writer that comes as standard issue as part of each and every respected language and/or platform on the planet.
2) You chose to hack together a half a$$ solution instead of taking the time to think through the process of writing a proper XML writer.
3) Either of which, of course, would showcase that you have either no clue what you're doing, or couldn't care less about spending the time to write software that actually works, and if it's the latter of the two, then your problem has nothing to do with XML, and *everything* to do with having chosen the wrong career path.
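The claim about XML writers can be checked with a small sketch, here using Python's standard `xml.etree.ElementTree` as one example of a library writer. The misspelled element name is deliberate.

```python
import xml.etree.ElementTree as ET

# The element name is stored once; the writer emits both tags from it.
element = ET.Element("descriptin")  # misspelled on purpose
element.text = "one is one"

serialized = ET.tostring(element, encoding="unicode")

if __name__ == "__main__":
    print(serialized)
```

A typo in the name still yields matching opening and closing tags, so the mismatched pair shown above would have to come from hand-concatenated strings rather than from a writer of this kind.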


Maybe it's just me, but shouldn't the focus be less about "Test Anything" and more about "Test Always"? If it's not, why not? It seems to me that if you place the core focus on 'testing anything' that comes your way as opposed to testing continuously as you develop your code, the result tends to be that of "Test Anything. Always Testing." instead of "Test Always. Spend the Rest of Your Time Doing Anything You Want."

Ovid
2007-04-22 12:25:45

M David Peterson: the point is that TAP is simple. Even if a line of TAP is invalid, it doesn't ruin the parsing of the entire document. There are plenty of broken XML parsers and generators out there but TAP is so ridiculously easy that just a minute or two of reading through the spec gives you a grasp of just about everything. The same cannot be said for XML. The problem space for test results is relatively restricted compared to the problem spaces that XML is suitable for and having the full power of XML necessarily introduces the complexity of XML.
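The difference in failure modes can be sketched as follows. The line-by-line parser here is a deliberately simplified toy, not TAP::Parser, and it understands only plan lines and `ok` / `not ok` test lines; the names `parse_tap`, `TAP_TEXT`, and `XML_TEXT` and the sample data are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

TEST_LINE = re.compile(r"^(not ok|ok)\s+(\d+)\s*(?:-\s*)?(.*)$")
PLAN_LINE = re.compile(r"^\d+\.\.\d+$")


def parse_tap(text):
    """Return (results, unparsed); results holds (number, passed, description)."""
    results, unparsed = [], []
    for line in text.splitlines():
        match = TEST_LINE.match(line)
        if match:
            status, number, description = match.groups()
            results.append((int(number), status == "ok", description))
        elif line.strip() and not PLAN_LINE.match(line):
            # A bad line is set aside; the remaining lines are still read.
            unparsed.append(line)
    return results, unparsed


# Hypothetical input with one garbled line ("ko" instead of "ok").
TAP_TEXT = "1..3\nok 1 - one is one\nko 2 - two is two\nnot ok 3 - three is three"

# One mismatched tag, as in the article's XML example.
XML_TEXT = "<test><descriptin>one is one</description></test>"

if __name__ == "__main__":
    print(parse_tap(TAP_TEXT))
    try:
        ET.fromstring(XML_TEXT)
    except ET.ParseError as error:
        print("XML rejected:", error)
```

The garbled TAP line costs one result and the other two survive, whereas the single mismatched tag causes a conforming XML parser to reject the whole document.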


TAP is gaining popularity in part because it's not XML. It may not be appropriate for your needs and that's fine, but for those who find it suitable, they're quite happy to have something so easy to use.

chromatic
2007-04-22 23:24:42
Don't forget TAP generators written in Perl 6 as well as PIR (Parrot's native programming language). I'm especially proud of the latter because it allows me to write the tests for the Pheme programming language (built on Parrot) in Pheme itself.
M. David Peterson
2007-04-22 23:33:53
@Ovid,


Fair enough. These are some good points. Thanks for taking the time to bring them up!

M. David Peterson
2007-04-22 23:35:40
@chromatic,


>> PIR (Parrot's native programming language).


Seems I need to do some research. First I've heard of this, though that's not surprising given that I have about as much experience with Perl as I do Fortran. ;-)

M. David Peterson
2007-04-22 23:42:58
@Ovid,


One thing I have been dying to ask you ever since the first time I saw the pic you have on your profile: What is the object in the left portion of the photo?

Ovid
2007-04-23 01:04:30

M. David Peterson. You know, I should create an "Ovid FAQ" and put that question near the top of the list.


A friend of mine is a professional photographer (link not safe for work) and he invited me over for a photo shoot (not an "adult" one, I should add). While we were there, he started playing around with other ideas and said "here, hold this". Neither of us is sure what it is, but we think it's part of a TV set, from the cathode ray tube. You can see a slightly larger version at my Perlmonks page.

M. David Peterson
2007-04-23 02:21:33
@Ovid,


>> You know, I should create an "Ovid FAQ" and put that question near the top of the list.


That would be fantastic! Thanks! ;-)


re: You friends site: One word: Ouch!


re: The bigger pic: thanks, that helps!

M. David Peterson
2007-04-23 02:23:10
s/you/your
Brianary
2007-04-23 17:09:03
>> I've nothing against [XML] per se

Uh-oh, here it comes...

>> but it's not always the best tool for the job

Everybody takes a drink!

>> After looking at the sea of hands in front of me, waving gently back and forth, I'll then hit them with my follow up question: "how many of you know what an unbound prefix is and how to fix it?" I'll explain that if they hold their hand up, I might just call on them and ask them technical questions about this. I fully expect the sea of hands to evaporate to a puddle.

I wonder if this is any indication of the number of people affected by this corner-case.

>> Real-world problems implementing XML are legion.

One example, and "another example" which isn't even XML? That hardly supports a claim of "legion".

I bet the contractor could've found a way to mis-implement TAP just as poorly as XML. Don't blame the syntax for the implementation.

I'm not sure what you are getting at in this article. Is TAP a good format for narrative documents? Why is XML not suitable for test results? Would some other standard structured format, like JSON, be better?

The problem is everyone has a favorite pet format that is perfectly suited to their way of thinking, which is sometimes not sufficiently flexible to accommodate an XML approach. This pattern will always lead to dozens of independent libraries for nearly identical problem spaces (slightly differing only in how a small group perceives the problem space) with incomplete platform or environment support. For example, how many .NET implementations does this language-agnostic format include? None? That's a pretty big hole. Java? For a ten-year-old format, implementations are pretty narrow.

How well will this format scale to unanticipated functionality in the future? How extensible is it?

>> Heck, why don't you write a generator for that? Yeah, go ahead. I'm waiting.

Are you kidding? I was done at "Heck". I'd still be learning your niche format! Dammit, I'm getting tired of having to learn pet formats!

>> but far, far more verbose

If space is such a concern, compress the XML.

>> And did you see the mistake in the XML?

Yes, because consistently working in a common format has attuned me to it. If I had to learn a dozen niche formats, I wouldn't be any good at seeing errors in any of them.

chromatic
2007-04-23 23:32:10
@Brianary, I thought the point of XML was to create niche formats. You certainly don't get semantics for free.
Ovid
2007-04-24 00:14:40

>> Everybody takes a drink!


Why are you being rude to a complete stranger? But then, that's the curse of the Web, eh?


As for your comments, I learned a long time ago that there's a strange problem with discussing issues. I have to include examples of the issue to make people come to grips with it, but when I do that, people focus so much on one or two examples that they seem to miss how they relate to the larger issue. I suspect this has something to do with the relatively short attention spans people seem to have today. They focus on a paragraph and not the point. When you wondered "if [Ovid's example] is any indication of the number of people affected by [the unbound prefix] corner-case", it's clear that I didn't get my point across. There are plenty of other questions I could have asked and gotten similar results. Please don't focus on the unbound prefix example.


>> One example, and "another example" which isn't even XML? That hardly supports a claim of "legion".


This is a blog entry, not a court case. There's no way I could have provided an exhaustive list and it certainly wouldn't be appropriate to do so here. As for the second example, if yet another vendor supplies yet another pseudo-XML format (I've hit plenty and I'm sure others have too) and you don't see how this relates, we'll just have to agree to disagree.


>> I bet the contractor could've found a way to mis-implement TAP just as poorly as XML.


I suspect that you didn't read the other responses. You can learn the basics of TAP in a couple of minutes. Its flexibility and tolerance make it ridiculously easy to implement.


>> Don't blame the syntax for the implementation.


Of course I will. With so many people implementing XML incorrectly, it's fair to ask why. Simply saying "they didn't read the spec!" ignores the question "why?" If XML is so simple, why do so many people get it wrong? TAP is ridiculously simple and people get it right.


OK, I'll stop addressing your points one by one and I'm sure others are tired of it, but I can't ignore this one:


>> If space is such a concern, compress the XML.


Look at the XML snippet I wrote and the TAP snippet which I had before that. They represent the same data. Verbosity affects legibility. There's no way around that. Compression has nothing to do with that.


On the off chance that you feel I have stopped addressing individual points because I can't, please feel free to email me. The domain is cpan.org and I'm "ovid". There's a lot I didn't cover because this is just a blog entry, not an indictment.

Ovid
2007-04-24 00:16:10

Oops. In case it's not clear, my previous response was @Brianary.

Danno
2007-04-24 13:25:27
TAP looks nice, but what gives it better juice than YAML? (I'll admit I don't know YAML's rules very well and that from what I *have* seen, it can get quite complex).
Brianary
2007-04-25 18:18:55
>> Why are you being rude to a complete stranger?

Calm down, calm down. The problem with rudeness is that it is extremely locale-dependent, and therefore utterly subjective. Imagine a smiley in the first reply.

My problem with that opening is that the "I have not come to praise Caesar..." bit comes off as offensively condescending sometimes, and I've heard it about XML in particular so often, I just had to call it out.

>> ...people focus so much on one or two examples that they seem to miss how they relate to the larger issue.

Perhaps I've misinterpreted this post, but it appears to be a piece of persuasive writing, a logical argument. If this is not the case, just ignore my replies completely.

I guess you may have been starting from the premise that "XML is bad in many cases", rather than attempting to establish that. If that is a premise, and not what you were trying to show, then everything follows fine. It isn't a premise I agree with, but maybe I'm not the intended audience, either.

Otherwise, don't expect to convince anyone based on one example and one non sequitur. Supporting links to previous discussions (particularly any that helped to form your opinion on the subject) would make the post more factual, and less like flamebait.

>> Don't blame the syntax for the implementation.

>> Of course I will. With so many people implementing XML incorrectly, it's fair to ask why. If XML is so simple, why do so many people get it wrong?

So many? Really?

Clearly we code in very different circles. I just haven't seen that many poor implementations of XML. Most of the code I see uses the libraries that come with whatever standard language library for building and parsing, and typical implementations don't even venture into namespaces.

In any case, you didn't make even a cursory enumeration of problem XML implementations, so I guess this is also a premise. Of course, if you just assume all the hard topics, and ignore anyone that suggests re-examination of them, I just don't see the point of posting or discussion at all.

>> Look at the XML snippet I wrote and the TAP snippet which I had before that. They represent the same data. Verbosity affects legibility. There's no way around that. Compression has nothing to do with that.

Granted, but this only applies if you are editing XML in a low-features text editor like MS Notepad or gedit that can't help much. Add syntax highlighting, and I'd wager the effect on readability is pretty negligible. Add an XML editor or specialized UI, and the issue is entirely moot anyway.

Brianary
2007-04-25 18:20:34
Hmm... I hit Post rather than Preview, so that last reply is probably riddled with errors.
Foot Tapper
2007-04-26 22:15:51
TAP looks so clean and easy compared to XML but is limited to 3 fields per line. You could make TAP a bit more flexible by using delimiters other than spaces. Change
ok 1 - input file opened
to something like
ok1input file opened


Limiting an entry to one line is a pig of a restriction. You could wrap all the values in one element like so
ok1
input file opened


That 1..7 looks weird. It would be easier to wrap the result set in an element such as .


Yeah, that pretty much fixes up TAP to be flexible. Maybe just an extra element at the start to indicate the version of TAP and maybe some pointers to the definition of TAP so people know where to look when they find a TAP file.

Mike
2007-04-30 08:02:31
fuckin schools blocking the word proxy and all the other site behind it plz help
Rick Jelliffe
2007-04-30 20:10:10
There are a multitude of line-oriented formats; CSV is of course the major one: you can even get subfields by using different delimiters.


The trouble with line-oriented formats is that text editors often add newlines themselves, when auto-wrapping. You can corrupt the document just by opening it in an editor! And these are very difficult to detect, if you have full lines. These issues and their trade-offs were well-known and discussed in the 70s and early 80s when GML and SGML were developed: the terseness and formatting of line+tab oriented formats is not new. (In fact, SGML allowed line-oriented sub-formats to be declared; you could embed TAP or troff inside angle-bracket containers.)