Stop making those gigabyte XML files, already
by Uche Ogbuji
I've been hollering this for years now (softly counseling in the case of my clients), and I'm glad to hear others giving the same advice. As no less a sage than Mike Kay says:
"I wonder whether [creating huge XML files] is a wise way of using XML. Even with XML databases, most databases are optimized to handle large numbers of small/medium documents rather than a single gigantic one. I don't think that using an XML document as a replacement for a database is a particularly good idea. It's not the job it was designed for."
Yes, folks. XML is not designed to be a monolithic database instance implementation. If you're dealing with gigabyte XML files, I can almost guarantee your design is broken somewhere. Between modern file systems and modern archive formats and tools, there is no reason not to decompose XML into reasonable chunks.
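For anyone already stuck with one of these monsters, here is a minimal sketch of one way to do that decomposition, using the streaming parser in Python's standard library. It assumes the big file is a flat list of record elements sitting directly under the root. The names split_xml, record_tag, per_chunk and wrapper_tag are made up for illustration, and a real document with namespaces, meaningful root attributes or mixed content would need more care.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_xml(source, out_dir, record_tag, per_chunk=1000, wrapper_tag="chunk"):
    """Stream `source` and write its top-level `record_tag` elements out in groups.

    Assumption: records are direct children of the root element. For
    namespaced documents, `record_tag` must be given as "{namespace-uri}name".
    Returns the list of chunk files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    batch = []

    def flush():
        # Wrap the pending records in a new root element and write one file.
        if not batch:
            return
        wrapper = ET.Element(wrapper_tag)
        wrapper.extend(batch)
        path = out_dir / f"chunk-{len(written):05d}.xml"
        ET.ElementTree(wrapper).write(str(path), encoding="utf-8", xml_declaration=True)
        written.append(path)
        batch.clear()

    context = ET.iterparse(str(source), events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    depth = 0  # nesting depth relative to the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 0 and elem.tag == record_tag:
            batch.append(elem)
            if len(batch) >= per_chunk:
                flush()
                root.clear()  # drop records already written so memory stays flat
    flush()  # write whatever is left over
    return written
```

Something like split_xml("huge.xml", "chunks", "record") would leave a directory of modest files that any ordinary archive tool can bundle up again.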
Update: for a bonus, see Kay's argument against some overcooked RDBMS dogma. I strongly agree with him here, as well, even though I'd guess Fabian Pascal and gang are still looking for scalps of such heretics.
The problem is that linking isn't intuitive...
...to anybody used to the HTML model of linking, which was so simple. There may now be a standard means of referencing another document (XLink), but that standard isn't easily understood, and it has the added hassle that people writing otherwise simple XML have to deal with the complexity of other namespaces and schemas. In other words, their own business knowledge (what does MY XML file represent?) gets all mixed up with the technical knowledge needed to process a linked document correctly.
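To make the concern concrete, here is a rough sketch of what following a simple XLink reference involves in Python. The XLink namespace URI and the xlink:href attribute come from the W3C specification. The function name linked_documents and the idea of an index file whose elements point at chunk files are assumptions for illustration only, and the sketch handles nothing beyond plain relative file paths.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# The namespace every XLink attribute lives in; this is the part the
# document author has to know about on top of their own vocabulary.
XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = f"{{{XLINK_NS}}}href"


def linked_documents(index_path):
    """Yield (href, root element) for each element in the index with xlink:href."""
    index_path = Path(index_path)
    index_root = ET.parse(str(index_path)).getroot()
    for elem in index_root.iter():
        href = elem.get(XLINK_HREF)
        if href is None:
            continue
        # Assumption: hrefs are plain relative file paths, with no fragment
        # identifiers and no remote URLs.
        target = index_path.parent / href
        yield href, ET.parse(str(target)).getroot()
```

The index file itself would carry something like xmlns:xlink="http://www.w3.org/1999/xlink" on its root and xlink:type="simple" xlink:href="part1.xml" on each linking element, which is exactly the extra namespace baggage described above.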