Going DITA

by Constantine Hondros

It's hard to attend a content management or publishing technology conference these days without encountering a presentation on DITA, the Darwin Information Typing Architecture. For the uninitiated, DITA is an XML architecture for authoring and publishing topic-based content, typically technical documentation. The brainchild of IBM, where it is used internally for many documentation projects, DITA is now an open standard under the aegis of OASIS. An open-source reference implementation, the DITA Open Toolkit, is available from SourceForge.

So what's all the fuss about?

While the benefits of single-source XML publishing are well known (content reuse and multi-channel publishing, for example), implementing it can be an absolute battle for an organisation. A large project starting from scratch needs major upfront development: extensive information analysis, the development of company-specific DTDs, and significant programming to create publishing processes. Then there is the effort of migrating legacy content to the new formats, installing new editing software, training users to think in XML, and possibly paying for a content management system.

DITA offers help at two of these stages, by providing a set of thoughtfully designed, extendable DTDs and the tools to publish conforming documents to multiple channels. As the current surge in interest attests, it can make the process of adopting single-source publishing an easier pill to swallow.

Much thought has been put into the development of the core topics that comprise DITA, and the result is a set of semantically rich DTDs and schemas. These encapsulate three types of topic required in the majority of technical documentation — conceptual, task-oriented, and reference information.

On the publishing side, the toolkit provides a set of XSLT stylesheets, driven by an Ant pipeline, that transform conforming XML documents into HTML, PDF, JavaHelp, EclipseHelp and more.

However, the real beauty of DITA, and what makes it an XML architecture rather than an application, is the ability to specialise core topic types into new document classes that more closely encapsulate a given information domain. DITA prescribes a method for doing this that lets your new document classes retain compatibility with existing XSLT transforms.

Specialisation relies on two well-understood XML mechanisms: external entities and attributes. Your new document class pulls in the element definitions of a core topic type and, where necessary, overrides certain elements with elements of your own design. Crucially, wherever a new element overrides an existing one, you provide a mapping back to the overridden element using an attribute called class.

This class attribute is generic to all DITA elements, and is critical for the selection of template rules during DITA XSLT transforms. By providing a mapping, you ensure that your new element matches the same XSLT templates as the element it overrides.
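As a rough illustration of that matching rule, here is a minimal Python sketch. It is not the toolkit's code (which is XSLT); it only mimics the kind of test a DITA stylesheet applies, along the lines of contains(@class, ' topic/li '). The sample fragment and the helper name matches are assumptions made for the example; the class values follow the pattern DITA uses for the task specialisation.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: <step> specialises the core <li> element,
# and <steps> specialises the core <ol> element.
DOC = """
<steps class="- topic/ol task/steps ">
  <step class="- topic/li task/step ">Unplug the device.</step>
  <step class="- topic/li task/step ">Remove the cover.</step>
</steps>
"""


def matches(element, ancestor):
    """Return True if `ancestor` (e.g. 'topic/li') is listed in the class attribute.

    The token is padded with spaces, as in the XPath test
    contains(@class, ' topic/li '), so 'topic/li' cannot match 'topic/link'.
    """
    return f" {ancestor} " in element.get("class", "")


root = ET.fromstring(DOC)

# A rule written for the core <li> element also selects every <step>,
# because each <step> declares topic/li in its class attribute.
list_items = [el.tag for el in root.iter() if matches(el, "topic/li")]
print(list_items)
```

The point of the sketch is that the rule never mentions the element name step: it selects on ancestry alone, which is why a newly specialised element can be picked up by a template written before it existed.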

This is seriously useful stuff. Using a few simple rules of XML I can subclass a core DITA DTD, creating an entirely new class of document that encapsulates my own information domain. But this new document class is born with multi-channel publishing capabilities already in place.

There are exciting possibilities for information interchange if, as looks likely, a significant number of organisations start basing their information designs on DITA. All content based on a DITA specialisation is theoretically interchangeable, because any specialised topic can be generalised back into one of the core DITA topics. (The toolkit contains XSLT transformations to perform this backward mapping. Again, the class attribute, which declares an element's path of derivation, is critical.)
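Generalisation can be pictured with a similar toy sketch in Python. This is an assumption-laden illustration rather than the toolkit's XSLT: the function name generalise and the sample fragment are invented for the example, and it simply renames each element to the most general type listed in its class attribute.

```python
import xml.etree.ElementTree as ET


def generalise(element):
    """Rename each element, in place, to the most general type in its class attribute."""
    for el in element.iter():
        tokens = el.get("class", "").split()
        # tokens[0] is the '-' or '+' prefix; tokens[1] is the base type,
        # written as module/element, e.g. 'topic/li'.
        if len(tokens) >= 2:
            el.tag = tokens[1].split("/")[1]
    return element


# Illustrative specialised fragment (class values follow the DITA task pattern).
specialised = ET.fromstring(
    '<steps class="- topic/ol task/steps ">'
    '<step class="- topic/li task/step ">Unplug the device.</step>'
    '</steps>'
)

general = generalise(specialised)
print(ET.tostring(general, encoding="unicode"))
```

In this sketch the class attributes are left untouched, so the generalised elements still record their path of derivation even though their names now come from the core topic vocabulary.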

Although this results in some loss of granularity, it does mean that it's an awful lot easier to reuse another organisation's content for your own purposes if it's based on a DITA derivation than if it's based on a bespoke DTD. For example, integrating third-party XML data into your site or CMS becomes less daunting if it shares the majority of its element vocabulary with your own data. This is a great development: after all, effective information interchange is part of the promise of XML.

The DITA toolkit, in the hands of an XML-savvy documentation group, can significantly lower the barrier to adopting single-source XML publishing. A team can create its own topic specialisations, basing them on the rich DITA core topics, then hit the ground running with basic publishing transformations already taken care of by the open-source toolkit.

Now for the disclaimer: I'm neither an evangelist for DITA nor a contributor to it. I am, however, currently involved in a project to migrate a mission-critical document corpus to the DITA architecture. There will be issues along the way, and I will be blogging at regular pit-stops as the project progresses.


2005-11-14 08:37:55
Great Intro Article on DITA

Thanks for the great article on DITA. Please keep me informed of your "lessons learned" as you move forward with your DITA project.


Scott Abel

2005-11-17 20:18:06
Please keep us informed
Each time you post a new episode in your DITA saga, could you send a post to dita-users on Yahoo! Groups, please?
We can all learn from tales from the trenches.