DocBook Elements in the Wild

by Keith Fahlgren

The newly formed DocBook SubCommittee for Publishers is currently researching commonly used DocBook elements to explore whether a subset of DocBook 5.0 would be generally useful. I've been spending a lot of time getting O'Reilly's DocBook 4.4 content into our new Atom Publishing Protocol repository, and decided I'd rather look at the markup actually used in our own content than make up my own (unfounded) opinions.
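A frequency count of this kind can be sketched in a few lines of Python with the standard library's ElementTree. This is a minimal illustration, not the script behind the numbers discussed here; the function names are hypothetical. It assumes each file is already well-formed XML with entities expanded (for example by running it through `xmllint --noent` first), because ElementTree does not read the DocBook 4.4 DTD and would reject references such as `&mdash;`.

```python
import sys
from collections import Counter
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a "{namespace}" prefix so DocBook 5.0 (namespaced) and
    DocBook 4.x (un-namespaced) element names are counted the same way."""
    return tag.rsplit("}", 1)[-1]


def count_elements_in_tree(root: ET.Element) -> Counter:
    """Count every element in the tree, including the root itself."""
    return Counter(local_name(elem.tag) for elem in root.iter())


def count_elements_in_string(xml_text: str) -> Counter:
    return count_elements_in_tree(ET.fromstring(xml_text))


def count_elements_in_files(paths) -> Counter:
    """Sum element counts across several XML files.

    Assumption: entities are already expanded, since ElementTree does not
    load external DTDs and raises ParseError on undefined entities.
    """
    totals = Counter()
    for path in paths:
        totals += count_elements_in_tree(ET.parse(path).getroot())
    return totals


if __name__ == "__main__":
    # Print a simple text histogram, most frequent element first.
    for name, n in count_elements_in_files(sys.argv[1:]).most_common():
        print(f"{n:8d}  {name}")
```

Run over a directory of books (for example `python count.py books/*.xml`), the sorted output is the raw material for a histogram like the one discussed in the comments.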

9 Comments

Michael Day
2007-05-02 00:49:23
It's funny that there is only one <formalpara> element in the histogram: a lonely, unloved element, outshone by its more popular if less adorned relative, <para>. Which book was it used in? :)
Keith Fahlgren
2007-05-02 06:52:44
Michael: <formalpara> certainly isn't something we've used a lot in the past, but the one book that did use it, Unicode Explained, shows the relatively ugly approach we've taken more recently (2007 books produced in DocBook) to marking up the printing history of a book:
<printhistory>
<formalpara>
<title>First Edition</title>
<para>June 2006</para>
</formalpara>
</printhistory>
Keith Fahlgren
2007-05-02 07:07:03
Sean McGrath has written about his finding that element distributions always look like a power-law curve: http://seanmcgrath.blogspot.com/2004_05_23_seanmcgrath_archive.html
John Craft
2007-05-06 07:13:07
In DocBook XML, how do you know how many pages are in a book without rendering out to PDF using XSL-FO? Are pages defined in the original DocBook XML or, if not, how do you determine the number of pages?
Keith Fahlgren
2007-05-06 08:02:35
John: I used the number of pages in the printed book (regardless of whether it was typeset using DocBook or not). We've designed our customizations to the DocBook-XSL stylesheets to mirror our other typesetting systems, so the page counts usually end up being similar.

We don't have any notion of pages in our DocBook markup itself, so the page counts above are only meant to give a general sense of size rather than anything definitive.

malakas
2007-05-06 17:59:28
Looks like another example of Zipf's law in full effect!
Keith Fahlgren
2007-05-07 10:57:10
Here's a followup with some newer content: http://www.oreillynet.com/xml/blog/2007/05/docbook_elements_in_the_wild_a.html
Kurt Cagle
2007-05-19 08:33:39
Interesting about <formalpara>. I actually use that construct quite often for creating titled bullet points:

<listitem>
<formalpara>
<title>Relevance</title>
<para>The concept that information must contain some significance to other information</para>
</formalpara>
</listitem>

What do you use for this kind of construct? I can't imagine it isn't a markup requirement at O'Reilly.

Keith Fahlgren
2007-05-25 08:52:18
"I actually use that construct quite often for creating titled bulleted points"

Kurt: The O'Reilly style for that would be to use a <variablelist> rather than other markup (you can see the huge number of <varlistentry> elements in the graphs above).
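To make the suggestion concrete, here is a rough Python sketch that rewrites Kurt's titled-bullet pattern into the <variablelist> form: each <formalpara> title becomes a <term>, and the paragraph moves into the entry's <listitem>. The helper `formalpara_items_to_variablelist` is hypothetical, not part of O'Reilly's toolchain or the DocBook-XSL stylesheets, and it assumes un-namespaced DocBook 4.x markup in which every item follows the listitem/formalpara/title shape shown above.

```python
import xml.etree.ElementTree as ET


def formalpara_items_to_variablelist(list_elem: ET.Element) -> ET.Element:
    """Hypothetical helper: rewrite each
         <listitem><formalpara><title>T</title><para>P</para></formalpara></listitem>
    as
         <varlistentry><term>T</term><listitem><para>P</para></listitem></varlistentry>
    inside a new <variablelist>."""
    varlist = ET.Element("variablelist")
    for item in list_elem.findall("listitem"):
        formal = item.find("formalpara")
        title = formal.find("title") if formal is not None else None
        if title is None:
            # Refuse to guess rather than silently drop content.
            raise ValueError("expected <listitem><formalpara><title>...")
        entry = ET.SubElement(varlist, "varlistentry")
        # The formalpara title (text plus any inline markup) becomes the term.
        term = ET.SubElement(entry, "term")
        term.text = title.text
        term.extend(list(title))
        # Everything after the title (normally one <para>) becomes the body.
        body = ET.SubElement(entry, "listitem")
        body.extend([child for child in formal if child is not title])
    return varlist


if __name__ == "__main__":
    kurt = (
        "<itemizedlist><listitem><formalpara>"
        "<title>Relevance</title>"
        "<para>The concept that information must contain some significance "
        "to other information</para>"
        "</formalpara></listitem></itemizedlist>"
    )
    result = formalpara_items_to_variablelist(ET.fromstring(kurt))
    print(ET.tostring(result, encoding="unicode"))
```

The design choice mirrors the point above: the title stops being decoration on a paragraph and becomes a first-class <term>, which is exactly the structure <varlistentry> already provides.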