Linking Architecture and the U.S. House of Representatives
by Bob DuCharme
Related link: http://thomas.loc.gov/home/xml_help.html
The Library of Congress' Thomas web site (named for a former resident of the town where I live) is now making some new legislation available in XML. The XML points to XSLT stylesheets that format it for viewing, so if you go straight to http://thomas.loc.gov/home/gpoxmlc108/h3701_ih.xml with a browser you'll see centering, bolding, and even links. (Do a View Source to see the markup.) Not every bill on the site is also available in XML, but I found one directory with links to over 200 XML documents. Their XML Display: Help page has a bit more background, and http://thomas.loc.gov/dtd/ has links to their DTDs.
The XML document mentioned above includes a working link, and I was very pleased when View Source showed me that it wasn't an HTML a/@href one. The comments in the DTD that the document references describe an interesting evolution of its linking architecture: there was an attempt, later abandoned, to keep it in line with XLink, and I was tickled to see the phrase "architectural form" come up in one comment. Ultimately, they modeled the links around the relationships between their particular document types instead of trying to shoehorn these relationships into some wider linking standard. The XSLT stylesheet that prepares a document for web delivery then turns those links into a/@href links. The following shows the attribute list declaration for one of the DTD's linking elements, external-xref:
<!ATTLIST external-xref
    legal-doc (usc | public-law | statute-at-large |
               bill | act | executive-order |
               regulation | senate-rule | treaty-ust |
               treaty-tias | usc-appendix | usc-act |
               usc-chapter | usc-subtitle) #IMPLIED
    parsable-cite CDATA #IMPLIED>
People who conflate linking and hypertext forget that the former is about relationships between data and the latter is about the presentation of those relationships. The markup community learned long ago that separation of content structure from content presentation is a Good Thing; this realization was actually a key driver for the growth of this community. The notion that content relationships and the user interface to express those relationships are also distinct (or rather, that keeping them distinct can offer the same advantages as keeping content structure and content presentation separate) is not quite as widespread, so I was happy to see a great example used in a publicly accessible tax-dollars-at-work project. The links describe the relationships in terms of the document types themselves, not in terms of the UI for expressing those relationships. The markup says that a House resolution has an external reference to a legal document, which may be part of the United States Code, a public law, a statute, etc.; a stylesheet then converts this relationship to the more broadly used markup necessary to display it as hypertext in a web browser. If another delivery medium uses different markup to describe hypertext links, another stylesheet can convert the same House resolution XML to the appropriate markup for that medium. It's a great model.
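To make the idea concrete, here is a minimal Python sketch of the same separation. It is not the stylesheet that Thomas actually uses. The element and attribute names (external-xref, legal-doc, parsable-cite) come from the DTD, but everything else is my own assumption for illustration: the sample fragment, the citation value "usc/26/501", the example.org resolver URL, and the two function names. One function turns the relationship into an HTML a/@href link; the other renders the same source for a medium with no hyperlinks at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element and attribute names are from the DTD,
# but the surrounding markup and the citation value are made up.
SAMPLE = (
    '<text>as defined in '
    '<external-xref legal-doc="usc" parsable-cite="usc/26/501">'
    'section 501 of title 26</external-xref>, United States Code.</text>'
)

# Hypothetical base URL for a citation resolver (an assumption, not a real service).
RESOLVER = "https://example.org/cite/"


def to_html_links(root):
    """Rewrite each external-xref in place as an HTML a element."""
    for xref in list(root.iter("external-xref")):
        cite = xref.attrib.pop("parsable-cite", None)
        doc_type = xref.attrib.pop("legal-doc", None)
        if cite is None:
            # Nothing to point at, so keep the text without a link.
            xref.tag = "span"
        else:
            xref.tag = "a"
            xref.set("href", RESOLVER + cite)
        if doc_type is not None:
            # Carry the document type along so the UI can style by it.
            xref.set("class", doc_type)
    return root


def to_plain_text(elem):
    """Render the same source for a medium without hyperlinks."""
    out = [elem.text or ""]
    for child in elem:
        out.append(to_plain_text(child))
        if child.tag == "external-xref" and child.get("parsable-cite"):
            # Show the citation inline instead of hiding it in a link.
            out.append(" [" + child.get("parsable-cite") + "]")
        out.append(child.tail or "")
    return "".join(out)


if __name__ == "__main__":
    print(to_plain_text(ET.fromstring(SAMPLE)))
    html_tree = to_html_links(ET.fromstring(SAMPLE))
    print(ET.tostring(html_tree, encoding="unicode"))
```

Both functions read the same source markup, and neither one requires the source to know anything about the other: the relationship lives in the document, and each delivery format decides how to express it.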
Should they have done it differently?
Google trick in this vein....
When poking around government sites for XML resources, Google's filetype searching can be fun: