[Sean McGrath:QOTD] Hashtables, Pre-Cooked XML, and You + Recipes For Blazing Fast XML Processing

by M. David Peterson

I don't agree with everything Sean McGrath writes in his latest post; I think there are a lot of really smart people who have developed equally smart ways to handle the variable-width nature of XML without turning to malloc() every time the length of an element or attribute name reaches past a given preset constraint. That said, I can't help but agree with,

Memory-based caches of "cooked" data structures are your friend.


For you .NET developers, here's a pre-written recipe that handles all of the dirty work of determining whether to create a new XmlReader or return the in-memory cached version, based on the ETag generated for the source file (see the Extended Overview below for a deeper understanding of how this works). To use this recipe you need do nothing more than create a new XmlServiceOperationManager when your application starts up, like so,

XmlServiceOperationManager myXmlServiceOperationManager = new XmlServiceOperationManager(new Dictionary<int, XmlReader>());

and then use the GetXmlReader method of the XmlServiceOperationManager, passing in the Uri of the desired XML file (an actual System.Uri object, not the string value of the URI, though I suppose it would be easy enough to create an overload that takes the string value. Another task for another day. ;-)) to get an XmlReader in return, like so,

XmlReader reader = myXmlServiceOperationManager.GetXmlReader(requestUri);
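As an aside, the string overload mentioned above is easy enough to picture. Here's a rough sketch (the Uri-based method is stubbed to a plain XmlReader.Create call so the snippet stands alone; XmlReaderHelpers is a made-up name, not part of the actual code):

```csharp
using System;
using System.Xml;

public static class XmlReaderHelpers
{
    // Stand-in for the Uri-based method from the post (stubbed here
    // to a plain XmlReader.Create call so the sketch is self-contained).
    public static XmlReader GetXmlReader(Uri uri) =>
        XmlReader.Create(uri.ToString());

    // The hypothetical string overload: parse the string into a
    // System.Uri and delegate to the existing Uri-based method.
    public static XmlReader GetXmlReader(string uri) =>
        GetXmlReader(new Uri(uri));
}
```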

That's it! Now you can use your "new" XmlReader however you need, and the next time that file is requested for processing, if it hasn't changed, you save all of the time it would normally take to read the source file and turn it into an XmlReader, which is fairly significant.
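To give you a feel for the shape of the idea before you dig into the full source below, here's a condensed sketch. The internals here are my own assumptions, not the actual SVN source: XmlReaderCache is a made-up name, and the "ETag" is approximated by the file's last-write timestamp.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Condensed sketch of the caching idea: hash of the URI -> cached reader,
// and hash of the URI -> "ETag" (approximated here by a file timestamp).
public class XmlReaderCache
{
    readonly Dictionary<int, XmlReader> readers = new Dictionary<int, XmlReader>();
    readonly Dictionary<int, string> etags = new Dictionary<int, string>();

    public XmlReader GetXmlReader(Uri uri)
    {
        int key = uri.ToString().GetHashCode();
        string etag = ComputeETag(uri);

        // Cache hit: same key, same ETag -- reuse the pre-cooked reader.
        if (etags.TryGetValue(key, out var cached) && cached == etag
            && readers.TryGetValue(key, out var reader))
            return reader;

        // Miss or stale: (re)build the reader and overwrite both entries
        // under the same key, so nothing ever needs to be removed.
        var fresh = XmlReader.Create(uri.ToString());
        readers[key] = fresh;
        etags[key] = etag;
        return fresh;
    }

    // Assumption: a file-based "ETag" derived from the last-write time.
    static string ComputeETag(Uri uri) =>
        File.GetLastWriteTimeUtc(uri.LocalPath).Ticks.ToString();
}
```

One caveat worth keeping in mind: XmlReader is forward-only, so a production version would want to cache something replayable (the raw bytes, say, or an XPathDocument) rather than a reader that's already been consumed.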

Source code and extended explanation inline below. Enjoy!

Oh, and stay tuned for the next installment of this recipe where we learn how adding,

1 part memcached
1 part ETags
1 part GZip encoding

... can turn your lame-a$$, performance-sucking web application into a lean, mean, kick-a$$ performing machine. For a precursor, see Joe Gregorio's AtomPub presentation slides from this past OSCON. I assure you, it's worth every second you spend studying this gem of a resource.


2008-01-12 04:30:12
This code looks very interesting. I have used the XML functions in .NET for years, and I know the value of the ETag header for caching, but I've never seen anything like this.
Something seems wrong, though. I would love to be able to test the code, but I need a little help getting the complete code from SVN, as well as setting up Saxon and the other external libraries.
Who do I have to knock off?
M. David Peterson
2008-01-12 15:57:30

Actually, you don't need Saxon or any other external libraries. That was in there by force of habit. I'll take it out on the SVN copy.

What problems are you having checking out the repository? There are some external dependencies, but none of them should require authentication. Let me know what you're running up against and I'll look into it further.

M. David Peterson
2008-01-12 16:18:53

>> Something seems wrong though.

I think you might be right. I just read through the code above, and it seems I described the process incorrectly. I'll update the post to reflect what's actually taking place: the hash of the URI is used as the key for both tables. Keying both tables on the hash of the URI means a stale entry can simply be overwritten, saving the cost of removing the key. That cost really isn't all that much, but when the entire purpose of the code is to *save* resources, it makes sense to save as much as you possibly can, everywhere you can.
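To make the "shared key, overwrite in place" point concrete, here's a tiny illustration (the key, ETag values, and payload strings are all made up for the demo):

```csharp
using System;
using System.Collections.Generic;

// Because both tables are keyed by the hash of the URI, a stale entry
// is replaced by plain assignment; Dictionary.Remove is never needed.
class SharedKeyDemo
{
    static void Main()
    {
        var etags = new Dictionary<int, string>();
        var payloads = new Dictionary<int, string>();

        int key = "http://example.com/feed.xml".GetHashCode();

        // First request: populate both tables under the same key.
        etags[key] = "v1"; payloads[key] = "cooked-v1";

        // File changed: overwrite in place under the same key.
        etags[key] = "v2"; payloads[key] = "cooked-v2";

        Console.WriteLine(payloads[key]); // -> cooked-v2
    }
}
```

One thing to watch: String.GetHashCode can collide, so a cache that absolutely must not serve the wrong document might key on the URI string itself instead.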

With this in mind, if you see somewhere I can make the code more efficient, please don't hesitate to let me know. Every tick counts! :D