Rewriting podgrabber, part 2

by Jeremy Jones

In my last post about rewriting my podgrabber utility, I promised to publish the in-progress rewrite in a Bazaar repository. You can branch from here if you're interested. In this post, I'm going to discuss the paradigm I'm following for getting files from a webserver onto a computer, and then onto a portable media device.

In the current version of podgrabber, there is a download manager that takes a URL and saves the file to a particular directory. It has a small amount of extensibility, but it is built in a very clunky way: the code looks at the URL to determine how to download the file. Once the files are on my computer, a single function synchronizes them between my computer and my portable media device.

This approach works, but it doesn't tie the problems together cohesively. It also isn't very extensible. Adding new file sources (such as FTP) would probably involve a lot of cut and paste and an ever-growing download method. And synchronizing downloaded files to anything other than an MP3 player that shows up as a USB disk drive would be quite painful.
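One common way to avoid an ever-growing download method is to register a handler per URL scheme and dispatch on the scheme. The following is a minimal sketch assuming that design. `DownloadManager`, `register`, `download`, and `fetch_with_urllib` are hypothetical names for illustration, not podgrabber's actual code.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


class DownloadManager:
    """Dispatch each download to a handler registered for its URL scheme.

    Hypothetical sketch: class, method names and handler signature are illustrative.
    """

    def __init__(self):
        self._handlers = {}

    def register(self, scheme, handler):
        # handler(url, dest_dir) should save the file and return its path.
        self._handlers[scheme.lower()] = handler

    def download(self, url, dest_dir):
        scheme = urlparse(url).scheme.lower()
        handler = self._handlers.get(scheme)
        if handler is None:
            raise ValueError(f"no handler registered for scheme {scheme!r}")
        return handler(url, dest_dir)


def fetch_with_urllib(url, dest_dir):
    """Save url into dest_dir; urllib.request handles http, https, ftp and file URLs."""
    filename = os.path.basename(urlparse(url).path) or "download"
    dest = os.path.join(dest_dir, filename)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


manager = DownloadManager()
for scheme in ("http", "https", "ftp"):
    manager.register(scheme, fetch_with_urllib)
```

With this shape, a new file source is one `register` call plus one handler function, rather than another branch in a single download method.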

4 Comments

binky
2007-05-24 10:37:42
Reminds me of: http://jakarta.apache.org/commons/vfs/index.html
...and in Ruby, http://rio.rubyforge.org
Doug Hellmann
2007-05-26 04:57:56
Hi, Jeremy,


I'm curious about why you chose to use urllib and parse the RSS feed directly, instead of using Mark Pilgrim's feedparser module. It lets you handle RSS and Atom feeds transparently, and also understands how to tell if the feed content has changed since it was last fetched (if you cache the modified date and etag). Maybe I'm jumping the gun, and you plan to add that optimization at a later phase.


Doug

Jeremy M. Jones
2007-06-07 09:05:53
Hi Doug,


Since I'm only interested in items that have enclosures (which I think RSS and Atom have in common, though I'm not totally positive), it was pretty easy to just parse them out manually. If I were doing anything more than that, I'd probably use Mark's feedparser. I'm still going to look at it eventually and see whether it adds any value to what I'm doing.


Thanks for the post!
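For the enclosure-only case described above, RSS 2.0 and Atom do both carry enclosures, just in different markup: RSS uses an `<enclosure url="...">` element inside each `<item>`, and Atom uses `<link rel="enclosure" href="...">` inside each `<entry>`. Here is a minimal standard-library sketch that pulls out both forms. `enclosure_urls` is a hypothetical helper name, and unlike feedparser it simply raises `xml.etree.ElementTree.ParseError` on a feed that is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace; RSS 2.0 elements have none.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def enclosure_urls(feed_xml):
    """Return enclosure URLs from an RSS 2.0 or Atom feed document (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    urls = []
    # RSS 2.0: <item><enclosure url="..." length="..." type="..."/></item>
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url")
        if url:
            urls.append(url)
    # Atom: <entry><link rel="enclosure" href="..." type="..."/></entry>
    for link in root.iter(ATOM_NS + "link"):
        href = link.get("href")
        if link.get("rel") == "enclosure" and href:
            urls.append(href)
    return urls
```

Items without an enclosure, and Atom links with other `rel` values such as `alternate`, are skipped.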

Doug Hellmann
2007-06-10 18:27:14
The feedparser module makes working with feeds incredibly easy. You don't have to worry about the parsing (which becomes a real pain when you encounter a feed that isn't well-formed) or the format (since it handles both Atom and RSS). If you save the modified date and ETag from your last fetch and pass them back in, feedparser makes a conditional request, so the full feed is only downloaded again when its contents have changed (provided the server supports it). That can speed up processing and reduce network overhead.
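The change detection described here rests on HTTP conditional GETs: the client sends back the `ETag` and `Last-Modified` values from its previous fetch in `If-None-Match` and `If-Modified-Since` headers, and a server that supports them answers `304 Not Modified` with no body when nothing has changed. feedparser handles this through the `etag` and `modified` arguments of `feedparser.parse`. The following sketch shows the same mechanism using only the standard library; `build_conditional_request` and `fetch_if_changed` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, modified=None):
    """Build a GET request carrying validators saved from a previous fetch."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if modified:
        # HTTP-date string saved from the previous Last-Modified header.
        request.add_header("If-Modified-Since", modified)
    return request


def fetch_if_changed(url, etag=None, modified=None):
    """Return (body, etag, modified); body is None if the feed is unchanged."""
    request = build_conditional_request(url, etag, modified)
    try:
        with urllib.request.urlopen(request) as response:
            return (response.read(),
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; keep the old validators.
        if err.code == 304:
            return None, etag, modified
        raise
```

Between runs you would persist the returned ETag and Last-Modified strings (for example, alongside each subscription) and pass them back on the next call; a `None` body means the cached copy of the feed is still current.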


Have a look at http://blog.doughellmann.com/2007/04/pymotw-queue.html and http://www.doughellmann.com/PyMOTW/fetch_podcasts.py for some simplistic examples. I have some more complex code I wrote for http://www.CastSampler.com to take advantage of the timestamp checking, but I haven't cleaned it up for general release yet. I'll see if I can get to that in the next week or so.


Doug