PyYAML Mini Review

by Jeremy Jones

I have looked at several podcast grabber applications and have been unhappy with each of them in one way or another, so I decided to roll my own. I've been storing the configuration in XML, but wanted a format that 1) is more human-readable and 2) provides a good way to serialize and deserialize data. I decided to look at YAML. Yes, I know that I can serialize and deserialize XML, and even pretty-print it, but that rather defeats goal number 1.

According to the YAML website, YAML is a "straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python." I found a Python library for working with YAML called PyYAML. It installed easily with "easy_install" and appears to be pretty easy to work with.

Suppose I have a YAML file, podcasts.yaml, that looks like this:

- db_file: /home/jmjones/podcasts/oreilly.db
  description: OReilly Future
  download_dir: /home/jmjones/podcasts/oreillyfuture/
  mode: OK
- db_file: /home/jmjones/podcasts/change.db
  description: Accelerating Change
  download_dir: /home/jmjones/podcasts/change/
  mode: OK

Here's all it takes to deserialize it:

In [1]: import yaml

In [2]: yaml.load(open("podcasts.yaml", "r"))
[{'db_file': '/home/jmjones/podcasts/oreilly.db',
  'description': 'OReilly Future',
  'download_dir': '/home/jmjones/podcasts/oreillyfuture/',
  'mode': 'OK'},
 {'db_file': '/home/jmjones/podcasts/change.db',
  'description': 'Accelerating Change',
  'download_dir': '/home/jmjones/podcasts/change/',
  'mode': 'OK'}]
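
A note for anyone trying this with a current PyYAML: as far as I know, recent releases require an explicit Loader argument to yaml.load, and yaml.safe_load is the usual choice for plain configuration data like this. Here is a minimal sketch, assuming Python 3 and an inline copy of the file so it runs without podcasts.yaml on disk (the PODCASTS_YAML name is mine, purely for illustration):

```python
import yaml  # PyYAML

# Inline copy of the podcast configuration (illustrative stand-in for podcasts.yaml).
PODCASTS_YAML = """\
- db_file: /home/jmjones/podcasts/oreilly.db
  description: OReilly Future
  download_dir: /home/jmjones/podcasts/oreillyfuture/
  mode: OK
- db_file: /home/jmjones/podcasts/change.db
  description: Accelerating Change
  download_dir: /home/jmjones/podcasts/change/
  mode: OK
"""

# safe_load only builds plain Python types (lists, dicts, strings, numbers),
# which is all a configuration file like this needs.
podcasts = yaml.safe_load(PODCASTS_YAML)

# Each list item becomes a dict keyed by the field names in the file.
for podcast in podcasts:
    print(podcast["description"], "->", podcast["download_dir"])
```

To read from the real file instead, passing an open file object to yaml.safe_load should work the same way.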

And given that same data structure, here's all that's required to dump it back out as YAML:

In [5]: y = yaml.load(open("podcasts.yaml", "r"))

In [6]: print yaml.dump(y, default_flow_style=False)
- db_file: /home/jmjones/podcasts/oreilly.db
  download_dir: /home/jmjones/podcasts/oreillyfuture/
  mode: OK
  description: OReilly Future
- db_file: /home/jmjones/podcasts/change.db
  mode: OK
  download_dir: /home/jmjones/podcasts/change/
  description: Accelerating Change
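
The whole round trip can be sketched in one place. This assumes Python 3 and PyYAML 5.1 or later, which is where I believe the sort_keys argument became available; the podcasts variable is just illustrative data mirroring one entry of the file above.

```python
import yaml  # PyYAML

# Illustrative data: one entry shaped like the podcast configuration.
podcasts = [
    {
        "db_file": "/home/jmjones/podcasts/oreilly.db",
        "description": "OReilly Future",
        "download_dir": "/home/jmjones/podcasts/oreillyfuture/",
        "mode": "OK",
    },
]

# default_flow_style=False asks for block style (one key per line);
# sort_keys=False keeps the keys in insertion order instead of sorting them.
text = yaml.safe_dump(podcasts, default_flow_style=False, sort_keys=False)
print(text)

# Loading the dumped text should give back an equal data structure.
round_tripped = yaml.safe_load(text)
print(round_tripped == podcasts)
```

With sort_keys=False the keys should come back out in the order they went in, which makes the dumped file easier to compare against the one that was loaded.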

If you don't pass yaml.dump the "default_flow_style=False" keyword argument, the output is not nearly as pretty:

In [4]: print yaml.dump(y)
- {db_file: /home/jmjones/podcasts/oreilly.db, download_dir: /home/jmjones/podcasts/oreillyfuture/,
description: OReilly Future, mode: OK}
- {description: Accelerating Change, db_file: /home/jmjones/podcasts/change.db, mode: OK,
download_dir: /home/jmjones/podcasts/change/}

That flow style looks more like a plain Python dictionary. When I get a chance, I'm going to rework my podgrabber to use YAML.


2006-06-23 12:05:38
How does it compare with JSON?
Jeremy Jones
2006-06-23 14:27:57
JSON is a serialization format based on JavaScript syntax. It would probably have worked fine, but I wanted something more human-readable. Here's a dump of "y" using simplejson for comparison:

In [16]: simplejson.dumps(y)
Out[16]: '[{"download_dir":"/home/jmjones/podcasts/oreillyfuture/", "db_file":"/home/jmjones/podcasts/oreilly.db", "description":"OReilly Future", "mode":"OK"}, {"download_dir":"/home/jmjones/podcasts/change/", "db_file":"/home/jmjones/podcasts/change.db", "description":"Accelerating Change", "mode":"OK"}]'
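
To be fair to JSON, the one-line dump above is its most compact form. A rough sketch of the indented alternative, assuming Python 3 and the standard-library json module (whose dumps function takes an indent argument) in place of simplejson; the podcasts variable is illustrative data shaped like "y":

```python
import json

# Illustrative data: one entry shaped like the podcast configuration.
podcasts = [
    {
        "db_file": "/home/jmjones/podcasts/oreilly.db",
        "description": "OReilly Future",
        "download_dir": "/home/jmjones/podcasts/oreillyfuture/",
        "mode": "OK",
    },
]

# Default output: everything on a single line.
compact = json.dumps(podcasts)

# indent=2 spreads the structure over several lines, one key per line.
pretty = json.dumps(podcasts, indent=2)
print(pretty)
```

Even indented, the JSON keeps its quotes, braces, and brackets, which is the part I find less pleasant to edit by hand than the YAML version.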