Thank You, HTML Tidy!

by chromatic

I use HTML Tidy in a well-tuned shell alias that cleans up HTML from articles and weblogs before I post them. We use a subset of XHTML on the O'Reilly Network, and this wonderful utility turns poor HTML (especially HTML converted from word processor files) into valid XHTML. From there, it's simple to parse the result with an XML parser and transform it into something useful and clean.
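
As a rough illustration of why well-formed XHTML is so convenient, here is a minimal Python sketch that hands a small XHTML document to a stock XML parser and pulls out a few pieces. The sample document is a hand-written stand-in for tidied output, not real Tidy output, and the `summarize` function and the example.com link are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for cleaned-up XHTML; written by hand, not produced by Tidy.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample post</title></head>
  <body>
    <h1>Sample post</h1>
    <p>First paragraph with a <a href="http://example.com/">link</a>.</p>
    <p>Second paragraph.</p>
  </body>
</html>"""

# XHTML elements live in this namespace, so lookups need a prefix mapping.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def summarize(xhtml_text):
    """Parse well-formed XHTML and return its title, link targets, and paragraph count."""
    root = ET.fromstring(xhtml_text)
    title = root.findtext("x:head/x:title", default="", namespaces=NS)
    links = [a.get("href") for a in root.iterfind(".//x:a", NS) if a.get("href")]
    paragraphs = len(root.findall(".//x:body/x:p", NS))
    return {"title": title, "links": links, "paragraphs": paragraphs}


if __name__ == "__main__":
    print(summarize(XHTML))
```

The same parser would raise an error on typical tag soup, such as unclosed tags or unquoted attributes, which is why the cleanup step has to come first.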

I've even used it on hand-written HTML just to make sure things were correct. It's a great utility I use almost without thinking. Thank you, developers of and contributors to HTML Tidy!


5 Comments

Josh Peters
2007-02-12 10:36:11
HTML Tidy truly is one of the best gifts to the net. In library form it is quite handy indeed for spiders, as you can convert an HTML document into an XHTML document via the proper parameters. From there the whole world of XML tools is available. Quite handy indeed :)
curious
2007-02-12 14:41:17
chromatic, care to share your HTML Tidy settings? Also, anybody care to write HTML::Critic?
chromatic
2007-02-12 14:47:15
@curious, my settings are almost embarrassingly simple: tidy -asxml -c. Yet that gets the job done.
Andy Lester
2007-02-13 08:18:02
There's also HTML::Tidy, the Perl wrapper for the tidy library. We're sort of overhauling it, but it's slow going. Any help would be welcomed.
http://code.google.com/p/html-tidy/
Chris Tyler
2007-02-16 04:01:35
There's a Firefox extension which embeds Tidy into Firefox. It gives you a valid/invalid indication for each page viewed using an icon at the bottom of the Firefox window, and the View Source window shows you the error detail and gives you the option of cleaning up the code to make it valid XHTML (or HTML). The extension can sometimes be a bit of a chore to set up, but it's incredibly useful once it's running.