Thank You, HTML Tidy!
I use HTML Tidy in a well-tuned shell alias that cleans up HTML from articles and weblogs before I post them. We use a subset of XHTML on the O'Reilly Network, and this wonderful utility turns poor HTML (especially HTML converted from word processor files) into valid XHTML. From there, it's simple to parse the result with an XML parser and transform it into something useful and clean.
I've even used it on hand-written HTML just to make sure things were correct. It's a great utility I use almost without thinking. Thank you, developers of and contributors to HTML Tidy!
|HTML Tidy truly is one of the best gifts to the net. In library form it is quite handy indeed for spiders, as you can convert an HTML document into an XHTML document via the proper parameters. From there the whole world of XML tools is available. Quite handy indeed :)|
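For anyone who wants to try the HTML-to-XHTML-to-XML-tools route described above, here is a minimal sketch in Python. It assumes the `tidy` command-line program is installed and on your `PATH`; the function names `tidy_to_xhtml` and `extract_links` are made up for illustration, and the flags are just one plausible choice, not anyone's actual settings from this thread.

```python
import subprocess
import xml.etree.ElementTree as ET

# Elements in an XHTML document live in this namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def tidy_to_xhtml(html):
    """Pipe HTML through the `tidy` CLI and return XHTML (hypothetical helper).

    -asxhtml  convert the input to well-formed XHTML
    -numeric  emit numeric entities, so an XML parser does not need the DTD
    -quiet    suppress nonessential output
    -utf8     treat input and output as UTF-8
    """
    result = subprocess.run(
        ["tidy", "-asxhtml", "-numeric", "-quiet", "-utf8"],
        input=html.encode("utf-8"),
        capture_output=True,
    )
    # tidy exits with 0 when clean, 1 if there were warnings, 2 on errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout.decode("utf-8")


def extract_links(xhtml):
    """Parse XHTML with a plain XML parser and return every href on an <a>."""
    root = ET.fromstring(xhtml.encode("utf-8"))
    return [a.get("href") for a in root.iter(XHTML_NS + "a") if a.get("href")]
```

A spider would call something like `extract_links(tidy_to_xhtml(page_source))` on each fetched page. The point of the detour through Tidy is that the second function needs no HTML-specific parsing at all; any XML tool would do in its place.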
|chromatic - care to share your HTML Tidy settings? Also, anybody care to write HTML::Critic?|
@curious, my settings are almost embarrassingly simple:
|There's also HTML::Tidy, the Perl wrapper for the tidy library. We're sort of overhauling it, but it's slow going. Any help would be welcomed.|
|There's a Firefox extension which embeds Tidy into Firefox. It gives you a valid/invalid indication for each page viewed using an icon at the bottom of the Firefox window, and the View Source window shows you the error detail and gives you the option of cleaning up the code to make it valid XHTML (or HTML). The extension can sometimes be a bit of a chore to set up, but it's incredibly useful once it's running.|