I have a few thoughts:
One: There's nothing wrong with using REST and XPath in unison. After you've queried the site with REST, you can build a DOM object, perform some XPath magic, and then feed the result set to an XSLT processor. I've recently become a big fan of XPath, as I've started to play with PHP 5's SimpleXML extension, which has integrated XPath query support.
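To sketch what that REST-then-XPath pipeline looks like (in Python rather than PHP, using the standard library's ElementTree, which supports a subset of XPath; the XML payload here is a made-up example standing in for whatever the REST call returns):

```python
import xml.etree.ElementTree as ET

# Hypothetical payload, as might come back from a REST query.
payload = """
<catalog>
  <book id="1"><title>XSLT Cookbook</title><price>39.95</price></book>
  <book id="2"><title>Learning XPath</title><price>24.50</price></book>
</catalog>
"""

# Build the DOM-like tree from the response body.
root = ET.fromstring(payload)

# XPath query: grab every <title> under a <book>.
titles = [t.text for t in root.findall(".//book/title")]
print(titles)  # ['XSLT Cookbook', 'Learning XPath']
```

In PHP 5 the equivalent step is a single `$xml->xpath('//book/title')` call on a SimpleXML object; from there the result set can go straight into an XSLT transform.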
Two: That article implicitly assumes that you have complete control over the input XHTML files. With a published REST interface, there's at least the hint of a promise of a fixed API. If you're screen scraping (regardless of the legalities involved), there's no social contract between you and the site to maintain a working relationship. (That is, they can redesign their data model at any time and potentially break your scripts.) If a site publishes a REST interface, then you know it's unlikely to shift without warning.
Three: For your own site, OTOH, you may want to make your REST interface a front-end for an XPath query of your XHTML documents, but that assumes all those documents actually exist as real files on the hard drive or are stored in an XML database. If you're Amazon, you probably don't have millions of individual HTML files; they're probably springing out of some SQL databases into a template, so I don't think that's really an option here.