Creating Sherlock Channels, Part 2

by Harold Martin
04/01/2003

Editor's note -- In part one, Harold Martin showed you how to get started creating Sherlock channels. Now it's time to dig into the code...

Getting to the Code

Just as you spend time thinking about your interface before you make it, you should also spend time thinking about your code, what you want it to do, and how you want it to react to the UI. There is a tremendous difference between code that just works and code that is fun to work on.

In Project Builder (PB), open Channel/Channel.xml. This is the code that powers your channel. You can delete the green text near the top (it just contains the default Apple copyright information). Next you'll see the <initialize> tag. It contains the code that runs when Sherlock first loads your channel, so this is where you'll put any code that needs to adjust the channel's behavior at load time. The code that is there now is just what we need, so we'll move on. Next you'll see <triggers>. This is where most of your code will go. Everything inside here will be a tag of the form:

<trigger path="path" language="XQuery | JavaScript">
code here
</trigger>

Getting and Displaying the Latest Hints

The first thing we want to do when the channel starts is to load the newest hints and display them in the table. There are two ways we can go about getting these:

  • Parse the main page's HTML
  • See if the site has an RSS file, which is much easier to parse

Thankfully, a search shows that the site does have an RSS file (at http://www.macosxhints.com/backend/geeklog.rdf) that we can use. Don't worry if you wanted to parse HTML; we'll do that in a little while. Since we want to load the newest hints when the channel starts up, we'll put the code for parsing the RSS file in the Internet.didInstall trigger, which executes right after <initialize>. Because we use XQuery for parsing, change the language attribute of this trigger from JavaScript to XQuery. The first step in parsing the file is to get it from the Web. To do that, we'll put this code inside the Internet.didInstall trigger:

let $httpRequest := http-request("http://www.macosxhints.com/backend/geeklog.rdf")
let $rss := http-request-value($httpRequest, "DATA")

http-request sets up the request. It can take additional options, which are documented in the Apple Sherlock reference, but we won't need them here. http-request-value then retrieves a value from the request; asking for "DATA" returns the data at the URL, which in this case is the contents of the RSS file.

So now that we have the RSS data, we have to parse it. To do that, we turn to the next major tool in our toolbox, the XPath Finder channel included with the Apple Development Channels.

What we are trying to do is find the pattern formed by the hints' names and their links, so that we can present them to the user. Finding this in a page will often take a bit of work, and if you don't know HTML/XHTML you're certainly at a disadvantage here (you should probably look at HTML & XHTML: The Definitive Guide, 5th Ed.). The XPath Finder channel looks through the nested tags in the document it is given and presents them to you in a column view. You navigate through the columns until you find the pattern of names/URLs (and any other information that you might need in a different channel) you are looking for, and the channel shows the XPath for the selected item in the text box along the bottom.

Enter http://www.macosxhints.com/backend/geeklog.rdf in the field where http://www.apple.com is and uncheck Render in the lower right corner. We uncheck Render because it is designed to help when parsing HTML, which we're not doing here. Explore a bit and see if you can find the pattern of names/URLs. In the first two columns there's only one choice, so we choose them. In the third column, however, we see quite a few elements. Searching through this column, we find the pattern that we're looking for. Every hint is inside an <item>, and inside each of those is a <title> and a <link>. This code extracts the title and link from each item:

let $goods := for $item in $rss/rss/channel/item
return dictionary(
        ("description", $item/title/text()/convert-html(.)),
        ("doubleClickURL", $item/link/text()/convert-html(.))
)

let $goods := for $item in $rss/rss/channel/item gets each item out of $rss and stores it in the variable $item. It then extracts the title out of each $item and stores it in description (which is the default identifier of the name column), and extracts the link and stores it in doubleClickURL which, as we'll see later, is a special key when used with a table. The /text()/convert-html(.) at the end of each path extracts the text from the element and then converts it to a format Sherlock can understand.

A dictionary is a special type of data structure that you access via keys. In the example above, the keys are description and doubleClickURL. Every dictionary is of the form dictionary(("key", "value"), ("key", "value")). An important thing to remember is that a comma should come after every key/value pair in the dictionary except the last one. Each dictionary that is returned is stored as a row in $goods. Now to put $goods in the table, we say:
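
For readers more at home outside Sherlock's XQuery dialect, the loop above can be pictured roughly as follows. This is only a Python analogy, not Sherlock code: the miniature sample feed, the build_rows function name, and the use of xml.etree.ElementTree are all assumptions made for illustration, and convert-html has no counterpart here.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature feed with the same rss/channel/item layout
# that the XPath Finder channel revealed.
SAMPLE_RSS = """<rss>
  <channel>
    <item>
      <title>First hint</title>
      <link>http://example.com/hint1</link>
    </item>
    <item>
      <title>Second hint</title>
      <link>http://example.com/hint2</link>
    </item>
  </channel>
</rss>"""


def build_rows(rss_text):
    """Return one dict per <item>, keyed like the Sherlock dictionaries."""
    root = ET.fromstring(rss_text)  # root is the <rss> element
    rows = []
    for item in root.findall("channel/item"):
        rows.append({
            "description": item.findtext("title"),
            "doubleClickURL": item.findtext("link"),
        })
    return rows


rows = build_rows(SAMPLE_RSS)
for row in rows:
    print(row["description"], "->", row["doubleClickURL"])
```

Each dict plays the part of one row in $goods: one entry per item, with the same two keys the table expects.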

return dictionary(
    ("Internet.SearchResultsTable.selectedRows", null()),
    ("Internet.DetailHTMLView.HTMLData", ""),
    ("Internet.SearchResultsTable.dataValue", $goods)
)

This dictionary is returned at the end of the trigger. Because it is the trigger's own return value, rather than a value used inside another statement, its keys are treated as paths. First we make sure that no row is selected, then that the HTMLView is cleared, then we give $goods to the results table. When a list of dictionaries is returned to a table's dataValue, each dictionary becomes a row in the table and each dictionary key maps to a cell in that row (doubleClickURL maps to a hidden cell).

You can now open up your channel in Sherlock (be sure to reload it if Sherlock is already open) and see a list of the newest hints. You can click on a hint to watch it load in the HTMLView or double-click one to open it in your browser.

Displaying the Selected Hint

One thing that we'd really like to have to set our channel apart from a regular browser is to display only the hint in the HTMLView and not the whole page. To do that, we first need to find out what the path is for showing the page. The trigger path is Internet.SearchResultsTable. Right now, it's just grabbing the URL from the selected row's doubleClickURL, but we want it to take the URL, use XPath to extract the hint itself and put that in the HTMLView. Once again, we start in the XPath Finder channel.

Double-click one of the hints to open it in your browser and copy the URL to the XPath Finder channel. Since we're parsing HTML, we'll want to leave Render on. You can see several choices in the first column, so it'll take some searching to find the right path. You may wonder why you can select tags like <img>, <br>, and <hr>. Remember that XPath was designed for parsing XML, and in XML every tag must be explicitly closed; otherwise subsequent elements show up as children of that tag. We find that the path that will extract the hint from a page is: /img/table/tr/td[1]/table/tr/td/table/tr[4]/td[2]/span

In the Internet.SearchResultsTable trigger, we'll first want to delete its return. You might notice that it doesn't return a dictionary. This is possible because output="Internet.DetailHTMLView.url" is specified in the trigger tag itself. The problem with this, of course, is that we can't use it to return to more than one path. But since we only need to modify the HTMLView, we can just change Internet.DetailHTMLView.url to Internet.DetailHTMLView.htmlData. We also need to get data into the trigger, which we can do via the input attribute. You can specify multiple comma-separated variable assignments of the form variable=path, but we won't need to change any of the inputs. To get the page and extract the hint, we'll add:

let $httpPageRequest := http-request($selectedItem/doubleClickURL)
let $page := http-request-value($httpPageRequest, "DATA")
let $hint :=
  data($page/img/table/tr/td[1]/table/tr/td/table/tr[4]/td[2]/span)
return $hint

The first two lines should look familiar from earlier code. The third line (which should appear as one line, but is broken into two here for the sake of space) extracts the hint from the page, but since we don't need to repeatedly find something, we don't need any for loops. We then return the hint, and since it's not a dictionary, Sherlock uses the output attribute of the trigger (make sure to have that set to Internet.DetailHTMLView.htmlData). Now open up the channel in Sherlock (this is the last time I'm going to remind you to refresh it!), and you can click on one of the hints, wait a few seconds for it to load, and you will see the hint in the HTMLView.
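
The bracketed numbers in the path, such as td[1] and tr[4], pick a child element by position, counting from 1. As a small illustration outside Sherlock, here is a Python sketch that uses the standard library's limited XPath support on a made-up, well-formed fragment. The fragment and the variable names are assumptions, and the markup is far simpler than the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a page; the real markup is far deeper.
PAGE = """<table>
  <tr><td>header</td></tr>
  <tr>
    <td>sidebar</td>
    <td><span>The hint text</span></td>
  </tr>
</table>"""

root = ET.fromstring(PAGE)  # root is the <table> element

# tr[2] selects the second <tr> child; td[2] the second <td> inside it.
hint = root.findtext("tr[2]/td[2]/span")
print(hint)
```

Because the path depends on element positions, a layout change on the site would break it, so it is worth rechecking the path in the XPath Finder channel if the hint stops appearing.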

Searching

The last major piece we want to add is the ability to search MacOSXHints.com. The default channel's structure is to call Internet.SearchButton.action which clears the table, starts the network arrows spinning, and then calls DATA.action.performSearch to actually get and display the results. This is a good structure (for reasons you'll see in a moment) and we'll use it.

Start by deleting all the code in DATA.action.performSearch. We now need to find a search page to parse. This URL is from a search from the front page of the site for "sherlock": http://www.macosxhints.com/search.php?query=sherlock&type=stories&mode=search

We could jump right in to parse this, but there's a problem. We want the user to be able to search for anything, not just "sherlock". We also need to properly escape any space characters (with a "%20"), and we want to make sure all URLs are absolute, since the search might give us relative URLs. We can accomplish all that with these three lines:

let $base := http-request("http://www.macosxhints.com/")
let $httpSearchRequest :=
  http-request(string-combine(("http://www.macosxhints.com/search.php?
  type=stories&mode=search&query=
  ",string-combine(string-separate($query, " "),"%20")), ""))
let $htmlSearch := http-request-value($httpSearchRequest, "DATA")

(Line 2 has been broken into four lines and indented here, but should appear all on one line.)

The first line specifies the base URL to use later on when we're parsing the links. The second line, read from the innermost function outward, takes the query (the variable supplied through the trigger's input attribute), splits it using " " as the separator, rejoins the pieces with "%20", appends the result to the rest of the URL, and then readies the http-request. The third line should look familiar by now.
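
The split-and-rejoin step on the second line is easy to try outside Sherlock. The following Python sketch mirrors it; the function name build_search_url and the SEARCH_PREFIX constant are made up for this illustration. Note that this approach escapes only spaces. Characters such as & would need a fuller encoder (in Python, urllib.parse.quote), which goes beyond what the channel code does.

```python
from urllib.parse import quote

# Hypothetical constant holding the fixed part of the search URL.
SEARCH_PREFIX = (
    "http://www.macosxhints.com/search.php"
    "?type=stories&mode=search&query="
)


def build_search_url(query):
    """Mirror string-separate / string-combine: split on spaces, rejoin with %20."""
    escaped = "%20".join(query.split(" "))
    return SEARCH_PREFIX + escaped


url = build_search_url("sherlock channel")
print(url)

# For a query made only of letters and spaces, the standard-library
# encoder produces the same escaping.
assert "%20".join("sherlock channel".split(" ")) == quote("sherlock channel")
```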

Copyright © 2000-2006 O’Reilly Media, Inc. All Rights Reserved.