Google Web API

by Rael Dornfest


In "Inventing the Future", an article for the InfoWorld CTO Forum last week in San Francisco, Tim O'Reilly writes:




Why would a company that has a large and valuable data store open it up [via XML-based Web Service APIs]?

My answer is a simple one: because if they don't ride the horse in the direction it's going, it will run away from them. The companies that "grasp the nettle firmly" (as my English mother likes to say) will reap the benefits of greater control over their future than those who simply wait for events to overtake them.







Rael Dornfest is a Researcher at O'Reilly & Associates focusing on technologies just beyond the pale. He assesses, experiments, programs, and writes for the O'Reilly Network and O'Reilly publications. Dornfest is Program Chair of the O'Reilly Emerging Technology Conference, May 13-16, 2002 in Santa Clara, CA.






Consider the nettle firmly grasped. The Google Web API opens a dialogue with the developers and researchers inventing the next Internet and quite possibly shaping the future of Google itself.
By exposing its cache of over 2 billion Web pages via simple Web services, the Google Web API is a breath of fresh air in a specification-dense yet implementation-sparse arena.




John Piscitello, Project Manager for Google Web APIs, says, "In part, we're simply responding to developers who have been asking for Google to try something like this. We also see Google Web APIs as an opportunity to collaborate with developers who have great ideas for making the web more accessible and useful."




What better way to interview potential new programming talent?




And Google's limits of 1,000 searches per day and 10 results per search offer a good lesson for others considering testing the open Web Services waters. Opening up your data store doesn't mean opening the floodgates; there's plenty of room for embracing experimentation without simply giving everything away -- you only stand to learn.





Google software engineer Nelson Minar will be talking about "Google and Web Services" at the O'Reilly Emerging Technology Conference, May 13-16, 2002 in Santa Clara, CA.



Web Services Get Street Cred




When the story leaked last week, the path it took and the people who offered up their $0.02 were a veritable who's who of the Google API's target audience: from an initial post on the ruby-talk mailing list, to geek journal Slashdot, Userland's Dave Winer, Weblogger Cory Doctorow, and IBM SOAP-builder Sam Ruby. And, of course, it spread rapidly through the Weblog community, as evidenced by its immediate rise to #1 on DayPop.




News.com, Wired, et al didn't say a word.




Much of the current noise around Web Services, both in the media and offerings, is all about business process, back-end integration, B2B transactions and remuneration, EDI-replacement, and the like -- way beyond the purview, or at least interest, of many an intelligent coder. Complex specifications overshadow simple, yet often more interesting, implementations. High-order discussions of workflow orchestration seldom include more than a nod to the kind of *nix-style pipelining that makes for interesting cross-pollination and unintended consequences. Not that some share of all this isn't rather important. It just doesn't have much in the way of street cred.




The Google Web API, while hopefully grabbing the attention of more traditional Web Services types, is really for those Tim O'Reilly affectionately refers to as "alpha geeks":




"The alpha geeks are often a few years ahead of their time. They see the potential in existing technology, and push the envelope to get a little (or a lot) more out of it than its original creators intended. They are comfortable with new tools, and good at combining them to get unexpected results."



Google's arrival at the Open Services experimentation party finds them in good company. Userland's Radio Userland is a wellspring of DIY Web Services bootstrapping. Jabber-RPC transports XML-RPC messages over the Jabber instant messaging framework. Watson provides a stunning example of putting a GUI front-end on Web Services. My own Meerkat Open Wire Services provides open URL-line and XML-RPC interfaces which have reaped some unintended yet wonderful uses.




These are the sort of grassroots projects that finally put to rest what Jon Udell refers to as "the tired stock-quote example" -- not to mention the equally tiresome state number and calculator interop favourites.



What's in the Offing?




Diving into the package, what the Google Web API offers is a SOAP (Simple Object Access Protocol) interface to searching Google's index, accessing info and Web pages from its cache, and checking the spelling of words (even proper names) from the comfort of Google.com's standard search syntax.




A freely downloadable Developer's Kit contains:




  • A complete API reference describing the semantics of method calls and fields

  • Sample SOAP request and response messages

  • Google Web API WSDL file

  • A Java library, example program, and Javadoc documentation

  • A sample .NET program

  • A simple SOAP::Lite-based Perl script

  • README, Licensing, and so forth




Getting started is about as easy as 1-2-3. 1. Download the Developer's Kit. 2. Create your Google Web API account. 3. Code. Each account key is entitled to 1000 queries per day, so use them wisely.
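With only 1,000 queries a day per key, it can help to keep count on the client side before Google does it for you. Here's a minimal sketch in plain Java. The QueryBudget class and its method names are my own invention for illustration, not part of the Developer's Kit, and I'm assuming the budget resets whenever the calendar date you pass in changes; when Google's own counter actually resets is not something I can vouch for.

```java
import java.time.LocalDate;

// Hypothetical client-side bookkeeping for a per-day query allowance.
// The real limit is enforced by Google's servers; this only helps a
// program avoid spending its allowance by accident.
final class QueryBudget {
    private final int dailyLimit;
    private LocalDate day;
    private int used;

    QueryBudget(int dailyLimit, LocalDate today) {
        if (dailyLimit < 0) {
            throw new IllegalArgumentException("dailyLimit must be >= 0");
        }
        this.dailyLimit = dailyLimit;
        this.day = today;
    }

    // Records one query and returns true if any budget is left for the
    // given day; returns false without recording when it is used up.
    synchronized boolean tryAcquire(LocalDate today) {
        if (!today.equals(day)) {   // assumption: a new date means a fresh budget
            day = today;
            used = 0;
        }
        if (used >= dailyLimit) {
            return false;
        }
        used++;
        return true;
    }

    // Number of queries still available on the given day.
    synchronized int remaining(LocalDate today) {
        return today.equals(day) ? dailyLimit - used : dailyLimit;
    }
}
```

Call tryAcquire(LocalDate.now()) before each search and skip the query when it returns false.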



Getting Down to Brass Tacks




Enough preamble; let's dive into some of the samples.




Note that I've replaced my actual key (a not particularly attractive string of characters) with X's; I only have 1000 queries a day and so guard them jealously. Get your own key!




GoogleAPIDemo is a demonstration Java app that quickly gets you searching Google, grabbing from its cache, and spell-checking.




% java -cp googleapi.jar \
    com.google.soap.search.GoogleAPIDemo \
    XXXXXXXXXXXXXXXXXX search rael

Parameters:
Client key = XXXXXXXXXXXXXXXXXX
Directive = search
Args = rael
Google Search Results:
======================
{
TM = 0.066088
Q = "rael"
CT = ""
TT = ""
CATs =
{
{SE="", FVN="Top/Society/Paranormal/UFOs/Organizations"}
}
Start Index = 1
End Index = 10
Estimated Total Results Number = 65800
Document Filtering = true
Estimate Correct = false
Rs =
{

[
URL = "http://www.rael.org/"
Title = "Welcome to the Raelian Revolution"
Snippet = "Arabic - PAGE UNDER CONSTRUCTION.
Click here to Skip Flash Intro. "
Directory Category =
{SE="", FVN="Top/Society/Paranormal/UFOs/Organizations"}
Directory Title = "Raelian Religion "
Summary = "Raelian Religion, the world's largest
UFO religion with 50,000 members. Life on Earth
is the result... "
Cached Size = "11k"
Related information present = true
Host Name = ""
],
...
[
URL = "http://www.oreillynet.com/~rael/"
Title = "raelity bytes"
Snippet = " ... that's not actually me. "They say
Vorilhon, who calls himself the prophet Rael and
testified before Congress last year in a futuristic
white jumpsuit ..."
Directory Category = {SE="", FVN=""}
Directory Title = ""
Summary = ""
Cached Size = "35k"
Related information present = true
Host Name = ""
],
...



GoogleAPIDemo simply data dumps search results, providing a peek at the resultant data in its Java-native form.




Let's try our hand at a little speling. Google's new spelling function, while predictably good at common words, really shines when it comes to uncommon words and proper names.




% java -cp googleapi.jar \
    com.google.soap.search.GoogleAPIDemo \
    XXXXXXXXXXXXXXXXXX spell meekrat

Parameters:
Client key = XXXXXXXXXXXXXXXXXX
Directive = spell
Args = meekrat
Spelling suggestion:
meerkat

% java -cp googleapi.jar \
    com.google.soap.search.GoogleAPIDemo \
    XXXXXXXXXXXXXXXXXX spell "real dormfest"

Parameters:
Client key = XXXXXXXXXXXXXXXXXX
Directive = spell
Args = real dormfest
Spelling suggestion:
rael dornfest



Lost or deleted that page you spent hours on yesterday? Perhaps Google got there just in time. Let's see what we find in the Google cache for our oreilly.com home page.




% java -cp googleapi.jar \
    com.google.soap.search.GoogleAPIDemo \
    XXXXXXXXXXXXXXXXXX cached http://www.oreilly.com

Parameters:
Client key = XXXXXXXXXXXXXXXXXX
Directive = cached
Args = http://www.oreilly.com
Cached page:
============
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<BASE HREF="http://www.oreilly.com/">
...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"
xml:lang="en-US">
<head>
<title>www.oreilly.com -- Welcome to O'Reilly &amp;
Associates -- computer books, software conferences, online
publishing</title>
<meta name="keywords" content="O'Reilly, oreilly,
computer books,
...


Rolling Up Our Sleeves




Let's write some Perl and Java code.




The first simplistic applications of Google's API will predictably be "Google Boxes," relevant search results incorporated into portal and weblog pages. My "raelity bytes" weblog, for example, sports a "Googling for Rael" sidebar entry (on the right), scanning Google once a day for references to (what else) me -- not to mention assorted UFO believers and renewable energy projects. Here's the Perl code behind it:




#!/usr/bin/perl
# googly: print Google search results as a list of HTML links

use SOAP::Lite;

@ARGV == 3
  or die "Usage: googly <key> <query> <number of results>\n";

my($key, $q, $maxResults) = @ARGV;

# key, q, start, maxResults, filter, restrict, safeSearch,
# lr, ie, oe
my @params = ($key, $q, 0, $maxResults, 0, '', 0, '',
              'latin1', 'latin1');

# Build the client from the WSDL file shipped in the Developer's Kit
my $result =
  SOAP::Lite
    -> service("file:GoogleSearch.wsdl")
    -> doGoogleSearch(@params);

# One link per result, falling back to the URL when the title is empty
print join "\n",
  map( { qq{<a href="$_->{URL}">} . ($_->{title} || $_->{URL})
         . qq{</a><br />} } @{$result->{resultElements}} );



% ./googly XXXXXXXXXXXXXXXXXX rael 5
<a href="http://www.rael.org/">Welcome to the Raelian
Revolution</a><br />
<a href="http://www.rael.org/press/">RAL</a><br />
<a href="http://www.oreillynet.com/~rael/">raelity
bytes</a><br />
<a href="http://www.oreillynet.com/weblogs/author/35">
O'Reilly Network: Weblogs [April 10, 2002]</a><br />
<a href="http://www.oreillynet.com/pub/au/35">O'Reilly
Network: <b>Rael</b> Dornfest [February 03,
2002]</a>
<br />



And the same thing in Java -- borrowing heavily from the GoogleAPIDemo.java included in the Google Web API Developer's Kit.




import com.google.soap.search.*;

public class Googly {

    public static void main(String[] args) {

        if (args.length != 3) {
            System.err.println(
                "Usage: java Googly <key> <query> <maxResults>");
            System.exit(1);
        }

        String clientKey = args[0];
        String query = args[1];
        int maxResults = Integer.parseInt(args[2]);

        GoogleSearch s = new GoogleSearch();

        try {
            // Configure the search, then run it
            s.setKey(clientKey);
            s.setQueryString(query);
            s.setMaxResults(maxResults);

            GoogleSearchResult r = s.doSearch();

            // Print one HTML link per result
            GoogleSearchResultElement[] re = r.getResultElements();
            for (int i = 0; i < re.length; i++) {
                System.out.println("<a href=\"" + re[i].getURL()
                    + "\">" + re[i].getTitle() + "</a><br />");
            }
        } catch (GoogleSearchFault f) {
            System.out.println("The call to the Google Web APIs failed:");
            System.out.println(f.toString());
        }
    }
}



% java Googly XXXXXXXXXXXXXXXXXX rael 5
<a href="http://www.rael.org/">Welcome to the Raelian
Revolution</a><br />
<a href="http://www.oreillynet.com/~rael/">raelity bytes</a><br
/>
<a href="http://www.oreillynet.com/weblogs/author/35">O'Reilly Network:
Weblogs [April 09, 2002]</a><br />
<a href="http://socrates.berkeley.edu/~rael/rael.html">Renewable and
Appropriate Energy Laboratory (<b>RAEL</b>)</a><br />
<a href="http://www.wibble.org.uk/"></a><br />
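Both versions ask for just a handful of results. Since each query returns at most 10, getting at more means issuing several queries with increasing start offsets, each of which counts against the daily allowance. The sketch below only works out the arithmetic; ResultPager is a hypothetical helper of mine, and I'm assuming the start offset is zero-based, as the 0 passed for start in the Perl parameters suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans the {start, maxResults} pairs needed to
// cover the first `wanted` results when each query returns at most 10.
final class ResultPager {
    static final int MAX_PER_QUERY = 10;

    static List<int[]> plan(int wanted) {
        if (wanted < 0) {
            throw new IllegalArgumentException("wanted must be >= 0");
        }
        List<int[]> pages = new ArrayList<>();
        // Assumption: start is a zero-based offset into the result list.
        for (int start = 0; start < wanted; start += MAX_PER_QUERY) {
            int size = Math.min(MAX_PER_QUERY, wanted - start);
            pages.add(new int[] { start, size });
        }
        return pages;
    }
}
```

Asking for 25 results, for instance, plans three queries: offsets 0 and 10 with 10 results each, then offset 20 with the remaining 5.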


Good, Clean SOAP (and WSDL too)




Taking a quick gander at a Google Search SOAP request (lifted right from the Developer's Kit's soap-samples folder) reveals a rather simple underlying set of XML documents.




<?xml version='1.0' encoding='UTF-8'?>

<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<SOAP-ENV:Body>
<ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<key xsi:type="xsd:string">XXXXXXXXXXXXXXXXXX</key>
<q xsi:type="xsd:string">shrdlu winograd maclisp teletype</q>
<start xsi:type="xsd:int">0</start>
<maxResults xsi:type="xsd:int">10</maxResults>
<filter xsi:type="xsd:boolean">true</filter>
<restrict xsi:type="xsd:string"></restrict>
<safeSearch xsi:type="xsd:boolean">false</safeSearch>
<lr xsi:type="xsd:string"></lr>
<ie xsi:type="xsd:string">latin1</ie>
<oe xsi:type="xsd:string">latin1</oe>
</ns1:doGoogleSearch>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>



What this boils down to is a call to the GoogleSearch service with access key XXXXXXXXXXXXXXXXXX for the first 10 results in a search for "shrdlu winograd maclisp teletype". For details on the particulars of the rest of the settings, I leave you to the API Reference documentation available in the Developer's Kit.
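To make the point that there's nothing magical in that envelope, here's a rough sketch that assembles the same request by hand using nothing but the Java standard library. SearchEnvelope and its methods are hypothetical names of mine; only key, q, start, and maxResults are parameterised, and the remaining settings are simply copied from the sample above. It builds the document and reads it back to check that it's well formed, and stops short of sending it anywhere.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper that mirrors the sample doGoogleSearch request.
final class SearchEnvelope {

    // Escape the characters that are special in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // One typed child element, e.g. <start xsi:type="xsd:int">0</start>
    static String element(String name, String type, String value) {
        return "<" + name + " xsi:type=\"xsd:" + type + "\">"
             + escape(value) + "</" + name + ">\n";
    }

    static String build(String key, String q, int start, int maxResults) {
        return "<?xml version='1.0' encoding='UTF-8'?>\n"
             + "<SOAP-ENV:Envelope"
             + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
             + " xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\""
             + " xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\">\n"
             + "<SOAP-ENV:Body>\n"
             + "<ns1:doGoogleSearch xmlns:ns1=\"urn:GoogleSearch\""
             + " SOAP-ENV:encodingStyle="
             + "\"http://schemas.xmlsoap.org/soap/encoding/\">\n"
             + element("key", "string", key)
             + element("q", "string", q)
             + element("start", "int", Integer.toString(start))
             + element("maxResults", "int", Integer.toString(maxResults))
             // The settings below are fixed to the sample's values.
             + element("filter", "boolean", "true")
             + element("restrict", "string", "")
             + element("safeSearch", "boolean", "false")
             + element("lr", "string", "")
             + element("ie", "string", "latin1")
             + element("oe", "string", "latin1")
             + "</ns1:doGoogleSearch>\n"
             + "</SOAP-ENV:Body>\n"
             + "</SOAP-ENV:Envelope>\n";
    }

    // Parse the envelope and return the text of the first element
    // with the given tag name; fails if the XML is not well formed.
    static String textOf(String xml, String tag) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read <" + tag + ">", e);
        }
    }
}
```

In practice the Java library or SOAP::Lite does all of this for you; the sketch is only meant to show how little is going on under the hood.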



In Conclusion




I do so look forward to the community reaction to Google's Web API as well as seeing and playing with what folks build on top of it.




And it seems I finished my article just in time...




The call to the Google Web APIs failed:
com.google.soap.search.GoogleSearchFault: Fault
Code = SOAP-ENV:Server
Fault String = Exception from service object:
Daily limit of 100 queries exceeded
for key XXXXXXXXXXXXXXXXXX



Update: The limit was just raised to 1,000 queries per day with 10 results per query.
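A program that runs unattended, like a once-a-day sidebar job, may want to tell a spent quota apart from any other failure. One crude way is to look for the wording of the fault shown above. The sketch below does just that; QuotaFault is a hypothetical helper, and it assumes the fault text keeps the "Daily limit of N queries exceeded" phrasing from that one sample, which may well change.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: recognises the quota fault by its text.
// Matching on wording is brittle; treat it as a stopgap.
final class QuotaFault {
    // Assumption: phrasing taken from the single fault sample above.
    private static final Pattern DAILY_LIMIT =
        Pattern.compile("Daily limit of\\s+(\\d{1,9})\\s+queries exceeded");

    // Returns the reported limit if the text looks like a quota fault,
    // or an empty OptionalInt for any other text.
    static OptionalInt exceededLimit(String faultText) {
        if (faultText == null) {
            return OptionalInt.empty();
        }
        Matcher m = DAILY_LIMIT.matcher(faultText);
        return m.find()
            ? OptionalInt.of(Integer.parseInt(m.group(1)))
            : OptionalInt.empty();
    }
}
```

Passing in the result of the fault's toString() should be enough; when a limit comes back, the job can simply give up until tomorrow.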



Resources