Google Web API

Rael Dornfest

Apr. 11, 2002 12:46 PM


In "Inventing the Future", an article for the InfoWorld CTO Forum last week in San Francisco, Tim O'Reilly writes:

Why would a company that has a large and valuable data store open it up [via XML-based Web Service APIs]?

My answer is a simple one: because if they don't ride the horse in the direction it's going, it will run away from them. The companies that "grasp the nettle firmly" (as my English mother likes to say) will reap the benefits of greater control over their future than those who simply wait for events to overtake them.


Rael Dornfest is a Researcher at O'Reilly & Associates focusing on technologies just beyond the pale. He assesses, experiments, programs, and writes for the O'Reilly Network and O'Reilly publications. Dornfest is Program Chair of the O'Reilly Emerging Technology Conference, May 13-16, 2002 in Santa Clara, CA.

Consider the nettle firmly grasped. The Google Web API opens a dialogue with the developers and researchers inventing the next Internet and quite possibly shaping the future of Google itself. By exposing its cache of over 2 billion Web pages via simple Web services, the Google Web API is a breath of fresh air in a specification-dense yet implementation-sparse arena.

John Piscitello, Project Manager for Google Web APIs, says, "In part, we're simply responding to developers who have been asking for Google to try something like this. We also see Google Web APIs as an opportunity to collaborate with developers who have great ideas for making the web more accessible and useful."

What better way to interview potential new programming talent?

And Google offers a good lesson for others considering testing the open Web Services waters in its limits of 1,000 searches per day and 10 results per search. Opening up your data store doesn't mean opening the floodgates; there's plenty of room for embracing experimentation without simply giving everything away -- you only stand to learn.

Google software engineer Nelson Minar will be talking about "Google and Web Services" at the O'Reilly Emerging Technology Conference, May 13-16, 2002 in Santa Clara, CA.

Web Services Get Street Cred

When the story leaked last week, the path it took and the people who offered up their $0.02 were a veritable who's-who of the Google API's target audience: from an initial post on the ruby-talk mailing list, to geek journal Slashdot, Userland's Dave Winer, Weblogger Cory Doctorow, and IBM SOAP-builder Sam Ruby. And, of course, it spread rapidly through the Weblog community, as evidenced by its immediate rise to #1 on DayPop. Wired et al. didn't say a word.

Much of the current noise around Web Services, both in the media and offerings, is all about business process, back-end integration, B2B transactions and remuneration, EDI-replacement, and the like -- way beyond the purview, or at least interest, of many an intelligent coder. Complex specifications overshadow simple, yet often more interesting, implementations. High-order discussions of workflow orchestration seldom include more than a nod to the kind of *nix-style pipelining that makes for interesting cross-pollination and unintended consequences. Not that some share of all this isn't rather important. It just doesn't have much in the way of street cred.

The Google Web API, while hopefully grabbing the attention of more traditional Web Services types, is really for those Tim O'Reilly affectionately refers to as "alpha geeks":

The alpha geeks are often a few years ahead of their time. They see the potential in existing technology, and push the envelope to get a little (or a lot) more out of it than its original creators intended. They are comfortable with new tools, and good at combining them to get unexpected results.

Google's arrival at the Open Services experimentation party finds them in good company. Userland's Radio Userland is a wellspring of DIY Web Services bootstrapping. Jabber-RPC transports XML-RPC messages over the Jabber instant messaging framework. Watson provides a stunning example of putting a GUI front-end on Web Services. My own Meerkat Open Wire Services provides open URL-line and XML-RPC interfaces which have reaped some unintended yet wonderful uses.

These are the sort of grassroots projects that finally put to rest what Jon Udell refers to as "the tired stock-quote example" -- not to mention the equally tiresome state number and calculator interop favourites.

What's in the Offing?

Diving into the package, what the Google Web API offers is a SOAP (Simple Object Access Protocol) interface to searching Google's index, accessing info and Web pages from its cache, and checking the spelling of words (even proper names), all from the comfort of Google's standard search syntax.

A freely downloadable Developer's Kit contains:

  • A complete API reference describing the semantics of method calls and fields
  • Sample SOAP request and response messages
  • Google Web API WSDL file
  • A Java library, example program, and Javadoc documentation
  • A sample .NET program
  • A simple SOAP::Lite-based Perl script
  • README, Licensing, and so forth

Getting started is about as easy as 1-2-3. 1. Download the Developer's Kit. 2. Create your Google Web API account. 3. Code. Each account key is entitled to 1000 queries per day, so use them wisely.
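Using those queries wisely is easier if your own code keeps count. Here's a minimal sketch of a client-side tally in Java; the QueryBudget class and its method names are my own invention, not part of the Developer's Kit, and I'm assuming the allowance resets on a calendar-day boundary, which may not match how Google actually counts.

```java
import java.time.LocalDate;

/** Hypothetical client-side counter for a per-day query allowance (not part of the Developer's Kit). */
final class QueryBudget {
    private final int dailyLimit;
    private LocalDate day;
    private int used;

    QueryBudget(int dailyLimit, LocalDate today) {
        this.dailyLimit = dailyLimit;
        this.day = today;
    }

    /** Records one query and returns true, or returns false if today's allowance is spent. */
    synchronized boolean tryAcquire(LocalDate today) {
        if (!today.equals(day)) {   // first call on a new day: start counting again (assumed reset rule)
            day = today;
            used = 0;
        }
        if (used >= dailyLimit) {
            return false;
        }
        used++;
        return true;
    }

    /** Queries left for the day most recently seen by tryAcquire. */
    synchronized int remaining() {
        return dailyLimit - used;
    }

    public static void main(String[] args) {
        QueryBudget budget = new QueryBudget(1000, LocalDate.now());
        if (budget.tryAcquire(LocalDate.now())) {
            System.out.println("OK to query; " + budget.remaining() + " left today");
        } else {
            System.out.println("Daily allowance used up; try again tomorrow");
        }
    }
}
```

Call tryAcquire() just before each search and skip the call when it returns false. The count lives only in memory, so a long-running process (or one that saves the tally somewhere) gets the most out of it.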

Getting Down to Brass Tacks

Enough preamble; let's dive into some of the samples.

Note that I've replaced my actual key (a not particularly attractive string of characters) with X's; I only have 1000 queries a day and so guard them jealously. Get your own key!

GoogleAPIDemo is a demonstration Java app that quickly gets you searching Google, grabbing from its cache, and spell-checking.

% java -cp googleapi.jar 
Directive  = search
Args       = rael
Google Search Results:
TM = 0.066088
Q  = "rael"
CT = ""
TT = ""
CATs =
  {SE="", FVN="Top/Society/Paranormal/UFOs/Organizations"}
Start Index = 1
End   Index = 10
Estimated Total Results Number = 65800
Document Filtering = true
Estimate Correct = false
Rs =
  URL  = ""
  Title = "Welcome to the Raelian Revolution"
  Snippet = "Arabic - PAGE UNDER CONSTRUCTION. 
  Click here to Skip Flash Intro.  "
  Directory Category = 
  {SE="", FVN="Top/Society/Paranormal/UFOs/Organizations"}
  Directory Title = "Raelian Religion "
  Summary = "Raelian Religion, the world's largest 
  UFO religion with 50,000 members. Life on Earth 
  is the result... "
  Cached Size = "11k"
  Related information present = true
  Host Name = ""
  URL  = ""
  Title = "raelity bytes"
  Snippet = " ... that's not actually me. "They say 
  Vorilhon, who   calls himself the prophet Rael and 
  testified before Congress last year in a futuristic 
  white jumpsuit ..."
  Directory Category = {SE="", FVN=""}
  Directory Title = ""
  Summary = ""
  Cached Size = "35k"
  Related information present = true
  Host Name = ""

GoogleAPIDemo simply data dumps search results, providing a peek at the resultant data in its Java-native form.

Let's try our hand at a little speling. Google's new spelling function, while predictably good at common words, really shines when it comes to uncommon words and proper names.

% java -cp googleapi.jar 
XXXXXXXXXXXXXXXXXX spell meekrat        
Directive  = spell
Args       = meekrat
Spelling suggestion:

% java -cp googleapi.jar 
XXXXXXXXXXXXXXXXXX  spell "real dormfest"
Directive  = spell
Args       = real dormfest
Spelling suggestion:
rael dornfest 

Lost or deleted that page you spent hours on yesterday? Perhaps Google got there just in time. Let's see what we find in the Google cache for our home page.

% java -cp googleapi.jar 
Directive  = cached
Args       =
Cached page:
<meta http-equiv="Content-Type" content="text/html;
<html xmlns="" lang="en-US" 
<title> -- Welcome to O'Reilly &amp; 
Associates -- computer books, software conferences, online 
<meta name="keywords" content="O'Reilly, oreilly, 
computer books, 

Rolling Up Our Sleeves

Let's write some Perl and Java code.

The first simplistic applications of Google's API will predictably be "Google Boxes," relevant search results incorporated into portal and weblog pages. My "raelity bytes" weblog, for example, sports a "Googling for Rael" sidebar entry (on the right), scanning Google once a day for references to (what else) me -- not to mention assorted UFO believers and renewable energy projects. Here's the Perl code behind it:


#!/usr/bin/perl
use strict;
use warnings;
use SOAP::Lite;

@ARGV == 3
    or die "Usage: googly <key> <query> <number of results>\n";

my ($key, $q, $maxResults) = @ARGV;

# key, q, start, maxResults, filter, restrict, safeSearch,
# lr, ie, oe
my @params = ($key, $q, 0, $maxResults, 0, '', 0, '',
    'latin1', 'latin1');

# Build a client from the WSDL file shipped in the Developer's Kit
my $result = SOAP::Lite
    -> service("file:GoogleSearch.wsdl")
    -> doGoogleSearch(@params);

# One link per result, falling back to the URL when a result has no title
print join "\n",
    map( { qq{<a href="$_->{URL}">} . ($_->{title} || $_->{URL})
           . qq{</a><br />} } @{$result->{resultElements}} );

% ./googly XXXXXXXXXXXXXXXXXX rael 5
<a href="">Welcome to the Raelian 
Revolution</a><br />
<a href="">RAL</a><br />
<a href="">raelity 
bytes</a><br />
<a href="">
O'Reilly Network: Weblogs [April 10, 2002]</a><br />
<a href="">O'Reilly 
Network: <b>Rael</b> Dornfest [February 03,
<br />

And the same thing in Java -- borrowing heavily from the example program included in the Google Web API Developer's Kit.


import com.google.soap.search.*;

public class Googly {

 public static void main(String[] args) {

  if (args.length != 3) {
   System.err.println("Usage: java Googly <key> <query> <maxResults>");
   System.exit(1);
  }

  String clientKey = args[0];
  String query = args[1];
  int maxResults = Integer.parseInt(args[2]);

  // Tell the search object who we are and what we're after
  GoogleSearch s = new GoogleSearch();
  s.setKey(clientKey);
  s.setQueryString(query);
  s.setMaxResults(maxResults);

  try {

   GoogleSearchResult r = s.doSearch();

   // One link per result
   GoogleSearchResultElement[] re = r.getResultElements();
   for ( int i = 0; i < re.length; i++ ) {
    System.out.println("<a href=\"" + re[i].getURL()
    + "\">" + re[i].getTitle() + "</a><br />");
   }
  } catch (GoogleSearchFault f) {
   System.out.println("The call to the Google Web APIs failed:");
   System.out.println(f.toString());
  }
 }
}

% java Googly XXXXXXXXXXXXXXXXXX rael 5
<a href="">Welcome to the Raelian 
Revolution</a><br />
<a href="">raelity bytes</a><br />
<a href="">O'Reilly Network: 
Weblogs [April 09, 2002]</a><br />
<a href="">Renewable and 
Appropriate Energy Laboratory (<b>RAEL</b>)</a><br />
<a href=""></a><br />
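With only 10 results allowed per search, gathering more than that means several calls, each with a different start offset. Here's a rough sketch of the arithmetic; ResultPager is a hypothetical helper of my own, not something from the Developer's Kit, and I'm assuming start counts from zero, as the 0 passed in the Perl script's parameter list suggests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits "give me the first N results" into per-query windows. */
final class ResultPager {
    /** Upper bound on results per query, per the limits described in this article. */
    static final int MAX_PER_QUERY = 10;

    /** Returns {start, maxResults} pairs that together cover the first `wanted` results. */
    static List<int[]> pages(int wanted) {
        List<int[]> pages = new ArrayList<>();
        for (int start = 0; start < wanted; start += MAX_PER_QUERY) {
            // The last window may be shorter than a full page
            int count = Math.min(MAX_PER_QUERY, wanted - start);
            pages.add(new int[] { start, count });
        }
        return pages;
    }

    public static void main(String[] args) {
        // Prints: start=0 maxResults=10, start=10 maxResults=10, start=20 maxResults=5 (one per line)
        for (int[] page : pages(25)) {
            System.out.println("start=" + page[0] + " maxResults=" + page[1]);
        }
    }
}
```

Bear in mind that each window is a separate query against your daily allowance, so 25 results cost three of your queries.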

Good, Clean SOAP (and WSDL too)

Taking a quick gander at a Google Search SOAP request (lifted right from the Developer's Kit's soap-samples folder) reveals a rather simple underlying set of XML documents.

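<?xml version='1.0' encoding='UTF-8'?>

<SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/1999/XMLSchema">
 <SOAP-ENV:Body>
  <ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch"
     SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
   <key xsi:type="xsd:string">XXXXXXXXXXXXXXXXXX</key>
   <q xsi:type="xsd:string">shrdlu winograd maclisp teletype</q>
   <start xsi:type="xsd:int">0</start>
   <maxResults xsi:type="xsd:int">10</maxResults>
   <filter xsi:type="xsd:boolean">true</filter>
   <restrict xsi:type="xsd:string"></restrict>
   <safeSearch xsi:type="xsd:boolean">false</safeSearch>
   <lr xsi:type="xsd:string"></lr>
   <ie xsi:type="xsd:string">latin1</ie>
   <oe xsi:type="xsd:string">latin1</oe>
  </ns1:doGoogleSearch>
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>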

What this boils down to is a call to the GoogleSearch service with access key XXXXXXXXXXXXXXXXXX for the first 10 results in a search for "shrdlu winograd maclisp teletype". For details on the particulars of the rest of the settings, I leave you to the API Reference documentation available in the Developer's Kit.
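To get a feel for how little is going on under the hood, here's a sketch that assembles such a request as a plain string and runs it through the JDK's XML parser to confirm it is well-formed. The SearchRequestSketch class is hypothetical, and the envelope and schema namespace URIs are my assumptions (the usual SOAP 1.1 envelope and the 1999 XML Schema namespaces); check them against the kit's soap-samples before relying on them. Nothing is sent over the wire here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sketch: builds a doGoogleSearch request body and checks that it parses. */
final class SearchRequestSketch {

    /** Escapes the characters that are unsafe in XML element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Namespace URIs below are assumptions, not copied from the Developer's Kit. */
    static String build(String key, String query, int start, int maxResults) {
        return "<?xml version='1.0' encoding='UTF-8'?>\n"
            + "<SOAP-ENV:Envelope"
            + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
            + " xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\""
            + " xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\">\n"
            + " <SOAP-ENV:Body>\n"
            + "  <ns1:doGoogleSearch xmlns:ns1=\"urn:GoogleSearch\">\n"
            + "   <key xsi:type=\"xsd:string\">" + escape(key) + "</key>\n"
            + "   <q xsi:type=\"xsd:string\">" + escape(query) + "</q>\n"
            + "   <start xsi:type=\"xsd:int\">" + start + "</start>\n"
            + "   <maxResults xsi:type=\"xsd:int\">" + maxResults + "</maxResults>\n"
            + "   <filter xsi:type=\"xsd:boolean\">true</filter>\n"
            + "   <restrict xsi:type=\"xsd:string\"></restrict>\n"
            + "   <safeSearch xsi:type=\"xsd:boolean\">false</safeSearch>\n"
            + "   <lr xsi:type=\"xsd:string\"></lr>\n"
            + "   <ie xsi:type=\"xsd:string\">latin1</ie>\n"
            + "   <oe xsi:type=\"xsd:string\">latin1</oe>\n"
            + "  </ns1:doGoogleSearch>\n"
            + " </SOAP-ENV:Body>\n"
            + "</SOAP-ENV:Envelope>\n";
    }

    /** Parses the request; throws if the XML is not well-formed. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = build("XXXXXXXXXXXXXXXXXX", "shrdlu winograd maclisp teletype", 0, 10);
        Document doc = parse(xml);
        // Echo the query back out of the parsed document
        System.out.println(doc.getElementsByTagName("q").item(0).getTextContent());
    }
}
```

In practice you'd let a SOAP toolkit do this for you, as the Perl and Java samples above do; building the envelope by hand is mostly useful for seeing that the key, query, start, and maxResults values are all there is to a basic search.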

In Conclusion

I do so look forward to the community reaction to Google's Web API as well as seeing and playing with what folks build on top of it.

And it seems I finished my article just in time...

The call to the Google Web APIs failed:
Fault Code = SOAP-ENV:Server
Fault String = Exception from service object: Daily limit of 100 queries exceeded

Update: The limit was just raised to 1,000 queries per day, with 10 results per query.


Rael Dornfest is Founder and CEO of Portland, Oregon-based Values of n. Rael leads the Values of n charge with passion, unearthly creativity, and a repertoire of puns and jokes — some of which are actually good. Prior to founding Values of n, he was O'Reilly's Chief Technical Officer, program chair for the O'Reilly Emerging Technology Conference (which he continues to chair), series editor of the bestselling Hacks book series, and instigator of O'Reilly's Rough Cuts early access program. He built Meerkat, the first web-based feed aggregator, was champion and co-author of the RSS 1.0 specification, and has written and contributed to six O'Reilly books. Rael's programmatic pride and joy is the nimble, open source blogging application Blosxom, the principles of which you'll find in the Values of n philosophy and embodied in Stikkit: Little yellow notes that think.