O'Reilly Hacks

Google search scraper bookmarklet
Scrapes the result links from a Google search results page and lists them in a popup window, one comma-delimited line per link: <item no>,"<title>","<link>"

Contributed by:
David Crossman
[03/03/04]

To implement this hack in your browser, drag this link to your browser toolbar: gglscrp
The bookmarklet contains the following JavaScript code:

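// The hack treats each <p class="g"> element as one result block.
z = document.getElementsByTagName('p');
// Number of the first result on this page, taken from the start= URL parameter.
s = location.href.match(/start=(\d+)/) ? parseInt(RegExp.$1, 10) : 0;
x = '';
for (y = 0; y < z.length; y++)
  if (z[y].className == 'g') {
    // match() returns null when a block contains no links.
    m = z[y].innerHTML.match(/<a .*?<\/a>/igm) || [];
    for (w = 0; w < m.length; w++)
      if (!!(v = m[w].match(/href="?([^ ">]+)[^>]*>(.+?)<\/a>/i)))
        // Strip markup from the title; make root-relative links absolute.
        x += s + ',"' + v[2].replace(/<br>/ig,' ').replace(/<[^>]+>/g,'').replace(/"/g,"'") + '","'
          + (/^\//.test(v[1]) ? 'http://' + location.host : '') + v[1] + '"\n';
    // One item number per result block, as in the sample output.
    s++;
  }
with ((window.open('')).document)
  write('<html><pre>', x, '</pre></html>');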
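
The extraction step can also be tried outside a browser. The following is a minimal sketch, not the contributor's code: it applies the same regular expressions to plain HTML strings so that no DOM is needed. The names startOffset and scrapeRows and the origin parameter are hypothetical, and the sketch assumes that every link in a result block shares that block's item number, as the sample output suggests.

```javascript
// Sketch only: startOffset and scrapeRows are hypothetical names.

// Read the result offset from a URL's start= parameter, defaulting to 0.
function startOffset(href) {
  var hit = href.match(/start=(\d+)/);
  return hit ? parseInt(hit[1], 10) : 0;
}

// Turn the inner HTML of each result block into comma-delimited rows.
// blocks: array of HTML strings, one per result block
// first:  item number of the first block
// origin: prefix for root-relative links, e.g. 'http://www.google.com'
function scrapeRows(blocks, first, origin) {
  var out = '';
  for (var i = 0; i < blocks.length; i++) {
    var anchors = blocks[i].match(/<a .*?<\/a>/ig) || [];
    for (var j = 0; j < anchors.length; j++) {
      var v = anchors[j].match(/href="?([^ ">]+)[^>]*>(.+?)<\/a>/i);
      if (!v) continue;
      // Drop markup from the title and swap double quotes for single ones.
      var title = v[2].replace(/<br>/ig, ' ').replace(/<[^>]+>/g, '').replace(/"/g, "'");
      var link = (/^\//.test(v[1]) ? origin : '') + v[1];
      out += (first + i) + ',"' + title + '","' + link + '"\n';
    }
  }
  return out;
}
```

Passing the HTML in as strings keeps the regular expressions easy to check against saved pages before the code is packed into a bookmarklet.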
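
Your browser might ask for confirmation when you add the link, because javascript: is not a standard protocol such as http:.
Popup blockers may block this hack, because it writes its output to a new window.
Works on Google's international sites as well, e.g., google.co.uk, google.fr, google.de, ...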
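
If the draggable link is unavailable, the code can be packed into a javascript: URL by hand and saved as a bookmark. The following is a rough sketch of that packing step, not the method used to build gglscrp. The helper name toBookmarklet is hypothetical, and the sketch assumes that every statement ends with a semicolon and that any comments sit on lines of their own.

```javascript
// Sketch only: toBookmarklet is a hypothetical helper.
function toBookmarklet(source) {
  var body = source
    .split('\n')
    .map(function (line) { return line.trim(); })
    // Drop blank lines and whole-line comments; trailing comments are not handled.
    .filter(function (line) { return line.length > 0 && line.indexOf('//') !== 0; })
    .join(' ');
  // void(0) keeps the browser from replacing the page with the script's result.
  return 'javascript:' + encodeURIComponent(body + ';void(0)');
}
```

Percent-encoding the body keeps spaces and quotation marks from being mangled when the URL is pasted into a bookmark's location field.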

Correction made 17 Mar 2004 20:30 GMT: Removed extraneous space between comma and quotation mark.

Sample output:

0,"oreilly.com -- Online Catalog: Google Hacks","http://www.oreilly.com/catalog/googlehks/"
0,"Similar pages","http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&newwindow=1&c2coff=1&q=related:www.oreilly.com/catalog/googlehks/"
1,"Amazon.com: Books: Google Hacks: 100 Industrial-Strength Tips & ...","http://www.amazon.com/exec/obidos/tg/detail/-/0596004478?v=glance"
1,"Similar pages","http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&newwindow=1&c2coff=1&q=related:www.amazon.com/exec/obidos/tg/detail/-/0596004478%3Fv%3Dglance"

Requirements: A browser whose JavaScript supports the level 1 DOM (Gecko, IE5+).
Verified in IE6.0 (6.0.2800.1106;SP1) and NS7.0 (rv:1.0.1 Gecko/20020823).

© 2007 O'Reilly Media, Inc.