O'Reilly Hacks

Scrape Yahoo! Buzz for a Google Search
A proof-of-concept hack scrapes the buzziest item from Yahoo! Buzz and submits it to a Google search.

The Code

Save the following code to a plain text file named buzzgle.pl:

# buzzgle.pl
# Pull the top item from the Yahoo! Buzz Index and query the last
# three days' worth of Google's index for it.
# Usage: perl buzzgle.pl
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Number of days back to go in the Google index.
my $days_back = 3;
use strict;
use SOAP::Lite;
use LWP::Simple;
use Time::JulianDay;
# Scrape the top item from the Yahoo! Buzz Index.
# Grab a copy of http://buzz.yahoo.com.
my $buzz_content = get("http://buzz.yahoo.com/") 
  or die "Couldn't grab the Yahoo Buzz: $!";
# Find the first item on the Buzz Index list.
my($buzziest) =  $buzz_content =~ m!http://search.yahoo.com/search\?p=.+?">(.+?)<\/a>!i;
die "Couldn't figure out the Yahoo! buzz\n" unless $buzziest;
# Figure out today's Julian date.
my $today = int local_julian_day(time);
# Build the Google query.
my $query = "\"$buzziest\" daterange:" . ($today - $days_back) . "-$today"; 
# Report what was found and what is about to be searched for.
print
  "The buzziest item on Yahoo Buzz today is: $buzziest\n",
  "Querying Google for: $query\n";
# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
# Query Google.
my $results = $google_search -> 
    doGoogleSearch(
      $google_key, $query, 0, 10, "false", "",  "false",
      "", "latin1", "latin1"
    );
# No results?
@{$results->{resultElements}} or die "No results";
# Loop through the results.
foreach my $result (@{$results->{'resultElements'}}) {
  my $output = 
    join "\n",  
    $result->{title} || "no title",
    $result->{URL} || 'no URL',
    $result->{snippet} || 'no snippet',
    "\n";
  $output =~ s!<.+?>!!g; # drop all HTML tags
  print $output;
}
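
The two offline steps of the script, pulling the top term out of the page and building the daterange: query, can be tried without a Google key or a network connection. The following is a minimal sketch rather than part of the original hack. The sample HTML is invented for illustration, and the helper names first_buzz_term, julian_day_number, and build_query are hypothetical. It also computes the Julian day with the core Time::Local module instead of Time::JulianDay, on the assumption that only dates from 1970 onward matter.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical sample of the markup the scraper expects: a list of
# links pointing at Yahoo! search result pages.
my $sample_html = <<'HTML';
<ol>
<li><a href="http://search.yahoo.com/search?p=first+term">First Term</a></li>
<li><a href="http://search.yahoo.com/search?p=second+term">Second Term</a></li>
</ol>
HTML

# Return the link text of the first Yahoo! search link in the page,
# or undef if no such link is present.
sub first_buzz_term {
    my ($html) = @_;
    my ($term) = $html =~ m!http://search\.yahoo\.com/search\?p=.+?">(.+?)</a>!i;
    return $term;
}

# Julian Day Number for a calendar date. Assumes dates from 1970 onward:
# 1970-01-01 is day 2440588, and each later day adds one.
sub julian_day_number {
    my ($year, $month, $day) = @_;
    my $days_since_epoch = int( timegm(0, 0, 0, $day, $month - 1, $year) / 86400 );
    return 2440588 + $days_since_epoch;
}

# Build a quoted phrase query restricted to the last $days_back days.
sub build_query {
    my ($term, $today_jdn, $days_back) = @_;
    return sprintf '"%s" daterange:%d-%d',
        $term, $today_jdn - $days_back, $today_jdn;
}

# A fixed date keeps the example reproducible.
my $term  = first_buzz_term($sample_html);
my $today = julian_day_number(2000, 1, 1);
print build_query($term, $today, 3), "\n";
```

To use the current date instead of a fixed one, take the year, month, and day from Perl's built-in localtime and pass them to julian_day_number. The daterange: operator expects whole Julian day numbers, which is why the script truncates the value with int.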

© 2007 O'Reilly Media, Inc.