O'Reilly Hacks

Yahoo! Directory Mindshare in Google
How does link popularity compare in Yahoo!'s searchable subject index versus Google's full-text index? Find out by calculating mindshare!

The Code

You will need a Google API account (http://api.google.com), as well as the LWP::Simple, SOAP::Lite (http://www.soaplite.com), and HTML::LinkExtor (http://search.cpan.org/author/GAAS/HTML-Parser/lib/HTML/LinkExtor.pm) Perl modules to run this hack. The script also expects the GoogleSearch.wsdl file in the directory you run it from.
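Before running the hack, you may want to confirm that the required modules actually load. The following is a minimal sketch using only core Perl; the helper name missing_modules is our own and is not part of the hack.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the names of any modules in the argument list that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Some::Module into Some/Module.pm, the form require() expects.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(qw(LWP::Simple HTML::LinkExtor SOAP::Lite));
print @missing ? "Missing: @missing\n" : "All modules found.\n";
```

Anything reported as missing can be installed from CPAN before you save and run mindshare_calculator.pl.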

Save the code as mindshare_calculator.pl, remembering to replace 'insert key here' with your Google API key:

#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use HTML::LinkExtor;
use SOAP::Lite;

my $google_key  = 'insert key here';
my $google_wsdl = "GoogleSearch.wsdl";

# Yahoo! directory path to examine; pass your own as the first argument.
my $yahoo_dir   = shift || "/Computers_and_Internet/Data_Formats/XML_  _";

# Download the Yahoo! directory.
my $data = get("http://dir.yahoo.com" . $yahoo_dir)
    or die "Couldn't download http://dir.yahoo.com$yahoo_dir\n";

# Create our Google object.
my $google_search = SOAP::Lite->service("file:$google_wsdl");
my %urls; # where we keep our link counts, keyed by URL.

# Extract all the links and parse 'em.
HTML::LinkExtor->new(\&mindshare)->parse($data)->eof;

sub mindshare { # for each link we find...

    my ($tag, %attr) = @_;

    # Continue on only if the tag was a link,
    # and the URL matches Yahoo!'s redirectory.
    return if $tag ne 'a';
    return unless defined $attr{href} && $attr{href} =~ /rds\.yahoo/;

    # Now get our real URL: everything from the "*http" onward.
    my ($url) = $attr{href} =~ /\*(http.*)/ or return;
    $url =~ s/%3A/:/gi; # turn encoded colons into legits.

    # And process each URL through Google. Arguments: key, query, start,
    # maxResults, filter, restrict, safeSearch, lr, ie, oe.
    my $results = $google_search->doGoogleSearch(
                        $google_key, "link:$url", 0, 1,
                        "true", "", "false", "", "", ""
                  ); # wheee, that was easy, guvner.
    $urls{$url} = $results->{estimatedTotalResultsCount};
}

# Now sort and display.
my @sorted_urls = sort { $urls{$b} <=> $urls{$a} } keys %urls;
foreach my $url (@sorted_urls) { print "$urls{$url}: $url\n"; }
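The link filtering inside mindshare() is the part you are most likely to adjust if Yahoo!'s redirect links look different. As a minimal sketch, the same checks can be pulled into a standalone helper and tried offline against sample hrefs. The name real_url and the sample links are made up for illustration, and, like the hack, the helper decodes only %3A.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original hack): returns the real
# destination URL hidden in a Yahoo! redirect link, or undef otherwise.
sub real_url {
    my ($href) = @_;

    # Same filters as mindshare(): a Yahoo! redirect carrying "*http...".
    return unless defined $href && $href =~ /rds\.yahoo/;
    my ($url) = $href =~ /\*(http.*)/ or return;

    # Decode the escaped colon, as the hack does; other escapes are kept.
    $url =~ s/%3A/:/gi;
    return $url;
}

# Made-up sample links, shaped the way the hack's regexes expect.
my @hrefs = (
    'http://rds.yahoo.com/S=1/K=xml/*http%3A//www.w3.org/XML/',
    'http://www.example.com/',
);

for my $href (@hrefs) {
    my $url = real_url($href);
    print defined $url ? "$url\n" : "skipped: $href\n";
}
```

Run as is, this prints http://www.w3.org/XML/ for the first sample and skips the second, since it is not a Yahoo! redirect.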

© 2007 O'Reilly Media, Inc.