O'Reilly Hacks


Stock and Pillory Web Filtering
Tired of maintaining the "naughty website list"? Set up Squid, script up a cat of the logfiles, and every day at 8 AM, post a list of which websites each user visited on the intranet.

Contributed by:
Eldon Sprickerhoff
[07/20/03]

A client of mine wanted to implement a proxy to exercise some control over what got surfed on company time. However, they didn't want to be deemed the "Internet Nazi," or to have to maintain a list of "bad websites" on an ongoing basis.

I suggested that community moral suasion be used instead. To that end, I set up Squid with user authentication and wrote a short script that de-crufts the logfiles and produces a list of the domains each user visited the previous day. This list is then HTTP'ized and posted on the company's intranet for all to see, every day.
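The hack doesn't show the Squid side, but requiring authenticated proxy users typically comes down to a few squid.conf lines along these lines; the helper and password-file paths here are assumptions, not the author's actual setup:

```
# Sketch of per-user authentication in squid.conf (paths are assumed)
auth_param basic program /usr/local/squid/libexec/ncsa_auth /usr/local/squid/etc/passwd
auth_param basic realm Company Proxy
acl localnet src 192.168.0.0/16
acl authed proxy_auth REQUIRED
http_access allow localnet authed
http_access deny all
```

With `proxy_auth REQUIRED`, the username lands in the access.log, which is what makes the per-user report below possible.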

Internet access is limited to the proxy only, and there are no exceptions to the rule. This only works if everyone is treated exactly the same way, and if people know you're doing it (and can live with the privacy implications).
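Enforcing "proxy only" happens at the border gateway, not in Squid. As one sketch, assuming a Linux gateway with iptables and a hypothetical layout (LAN on 192.168.0.0/24, proxy at 192.168.0.10 — the hack doesn't specify the real addresses):

```
# Hypothetical addresses; adjust for your network.
iptables -A FORWARD -s 192.168.0.10 -j ACCEPT    # only the Squid box may go direct
iptables -A FORWARD -s 192.168.0.0/24 -j REJECT  # everyone else must use the proxy
```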

It's been implemented for a little over a year, and it's worked surprisingly well. While it's difficult to come to an agreement as to what constitutes "pornography" in the abstract, it's generally easy to describe what websites you don't want your co-workers to know you're visiting during the working day.

And yes, you still have to think about ways to prevent people from going around the proxy - tunnelling over or through other protocols, VPNs, and so on - but you've raised the bar. Throw in an IDS probe that watches for George Carlin's seven naughty words and/or errant traffic, and you've raised it a bit higher still.

I believe that it's very difficult to answer human problems with technological means - it's easier to address human problems by human means - like the fear of public humiliation.

The (admittedly small) meat of the script is:

# Native squid access.log fields: time elapsed client code/status bytes method URL ident ...
# (the echo bodies of the two loops are reconstructed; they were lost in formatting)
cat /usr/local/squid/var/logs/access.log | grep "192.168" |
while read ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT REST; do
    echo "$EIGHT $THREE $SEVEN"    # username, client IP, URL
done | tr "\/" " " | while read USERNAME IPADDR THREE WEBSITE REST; do
    echo "$USERNAME $IPADDR $WEBSITE"
done | sort | uniq > /usr/local/squid/var/logs/expose.$BIGDATE.log

upon which this file gets http-ized and sent to the intranet server. And yes, I could have written this in Perl, or fed the data into a database and run interactive queries against it. I'll leave that as an exercise for the reader.
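The http-izing step isn't shown either. A minimal sketch, assuming the "user IP website" format produced above (the function name and output path are hypothetical, not from the original):

```shell
#!/bin/sh
# Hypothetical HTML-izer: turns "user ip site" lines into a bare HTML table.
htmlize() {    # usage: htmlize < expose.log > expose.html
    echo "<html><body><table>"
    while read USERNAME IPADDR WEBSITE; do
        echo "<tr><td>$USERNAME</td><td>$IPADDR</td><td>$WEBSITE</td></tr>"
    done
    echo "</table></body></html>"
}
# e.g. (paths as in the hack's script):
# htmlize < /usr/local/squid/var/logs/expose.$BIGDATE.log \
#         > /usr/local/www/intranet/expose.$BIGDATE.html
```

Drop the output somewhere the intranet webserver serves, and the daily cron job is done.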


© 2007 O'Reilly Media, Inc.