Loghetti: an Apache log file filter in Python

by Brian K. Jones

As announced earlier on my personal blog, I launched an open source project on Google Code called "loghetti". It's written in Python, and is a foundation for what I hope will become a very flexible tool to help admins (myself included) get whatever data they need out of their Apache logs.

Here are a couple of examples of stuff it can do:

Get a list of all of the 500 errors:

./loghetti.py --code=500 access.log

This will send all matching lines in access.log to STDOUT. To get a bit more complex:

./loghetti.py --ip=192.168.1.2 --code=500 --month=11 --day=21 --urlbase=index.php --count access.log

This will *not* return the lines that match all of those rules; instead, it prints a simple count of the matching lines. This request is a somewhat typical support scenario: you have a client at 192.168.1.2 reporting 500 errors they received on some arbitrary date while trying to reach your intranet's home page. It's not unusual in a support role to have the client say "it happened like, a million times". Of course, --count will dutifully report that it happened 4 times (for example), which is likely closer to the truth.
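For the curious, here is a rough sketch of the idea behind combining filters and counting. This is not loghetti's actual code: the regex assumes the standard Apache common/combined log layout, the names `parse_line` and `count_matches` are made up for illustration, and I'm assuming "urlbase" means the last path segment of the requested URL with the query string removed.

```python
import re

# Assumed layout: Apache common/combined log format. Only the fields
# needed for filtering are captured.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<day>\d{2})/(?P<mon>[A-Za-z]{3})/(?P<year>\d{4}):[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<code>\d{3})'
)

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't parse."""
    m = LINE_RE.match(line)
    if m is None or m.group("mon") not in MONTHS:
        return None
    url = m.group("url")
    path = url.split("?", 1)[0]
    return {
        "ip": m.group("ip"),
        "code": int(m.group("code")),
        "month": MONTHS[m.group("mon")],
        "day": int(m.group("day")),
        # Hypothetical definition of "urlbase": last path segment, no query.
        "urlbase": path.rsplit("/", 1)[-1],
    }


def count_matches(lines, **filters):
    """Count lines whose parsed fields equal every filter that was supplied."""
    total = 0
    for line in lines:
        fields = parse_line(line)
        if fields is None:
            continue
        if all(fields[name] == wanted for name, wanted in filters.items()):
            total += 1
    return total


SAMPLE = [
    '192.168.1.2 - - [21/Nov/2007:10:15:32 -0500] "GET /index.php HTTP/1.1" 500 612',
    '192.168.1.2 - - [21/Nov/2007:10:16:01 -0500] "GET /index.php?x=1 HTTP/1.1" 500 612',
    '192.168.1.2 - - [21/Nov/2007:10:17:45 -0500] "GET /index.php HTTP/1.1" 200 5120',
    '10.0.0.7 - - [21/Nov/2007:10:18:02 -0500] "GET /index.php HTTP/1.1" 500 612',
    '192.168.1.2 - - [22/Nov/2007:09:00:00 -0500] "GET /index.php HTTP/1.1" 500 612',
]

print(count_matches(SAMPLE, ip="192.168.1.2", code=500,
                    month=11, day=21, urlbase="index.php"))
```

Every filter is an AND condition, so each extra option can only narrow the result. Of the five made-up sample lines, only the first two satisfy all five filters.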

Ok, one more example, because I happen to be a fan of this feature:

./loghetti.py --urldata=foo:bar access.log

This causes loghetti to parse the query string, and return lines where the query parameter "foo" matches argument "bar". In other words, lines containing a URL that looks something like this:

http://www.yourdomain.com?stuff=things&foo=bar&this=that
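If you want to see what that kind of check involves, here is a minimal sketch using Python 3's standard `urllib.parse`. It is only an illustration, not how loghetti does it: the function name `urldata_matches` is invented, and I'm assuming "matches" means exact equality between the parameter's value and the argument.

```python
from urllib.parse import urlsplit, parse_qs


def urldata_matches(url, spec):
    """Return True if the URL's query string has key == value for 'key:value'."""
    key, _, value = spec.partition(":")
    # parse_qs maps each parameter name to a list of its values.
    params = parse_qs(urlsplit(url).query)
    return value in params.get(key, [])


url = "http://www.yourdomain.com?stuff=things&foo=bar&this=that"
print(urldata_matches(url, "foo:bar"))
print(urldata_matches(url, "foo:baz"))
```

Because `parse_qs` returns a list per parameter, a URL that repeats a parameter (`foo=1&foo=bar`) still matches if any one of its values equals the argument.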

There are billions of features I'd like to implement, but since the tool is already useful to me, I figured it would likely be useful to others too, and maybe others can help get the features they need implemented more quickly.

Let me know your thoughts!

3 Comments

Christian
2008-03-18 14:19:12
Wow, this looks very handy indeed. With an access log like mine, going through it can be a real pain.

Now, how hard would it be to modify this to also parse MySQL's slow query log?

Thanks!

Noah Gift
2008-03-19 02:19:55
Brian, just curious: why didn't you use optparse from the standard library?

Brian Jones
2008-03-25 17:44:04
Noah:

I used CommandLineApp, a preexisting module written by our pal Doug Hellmann that takes care of the option parsing for me, so I didn't write even one bit of option parsing code. My understanding is that CommandLineApp was written before optparse was part of the standard library, so I guess Doug just hasn't gotten around to supporting it, or hasn't found the value-add, or whatever.

In short, I happen to *hate* writing option parsing code, so I delegated that task.