On log sharing

by Anton Chuvakin

So, it is often said that since the "bad guys" share technology information (such as exploits, bot access, malware, etc.), the "good guys" should ramp up their sharing efforts as well. But companies' unwillingness to share data that might, under the circumstances, be considered sensitive is legendary, and understandable.

Thus, I was happy to see projects such as Splunk Base, which lets users upload logs that indicate problems (yes, security problems as well) and tag them with descriptive tags, so that other Base users can learn from their experience as captured in the tagged log samples. Just sharing logs is nowhere near as useful as sharing such experiences. Either way, this is a good initiative to watch.

Specifically, CNet says (http://news.com.com/Start-up+brings+glitch+wiki+to+IT+pros/2100-7346_3-6056530.html): "Instead, Splunk has designed its software and Splunk Base to allow system administrators to submit information themselves and then classify and search the collected information of their peers."

Well, it brings up the standard question: if you start a community for marketing reasons (this one clearly fits that definition), how do you make sure it actually takes off and starts a life of its own as a real community of dedicated users (sometimes even "raving fans" :-))? I was reading a book by Guy Kawasaki ("Selling the Dream"), and it seems to have some answers... In any case, there is a difference between a real community and just a free platform for sharing, which might develop into a community, get monetized, or just tank. We will see what happens to this one.

Security remains an issue as well. Passwords are not that uncommon in Unix and Apache logs (when users mistakenly type them in as a username). Other things to watch for include allowed email addresses, IP addresses of critical servers, access control rule information, the types of security software in use, and maybe a few dozen other possible thingies... An intelligent sanitization algorithm seems very important!
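The password-in-the-username problem shows why sanitization needs to know what a field means: no generic pattern can tell that a string like `s3cr3tPa55` is sensitive, but a rule that knows where the username sits can mask it regardless of its value. The minimal Python sketch below assumes OpenSSH-style sshd messages (`Failed password for [invalid user] NAME from ...` and `Invalid user NAME from ...`); these patterns and the `redact_username_field` helper are illustrative assumptions, not a complete list of formats, and Apache or other logs would need rules of their own.

```python
import re

# Hypothetical patterns for OpenSSH-style sshd messages, e.g.
#   "Failed password for invalid user NAME from 192.0.2.7 port 4242 ssh2"
#   "Invalid user NAME from 192.0.2.7"
# Real message formats vary by version and configuration.
USERNAME_FIELD = re.compile(
    r"(Failed password for (?:invalid user )?|Invalid user )(\S+)( from )"
)


def redact_username_field(line: str) -> str:
    """Mask the username field, since it may hold a mistyped password."""
    # Keep the surrounding text (groups 1 and 3) and drop only the value.
    return USERNAME_FIELD.sub(r"\1<USER>\3", line)


if __name__ == "__main__":
    raw = ("sshd[811]: Failed password for invalid user s3cr3tPa55 "
           "from 192.0.2.7 port 4242 ssh2")
    print(redact_username_field(raw))
```

The sketch masks every username rather than guessing which ones look like passwords, since a real account name is worth hiding too.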

My experience with Honeynet Project data tells me that sanitization is not as easy as some think. So, given a serious issue, one that you might or might not want others to know about and whose logs might or might not contain sensitive data, would you post that data to an open forum in the hope that a) someone will help you and/or b) your experience will help someone else? Just post your comments here.

Another fun thing is the "added intelligence" factor. It has to be better (make it "much better") than simply dumping the logs on a public HTML page and letting good ole Google search them...

7 Comments

Aristotle Pagaltzis
2006-04-06 03:10:58
I don’t know about the endeavour as such, but I’d be rather uneasy at the thought that bad guys could subvert this system in some fashion. I don’t know if there’d be any way to allay my worries about the idea of posting logs that may indicate security issues in my setup to a wide-open audience.
Bob Sutterfield
2006-04-06 10:46:45
In my early February conversations with Splunk's CTO about topics including the idea that has become Splunk Base, I asked about sanitization. I won't even RMA a device to a vendor without scrubbing its configuration, though I leave the operational parts in place to assist in diagnosis.


He agreed about the sensitivity and said they scrub logs before submission for user IDs, passwords, hostnames, and email addresses (each replaced with a token that matches other occurrences of the same item), both public and private IP addresses (each replaced with a unique address in RFC 1918 space), and more. I didn't ask about ACLs, since I think they're safe once scrubbed of the specific information above. Strings identifying software versions are problematic because that information is core to the troubleshooting process.
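The scheme described here, where each item is replaced by a substitute that stays the same wherever the item recurs and IP addresses are remapped into RFC 1918 space, can be sketched in Python as follows. This is a minimal illustration, not Splunk's implementation: the simplified regular expressions, the hypothetical `ConsistentScrubber` class, and the choice to hand out substitutes sequentially from 10.0.0.0/8 are all assumptions, and hostnames and user IDs are left out because spotting them reliably takes site-specific knowledge.

```python
import ipaddress
import re

# Simplified, assumed patterns: they can miss some addresses and can
# match strings that are not really addresses.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


class ConsistentScrubber:
    """Replace sensitive items with stable substitutes.

    The same original value always maps to the same substitute, so
    correlations across log lines survive sanitization.
    """

    def __init__(self) -> None:
        self._emails: dict[str, str] = {}
        self._ips: dict[str, str] = {}
        # Substitute addresses come from 10.0.0.0/8 (RFC 1918 space),
        # starting at 10.0.0.1 and never reused for another original.
        self._pool = ipaddress.ip_network("10.0.0.0/8").hosts()

    def _email_token(self, m: re.Match) -> str:
        value = m.group(0)
        if value not in self._emails:
            self._emails[value] = f"EMAIL_{len(self._emails) + 1}"
        return self._emails[value]

    def _ip_substitute(self, m: re.Match) -> str:
        value = m.group(0)
        try:
            ipaddress.ip_address(value)
        except ValueError:
            return value  # e.g. "999.1.1.1" is not a valid address
        if value not in self._ips:
            self._ips[value] = str(next(self._pool))
        return self._ips[value]

    def scrub(self, line: str) -> str:
        # Emails first, so an address like user@192.0.2.1 becomes one token.
        line = EMAIL.sub(self._email_token, line)
        return IPV4.sub(self._ip_substitute, line)


if __name__ == "__main__":
    scrubber = ConsistentScrubber()
    for raw in ("login from 203.0.113.5 by bob@example.com",
                "203.0.113.5 retried; notify bob@example.com"):
        print(scrubber.scrub(raw))
```

Keeping one scrubber instance for a whole submission is what preserves correlation: the same source address shows up as the same 10.x address in every line, so patterns such as repeated attempts from one host remain visible.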


Splunk's strength and flexibility lie in its method of treating logs as "just text" and matching patterns without presupposing what generated the log. Therein lies serendipity: finding clues in places you hadn't thought to look. But sanitization requires the log-submission engine to have some semantic knowledge of what it is submitting. This will be harder to get right.

Demetri Mouratis
2006-04-07 11:20:00
Good points. I'm using Splunk Base myself and shared some of these concerns as I began submitting samples. In the end, my fears were allayed, and I ended up submitting many raw log samples without any anonymization. On the other hand, certain logs were obfuscated while preserving the vital bits. This approach provides maximal benefit to the community while still respecting my sometimes paranoid attitude towards the systems under my care.


Takeaways:


- Review each log in detail before submission
- Anonymize if sensitive data exists
- Reconsider submitting any log that contains sensitive data if its benefit to the community is low or non-existent.
- Use RFC 1918 addresses or scrub public ones
- Use test data
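The first and fourth takeaways can be partly automated with a pre-submission check that points a human reviewer at lines still containing email addresses or IPv4 addresses outside the RFC 1918 blocks. The Python sketch below is a hypothetical helper (`review`, `is_rfc1918`, and the simplified patterns are assumptions) and supplements, rather than replaces, reading each log in detail. It tests membership in the three RFC 1918 networks explicitly instead of using the `ipaddress` module's broader `is_private` flag, which also treats other special-purpose ranges, such as loopback, as private.

```python
import ipaddress
import re
from typing import Iterable, Iterator, Tuple

# Simplified, assumed patterns: they can miss addresses and flag look-alikes.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

# The three private address blocks defined in RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(text: str) -> bool:
    """Return True only for a valid IPv4 address inside an RFC 1918 block."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return False  # malformed strings get flagged for a human to check
    return any(addr in net for net in RFC1918)


def review(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, reason) for each finding worth a second look."""
    for num, line in enumerate(lines, start=1):
        if EMAIL.search(line):
            yield num, "email address"
        for ip in IPV4.findall(line):
            if not is_rfc1918(ip):
                yield num, f"IPv4 address outside RFC 1918: {ip}"


if __name__ == "__main__":
    sample = ["accepted 10.1.2.3", "denied 203.0.113.9",
              "mail to a@b.org", "192.168.0.5 -> 8.8.8.8"]
    for num, reason in review(sample):
        print(f"line {num}: {reason}")
```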

Anton Chuvakin
2006-04-18 22:15:50
In response to:


"- Use test data"


Doesn't it kinda defeat the purpose of the submission and the Base itself?
