Spam blocking: reputation system overcomes antisocial minority

by Andy Oram

It's too easy for determined minorities, acting outside the social
pale, to ruin things for the rest of us. Several neighborhoods in
Boston, for instance, are struggling to contain gang activity that
took 75 lives this past year. How can a community preserve itself when
it has no control over a small number of disrupters?

Unsolicited bulk email is a comparable situation: most of us want to
reach other people with email messages of mutual benefit, but we're
overwhelmed by those misusing the system.

The solution is to act together, which software and Internet
connections allow us to do.

Filtering gets a boost from a reputation system developed originally
as an open-source tool (Razor) and then as a commercial product called
Cloudmark. (The desktop version is available only for Windows, but
the server and gateway versions support Linux and Solaris too.)

The concept behind the system is simple enough: if several other
people think something is spam, you probably feel the same way. A
worldwide coordination system works much better than millions of
individuals trying to flag spam on their own.
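
To make the idea concrete, here is a minimal sketch of shared flagging.
It is not how Razor or Cloudmark actually work: the exact-match SHA-256
fingerprint, the SharedSpamRegistry class, and the threshold of three
reporters are all assumptions chosen for illustration. A production
system would presumably need fuzzier signatures that survive small
changes to a message.

```python
import hashlib
from collections import defaultdict

# Hypothetical threshold: distinct reporters needed before we call it spam.
REPORT_THRESHOLD = 3


def fingerprint(body: str) -> str:
    """Reduce a message body to a fixed-size signature.

    Assumption: lowercasing and collapsing whitespace is enough
    normalization for this toy example.
    """
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedSpamRegistry:
    """Collects spam reports from many users, keyed by message fingerprint."""

    def __init__(self):
        # fingerprint -> set of users who reported it
        self._reporters = defaultdict(set)

    def report(self, user: str, body: str) -> None:
        # A set means repeated reports from one user count only once.
        self._reporters[fingerprint(body)].add(user)

    def is_spam(self, body: str) -> bool:
        reporters = self._reporters.get(fingerprint(body), ())
        return len(reporters) >= REPORT_THRESHOLD
```

Each user contributes one report per message, and everyone who checks
the registry benefits from reports that other people have already filed.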

The key to the reputation system--as to any reputation system--is
bootstrapping. You need good data to start with. In this case,
Cloudmark has signed up a few trusted individuals to put the first
ratings in place. As they continue doing ratings and other people sign
up, the system monitors itself. If any spammer decides to throw in
false ratings, he is quickly isolated and demoted so that he has no
effect on further ratings. This is a clever combination of manual,
human intervention and automated support.
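
The self-monitoring step can be sketched as a trust-weighted vote in
which raters who contradict the outcome lose influence. This is only an
illustration under stated assumptions: the RaterLedger class, the
halving penalty, and the cutoff below which a rater is ignored are
hypothetical, not Cloudmark's actual scoring rules.

```python
# Hypothetical constants, chosen only for illustration.
DEMOTION_FACTOR = 0.5  # trust is halved after each rating that loses the vote
IGNORE_BELOW = 0.1     # raters under this trust level no longer count


class RaterLedger:
    """Tracks a trust weight per rater and settles spam votes by weight."""

    def __init__(self, raters):
        # Every known rater starts fully trusted in this toy model.
        self.trust = {rater: 1.0 for rater in raters}

    def verdict(self, votes):
        """votes maps rater -> True (spam) or False (not spam)."""
        score = 0.0
        for rater, says_spam in votes.items():
            weight = self.trust.get(rater, 0.0)  # unknown raters weigh nothing
            if weight < IGNORE_BELOW:
                continue  # isolated raters have no effect on the outcome
            score += weight if says_spam else -weight
        return score > 0

    def settle(self, votes):
        """Decide the outcome, then demote raters who voted against it."""
        outcome = self.verdict(votes)
        for rater, says_spam in votes.items():
            if rater in self.trust and says_spam != outcome:
                self.trust[rater] *= DEMOTION_FACTOR
        return outcome
```

In this model a rater who keeps voting against the weighted majority
drops below the cutoff after a few rounds, after which their ratings are
simply skipped.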


2006-01-02 12:54:24
Induction of trusted users.
Hi Andy,

I am one of the co-authors of the paper. Bootstrapping is,
indeed, one of the harder problems with a reputation system. We
bootstrapped with 2 trusted users and as newer, untrusted users
agreed with our assertions (by sending matching reports), they
became trusted. This inductive method has served us very well,
especially because there's a high level of "transitivity" of
opinions (most people agree on what is spam). The inductive
method also makes the trust metric robust over time, as it
becomes harder to achieve a trusted status when there are a lot
of existing trusted users - it's just not that easy to add value
(early, correct reports) to the system as a new reporter today
as it was during the bootstrap stage. Consequently, it's very
hard for spammers to "astroturf".
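
A rough sketch of that inductive method, under assumptions of my own
choosing rather than the production rules: the InductiveTrust class,
the three confirmed reports needed for promotion, and the two trusted
reports needed to settle a message are all hypothetical.

```python
# Hypothetical constants, chosen only for illustration.
PROMOTION_AGREEMENTS = 3  # confirmed early reports needed to become trusted
CONFIRMATIONS_NEEDED = 2  # trusted reports that settle a message as spam


class InductiveTrust:
    """Untrusted reporters earn trust by reporting spam before it is settled."""

    def __init__(self, seed_users):
        self.trusted = set(seed_users)
        self.agreements = {}  # untrusted user -> count of confirmed reports
        self.pending = {}     # fingerprint -> reporters, in arrival order
        self.settled = set()  # fingerprints already confirmed as spam

    def report(self, user, fp):
        if fp in self.settled:
            return  # a late report adds no value, so it earns no credit
        reporters = self.pending.setdefault(fp, [])
        if user not in reporters:
            reporters.append(user)
        trusted_count = sum(1 for r in reporters if r in self.trusted)
        if trusted_count >= CONFIRMATIONS_NEEDED:
            self._settle(fp)

    def _settle(self, fp):
        self.settled.add(fp)
        for reporter in self.pending.pop(fp):
            if reporter in self.trusted:
                continue
            # The untrusted reporter matched the trusted verdict, and early.
            count = self.agreements.get(reporter, 0) + 1
            self.agreements[reporter] = count
            if count >= PROMOTION_AGREEMENTS:
                self.trusted.add(reporter)
```

In this model, the more trusted users there are, the sooner a message
is settled and the narrower the window in which a newcomer can earn
credit, which mirrors the robustness described above.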