My SpamBayes set-up notes

by Uche Ogbuji

SpamBayes was already installed (to /usr/bin as it happens, but the location doesn't matter as long as sb_filter.py and the other scripts are on your PATH). My first step was to set up two mail folders, "Junk" and "MaybeSpam", using my mail user agent (Evolution 2.0.2). If your MUA already has its own junk controls, as Evolution 2.x, Thunderbird and Apple Mail do, you may have special local Junk folders. You probably still want to create real IMAP folders on the server side (though see below for a note on a clash problem in Evolution).



I created a local SpamBayes config dir:



mkdir $HOME/.spambayes


I set up a config file by pasting the following into $HOME/.spambayes/spambayesrc:



[Storage]
persistent_use_database = True
persistent_storage_file = ~/.spambayes/hammiedb


I added a variable to my environment as follows (on the command line and in my bash profile file):



export BAYESCUSTOMIZE=$HOME/.spambayes/spambayesrc
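If you set up more than one account or machine, the three steps above (config directory, config file, environment variable) can be rolled into one re-runnable bash function. This is only a sketch: the function name setup_spambayes_config and the PROFILE variable (defaulting to ~/.bash_profile) are my own inventions, so point PROFILE at whatever profile file your shell actually reads.

```shell
#!/bin/bash
# Sketch of the config steps above as one function that is safe to re-run.
# setup_spambayes_config and PROFILE are hypothetical names, not SpamBayes ones.
setup_spambayes_config() {
    local dir="$HOME/.spambayes"
    local rc="$dir/spambayesrc"
    local profile=${PROFILE:-$HOME/.bash_profile}
    local line="export BAYESCUSTOMIZE=\$HOME/.spambayes/spambayesrc"

    mkdir -p "$dir"                      # no error if it already exists

    # Write the config only once so later hand edits are not clobbered
    if [ ! -e "$rc" ]; then
        cat > "$rc" <<'EOF'
[Storage]
persistent_use_database = True
persistent_storage_file = ~/.spambayes/hammiedb
EOF
    fi

    # Append the export line to the profile unless it is already there
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        echo "$line" >> "$profile"
    fi
    export BAYESCUSTOMIZE="$rc"          # also set it for the current shell
}
```

Run it once from a shell; the export at the end covers the current session, and the profile line covers later logins.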


The next step was to train SpamBayes, first creating the database:



sb_filter.py -n


You should see "Created new database in /home/uogbuji/.spambayes/hammiedb" or similar. If you already have folders of good mail and of spam, you can kick-start things by training on them:



sb_mboxtrain.py -d $HOME/.spambayes/hammiedb -g $MAILDIR/GoodMailFolder1 \
    -g $MAILDIR/GoodMailFolder2 -s $MAILDIR/Junk


Use the -g flag for folders that contain only good mail ("ham") and -s for folders that contain only spam; you can repeat either flag as often as you need. $MAILDIR stands for the directory holding your mail folders ($HOME in my case).
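With several ham and spam folders the command line gets long. A small bash wrapper can keep the folders in arrays and emit one -g or -s per folder. A minimal sketch, assuming hypothetical HAM_FOLDERS, SPAM_FOLDERS, SB_DB and DRY_RUN variables of my own (none of them is a SpamBayes setting); only the -d, -g and -s flags come from sb_mboxtrain.py itself:

```shell
#!/bin/bash
# Hypothetical helper: build one sb_mboxtrain.py call from two bash arrays.
# HAM_FOLDERS, SPAM_FOLDERS, SB_DB and DRY_RUN are illustrative names only.
sb_train() {
    local db=${SB_DB:-$HOME/.spambayes/hammiedb}
    local cmd=(sb_mboxtrain.py -d "$db")
    local f
    for f in "${HAM_FOLDERS[@]}"; do
        cmd+=(-g "$f")          # one -g per folder of known-good mail
    done
    for f in "${SPAM_FOLDERS[@]}"; do
        cmd+=(-s "$f")          # one -s per folder of known spam
    done
    if [ -n "$DRY_RUN" ]; then
        echo "${cmd[*]}"        # only show what would be run
    else
        "${cmd[@]}"             # actually train
    fi
}
```

For example, set HAM_FOLDERS=("$MAILDIR/GoodMailFolder1" "$MAILDIR/GoodMailFolder2") and SPAM_FOLDERS=("$MAILDIR/Junk"), run DRY_RUN=1 sb_train to eyeball the command, then run sb_train without it to train.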



Next I updated my .procmailrc with rules that pipe incoming mail through SpamBayes and file it according to the result. Something like the following:



MAILDIR=$HOME
LOGFILE=$MAILDIR/procmail.log
SPAMBAYESRC=$HOME/.spambayes/hammiedb

:0 fw:hamlock
| sb_filter.py -d $SPAMBAYESRC

#SpamBayes tests
:0:
* ^X-SpamBayes-Classification: spam
$MAILDIR/Junk

:0:
* ^X-SpamBayes-Classification: unsure
$MAILDIR/MaybeSpam

#Spamassassin tests, if you also have that
#:0:salock1
#* ^X-Spam-Flag:.*Y
#$MAILDIR/Junk

#:0:salock2
#* ^X-Spam-Status: Yes
#$MAILDIR/Junk

#Uncomment if you really must
#:0
#* ^(From|To|Sender):.*Cron.*
#/dev/null
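The two SpamBayes recipes key off a single header that sb_filter.py adds. For checking where an already-filtered message would end up, here is a rough bash mirror of that routing. It is a sketch, not part of procmail or SpamBayes: route_message, JUNK_FOLDER and UNSURE_FOLDER are hypothetical names, the unsure folder defaults to the MaybeSpam folder created at the start, and, as procmail does by default, it matches case-insensitively.

```shell
#!/bin/bash
# Rough mirror of the two SpamBayes procmail recipes: read one message on
# stdin and print the folder it would be filed in.
# route_message, JUNK_FOLDER and UNSURE_FOLDER are hypothetical names.
route_message() {
    local maildir=${MAILDIR:-$HOME}
    local header value
    # Look only at the header block, which ends at the first empty line
    header=$(sed -n '/^$/q;p' | grep -i -m1 '^X-SpamBayes-Classification:')
    value=${header#*:}                                  # drop the header name
    value=$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')
    value=${value# }                                    # drop the single space
    case $value in
        spam*)   echo "$maildir/${JUNK_FOLDER:-Junk}" ;;
        unsure*) echo "$maildir/${UNSURE_FOLDER:-MaybeSpam}" ;;
        *)       echo "DEFAULT" ;;      # procmail falls through to normal delivery
    esac
}
```

For example, route_message < saved-message.eml (a hypothetical file) prints $MAILDIR/Junk for a message tagged spam, and DEFAULT when neither SpamBayes recipe would match.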





Finally I added a cron job to keep training SpamBayes so that it can adapt to changing spam patterns. I added an entry such as the following to my crontab (using crontab -e):



# cron does not expand $VARIABLES in assignment lines, so set
# BAYESCUSTOMIZE on the command line instead (cron sets HOME itself)

# use /bin/sh to run commands, no matter what /etc/passwd says
SHELL=/bin/sh
10 6 * * * BAYESCUSTOMIZE=$HOME/.spambayes/spambayesrc sb_mboxtrain.py -d $HOME/.spambayes/hammiedb -g $HOME/GoodMailFolder1 -g $HOME/GoodMailFolder2 -s $HOME/Junk


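Be sure you don't train on MaybeSpam or any other mixed folder, either as spam or as ham: until you have sorted it by hand it holds both kinds of mail, and training on it would teach SpamBayes the wrong lessons. With all of the above in place, the set-up did the trick for me.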
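To make that rule harder to break by accident, a training wrapper could check its own arguments before calling sb_mboxtrain.py. A minimal bash sketch; check_training_args is a hypothetical helper, and treating any folder whose name starts with "Maybe" as unsorted is my assumption based on the folder names used here:

```shell
#!/bin/bash
# Hypothetical pre-flight check: fail if any folder given to -g or -s
# looks like an unsorted "maybe" folder (MaybeSpam, MaybeJunk, ...).
check_training_args() {
    local prev="" arg name
    for arg in "$@"; do
        if [ "$prev" = "-g" ] || [ "$prev" = "-s" ]; then
            name=${arg%/}               # ignore a trailing slash
            name=${name##*/}            # keep only the folder's own name
            case $name in
                Maybe*|maybe*)
                    echo "refusing to train on unsorted folder: $arg" >&2
                    return 1 ;;
            esac
        fi
        prev=$arg
    done
    return 0
}
```

In a wrapper script run from cron, you might build the argument list once in an array and run check_training_args "${args[@]}" && sb_mboxtrain.py "${args[@]}".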



Evolution gets in the way



Evolution 2.0.2, like many modern MUAs, provides fancy client-side spam filtering, but I ran into problems where this interfered with plain old IMAP folders. Evolution calls its spam folder "Junk" and seems to maintain it locally, even though it appears in the folder tree under the respective server (which I think is bad form). The problem is that it seems to mask any actual Junk folder you have on your IMAP server. Most MUAs have an option to override the location of special folders such as Sent, Trash and Junk, but Evolution does not seem to. Has anyone run into this problem and found a good work-around?



Side note: I've noticed in general that Evolution doesn't get along well with UW-IMAP, especially with the very strange IMAP set-up that comes with SuSE Linux (odd, considering Evo's provenance). Not much I can do about that now, since we've outsourced most systems administration.




Do you have any handy SpamBayes tricks for Linux?


1 Comment

Torres
2005-01-17 12:07:58
shell script for Courier and Thunderbird Maildirs
Maybe this works for other combinations ...
$HOME is the basedir for our virtual accounts.

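#!/bin/bash
# Train on every Maildir directly under $HOME that has a .Junk subfolder:
# messages in the Maildir itself count as ham, those in .Junk as spam.
for MDIR in "$HOME"/*/
do MDIR=${MDIR%/}
   if [ -d "$MDIR/cur" ] && [ -d "$MDIR/.Junk" ]
   then sb_mboxtrain.py -d "$HOME/.spambayes/hammiedb" -g "$MDIR" -s "$MDIR/.Junk"
   fi
done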