

Spam Filtering with Sendmail Milters and Greylisting

by Emmanuel Dreyfus

In the first part of this series, we surveyed spam filtering techniques: where in the email delivery chain filtering measures can operate, and which filtering techniques are currently available.

This article walks through the development of a spam filter, using milter-greylist, a greylisting plugin for Sendmail, as the example. We assume that the reader knows the C programming language reasonably well. A basic understanding of TCP/IP is also useful.

Sendmail and Milter

Sendmail made MTA-level filtering easy by introducing the Milter API. Milter is a contraction of the term "mail filter." Milters are small daemons that communicate with Sendmail through UNIX sockets or TCP/IP connections. They are easy to configure: you just add a few lines to the Sendmail configuration file (sendmail.cf). Here is an example that chains milter-regex and milter-greylist:

O InputMailFilters=regex,greylist
Xregex, S=local:/var/run/milter-regex/sock, F=T
Xgreylist, S=local:/var/milter-greylist/sock, F=T

O Milter.macros.connect=j, _, {daemon_name}, {if_name}, {if_addr}, {client_addr}
O Milter.macros.envfrom=i, {mail_mailer}, {mail_host}, {mail_addr}
O Milter.macros.envrcpt={rcpt_mailer}, {rcpt_host}, {rcpt_addr}

The first line lists the milters to invoke for each message. Here, filtering first uses regex, then greylist. Those names must match the names defined on the following lines, which start with an X.

The X lines define each milter's properties: how to contact the milter (here, through a local UNIX socket) and what Sendmail should do if the milter fails. F=T makes Sendmail return a temporary error, F=R makes it return a permanent error, and with no F= flag, mail passes through as if the filter did not exist. Timeout values, set with the T= field, are optional.

The remaining lines select which Sendmail macros to export to the milter. We will see how to use them when we deal with the actual implementation.

The milter design lets milters run either on the same machine as Sendmail or on remote hosts across the network. This makes it possible to build highly scalable setups, with farms of milter machines and load distributed through round-robin DNS or TCP redirection.

Milter Gallery

Many milters are already available for anti-spam, anti-virus, archival, accounting, and various other purposes. Here is a set of my favorites:

  • milter-regex filters mail by matching regular expressions against it. It can filter out attachments based on their headers (the Win32 executable header, for instance) or on their file extensions. Here is a sample milter-regex configuration file:

    reject "Sorry, we do not accept ZIP archives anymore"
    body /^(Content-Type: [^;]*; |  )name=".*\.zip"/ie
    body /^(Content-Disposition: attachment; |      )filename=".*\.zip"/ie

    It is also extremely useful when dealing with distributed denial-of-service attacks. If you can find a common pattern in the junk messages, you can filter them out with milter-regex.

  • milter-greylist is an anti-spam tool I wrote. It implements greylisting, and for now, it stops all of the spam without a single false positive.

    The principle is simple: when they get a temporary error, real MTAs wait for a while and retry sending the message. Spam engines do not. When milter-greylist receives a message, it refuses it with a temporary error and stores the tuple (source IP, sender address, recipient address) in a table. On the next attempt, if it finds the tuple in the table, it accepts the message.

    Of course, spammers could start retrying their messages. If this happens some day, we can force each message to wait for one hour before being accepted. A spammer who stays at the same address for one hour stands a good chance of appearing in a DNS-based blacklist before the second attempt.

    Whitelisting trusted senders, and auto-whitelisting tuples that have already passed the greylist, can also reduce the delay for legitimate mail.

  • milter-sender is a real-time sender address validator. It works by trying to send a message to the sender address of each incoming message. If that attempt gets a temporary error, milter-sender temporarily refuses the incoming message; if it gets a permanent error, the incoming message is refused permanently.

  • j-chkmail checks messages for forbidden attachment types and refuses them. It is very useful against viruses, and it produces fewer false positives than the one-line regular expression matching done by milter-regex.

There are also various milters that interface Sendmail with AMaViS, SpamAssassin, and many other tools, and several web sites maintain lists of available milters.

Writing Your Own Milter

Milters are linked with libmilter, which handles the communication with Sendmail. Milter authors just have to use the Milter API, by including <libmilter/mfapi.h> and linking with libmilter. Because libmilter relies on POSIX threads, milters must be linked with libpthread as well.

Starting Up

Writing a milter is surprisingly simple. Start by writing a daemon that parses its command-line options, detaches to the background, opens its log files, and so on. To specify the socket that will be used to communicate with Sendmail, use smfi_setconn():

smfi_setconn(socket);

where socket is a string, usually taken from the command line, that identifies the location of the socket. For a local socket, you can just use a filesystem path, or a string such as local:/var/milter-greylist/sock that matches the S= field in the Sendmail configuration.

The other required step is to fill in a struct smfiDesc with a collection of callbacks and pass it to libmilter through smfi_register():

struct smfiDesc smfilter = {
	"greylist",     /* filter name */
	SMFI_VERSION,   /* version code */
	SMFIF_ADDHDRS,  /* flags */
	mlfi_connect,   /* connection info filter */
	NULL,           /* SMTP HELO command filter */
	mlfi_envfrom,   /* envelope sender filter */
	mlfi_envrcpt,   /* envelope recipient filter */
	NULL,           /* header filter */
	NULL,           /* end of header */
	NULL,           /* body block filter */
	mlfi_eom,       /* end of message */
	NULL,           /* message aborted */
	mlfi_close,     /* connection cleanup */
	/* any later fields are zero-initialized (NULL) */
};

/* (some code) */

if (smfi_register(smfilter) == MI_FAILURE) {
	fprintf(stderr, "%s: smfi_register failed\n", argv[0]);
	exit(EXIT_FAILURE);
}

Once this is done, the program hands control over to libmilter by calling smfi_main(), which runs the milter's main loop and only returns when the milter shuts down:

return smfi_main();
