ONLamp.com


Spam Filtering with Sendmail Milters and Greylisting

Callbacks

Now, every time the server handles an email, libmilter will call the callbacks we registered through smfi_register(). In this example, mlfi_connect() is registered as the connection-time callback, so each time an SMTP client connects to the machine, libmilter invokes our mlfi_connect() function.



Here is the mlfi_connect() function for milter-greylist:

sfsistat
mlfi_connect(ctx, hostname, addr)
	SMFICTX *ctx;
	char *hostname;
	_SOCK_ADDR *addr;
{       
	struct mlfi_priv *priv;
	struct sockaddr_in *addr_in;

	if ((priv = malloc(sizeof(*priv))) == NULL)
		return SMFIS_TEMPFAIL;
		       
	smfi_setpriv(ctx, priv);
	bzero((void *)priv, sizeof(*priv));
	priv->priv_whitelist = EXF_UNSET;
		
	addr_in = (struct sockaddr_in *)addr;
	if ((addr_in != NULL) && (addr_in->sin_family == AF_INET))
		priv->priv_addr.s_addr = addr_in->sin_addr.s_addr;
		       
	return SMFIS_CONTINUE;
}

We have an opaque context pointer that libmilter will hand us on each callback for the same SMTP connection. libmilter uses it to store various pieces of information about the connection, including a user private pointer that we can use to store our own data. smfi_setpriv() and smfi_getpriv() set and retrieve this private pointer, respectively.

milter-greylist's mlfi_connect() starts by allocating some private memory for a mlfi_priv structure, which is defined like this:

struct mlfi_priv {
	struct in_addr priv_addr;
	char priv_from[ADDRLEN + 1];
	time_t priv_elapsed;
	int priv_whitelist;
	char *priv_queueid;
};

Our goal is to retrieve the tuple (source IP, sender email, recipient email), so mlfi_priv has some storage for this information. In mlfi_connect(), we store the client IP address in the priv_addr field of mlfi_priv.

Anatomy of an SMTP Transaction

Before moving further, let us look at the anatomy of an SMTP transaction. Lines starting with >>> are sent from the client to the server, and lines starting with <<< are sent from the server to the client.

<<< 220 mx1.example.net ESMTP Sendmail 8.12.10/jtpda-5.4 ready at Fri, 26 Mar 2004 15:23:56 +0100 (CET)
>>> HELO mail.example.com
<<< 250 mx1.example.net Hello mail.example.com [192.0.2.26], pleased to meet you
>>> MAIL FROM: <John.Smith@example.com>
<<< 250 2.1.0 <John.Smith@example.com>... Sender ok
>>> RCPT TO: <Reginald.Wesson@example.net>
<<< 250 2.1.5 <Reginald.Wesson@example.net>... Recipient ok
>>> DATA
<<< 354 Enter mail, end with "." on a line by itself
>>> From: <John.Smith@example.com>
>>> To: <Reginald.Wesson@example.net>
>>> Date: Fri, 26 Mar 2004 15:23:57 +0100 (CET)
>>> Subject: Test
>>>
>>> This is a test message
>>> .
<<< 250 2.0.0 i2QENuV9026193 Message accepted for delivery
>>> QUIT
<<< 221 2.0.0 mx1.example.net closing connection

More Callbacks

After mlfi_connect(), libmilter will invoke the following callbacks:

  • mlfi_envfrom(), after the MAIL FROM command is sent.
  • mlfi_envrcpt(), after each RCPT TO command is sent.
  • mlfi_eom(), at the end of the message, once the DATA phase is finished.
  • mlfi_close(), at connection close time.

Additionally, the following checkpoints could have callbacks, if we had registered them:

  • After receiving the HELO command
  • On each header
  • After the end of the headers
  • After each body block read
  • When a message transmission aborts

The Milter API documents all of the possible callbacks. In each of the callbacks, it is possible to call smfi_getpriv() to fetch the pointer to our private data, so we can read and modify it.
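
As an illustration of modifying the private data from a callback, here is a minimal sketch of how a sender address could be copied into the priv_from field without overflowing it. The store_sender() helper, the reduced sender_slot structure, and the value of ADDRLEN are assumptions made for this example; they are not taken from milter-greylist. In a real callback, the structure would come from smfi_getpriv() and the address from the callback's arguments.

```c
#include <stdio.h>
#include <string.h>

#define ADDRLEN 31	/* assumed value; the article does not give it */

/* Reduced stand-in for the article's struct mlfi_priv (hypothetical). */
struct sender_slot {
	char priv_from[ADDRLEN + 1];
};

/*
 * Copy the sender address into the private data.
 * Returns 0 on success, -1 if the address had to be truncated.
 * The buffer is always NUL-terminated.
 */
int
store_sender(struct sender_slot *slot, const char *sender)
{
	int n;

	n = snprintf(slot->priv_from, sizeof(slot->priv_from), "%s", sender);
	if (n < 0 || (size_t)n >= sizeof(slot->priv_from))
		return -1;

	return 0;
}
```

Reporting truncation to the caller lets the callback decide what to do with an oversized address, rather than silently storing a shortened one.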

Accepting or Rejecting

In each callback, the return value can cause Sendmail to reject the message either permanently (SMFIS_REJECT) or temporarily (SMFIS_TEMPFAIL). Returning SMFIS_CONTINUE carries on the transaction.

Depending on the callback, rejecting can have different meanings. For example, mlfi_envrcpt() is recipient-oriented. It can be called several times for a message that has several recipients. Rejecting one recipient will remove that recipient from the recipient list, but the message will still go through for the other ones.

In message-oriented callbacks, such as mlfi_eom(), rejecting causes the message to be rejected for all of the recipients.

Cleaning Up After a Message is Handled

Whatever happens to the message, the mlfi_close() callback will be called. This is the place to de-allocate private data. Failure to do so will cause a memory leak that will eventually crash the milter:

sfsistat
mlfi_close(ctx)
	SMFICTX *ctx;
{        
	struct mlfi_priv *priv;

	if ((priv = (struct mlfi_priv *) smfi_getpriv(ctx)) != NULL) {
		free(priv);
		smfi_setpriv(ctx, NULL);
	}

	return SMFIS_CONTINUE;
}

Multi-Threading

We complete our tuple in the mlfi_envrcpt() callback. We already have the source IP and the sender email stored in the mlfi_priv structure, and now we finally receive one recipient address.

This is the time for various checks, such as the whitelist check that milter-greylist's except_filter() function performs. This function is worth a few words. It walks a chained list of exceptions, looking for an entry matching the recipient address or the source IP:

LIST_FOREACH(ex, &except_head, e_list) {
	if (ex->e_type != E_RCPT)
		continue;

	if (emailcmp(rcpt, ex->e_rcpt) == 0) {
		found = 1;
		break;
	}
}

The LIST_FOREACH macro comes from <sys/queue.h>, along with a few other macros for defining and walking different kinds of chained lists. These macros are extremely useful, since they greatly reduce your ability to write bugs in chained-list code.
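
For readers who have not used <sys/queue.h>, the following self-contained sketch shows how such a list can be declared, filled, and walked. The structure layout and the except_add() and except_match() helpers are hypothetical simplifications, not milter-greylist's actual definitions, and strcasecmp() merely stands in for the emailcmp() function used above.

```c
#include <stddef.h>
#include <strings.h>
#include <sys/queue.h>

/* Hypothetical, simplified exception entry. */
struct except {
	LIST_ENTRY(except) e_list;	/* links to the neighbouring entries */
	const char *e_rcpt;		/* whitelisted recipient address */
};

/* Declares struct exceptlist, the type of the list head. */
LIST_HEAD(exceptlist, except);

/* Insert an entry at the head of the list. */
void
except_add(struct exceptlist *head, struct except *ex)
{
	LIST_INSERT_HEAD(head, ex, e_list);
}

/* Return 1 if rcpt matches an entry of the list, 0 otherwise. */
int
except_match(struct exceptlist *head, const char *rcpt)
{
	struct except *ex;

	LIST_FOREACH(ex, head, e_list) {
		if (strcasecmp(rcpt, ex->e_rcpt) == 0)
			return 1;
	}

	return 0;
}
```

The head must be initialized with LIST_INIT() before the first insertion. The macros hide all of the pointer handling, which is where hand-written list code usually goes wrong.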

Whether you use chained lists or fixed-size tables, you cannot freely read and write data shared among threads in a milter, because the code runs in a multi-threaded environment. Each time Sendmail handles a new message, it will make a new connection to the milter, where libmilter spawns a new thread to handle it. The milter may be processing several messages simultaneously.

It is therefore not safe to operate on shared data; another thread might be writing while we read, thus causing bugs. For instance, if we walk a chained list while another thread removes an item from it, we might jump out of the list and crash.

The workaround is locking. Each time we need to read some global data, we use a read lock. Each time we write to it, we use a write lock. The difference between read locks and write locks is that many threads can share a read lock, whereas only one thread can have a write lock.

In milter-greylist, we use lock macros to avoid bloating the code:

#define WRLOCK(lock) if (pthread_rwlock_wrlock(&(lock)) != 0) {           \
                syslog(LOG_ERR, "%s:%d pthread_rwlock_wrlock failed: %s", \
                    __FILE__, __LINE__, strerror(errno));                 \
                exit(EX_SOFTWARE);                                        \
        }

Before using the lock, it must be initialized. Do this by using pthread_rwlock_init() before calling smfi_main().

There are many other problems caused by multi-threading. For instance, milter-greylist has to write its database to a file when it is modified, so that after a restart it can resume operation where it halted. It is not possible to dump the database to a file from a callback, because another thread could attempt to do this at the same time. To work around this problem, dump.c devotes a single dumper thread to this operation. This thread starts (using pthread_create() from main()) before the smfi_main() call.

The dumper thread sleeps on a flag, using pthread_cond_wait(). Each time another thread modifies the database, it wakes the dumper thread by calling pthread_cond_signal(), and the dumper thread handles the job of flushing data to disk.
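
The pattern can be sketched as follows. This is not the code from dump.c: all of the names are hypothetical, and a counter stands in for the actual file write. The important points are that the flag is only touched with the mutex held, and that the wait sits in a loop that rechecks the flag, because pthread_cond_wait() may return without any signal having been sent.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t dump_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t dump_cond = PTHREAD_COND_INITIALIZER;
static int dump_dirty;	/* set when the database has been modified */
static int dump_stop;	/* set to ask the dumper thread to exit */
static int dump_count;	/* dumps performed (stand-in for file writes) */

/* Body of the dedicated dumper thread. */
void *
dumper_thread(void *arg)
{
	(void)arg;

	pthread_mutex_lock(&dump_mutex);
	for (;;) {
		/* Sleep until there is something to do. */
		while (!dump_dirty && !dump_stop)
			pthread_cond_wait(&dump_cond, &dump_mutex);

		if (!dump_dirty)
			break;	/* stop requested, nothing left to write */

		dump_dirty = 0;
		dump_count++;	/* a real milter would write the file here */
	}
	pthread_mutex_unlock(&dump_mutex);

	return NULL;
}

/* Called by any thread after it modifies the database. */
void
dump_request(void)
{
	pthread_mutex_lock(&dump_mutex);
	dump_dirty = 1;
	pthread_cond_signal(&dump_cond);
	pthread_mutex_unlock(&dump_mutex);
}

/* Ask the dumper thread to finish pending work and exit. */
void
dump_shutdown(void)
{
	pthread_mutex_lock(&dump_mutex);
	dump_stop = 1;
	pthread_cond_signal(&dump_cond);
	pthread_mutex_unlock(&dump_mutex);
}
```

Since only the dumper thread ever writes the file, two dumps can never overlap, and several modifications made in quick succession may be flushed by a single dump.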

Thread Safety and Third-Party Code

Last but not least, a milter must only call thread-safe functions from libraries. Any function that uses global variables or static memory is thread-unsafe. For instance, you have to use inet_ntop(3) instead of inet_ntoa(3).
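
To illustrate the difference: inet_ntoa(3) returns a pointer to a static buffer that every thread shares, whereas inet_ntop(3) writes into a buffer supplied by the caller. A minimal sketch follows; the format_ipv4() wrapper is a hypothetical helper written for this example.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Format an IPv4 address into a caller-supplied buffer.
 * Returns buf on success, NULL on failure (for instance if the
 * buffer is too small). INET_ADDRSTRLEN bytes are always enough.
 */
const char *
format_ipv4(struct in_addr addr, char *buf, socklen_t len)
{
	return inet_ntop(AF_INET, &addr, buf, len);
}
```

Each thread passes its own buffer, typically a local array of INET_ADDRSTRLEN bytes, so two threads formatting addresses at the same time cannot overwrite each other's result.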

Thread unsafety can be hard to guess. For instance, if your libc features a BIND4-based DNS resolver, using DNS resolver functions will lead to trouble. This kind of problem can be quite hard to anticipate, especially when linking with third-party libraries.

Fortunately, once it shows up, this kind of problem is easy to track down. After receiving a few messages, the milter will hang. At that time, if you attach gdb(1) to it and type the bt command (which prints a stack backtrace), you will typically see it stuck in the same code path every time. This code path is likely to contain a thread-unsafe function. Here is an example:

# ps -ax | grep milter-greylist
13694 ?? S     0:00.13 milter-greylist -p /var/milter-greylist/sock
# gdb milter-greylist
(gdb) attach 13694
0x4193f238 in recvfrom () from /usr/lib/libc.so.12
(gdb) bt
#0  0x4193f238 in recvfrom () from /usr/lib/libc.so.12
#1  0x418a43c0 in __pth_sc_recvfrom () from /usr/pkg/lib/libpthread.so.20
#2  0x418a2cfc in pth_recvfrom_ev () from /usr/pkg/lib/libpthread.so.20
#3  0x418a2a7c in pth_recv_ev () from /usr/pkg/lib/libpthread.so.20
#4  0x418a2a50 in pth_recv () from /usr/pkg/lib/libpthread.so.20
#5  0x418a4444 in recv () from /usr/pkg/lib/libpthread.so.20
#6  0x418a10fc in pth_poll_ev () from /usr/pkg/lib/libpthread.so.20
#7  0x418a0d44 in pth_poll () from /usr/pkg/lib/libpthread.so.20
#8  0x418a3d24 in poll () from /usr/pkg/lib/libpthread.so.20
#9  0x418799fc in res_send () from /usr/lib/libresolv.so.1
#10 0x41877ef4 in res_query () from /usr/lib/libresolv.so.1
#11 0x4184377c in SPF_dns_lookup_resolv (spfdcid=0x190caa0, 
    domain=0x182b290 "example.com", rr_type=16, should_cache=1)
    at spf_dns_resolv.c:139
#12 0x4183fc64 in SPF_dns_lookup (spfdcid=0x0, domain=0x1a1df98 "", 
    rr_type=64, should_cache=2) at spf_dns.c:57
#13 0x4184291c in SPF_get_spf (spfcid=0x1987c00, spfdcid=0x190caa0, 
    domain=0x182b290 "example.com", c_results=0x1a1f8c8) at spf_get_spf.c:76
#14 0x418423ec in SPF_result (spfcid=0x1987c00, spfdcid=0x190caa0, domain=0x0)
    at spf_result.c:376
#15 0x180b0e4 in spf_alt_check (in=0x0, 
    fromp=0x190c940 "<John.Doe@example.com>") at spf.c:126
#16 0x18022ec in mlfi_envfrom (ctx=0x0, envfrom=0x182b250)
    at milter-greylist.c:178
#17 0x180e820 in st_sender ()
#18 0x180de14 in mi_engine ()
#19 0x180c4dc in mi_handle_session ()
#20 0x180bd50 in mi_thread_handle_wrapper ()
#21 0x4189bf7c in pth_spawn_trampoline () from /usr/pkg/lib/libpthread.so.20
#22 0x41898990 in pth_mctx_set_bootstrap () from /usr/pkg/lib/libpthread.so.20
#23 0x418988dc in pth_mctx_set_trampoline () from /usr/pkg/lib/libpthread.so.20
#24 0x7fffefdc in ?? ()

If bt shows no function names, make sure the program was built with -g and that the binary was not stripped at installation.

The last function invoked before entering the libpthread machinery is res_send(3). A quick search on the Internet reveals that this function is not thread-safe in BIND4, which is what causes the problem. You must use a BIND8 resolver to work around it.

Reading Macros

From time to time, it is necessary to read some of Sendmail's macros by using smfi_getsymval(). This is how, for example, mlfi_envrcpt() reads the message queue ID:

if ((priv->priv_queueid = smfi_getsymval(ctx, "{i}")) == NULL) {
        syslog(LOG_DEBUG, "smfi_getsymval failed for {i}: %s", 
            strerror(errno));
        priv->priv_queueid = "(unknown id)"; 
}

This can read only macros explicitly exported in sendmail.cf using the O Milter.macros configuration lines.

Changing Headers

In order to make debugging easier, milter-greylist adds an X-Greylist header to every message it handles. The header explains whether the message was delayed and for how long, whether it was whitelisted and why, and so on. smfi_addheader() handles this; libmilter allows this call only from the end-of-message callback, which is mlfi_eom() here. The function takes the opaque context pointer, the header name, and the header value as arguments.

Conclusion

Milter is a scalable, easy-to-use solution for MTA-level filtering. The API is quite straightforward to use and hides very few pitfalls. It's easy to get started and to develop complex filtering techniques. It is a valuable tool in the battle against spam and viruses.

milter-greylist was really easy to implement. It took under a week to produce something that works (with a few bugs), and less than a month to complete version 1.0. I hope this article will help potential developers to produce more milters.

Thanks to John Klos for reviewing this article.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.

