Mail-Filtering Techniques

by Emmanuel Dreyfus

Internet email used to be a great tool, but it's currently crippled with annoyances -- unsolicited commercial email (also known as spam), viruses, and denial-of-service mail floods. Filtering email has become common. Today, it's hardly possible to use email and make your address public without some sort of spam and virus filtering tools.

Some filtering can be done on the client and some must be done on the server. This article studies how to filter email efficiently and without sacrificing reliability. A second part will focus on how to write a mail filter for Sendmail, the most comprehensive and widespread mail server on the Internet.

Internet Mail Background

Internet email today is built around three protocols: Simple Mail Transfer Protocol (SMTP), Post Office Protocol version 3 (POP3), and Internet Message Access Protocol version 4 (IMAP4). Older protocols for distributing email were common in the past. Some are still in operation for certain setups, but we will not cover them.

Several kinds of software implement the three protocols above. The Mail User Agent (MUA) is the mail client that the end user sees. MUAs include software such as Eudora, Netscape mail, Mozilla Thunderbird, Pegasus mail, or the infamous Outlook Express, a popular target for Windows viruses. There are also several webmail packages available, where the MUA runs on a web server, with just its user interface running on the user's machine through a web browser.

An MUA sends messages to mail servers using SMTP, and receives its mail from mail servers using POP3 or IMAP4. POP3 and IMAP4 are similar, with IMAP4 being more recent and more feature-rich than POP3.
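
As a rough illustration of the MUA side of this exchange, the sketch below builds a message and hands it to a mail server over SMTP with Python's standard smtplib module. The host name mail.example.org, the port, and the addresses are placeholders chosen for the example, not values from this article.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a minimal text message, as an MUA would before sending."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def submit(msg, host="mail.example.org", port=25):
    """Hand the message to the mail server over SMTP.

    host and port are placeholders. Many real servers also expect
    authentication and encryption, which this sketch leaves out.
    """
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Retrieving mail would go through a separate POP3 or IMAP4 connection, which is not shown here.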

The mail server runs a Mail Transfer Agent (MTA), such as Sendmail, Postfix, Qmail or Exim. Its job is to receive messages through SMTP and to route them to their destinations. If the destination is a local mailbox, the MTA uses a Mail Delivery Agent (MDA) to drop the message in a mailbox. If the destination is another machine, the MTA uses SMTP to contact another MTA on the destination mail server. This MTA will in turn use an MDA to store the message in a mailbox.

When it has to reach a mail server for a remote domain, an MTA needs to know that server's address. This information is available through the domain's MX (Mail eXchanger) record in the DNS. DNS acts as a directory, explaining how to send mail to any mail-enabled domain. The mail server listed in the MX record is usually known as the MX server.
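
A domain may publish several MX records, each with a preference value, and a sending MTA tries the host with the lowest value first. The sketch below shows only that ordering step. The DNS lookup itself is left out, and the records are hypothetical.

```python
def order_mx_hosts(mx_records):
    """Return MX host names in the order a sending MTA would try them.

    mx_records is a list of (preference, host) pairs as found in DNS
    MX records. A lower preference value means "try this host first".
    Hosts with equal preference are simply ordered by name here.
    """
    return [host for _preference, host in sorted(mx_records)]


# Hypothetical MX records for example.org.
records = [(20, "backup-mx.example.org"), (10, "mx.example.org")]
print(order_mx_hosts(records))
```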

Of course, in the real world, things are usually much more complicated. There can be multiple front-end MX servers that relay mail to multiple mail servers in the inner network.

Filtering Mail Today

Mail filtering can be done at three different levels.

Filtering at the MUA Level

The user's MUA can filter out viruses using anti-virus software, and spam using various techniques, including learning filters that try to learn what the user considers spam. While this method is the most flexible for the user, it suffers from several drawbacks:

  • The client must download (at least part of) all messages to filter. This can be quite annoying when dealing with virus floods, especially for users who connect via dial-up.

  • On large networks, the system administrator must ensure that anti-virus definitions are up to date on many workstations. This can range from painful for a large corporation to impossible for an Internet Service Provider (ISP).

Filtering at the MDA Level

MDA-level filtering solves those two problems. Because it happens on the server, it can destroy junk messages before the client has to download them. Maintaining centralized tools is also much easier.

MDA-level filtering has been the most popular way of filtering on the mail server for a while. It is easy because any MTA has to call an external program for local mail delivery. On UNIX systems, this means invoking a command such as mail, mail.local, or procmail. Filtering is easy -- just invoke a filter instead of the MDA and have the filter invoke the real MDA after it has completed its job.
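
A minimal sketch of such a wrapper follows. It assumes the real MDA is procmail at /usr/bin/procmail and uses a made-up header test in place of a real anti-virus or anti-spam check; both are placeholders for illustration only.

```python
import subprocess

# Assumption: location of the real MDA on this system.
REAL_MDA_COMMAND = ["/usr/bin/procmail"]


def looks_like_junk(raw_message):
    """Hypothetical stand-in for a real anti-virus or anti-spam check."""
    return b"X-Example-Junk: yes" in raw_message


def run_real_mda(raw_message):
    """Pipe the message to the real MDA and return its exit status."""
    return subprocess.run(REAL_MDA_COMMAND, input=raw_message).returncode


def filter_then_deliver(raw_message, is_junk=looks_like_junk, deliver=run_real_mda):
    """Act as the MDA: filter first, then call the real MDA.

    Returns the exit status that is reported back to the MTA.
    """
    if is_junk(raw_message):
        # The message is discarded here. Returning 0 makes the MTA
        # believe that delivery succeeded.
        return 0
    return deliver(raw_message)
```

The MTA would be configured to run a small script in place of the MDA. That script reads the message from standard input, for instance with sys.stdin.buffer.read(), and exits with the value returned by filter_then_deliver.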

This approach worked for some time, but turned out to have one major drawback: there's no user interaction at the MDA level! The filter cannot ask the user if it is safe to destroy a given message that could be spam or contain a virus. When the filter finds a suspicious message, it must notify the sender or the receiver so that the mail system remains reliable. If it notifies the receiver, the user will be flooded by notifications of non-delivery instead of being flooded by viruses and spam. This changes foreign junk mail into locally generated junk mail and does not really solve the problem.

If the MDA notifies the sender, then we hit a loophole in SMTP: it does not require sender authentication. It is trivial to forge an email with a random source address. Nowadays, almost any spam or virus will have a forged return address. Sending a notification to a forged sender results in mail being sent to a nonexistent address in the best case, and to a person who did not send the spam or virus in the worst case. This is not acceptable.

The only other option when working at the MDA level is to drop junk messages silently. This is not satisfying on the reliability front, since a false positive will be dropped without notification.

The other big problem with MDA-level filtering is that chaining different filters (for instance, an anti-virus filter and an anti-spam filter) is not straightforward at all. You must tell the first filter to invoke the second instead of the real MDA, and the second to invoke the real MDA. This can be quite complicated and difficult to troubleshoot.

MX-Level Filtering

Fortunately, a solution exists to these problems. We cannot trust the sender's email address, so we must avoid relying on it for notification of non-delivery. If filtering occurs at the MTA level on the domain's MX, then we are talking directly to whatever is sending the message: a real MTA on a real mail server, a spam engine, or a virus.

SMTP works with the concept of message responsibility. A server receives a message and flushes it to disk. It then tells the sending server that it has accepted the message. This transfers responsibility for the message to the receiver, and the sender may remove the message from its mail queue.

If for some reason (disk full, system crash, load too high, network outage, recipient unknown) the receiver MTA does not tell the sender that it accepted the message, the message remains the sender's responsibility. If the problem was permanent (recipient unknown, for instance), then the sender will have to send a notification of non-delivery. If the error was temporary, then the sender ought to retry sending the message later.
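
In SMTP terms, acceptance is signaled by a 2xx reply, a temporary failure by a 4xx reply, and a permanent failure by a 5xx reply. The sketch below maps a reply to the sender's next step as described above. The function name and the returned strings are illustrative, and treating an unexpected reply as temporary is an assumption of this sketch.

```python
def sender_next_action(reply_code):
    """Decide what a sending MTA does after offering a message.

    reply_code is the three-digit SMTP reply, or None if no reply
    arrived at all (system crash, network outage).
    """
    if reply_code is None:
        return "retry later"
    if 200 <= reply_code <= 299:
        # The receiver took responsibility for the message.
        return "remove from queue"
    if 400 <= reply_code <= 499:
        # Temporary problem, for example a full disk.
        return "retry later"
    if 500 <= reply_code <= 599:
        # Permanent problem, for example an unknown recipient.
        return "send non-delivery notification"
    # Assumption: treat any unexpected reply as temporary.
    return "retry later"


for code in (250, 452, 550, None):
    print(code, "->", sender_next_action(code))
```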

If we refuse a message that comes from a spam engine or a virus, we directly tell the spam engine or the virus that we refuse the message. The sender is not a real MTA, so it likely does not do error handling. Its job is just to flood the Internet with junk mail. It will probably generate no notification at all. This is good.

If the sender is a real MTA, it will send a delivery status notification to the message's return address, which should be the address of the actual message sender. We have exactly what we want.
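
Seen from the filtering MX, refusing a message simply means answering with a permanent error instead of an acceptance once the message has been received. The following sketch assumes two hypothetical checks and illustrative reply texts. How a filter plugs into a particular MTA is not shown.

```python
def reply_after_data(raw_message, filters):
    """Return the SMTP reply the MX gives after receiving a message.

    filters is a list of (name, is_junk) pairs. The first check that
    flags the message causes a permanent (5xx) refusal, so the message
    remains the sender's responsibility.
    """
    for name, is_junk in filters:
        if is_junk(raw_message):
            return f"550 5.7.1 Message refused ({name})"
    return "250 2.0.0 Message accepted for delivery"


# Hypothetical checks standing in for real anti-virus and anti-spam engines.
filters = [
    ("virus check", lambda raw: b"EXAMPLE-VIRUS-SIGNATURE" in raw),
    ("spam check", lambda raw: b"buy now" in raw.lower()),
]
```

Because every check runs inside the same SMTP conversation, adding another filter in this sketch is just one more entry in the list.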

Limitations of MX Filtering

There are, however, some minor problems with MX-level filtering.

  • It works only at the MX level. If the domain MX accepts a junk email with a forged sender address, there is no point in refusing it at another internal mail server, even at the MTA level. The sending server will be your domain's MX, and it will send a delivery status notification to the forged address.

  • If spam or a virus is sent to a mailing list where a recipient has a filtering MX, then the list owner will receive the delivery status notification. This happens because the list server accepted the junk email once and sent it to the list. The problem does not exist on the filtering MX but on the list server, and should really be fixed there. The only other way to avoid this problem is to drop messages silently, which is unacceptable on the reliability front.

    Additionally, list maintainers can use MDA- or MUA-based filters to deal automatically with delivery status notifications. These use a standard format that is easy to handle automatically.

  • When messages are relayed through forwarding to a filtering MX, we have the same issue. The mail server that forwards the junk mail will send a delivery status notification to a possibly forged address. Again the problem is not on the filtering MX, but on the forwarding host that accepted some junk mail, so it should be fixed there.

  • The same problem occurs again when receiving spam or viruses from an open relay. Again, the fix is the same: fix or blacklist the open relay.
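
To illustrate how a list maintainer might process delivery status notifications automatically, here is a sketch that extracts the failed recipients from one. It assumes the notification uses the standard multipart/report layout with a message/delivery-status part containing Final-Recipient and Action fields; the function name is made up for this example.

```python
import email


def failed_recipients(raw_dsn):
    """Return the addresses a delivery status notification reports as failed."""
    msg = email.message_from_string(raw_dsn)
    failed = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue  # malformed report, nothing to extract
        # Python's parser exposes each group of status fields as a
        # nested message object.
        for block in blocks:
            action = (block.get("Action") or "").strip().lower()
            recipient = block.get("Final-Recipient")
            if action == "failed" and recipient:
                # The value looks like "rfc822; user@example.org".
                failed.append(recipient.split(";", 1)[-1].strip())
    return failed
```

A list server could use the returned addresses to count failures per subscriber instead of forwarding every notification to the list owner.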
