In my last article, Order from Chaos with Procmail, I covered the basics of a software package many O'Reilly readers already know: Procmail. Operating on an email server, Procmail permits both system-wide and user-specific filtering rules. The effect of these rules includes filing messages in a folder by the contents in a Subject: or From: line, and firing off a string of processes that use the message as a trigger. Like many aspects of system administration, the process of coming up with useful and innovative Procmail recipes could benefit from a certain amount of shoulder surfing. Here, then, are a handful of Procmail recipes that I've found useful. Hopefully, they will trigger some useful ideas in your own Procmail development. Before you get started, however, read the Trying It Out section of my previous article.

Pager Alerts with Formail

This recipe falls under the category of "leash enhancement." Remember, just because you have near instantaneous notification of that email from someone in the marketing department, it doesn't mean you have any more or less obligation to respond to it. It's nice to know, though, that you're up to speed on what's going on with your systems. No hidden surprises in your inbox is a good thing.

For brevity, I'll assume that you send messages to your pager via SMTP email. In real life, it would probably be better to pass the message to a script that first tries to page you by dialing out directly to your pager provider, then tries the Simple Network Paging Protocol (SNPP), and falls back on email only as a last resort. Here's an example of the rule:

SUBJECT=`formail -xSubject:`
:0c
*^(Sender|From|Cc):.*\
(\
Admin-User|\
ajilon.com|\
buntain@|\
compassnet.com|\
dartw@|\
vorlon@\
)
|(formail \
-A"X-Rule: grand tour" \
-I"Subject: <KwM-Procmail> \
${SUBJECT}" -A"X-Loop: kwm@themullets.net" )|\
${SENDMAIL} ${PagerCNAME}

Let's look at each part of this rule in detail. Procmail isn't picky about formatting (except, obviously, within the content of a regular expression), but I've broken this rule up into several lines for a couple of reasons. One reason is that it's just plain easier to read. The other reason is that when I'm editing it, I can stick a blank line after it, and sort the whole mess in vi with something along the lines of <esc>!}sort -f.

  1. SUBJECT=`formail -xSubject:`

    The content of the message's Subject: header line is extracted into the variable SUBJECT, using Formail. You'll find that as your procmailrc grows, you will use Formail in an increasing number of individual recipes. You will also gravitate toward using it to munge errant email folders, with shell scripts that split folders up into individual messages and stream them into another fix-it script.

  2. :0c

    This sequence begins the recipe. Each recipe begins with a colon (:). The 0 indicates that an unspecified number of conditions follow (Procmail counts the lines that begin with * for itself), and c tells Procmail to work on a carbon copy, so that regardless of the outcome of this recipe, the message is applied to subsequent recipes as well.

              
  3. * ^(Sender|From|Cc):.*\
    (\
    Admin-User|\
    ajilon.com|\
    buntain@|\
    compassnet.com|\
    dartw@|\
    vorlon@\
    )

    This condition isolates messages whose Sender:, From:, or Cc: line mentions any of six users or domains that might be important. Procmail views this regular expression as if it's all on one line starting with an * and ending with the final ). Bear in mind that most of these domains or users aren't specified in bulletproof style. The ajilon.com string could be just as easily met by the address I am ajilon.com <root@command.com>, as it could be by the intended <anyone@ajilon.com>.

  4. |(formail \
    -A"X-Rule: grand tour" \
    -I"Subject: <KwM-Procmail> \
    ${SUBJECT}" -A"X-Loop: kwm@themullets.net" )|\
    ${SENDMAIL} ${PagerCNAME}

    This is the part of the recipe that does all the work. This action line (again, Procmail sees this split line as one logical line) uses Formail to rewrite the headers of the copy, prepending the string <KwM-Procmail> to the subject, and sends the result to ${PagerCNAME}, a variable we would have defined earlier to point to the SMTP address of the pager. If you do this kind of pager notification, it's probably a good idea to add a string to the beginning of the subject, or somehow change the message before it reaches your pager, so you can distinguish between messages your boss (or whoever) sends to you as conventional email, and pages she sends directly to your pager herself.
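To see how loose the ajilon.com fragment of the condition is, you can try it outside Procmail. The following is a minimal sketch that uses grep -iE as a stand-in for Procmail's matcher (the two are similar but not identical). The sample header lines and the "tight" pattern are my own assumptions for illustration, not part of the recipe, and even the tighter pattern can be fooled by an address planted in the display name.

```shell
#!/bin/sh
# Hypothetical sample From: lines; the first is the spoof from the text.
spoof='From: I am ajilon.com <root@command.com>'
real='From: Someone <anyone@ajilon.com>'

# The recipe's loose fragment: the string may appear anywhere on the line.
loose='^(Sender|From|Cc):.*ajilon.com'
# A tighter variant (an assumption, not the article's pattern): the domain
# must follow an @, the dot is escaped, and the domain must end there.
tight='^(Sender|From|Cc):.*@ajilon\.com([^a-z0-9.-]|$)'

matches() {  # usage: matches PATTERN LINE
    printf '%s\n' "$2" | grep -qiE "$1"
}

for line in "$spoof" "$real"; do
    for pat in "$loose" "$tight"; do
        if matches "$pat" "$line"; then verdict='match'; else verdict='no match'; fi
        printf '%-9s %s   <-  %s\n' "$verdict" "$pat" "$line"
    done
done
```

The loose pattern matches both lines, while the tighter one matches only the real address. Whether the extra strictness is worth the loss of readability is a judgment call for each entry in the list.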

The ability to carbon-copy messages to your pager is pretty pedestrian, though. This is Unix we're talking about, and you would be correct to expect Procmail to accomplish a great deal more than your run-of-the-mill client-side filtering app can.

Nested Recipes

Occasionally, you'll find that two or more recipes might start off with the same condition, but may have different additional conditions that need to be evaluated before action can be taken. Since Procmail runs on each mail message you receive, it is a good idea to streamline your Procmail recipe file so that it has the least amount of redundancy. One way of doing this is with nested recipes. Let's say you're on a mailing list and you want to save all messages from the list in a folder named for the day of the week, retain copies of posts made by your friend Claudia in your inbox, and forward copies of the messages to three friends who couldn't be bothered to join the list but still want to read it.

The following recipe begins with a single condition: checking for mail sent to the capslock list. Any email passing that condition would then be evaluated by each of the three nested recipes inside the braces. First, a copy of every list message is filed in a folder named after the current day of the week. Putting the command date +"%a" in backquotes in the folder name replaces the command with its output, which would be Sun, Mon, Tue, Wed, and so on. Regular use of this recipe would result in folder names like capslock-l.Mon.auto, capslock-l.Tue.auto, etc., giving you a good way to check and see what happened last Wednesday on your favorite mailing list.
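If you want to preview the folder name before trusting a recipe with it, the same substitution can be run by hand. This is only a sketch of the naming step. Pinning the locale with LC_ALL=C is my own addition, on the assumption that you want the English abbreviations regardless of how the account running Procmail is configured.

```shell
#!/bin/sh
# Build the folder name the way the backquoted command in the recipe would.
# LC_ALL=C pins the abbreviations to English (Sun, Mon, Tue, ...); without it,
# the names follow the locale of the account that runs the command.
day=$(LC_ALL=C date +"%a")
folder="capslock-l.${day}.auto"
echo "$folder"
```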

Notice that the first two nested recipes start with :0c. As in the example above, that tells Procmail to apply the message to that recipe, but regardless of the outcome, keep applying it to subsequent recipes.

The second nested recipe, just like the first one, has no conditions attached to it. It takes all messages that come its way (which would only be capslock-L messages) and pipes them through Formail to add four header lines: three reminding the recipients that they aren't actual members of the capslock-L list, and one used for loop prevention. The resulting message is then piped through sendmail and sent to your three friends who want to see the list traffic but don't want to join the list.

Finally, the third nested recipe watches for posts from Claudia and copies them to your inbox, known to Procmail as $DEFAULT.

:0
# Is this a message from the capslock list?
* ^(To|Cc):.*capslock-L@lists.psu.com
{
   :0c
   # file a copy in a file named for the day of the week
   capslock-l.`date +"%a"`.auto

   :0c
   |(formail -A"X-Rule: Carbon-Copy from Kevin." \
          -A"X-Loop: kwm@themullets.net" \
          -A"X-Rule: If you wish to stop getting these." \
          -A"X-Rule: tell kwm@themullets.net" )|\
   ${SENDMAIL} huey@ageek.org duey@ageek.org louis@ageek.org

   :0
   * ^(From|Sender):.*claudia@unt.net
   ${DEFAULT}

}

The larger and more complex your Procmail recipe file grows, the greater the likelihood that something bad will happen. I'm not saying this to discourage you from using Procmail, only that it's a lot like a high-school driver's education class: You're bound to knock over a few trash cans and clip a few curbs.

The All-Important Loop Prevention

One of the most embarrassing things you can do is create a wildly spinning mail loop that uses up all available resources on one or more hosts. One sure-fire way to all but eliminate that possibility is to add a loop-prevention line to each message.

Notice that in the first two rules, I added the following header line to the message: X-Loop: kwm@themullets.net. This line is the key to a two-step loop-prevention strategy.

First, place something like this as the very first recipe in your Procmail recipe file:

# discard anything that I've
# already processed once.
:0
* ^X-Loop:\ kwm@themullets.net
/dev/null

You can assume that any message you receive that includes the loop-prevention header line has already been processed once, and thus you can safely destroy any looping copy you get.

An alternative to this would be to avoid such loops in every other recipe your mail might encounter:

# do some random thing, but avoid loops
:0
* !^X-Loop:\ kwm@themullets.net
* whatever condition you wanted to have
some action

Trapping loops in every other recipe, however, is not only more cumbersome but more resource intensive. In the first example, in which we trap for loops once and discard them, a looping message encounters one rule and is swiftly dispatched. In the second example, where loops are trapped at every recipe, each looping message must wander the full length of your recipe file before it is dispatched. Not good.

The second step to avoiding loops would be to use Formail (as in the previous examples) to ensure that the "X-Loop: user@site.something" header line is inserted in all email that goes anywhere but the default inbox. In fact, it might be better to add it to all messages to avoid any irregularities in the future.
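Before relying on the loop trap, it can be reassuring to check by hand which saved messages it would catch. The sketch below approximates the trap's condition with sed and grep. The function name and the two sample messages are hypothetical, and grep is only a stand-in for Procmail's own matcher. Like Procmail's default, it looks at the header only, so an X-Loop line quoted in a message body doesn't count.

```shell
#!/bin/sh
# Succeeds (exit 0) if the message on stdin already carries our X-Loop header.
# Only the header is inspected: sed stops at the first blank line.
LOOP_ADDR='kwm@themullets\.net'   # address from the article, dot escaped

has_loop_header() {
    sed '/^$/q' | grep -qiE "^X-Loop: ${LOOP_ADDR}"
}

# Hypothetical test messages: one with the header, one that merely quotes it.
looped=$(printf 'From: a@example.com\nX-Loop: kwm@themullets.net\n\nbody\n')
fresh=$(printf 'From: a@example.com\nSubject: hi\n\nX-Loop: kwm@themullets.net\n')

check() {  # usage: check LABEL MESSAGE
    if printf '%s\n' "$2" | has_loop_header; then
        echo "$1: caught by the loop trap"
    else
        echo "$1: continues through the recipe file"
    fi
}

check looped "$looped"
check fresh "$fresh"
```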

Another common email frustration is reserved for those underprivileged folks who must download their email using POP3 rather than IMAP. If you connect to the Internet over dial-up, and don't want to wait two hours for the latest version of the dancing baby movie to download into your mailbox, you can intercept large messages and file them for access when you've got more bandwidth, time, or compulsion for minutiae.

Abbreviating Large Messages

Let's say that you're about to go on vacation. Since this puts you at a distance from your beloved DSL connection, you prefer to auto-file your large messages to a folder, leaving only a stub of the message that you send to your commodity Webmail account. The rule in the example below does this.

This rule also makes use of nested recipes. The condition for entry specifies that the message (header plus body) should be larger than 10,240 bytes. If that criterion is met, a copy of the message is filed in the folder bigmessages.auto. Another copy of the message is piped through a command sequence that replaces the body with a notice that says the larger message has been removed. It then forwards the much smaller mail on to your Yahoo account to notify you that there's a large unread message waiting.

:0
* >10240
{
   :0c
   bigmessages.auto

   :0
   |((sed -e '/^$/,$d';\
      echo "";\
      echo "Body removed by procmail (ouch!)")|\
    formail -A"X-Loop: kwm@themullets.net" \
            -A"X-Rule: Big messages aside") |\
    ${SENDMAIL} somerandomuserid@yahoo.com

}
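The sed and echo sequence in the action line is easy to try on its own, which is a good idea before pointing it at real mail. Here is a minimal sketch that wraps the same three commands in a function. The function name and the sample message are made up for the demonstration, and Formail and sendmail are left out so that it runs anywhere.

```shell
#!/bin/sh
# Reads a message on stdin; writes its header, a blank line, and a stub body.
stub_message() {
    sed -e '/^$/,$d'      # delete from the first blank line to the end
    echo ""               # restore the header/body separator
    echo "Body removed by procmail (ouch!)"
}

# Hypothetical sample message with a two-paragraph body.
printf 'From: a@example.com\nSubject: big one\n\nline 1\n\nline 2\n' | stub_message
```

The output is the two header lines, a blank line, and the one-line notice. Everything after the first blank line of the original is gone.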

Weight, Weight! I Need that Message!

Streaming email into your mailbox isn't a black and white proposition; it's filled with shades of gray. Sometimes, it's not the kind of content in your email that makes you want to file it, it's the amount. For those occasions, there's the Procmail weighted scoring facility.

This rule uses the weighted scoring technique described in the procmailsc(5) man page. I'll let you discover many of the nuances of this technique, but the following example will provide you with bragging rights over anyone who claims that their email filtering method is just as flexible and robust as Procmail. I've never run across any other means of mail filtering (client or host-based) that had a facility to match Procmail's weighted scoring technique.

The general idea is that weighted scoring uses a new kind of condition with the format * w^x condition, where "w" and "x" are numbers and "condition" is a standard Procmail regular expression or rule. The "w" is the weight, and "x" is the exponent. The first time the condition matches in the header and/or body (depending on the flags you provide with the recipe), the weight w is added to the score. The second time the condition matches, w*x is added. The third time, w*x*x is added. You get the picture. If the final score is greater than zero, the recipe is considered a match and the action line is performed.
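The arithmetic is easier to get a feel for with a few numbers plugged in. The following is a rough sketch of the sum described above, using awk so that fractional exponents work. The function name is my own, and it deliberately ignores the refinements documented in procmailsc(5), so treat it as a calculator for the basic rule rather than a model of Procmail itself.

```shell
#!/bin/sh
# score W X N: total contributed by one "* W^X condition" line that matched
# N times, following the rule above: W + W*X + W*X*X + ... (N terms).
score() {
    awk -v w="$1" -v x="$2" -v n="$3" 'BEGIN {
        total = 0; term = w
        for (i = 0; i < n; i++) { total += term; term *= x }
        print total
    }'
}

score 1 1 3      # exponent 1, every match counts the same: prints 3
score 1 0 3      # exponent 0, only the first match counts: prints 1
score 2 0.5 3    # diminishing returns, 2 + 1 + 0.5: prints 3.5
```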

What follows is a weighted scoring recipe that files in the roundfile all messages that mention "budget" more than five times. The B flag tells Procmail to match the conditions that follow (two, in this case) against the body rather than the header, and the h flag tells it to pass the header to the action line. Note that h given on its own, without b, passes only the header. The first condition (* -5^0) has a weight (-5) and a zero for the exponent, but no constituent condition (such as a regular expression or a message size to match). That is a common way to start a weighted scoring recipe with a fixed value. Here, it starts the recipe with a score of -5, and a null condition (which is always considered true). The next condition (* 1^1 [Bb][Uu][Dd][Gg][Ee][Tt]) tells Procmail to increase the score by one every time it sees the word "budget" with any variation in capitalization. Finally, if the score is greater than zero at the end of the conditions, the message will be filed in the folder roundfile.budget.

:0Bh
* -5^0
*  1^1 [Bb][Uu][Dd][Gg][Ee][Tt]
roundfile.budget

I tested three messages containing the word "budget." The first message had three instances of the word, the second had five instances, and the third had ten. Running Procmail in verbose mode, the test produced the results that follow this paragraph. Note that the debugging output for each scoring condition provides two numbers: the total for that condition, and the running total for the recipe. Displayed in this way, you can see that each recipe starts with a score of -5. The first message, with three instances of "budget," completed the second scoring condition with a score of -2, and failed the recipe. Since the recipe failed, the message went to the default mailbox, /var/spool/mail/kwm. The second message, with five instances of "budget," completed with a score of 0; still too low to proceed on to the action line. The final message, with ten instances of "budget," completed with a score of 4 and was stored in the roundfile.budget folder. (Only nine of its ten instances were counted, apparently because one of them sits in the Subject: line, which the B flag does not search.)

procmail: Assigning "DEFAULT=/var/spool/mail/kwm"
procmail: Score:      -5      -5 ""
procmail: Score:       3      -2 "[Bb][Uu][Dd][Gg][Ee][Tt]"
procmail: Locking "/var/spool/mail/kwm.lock"
procmail: Assigning "LASTFOLDER=/var/spool/mail/kwm"
procmail: Opening "/var/spool/mail/kwm"
procmail: Acquiring kernel-lock
procmail: Unlocking "/var/spool/mail/kwm.lock"
>From kwm@themullets.net Fri Nov 17 08:49:31 2000
 Subject: b-word 3 times
  Folder: /var/spool/mail/kwm                          1111
procmail: Assigning "DEFAULT=/var/spool/mail/kwm"
procmail: Score:      -5      -5 ""
procmail: Score:       5       0 "[Bb][Uu][Dd][Gg][Ee][Tt]"
procmail: Locking "/var/spool/mail/kwm.lock"
procmail: Assigning "LASTFOLDER=/var/spool/mail/kwm"
procmail: Opening "/var/spool/mail/kwm"
procmail: Acquiring kernel-lock
procmail: Unlocking "/var/spool/mail/kwm.lock"
>From firstrecipe@atomicconsulting.com Fri Nov 17 08:49:52 2000
 Subject: b-word five times
  Folder: /var/spool/mail/kwm                          1212
procmail: Assigning "DEFAULT=/var/spool/mail/kwm"
procmail: Score:      -5      -5 ""
procmail: Score:       9       4 "[Bb][Uu][Dd][Gg][Ee][Tt]"
procmail: Assigning "LASTFOLDER=roundfile.budget"
procmail: Opening "roundfile.budget"
procmail: Acquiring kernel-lock
>From kwm@themullets.net Fri Nov 17 08:49:31 2000
 Subject: lots and lots of budget, ten actually, including header
  Folder: roundfile.budget                             939
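The Score lines above can be approximated by hand, which is a handy sanity check when a scoring recipe surprises you. The sketch below counts "budget" in the body only and adds the -5 starting value. The function name and the sample message are hypothetical, grep -o is a common extension rather than strict POSIX, and counting the body alone is my reading of the B flag, chosen because it agrees with the 9 logged for the message whose subject claims ten.

```shell
#!/bin/sh
# budget_score: read a message on stdin, count case-insensitive occurrences of
# "budget" in the body only (everything after the first blank line), and print
# the recipe total: -5 plus 1 per occurrence.
budget_score() {
    hits=$(sed '1,/^$/d' | grep -oi 'budget' | wc -l)
    echo $(( -5 + hits ))
}

# Hypothetical message: one "budget" in the Subject, nine in the body.
printf 'Subject: budget talk\n\nbudget Budget BUDGET\nbudget budget budget\nbudget budget budget\n' | budget_score
```

For this sample the function prints 4, the same total the log shows for the third test message.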

This is a very simple example used to illustrate weighted scoring. If you like this feature, spend a few minutes with the procmailsc(5) man page. Weighted scoring comes close to doubling the power of Procmail. Be careful, though, not to spaghetti-code yourself out of a human-readable recipe file.

Procmail Resources

Of course, there's much more to learn about Procmail. I've just scratched the surface of Procmail's features and capabilities. If you want to learn more, nearly all of the best Procmail sites are accessible from either the Procmail Home Page or the Procmail Mini-FAQ.