Troubleshooting with Postfix Logsby Kyle Dent, author of Postfix: The Definitive Guide
Postfix provides a number of ways to get information that can be helpful when you are trying to diagnose a problem. The standard logging often provides more than adequate information for understanding problems and determining their cause. This article discusses Postfix logging in general, how to find all of the relevant information in the logs, and finally, how to increase the amount of logging when more information is needed. Before we get to logging, let's consider a few configuration guidelines that might help prevent problems from the start.
Change One Thing at a Time
Postfix is a complex system. If you make several changes and your system stops working as expected, you'll have a difficult time figuring out which change caused the problem.
Save Known Working Configurations
Be sure that you can always revert to an earlier known working configuration. Either make copies of files before you work on them or, even better, use version control with your configuration files.
soft_bounce Parameter for Testing Changes
When trying out a configuration change you're not sure about, you can enable
soft_bounce, so that mail that would bounce is kept in the queue and can still be delivered once the problem is corrected. In your main.cf file, simply add the following:
soft_bounce = yes
Besides queuing problem messages, enabling
soft_bounce causes Postfix to convert permanent rejection codes (5xx) to temporary ones (4xx). When you enable
soft_bounce, it's important to check your logs frequently. When you find a warning or see that a message has been rejected, you can fix your configuration problem and successfully receive the message next time the delivery is attempted.
warn_if_reject with Relay Restrictions
Chapter 11 of Postfix: The Definitive Guide discusses anti-spam configuration information. As you build up your set of blocking rules, you can preface those you're not sure about with
warn_if_reject to have the rule log a warning instead of rejecting the message.
For example, a restriction such as
reject_unknown_hostname rejects too many legitimate messages, in most situations. If you want to see what effect it will have in your environment without actually rejecting messages, preface it with
smtpd_helo_restrictions = permit_mynetworks warn_if_reject reject_unknown_hostname ...
Messages that would have been rejected by the restriction are allowed to go through, but a warning is logged. As with
soft_bounce, you have to watch your log file to know what effect a
warn_if_reject-modified restriction will have.
Whenever something is not working as expected, the first step for
troubleshooting the problem is checking your log. Postfix logs all operations
whether successful or not. Postfix uses your system's
to write messages to your log file. Because system logging differs on different
platforms, you should review your system documentation to learn about
syslog. Most likely, you can find where Postfix logging occurs by
checking your system's /etc/syslog.conf file. In the examples below,
we'll assume that the log file is located at /var/log/maillog.
If Postfix doesn't seem to be logging anything, read over your
documentation carefully. A common problem is that many
implementations won't create a log file if it doesn't already exist. If your
system logger is configured to write mail log entries to a file that doesn't
exist, you should first create the file and then restart your logging daemon.
Another problem new administrators sometimes encounter is that some
information is missing from the log. If any of your Postfix components are
chrooted, you have to take special steps to be sure that they can write
entries to your log. In particular, the syslog socket file must be
available within the Postfix chroot directory. If you are not familiar with Unix
chroot, how it works, and what system resources Postfix needs, you're
better off not using this feature.
chrooting is configured in the
master.cf file and is discussed in Chapter 4 of Postfix: The Definitive Guide.
Postfix provides a tremendous amount of valuable information as it runs. Get
in the habit of reviewing your logs frequently. Look particularly for errors and
warnings. Unix utilities such as
awk, and Perl are invaluable tools to
manipulate, filter, and extract precise information from your log files. You can
run the following simple command regularly to pull out any log entries reporting
$ egrep '(reject|warning|error|fatal|panic):' /var/log/maillog
Since Postfix does not run as a single process, but rather as several
different components, each handling a different aspect of mail delivery (see
Chapter 2 of Postfix: The Definitive Guide), each component writes its own log entries.
For example, when Postfix is initialized, the
master daemon records that
it is starting up.
Nov 22 09:07:03 scallop postfix/master: daemon started -- version 2.0.16
All Postfix components log entries following the same general pattern. The
figure below shows the parts of the
master initialization log entry
broken into its components.
The timestamp and hostname portions are obvious enough and I'll leave them out to simplify examples in the rest of this article. The Postfix component identification includes its process ID in square brackets. The last portion of the entry is the message itself and is basically freeform. Each component writes its own relevant information. Browsing through your own mail log entries should give you a good idea of the activity of your Postfix server.
When you are trying to solve a particular problem, you may need more than just one line from your log file. Often newcomers to Postfix don't look at the complete picture in the log file when trying to understand why something doesn't work or why a message wasn't delivered as expected. The next two sections discuss how to get the whole picture from your logs.
master Doesn't Know Everything
You may find messages reported by the
master daemon indicating a
failure that occurred earlier and wonder why they're not more informative:
postfix/master: warning: process /usr/libexec/postfix/smtpd pid 3163 exit status 1 postfix/master: warning: /usr/libexec/postfix/smtpd: bad command startup -- throttling
master daemon launches other Postfix components, its
warnings report that a particular component had a problem, but the
daemon doesn't know what the problem was. In this example, you can see that it's
smtpd daemon, running under process ID 3163, that actually encountered
a problem. With this information, you can find the error as reported by the
component itself. More than likely, you will find earlier entries in your log
file to see the description of the error from the component that reported it.
grep can give you the exact lines you're looking for:
$ grep 3136 /var/log/maillog postfix/smtpd: fatal: open database /etc/postfix/aliases.db: No such file or directory
In this case, the message clearly reports that a missing file, /etc/postfix/aliases.db, is the cause of the problem.
The Queue ID Tells All
If you want to know the fate of any particular message handled by Postfix,
your log has the answer again. When a message enters the Postfix system, whether
over the network or locally, it is immediately assigned a queue ID. You can
trace its complete journey by looking at all the log entries that contain the
queue ID. For example, when a client connects to your server to deliver a
smtpd daemon logs the following.
postfix/smtpd: connect from mail.example.com[192.168.100.45] postfix/smtpd: 9634B16F7F7: client=mail.example.com[192.168.100.45]
The first line records the initial connection. The next line shows you the queue ID that has been assigned to this message, 9634B16F7F7. When other Postfix components handle the message, they reference the same queue ID and record it with their logging entries. To find out what happened to this message in your system, pull out all of the log entries that include its queue ID:
$ grep 9634B16F7F7 /var/log/maillog postfix/smtpd: 9634B16F7F7: client=localhost[127.0.0.1] postfix/cleanup: 9634B16F7F7: message-id=<20031227171249.9634B16F7F7@scallop.example.com> postfix/qmgr: 9634B16F7F7: from=<email@example.com>, size=368, nrcpt=1 (queue active) postfix/smtp: 9634B16F7F7: to=<firstname.lastname@example.org>, relay=none, delay=46, status=deferred (connect to 126.96.36.199[188.8.131.52]: No route to host)
In this case, the
smtp client tells us that it could not relay the
message to its next hop because of a networking problem. Viewing this message in
the queue shows the same queue ID:
$ mailq Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient------- 9634B16F7F7 346 Sun Nov 22 09:01:08 email@example.com (connect to 184.108.40.206[220.127.116.11]: No route to host) firstname.lastname@example.org
A successfully delivered message looks like the following:
postfix/smtpd: 4F9D116F7F7: client=localhost[127.0.0.1] postfix/cleanup: 4F9D116F7F7: message-id=<20031228154034.4F9D116F7F7@scallop.example.com> postfix/qmgr: 4F9D116F7F7: from=<email@example.com>, size=344, nrcpt=1 (queue active) postfix/smtp: 4F9D116F7F7: to=<firstname.lastname@example.org>, relay=mail.example.com[18.104.22.168], delay=11, status=sent (250 Ok: queued as 51DCA20DD70)
In this case, the message was accepted by the host mail.example.com. Once delivered, the message becomes the responsibility of the receiving system.
If you experience a problem with a particular site or sites but find no
errors in your log, you can have Postfix increase the level of logging for the
problem site with
debug_peer_list. Postfix will then record additional
information, such as the entire conversation between the client and the server.
The parameter takes a list of sites for which Postfix should increase the
logging level according to the value set in
debug_peer_level. To get
more information on problems with a site that receives mail for the
example.com domain, set the parameters as follows.
debug_peer_list = example.com debug_peer_level = 2
After reloading Postfix and exchanging mail with the site, check your log
to see what transpires between your site and example.com. You can increase
debug_peer_level to get increasingly more information logged.
To increase logging for a particular component, add the
-v option to
entries in master.cf. For example, if you want to see more information
when Postfix sends messages over the network, add
-v to the
smtp component. You can add multiple
-v options to increase
the level of logging.
smtp unix - - n - - smtp -v -v
Because increased logging can generate a tremendous amount of information, don't set high logging levels during your normal operation. Increase your logging levels temporarily to solve a problem and then restore your configuration to normal logging levels once you have resolved the issue.
If you run into mysterious problems that aren't explained by log entries,
Postfix makes it easy to run with a debugger. Problems with your hardware or
kernel might manifest themselves in Postfix since it is a long-running, active
program. You can add the
-D option to components in master.cf.
See the DEBUG_README file that comes with the Postfix distribution for
detailed information on running tracers and debuggers with Postfix programs.
Email errors may seem mysterious when they crop up, but having good information is the key to understanding your problem. Working with the information Postfix provides, you can quickly isolate and solve problems.
Kyle Dent currently works at PARC (formerly Xerox PARC) where he is focused on creating novel technologies that help people interact with information, and uses cognitive scientific techniques to model and evaluate this interaction.
O'Reilly & Associates recently (in December 2003) released Postfix: The Definitive Guide.
For more information, or to order the book, click here.
Return to ONLamp.com.