Although Postfix can be configured to run 1000 SMTP client
processes at the same time, it is rarely desirable that it makes
1000 simultaneous connections to the same remote system. For this
reason, Postfix has safety mechanisms in place to avoid this
so-called "thundering herd" problem.
The Postfix queue manager implements the analog of the TCP slow
start flow control strategy: when delivering to a site, send a
small number of messages first, then increase the concurrency as
long as all goes well; reduce concurrency in the face of congestion.
- The initial_destination_concurrency parameter (default: 5)
  controls how many messages are initially sent to the same destination
  before adapting delivery concurrency. Of course, this setting is
  effective only as long as it does not exceed the process limit and
  the destination concurrency limit for the specific mail transport
  channel.
- The default_destination_concurrency_limit parameter (default:
  20) controls how many messages may be sent to the same destination
  simultaneously. You can override this setting for specific message
  delivery transports by taking the name of the master.cf entry
  and appending "_destination_concurrency_limit".
  Examples of transport-specific concurrency limits are:
  - The local_destination_concurrency_limit parameter (default:
    2) controls how many messages are delivered simultaneously to the
    same local recipient. The recommended limit is low because delivery
    to the same mailbox must happen sequentially, so massive parallelism
    is not useful. Another good reason to limit delivery concurrency
    to the same recipient: if the recipient has an expensive shell
    command in her .forward file, or if the recipient is a mailing list
    manager, you don't want to run too many instances of those processes
    at the same time.
  - The default smtp_destination_concurrency_limit of 20 seems
    enough to noticeably load a system without bringing it to its knees.
    Be careful when changing this to a much larger number.
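As an illustration, the parameters in the list above could be written
out in main.cf as sketched below. The first four settings only restate
the defaults quoted above, so they are redundant in practice. The
transport name "bulk" and the value 10 are hypothetical: they assume
that master.cf contains an entry named "bulk", and they are not a
recommendation.

```
# main.cf (sketch; these four values are the defaults quoted above)
initial_destination_concurrency = 5
default_destination_concurrency_limit = 20
local_destination_concurrency_limit = 2
smtp_destination_concurrency_limit = 20

# Hypothetical per-transport override: master.cf entry name "bulk"
# followed by "_destination_concurrency_limit".
bulk_destination_concurrency_limit = 10
```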
The above default values of the concurrency limits work well
in a broad range of situations. Knee-jerk changes to these parameters
in the face of congestion can actually make problems worse.
Specifically, large destination concurrencies should never be the
default. They should be used only for transports that deliver mail
to a small number of high volume domains.
A common situation where high concurrency is called for is on
gateways relaying a high volume of mail between the Internet
and an intranet mail environment. Approximately half the mail
(assuming equal volumes inbound and outbound) will be destined
for the internal mail hubs. Since the internal mail hubs will be
receiving all external mail exclusively from the gateway, it is
reasonable to configure the gateway to make greater demands on the
capacity of the internal SMTP servers.
The tuning of the inbound concurrency limits need not be trial
and error. A mail hub capable of high volume should be able to easily
handle 50 or 100 (rather than the default 20) simultaneous connections,
especially if the gateway forwards to multiple MX hosts. When all
MX hosts are up and accepting connections in a timely fashion,
throughput will be high. If any MX host is down and completely
unresponsive, roughly one in N connection attempts waits out the
full timeout, so the average connection latency rises to at least 1/N
* $smtp_connect_timeout, if there are N MX hosts. This limits
throughput to at most the destination concurrency * N /
$smtp_connect_timeout.
For example, with a destination concurrency of 100 and 2 MX
hosts, each host will handle up to 50 simultaneous connections. If
one MX host is down and the default SMTP connection timeout is 30s,
the throughput limit is 100 * 2 / 30 ~= 6.7 messages per second. This
suggests that high volume destinations with good connectivity and
multiple MX hosts need a lower connection timeout; values as low
as 5s or even 1s can be used to prevent congestion when one or
more, but not all, MX hosts are down.
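The throughput bound used above can be written out step by step. The
symbols C, N, T and \bar{t} are shorthand introduced here, not Postfix
names. The sketch assumes that connection attempts are spread evenly
over N MX hosts, that exactly one host is unresponsive, and that
delivery through a healthy host takes negligible time compared with
the timeout.

```latex
% C = destination concurrency, N = number of MX hosts,
% T = SMTP connection timeout in seconds,
% \bar{t} = average time spent per delivery attempt
\[
  \bar{t} \;\ge\; \frac{1}{N}\,T
  \qquad\Longrightarrow\qquad
  \mathrm{throughput} \;\le\; \frac{C}{\bar{t}} \;\le\; \frac{C \cdot N}{T}
\]
% Worked values for C = 100 and N = 2
\[
  T = 30\,\mathrm{s}:\quad \frac{100 \cdot 2}{30} \approx 6.7\ \mathrm{messages/s}
  \qquad
  T = 5\,\mathrm{s}:\quad \frac{100 \cdot 2}{5} = 40\ \mathrm{messages/s}
\]
```

Under these assumptions, lowering the timeout from 30s to 5s raises
the ceiling sixfold without any change in concurrency.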
If necessary, set a higher transport_destination_concurrency_limit
(in main.cf since this is a queue manager parameter; replace
"transport" with the name of the master.cf entry) and a lower
smtp_connect_timeout (with a "-o" override in master.cf since
this parameter has no per-transport name) for the relay transport
and any transports dedicated to specific high volume destinations.
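A minimal sketch of such a configuration for the relay transport
follows. The values 100 and 5s are taken from the ranges discussed
above and are illustrative, not recommendations. The connection
timeout parameter is spelled smtp_connect_timeout in postconf(5). The
master.cf columns shown are an assumption: keep the flags and any
existing "-o" lines that your own relay entry already has, and add
only the timeout override.

```
# main.cf: queue manager parameter, so it is set here
relay_destination_concurrency_limit = 100

# master.cf: per-service override, since the timeout has no
# per-transport parameter name
relay     unix  -       -       n       -       -       smtp
    -o smtp_connect_timeout=5s
```

Run "postfix reload" after editing either file so that the changes
take effect.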