Building a Postfix-based mail system for incoming and outgoing email, capable of successfully sending one million emails per day

I needed to build an updated mail system for a client which would handle all incoming and outgoing email, and which could reliably send an average of one million emails per day. I based it on Postfix, which is known for reliability, robustness, security, and relative ease of administration. This is an ambitious target at a time when it is difficult for independent mail servers sending high volumes of email to establish a positive reputation.

Just to be clear before I go any further: these outgoing emails are not junk or spam emails. These are legitimate marketing emails to users, managed ethically and legally and with full consent of the site users. I should also point out that I’ve changed, redacted or removed any details in the information below and the data shared on GitHub which could pose any security issues.

This mail infrastructure was built in an AWS environment, but it would apply equally to any other hosting environment, whether cloud-based or in a more traditional hosting environment. The operating system used was CentOS Linux version 7, but this setup could be adapted to any Red Hat type release with minimal changes, and also to any Debian/Ubuntu type Linux distribution with a little more work.

Accompanying files on GitHub

All the config files, script files and cron jobs mentioned in this article can be found in the postfix_mail_system folder in my GitHub repository so you can examine the setup in more detail, use whatever bits and pieces are useful for you, and modify things as needed.

Mail system overview

Mail system overview diagram

The main mail server, also known as the mail exchanger or MX server, is identified as “mx1”. The company’s MX record in the DNS is pointed at this machine, and it handles all incoming and outgoing mail for the company with the exception of large mail-outs from the site.

mx1 relays company mail to Google Workspace (previously known as G Suite, and prior to that Google Apps) for staff to access via Gmail.

I also built a mail cluster dedicated to handling large marketing mail-outs from the site. This mail cluster consists of multiple machines identified as “mail1”, “mail2”, etc. The mail contents are generated by an application running in the queue cluster and passed to the mail cluster to send the mail.

All other internally generated mail, i.e. non-bulk mail from the queue cluster plus mail from all other internal machines, is passed to mx1 for outbound delivery. Outgoing non-bulk mail is separated out this way for fast delivery, as the queues can get very big on the machines in the bulk mail cluster so delivery takes longer.

All mail servers are instances running within EC2 on AWS. The instance type should feature a reasonable balance of CPU and memory. At the moment I use instance type c4.large which provides just under 4 GB of memory and 2 vCPUs, which is sufficient for Postfix and associated tools to run comfortably and process significant quantities of email without trouble.

All instances are configured with Elastic IPs, meaning that they have allocated static IP addresses. Elastic IPs are needed to ensure a consistent IP address for building up mail reputation, and also to enable reverse DNS which is vital for helping to prevent outgoing emails being marked as spam. Once an Elastic IP is set up, it’s necessary to complete this form in order to remove AWS mail sending limitations and to enable AWS to assign the reverse DNS as required.

Outgoing mail distribution

Bulk Mail Servers for large mail-outs

Large site mail-outs are sent directly to the cluster of bulk mail servers from the application running on the queue servers. Email delivery is distributed evenly across the mail cluster using round-robin DNS, meaning that the A record for the mail cluster contains all the mail server IP addresses, and every few minutes it cycles to a different IP address in the cluster ensuring balanced delivery.

Other outgoing email

Other emails from internal servers are processed via the local Postfix daemon on those servers, which is configured to relay all mail to the MX Server via SASL authentication for external delivery.

Software

Postfix

The main mail system software, i.e. the Mail Transfer Agent or “MTA”, is Postfix. The version of Postfix currently in use, provided as a standard package by this version of CentOS, is 2.10. I’ll go into some detail below on the significant config files to explain what they’re used for and how they work. All these files are available in this folder in my GitHub repo.

/etc/postfix/main.cf is the main configuration file. It’s a large file so it would be impractical to go into detail here describing all aspects of it. However, much of it is fairly standard and it’s also fairly self-explanatory with large sections of comments explaining how to alter the configuration as needed. There is also of course much documentation on the Postfix web site and elsewhere on the Internet to help with the configuration of a new Postfix setup.

Much of what matters in main.cf is the set of references to configuration files and lookup tables that control important details of Postfix’s operation, so I will go into more detail below on the most important of these. Later I will also describe specific sections of main.cf as needed to explain other aspects of the mail system.

/etc/postfix/master.cf is the master process file. Most of the configuration can usually be left as standard, though I added interface configuration for spawning pypolicyd-spf as needed (see below for more on pypolicyd-spf):

#
# pypolicyd-spf for SPF checking
#
policyd-spf unix - n n - 0 spawn user=nobody argv=/bin/python /usr/libexec/postfix/policyd-spf

This special interface is referenced from /etc/postfix/main.cf, the main configuration file.

/etc/postfix/header_checks is used to strip internal mail headers on outgoing mail for privacy/security reasons:

# Strip internal headers on outgoing mail for privacy/security
/^Received: (by|from) .*\.internal\.company\.com/ IGNORE

I also use /etc/postfix/header_checks to add subject lines to the logs which is very useful when diagnosing delivery issues:

# Add subject lines to mail logs
/^[Ss]ubject:/ WARN
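
The effect of these two rules can be checked offline before deploying them. The following is a minimal Python sketch, not part of the mail system itself: it applies equivalent patterns using Python's re module, whose syntax is close to but not identical to Postfix's regexp and pcre tables. The function name classify_header and the sample hostnames are assumptions for illustration, and the dots in the domain are escaped so that they only match literal dots.

```python
import re

# Patterns modelled on the header_checks rules above. Postfix matches
# header_checks patterns case-insensitively by default, hence re.IGNORECASE.
RULES = [
    (re.compile(r"^Received: (by|from) .*\.internal\.company\.com", re.IGNORECASE), "IGNORE"),
    (re.compile(r"^Subject:", re.IGNORECASE), "WARN"),
]


def classify_header(line):
    """Return the action of the first matching rule, or None if no rule matches."""
    for pattern, action in RULES:
        if pattern.search(line):
            return action
    return None


if __name__ == "__main__":
    # Hypothetical header lines, for illustration only.
    samples = [
        "Received: from app1.internal.company.com (app1 [10.0.0.5])",
        "Received: from mail.example.org (mail.example.org [203.0.113.7])",
        "Subject: Your weekly newsletter",
    ]
    for header in samples:
        print(classify_header(header), "<-", header)
```

On a real server, postmap -q with the header_checks table is the more faithful way to test a rule, since it uses Postfix's own matching engine.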

/etc/postfix/helo_reject_domains helps to stop spam by rejecting our company domain, and other suspicious oddities, in the HELO greeting. For example:

# Reject if HELO contains our domain
company.com REJECT Invalid name

/etc/postfix/relay and /etc/postfix/transport are used to provide the mechanism for emails to the company domain to be relayed on to Google Workspace. First of all, relay simply contains:

company.com relay

Then transport is a little more specific:

company.com relay:aspmx.l.google.com

/etc/postfix/virtual_domains is used to specify the company domain so that Postfix knows to accept and process this domain. Postfix oddly expects this file to have two columns, but it’s sufficient to put a meaningless comment in the second column:

company.com #a_comment_goes_here

/etc/postfix/virtual_aliases is used to specify treatment of specific email addresses arriving for our company domain (as specified in virtual_domains). The first two entries will pipe emails to custom scripts, as described later. After that go entries for additional aliases which will relay email on to accounts set up at Google Workspace:

bounces@company.com             bounces
donotreply@company.com          donotreply
info@company.com                bill@company.com
enquiries@company.com           dave@company.com
# Etc…

/etc/aliases is used to control how email aliases are processed locally. It contains important addresses such as root, and also bounces@ and donotreply@ so these addresses are piped to the appropriate scripts for processing (more information on this below):

# Dealing with bounces and donotreply email
bounces:        "| /usr/local/bin/bouncedmail.py"
donotreply:     "| /usr/local/bin/autoreply.pl /etc/postfix/donotreply.txt"

It’s necessary to run newaliases after the aliases file has been edited.

/etc/postfix/sender_access is used to manually blacklist and whitelist sender addresses:

spammer@dodgydomain.com REJECT

For some of these configuration files it’s necessary to run postmap after editing them, to convert them into the .db lookup table format required. If unsure, checking how they’re referenced in main.cf will confirm whether to create a .db lookup table or not.

pypolicyd-spf

pypolicyd-spf is installed as a CentOS package and is spawned as needed by Postfix (see above for interface definition) for SPF record checking of incoming mail on the MX Server, which is important for helping to prevent spam. The config can be found in this folder in my GitHub repo.

The configuration file is located at /etc/python-policyd-spf/policyd-spf.conf. It is currently set to block failed SPF authentication attempts for HELO checks and Mail From checks, which is the default configuration:

HELO_reject = SPF_Not_Pass
Mail_From_reject = Fail

The following sections in /etc/postfix/main.cf handle communication with pypolicyd-spf:

smtpd_recipient_restrictions =
  # SPF checking via policyd-spf
  check_policy_service unix:private/policyd-spf

policyd-spf_time_limit = 3600

OpenDKIM

OpenDKIM is installed as a CentOS package on all mail servers and runs as a process which Postfix communicates with to add DKIM authentication to outgoing email. On the MX Server it also checks incoming mail and adds a header to indicate the DKIM authentication result. The configuration files are in this folder in my GitHub repo.

The opendkim-genkey command is used to generate the private key for signing and the corresponding public key which must be entered into a DNS TXT record. The corresponding key files are stored in /etc/opendkim/keys.

The file /etc/sysconfig/opendkim specifies parameters for starting the service. The main configuration file is /etc/opendkim.conf and this references other config files in the /etc/opendkim folder. The most important key definitions for signing messages are in /etc/opendkim/KeyTable:

trusted._domainkey.company.com company.com:trusted:/etc/opendkim/keys/trusted.private

And in /etc/opendkim/SigningTable:

* trusted._domainkey.company.com

The DNS TXT record trusted._domainkey.company.com then contains the public key as previously generated, so if for example the public key is “abcdef123456UVWXYZ” then the TXT record will be:

trusted._domainkey.company.com. IN TXT "v=DKIM1; k=rsa; p=abcdef123456UVWXYZ"

The following section in /etc/postfix/main.cf handles communication with the OpenDKIM process:

# DKIM
smtpd_milters           = inet:127.0.0.1:8891
non_smtpd_milters       = $smtpd_milters
milter_default_action   = accept
milter_protocol         = 2

Postgrey

Postgrey is installed as a CentOS package and runs as a process which Postfix communicates with to provide greylisting on the MX Server. The configuration can be found in this folder in my GitHub repo.

Postgrey runs with arguments configured in /etc/sysconfig/postgrey and is currently set to greylist new connections for five minutes, and to automatically whitelist multiple connections from the same client address:

POSTGREY_OPTS="--auto-whitelist-clients --delay=300"
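
To make the behaviour of these two options concrete, here is a simplified Python sketch of the greylisting idea. It is not Postgrey's implementation: the class name Greylist, the in-memory dictionaries and the injectable clock are assumptions for illustration, and the whitelist threshold of 5 is assumed to match Postgrey's default for --auto-whitelist-clients. A message is deferred until the same (client IP, sender, recipient) triplet has been seen for at least the delay period, and a client that has had enough messages accepted skips greylisting altogether.

```python
import time


class Greylist:
    """Toy greylist keyed on the (client IP, sender, recipient) triplet."""

    def __init__(self, delay=300, auto_whitelist_after=5, clock=time.time):
        self.delay = delay                              # seconds, as in --delay=300
        self.auto_whitelist_after = auto_whitelist_after
        self.clock = clock                              # injectable for testing
        self.first_seen = {}                            # triplet -> time of first attempt
        self.passed = {}                                # client IP -> accepted message count

    def check(self, client_ip, sender, recipient):
        """Return 'DUNNO' to let the message through or 'DEFER' to greylist it."""
        if self.passed.get(client_ip, 0) >= self.auto_whitelist_after:
            return "DUNNO"                              # client is auto-whitelisted
        triplet = (client_ip, sender, recipient)
        now = self.clock()
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.delay:
            return "DEFER"                              # too soon, ask the client to retry
        self.passed[client_ip] = self.passed.get(client_ip, 0) + 1
        return "DUNNO"
```

A legitimate mail server retries after the temporary failure and gets through on a later attempt, whereas much spam software never retries.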

It references a client whitelist file at /etc/postfix/postgrey_whitelist_clients and a recipient whitelist file (used for postmaster@ and abuse@ addresses by default) at /etc/postfix/postgrey_whitelist_recipients. It’s possible also to manually add incoming servers to a client whitelist using the file /etc/postfix/postgrey_whitelist_clients.local.

The following section in /etc/postfix/main.cf handles communication with Postgrey:

smtpd_recipient_restrictions =
  # Greylisting via postgrey
  check_policy_service unix:postgrey/socket

pflogsumm

pflogsumm is installed as part of the postfix-perl-scripts CentOS package. It generates mail reports and stats from the Postfix logs which are sent via email. It runs daily from my cron script /etc/cron.daily/pflogsumm (also available here in my GitHub repo):

#!/bin/bash

# Create temp files and define variables; remove the temp files on exit

tmpfile1=$(mktemp)
tmpfile2=$(mktemp)
trap 'rm -f "$tmpfile1" "$tmpfile2"' EXIT
today="$(date +%Y%m%d)"
yesterday="$(date -d yesterday +%Y%m%d)"

# Concatenate recent log files together for processing

zcat "/var/log/maillog-${yesterday}.gz" > "$tmpfile1"
cat "/var/log/maillog-${today}" >> "$tmpfile1"

# Grep out subject lines as they cause problems

grep -v "warning: header Subject:" "$tmpfile1" > "$tmpfile2"

# Run pflogsumm on yesterday's logs from processed log files and email results to tech team

/usr/sbin/pflogsumm -d yesterday --verbose_msg_detail --detail 20 --bounce_detail 20 --deferral_detail 20 --reject_detail 20 --smtpd_warning_detail 20 --no_no_msg_size "$tmpfile2" | mail -s "Mail stats from $(hostname -s)" techteam@company.com

Scripts

There are a number of proprietary scripts written to add additional functionality to the mail system, as described below. These are available in this folder in my GitHub repo.

Autoreply

The Perl script /usr/local/bin/autoreply.pl is used to generate autoreply messages, using the text from /etc/postfix/donotreply.txt as the message body in each case. The script is invoked from the definitions in /etc/aliases, which in turn are referenced from /etc/postfix/virtual_aliases as needed (see above). This is used on the MX Server.

Bounced mail

The Python script /usr/local/bin/bouncedmail.py is installed on all mail servers to handle mail which has bounced back to bounces@company.com, via the relevant Postfix configuration (/etc/postfix/virtual_aliases and /etc/aliases as described above). It extracts the message_id from the email headers along with the body of the email, and inserts both into a database table which can later be queried as needed. This can be modified as per individual requirements.
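
As a rough illustration of that flow, here is a minimal Python sketch. It is not the bouncedmail.py from the repository: the function name store_bounce, the table name bounces and the use of SQLite are all assumptions chosen so that the example is self-contained, and a production version would write to whatever database the application uses.

```python
import email
import sqlite3
from email import policy


def store_bounce(raw_message, conn):
    """Parse a raw message (bytes) and insert its Message-ID and body into `bounces`."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    message_id = str(msg.get("Message-ID", ""))
    # Prefer a plain-text part; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    conn.execute("CREATE TABLE IF NOT EXISTS bounces (message_id TEXT, body TEXT)")
    conn.execute(
        "INSERT INTO bounces (message_id, body) VALUES (?, ?)",
        (message_id, body),
    )
    conn.commit()
    return message_id
```

Because Postfix pipes each message to the aliased command on standard input, a script built around this function would pass it sys.stdin.buffer.read() together with an open database connection.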

Blocking SASL authentication failures

The Bash script /usr/local/bin/block_sasl_fail.sh is installed on the MX Server to block IP addresses that are trying to determine the SASL password for Postfix in order to use our mail system for spamming. Its database resets once per day as a result of log rotation.

It runs every five minutes from /etc/cron.d/block_sasl_fail and scans the mail log for SASL authentication failures, extracts the IP addresses and puts them in /etc/postfix/access_dynamic so that Postfix can dynamically block the addresses.
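
The core of that scan can be sketched in a few lines of Python. This is an illustration rather than a port of block_sasl_fail.sh: the function names and the threshold of three failures are assumptions, and only IPv4 addresses are handled. The regular expression targets the warning that Postfix's smtpd logs when a SASL login fails.

```python
import re
from collections import Counter

# Matches Postfix smtpd warnings such as:
#   warning: unknown[203.0.113.9]: SASL LOGIN authentication failed: ...
SASL_FAIL = re.compile(
    r"warning: \S+\[(\d{1,3}(?:\.\d{1,3}){3})\]: SASL \S+ authentication failed"
)


def offending_ips(log_lines, threshold=3):
    """Return sorted IPv4 addresses with at least `threshold` SASL failures."""
    counts = Counter()
    for line in log_lines:
        match = SASL_FAIL.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def access_table(ips):
    """Render the addresses as lines for a Postfix access table."""
    return "".join(f"{ip} REJECT\n" for ip in ips)
```

Since main.cf references the file as hash:/etc/postfix/access_dynamic, the generated text would still need to be passed through postmap before Postfix sees the new entries.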

SASL authentication for internal connections

For added security and protection against open relaying, mail being relayed from one internal machine to another uses SASL to authenticate. This is enabled in the following section in /etc/postfix/main.cf:

# Turn on SMTP authentication
smtpd_sasl_auth_enable = yes

There are other options and procedures relating to SASL setup but I won’t go into more specific detail in order to avoid the tiny risk of compromising any security on the system. It’s relatively easy to set up and configure in the way that you need, and there’s a starting point in the documentation on the Postfix website.

Encryption

TLS encryption for connections between mail servers is quite standard nowadays. To set it up you’ll need one or more certificates to cover all mail servers. For example, if all mail server names are in the subdomain internal.company.com, buying a wildcard certificate for *.internal.company.com would cover all mail servers. TLS is configured in /etc/postfix/main.cf as follows. These are fairly standard configuration options which I found worked for me:

# TLS
smtp_tls_CAfile = /etc/ssl/certs/ca-bundle.crt
smtp_tls_security_level = may
smtp_tls_note_starttls_offer = yes
smtp_tls_loglevel = 1
smtpd_tls_CAfile = /etc/ssl/certs/ca-bundle.crt
smtpd_tls_cert_file = # redacted for security
smtpd_tls_key_file = # redacted for security
smtpd_tls_security_level = may
smtpd_tls_loglevel = 1
smtpd_tls_received_header = yes

Again, if you need more help setting up TLS for Postfix, the documentation on the Postfix website has a comprehensive starting point.

Anti-spam measures for incoming mail

In addition to measures already discussed, there are numerous additional parameters to prevent spam and implement other important authentication and security requirements, as defined in /etc/postfix/main.cf on the MX Server. Some key examples are as follows:

  • Restriction of relaying to internal machines (vitally important in order not to be used as an open relay).
  • Sleeping for 20 seconds during negotiation to help deter bots used for spamming.
  • Requirement for proper HELO message during negotiation.
  • Rejection of invalid HELO syntax or non-fully-qualified domains in HELO.
  • Rejection of senders without fully-qualified domains.
  • Rejection of senders with invalid DNS.
  • Rejection of unauthorised pipelining (sending multiple SMTP commands in a batch without waiting for a response to each).
  • Rejection of recipient domains that are not fully-qualified or have invalid DNS.
  • Use of Realtime Blackhole Lists – Barracuda Central and Spamhaus – to block known blacklisted IPs.

The full list of restrictions and permissions for anti-spam, anti-relay, authentication and security in main.cf is as follows. These are divided into sections for client restrictions, HELO restrictions, sender restrictions and recipient restrictions. These are all commented so they are reasonably self-explanatory:

# Don't allow full transaction before rejection
# otherwise dynamic SASL abuse blocking won't work
# (drawback is less detailed logging for rejected messages)
smtpd_delay_reject = no
smtpd_client_restrictions =
  # IP blacklisting from block_sasl_fail.sh script
  check_client_access hash:/etc/postfix/access_dynamic
  # Manual IP blacklisting
  check_client_access hash:/etc/postfix/access
  # Accept from trusted machines
  permit_mynetworks
  # Accept from authenticated machines
  permit_sasl_authenticated
  # Pause transaction to discourage less determined spammers
  sleep 20
  permit
# Require HELO
smtpd_helo_required = yes
smtpd_helo_restrictions =
  # Accept from trusted machines
  permit_mynetworks
  # Accept from authenticated machines
  permit_sasl_authenticated
  # Reject if HELO is non-fully-qualified domain
  # (may need to remove if too many false positives)
  reject_non_fqdn_helo_hostname
  # Reject if HELO syntax is invalid 
  # (may need to remove if too many false positives)
  reject_invalid_helo_hostname
  permit
smtpd_sender_restrictions =
  # Accept from trusted machines
  permit_mynetworks
  # Accept from authenticated machines
  permit_sasl_authenticated
  # Reject non-fully-qualified sender domain
  reject_non_fqdn_sender
  # Reject sender domain with invalid DNS
  reject_unknown_sender_domain
  permit
smtpd_recipient_restrictions =
  # Accept from trusted machines
  permit_mynetworks
  # Accept from authenticated machines
  permit_sasl_authenticated
  # Reject improper pipelining
  reject_unauth_pipelining
  # Reject domains we don't accept, so we're not an open relay
  reject_unauth_destination
  # Reject non-fully-qualified recipient domain
  reject_non_fqdn_recipient
  # Reject recipient domain with invalid DNS
  reject_unknown_recipient_domain
  # Reject if on DNSBL
  reject_rbl_client b.barracudacentral.org
  # Reject if on DNSBL
  reject_rbl_client zen.spamhaus.org
  # Manual blacklisting/whitelisting of sender addresses
  check_sender_access hash:/etc/postfix/sender_access
  # SPF checking via policyd-spf
  check_policy_service unix:private/policyd-spf
  # Greylisting via postgrey
  check_policy_service unix:postgrey/socket
  permit
policyd-spf_time_limit = 3600
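
For anyone unfamiliar with how the reject_rbl_client lookups work: the client's IPv4 address is reversed octet by octet, the blocklist zone is appended, and the resulting name is looked up in the DNS. If an A record exists, the address is listed. The Python sketch below only builds the name that would be queried. The function name dnsbl_query_name is an assumption for illustration and IPv6 is not handled.

```python
import ipaddress


def dnsbl_query_name(client_ip, zone):
    """Build the hostname that is looked up to test an IPv4 address against a DNSBL."""
    # IPv4Address validates the input and raises ValueError for anything else.
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


if __name__ == "__main__":
    # 192.0.2.99 is a documentation address, used here purely as an example.
    print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org"))
```

Resolving the resulting name by hand with dig or host is a quick way to confirm whether a rejected sender really is listed.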

Outgoing mail authentication

Outbound mail authentication matters because receiving mail servers run these checks to verify that mail claiming to be from the company is legitimate. Passing them helps to ensure that email is treated as valid, i.e. placed into recipient inboxes rather than their spam folders or silently dropped altogether. Below I’ve explained generally how these are set up, and where you can find more information on getting up and running with the relevant DNS entries. There are various online validation services you can use to check your SPF, DKIM and DMARC configuration once you have them set up, such as this one.

DKIM

For DKIM I used OpenDKIM (see above for the configuration and DNS entry) which signs outgoing email with the necessary headers to match the public key in the DKIM TXT record in the DNS, so that the DKIM test will pass when our mail is received at the other end.

More information on DKIM.

SPF

For SPF we have the necessary DNS record which advises receivers of the valid IPs for mail sent from the company domain. It includes all company MX servers plus two specific mail server IP addresses, plus the Google SPF definition for mail sent from Google Workspace:

company.com. IN TXT "v=spf1 mx ip4:1.2.3.4 ip4:5.6.7.8 include:_spf.google.com ~all"
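
A quick sanity check when adding a new mail server is to confirm that its address is covered by the record. The Python sketch below does this for the ip4 mechanisms only. It is deliberately partial and makes no attempt to resolve the mx or include mechanisms, which a real SPF evaluator must do. The function names are assumptions for illustration.

```python
import ipaddress


def ip4_networks(spf_record):
    """Extract the networks named by ip4: mechanisms in an SPF record."""
    networks = []
    for term in spf_record.split()[1:]:          # skip the leading "v=spf1"
        mechanism = term.lstrip("+-~?")           # drop any qualifier
        if mechanism.lower().startswith("ip4:"):
            networks.append(ipaddress.ip_network(mechanism[4:], strict=False))
    return networks


def matches_ip4(spf_record, client_ip):
    """True if client_ip falls inside any ip4: mechanism of the record."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ip4_networks(spf_record))
```

An address that matches none of the mechanisms falls through to the final ~all term, which marks the result as a soft fail rather than a hard fail.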

More information on SPF.

DMARC

For DMARC I set up the relevant DNS record which ensures receivers can use DMARC to verify mail received from our domains via DKIM and SPF and handle it accordingly:

_dmarc.company.com. IN TXT "v=DMARC1; p=quarantine; pct=100; rua=mailto:dmarc@company.com;"
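
The record is a semicolon-separated list of tag=value pairs: p is the policy receivers are asked to apply to mail that fails DMARC, pct is the percentage of mail the policy applies to, and rua is where aggregate reports should be sent. A small Python sketch for splitting such a record into its tags, useful when checking what is actually published, might look like this. The function name parse_dmarc is an assumption and no validation of tag values is attempted.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a dict of tag -> value."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue                      # tolerate the trailing ";"
        name, _, value = part.partition("=")
        tags[name.strip().lower()] = value.strip()
    return tags


if __name__ == "__main__":
    record = "v=DMARC1; p=quarantine; pct=100; rua=mailto:dmarc@company.com;"
    for tag, value in parse_dmarc(record).items():
        print(tag, "=", value)
```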

More information on DMARC.

Mail reputation

For Google and Microsoft I signed up with their feedback loops which send us emails that have been flagged as spam in their email services so they can be dealt with accordingly.

I currently use two main sources for keeping track of the reputation of the mail system and emails delivered from it. These are important to monitor in order to see if there are issues with authentication, encryption, reputation of specific IPs and domains, etc.

Google Postmaster Tools

This is for email sent to Google Gmail. It can be set up here and it shows:

  • Spam rate.
  • IP reputation.
  • Domain reputation.
  • Feedback loop data.
  • Authentication (DKIM, SPF and DMARC).
  • Encryption (TLS).
  • Delivery errors.

Microsoft SNDS

This is for email sent to Hotmail, Outlook and other Microsoft email services. It can be set up here and it shows:

  • Numerical data about incoming emails from each IP address.
  • Reputation of each IP.
  • Complaint rates.

Queue information, monitoring, performance and throughput

Monitoring of load and other metrics

Monitoring of CPU load, swap usage, and other metrics can be performed using Nagios or any other suitable monitoring system that you prefer to use.

If you’re using AWS, some current and historical data about CPU usage, storage IO and network IO for the mail servers can be found in the AWS console. Other cloud infrastructure providers will have similar monitoring options.

Throughput and other mail delivery data

Each day, pflogsumm generates an email report on each mail server showing the amount of email sent that day, along with other stats (see above for how pflogsumm is configured and run). The report starts with general summaries like this, followed by breakdowns for traffic summaries, sender and recipient details, and details of deferrals, bounces, rejections, etc:

300083   received
300208   delivered
     0   forwarded
  2519   deferred  (33494  deferrals)
   106   bounced
     0   rejected (0%)
     0   reject warnings
     0   held
     0   discarded (0%)

  9770m  bytes received
  9782m  bytes delivered
     2   senders
     2   sending hosts/domains
158087   recipients
  7515   recipient hosts/domains
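
If you want to track these headline figures over time rather than just read them in email, they are easy to extract. The following Python sketch is an illustration only: the function names are assumptions, it reads just the message counts shown above, and it does not handle the k or m suffixes that pflogsumm applies to larger figures.

```python
import re

# A count followed by one of the headline labels at the start of a summary line.
SUMMARY_LINE = re.compile(r"^\s*(\d+)\s+(received|delivered|deferred|bounced|rejected)\b")


def parse_summary(report_text):
    """Return the headline message counts from a pflogsumm report as a dict."""
    totals = {}
    for line in report_text.splitlines():
        match = SUMMARY_LINE.match(line)
        # Keep only the first occurrence of each label, i.e. the grand totals.
        if match and match.group(2) not in totals:
            totals[match.group(2)] = int(match.group(1))
    return totals


def bounce_rate(totals):
    """Bounced messages as a fraction of delivered plus bounced."""
    attempted = totals["delivered"] + totals["bounced"]
    return totals["bounced"] / attempted if attempted else 0.0
```

A sustained rise in the bounce rate or in the number of deferrals is often an early sign of list quality or reputation problems, so it is worth watching.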

Mail queues

I wrote a Bash script which produces JSON output that can function as a simple web app/API to show queue information on the mail servers. This can be installed on any server running web server software and with suitable SSH access to the mail servers (this will involve some relevant tweaking of user accounts, group settings, SSH keys and sudo permissions). The cron file /etc/cron.d/mailqueues runs the script /usr/local/bin/mailqueues.sh and the end product is /var/www/html/mailqueues/index.html which can be served by a suitably configured web server. The script and cron config can be found here in my GitHub repo.

Making a request from curl will produce output like the following, showing the amount of mail in the active and deferred queues on all the mail servers. The active queue is mail currently being processed and delivered. The deferred queue is mail that was previously temporarily undeliverable for one reason or another so is being held in this queue until the next delivery attempt. Obviously you can adapt this script and setup as needed for your own environment:

{ "mailqueues": {
     "mail1": {
       "activemailqueue": "34",
       "deferredmailqueue": "1548"
     },
     "mail2": {
       "activemailqueue": "34",
       "deferredmailqueue": "1557"
     },
     "mail3": {
       "activemailqueue": "35",
       "deferredmailqueue": "1707"
     },
     "mail4": {
       "activemailqueue": "40",
       "deferredmailqueue": "1736"
     },
     "mx1": {
       "activemailqueue": "0",
       "deferredmailqueue": "99"
     }
 } }
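
Output in this shape is straightforward to consume from other tools. As a hedged example, the Python sketch below takes the JSON text and reports which servers have a deferred queue above a limit. The function name backed_up_servers and the limit of 1600 are assumptions for illustration. Note that the counts arrive as strings, so they are converted to integers before comparison.

```python
import json


def backed_up_servers(payload, deferred_limit=1600):
    """Return the servers whose deferred queue exceeds deferred_limit."""
    queues = json.loads(payload)["mailqueues"]
    return sorted(
        name
        for name, sizes in queues.items()
        if int(sizes["deferredmailqueue"]) > deferred_limit
    )
```

The payload could be fetched with urllib.request from wherever the index.html file is served, and the result fed into a monitoring check or a chat notification.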

Scaling, redundancy and backups

Scaling

Should any performance issues occur, there are scaling options for the mail system which can be explored.

For the mail cluster used for mail-outs, it has already been expanded to cope with load issues during periods of intensive mail distribution activity, and it could be expanded again should the need arise. It’s very simple to clone an existing server to a new one using the imaging tools provided in AWS EC2. The bigger challenge is slowly building up its reputation as a new mail server so that emails don’t start getting rejected or marked as spam by Gmail, Hotmail/Outlook, Yahoo Mail, AOL, etc. This can be achieved by controlling the DNS round-robin such that the new server only takes a small amount of email per day to begin with, then this is increased slowly over time until it is able to relay the same amount of mail as the other machines in the cluster.
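
To plan that gradual increase it can help to write the ramp down in advance. The Python sketch below generates a simple geometric schedule of daily volumes. Everything in it is an assumption for illustration: the function name, the starting volume and the growth factor are not taken from the production setup, and the right pace depends on how the large mailbox providers respond to the new IP address.

```python
def warmup_schedule(target_per_day, start_per_day=500, growth=2.0):
    """Daily send volumes for a new server, growing geometrically to the target."""
    if start_per_day <= 0 or growth <= 1:
        raise ValueError("start_per_day must be positive and growth greater than 1")
    volumes = []
    volume = start_per_day
    while volume < target_per_day:
        volumes.append(int(volume))
        volume *= growth
    volumes.append(target_per_day)        # final step is capped at the target
    return volumes


if __name__ == "__main__":
    # Hypothetical figures: a quarter of a million emails per day as the target.
    for day, volume in enumerate(warmup_schedule(250000, start_per_day=1000), start=1):
        print(f"day {day}: {volume}")
```

In practice each step should only be taken once the reputation data for the previous one looks healthy, so the schedule is a guide rather than a timetable.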

The MX server has never encountered any performance issues so has never needed to be scaled, but it could be cloned and its functionality duplicated across multiple machines if needed.

Redundancy

The mail cluster obviously has redundancy already as it consists of four servers, so the loss of one server would be manageable whilst launching a new server and building up reputation.

There is no redundancy for the MX server because should it ever fail, a new server could be launched very quickly from backups to replace it. Sending servers unable to connect would queue their mail and retry successfully once the instance was replaced, so there would be no significant loss of mail. If this mail system were running in a less reliable environment, I would have set up a backup MX server to keep things running in a more stable and robust manner, but this has never been necessary on EC2.

Backups and config management

Both mail server types are backed up regularly so can be recreated very quickly as needed. Additionally, all the configuration is stored in a git repository and deployed via a configuration management system, so updating any server with the latest config is fast and trivial.

Conclusion

I hope this has provided some useful insights into building, maintaining and running a high-transaction email system in a secure, robust and efficient manner. Don’t forget that all the config files, scripts and cron jobs mentioned above are available in my GitHub repo. If you have any questions, or want help setting up an email system, or have issues with an existing email system, please feel free to get in touch. There are more details about me and the SysAdmin & DevOps services I provide on my website.