Why email was deferred?
From MyWiki
Current revision as of 22:49, 25 September 2014
I was trying to work out why we have so many emails in the deferred queue on the DMZ-based SMTP gateways.
We have two layers of SMTP gateways: one inside the production subnet and another upstream in the DMZ.
First, get the reason why each email was deferred, starting on the DMZ-based SMTP gateway. The output format is <email> %% <reason why deferred>:
dmz-smtp # for d in `find /opt/pmx6/postfix/var/spool/mqueue/defer ! -type d -print`; do awk -F= '/recipient/{ rec = $2} /reason/{reason = $2 } END {print rec" %% "reason}' $d; done| sort | uniq
5856b200-6e15-427a-8760-d9f43542fd69@test.com %% connect to test.com[208.64.121.161]:25: Connection timed out
6c6f1bae-ee74-4f94-bc24-dbd39534d9e2@test.com %% connect to test.com[208.64.121.161]:25: Connection timed out
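The one-liner above can also be wrapped in a small function. This is only a sketch: the name `defer_reasons` is made up for illustration, and it assumes each defer file holds `recipient=` and `reason=` lines at the start of a line, which is what the awk patterns above imply. Unlike splitting on `=`, it keeps a reason intact when the reason itself contains an `=` sign. As in the original, a file with several recipients reports only the last one.

```shell
#!/bin/sh
# Hypothetical helper: print "<recipient> %% <reason>" for every file
# found under the defer directory given as $1, de-duplicated.
defer_reasons() {
    # One awk run per file, so END fires once for each deferred message.
    find "$1" -type f -exec awk '
        # Keep everything after the first "=", even if the value contains "=".
        /^recipient=/ { rec = substr($0, index($0, "=") + 1) }
        /^reason=/    { reason = substr($0, index($0, "=") + 1) }
        END           { print rec " %% " reason }
    ' {} \; | sort -u
}
```

On the gateway it would be called as `defer_reasons /opt/pmx6/postfix/var/spool/mqueue/defer`.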
Digging out where the connection came from on the DMZ-based SMTP gateway doesn't tell us much. The email was sent a day or two ago, and the connections now come from localhost:
dmz-smtp # for d in `find /opt/pmx6/postfix/var/spool/mqueue/defer ! -type d -print`; do grep `basename ${d}` /var/log/mail.log.1 | grep client | awk -F= '{print $2}'; done | sort | uniq -c
36 localhost[127.0.0.1]
So we need to go one level down, to the production SMTP server, and find out there which server sent the email to the upstream SMTP gateway. Before we do that, we create the file /tmp/def.txt, which lists the email addresses whose messages were deferred on the upstream DMZ-based SMTP gateway, one address per line:
dmz-smtp # for d in `find /opt/pmx6/postfix/var/spool/mqueue/defer ! -type d -print`; do awk -F= '/recipient/{ print $2}' $d; done | sort | uniq > /tmp/def.txt
Now we copy the file /tmp/def.txt to the production SMTP server and dig there:
prod-smtp $ while read line; do for e in `grep $line /var/log/mail.log | awk '{print $6}' | sed -e 's/://'`; do grep $e /var/log/mail.log | grep client | awk -F= '{print $2}'; done ; done < /tmp/def.txt | sort | uniq -c
1 www-02.production[192.168.114.11]
5 svn-01.production[192.168.0.173]
4 web-02.production[192.168.48.12]
1 web-03.production[192.168.48.13]
10 webx-01.production[192.168.107.20]
10 webx-02.production[192.168.107.21]
8 webx-03.production[192.168.107.22]
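The loop above greps the whole log once per address and again once per queue ID, which gets slow on a large log. Below is a rough single-pass alternative. It is a sketch, not a drop-in replacement: the name `trace_clients` is hypothetical, and it assumes the usual syslog layout where the queue ID is the sixth field and the log lines contain `client=` and `to=<address>`. It counts only `to=<...>` delivery lines, so the numbers can differ from the loop above, which counts every log line that mentions the address.

```shell
#!/bin/sh
# Hypothetical helper: $1 = file with one recipient address per line,
# $2 = mail log. Prints "<count> <client>" for each sending client.
trace_clients() {
    awk '
        # First file: remember the deferred recipient addresses.
        FILENAME == ARGV[1] { wanted[$0] = 1; next }

        # Log lines: the sixth field is the queue ID followed by a colon.
        { qid = $6; sub(/:$/, "", qid) }

        # Remember which client submitted each queue ID.
        / client=/ {
            c = $0
            sub(/.* client=/, "", c)
            sub(/,.*/, "", c)
            client[qid] = c
            next
        }

        # Count a hit when a known queue ID delivers to a wanted address.
        / to=</ {
            addr = $0
            sub(/.* to=</, "", addr)
            sub(/>.*/, "", addr)
            if ((addr in wanted) && (qid in client)) count[client[qid]]++
        }

        END { for (c in count) printf "%d %s\n", count[c], c }
    ' "$1" "$2" | sort -k2
}
```

It would be run as `trace_clients /tmp/def.txt /var/log/mail.log`. Because the client is looked up when the `to=` line is seen, it relies on the `client=` line appearing earlier in the same log file, which may not hold right after log rotation.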