Why was email deferred?

From MyWiki

Current revision as of 22:49, 25 September 2014

I was trying to work out why we have so many emails in the deferred queue on the DMZ-based SMTP gateways.

We have two layers of SMTP gateways: one inside the production subnet and another one upstream in the DMZ.

First, get the reason why each email was deferred, starting with the defer logs on the DMZ-based SMTP gateway. The output format is <email> %% <reason why deferred>:

dmz-smtp # for d in `find /opt/pmx6/postfix/var/spool/mqueue/defer ! -type d -print`; do awk -F= '/recipient/{ rec = $2} /reason/{reason = $2 } END {print rec" %% "reason}' $d; done| sort | uniq
5856b200-6e15-427a-8760-d9f43542fd69@test.com %% connect to test.com[208.64.121.161]:25: Connection timed out
6c6f1bae-ee74-4f94-bc24-dbd39534d9e2@test.com %% connect to test.com[208.64.121.161]:25: Connection timed out
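
The same extraction can be sketched as a small reusable function. This is only a sketch under assumptions: the name defer_reasons is made up for illustration, and the defer files are assumed to contain lines of the form recipient=... and reason=..., which is what the awk one-liner above implies. Unlike the one-liner, it keeps everything after the first "=" (so a reason that itself contains "=" is not cut short) and prints one line per recipient/reason pair instead of only the last pair in each file.

```shell
#!/bin/sh
# defer_reasons DIR (hypothetical helper name)
# Print "<recipient> %% <reason>" for every defer file under DIR.
# Assumption: defer files hold key=value lines such as
# "recipient=..." and "reason=...".
defer_reasons() {
    find "$1" -type f -exec awk '
        # Forget the previous file when a new one starts.
        FNR == 1 { rec = ""; reason = "" }
        # Keep the whole value after the first "=".
        /^recipient=/ { rec = substr($0, index($0, "=") + 1) }
        /^reason=/ {
            reason = substr($0, index($0, "=") + 1)
            if (rec != "") print rec " %% " reason
        }
    ' {} + | sort -u
}
```

On the gateway this would be called as defer_reasons /opt/pmx6/postfix/var/spool/mqueue/defer, using the spool path from the command above.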

Looking up where the connection came from on the DMZ-based SMTP gateway doesn't tell us much: the email was sent a day or two ago, and the connections now come from localhost:

dmz-smtp # for d in `find /opt/pmx6/postfix/var/spool/mqueue/defer ! -type d -print`; do grep `basename ${d}` /var/log/mail.log.1 | grep client | awk -F= '{print $2}'; done | sort | uniq -c
     36 localhost[127.0.0.1]
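
The loop above runs grep over the whole log once per deferred message. A single-pass variant might look like the sketch below. The function name clients_for_deferred is made up for illustration, and the sketch assumes the usual syslog layout, in which the sixth whitespace-separated field of a Postfix log line is the queue ID followed by a colon. It also matches the queue ID exactly rather than as a substring anywhere in the line.

```shell
#!/bin/sh
# clients_for_deferred DEFER_DIR MAIL_LOG (hypothetical helper name)
# Count the client= hosts logged for the queue IDs found in DEFER_DIR.
# Assumption: field 6 of each log line is "QUEUEID:".
clients_for_deferred() {
    idfile=$(mktemp) || return 1
    # Queue IDs are the basenames of the files in the defer directory.
    find "$1" -type f | sed 's|.*/||' > "$idfile"
    awk -v idfile="$idfile" '
        # Load the wanted queue IDs before reading the log.
        BEGIN { while ((getline id < idfile) > 0) want[id] = 1 }
        / client=/ {
            id = $6
            sub(/:$/, "", id)
            if (id in want) {
                c = $0
                sub(/.* client=/, "", c)   # drop everything before the host
                sub(/,.*/, "", c)          # drop trailing attributes, if any
                print c
            }
        }
    ' "$2" | sort | uniq -c
    rm -f "$idfile"
}
```

With the paths used above, the call would be clients_for_deferred /opt/pmx6/postfix/var/spool/mqueue/defer /var/log/mail.log.1.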

So we need to go one level down to the production SMTP server and find out there which server sent the email to the upstream SMTP gateway. Before we do that, we create the file /tmp/def.txt, which lists the email addresses whose mail was deferred on the upstream DMZ-based SMTP gateway, one address per line:

dmz-smtp # for d in `find /opt/pmx6/postfix/var/spool/mqueue/defer ! -type d -print`; do awk -F= '/recipient/{ print $2}' $d; done | sort | uniq > /tmp/def.txt

Now we copy the file /tmp/def.txt to the production SMTP server and dig there:

prod-smtp $ while read line; do for e in `grep $line /var/log/mail.log | awk '{print $6}' | sed -e 's/://'`; do grep $e /var/log/mail.log | grep client | awk -F= '{print $2}'; done ; done < /tmp/def.txt | sort | uniq -c
      1 www-02.production[192.168.114.11]
      5 svn-01.production[192.168.0.173]
      4 web-02.production[192.168.48.12]
      1 web-03.production[192.168.48.13]
     10 webx-01.production[192.168.107.20]
     10 webx-02.production[192.168.107.21]
      8 webx-03.production[192.168.107.22]
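
The nested loop above counts a sending host once for every log line that mentions the recipient, so a message that was retried several times is counted several times. If one count per message and recipient is wanted instead, a single pass over the log could be sketched as follows. The function name senders_for_recipients is made up for illustration, and the sketch assumes the same syslog layout (field 6 is the queue ID plus a colon), assumes recipients are logged as to=<address>, and assumes the client= line of a message appears in the same log file before its delivery lines.

```shell
#!/bin/sh
# senders_for_recipients ADDR_FILE MAIL_LOG (hypothetical helper name)
# ADDR_FILE holds one recipient address per line, like /tmp/def.txt.
# Prints how many distinct messages each client host sent to those addresses.
senders_for_recipients() {
    awk -v addrfile="$1" '
        # Load the recipient addresses of interest.
        BEGIN { while ((getline a < addrfile) > 0) if (a != "") want[a] = 1 }
        # Assumption: field 6 is "QUEUEID:".
        { id = $6; sub(/:$/, "", id) }
        # Remember which client submitted each queue ID.
        / client=/ {
            c = $0
            sub(/.* client=/, "", c)
            sub(/,.*/, "", c)
            client[id] = c
            next
        }
        # On a delivery line, report the client once per message and recipient.
        / to=</ {
            addr = $0
            sub(/.* to=</, "", addr)
            sub(/>.*/, "", addr)
            if ((addr in want) && (id in client) && !((id, addr) in seen)) {
                seen[id, addr] = 1
                print client[id]
            }
        }
    ' "$2" | sort | uniq -c
}
```

On the production SMTP server the call would be senders_for_recipients /tmp/def.txt /var/log/mail.log. The counts will generally be lower than those shown above, because retries are no longer counted.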