add ansible role journal-postfix (a log parser for Postfix) with playbook and doc

iburadempa 2019-12-15 19:10:28 +01:00
parent 713372c850
commit e5a8025064
14 changed files with 3570 additions and 0 deletions


@ -0,0 +1,340 @@
# journal-postfix - A log parser for Postfix
Experiences from applying Python to the domain of bad old email.
## Email ✉
* old technology (dating back to the 1970s)
* [store-and-forward](https://en.wikipedia.org/wiki/Store_and_forward): sent != delivered to recipient
* non-delivery reasons:
* recipient over quota
* inexistent destination
* malware
* spam
* server problem
* ...
* permanent / non-permanent failure ([DSN ~ 5.X.Y / 4.X.Y](https://www.iana.org/assignments/smtp-enhanced-status-codes/smtp-enhanced-status-codes.xhtml))
* non-delivery modes
* immediate reject on SMTP level
* delayed [bounce messages](https://en.wikipedia.org/wiki/Bounce_message) by [reporting MTA](https://upload.wikimedia.org/wikipedia/commons/a/a2/Bounce-DSN-MTA-names.png) - queueing (e.g., ~5d) before delivery failure notification
* discarding
* read receipts
* [Wikipedia: email tracking](https://en.wikipedia.org/wiki/Email_tracking)
## [SMTP](https://en.wikipedia.org/wiki/SMTP)
[SMTP session example](https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#SMTP_transport_example):
the envelope sender and envelope recipient may differ from the addresses in the From: and To: headers.
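For illustration, here is a minimal, hypothetical SMTP dialogue (host names, addresses and the queue id are made up; the parenthetical notes are not part of the protocol). The envelope addresses come from MAIL FROM / RCPT TO, while From: and To: are only message headers:
```
220 mail.example.org ESMTP Postfix
EHLO client.example.net
250 mail.example.org
MAIL FROM:<bounces@lists.example.net>      (envelope sender)
250 2.1.0 Ok
RCPT TO:<alice@example.org>                (envelope recipient)
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
From: Newsletter <news@example.net>
To: Subscribers <subscribers@example.net>
Subject: Weekly news

Hello ...
.
250 2.0.0 Ok: queued as ABC123
QUIT
221 2.0.0 Bye
```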
Lists of error codes:
* [SMTP and ESMTP](https://www.inmotionhosting.com/support/email/email-troubleshooting/smtp-and-esmtp-error-code-list)
* [SMTP](https://serversmtp.com/smtp-error/)
* [SMTP](https://info.webtoolhub.com/kb-a15-smtp-status-codes-smtp-error-codes-smtp-reply-codes.aspx)
Example of an error within a bounced email (Subject: Mail delivery failed: returning message to sender):

    SMTP error from remote server for TEXT command, host: smtpin.rzone.de (81.169.145.97) reason: 550 5.7.1 Refused by local policy. No SPAM please!
* email users keep asking about the fate of their emails (or about emails from their correspondents that should have arrived)
## [Postfix](http://www.postfix.org)
* popular [MTA](https://en.wikipedia.org/wiki/Message_transfer_agent)
* written in C
* logging to files / journald
* example log messages for a (non-)delivery + stats
```
Nov 27 16:19:22 mail postfix/smtpd[18995]: connect from unknown[80.82.79.244]
Nov 27 16:19:22 mail postfix/smtpd[18995]: NOQUEUE: reject: RCPT from unknown[80.82.79.244]: 454 4.7.1 <spameri@tiscali.it>: Relay access denied; from=<spameri@tiscali.it> to=<spameri@tiscali.it> proto=ESMTP helo=<WIN-G7CPHCGK247>
Nov 27 16:19:22 mail postfix/smtpd[18995]: disconnect from unknown[80.82.79.244] ehlo=1 mail=1 rcpt=0/1 rset=1 quit=1 commands=4/5
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection rate 1/60s for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max connection count 1 for (smtp:80.82.79.244) at Nov 27 16:19:22
Nov 27 16:22:43 mail postfix/anvil[18997]: statistics: max cache size 1 at Nov 27 16:19:22
Nov 27 16:22:48 mail postfix/smtpd[18999]: connect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/smtpd[18999]: 47NQzY13DbzNWNQG: client=mail.cosmopool.net[2a01:4f8:160:20c1::10:107]
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: info: header Subject: Re: test from mail.cosmopool.net[2a01:4f8:160:20c1::10:107]; from=<ibu@cosmopool.net> to=<ibu@multiname.org> proto=ESMTP helo=<mail.cosmopool.net>
Nov 27 16:22:49 mail postfix/cleanup[19003]: 47NQzY13DbzNWNQG: message-id=<d5154432-b984-d65a-30b3-38bde7e37af8@cosmopool.net>
Nov 27 16:22:49 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: from=<ibu@cosmopool.net>, size=1365, nrcpt=2 (queue active)
Nov 27 16:22:49 mail postfix/smtpd[18999]: disconnect from mail.cosmopool.net[2a01:4f8:160:20c1::10:107] ehlo=1 mail=1 rcpt=2 data=1 quit=1 commands=6
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=<ibu2@multiname.org>, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 <ibu2@multiname.org> nV9iJ9mi3l0+SgAAZU03Dg Saved)
Nov 27 16:22:50 mail postfix/lmtp[19005]: 47NQzY13DbzNWNQG: to=<ibu@multiname.org>, relay=mail.multiname.org[private/dovecot-lmtp], delay=1.2, delays=0.56/0.01/0.01/0.63, dsn=2.0.0, status=sent (250 2.0.0 <ibu@multiname.org> nV9iJ9mi3l0+SgAAZU03Dg:2 Saved)
Nov 27 16:22:50 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: removed
```
* [involved postfix components](http://www.postfix.org/OVERVIEW.html)
* smtpd (port 25: smtp, port 587: submission)
* cleanup
* smtp/lmtp
* missing: a ready-made log parser (a rough line-splitting sketch follows below)
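As a first, rough illustration (a hedged sketch, not this project's actual code), splitting one of the syslog-format lines above into timestamp, syslog identifier, pid and message could look like this:
```python
import re

# Hypothetical helper: split a syslog-format Postfix log line into its parts.
# Note that the syslog timestamp carries no year.
LINE_RE = re.compile(
    r'^(?P<time>\w{3} [ \d]\d \d\d:\d\d:\d\d) '
    r'(?P<host>\S+) '
    r'(?P<identifier>postfix/[\w-]+)\[(?P<pid>\d+)\]: '
    r'(?P<message>.*)$'
)

def split_line(line: str):
    """Return a dict with time, host, identifier, pid and message, or None."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

print(split_line(
    'Nov 27 16:22:50 mail postfix/qmgr[29349]: 47NQzY13DbzNWNQG: removed'
))
```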
## Idea
* follow log stream and write summarized delivery information to a database
* goal: spot delivery problems, collect delivery stats
* a GUI could then display the current delivery status to users
## Why Python?
* simple and fun language, clear and concise
* well suited for text processing
* libs available for systemd, PostgreSQL
* huge standard library (used here: datetime, re, yaml, argparse, select)
* speed sufficient?
## Development iterations
* hmm, easy task, might take a few days
* PoC: reading and polling from journal works as expected
* used postfix logfiles in syslog format and wrote regexps matching them iteratively
* separated parsing messages from extracting delivery information
* created a delivery table
* hmm, this is very slow: it takes hours to process a few days' worth of log messages (from a server with little traffic)
* introduced polling timeout and SQL transactions handling several messages at once
* ... much faster
* looks fine, but wait... did I catch all syntax variants of Postfix log messages?
* looked into Postfix sources and almost got lost
* weeks of hard work identifying relevant log output directives
* completely rewrote the parser to deal with the rich log message syntax, e.g. with a helper like this one, sketched after this list:<br>
`def _strip_pattern(msg, res, pattern_name, pos='l', target_names=None)`
* oh, there are even more Postfix components... limited the scope to certain Postfix configurations, in particular virtual mailboxes rather than local ones
* mails may have multiple recipients... split the delivery table into delivery_from and delivery_to
* decide which delivery information is relevant
* cleanup and polish (config mgmt, logging)
* write ansible role
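Picking up the `_strip_pattern` helper mentioned above: the real implementation is in the role's sources; a much simplified sketch of the idea (pattern names and regexes here are assumptions) could be:
```python
import re

# Hypothetical, simplified stand-in for a helper like _strip_pattern:
# match a named pattern at the left end of *msg* (or, in this simplified
# version, search anywhere for pos='r'), copy named groups into *res*,
# and return *msg* with the matched part removed.
PATTERNS = {
    'queue_id': re.compile(r'(?P<queue_id>[0-9A-Za-z]{10,16}): '),
    'status': re.compile(r'status=(?P<status>\S+)'),
}

def _strip_pattern(msg, res, pattern_name, pos='l', target_names=None):
    pattern = PATTERNS[pattern_name]
    match = pattern.match(msg) if pos == 'l' else pattern.search(msg)
    if match:
        for name in (target_names or match.groupdict()):
            res[name] = match.group(name)
        msg = msg[:match.start()] + msg[match.end():]
    return msg

res = {}
rest = _strip_pattern('47NQzY13DbzNWNQG: removed', res, 'queue_id')
print(res, '|', rest)   # {'queue_id': '47NQzY13DbzNWNQG'} | removed
```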
## Structure
```blockdiag
blockdiag {
default_fontsize = 20;
node_height = 80;
journal_since -> run_loop;
journal_follow -> run_loop;
logfile -> run_loop;
run_loop -> parse -> extract_delivery -> store;
store -> delivery_from;
store -> delivery_to;
store -> noqueue;
group { label="input iterables"; journal_since; journal_follow; logfile; };
group { label="output tables"; delivery_from; delivery_to; noqueue; };
}
```
## Iterables
```python
def iter_journal_messages_since(timestamp: Union[int, float]):
"""
Yield False and message details from the journal since *timestamp*.
This is the loading phase (loading messages that already existed
when we start).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit UNITNAME with loglevel
INFO and above are retrieved.
"""
...


def iter_journal_messages_follow(timestamp: Union[int, float]):
"""
Yield commit and message details from the journal through polling.
This is the polling phase (after we have read pre-existing messages
in the loading phase).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit UNITNAME with loglevel
INFO and above are retrieved.
*commit* (bool) tells whether it is time to store the delivery
information obtained from the messages yielded by us.
It is set to True if max_delay_before_commit has elapsed.
After this delay delivery information will be written; to be exact:
the delay may increase by up to one journal_poll_interval.
"""
...


def iter_logfile_messages(filepath: str, year: int,
commit_after_lines=max_messages_per_commit):
"""
Yield messages and a commit flag from a logfile.
Loop through all lines of the file with given *filepath* and
extract the time and log message. If the log message starts
with 'postfix/', then extract the syslog_identifier, pid and
message text.
Since syslog lines do not contain the year, the *year* to which
the first log line belongs must be given.
Return a commit flag and a dict with these keys:
't': timestamp
'message': message text
'identifier': syslog identifier (e.g., 'postfix/smtpd')
'pid': process id
The commit flag will be set to True for every
(commit_after_lines)-th filtered message and serves
as a signal to the caller to commit this chunk of data
to the database.
"""
...
```
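How the follow-phase iterator can poll the journal is sketched below, closely following the python-systemd documentation; the unit name, the poll interval and the simplified commit-flag logic are assumptions:
```python
import select
from systemd import journal

def iter_journal_messages_follow(timestamp, unitname='postfix@-.service',
                                 poll_interval=10.0):
    """Sketch: yield (commit_flag, journal_fields) for new Postfix entries."""
    reader = journal.Reader()
    reader.log_level(journal.LOG_INFO)
    reader.add_match(_SYSTEMD_UNIT=unitname)
    reader.seek_realtime(timestamp)
    reader.get_previous()  # position the cursor just before the new entries
    poller = select.poll()
    poller.register(reader, reader.get_events())
    while True:
        events = poller.poll(int(poll_interval * 1000))
        if events and reader.process() == journal.APPEND:
            for entry in reader:
                yield False, entry
        else:
            # poll timed out: signal the caller that it may commit its cache
            # (the real implementation also honours max_delay_before_commit)
            yield True, None
```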
## Running loops
```python
def run(dsn, verp_marker=False, filepath=None, year=None, debug=[]):
"""
Determine loop(s) and run them within a database context.
"""
init(verp_marker=verp_marker)
with psycopg2.connect(dsn) as conn:
with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as curs:
if filepath:
run_loop(iter_logfile_messages(filepath, year), curs, debug=debug)
else:
begin_timestamp = get_latest_timestamp(curs)
run_loop(iter_journal_messages_since(begin_timestamp), curs, debug=debug)
begin_timestamp = get_latest_timestamp(curs)
run_loop(iter_journal_messages_follow(begin_timestamp), curs, debug=debug)


def run_loop(iterable, curs, debug=[]):
"""
Loop over log messages obtained from *iterable*.
Parse the message, extract delivery information from it and store
that delivery information.
For performance reasons delivery items are collected in a cache
before writing them (i.e., committing a database transaction).
"""
cache = []
msg_count = max_messages_per_commit
for commit, msg_details in iterable:
...
```
## Parsing
Parse what you can. (But only msg_info in Postfix, and only relevant components.)
```python
def parse(msg_details, debug=False):
"""
Parse a log message returning a dict.
*msg_details* is assumed to be a dict with these keys:
* 'identifier' (syslog identifier),
* 'pid' (process id),
* 'message' (message text)
The syslog identifier and process id are copied to the resulting dict.
"""
...


def _parse_branch(comp, msg, res):
"""
Parse a log message string *msg*, adding results to dict *res*.
Depending on the component *comp* we branch to functions
named _parse_{comp}.
Add parsing results to dict *res*. Always add key 'action'.
Try to parse every syntactical element.
Note: We parse what we can. Assessment of parsing results relevant
for delivery is done in :func:`extract_delivery`.
"""
...
```
## Extracting
Extract what is relevant.
```python
def extract_delivery(msg_details, parsed):
"""
Compute delivery information from parsing results.
Basically this means that we map the parsed fields to
a type ('from' or 'to') and to the database
fields for table 'delivery_from' or 'delivery_to'.
We branch to functions _extract_{comp} where comp is the
name of a Postfix component.
Return a list of error strings and a dict with the
extracted information. Keys with None values are removed
from the dict.
"""
...
```
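A hedged sketch of what one of the `_extract_{comp}` functions could look like (the field names follow the docstrings above; the exact mapping is an assumption):
```python
def _extract_smtp(parsed: dict):
    """Map parsed smtp/lmtp fields to a 'to'-type delivery item (sketch)."""
    delivery = {
        'type': 'to',
        'queue_id': parsed.get('queue_id'),
        'recipient': parsed.get('to'),
        'relay': parsed.get('relay'),
        'dsn': parsed.get('dsn'),
        'status': parsed.get('status'),
    }
    errors = []
    if not delivery['queue_id']:
        errors.append('smtp: queue_id is missing')
    # keys with None values are removed, as described above
    return errors, {k: v for k, v in delivery.items() if v is not None}
```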
## Regular expressions
* see sources
* [Stackoverflow: How to validate an email address](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) [FSM](https://i.stack.imgur.com/YI6KR.png)
### BTW: [email.utils.parseaddr](https://docs.python.org/3/library/email.utils.html#email.utils.parseaddr)
```python
>>> from email.utils import parseaddr
>>> parseaddr('Ghost <"hello@nowhere"@pyug.at>')
('Ghost', '"hello@nowhere"@pyug.at')
>>> print(parseaddr('"more\"fun\"\\"hello\\"@nowhere"@pyug.at')[1])
"more"fun"\"hello\"@nowhere"@pyug.at
>>> print(parseaddr('""@pyug.at')[1])
""@pyug.at
```
## Storing
```python
def store_deliveries(cursor, cache, debug=[]):
"""
Store cached delivery information into the database.
Find queue_ids in *cache* and group delivery items by
them, but separately for delivery types 'from' and 'to'.
In addition, collect delivery items with queue_id is None.
    After grouping we merge all items within a group into a
single item. So we can combine several SQL queries into
a single one, which improves performance significantly.
Then store the merged items and the deliveries with
queue_id is None.
"""
...
```
Database schema: 3 tables:
* delivery_from: smtpd, milters, qmgr
* delivery_to: smtp, virtual, bounce, error
* noqueue: rejected by smtpd before even getting a queue_id
Table noqueue contains all the spam; for this table we only use plain SQL INSERT (no ON CONFLICT ... UPDATE), which is faster.
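A hedged psycopg2 sketch of the two write paths (table and column names here are assumptions; the real schema is created by the role):
```python
from psycopg2.extras import execute_values

def insert_noqueue_items(curs, items):
    # noqueue: plain batched INSERTs are sufficient and fastest
    execute_values(
        curs,
        'INSERT INTO noqueue (t, client, sender, recipient, dsn) VALUES %s',
        [(i['t'], i['client'], i['sender'], i['recipient'], i['dsn'])
         for i in items],
    )

def upsert_delivery_from(curs, item):
    # delivery_from/delivery_to: rows are completed as more log lines arrive
    curs.execute(
        '''
        INSERT INTO delivery_from (queue_id, t, sender, message_id)
        VALUES (%(queue_id)s, %(t)s, %(sender)s, %(message_id)s)
        ON CONFLICT (queue_id) DO UPDATE
        SET sender = COALESCE(EXCLUDED.sender, delivery_from.sender),
            message_id = COALESCE(EXCLUDED.message_id, delivery_from.message_id)
        ''',
        item,
    )
```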
## Demo
...
## Questions / Suggestions
* Could you enhance speed by using prepared statements?
* Will old data be deleted (as required by GDPR)?
Both were implemented after the talk.
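The data deletion was added roughly along these lines (a sketch; the real code is `delete_old_deliveries()` in the storage module imported by run.py below, and the table/column names here are assumptions):
```python
import datetime

def delete_old_deliveries(curs, days=180):
    """Remove delivery data older than *days* days (GDPR data minimization)."""
    threshold = datetime.datetime.utcnow() - datetime.timedelta(days=days)
    # table names are fixed identifiers, so f-string interpolation is safe here
    for table in ('delivery_from', 'delivery_to', 'noqueue'):
        curs.execute(f'DELETE FROM {table} WHERE t < %s', (threshold,))
```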

journal-postfix.yml

@ -0,0 +1,34 @@
# Deploy journal-postfix
# This will install a service that writes mail delivery information
# obtained from systemd-journal (unit postfix@-.service) to a
# PostgreSQL database.
#
# You can configure the database connection parameters (and optionally
# a verp_marker) as host vars like this:
#
# mailserver:
# postgresql:
# host: 127.0.0.1
# port: 5432
# dbname: mailserver
# username: mailserver
# password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# postfix:
# verp_marker: rstxyz
#
# If you do not, then you must edit /etc/journal-postfix/main.yml
# on the destination hosts and run systemctl start journal-postfix
# manually.
- name: install journal-postfix
user: root
hosts: mail
roles:
- journal-postfix


@ -0,0 +1,17 @@
# this file is part of ansible role journal-postfix
[Unit]
Description=Extract postfix message delivery information from systemd journal messages\
and store them in a PostgreSQL database. Configuration is in /etc/journal-postfix/main.yml
After=multi-user.target
[Service]
Type=simple
ExecStart=/srv/journal-postfix/run.py
User=journal-postfix
WorkingDirectory=/srv/journal-postfix/
Restart=on-failure
RestartPreventExitStatus=97
[Install]
WantedBy=multi-user.target
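After deployment the service is managed with the usual systemd commands (assuming the unit file is installed as journal-postfix.service, as the playbook above suggests), e.g.:
  systemctl daemon-reload
  systemctl enable --now journal-postfix
  journalctl -u journal-postfix --follow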


@ -0,0 +1,85 @@
Parse Postfix entries in the systemd journal and collect delivery information.
The information on mail deliveries is written to tables in a PostgreSQL
database. The database can then be queried by a UI showing delivery status
to end users. The UI is not part of this package.
This software is tailor-made for Debian buster with systemd as init system.
It is meant to run on the same system on which Postfix is running,
or on a system receiving the log stream of a Postfix instance in its
systemd journal.
Prerequisites / Postfix configuration:
- Access to a PostgreSQL database.
- Postfix: Only virtual mailboxes are supported.
- Postfix: You can use short or long queue_ids (see
http://www.postfix.org/postconf.5.html#enable_long_queue_ids),
but since the uniqueness of short queue_ids is very limited,
usage of long queue_ids is *strongly recommended*.
Installation:
- apt install python3-psycopg2 python3-systemd python3-yaml
- Edit /etc/journal-postfix/main.yml
- Output is written to the journal (unit journal-postfix). READ IT!
Side effects (database):
- The configured database user will create the tables
- delivery_from
- delivery_to
- noqueue
in the configured database, if they do not yet exist.
These tables will be filled with results from parsing the journal.
Table noqueue contains deliveries rejected by smtpd before they
got a queue_id. Deliveries with queue_id are in tables delivery_from
and delivery_to, which are separate, because an email can have only
one sender, but more than one recipient. Entries in both tables are
related through the queue_id and the approximate date; note that
short queue_ids are not unique for a delivery transaction, so
consider changing your Postfix configuration to long queue_ids.
- Log output is written to journald, unit journal-postfix.
Configuration:
- Edit the config file in YAML format located at
  /etc/journal-postfix/main.yml
Limitations:
- The log output of Postfix may contain messages not primarily relevant
for delivery, namely messages of levels panic, fatal, error, warning.
They are discarded.
- The postfix server must be configured to use virtual mailboxes;
deliveries to local mailboxes are ignored.
- Parsing is specific to a Postfix version and only version 3.4.5
(the version in Debian buster) is supported; it is intended to support
Postfix versions in future stable Debian releases.
- This script does not support concurrency; we assume that there is only
one process writing to the database tables. Thus clustered postfix
setups are not supported.
Options:
- If you use dovecot as lmtpd, you will also get dovecot_ids upon
successful delivery.
- If you have configured Postfix to store VERP-ids of outgoing mails
in table 'mail_from' in the same database, then bounce emails can
be associated with original emails. The VERP-ids must have a certain
format.
- The subject of emails will be extracted from log messages starting
with "info: header Subject:". To enable these messages configure
Postfix like this: Enabled header_checks in main.cf (
header_checks = regexp:/etc/postfix/header_checks
) and put this line into /etc/postfix/header_checks:
/^Subject:/ INFO
- You can also import log messages from a log file in syslog format:
Run this script directly from command line with options --file
(the path to the file to be parsed) and --year (the year of the
first message in this log file).
  Note: For the month names to be recognized correctly, the script must be
  run with the same locale that was used when writing the logfile.
Attention: When running from the command line, log output will
not be sent to unit journal-postfix; use this command instead:
journalctl --follow SYSLOG_IDENTIFIER=python3
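  For example (hypothetical logfile path):
      /srv/journal-postfix/run.py --file /var/log/mail.log.1 --year 2019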

File diff suppressed because it is too large

journal-postfix/files/srv/run.py

@ -0,0 +1,212 @@
#!/usr/bin/env python3
"""
Main script to be run as a systemd unit or manually.
"""
import argparse
import datetime
import os
import sys
from pprint import pprint
from typing import Iterable, List, Optional, Tuple, Union
import psycopg2
import psycopg2.extras
from systemd import journal
import settings
from parser import init_parser, parse_entry, extract_delivery
from sources import (
iter_journal_messages_since,
iter_journal_messages_follow,
iter_logfile_messages,
)
from storage import (
init_db,
init_session,
get_latest_timestamp,
delete_old_deliveries,
store_delivery_items,
)
exit_code_without_restart = 97


def run(
dsn: str,
verp_marker: Optional[str] = None,
filepath: Optional[str] = None,
year: Optional[int] = None,
debug: List[str] = [],
) -> None:
"""
Determine loop(s) and run them within a database context.
"""
init_parser(verp_marker=verp_marker)
with psycopg2.connect(dsn) as conn:
with conn.cursor(
cursor_factory=psycopg2.extras.RealDictCursor
) as curs:
init_session(curs)
if filepath and year:
run_loop(
iter_logfile_messages(filepath, year), curs, debug=debug
)
else:
begin_timestamp = get_latest_timestamp(curs)
run_loop(
iter_journal_messages_since(begin_timestamp),
curs,
debug=debug,
)
begin_timestamp = get_latest_timestamp(curs)
run_loop(
iter_journal_messages_follow(begin_timestamp),
curs,
debug=debug,
)


def run_loop(
iterable: Iterable[Tuple[bool, Optional[dict]]],
curs: psycopg2.extras.RealDictCursor,
debug: List[str] = []
) -> None:
"""
Loop over log entries obtained from *iterable*.
Parse the message, extract delivery information from it and store
that delivery information.
For performance reasons delivery items are collected in a cache
before writing them (i.e., committing a database transaction).
"""
cache = []
msg_count = settings.max_messages_per_commit
last_delete = None
for commit, msg_details in iterable:
parsed_entry = None
if msg_details:
parsed_entry = parse_entry(msg_details)
if 'all' in debug or (
parsed_entry and parsed_entry.get('comp') in debug
):
print('_' * 80)
print('MSG_DETAILS:', msg_details)
print('PARSED_ENTRY:', parsed_entry)
if parsed_entry:
errors, delivery = extract_delivery(msg_details, parsed_entry)
if not errors and delivery:
if 'all' in debug or parsed_entry.get('comp') in debug:
print('DELIVERY:')
pprint(delivery)
# it may happen that a delivery of type 'from' has
# a recipient; in this case add a second delivery
# of type 'to' to the cache, but only for deliveries
# with queue_id
if (
delivery['type'] == 'from'
and 'recipient' in delivery
and delivery.get('queue_id')
):
delivery2 = delivery.copy()
delivery2['type'] = 'to'
cache.append(delivery2)
del delivery['recipient']
cache.append(delivery)
msg_count -= 1
if msg_count == 0:
commit = True
elif errors:
msg = (
f'Extracting delivery from parsed entry failed: '
f'errors={errors}; msg_details={msg_details}; '
f'parsed_entry={parsed_entry}'
)
journal.send(msg, PRIORITY=journal.LOG_CRIT)
if 'all' in debug or parsed_entry.get('comp') in debug:
print('EXTRACTION ERRORS:', errors)
if commit:
if 'all' in debug:
print('.' * 40, 'committing')
# store cache, clear cache, reset message counter
store_delivery_items(curs, cache, debug=debug)
cache = []
msg_count = settings.max_messages_per_commit
now = datetime.datetime.utcnow()
if last_delete is None or last_delete < now - settings.delete_interval:
delete_old_deliveries(curs)
last_delete = now
if 'all' in debug:
print('.' * 40, 'deleting old deliveries')
else:
store_delivery_items(curs, cache, debug=debug)


def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument(
'--debug',
help='Comma-separated list of components to be debugged; '
'valid component names are the Postfix components '
'plus "sql" plus "all".',
)
parser.add_argument(
'--file',
help='File path of a Postfix logfile in syslog '
'format to be parsed instead of the journal',
)
parser.add_argument(
'--year',
help='If --file is given, we need to know '
'the year of the first line in the logfile',
)
args = parser.parse_args()
config = settings.get_config()
if config:
# check if startup is enabled or fail
msg = None
if 'startup' not in config:
msg = 'Parameter "startup" is not configured.'
elif not config['startup']:
msg = 'Startup is not enabled in the config file.'
if msg:
journal.send(msg, PRIORITY=journal.LOG_CRIT)
sys.exit(exit_code_without_restart)
# check more params and call run
try:
verp_marker = config['postfix']['verp_marker']
except Exception:
verp_marker = None
debug: List[str] = []
if args.debug:
debug = args.debug.split(',')
filepath = None
year = None
if args.file:
filepath = args.file
if not args.year:
print(
'If --file is given, we need to know the year'
' of the first line in the logfile. Please use --year.'
)
sys.exit(1)
else:
year = int(args.year)
dsn = init_db(config)
if dsn:
run(
dsn,
verp_marker=verp_marker,
filepath=filepath,
year=year,
debug=debug,
)
else:
print('Config invalid, see journal.')
sys.exit(exit_code_without_restart)


if __name__ == '__main__':
main()

View File

@ -0,0 +1,125 @@
#!/usr/bin/env python3
"""
Settings for journal-postfix.
"""
import os
import datetime
from typing import Union, Optional
from systemd import journal
from yaml import safe_load
main_config_file: str = '/etc/journal-postfix/main.yml'
"""
Filepath to the main config file.
Can be overridden by environment variable JOURNAL_POSTFIX_MAIN_CONF.
"""
systemd_unitname: str = 'postfix@-.service'
"""
Name of the systemd unit running the postfix service.
"""
journal_poll_interval: Union[float, int] = 10.0
"""
Poll timeout in seconds for fetching messages from the journal.
Will be overridden if set in the main config.
If the poll times out, it is checked whether the last commit
lies more than max_delay_before_commit seconds in the past;
if so, the current database transaction will be committed.
"""
max_delay_before_commit: datetime.timedelta = datetime.timedelta(seconds=30)
"""
How much time may pass before committing a database transaction?
Will be overridden if set in the main config.
(The actual maximal delay can be one journal_poll_interval in addition.)
"""
max_messages_per_commit: int = 1000
"""
How many messages to cache at most before committing a database transaction?
Will be overridden if set in the main config.
"""
delete_deliveries_after_days: int = 0
"""
After how many days shall deliveries be deleted from the database?
A value of 0 means that data are never deleted.
"""
delete_interval: datetime.timedelta = datetime.timedelta(seconds=3600)
"""
Time interval between two deletion runs for old deliveries.
Will be overridden if set in the main config (as a number of seconds).
"""
def get_config() -> Optional[dict]:
"""
Load config from the main config and return it.
The default main config file path (global main_config_file)
    can be overridden with environment variable
JOURNAL_POSTFIX_MAIN_CONF.
"""
try:
filename = os.environ['JOURNAL_POSTFIX_MAIN_CONF']
global main_config_file
main_config_file = filename
except Exception:
filename = main_config_file
try:
with open(filename, 'r') as config_file:
config_raw = config_file.read()
except Exception:
msg = f'ERROR: cannot read config file {filename}'
journal.send(msg, PRIORITY=journal.LOG_CRIT)
return None
try:
        config = safe_load(config_raw)
except Exception as err:
msg = f'ERROR: invalid yaml syntax in {filename}: {err}'
journal.send(msg, PRIORITY=journal.LOG_CRIT)
return None
# override some global variables
_global_value_from_config(config['postfix'], 'systemd_unitname', str)
_global_value_from_config(config, 'journal_poll_interval', float)
_global_value_from_config(config, 'max_delay_before_commit', 'seconds')
_global_value_from_config(config, 'max_messages_per_commit', int)
_global_value_from_config(config, 'delete_deliveries_after_days', int)
_global_value_from_config(config, 'delete_interval', 'seconds')
return config
def _global_value_from_config(
config, name: str, type_: Union[type, str]
) -> None:
"""
Set a global variable to the value obtained from *config*.
Also cast to *type_*.
"""
    value = None
    try:
        value = config.get(name)
        if type_ == 'seconds':
            value = datetime.timedelta(seconds=float(value))
        else:
            value = type_(value)  # type: ignore
        globals()[name] = value
except Exception:
if value is not None:
msg = f'ERROR: configured value of {name} is invalid.'
journal.send(msg, PRIORITY=journal.LOG_ERR)
if __name__ == '__main__':
print(get_config())

View File

@ -0,0 +1,5 @@
[pycodestyle]
max-line-length = 200
[mypy]
ignore_missing_imports = True

View File

@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Data sources.
Note: python-systemd journal docs are at
https://www.freedesktop.org/software/systemd/python-systemd/journal.html
"""
import datetime
import select
from typing import Iterable, Optional, Tuple, Union
from systemd import journal
import settings
def iter_journal_messages_since(
timestamp: Union[int, float]
) -> Iterable[Tuple[bool, dict]]:
"""
Yield False and message details from the journal since *timestamp*.
This is the loading phase (loading messages that already existed
when we start).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit settings.systemd_unitname with
loglevel INFO and above are retrieved.
"""
timestamp = float(timestamp)
sdj = journal.Reader()
sdj.log_level(journal.LOG_INFO)
sdj.add_match(_SYSTEMD_UNIT=settings.systemd_unitname)
sdj.seek_realtime(timestamp)
for entry in sdj:
yield False, _get_msg_details(entry)
def iter_journal_messages_follow(
timestamp: Union[int, float]
) -> Iterable[Tuple[bool, Optional[dict]]]:
"""
Yield commit and message details from the journal through polling.
This is the polling phase (after we have read pre-existing messages
in the loading phase).
Argument *timestamp* is a UNIX timestamp.
Only journal entries for systemd unit settings.systemd_unitname with
loglevel INFO and above are retrieved.
*commit* (bool) tells whether it is time to store the delivery
information obtained from the messages yielded by us.
It is set to True if settings.max_delay_before_commit has elapsed.
After this delay delivery information will be written; to be exact:
the delay may increase by up to one settings.journal_poll_interval.
"""
sdj = journal.Reader()
sdj.log_level(journal.LOG_INFO)
sdj.add_match(_SYSTEMD_UNIT=settings.systemd_unitname)
sdj.seek_realtime(timestamp)
p = select.poll()
p.register(sdj, sdj.get_events())
last_commit = datetime.datetime.utcnow()
interval_ms = settings.journal_poll_interval * 1000
while True:
p.poll(interval_ms)
commit = False
now = datetime.datetime.utcnow()
if last_commit + settings.max_delay_before_commit < now:
commit = True
last_commit = now
if sdj.process() == journal.APPEND:
for entry in sdj:
yield commit, _get_msg_details(entry)
elif commit:
yield commit, None
def iter_logfile_messages(
filepath: str,
year: int,
commit_after_lines=settings.max_messages_per_commit,
) -> Iterable[Tuple[bool, dict]]:
"""
Yield messages and a commit flag from a logfile.
Loop through all lines of the file with given *filepath* and
extract the time and log message. If the log message starts
with 'postfix/', then extract the syslog_identifier, pid and
message text.
Since syslog lines do not contain the year, the *year* to which
the first log line belongs must be given.
    Yield a commit flag and a dict with these keys:
't': timestamp
'message': message text
'identifier': syslog identifier (e.g., 'postfix/smtpd')
'pid': process id
The commit flag will be set to True for every
(commit_after_lines)-th filtered message and serves
as a signal to the caller to commit this chunk of data
to the database.
"""
dt = None
with open(filepath, 'r') as fh:
cnt = 0
while True:
line = fh.readline()
if not line:
break
# get datetime
timestamp = line[:15]
dt_prev = dt
dt = _parse_logfile_timestamp(timestamp, year)
if dt is None:
continue # discard log message with invalid timestamp
            # if we cross a year boundary, increment the year
if dt_prev and dt + datetime.timedelta(days=1) < dt_prev:
year += 1
dt = _parse_logfile_timestamp(timestamp, year)
# filter postfix messages
msg = line[21:].strip()
if 'postfix/' in msg:
cnt += 1
syslog_identifier, msg_ = msg.split('[', 1)
pid, msg__ = msg_.split(']', 1)
message = msg__[2:]
commit = cnt % commit_after_lines == 0
yield commit, {
't': dt,
'message': message,
'identifier': syslog_identifier,
'pid': pid,
}
def _get_msg_details(journal_entry: dict) -> dict:
"""
Return information extracted from a journal entry object as a dict.
"""
return {
't': journal_entry['__REALTIME_TIMESTAMP'],
'message': journal_entry['MESSAGE'],
'identifier': journal_entry.get('SYSLOG_IDENTIFIER'),
'pid': journal_entry.get('SYSLOG_PID'),
}
def _parse_logfile_timestamp(
timestamp: Optional[str],
year: int
) -> Optional[datetime.datetime]:
"""
Parse a given syslog *timestamp* and return a datetime.
Since the timestamp does not contain the year, it is an
extra argument.
    Note: Successful parsing of the month's name depends on
    the locale under which this script runs.
"""
if timestamp is None:
return None
try:
        # collapse the double space before single-digit day numbers
        timestamp = timestamp.replace('  ', ' ')
t1 = datetime.datetime.strptime(timestamp, '%b %d %H:%M:%S')
t2 = t1.replace(year=year)
return t2
except Exception:
return None

View File

@ -0,0 +1,337 @@
#!/usr/bin/env python3
"""
Storage to PostgreSQL.
"""
import datetime
import json
import re
import time
from collections import defaultdict
from traceback import format_exc
from typing import Any, Dict, Iterable, List, Optional, Tuple, Union
import psycopg2
import psycopg2.extras
from systemd import journal
import settings
from storage_setup import (
get_create_table_stmts,
get_sql_prepared_statement,
get_sql_execute_prepared_statement,
table_fields,
)
def get_latest_timestamp(curs: psycopg2.extras.RealDictCursor) -> float:
    """
    Fetch the latest timestamp from the database.
    Return the latest timestamp of a message transfer from the database
    as UNIX epoch seconds. If there are no records yet, return 0.
    """
last = 0
curs.execute(
"SELECT greatest(max(t_i), max(t_f)) AS last FROM delivery_from"
)
last1 = curs.fetchone()['last']
if last1:
last = max(
last, (last1 - datetime.datetime(1970, 1, 1)).total_seconds()
)
curs.execute(
"SELECT greatest(max(t_i), max(t_f)) AS last FROM delivery_to"
)
last2 = curs.fetchone()['last']
if last2:
last = max(
last, (last2 - datetime.datetime(1970, 1, 1)).total_seconds()
)
return last
def delete_old_deliveries(curs: psycopg2.extras.RealDictCursor) -> None:
"""
Delete deliveries older than the configured number of days.
See config param *delete_deliveries_after_days*.
"""
max_days = settings.delete_deliveries_after_days
if max_days:
now = datetime.datetime.utcnow()
dt = datetime.timedelta(days=max_days)
t0 = now - dt
curs.execute("DELETE FROM delivery_from WHERE t_i < %s", (t0,))
curs.execute("DELETE FROM delivery_to WHERE t_i < %s", (t0,))
curs.execute("DELETE FROM noqueue WHERE t < %s", (t0,))
def store_delivery_items(
cursor,
cache: List[dict],
debug: List[str] = []
) -> None:
"""
Store cached delivery items into the database.
Find queue_ids in *cache* and group delivery items by
them, but separately for delivery types 'from' and 'to'.
In addition, collect delivery items with queue_id is None.
    After grouping we merge all items within a group into a
    single item, so that several SQL queries can be combined
    into a single one, which improves performance significantly.
Then store the merged items and the deliveries with
queue_id is None.
"""
if 'all' in debug or 'sql' in debug:
print(f'Storing {len(cache)} messages.')
if not cache:
return
from_items, to_items, noqueue_items = _group_delivery_items(cache)
deliveries_from = _merge_delivery_items(from_items, item_type='from')
deliveries_to = _merge_delivery_items(to_items, item_type='to')
_store_deliveries(cursor, 'delivery_from', deliveries_from, debug=debug)
_store_deliveries(cursor, 'delivery_to', deliveries_to, debug=debug)
_store_deliveries(cursor, 'noqueue', noqueue_items, debug=debug)
FromItems = Dict[str, List[dict]]
ToItems = Dict[Tuple[str, Optional[str]], List[dict]]
NoqueueItems = Dict[int, dict]
def _group_delivery_items(
cache: List[dict]
) -> Tuple[FromItems, ToItems, NoqueueItems]:
"""
Group delivery items by type and queue_id.
Return items of type 'from', of type 'to' and items without
queue_id.
"""
delivery_from_items: FromItems = defaultdict(list)
delivery_to_items: ToItems = defaultdict(list)
noqueue_items: NoqueueItems = {}
noqueue_i = 1
for item in cache:
if item.get('queue_id'):
queue_id = item['queue_id']
if item.get('type') == 'from':
delivery_from_items[queue_id].append(item)
else:
recipient = item.get('recipient')
delivery_to_items[(queue_id, recipient)].append(item)
else:
noqueue_items[noqueue_i] = item
noqueue_i += 1
return delivery_from_items, delivery_to_items, noqueue_items
def _merge_delivery_items(
delivery_items: Union[FromItems, ToItems],
item_type: str = 'from',
) -> Dict[Union[str, Tuple[str, Optional[str]]], dict]:
"""
Compute deliveries by combining multiple delivery items.
Take lists of delivery items for each queue_id (in case
of item_type=='from') or for (queue_id, recipient)-pairs
(in case of item_type='to').
Each delivery item is a dict obtained from one log message.
The dicts are consecutively updated (merged), except for the
raw log messages (texts) which are collected into a list.
The fields of the resulting delivery are filtered according
to the target table.
Returned is a dict mapping queue_ids (in case
of item_type=='from') or (queue_id, recipient)-pairs
(in case of item_type='to') to deliveries.
"""
deliveries = {}
for group, items in delivery_items.items():
delivery = {}
messages = []
for item in items:
message = item.pop('message')
identifier = item.pop('identifier')
pid = item.pop('pid')
messages.append(f'{identifier}[{pid}]: {message}')
delivery.update(item)
delivery['messages'] = messages
deliveries[group] = delivery
return deliveries
def _store_deliveries(
cursor: psycopg2.extras.RealDictCursor,
table_name: str,
deliveries: Dict[Any, dict],
debug: List[str] = [],
) -> None:
"""
Store grouped and merged delivery items.
"""
if not deliveries:
return
    n = len(deliveries)
t0 = time.time()
cursor.execute('BEGIN')
_store_deliveries_batch(cursor, table_name, deliveries.values())
cursor.execute('COMMIT')
t1 = time.time()
if 'all' in debug or 'sql' in debug:
milliseconds = (t1 - t0) * 1000
print(
'*' * 10,
f'SQL transaction time {table_name}: '
f'{milliseconds:.2f} ms ({n} deliveries)',
)
def _store_deliveries_batch(
cursor: psycopg2.extras.RealDictCursor,
table_name: str,
deliveries: Iterable[dict]
) -> None:
"""
Store *deliveries* (i.e., grouped and merged delivery items).
We use a prepared statement and execute_batch() from
psycopg2.extras to improve performance.
"""
rows = []
for delivery in deliveries:
# get values for all fields of the table
field_values: List[Any] = []
t = delivery.get('t')
delivery['t_i'] = t
delivery['t_f'] = t
for field in table_fields[table_name]:
if field in delivery:
if field == 'messages':
field_values.append(json.dumps(delivery[field]))
else:
field_values.append(delivery[field])
else:
field_values.append(None)
rows.append(field_values)
sql = get_sql_execute_prepared_statement(table_name)
try:
psycopg2.extras.execute_batch(cursor, sql, rows)
    except Exception as err:
        msg = (
            f'SQL statement failed ({err}): "{sql}"'
            f' -- the values were: {rows}'
        )
        journal.send(msg, PRIORITY=journal.LOG_ERR)
def init_db(config: dict) -> Optional[str]:
"""
Initialize database; if ok return DSN, else None.
Try to get parameters for database access,
check existence of tables and possibly create them.
"""
dsn = _get_dsn(config)
if dsn:
ok = _create_tables(dsn)
if not ok:
return None
return dsn
def _get_dsn(config: dict) -> Optional[str]:
"""
Return the DSN (data source name) from the *config*.
"""
try:
postgresql_config = config['postgresql']
hostname = postgresql_config['hostname']
port = postgresql_config['port']
database = postgresql_config['database']
username = postgresql_config['username']
password = postgresql_config['password']
except Exception:
msg = f"""ERROR: invalid config in {settings.main_config_file}
The config file must contain a section like this:
postgresql:
hostname: <HOSTNAME_OR_IP>
port: <PORT>
database: <DATABASE_NAME>
username: <USERNAME>
password: <PASSWORD>
"""
journal.send(msg, PRIORITY=journal.LOG_CRIT)
return None
dsn = f'host={hostname} port={port} dbname={database} '\
f'user={username} password={password}'
return dsn
def _create_tables(dsn: str) -> bool:
"""
Check existence of tables and possibly create them, returning success.
"""
try:
with psycopg2.connect(dsn) as conn:
with conn.cursor() as curs:
for table_name, sql_stmts in get_create_table_stmts().items():
ok = _create_table(curs, table_name, sql_stmts)
if not ok:
return False
except Exception:
journal.send(
f'ERROR: cannot connect to database, check params'
f' in {settings.main_config_file}',
PRIORITY=journal.LOG_CRIT,
)
return False
return True
def _create_table(
    cursor: psycopg2.extensions.cursor,
table_name: str,
sql_stmts: List[str]
) -> bool:
"""
Try to create a table if it does not exist and return whether it exists.
If creation failed, emit an error to the journal.
"""
cursor.execute("SELECT EXISTS(SELECT * FROM "
"information_schema.tables WHERE table_name=%s)",
(table_name,))
table_exists = cursor.fetchone()[0]
if not table_exists:
for sql_stmt in sql_stmts:
try:
cursor.execute(sql_stmt)
except Exception:
journal.send(
'ERROR: database user needs privilege to create tables.\n'
'Alternatively, you can create the table manually like'
' this:\n\n'
+ '\n'.join([sql + ';' for sql in sql_stmts]),
PRIORITY=journal.LOG_CRIT,
)
return False
return True
def init_session(cursor: psycopg2.extras.RealDictCursor) -> None:
"""
Init a database session.
Define prepared statements.
"""
stmt = get_sql_prepared_statement('delivery_from')
cursor.execute(stmt)
stmt = get_sql_prepared_statement('delivery_to')
cursor.execute(stmt)
stmt = get_sql_prepared_statement('noqueue')
cursor.execute(stmt)

View File

@ -0,0 +1,210 @@
#!/usr/bin/env python3
"""
Database table definitions and prepared statements.
Note: (short) postfix queue IDs are not unique:
http://postfix.1071664.n5.nabble.com/Queue-ID-gets-reused-Not-unique-td25387.html
"""
from typing import Dict, List
_table_def_delivery_from = [
[
dict(name='t_i', dtype='TIMESTAMP'),
dict(name='t_f', dtype='TIMESTAMP'),
dict(name='queue_id', dtype='VARCHAR(16)', null=False, extra='UNIQUE'),
dict(name='host', dtype='VARCHAR(200)'),
dict(name='ip', dtype='VARCHAR(50)'),
dict(name='sasl_username', dtype='VARCHAR(300)'),
dict(name='orig_queue_id', dtype='VARCHAR(16)'),
dict(name='status', dtype='VARCHAR(10)'),
dict(name='accepted', dtype='BOOL', null=False, default='TRUE'),
dict(name='done', dtype='BOOL', null=False, default='FALSE'),
dict(name='sender', dtype='VARCHAR(300)'),
dict(name='message_id', dtype='VARCHAR(1000)'),
dict(name='resent_message_id', dtype='VARCHAR(1000)'),
dict(name='subject', dtype='VARCHAR(1000)'),
dict(name='phase', dtype='VARCHAR(15)'),
dict(name='error', dtype='VARCHAR(1000)'),
dict(name='size', dtype='INT'),
dict(name='nrcpt', dtype='INT'),
dict(name='verp_id', dtype='INT'),
dict(name='messages', dtype='JSONB', null=False, default="'{}'::JSONB"),
],
"CREATE INDEX delivery_from__queue_id ON delivery_from (queue_id)",
"CREATE INDEX delivery_from__t_i ON delivery_from (t_i)",
"CREATE INDEX delivery_from__t_f ON delivery_from (t_f)",
"CREATE INDEX delivery_from__sender ON delivery_from (sender)",
"CREATE INDEX delivery_from__message_id ON delivery_from (message_id)",
]
_table_def_delivery_to = [
[
dict(name='t_i', dtype='TIMESTAMP'),
dict(name='t_f', dtype='TIMESTAMP'),
dict(name='queue_id', dtype='VARCHAR(16)', null=False),
dict(name='recipient', dtype='VARCHAR(300)'),
dict(name='orig_recipient', dtype='VARCHAR(300)'),
dict(name='host', dtype='VARCHAR(200)'),
dict(name='ip', dtype='VARCHAR(50)'),
dict(name='port', dtype='VARCHAR(10)'),
dict(name='relay', dtype='VARCHAR(10)'),
dict(name='delay', dtype='VARCHAR(200)'),
dict(name='delays', dtype='VARCHAR(200)'),
dict(name='dsn', dtype='VARCHAR(10)'),
dict(name='status', dtype='VARCHAR(10)'),
dict(name='status_text', dtype='VARCHAR(1000)'),
dict(name='messages', dtype='JSONB', null=False, default="'{}'::JSONB"),
],
"ALTER TABLE delivery_to ADD CONSTRAINT"
" delivery_to__queue_id_recipient UNIQUE(queue_id, recipient)",
"CREATE INDEX delivery_to__queue_id ON delivery_to (queue_id)",
"CREATE INDEX delivery_to__recipient ON delivery_to (recipient)",
"CREATE INDEX delivery_to__t_i ON delivery_to (t_i)",
"CREATE INDEX delivery_to__t_f ON delivery_to (t_f)",
]
_table_def_noqueue = [
[
dict(name='t', dtype='TIMESTAMP'),
dict(name='host', dtype='VARCHAR(200)'),
dict(name='ip', dtype='VARCHAR(50)'),
dict(name='sender', dtype='VARCHAR(300)'),
dict(name='recipient', dtype='VARCHAR(300)'),
dict(name='sasl_username', dtype='VARCHAR(300)'),
dict(name='status', dtype='VARCHAR(10)'),
dict(name='phase', dtype='VARCHAR(15)'),
dict(name='error', dtype='VARCHAR(1000)'),
dict(name='message', dtype='TEXT'),
],
"CREATE INDEX noqueue__t ON noqueue (t)",
"CREATE INDEX noqueue__sender ON noqueue (sender)",
"CREATE INDEX noqueue__recipient ON noqueue (recipient)",
]
_tables: Dict[str, list] = {
'delivery_from': _table_def_delivery_from,
'delivery_to': _table_def_delivery_to,
'noqueue': _table_def_noqueue,
}
_prepared_statements = {
'delivery_from':
"PREPARE delivery_from_insert ({}) AS "
"INSERT INTO delivery_from ({}) VALUES ({}) "
"ON CONFLICT (queue_id) DO UPDATE SET {}",
'delivery_to':
"PREPARE delivery_to_insert ({}) AS "
"INSERT INTO delivery_to ({}) VALUES ({}) "
"ON CONFLICT (queue_id, recipient) DO UPDATE SET {}",
'noqueue':
"PREPARE noqueue_insert ({}) AS "
"INSERT INTO noqueue ({}) VALUES ({}){}",
}
table_fields: Dict[str, List[str]] = {}
"""
Lists of field names for tables, populated by get_create_table_stmts().
"""
def get_sql_prepared_statement(table_name: str) -> str:
"""
Return SQL defining a prepared statement for inserting into a table.
Table 'noqueue' is handled differently, because it does not have
an UPDATE clause.
"""
col_names = []
col_types = []
col_args = []
col_upds = []
col_i = 0
for field in _tables[table_name][0]:
# column type
col_type = field['dtype']
if field['dtype'].lower().startswith('varchar'):
col_type = 'TEXT'
col_types.append(col_type)
# column args
col_i += 1
col_arg = '$' + str(col_i)
# column name
col_name = field['name']
col_names.append(col_name)
if 'default' in field:
default = field['default']
col_args.append(f'COALESCE({col_arg},{default})')
else:
col_args.append(col_arg)
# column update
col_upd = f'{col_name}=COALESCE({col_arg},{table_name}.{col_name})'
if col_name != 't_i':
if col_name == 'messages':
col_upd = f'{col_name}={table_name}.{col_name}||{col_arg}'
if table_name != 'noqueue':
col_upds.append(col_upd)
stmt = _prepared_statements[table_name].format(
','.join(col_types),
','.join(col_names),
','.join(col_args),
','.join(col_upds),
)
return stmt
def get_sql_execute_prepared_statement(table_name: str) -> str:
"""
Return SQL for executing the given table's prepared statement.
The result is based on global variable _tables.
"""
fields = _tables[table_name][0]
return "EXECUTE {}_insert ({})"\
.format(table_name, ','.join(['%s' for i in range(len(fields))]))
def get_create_table_stmts() -> Dict[str, List[str]]:
"""
Return a dict mapping table names to SQL statements creating the tables.
Also populate global variable table_fields.
"""
res = {}
for table_name, table_def in _tables.items():
stmts = table_def.copy()
stmts[0] = _get_sql_create_stmt(table_name, table_def[0])
res[table_name] = stmts
field_names = [x['name'] for x in table_def[0]]
global table_fields
table_fields[table_name] = field_names
return res
def _get_sql_create_stmt(table_name: str, fields: List[dict]):
"""
Return the 'CREATE TABLE' SQL statement for a table.
Factor in NULL, DEFAULT and extra DDL text.
"""
sql = f"CREATE TABLE {table_name} (\n id BIGSERIAL,"
col_defs = []
for field in fields:
col_def = f" {field['name']} {field['dtype']}"
if 'null' in field and field['null'] is False:
col_def += " NOT NULL"
if 'default' in field:
col_def += f" DEFAULT {field['default']}"
if 'extra' in field:
col_def += f" {field['extra']}"
col_defs.append(col_def)
sql += '\n' + ',\n'.join(col_defs)
sql += '\n)'
return sql

View File

@ -0,0 +1,90 @@
- name: user journal-postfix
user:
name: journal-postfix
group: systemd-journal
state: present
system: yes
uid: 420
create_home: no
home: /srv/journal-postfix
password: '!'
password_lock: yes
comment: created by ansible role journal-postfix
- name: directories /srv/journal-postfix, /etc/journal-postfix
file:
path: "{{ item }}"
state: directory
owner: journal-postfix
group: systemd-journal
mode: 0755
loop:
- /srv/journal-postfix
- /etc/journal-postfix
- name: install dependencies
apt:
name: python3-psycopg2,python3-systemd,python3-yaml
state: present
update_cache: yes
install_recommends: no
- name: files in /srv/journal-postfix
copy:
src: "srv/{{ item }}"
dest: "/srv/journal-postfix/{{ item }}"
owner: journal-postfix
group: systemd-journal
mode: 0644
force: yes
loop:
- run.py
- settings.py
- sources.py
- parser.py
- storage.py
- storage_setup.py
- README.md
- setup.cfg
- name: make some files executable
file:
path: "{{ item }}"
mode: 0755
loop:
- /srv/journal-postfix/run.py
- /srv/journal-postfix/settings.py
- name: determine whether to start up
set_fact:
startup: "{{ mailserver.postgresql.host is defined and mailserver.postgresql.port is defined and mailserver.postgresql.dbname is defined and mailserver.postgresql.username is defined and mailserver.postgresql.password is defined }}"
- name: file /etc/journal-postfix/main.yml
template:
src: main.yml
dest: /etc/journal-postfix/main.yml
owner: journal-postfix
group: systemd-journal
mode: 0600
force: no
- name: file journal-postfix.service
copy:
src: journal-postfix.service
dest: /etc/systemd/system/journal-postfix.service
owner: root
group: root
mode: 0644
force: yes
- name: enable systemd unit journal-postfix.service
systemd:
enabled: yes
daemon_reload: yes
name: journal-postfix.service
- name: restart systemd unit journal-postfix.service
systemd:
state: restarted
name: journal-postfix.service
when: startup

View File

@ -0,0 +1,45 @@
# Configuration for journal-postfix, see /srv/journal-postfix
# To enable startup of systemd unit journal-postfix set this to yes:
startup: {{ 'yes' if startup else 'no' }}
# PostgreSQL database connection parameters
postgresql:
hostname: {{ mailserver.postgresql.host | default('127.0.0.1') }}
port: {{ mailserver.postgresql.port | default('5432') }}
database: {{ mailserver.postgresql.dbname | default('mailserver') }}
username: {{ mailserver.postgresql.username | default('mailserver') }}
password: {{ mailserver.postgresql.password | default('*************') }}
# Postfix parameters
postfix:
# Systemd unit name of the Postfix unit. Only one unit is supported.
systemd_unitname: postfix@-.service
# If you have configured Postfix to rewrite envelope sender
# addresses of outgoing mails so that it includes a VERP
# (Variable Envelope Return Path) of the form
# {local_part}+{verp_marker}-{id}@{domain}, where id is an
# integer, then set the verp_marker here:
verp_marker: {{ mailserver.postfix.verp_marker | default('') }}
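  # Example (assumed addresses): with verp_marker 'bounces', an outgoing
  # envelope sender news+bounces-42@example.org carries the id 42, which
  # the parser stores as verp_id of the corresponding delivery.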
# Poll timeout in seconds for fetching messages from the journal.
journal_poll_interval: 10.0
# How much time may pass before committing a database transaction?
# (The actual maximal delay can be one journal_poll_interval in addition.)
max_delay_before_commit: 60.0
# How many messages to cache at most before committing a database transaction?
max_messages_per_commit: 10000
# Delete delivery records older than this number of days.
# A value of 0 means that data are never deleted.
# Note: Deliveries may span a substantial time interval over which they
# are active; here the age of a delivery is determined by its start time.
delete_deliveries_after_days: 30
# The time interval in seconds after which a deletion of old
# delivery records is triggered. (Will not be smaller than
# max_delay_before_commit + journal_poll_interval.)
delete_interval: 3600