Fighting Spam
Fighting email spam in modern times most of the times looks like this:
Rspamd
Rspamd is a rapid spam filtering system. Written in C with Lua script engine extension seems to be really fast and a really good solution for SOHO environments.
In this blog post, I'’ll try to present you a quickstart guide on working with rspamd on a CentOS 6.9 machine running postfix.
DISCLAIMER: This blog post is from a very technical point of view!
Installation
We are going to install rspamd via know rpm repositories:
Epel Repository
We need to install epel repository first:
# yum -y install http://fedora-mirror01.rbc.ru/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
Rspamd Repository
Now it is time to setup the rspamd repository:
# curl https://rspamd.com/rpm-stable/centos-6/rspamd.repo -o /etc/yum.repos.d/rspamd.repo
Install the gpg key
# rpm --import http://rspamd.com/rpm-stable/gpg.key
and verify the repository with # yum repolist
repo id repo name
base CentOS-6 - Base
epel Extra Packages for Enterprise Linux 6 - x86_64
extras CentOS-6 - Extras
rspamd Rspamd stable repository
updates CentOS-6 - Updates
Rpm
Now it is time to install rspamd to our linux box:
# yum -y install rspamd
# yum info rspamd
Name : rspamd
Arch : x86_64
Version : 1.6.3
Release : 1
Size : 8.7 M
Repo : installed
From repo : rspamd
Summary : Rapid spam filtering system
URL : https://rspamd.com
License : BSD2c
Description : Rspamd is a rapid, modular and lightweight spam filter. It is designed to work
: with big amount of mail and can be easily extended with own filters written in
: lua.
Init File
We need to correct rspamd init file so that rspamd can find the correct configuration file:
# vim /etc/init.d/rspamd
# ebal, Wed, 06 Sep 2017 00:31:37 +0300
## RSPAMD_CONF_FILE="/etc/rspamd/rspamd.sysvinit.conf"
RSPAMD_CONF_FILE="/etc/rspamd/rspamd.conf"
or
# ln -s /etc/rspamd/rspamd.conf /etc/rspamd/rspamd.sysvinit.conf
Start Rspamd
We are now ready to start for the first time rspamd daemon:
# /etc/init.d/rspamd restart
syntax OK
Stopping rspamd: [FAILED]
Starting rspamd: [ OK ]
verify that is running:
# ps -e fuwww | egrep -i rsp[a]md
root 1337 0.0 0.7 205564 7164 ? Ss 20:19 0:00 rspamd: main process
_rspamd 1339 0.0 0.7 206004 8068 ? S 20:19 0:00 _ rspamd: rspamd_proxy process
_rspamd 1340 0.2 1.2 209392 12584 ? S 20:19 0:00 _ rspamd: controller process
_rspamd 1341 0.0 1.0 208436 11076 ? S 20:19 0:00 _ rspamd: normal process
perfect, now it is time to enable rspamd to run on boot:
# chkconfig rspamd on
# chkconfig --list | egrep rspamd
rspamd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
Postfix
In a nutshell, postfix will pass through (filter) an email using the milter protocol to another application before queuing it to one of postfix’s mail queues. Think milter as a bridge that connects two different applications.
Rspamd Proxy
In Rspamd 1.6 Rmilter is obsoleted but rspamd proxy worker supports milter protocol. That means we need to connect our postfix with rspamd_proxy via milter protocol.
Rspamd has a really nice documentation: https://rspamd.com/doc/index.html
On MTA integration you can find more info.
# netstat -ntlp | egrep -i rspamd
output:
tcp 0 0 0.0.0.0:11332 0.0.0.0:* LISTEN 1451/rspamd
tcp 0 0 0.0.0.0:11333 0.0.0.0:* LISTEN 1451/rspamd
tcp 0 0 127.0.0.1:11334 0.0.0.0:* LISTEN 1451/rspamd
tcp 0 0 :::11332 :::* LISTEN 1451/rspamd
tcp 0 0 :::11333 :::* LISTEN 1451/rspamd
tcp 0 0 ::1:11334 :::* LISTEN 1451/rspamd
# egrep -A1 proxy /etc/rspamd/rspamd.conf
worker "rspamd_proxy" {
bind_socket = "*:11332";
.include "$CONFDIR/worker-proxy.inc"
.include(try=true; priority=1,duplicate=merge) "$LOCAL_CONFDIR/local.d/worker-proxy.inc"
.include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/worker-proxy.inc"
}
Milter
If you want to know all the possibly configuration parameter on postfix for milter setup:
# postconf | egrep -i milter
output:
milter_command_timeout = 30s
milter_connect_macros = j {daemon_name} v
milter_connect_timeout = 30s
milter_content_timeout = 300s
milter_data_macros = i
milter_default_action = tempfail
milter_end_of_data_macros = i
milter_end_of_header_macros = i
milter_helo_macros = {tls_version} {cipher} {cipher_bits} {cert_subject} {cert_issuer}
milter_macro_daemon_name = $myhostname
milter_macro_v = $mail_name $mail_version
milter_mail_macros = i {auth_type} {auth_authen} {auth_author} {mail_addr} {mail_host} {mail_mailer}
milter_protocol = 6
milter_rcpt_macros = i {rcpt_addr} {rcpt_host} {rcpt_mailer}
milter_unknown_command_macros =
non_smtpd_milters =
smtpd_milters =
We are mostly interested in the last two, but it is best to follow rspamd documentation:
# vim /etc/postfix/main.cf
Adding the below configuration lines:
# ebal, Sat, 09 Sep 2017 18:56:02 +0300
## A list of Milter (mail filter) applications for new mail that does not arrive via the Postfix smtpd(8) server.
on_smtpd_milters = inet:127.0.0.1:11332
## A list of Milter (mail filter) applications for new mail that arrives via the Postfix smtpd(8) server.
smtpd_milters = inet:127.0.0.1:11332
## Send macros to mail filter applications
milter_mail_macros = i {auth_type} {auth_authen} {auth_author} {mail_addr} {client_addr} {client_name} {mail_host} {mail_mailer}
## skip mail without checks if something goes wrong, like rspamd is down !
milter_default_action = accept
Reload postfix
# postfix reload
postfix/postfix-script: refreshing the Postfix mail system
Testing
netcat
From a client:
$ nc 192.168.122.96 25
220 centos69.localdomain ESMTP Postfix
EHLO centos69
250-centos69.localdomain
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL FROM: <root@example.org>
250 2.1.0 Ok
RCPT TO: <root@localhost>
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
test
.
250 2.0.0 Ok: queued as 4233520144
^]
Logs
Looking through logs may be a difficult task for many, even so it is a task that you have to do.
MailLog
# egrep 4233520144 /var/log/maillog
Sep 9 19:08:01 localhost postfix/smtpd[1960]: 4233520144: client=unknown[192.168.122.1]
Sep 9 19:08:05 localhost postfix/cleanup[1963]: 4233520144: message-id=<>
Sep 9 19:08:05 localhost postfix/qmgr[1932]: 4233520144: from=<root@example.org>, size=217, nrcpt=1 (queue active)
Sep 9 19:08:05 localhost postfix/local[1964]: 4233520144: to=<root@localhost.localdomain>, orig_to=<root@localhost>, relay=local, delay=12, delays=12/0.01/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Sep 9 19:08:05 localhost postfix/qmgr[1932]: 4233520144: removed
Everything seems fine with postfix.
Rspamd Log
# egrep -i 4233520144 /var/log/rspamd/rspamd.log
2017-09-09 19:08:05 #1455(normal) <79a04e>; task; rspamd_message_parse: loaded message; id: <undef>; queue-id: <4233520144>; size: 6; checksum: <a6a8e3835061e53ed251c57ab4f22463>
2017-09-09 19:08:05 #1455(normal) <79a04e>; task; rspamd_task_write_log: id: <undef>, qid: <4233520144>, ip: 192.168.122.1, from: <root@example.org>, (default: F (add header): [9.40/15.00] [MISSING_MID(2.50){},MISSING_FROM(2.00){},MISSING_SUBJECT(2.00){},MISSING_TO(2.00){},MISSING_DATE(1.00){},MIME_GOOD(-0.10){text/plain;},ARC_NA(0.00){},FROM_NEQ_ENVFROM(0.00){;root@example.org;},RCVD_COUNT_ZERO(0.00){0;},RCVD_TLS_ALL(0.00){}]), len: 6, time: 87.992ms real, 4.723ms virtual, dns req: 0, digest: <a6a8e3835061e53ed251c57ab4f22463>, rcpts: <root@localhost>
It works !
Training
If you have already a spam or junk folder is really easy training the Bayesian classifier with rspamc.
I use Maildir, so for my setup the initial training is something like this:
# cd /storage/vmails/balaskas.gr/evaggelos/.Spam/cur/
# find . -type f -exec rspamc learn_spam {} \;
Auto-Training
I’ve read a lot of tutorials that suggest real-time training via dovecot plugins or something similar. I personally think that approach adds complexity and for small companies or personal setup, I prefer using Cron daemon:
@daily /bin/find /storage/vmails/balaskas.gr/evaggelos/.Spam/cur/ -type f -mtime -1 -exec rspamc learn_spam {} \;
That means every day, search for new emails in my spam folder and use them to train rspamd.
Training from mbox
First of all seriously ?
Split mbox
There is a nice and simply way to split a mbox to separated files for rspamc to use them.
# awk '/^From / {i++}{print > "msg"i}' Spam
and then feed rspamc:
# ls -1 msg* | xargs rspamc --verbose learn_spam
Stats
# rspamc stat
Results for command: stat (0.068 seconds)
Messages scanned: 2
Messages with action reject: 0, 0.00%
Messages with action soft reject: 0, 0.00%
Messages with action rewrite subject: 0, 0.00%
Messages with action add header: 2, 100.00%
Messages with action greylist: 0, 0.00%
Messages with action no action: 0, 0.00%
Messages treated as spam: 2, 100.00%
Messages treated as ham: 0, 0.00%
Messages learned: 1859
Connections count: 2
Control connections count: 2157
Pools allocated: 2191
Pools freed: 2170
Bytes allocated: 542k
Memory chunks allocated: 41
Shared chunks allocated: 10
Chunks freed: 0
Oversized chunks: 736
Fuzzy hashes in storage "rspamd.com": 659509399
Fuzzy hashes stored: 659509399
Statfile: BAYES_SPAM type: sqlite3; length: 32.66M; free blocks: 0; total blocks: 430.29k; free: 0.00%; learned: 1859; users: 1; languages: 4
Statfile: BAYES_HAM type: sqlite3; length: 9.22k; free blocks: 0; total blocks: 0; free: 0.00%; learned: 0; users: 1; languages: 1
Total learns: 1859
X-Spamd-Result
To view the spam score in every email, we need to enable extended reporting headers and to do that we need to edit our configuration:
# vim /etc/rspamd/modules.d/milter_headers.conf
and just above use add :
# ebal, Wed, 06 Sep 2017 01:52:08 +0300
extended_spam_headers = true;
use = [];
then reload rspamd:
# /etc/init.d/rspamd reload
syntax OK
Reloading rspamd: [ OK ]
View Source
If your open the email in view-source then you will see something like this:
X-Rspamd-Queue-Id: D0A5728ABF
X-Rspamd-Server: centos69
X-Spamd-Result: default: False [3.40 / 15.00]
Web Server
Rspamd comes with their own web server. That is really useful if you dont have a web server in your mail server, but it is not recommended.
By-default, rspamd web server is only listening to local connections. We can see that from the below ss output
# ss -lp | egrep -i rspamd
LISTEN 0 128 :::11332 :::* users:(("rspamd",7469,10),("rspamd",7471,10),("rspamd",7472,10),("rspamd",7473,10))
LISTEN 0 128 *:11332 *:* users:(("rspamd",7469,9),("rspamd",7471,9),("rspamd",7472,9),("rspamd",7473,9))
LISTEN 0 128 :::11333 :::* users:(("rspamd",7469,18),("rspamd",7473,18))
LISTEN 0 128 *:11333 *:* users:(("rspamd",7469,16),("rspamd",7473,16))
LISTEN 0 128 ::1:11334 :::* users:(("rspamd",7469,14),("rspamd",7472,14),("rspamd",7473,14))
LISTEN 0 128 127.0.0.1:11334 *:* users:(("rspamd",7469,12),("rspamd",7472,12),("rspamd",7473,12))
127.0.0.1:11334
So if you want to change that (dont) you have to edit the rspamd.conf (core file):
# vim +/11334 /etc/rspamd/rspamd.conf
and change this line:
bind_socket = "localhost:11334";
to something like this:
bind_socket = "YOUR_SERVER_IP:11334";
or use sed:
# sed -i -e 's/localhost:11334/YOUR_SERVER_IP/' /etc/rspamd/rspamd.conf
and then fire up your browser:
Web Password
It is a good tactic to change the default password of this web-gui to something else.
# vim /etc/rspamd/worker-controller.inc
# password = "q1";
password = "password";
always a good idea to restart rspamd.
Reverse Proxy
I dont like having exposed any web app without SSL or basic authentication, so I shall put rspamd web server under a reverse proxy (apache).
So on httpd-2.2 the configuration is something like this:
ProxyPreserveHost On
<Location /rspamd>
AuthName "Rspamd Access"
AuthType Basic
AuthUserFile /etc/httpd/rspamd_htpasswd
Require valid-user
ProxyPass http://127.0.0.1:11334
ProxyPassReverse http://127.0.0.1:11334
Order allow,deny
Allow from all
</Location>
Http Basic Authentication
You need to create the file that is going to be used to store usernames and password for basic authentication:
# htpasswd -csb /etc/httpd/rspamd_htpasswd rspamd rspamd_passwd
Adding password for user rspamd
restart your apache instance.
bind_socket
Of course for this to work, we need to change the bind socket on rspamd.conf
Dont forget this ;)
bind_socket = "127.0.0.1:11334";
Selinux
If there is a problem with selinux, then:
# setsebool -P httpd_can_network_connect=1
or
# setsebool httpd_can_network_connect_db on
Errors ?
If you see an error like this:
IO write error
when running rspamd, then you need explicit tell rspamd to use:
rspamc -h 127.0.0.1:11334
To prevent any future errors, I’ve created a shell wrapper:
/usr/local/bin/rspamc
#!/bin/sh
/usr/bin/rspamc -h 127.0.0.1:11334 $*
Final Thoughts
I am using rspamd for a while know and I am pretty happy with it.
I’ve setup a spamtrap email address to feed my spam folder and let the cron script to train rspamd.
So after a thousand emails: