Ευάγγελος Μπαλάσκας - Evaggelos Balaskas

Sep

2017

Rspamd Fast, free and open-source spam filtering system

Posted by ebal at 00:05:33 in blog, planet_ellak, planet_Sysadmin, planet_fsfe

Fighting Spam

Fighting email spam in modern times most of the times looks like this:

Rspamd

Rspamd is a rapid spam filtering system. Written in C with Lua script engine extension seems to be really fast and a really good solution for SOHO environments.

In this blog post, I'’ll try to present you a quickstart guide on working with rspamd on a CentOS 6.9 machine running postfix.

DISCLAIMER: This blog post is from a very technical point of view!

Installation

We are going to install rspamd via know rpm repositories:

Epel Repository

We need to install epel repository first:

# yum -y install http://fedora-mirror01.rbc.ru/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Rspamd Repository

Now it is time to setup the rspamd repository:

# curl https://rspamd.com/rpm-stable/centos-6/rspamd.repo -o /etc/yum.repos.d/rspamd.repo

Install the gpg key

# rpm --import http://rspamd.com/rpm-stable/gpg.key

and verify the repository with # yum repolist

repo id     repo name
base        CentOS-6 - Base
epel        Extra Packages for Enterprise Linux 6 - x86_64
extras      CentOS-6 - Extras
rspamd      Rspamd stable repository
updates     CentOS-6 - Updates

Rpm

Now it is time to install rspamd to our linux box:

# yum -y install rspamd


# yum info rspamd

Name        : rspamd
Arch        : x86_64
Version     : 1.6.3
Release     : 1
Size        : 8.7 M
Repo        : installed
From repo   : rspamd
Summary     : Rapid spam filtering system
URL         : https://rspamd.com
License     : BSD2c
Description : Rspamd is a rapid, modular and lightweight spam filter. It is designed to work
            : with big amount of mail and can be easily extended with own filters written in
            : lua.

Init File

We need to correct rspamd init file so that rspamd can find the correct configuration file:

# vim /etc/init.d/rspamd

# ebal, Wed, 06 Sep 2017 00:31:37 +0300
## RSPAMD_CONF_FILE="/etc/rspamd/rspamd.sysvinit.conf"
RSPAMD_CONF_FILE="/etc/rspamd/rspamd.conf"


# ln -s /etc/rspamd/rspamd.conf /etc/rspamd/rspamd.sysvinit.conf

Start Rspamd

We are now ready to start for the first time rspamd daemon:

# /etc/init.d/rspamd restart

syntax OK
Stopping rspamd:                                           [FAILED]
Starting rspamd:                                           [  OK  ]

verify that is running:

# ps -e fuwww | egrep -i rsp[a]md


root      1337  0.0  0.7 205564  7164 ?        Ss   20:19   0:00 rspamd: main process
_rspamd   1339  0.0  0.7 206004  8068 ?        S    20:19   0:00  _ rspamd: rspamd_proxy process
_rspamd   1340  0.2  1.2 209392 12584 ?        S    20:19   0:00  _ rspamd: controller process
_rspamd   1341  0.0  1.0 208436 11076 ?        S    20:19   0:00  _ rspamd: normal process

perfect, now it is time to enable rspamd to run on boot:

# chkconfig rspamd on

# chkconfig --list | egrep rspamd
rspamd          0:off   1:off   2:on    3:on    4:on    5:on    6:off

Postfix

In a nutshell, postfix will pass through (filter) an email using the milter protocol to another application before queuing it to one of postfix’s mail queues. Think milter as a bridge that connects two different applications.

Rspamd Proxy

In Rspamd 1.6 Rmilter is obsoleted but rspamd proxy worker supports milter protocol. That means we need to connect our postfix with rspamd_proxy via milter protocol.

Rspamd has a really nice documentation: https://rspamd.com/doc/index.html
On MTA integration you can find more info.

# netstat -ntlp | egrep -i rspamd

output:

tcp        0      0 0.0.0.0:11332               0.0.0.0:*                   LISTEN      1451/rspamd
tcp        0      0 0.0.0.0:11333               0.0.0.0:*                   LISTEN      1451/rspamd
tcp        0      0 127.0.0.1:11334             0.0.0.0:*                   LISTEN      1451/rspamd
tcp        0      0 :::11332                    :::*                        LISTEN      1451/rspamd
tcp        0      0 :::11333                    :::*                        LISTEN      1451/rspamd
tcp        0      0 ::1:11334                   :::*                        LISTEN      1451/rspamd

# egrep -A1 proxy /etc/rspamd/rspamd.conf


worker "rspamd_proxy" {
    bind_socket = "*:11332";
    .include "$CONFDIR/worker-proxy.inc"
    .include(try=true; priority=1,duplicate=merge) "$LOCAL_CONFDIR/local.d/worker-proxy.inc"
    .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/worker-proxy.inc"
}

Milter

If you want to know all the possibly configuration parameter on postfix for milter setup:

# postconf | egrep -i milter

output:

milter_command_timeout = 30s
milter_connect_macros = j {daemon_name} v
milter_connect_timeout = 30s
milter_content_timeout = 300s
milter_data_macros = i
milter_default_action = tempfail
milter_end_of_data_macros = i
milter_end_of_header_macros = i
milter_helo_macros = {tls_version} {cipher} {cipher_bits} {cert_subject} {cert_issuer}
milter_macro_daemon_name = $myhostname
milter_macro_v = $mail_name $mail_version
milter_mail_macros = i {auth_type} {auth_authen} {auth_author} {mail_addr} {mail_host} {mail_mailer}
milter_protocol = 6
milter_rcpt_macros = i {rcpt_addr} {rcpt_host} {rcpt_mailer}
milter_unknown_command_macros =
non_smtpd_milters =
smtpd_milters =

We are mostly interested in the last two, but it is best to follow rspamd documentation:

# vim /etc/postfix/main.cf

Adding the below configuration lines:

# ebal, Sat, 09 Sep 2017 18:56:02 +0300

## A list of Milter (mail filter) applications for new mail that does not arrive via the Postfix smtpd(8) server.
on_smtpd_milters = inet:127.0.0.1:11332

## A list of Milter (mail filter) applications for new mail that arrives via the Postfix smtpd(8) server.
smtpd_milters = inet:127.0.0.1:11332

## Send macros to mail filter applications
milter_mail_macros = i {auth_type} {auth_authen} {auth_author} {mail_addr} {client_addr} {client_name} {mail_host} {mail_mailer}

## skip mail without checks if something goes wrong, like rspamd is down !
milter_default_action = accept

Reload postfix

# postfix reload

postfix/postfix-script: refreshing the Postfix mail system

Testing

netcat

From a client:

$ nc 192.168.122.96 25

220 centos69.localdomain ESMTP Postfix
EHLO centos69
250-centos69.localdomain
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL FROM: <root@example.org>
250 2.1.0 Ok
RCPT TO: <root@localhost>
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
test
.
250 2.0.0 Ok: queued as 4233520144
^]

Logs

Looking through logs may be a difficult task for many, even so it is a task that you have to do.

MailLog

# egrep 4233520144 /var/log/maillog


Sep  9 19:08:01 localhost postfix/smtpd[1960]: 4233520144: client=unknown[192.168.122.1]
Sep  9 19:08:05 localhost postfix/cleanup[1963]: 4233520144: message-id=<>
Sep  9 19:08:05 localhost postfix/qmgr[1932]: 4233520144: from=<root@example.org>, size=217, nrcpt=1 (queue active)
Sep  9 19:08:05 localhost postfix/local[1964]: 4233520144: to=<root@localhost.localdomain>, orig_to=<root@localhost>, relay=local, delay=12, delays=12/0.01/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Sep  9 19:08:05 localhost postfix/qmgr[1932]: 4233520144: removed

Everything seems fine with postfix.

Rspamd Log

# egrep -i 4233520144 /var/log/rspamd/rspamd.log

2017-09-09 19:08:05 #1455(normal) <79a04e>; task; rspamd_message_parse: loaded message; id: <undef>; queue-id: <4233520144>; size: 6; checksum: <a6a8e3835061e53ed251c57ab4f22463>

2017-09-09 19:08:05 #1455(normal) <79a04e>; task; rspamd_task_write_log: id: <undef>, qid: <4233520144>, ip: 192.168.122.1, from: <root@example.org>, (default: F (add header): [9.40/15.00] [MISSING_MID(2.50){},MISSING_FROM(2.00){},MISSING_SUBJECT(2.00){},MISSING_TO(2.00){},MISSING_DATE(1.00){},MIME_GOOD(-0.10){text/plain;},ARC_NA(0.00){},FROM_NEQ_ENVFROM(0.00){;root@example.org;},RCVD_COUNT_ZERO(0.00){0;},RCVD_TLS_ALL(0.00){}]), len: 6, time: 87.992ms real, 4.723ms virtual, dns req: 0, digest: <a6a8e3835061e53ed251c57ab4f22463>, rcpts: <root@localhost>

It works !

Training

If you have already a spam or junk folder is really easy training the Bayesian classifier with rspamc.

I use Maildir, so for my setup the initial training is something like this:

 # cd /storage/vmails/balaskas.gr/evaggelos/.Spam/cur/

# find . -type f -exec rspamc learn_spam {} \;

Auto-Training

I’ve read a lot of tutorials that suggest real-time training via dovecot plugins or something similar. I personally think that approach adds complexity and for small companies or personal setup, I prefer using Cron daemon:


 @daily /bin/find /storage/vmails/balaskas.gr/evaggelos/.Spam/cur/ -type f -mtime -1 -exec rspamc learn_spam {} \;

That means every day, search for new emails in my spam folder and use them to train rspamd.

Training from mbox

First of all seriously ?

Split mbox

There is a nice and simply way to split a mbox to separated files for rspamc to use them.

# awk '/^From / {i++}{print > "msg"i}' Spam

and then feed rspamc:

# ls -1 msg* | xargs rspamc --verbose learn_spam

Stats

# rspamc stat


Results for command: stat (0.068 seconds)
Messages scanned: 2
Messages with action reject: 0, 0.00%
Messages with action soft reject: 0, 0.00%
Messages with action rewrite subject: 0, 0.00%
Messages with action add header: 2, 100.00%
Messages with action greylist: 0, 0.00%
Messages with action no action: 0, 0.00%
Messages treated as spam: 2, 100.00%
Messages treated as ham: 0, 0.00%
Messages learned: 1859
Connections count: 2
Control connections count: 2157
Pools allocated: 2191
Pools freed: 2170
Bytes allocated: 542k
Memory chunks allocated: 41
Shared chunks allocated: 10
Chunks freed: 0
Oversized chunks: 736
Fuzzy hashes in storage "rspamd.com": 659509399
Fuzzy hashes stored: 659509399
Statfile: BAYES_SPAM type: sqlite3; length: 32.66M; free blocks: 0; total blocks: 430.29k; free: 0.00%; learned: 1859; users: 1; languages: 4
Statfile: BAYES_HAM type: sqlite3; length: 9.22k; free blocks: 0; total blocks: 0; free: 0.00%; learned: 0; users: 1; languages: 1
Total learns: 1859

X-Spamd-Result

To view the spam score in every email, we need to enable extended reporting headers and to do that we need to edit our configuration:

# vim /etc/rspamd/modules.d/milter_headers.conf

and just above use add :

    # ebal, Wed, 06 Sep 2017 01:52:08 +0300
    extended_spam_headers = true;

   use = [];

then reload rspamd:

# /etc/init.d/rspamd reload

syntax OK
Reloading rspamd:                                          [  OK  ]

View Source

If your open the email in view-source then you will see something like this:


X-Rspamd-Queue-Id: D0A5728ABF
X-Rspamd-Server: centos69
X-Spamd-Result: default: False [3.40 / 15.00]

Web Server

Rspamd comes with their own web server. That is really useful if you dont have a web server in your mail server, but it is not recommended.

By-default, rspamd web server is only listening to local connections. We can see that from the below ss output

# ss -lp | egrep -i rspamd

LISTEN     0      128                    :::11332                   :::*        users:(("rspamd",7469,10),("rspamd",7471,10),("rspamd",7472,10),("rspamd",7473,10))
LISTEN     0      128                     *:11332                    *:*        users:(("rspamd",7469,9),("rspamd",7471,9),("rspamd",7472,9),("rspamd",7473,9))
LISTEN     0      128                    :::11333                   :::*        users:(("rspamd",7469,18),("rspamd",7473,18))
LISTEN     0      128                     *:11333                    *:*        users:(("rspamd",7469,16),("rspamd",7473,16))
LISTEN     0      128                   ::1:11334                   :::*        users:(("rspamd",7469,14),("rspamd",7472,14),("rspamd",7473,14))
LISTEN     0      128             127.0.0.1:11334                    *:*        users:(("rspamd",7469,12),("rspamd",7472,12),("rspamd",7473,12))

127.0.0.1:11334

So if you want to change that (dont) you have to edit the rspamd.conf (core file):

# vim +/11334 /etc/rspamd/rspamd.conf

and change this line:

bind_socket = "localhost:11334";

to something like this:

bind_socket = "YOUR_SERVER_IP:11334";

or use sed:

# sed -i -e 's/localhost:11334/YOUR_SERVER_IP/' /etc/rspamd/rspamd.conf

and then fire up your browser:

Web Password

It is a good tactic to change the default password of this web-gui to something else.

# vim /etc/rspamd/worker-controller.inc

  # password = "q1";
  password = "password";

always a good idea to restart rspamd.

Reverse Proxy

I dont like having exposed any web app without SSL or basic authentication, so I shall put rspamd web server under a reverse proxy (apache).

So on httpd-2.2 the configuration is something like this:

ProxyPreserveHost On

<Location /rspamd>
    AuthName "Rspamd Access"
    AuthType Basic
    AuthUserFile /etc/httpd/rspamd_htpasswd
    Require valid-user

    ProxyPass http://127.0.0.1:11334
    ProxyPassReverse http://127.0.0.1:11334 

    Order allow,deny
    Allow from all 

</Location>

Http Basic Authentication

You need to create the file that is going to be used to store usernames and password for basic authentication:

# htpasswd -csb /etc/httpd/rspamd_htpasswd rspamd rspamd_passwd
Adding password for user rspamd

restart your apache instance.

bind_socket

Of course for this to work, we need to change the bind socket on rspamd.conf
Dont forget this ;)

bind_socket = "127.0.0.1:11334";

Selinux

If there is a problem with selinux, then:

# setsebool -P httpd_can_network_connect=1

# setsebool httpd_can_network_connect_db on

Errors ?

If you see an error like this:
IO write error

when running rspamd, then you need explicit tell rspamd to use:

rspamc -h 127.0.0.1:11334

To prevent any future errors, I’ve created a shell wrapper:

/usr/local/bin/rspamc

#!/bin/sh
/usr/bin/rspamc -h 127.0.0.1:11334 $*

Final Thoughts

I am using rspamd for a while know and I am pretty happy with it.

I’ve setup a spamtrap email address to feed my spam folder and let the cron script to train rspamd.

So after a thousand emails:

Tag(s): rspamd, postfix, apache

rspamd

postfix

apache

Evaggelos Balaskas - System Engineer

Fighting Spam

Rspamd

Installation

Epel Repository

Rspamd Repository

Rpm

Init File

Start Rspamd

Postfix

Rspamd Proxy

Milter

Testing

netcat

Logs

MailLog

Rspamd Log

Training

Auto-Training

Training from mbox

Split mbox

Stats

X-Spamd-Result

View Source

Web Server

Web Password

Reverse Proxy

Http Basic Authentication

bind_socket

Selinux

Errors ?

Final Thoughts

Admin area

Categories

Archives

Evaggelos Balaskas - System Engineer

Fighting Spam

Rspamd

Installation

Epel Repository

Rspamd Repository

Rpm

Init File

Start Rspamd

Postfix

Rspamd Proxy

Milter

Testing

netcat

Logs

MailLog

Rspamd Log

Training

Auto-Training

Training from mbox

Split mbox

Stats

X-Spamd-Result

View Source

Web Server

Web Password

Reverse Proxy

Http Basic Authentication

bind_socket

Selinux

Errors ?

Final Thoughts

Search

Admin area

Categories

Archives