
My Spammeter

I have recently realized that I get an enormous amount of unsolicited junk e-mail, and that it is getting worse every day. Some people don't understand why I don't just change my e-mail address, but I am not about to give up my @apache.org or @modpython.org addresses.

To demonstrate just how much of a problem it really is, I have created a system that counts spam as it is identified by my spam filtering program (SpamBayes), places the count into a database (RRDtool) and periodically updates charts showing the rate of incoming spam over a period of time (see bottom of page for more detail).






Source Code

Dec 29 2003

People have been asking me for the source code for this page. The truth is that there isn't much code needed to get this going. There are probably a myriad of other ways to accomplish the same thing; this is just one of them.

I use procmail and SpamBayes to filter my mail. SpamBayes comes with a handy script called hammie, which you call from your procmail rules. Hammie inserts a header into every e-mail with its classification, based on the estimated probability that the e-mail is spam.

Before anything else, you will need to create the RRD file (see the RRDtool documentation for details). I don't remember exactly how I created mine, but it was something like:

rrdtool create spam.rrd --step 3600 DS:count:ABSOLUTE:864000:0:100000 \
        RRA:AVERAGE:.5:1:87600 \
        RRA:MIN:.5:288:3650 \
        RRA:MAX:.5:288:3650
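The numbers in that command are easier to read once converted into time spans. The short Python sketch below only does the arithmetic for the parameters exactly as written above (a 3600-second step, an 864000-second heartbeat, and the row counts of each RRA); it does not call RRDtool, and the names in it are purely illustrative.

```python
# Retention arithmetic for the rrdtool create command above.
# Values are copied from the command; nothing here talks to RRDtool.
STEP = 3600          # --step: seconds per primary data point (1 hour)
HEARTBEAT = 864000   # DS heartbeat: max seconds between updates

HOUR = 3600
DAY = 24 * HOUR
YEAR = 365 * DAY

# (consolidation function, primary points per row, number of rows)
RRAS = [
    ("AVERAGE", 1, 87600),
    ("MIN", 288, 3650),
    ("MAX", 288, 3650),
]


def describe():
    """Return (name, seconds per row, total seconds covered) for each RRA."""
    out = []
    for name, steps_per_row, rows in RRAS:
        row_seconds = steps_per_row * STEP
        out.append((name, row_seconds, row_seconds * rows))
    return out


if __name__ == "__main__":
    print("heartbeat: %d days" % (HEARTBEAT // DAY))
    for name, row_seconds, total in describe():
        print("%-7s one row = %g hours, keeps %.1f years"
              % (name, row_seconds / HOUR, total / YEAR))
```

With a one-hour step, the AVERAGE archive keeps 10 years of hourly data, while each MIN/MAX row spans 288 hours (12 days), so 3650 rows cover about 120 years. A factor of 288 is the usual "one day" value for a 300-second step (288 × 300 = 86400), so with a 3600-second step, 24 steps per row would be the matching choice for daily minima and maxima.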

Here is the relevant part of my .procmailrc file. This calls the spamcount.py script whenever spam is encountered:

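# Pass every message through the SpamBayes hammie filter, which
# adds an X-Spambayes-Classification header
:0fw
        | /usr/local/bin/hammiefilter.py

# SPAM? The "c" flag sends a copy to spamcount.py and lets the
# message continue to the next recipe; the lockfile keeps two
# spamcount.py runs from updating the RRD at the same time
:0 c: /home/grisha/.procmail/spamcount.lck
      * ^X-Spambayes-Classification: spam
        | /home/grisha/.procmail/spamcount.py

# File the spam itself in ~/mail/spam
:0:
      * ^X-Spambayes-Classification: spam
      $HOME/mail/spam
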
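For readers more at home in Python than in procmail, here is a rough equivalent of the condition ^X-Spambayes-Classification: spam using the standard email module. It is only an illustration: is_spam is a hypothetical helper, not part of SpamBayes, and it assumes the header value begins with the word "spam" (anything after it, such as a score, is tolerated). Procmail matches case-insensitively by default, so the sketch does too.

```python
import email
from email.message import Message

HEADER = "X-Spambayes-Classification"


def is_spam(msg: Message) -> bool:
    """Return True if the classification header starts with "spam".

    Header-name lookup in email.message is already case-insensitive;
    the value is lower-cased to match procmail's default behaviour.
    """
    value = msg.get(HEADER)
    if value is None:
        return False
    return value.strip().lower().startswith("spam")
```

In a standalone script you would call is_spam(email.message_from_file(sys.stdin)) on the message procmail pipes in.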
spamcount.py is a Python script that updates the RRD file. It uses the RRDtool interface for Python:
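#!/usr/local/bin/python

import time
import sys
import RRDtool

try:
    # Record one spam at the current time. The data source is ABSOLUTE,
    # so RRDtool divides this 1 by the seconds since the previous update
    # and stores a rate in spams per second.
    rrd = RRDtool.RRDtool()
    rrd.update(("/home/grisha/.procmail/spam.rrd", "%d:1" % int(time.time())))
finally:
    # Consume all input so procmail can finish writing the message
    # to this script without hitting a broken pipe.
    sys.stdin.read()
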
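In spamcount.py every update carries the value 1. Because the data source was created as ABSOLUTE, RRDtool divides that value by the number of seconds since the previous update, so what ends up in the database is a rate rather than a running total. The Python sketch below models only that division; it ignores how RRDtool then spreads the rate over step boundaries and consolidates it, and absolute_rates is a hypothetical helper rather than part of any RRDtool binding. It also mirrors one RRDtool rule worth knowing: an update whose timestamp is not later than the previous one is rejected, so with whole-second timestamps two spams arriving in the same second are counted only once.

```python
from typing import List, Sequence


def absolute_rates(timestamps: Sequence[int], value: float = 1.0) -> List[float]:
    """Simplified model of an ABSOLUTE data source.

    Each update after the first contributes value / (seconds since the
    previous update). As in RRDtool, an update that is not strictly
    later than the previous one is rejected.
    """
    rates = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur <= prev:
            raise ValueError("update time %d is not after last update %d"
                             % (cur, prev))
        rates.append(value / (cur - prev))
    return rates
```

For three spams arriving 30 minutes apart, absolute_rates([0, 1800, 3600]) gives two rates of 1/1800 spams per second, which a multiplier of 3600 turns into 2 spams per hour.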
Finally, you need a script to generate the graphs, which you would run from cron at a regular interval. Mine looks like this:
#!/usr/local/bin/python

import time
import RRDtool

rrd = RRDtool.RRDtool()

# Long-term graph, starting at epoch 1010000000 (early January 2002).
# The RRD stores spams per second, so multiply by 86400 for spams/day.
rrd.graph(("spam.gif",
           "-s", "1010000000",
           '--title=Spam Graph. Last Updated: %s' % time.ctime(),
           "DEF:count=/home/grisha/.procmail/spam.rrd:count:AVERAGE",
           "CDEF:day=count,86400,*",
           'LINE2:day#ff0000:Spams/Day'))

# Last 7 days (604800 seconds), in spams per hour
rrd.graph(("spamweek.gif",
           "-s", "-604800",
           '--title=Spam Graph. Last Updated: %s' % time.ctime(),
           "DEF:count=/home/grisha/.procmail/spam.rrd:count:AVERAGE",
           "CDEF:hr=count,3600,*",
           'LINE2:hr#ff0000:Spams/Hr'))

# Last 24 hours (86400 seconds), in spams per hour
rrd.graph(("spamday.gif",
           "-s", "-86400",
           '--title=Spam Graph. Last Updated: %s' % time.ctime(),
           "DEF:count=/home/grisha/.procmail/spam.rrd:count:AVERAGE",
           "CDEF:hr=count,3600,*",
           'LINE2:hr#ff0000:Spams/Hr'))
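The CDEF lines are written in reverse Polish notation: count,3600,* pushes the stored rate (spams per second), pushes 3600 and multiplies, giving spams per hour, while count,86400,* gives spams per day. The toy evaluator below handles only variable names, numbers and the four basic arithmetic operators, just enough to show what those expressions compute; real CDEFs support many more operators, and eval_cdef is a hypothetical name.

```python
import operator
from typing import Dict

# Only the four basic arithmetic operators; real CDEFs support many more.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_cdef(expr: str, variables: Dict[str, float]) -> float:
    """Evaluate a comma-separated RPN expression such as "count,3600,*"."""
    stack = []
    for token in expr.split(","):
        if token in OPS:
            right = stack.pop()   # "a,b,-" means a - b
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression: %r" % expr)
    return stack[0]
```

For example, a stored rate of 0.0005 spams per second comes out as roughly 1.8 spams per hour with count,3600,* and roughly 43.2 spams per day with count,86400,*.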

You will also need a script to copy or upload the graphs to your website; this I leave as an exercise for the reader.
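If the web server's document root happens to be on the same machine, one possible starting point is a small copy step run from cron right after the graph script. The sketch below assumes a local destination directory; publish is a hypothetical helper. It copies each image to a temporary file inside the destination and then renames it into place, so the web server never hands out a half-written GIF. Uploading to a remote host would need a different transport, such as scp or FTP.

```python
import os
import shutil
import tempfile
from typing import Iterable, List


def publish(images: Iterable[str], dest_dir: str) -> List[str]:
    """Copy each image into dest_dir, replacing any old copy atomically.

    The temporary file is created inside dest_dir so that os.replace
    is a same-filesystem rename.
    """
    published = []
    for src in images:
        target = os.path.join(dest_dir, os.path.basename(src))
        fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
                shutil.copyfileobj(inp, out)
            os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the web server read it
            os.replace(tmp, target)
        except BaseException:
            os.unlink(tmp)
            raise
        published.append(target)
    return published
```

A call might look like publish(["spam.gif", "spamweek.gif", "spamday.gif"], "/var/www/html/spam"), where the destination path is purely hypothetical.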