These classnotes are deprecated. As of 2005, I no longer teach the classes. The notes will remain online for legacy purposes.

UNIX03/SpamAssassin


SpamAssassin is one of [My Favorite Apps] and is really one of the neatest tools I've seen in a very long time.

SpamAssassin passes a message through a series of rule-based content tests. When a message matches a given test, SpamAssassin adds that test's score to the message. When SpamAssassin is done with the message, the message has an accumulated score associated with it. You as a system administrator (or possibly even your users) then determine the minimum score a message must have in order to be classified as SPAM.
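The accumulate-then-compare idea can be sketched in a few lines of Python. This is only an illustration, not SpamAssassin's actual code: the rule names and scores are copied from the local.cf example in these notes, while the function names and the 6.3 threshold are assumptions made for the sketch.

```python
# Scores copied from the local.cf example in these notes.
RULE_SCORES = {
    "DCC_CHECK": 4.0,
    "RAZOR2_CHECK": 2.5,
    "BAYES_99": 4.3,
    "LOCAL_RCVD": -50.0,
}


def total_score(hits, scores=RULE_SCORES):
    """Sum the scores of every rule the message matched."""
    return sum(scores[name] for name in hits)


def is_spam(hits, required=6.3, scores=RULE_SCORES):
    """Classify as SPAM when the accumulated score reaches the threshold.

    The default of 6.3 is an assumed threshold, chosen for illustration.
    """
    return total_score(hits, scores) >= required


if __name__ == "__main__":
    print(is_spam(["DCC_CHECK"]))                  # one hit alone
    print(is_spam(["DCC_CHECK", "RAZOR2_CHECK"]))  # two hits together
    print(is_spam(["DCC_CHECK", "RAZOR2_CHECK", "LOCAL_RCVD"]))
```

Because scores simply add up, a single large negative score (such as LOCAL_RCVD) can outweigh any number of positive hits.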

SpamAssassin can integrate nicely with other filters. In fact, it can use other SPAM filters as part of its rulebase, combining several SPAM scanners into one unified SPAM filtration system.

To see the list of tests performed, take a look at this: http://www.spamassassin.org/tests.html

SpamAssassin homepage: http://www.spamassassin.org/

/etc/mail/spamassassin/local.cf

/etc/mail/spamassassin/local.cf is the main configuration file. This is where it is installed by default, but it may be in a different location depending upon how your distribution set it up; for example, under Debian and Red Hat it is actually /etc/spamassassin/local.cf. An example is as follows:

 rewrite_subject 1

(1|0) Tells SpamAssassin to change the subject on SPAM messages to include the subject_tag

 report_safe 0

(0|1|2) This setting configures how SPAM is handled. A setting of 0 puts the SpamAssassin report into the headers. A setting of 1 puts the report in the main body of a new email and includes the original email as an attachment. Setting 2 is similar to setting 1, but it changes the type of the attachment to text/plain (as a security measure).

 use_terse_report 0

(0|1) Setting this to 0 gives the normal-length explanation of why the message was considered SPAM. Setting it to 1 gives a shorter report. (Note that this report only appears if you change the report_safe setting, or if you configure blocking as we will do, in which case the sender gets this report.)

 use_bayes 1
 bayes_path /var/amavisd/.spamassassin/bayes
 auto_learn 1

Bayesian filtering is a statistical approach to spam filtration. It can have very good results when the Bayesian system is properly trained. We will not go into how this is accomplished, but will instead leave this as an exercise for the student. Check some of the following relevant articles:

 skip_rbl_checks 1

By default, SpamAssassin will check against RBLs. We want to disable this. Note that if you are not as morally opposed to RBLs as I am, using SpamAssassin in conjunction with RBLs will likely stop all of your SPAM. Of course, that will come at a cost.

 use_razor2 1
 use_dcc 1
 use_pyzor 0
 dcc_add_header 1

These are some of the various checks that can be done. We will be using DCC and Razor (described next) so we enable them.

 dns_available yes

 header LOCAL_RCVD Received =~ /\S+\.domain\.com\s+\(.*\[.*\]\)/
 describe LOCAL_RCVD Received from local machine
 score LOCAL_RCVD -50

The last three lines (header, describe, and score) keep your outgoing mail from being tagged as spam. Your users would likely be upset if their mail was tagged as spam before a client read it. The rule checks the Received: header lines, which show the route the message took.

The rule is a standard SpamAssassin rule and uses Regular Expression syntax. In plain terms, it looks for *.domain.com (*[*]) on a Received line (where the stars are anything). When it finds a match, it gives the message a SPAM score of -50 (ensuring it is not counted as SPAM).
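To see what the pattern does and does not match, it can be tried outside SpamAssassin. The sketch below uses Python's re module rather than the Perl engine SpamAssassin itself uses; this particular pattern behaves the same in both. The two Received values, the addresses, and the function name are made up for illustration.

```python
import re

# Same pattern as the LOCAL_RCVD rule, written as a Python regular expression.
LOCAL_RCVD = re.compile(r"\S+\.domain\.com\s+\(.*\[.*\]\)")


def matches_local_rcvd(received_value):
    """Return True if a Received header value matches the rule.

    search() is used because the rule's pattern is not anchored:
    it may match anywhere in the header value.
    """
    return LOCAL_RCVD.search(received_value) is not None


# Hypothetical header values, not taken from a real mail log.
local = "from ws12.domain.com (ws12.domain.com [192.168.1.12]) by mail.domain.com"
remote = "from mx.example.net (mx.example.net [203.0.113.9]) by mail.domain.com"

if __name__ == "__main__":
    print(matches_local_rcvd(local))
    print(matches_local_rcvd(remote))
```

The remote example still mentions mail.domain.com as the receiving host, but it does not match, because the pattern requires whitespace and a parenthesized part containing [ ] directly after the host name.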

 ## Optional Score Increases
 score DCC_CHECK 4.000
 score RAZOR2_CHECK 2.500
 score BAYES_99 4.300
 score BAYES_90 3.500
 score BAYES_80 3.000

In this section, we turn up the value of several of the rules. The default score for a spam that turns up in the DCC database is only 2.756 when we're using Bayes and network checks. This seems a little low, so we raise it to 4 points. If you wanted every message listed in the DCC database to be tagged as SPAM, you'd set this to 6.3 points.

You can check the default scores for everything in the file /usr/share/spamassassin/50_scores.cf. You may see four different scores listed next to some rules, because the file has different scores depending on whether or not you are using Bayes and network checks. When there is only one score, that score applies all the time; otherwise the fourth score is the one for Bayes plus network checks, as we are using.
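As a rough sketch of how one of the four listed scores gets picked: these notes only confirm that the fourth score is the one for Bayes plus network checks. The order of the first three columns used here (neither, network only, Bayes only) follows my reading of the SpamAssassin configuration documentation, so treat it as an assumption and check it against your version. The function names and the first three values in the example list are placeholders; only 2.756 comes from the text above.

```python
def score_set_index(use_bayes, use_network):
    """Map the two switches to a column index from 0 to 3.

    Assumed column order: 0 = neither, 1 = network only,
    2 = Bayes only, 3 = Bayes and network.
    """
    return (2 if use_bayes else 0) + (1 if use_network else 0)


def effective_score(listed, use_bayes, use_network):
    """Pick the score that applies from a rule's listed scores."""
    if len(listed) == 1:
        # A single listed score applies all the time.
        return listed[0]
    return listed[score_set_index(use_bayes, use_network)]


# Only the fourth value (2.756) comes from these notes;
# the first three are placeholders for illustration.
dcc_check = [0.0, 1.0, 0.0, 2.756]

if __name__ == "__main__":
    print(effective_score(dcc_check, use_bayes=True, use_network=True))
    print(effective_score([4.0], use_bayes=True, use_network=True))
```

A score line in local.cf with a single value, such as the DCC_CHECK override above, corresponds to the one-element case: it applies regardless of which checks are enabled.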



Last edited June 6, 2003 10:37 pm
(C) Copyright 2003 Samuel Hart
This work is licensed under a Creative Commons License.