SpamAssassin is generally regarded as one of the most effective spam filters, especially when used in combination with spam databases. Even simple text-matching alone may, for most users, be sufficient to correctly classify a majority of incoming mail.
SpamAssassin is a Perl-based application (Mail::SpamAssassin in CPAN) which is usually used to filter all incoming mail for one or several users. It can be used as a standalone application or as a client (spamc) in combination with a daemon (spamd) which runs in the background. The latter modus operandi has performance benefits, but potential security downsides.
Typically, either variant of the application is set up in a generic mail filter program, or it is called directly from a graphical mail user agent that supports this whenever new mail arrives. Mail filter programs such as procmail can be made to pipe all incoming mail through SpamAssassin with an adjustment to the user's .procmailrc file.
SpamAssassin comes with a large set of rules which are applied to determine whether an email is spam or not. Most tests search specific fields within the email header or the email body for certain regular expressions. If an expression matches, the email is assigned a score that depends on the test, and several (customizable) headers are added to the mail. The total score resulting from all tests or other criteria can then be used by the end user or by the ISP to set the conditions under which email is moved to a separate spam folder, deleted, flagged etc.
Each test has a label and a description. The label is usually an all upper case identifier separated with underscores, such as "LIMITED_TIME_ONLY", with the description for that label being "Offers a limited time offer". A mail that matches that test (in this case, contains certain variants of the "limited time only" phrase) might be assigned a score of +0.3. With a spam threshold of 5 (default as of version 2.55), several other tests would usually have to match as well for the mail to be classified as spam. On the other hand, some tests, such as those for invalid message IDs or years, assign a very high score, so that even a single test can almost put a mail "over the edge".
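The scoring scheme described above can be illustrated with a minimal Python sketch. It is a toy model, not SpamAssassin's implementation (which is written in Perl): the patterns, the scores other than the +0.3 example, and the `TOY_` labels are assumptions made up for illustration.

```python
import re

# Hypothetical rules as (label, pattern, score). Only the LIMITED_TIME_ONLY
# label and its +0.3 score come from the text; the patterns and the TOY_
# rules are invented for this sketch.
RULES = [
    ("LIMITED_TIME_ONLY", re.compile(r"limited\s+time\s+(only|offer)", re.I), 0.3),
    ("TOY_FREE_MONEY", re.compile(r"free\s+money", re.I), 2.5),
    # Matches a Subject line that contains letters but no lowercase ones.
    ("TOY_ALL_CAPS_SUBJECT", re.compile(r"^Subject: [^a-z\n]*[A-Z][^a-z\n]*$", re.M), 2.4),
]

SPAM_THRESHOLD = 5.0  # the default threshold mentioned in the text


def score_message(raw_message):
    """Return (total score, list of (label, score)) for every matching rule."""
    hits = [(label, score) for label, pattern, score in RULES
            if pattern.search(raw_message)]
    total = sum(score for _, score in hits)
    return total, hits


def is_spam(raw_message, threshold=SPAM_THRESHOLD):
    """Treat the message as spam when its total score is higher than the threshold."""
    total, _ = score_message(raw_message)
    return total > threshold
```

In this sketch a message matching only `LIMITED_TIME_ONLY` stays far below the threshold, while a message matching all three toy rules crosses it, which mirrors the point that several tests usually have to match together.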
E-mail recognized as spam by SpamAssassin, here in the Novell Evolution email client.
When a mail's total score is higher than the "required_hits" setting in SpamAssassin's configuration, the mail is treated as spam and rewritten according to several options. In the default configuration, the content of the mail is appended as a MIME attachment, with a brief excerpt in the message body, and a description of the tests which resulted in the mail being classified as spam. If the score is lower than the defined settings, by default the information about the passed tests and total score is still added to the email headers and can be used in post-processing for less severe actions, such as tagging the mail as suspicious.
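As a rough illustration of such post-processing, the following Python sketch reads the score from an already filtered message and sorts it into one of three buckets. It assumes the score is carried in an `X-Spam-Status` header of the form `No, score=3.5 required=5.0 ...` (or `hits=` instead of `score=`); the exact header layout depends on the SpamAssassin version and configuration, and the `suspicious_margin` parameter is a hypothetical knob introduced for this example.

```python
import re
from email import message_from_string

# Assumed header layout: "Yes, score=7.2 required=5.0 tests=..."
# Some versions write "hits=" instead of "score=", so both are accepted.
STATUS_RE = re.compile(
    r"^(Yes|No),\s*(?:score|hits)=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)",
    re.I,
)


def read_spam_status(raw_message):
    """Return a dict with flagged/score/required, or None if no usable header exists."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    header = " ".join(header.split())  # undo header folding across lines
    match = STATUS_RE.match(header)
    if match is None:
        return None
    return {
        "flagged": match.group(1).lower() == "yes",
        "score": float(match.group(2)),
        "required": float(match.group(3)),
    }


def classify(raw_message, suspicious_margin=2.0):
    """Return 'spam', 'suspicious', 'ok', or 'unscanned' for a filtered message."""
    status = read_spam_status(raw_message)
    if status is None:
        return "unscanned"
    if status["flagged"]:
        return "spam"
    # Hypothetical policy: tag mail that came close to the threshold.
    if status["score"] >= status["required"] - suspicious_margin:
        return "suspicious"
    return "ok"
```

A mail filter could then move "spam" to a separate folder and merely tag "suspicious" mail, which is the kind of less severe action the paragraph above describes.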
The user can customize these filters using a file "user_prefs" in their home directory. Within this file, they can specify individuals whose emails are never considered spam, or change the scores for certain rules. The user can also define a list of languages which they want to receive mail in, and SpamAssassin then assigns a higher score to all mails that appear to be written in another language. This can be very useful to users receiving a lot of foreign spam but never actually corresponding with people in that language.
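How per-user preferences interact with the rule scores can be modelled with a short Python sketch. This is a simplified model and does not use the actual "user_prefs" syntax: the `UserPrefs` class, the `TOY_` labels, and all scores other than +0.3 are hypothetical, and trusted senders are handled here by skipping the score check entirely, which may differ from how SpamAssassin treats them internally.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch

# Hypothetical default scores; only LIMITED_TIME_ONLY = 0.3 is taken from the text.
DEFAULT_SCORES = {
    "LIMITED_TIME_ONLY": 0.3,
    "TOY_FREE_MONEY": 2.5,
    "TOY_FOREIGN_LANGUAGE": 2.8,  # stands in for "written in an unwanted language"
}


@dataclass
class UserPrefs:
    """Toy stand-in for per-user settings (not the real file format)."""
    trusted_senders: list = field(default_factory=list)   # glob patterns
    score_overrides: dict = field(default_factory=dict)   # label -> custom score
    threshold: float = 5.0


def total_score(matched_labels, prefs):
    """Sum the scores of matched rules, preferring the user's overrides."""
    return sum(
        prefs.score_overrides.get(label, DEFAULT_SCORES.get(label, 0.0))
        for label in matched_labels
    )


def is_spam(sender, matched_labels, prefs):
    """Apply the trusted-sender list first, then compare the total to the threshold."""
    sender = sender.lower()
    if any(fnmatch(sender, pattern.lower()) for pattern in prefs.trusted_senders):
        return False
    return total_score(matched_labels, prefs) > prefs.threshold
```

For example, a user who raises the score of `TOY_FREE_MONEY` to 5.0 makes that rule nearly decisive on its own, while mail from an address matching `*@example.org` in `trusted_senders` is never flagged regardless of which rules match.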