It's a new fact of life... the fight against spam will never end. For
several years we have been actively pursuing and implementing methods
of blocking spam. Because of the increased amount of spam you've received,
you might not have realized we were doing anything.
Until now, all the resources for blocking spam were concentrated on
preventing delivery of spam. However, the most effective techniques
for doing this, blocking email delivery from specific foreign
countries, are not feasible within our working environment. (The
Offices of Academic Affairs and Student Affairs need to
be able to receive email from anyone residing in any country.)
Recognizing that the past approach wasn't working, we are now trying something
new.
We'll tell you how likely we think an email message is to be spam
and you can decide what to do with it.
The email messages that get through the installed spam blocks will be
processed and tested for spam by a program called SpamAssassin (SA). SA
is one more tool in the arsenal against spam. Hopefully, after a little
tuning, it will be the tool that enables you to put the final nails in the
spam coffin.
What follows is a brief description of what SA is, how it works, how you
can use it to remove spam from your inbox, why we chose to
configure it the way we do, and how you can customize its configuration.
After reading through this and using the filtering tutorial, hopefully
email will once again become a useful tool instead of a daily nuisance.
Spam-A-What? A guide to how SpamAssassin works.
SpamAssassin (SA) is an email filter used to identify spam. It currently uses
over 900 rule-based tests on email headers and body
text to identify and tag spam. The spam-identification tactics used
include:
Header Analysis: spammers use a number of tricks to
mask their identities, fool you into thinking they've sent a
valid email, or fool you into thinking you must have subscribed
to their list. SA tries to spot these tricks.
Text Analysis: again, spam emails often have a characteristic
style and some characteristic disclaimers. How many times
have you read an email promising ONE MILLION US DOLLARS if
you act now and don't delay, which swears it is legitimate
email from an organization whose list you either registered
for or opted into at a sponsored website, and from which you
can be removed at any time? SA can spot these too.
Razor: a collaborative spam-tracking database, which
works by taking a signature of spam emails. Since spam
typically works by sending an identical message to hundreds
of people, Razor short-circuits this by allowing the first
person to receive a spam to add it to the database, at
which point everyone else will block it.
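The following Python sketch makes the signature idea concrete with an exact-match signature database. It is a simplification and does not show how Razor actually works. As we understand it, Razor's signatures are designed to survive small changes to a message. The hypothetical SharedSpamDB class below only hashes the normalized body with SHA-1, so any real edit to the text produces a new signature.

```python
import hashlib

def signature(body):
    """Hash the body after normalizing case and whitespace."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

class SharedSpamDB:
    """Hypothetical stand-in for a collaborative database such as Razor's."""

    def __init__(self):
        self._signatures = set()

    def report(self, body):
        # The first recipient reports the spam...
        self._signatures.add(signature(body))

    def is_known_spam(self, body):
        # ...and everyone else can then check incoming mail against it.
        return signature(body) in self._signatures

db = SharedSpamDB()
db.report("BUY NOW!!!\n  Cheap pills")
print(db.is_known_spam("buy now!!! cheap pills"))
```

Because the body is normalized first, re-wrapping lines or changing case does not change the signature. Changing a single word does.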
Every test that SA uses has a score associated with it: positive
(this message is likely spam) or negative (this message is not likely
to be spam). For example, if the message mentions viagra anywhere in
the subject or body, almost three points will be added to the final
score. Conversely, if the message originates from the Eudora or Outlook
mail clients, which are not typically used by spammers, up to half a
point can be removed from the final score. If the total score for all
tests reaches or exceeds a set threshold, the message is tagged as spam.
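The scoring rule above can be sketched in a few lines of Python. The rule names are hypothetical and are not real SA test names. The scores are illustrative values loosely based on the figures just mentioned. The only point is that the scores of matched rules are summed and compared against the threshold.

```python
# Hypothetical rule names and scores; not real SpamAssassin tests.
RULE_SCORES = {
    "HYPOTHETICAL_MENTIONS_VIAGRA": 2.9,     # "almost three points"
    "HYPOTHETICAL_EUDORA_OR_OUTLOOK": -0.5,  # "up to half a point" removed
    "HYPOTHETICAL_ALL_CAPS_SUBJECT": 1.5,    # made up for illustration
}

def total_score(matched_rules, scores=RULE_SCORES):
    """Add up the (positive or negative) score of every rule the message matched."""
    return sum(scores[name] for name in matched_rules)

def is_tagged(matched_rules, threshold, scores=RULE_SCORES):
    """A message is tagged as spam when its total reaches the threshold."""
    return total_score(matched_rules, scores) >= threshold

matched = ["HYPOTHETICAL_MENTIONS_VIAGRA", "HYPOTHETICAL_ALL_CAPS_SUBJECT"]
print(round(total_score(matched), 1))   # 4.4
print(is_tagged(matched, threshold=6))  # False
```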
The default threshold that SA uses is five. However, in our
testing we discovered that a setting that low resulted in a number of
false positives. As a result, we increased the threshold to six.
At this setting, however, it is more likely that actual spam will
not be tagged. This is a trade-off that we will need to tune over time
as we become more familiar with SA and the needs and preferences of our
customers.
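The trade-off is easier to see with numbers. This sketch counts false positives (legitimate mail tagged as spam) and false negatives (spam that slips through) at thresholds of five and six. The scores in SAMPLE are invented for illustration and do not come from our testing.

```python
# Made-up (SA score, actually spam?) pairs, for illustration only.
SAMPLE = [(5.3, False), (5.8, False), (5.6, True), (6.4, True), (9.1, True), (1.2, False)]

def errors_at(threshold, sample=SAMPLE):
    """Return (false positives, false negatives) when tagging at `threshold`."""
    false_positives = sum(1 for score, spam in sample if score >= threshold and not spam)
    false_negatives = sum(1 for score, spam in sample if score < threshold and spam)
    return false_positives, false_negatives

for t in (5.0, 6.0):
    fp, fn = errors_at(t)
    print(f"threshold {t}: {fp} false positive(s), {fn} false negative(s)")
```

In this made-up sample, raising the threshold from five to six removes both false positives but lets one spam message (scored 5.6) through.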
Messages that reach the threshold are then changed in significant ways,
both to protect you and your computer and to give your email client an
easy way to recognize the message and filter it to a junk mailbox.
So That's Why My Message Looks Like That. An explanation of the changes made by SpamAssassin.
The advent of HTML email has spawned a whole variety of problems that can
result from spam messages. The most obvious is exposure to obscene content.
However, there are many other possibilities that you probably never thought
of. The sneakier tricks used by spammers in HTML email messages include
image references that encode your email address and executable content
that phones home.
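The first trick can be illustrated with a hedged Python sketch that looks for remote images whose URL carries the recipient's address, either in plain form or base64-encoded. The tracker URL format and the tracking_images helper are hypothetical. Real spammers use many other encodings, so this shows the idea rather than a reliable detector.

```python
import base64
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qsl

class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)

def decodings(value):
    """Yield the value itself plus its URL-safe base64 decoding, if it has one."""
    yield value
    try:
        padded = value + "=" * (-len(value) % 4)
        yield base64.urlsafe_b64decode(padded).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        pass

def tracking_images(html, recipient):
    """Return remote image URLs whose query string carries the recipient's address."""
    collector = ImageCollector()
    collector.feed(html)
    recipient = recipient.lower()
    hits = []
    for src in collector.sources:
        url = urlparse(src)
        if url.scheme not in ("http", "https"):
            continue  # e.g. cid: images embedded in the message itself
        # parse_qsl already undoes %-escapes such as %40 for '@'.
        for _, value in parse_qsl(url.query):
            if any(recipient in text.lower() for text in decodings(value)):
                hits.append(src)
                break
    return hits

token = base64.urlsafe_b64encode(b"you@example.edu").decode().rstrip("=")
html = (f'<img src="http://tracker.example.com/open.gif?id={token}">'
        '<img src="http://cdn.example.com/logo.gif">')
print(tracking_images(html, "you@example.edu"))
```

Simply loading such an image tells the tracker's server which address opened the message. That is why viewing the HTML is enough to validate your address.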
Both techniques are used by spammers to try to validate your email address.
View the content preview in section #2 of
this sample spam message for an example. The spammer doesn't even try
to disguise what they are doing with the message, as witnessed by:
If your email client were set to display HTML email and you received this
message, you would have just verified to the spammer that the message was
successfully delivered and guaranteed that you'd receive even more spam in
the weeks to come.
Viruses also take advantage of HTML email to attempt to infect your
computer system.
As you can see, spam can be quite tricky. The configuration settings that
we've chosen are designed to protect you and your computer from what SA
thinks is illegitimate and harmful email. The primary way SA does this is
by including the original email message as an attachment to a new
message. Configured like this, you will not be viewing the HTML content
of the original spam message, so you are not risking validation of your
email address or infection by a virus. The new message will contain
information to help you evaluate how and why the message was tagged as spam.
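Python's standard email package can build the same kind of wrapper, which makes the structure of the new message visible: a plain-text report followed by the original as a message/rfc822 attachment. This is a sketch under our own assumptions and is not SA's code. The wrap_spam name and the choice of headers to copy are hypothetical. As far as we know, SA itself controls this behavior through its report_safe option.

```python
from email.message import EmailMessage

def wrap_spam(original, report):
    """Build a new message: a plain-text report plus the original as an attachment."""
    wrapper = EmailMessage()
    # Copy a few identifying headers so the wrapper still sorts and displays sensibly.
    for name in ("From", "To", "Subject", "Date"):
        if original[name] is not None:
            wrapper[name] = str(original[name])
    wrapper["X-Spam-Flag"] = "YES"
    wrapper.set_content(report)       # text/plain body: the explanation
    wrapper.add_attachment(original)  # becomes a message/rfc822 attachment
    return wrapper

spam = EmailMessage()
spam["From"] = "winner@example.com"
spam["To"] = "you@example.edu"
spam["Subject"] = "YOU WON!!!"
spam.set_content("<html><body><img src='http://tracker.example.com/x.gif'></body></html>",
                 subtype="html")

wrapped = wrap_spam(spam, "This message was tagged as spam. The original is attached.")
print([part.get_content_type() for part in wrapped.iter_parts()])
# ['text/plain', 'message/rfc822']
```

Since the HTML now sits inside an attachment, a mail client shows the plain-text report first and does not render the spammer's HTML unless you open the attachment.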
An SA-tagged spam message will look like
this sample spam message. It contains four distinct sections, as
marked in the sample and as described below.
Headers (#1): Every message processed by SA will have some header
information added to it. These headers are not normally displayed by
your email client, but they are useful for filtering spam to a junk
mailbox.
X-Spam-Status:
A summary of whether or not SA thinks the message is spam,
the score the message received, the tests the message matched,
and the version of SA used to process the message.
X-Spam-Level:
A string of asterisks, '*', representing the score the message received.
X-Spam-Checker-Version:
The version of SA and the revision of the test file used. The
test file will be periodically updated to reflect new methods of
identifying spam much like virus definition files are updated to
catch new viruses that are released.
X-Spam-Flag:
Equal to YES on spam messages.
The first three header values will exist on any
message processed by SA; the last, X-Spam-Flag,
will only exist if a message is tagged as spam.
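To read these headers from a script rather than in your email client, a sketch like the following splits X-Spam-Status into its parts. The sample header values are invented. The code also accepts both score= and hits=, on the assumption that newer SA releases write the first and older ones the second.

```python
import re
from email import message_from_string

def parse_spam_status(value):
    """Split an X-Spam-Status value such as 'Yes, score=7.2 required=6.0 tests=A,B'."""
    # Unfold continuation lines and collapse runs of whitespace.
    value = " ".join(value.split())
    flag, _, rest = value.partition(",")
    # A long test list may have been folded after a comma: "A, B" -> "A,B".
    rest = re.sub(r",\s+", ",", rest)
    fields = dict(re.findall(r"(\w+)=(\S+)", rest))
    # Assumption: newer SA writes score=, older releases wrote hits=.
    score = fields.get("score", fields.get("hits", "0"))
    tests = [t for t in fields.get("tests", "").split(",") if t and t != "none"]
    return {
        "is_spam": flag.strip().lower() == "yes",
        "score": float(score),
        "required": float(fields.get("required", "0")),
        "tests": tests,
    }

raw = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Spam-Status: Yes, score=7.2 required=6.0 tests=HTML_60_70,\n"
    "\tMIME_HTML_ONLY\n"
    "X-Spam-Level: *******\n"
    "X-Spam-Flag: YES\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
status = parse_spam_status(msg["X-Spam-Status"])
level = msg.get("X-Spam-Level", "").count("*")  # one '*' per point of score
print(status, level)
```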
Content preview (#2): This section gives you an easy way to review the
first few lines of text from the email message. Given the
subject, sender, and content preview, you should be able to determine
whether the message was correctly or incorrectly tagged as spam.
Content analysis (#3): A summary of the tests the message matched, with a
description and point value for each. You can use the information here to
help tune SA to your needs... increasing and decreasing point values on
tests to help eliminate false positives and catch false negatives.
Original message (#4): The original email message is included as an
attachment. Use it to view messages that are false positives. This
section usually includes a disclaimer stating that the message may be
unsafe to open.
UPDATE: Configuration varies from one email client (Eudora, Outlook,
etc.) to another. One option some clients offer is to display text
attachments within the body of an email message. If you have this
enabled, most spam messages will be included in full in this section.
Let's Get This Spam Outta-Here! A filtering tutorial.
This tutorial uses terminology and screenshots from Eudora since the
majority of Engineering Administration uses Eudora as an email client.
However, the basic concepts presented here are universal and should
be applicable within any email client.
The Engineering Administration mail server tags messages it thinks are
spam as described in the previous section. These changes and additions
to the email message can then be used by your email client to remove the
messages from your INBOX. We recommend that you filter the messages to
a new mailbox so that you can later review them for false positives. By
doing this you free yourself from having to immediately weed out
illegitimate messages in your inbox and can save that task for later.
Create a new mailbox for spam messages.
A. From the Mailbox selection on the menubar choose New.
B. In the New Mailbox dialog, type a name for a new mailbox and
click the OK button. We've chosen the name Spam.
C. A new mailbox will now be displayed in the Mailboxes pane on the
left side of the Eudora program window.
Create the filter.
A. From the Tools selection on the menubar choose Filters.
B. In the Filters window, click the New button to begin editing a new filter.
C. Now click in the Header (#1) section of the Filters window and type: X-Spam-Level:.
D. Click in the next text box (#2) and type: ****** (six asterisks, matching our spam threshold of six)
E. In the Actions area at the bottom, click the down-arrow (#3) on the first selection box
and choose Transfer To (#4) at the bottom of the menu.
F. Click on the button that is labeled In (#1) and choose the
mailbox you created from the selection list (#2).
G. Close the Filters window and choose Yes when asked if
you want to save your changes.
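The Eudora filter above amounts to a simple rule: if X-Spam-Level contains six asterisks, move the message to the Spam mailbox. The following minimal Python sketch applies the same rule to mail stored in mbox-format files. The file paths and the file_spam helper are hypothetical, and the sketch is not a replacement for your email client's own filters.

```python
import mailbox
import os
import tempfile

def file_spam(inbox_path, spam_path, marker="******"):
    """Move messages whose X-Spam-Level contains `marker` from the inbox to the spam mailbox."""
    inbox = mailbox.mbox(inbox_path)
    spam = mailbox.mbox(spam_path)
    inbox.lock()
    spam.lock()
    moved = 0
    try:
        for key in list(inbox.keys()):
            msg = inbox[key]
            if marker in (msg.get("X-Spam-Level") or ""):
                spam.add(msg)      # "Transfer To" the Spam mailbox
                inbox.remove(key)
                moved += 1
        spam.flush()
        inbox.flush()
    finally:
        # close() flushes and releases the locks taken above.
        spam.close()
        inbox.close()
    return moved

# Demo on throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
inbox_path = os.path.join(workdir, "In")
spam_path = os.path.join(workdir, "Spam")
box = mailbox.mbox(inbox_path)
box.add("From: a@example.com\nSubject: hi\nX-Spam-Level: *\n\nhello\n")
box.add("From: b@example.com\nSubject: win\nX-Spam-Level: *******\n\nbuy now\n")
box.close()
moved = file_spam(inbox_path, spam_path)
print(moved)
```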
Test the filter.
Send yourself the following message (just copy and paste it into a
new mail message). This message contains many of the keywords that
should make it score high on the SA tests.
However, the server is configured not to scan local
email with SpamAssassin... so if you just send this to yourself using
your Engineering email account, it will not be tagged. You will need to
send it to your Engineering email account from some other mail service
like Gatorlink, Yahoo Mail, Gmail, or Hotmail.
This message is not spam!
YOU WON'T GET RICH, YOU WON'T GET TWENTY PERCENT OF $3,200,000.00 (THREE
MILLION, TWO HUNDRED THOUSAND U.S. DOLLARS), AND NONE OF YOUR BODY PARTS
ARE LIKELY TO GET LARGER. BUT YOU CAN HELP TEST SPAMASSASSIN AND SPAM
FILTERING ON OUR NEW MAIL SERVER. THAT'S BETTER THAN VIAGRA!!! BETTER
THAN A MORTGAGE APPLICATION AND A FAST RE-FI!!! BETTER
THAN FREE! FREE!! FREE!!! WEBSITE ACCESS! BETTER THAN $$$ IN YOUR
MAILBOX (well, maybe I'm getting carried away). JUST BE SURE TO HANDLE
THIS TRANSACTION IN CONFIDENCE.
My wife, Jody told me that testing a spam filter was a great thing to
do. But be careful to copy each name on the list below exactly.
This email was sent in compliance with a law that doesn't exist, but
that would have made it legal to send spam if you put this notice at the
bottom of it.
If the filter is working properly you should now have your test message
in the new mailbox. If the filter is not working properly, verify that
you saved the filter and typed in the choices exactly as described in
this tutorial.
I'm Still Getting Spam, What Now?
This solution will not capture 100% of all spam messages. In fact, we
deliberately increased the threshold on SA to reduce the chance of false
positives. This means you will continue to get spam... we just hope
that it will be dramatically less. A few options for better recognizing
and filtering spam are explained below.
The first, and easiest, method is to filter messages that score lower
on the SA tests. Every message, regardless of whether SA tags it as spam, is
given a score by SA and includes the X-Spam-Level, X-Spam-Status,
and X-Spam-Checker-Version headers. You can choose to filter messages
that SA scores at five or four by changing your filter to that number of asterisks.
This is a very simple change that you can make without any change to server settings.
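This works because the filter checks whether X-Spam-Level contains the asterisks, so a filter on five asterisks also catches every message with six, seven, or more. The following short sketch assumes a "contains" style match and one asterisk per whole point of score. The filter_matches name is hypothetical.

```python
def filter_matches(spam_level, asterisks):
    """True if a 'contains' filter on this many asterisks matches the header value."""
    return "*" * asterisks in spam_level

# Assumption: SA writes one asterisk per whole point, so a score of 7.3 gives '*******'.
for level in ("****", "*****", "*******", ""):
    print(repr(level), filter_matches(level, 5))
```

A message scoring below one point (or below zero) has no asterisks at all, so it never matches any asterisk filter.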
The method described above is very simple and requires little
knowledge or investigative work on your part. However, it will result in a larger
number of false positives. The more appropriate way to tune SA is through
blacklisting and changing the scoring of certain tests. Both of these methods require
identifying patterns of behaviour and common characteristics of your spam messages.
Blacklisting refers to automatically tagging messages as spam
based on the sender of the message. This is not entirely effective because
most spammers use randomly generated email addresses in the From and
Reply-To headers. However, in instances where these addresses are consistently
the same, it is trivial to have SA automatically tag these messages as spam.
If you have an address you want blacklisted, send it to mis@eng.ufl.edu.
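As a sketch of how sender matching works, the hypothetical is_blacklisted helper below compares a sender address against a list of patterns in which * and ? act as wildcards. In the SA versions we are familiar with, SA's own blacklist_from setting accepts similar glob-style patterns. The entries here are invented examples.

```python
from fnmatch import fnmatchcase

# Hypothetical blacklist entries; '*' and '?' are glob-style wildcards.
BLACKLIST = ["offers@spammy.example", "*@bulk-mailer.example"]

def is_blacklisted(sender, patterns=BLACKLIST):
    """True if the sender address matches any pattern, ignoring case."""
    sender = sender.strip().lower()
    return any(fnmatchcase(sender, p.lower()) for p in patterns)

print(is_blacklisted("x7f3q@bulk-mailer.example"))
```

A wildcard pattern can help when a spammer randomizes the part of the address before the @ but keeps sending from the same domain.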
The best method for tuning SA is to weight tests differently from their defaults
based on patterns you see in your incoming email messages. As you become familiar
with the SA tests by reading the SA content analysis and the X-Spam-Status
header, you will begin to see which tests consistently fire on your spam
messages but are missing from legitimate mail. Once you identify a test
you want to change, send the test name and the new score to mis@eng.ufl.edu.
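You can spot those patterns by eye, but the same comparison can be sketched in Python. Given the test names from the X-Spam-Status headers of mail you have sorted by hand, the sketch lists the tests that fire on most of your spam and never on legitimate mail. The candidate_tests helper and the 50% cutoff are our own assumptions, and the sample lists are invented.

```python
from collections import Counter

def candidate_tests(spam_tests, ham_tests, min_spam_fraction=0.5):
    """Tests that fire on at least min_spam_fraction of spam but never on legitimate mail."""
    if not spam_tests:
        return []
    # Count each test at most once per message.
    spam_counts = Counter(t for tests in spam_tests for t in set(tests))
    ham_seen = {t for tests in ham_tests for t in tests}
    n = len(spam_tests)
    return sorted(t for t, count in spam_counts.items()
                  if count / n >= min_spam_fraction and t not in ham_seen)

# Invented samples: one list of matched test names per message.
spam_tests = [["HTML_60_70", "MIME_HTML_ONLY"],
              ["MIME_HTML_ONLY", "HYPOTHETICAL_TEST"],
              ["HTML_60_70", "MIME_HTML_ONLY"]]
ham_tests = [["HYPOTHETICAL_TEST"], []]
print(candidate_tests(spam_tests, ham_tests))
```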
As an example of tuning SA through the weight given to specific tests, look at the
sample spam message again. After viewing several spam messages and several
legitimate messages, I notice that I never receive legitimate email that is largely
HTML. I would then give the HTML_60_70 and MIME_HTML_ONLY
tests higher scores, giving them a heavier weight in determining whether a
message is spam. The default score for both tests is 0.1; I might want to score them
as high as 1.0 or even higher, depending on how confident I am that I never receive
legitimate HTML email.
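The arithmetic behind this example is as follows. If a message matched both tests, raising each from 0.1 to 1.0 adds 1.8 points to its score. The sketch below applies score overrides to a hypothetical message that originally scored 4.5, which would then cross our threshold of six. The rescore helper and SOME_OTHER_TEST are illustrative names only.

```python
DEFAULTS = {"HTML_60_70": 0.1, "MIME_HTML_ONLY": 0.1}   # default scores quoted above
OVERRIDES = {"HTML_60_70": 1.0, "MIME_HTML_ONLY": 1.0}  # the new weights we'd request

def rescore(original_score, matched_tests, defaults=DEFAULTS, overrides=OVERRIDES):
    """Recompute a score after replacing default test scores with overrides."""
    delta = sum(overrides[t] - defaults.get(t, 0.0)
                for t in matched_tests if t in overrides)
    return original_score + delta

# Hypothetical message: scored 4.5 (under our threshold of six) and matched both tests.
new_score = rescore(4.5, ["HTML_60_70", "MIME_HTML_ONLY", "SOME_OTHER_TEST"])
print(f"{new_score:.1f}")  # 6.3
```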
Hey! That Wasn't Spam!
Again, SA is not always accurate. It will generate false positives and
tag messages as spam that really aren't. The goal is to try to minimize this,
using the same concepts described in the previous section.
However, just changing your filter to seven or eight asterisks isn't good enough,
because the messages are still tagged as spam. As a result, you need to use
the more advanced techniques of whitelisting (the opposite of blacklisting)
and score changing.
The most useful technique in tuning SA not to tag messages as spam will
undoubtedly be whitelisting. This refers to creating a list of email
addresses that SA will never tag as spam. (They basically get a free ride
through the system.) If you receive legitimate email from someone or some
service that is routinely tagged as spam, forward the email address to
mis@eng.ufl.edu and ask us to whitelist it for you.
A good example where whitelisting is the solution is Internet newsletters
and mailing list services. Many of these newsletters use HTML email, contain
ads from sponsors, and include information about unsubscribing from the newsletter
(all of which SA tests for when tagging spam). Because the
newsletter always comes from the same sender, we can whitelist the address
and the newsletter will get a free pass through the system.
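A whitelist entry effectively overrides every other test. This sketch assumes that a whitelisted sender receives a very large negative adjustment; as far as we know, stock SA's USER_IN_WHITELIST rule scores -100. Under that assumption, even a newsletter that matches several spam tests stays far below the threshold. The sender addresses, test names, and scores are hypothetical.

```python
from fnmatch import fnmatchcase

WHITELIST = ["news@newsletter.example"]  # hypothetical entry
WHITELIST_SCORE = -100.0  # assumption: mirrors stock SA's USER_IN_WHITELIST score

def score_with_whitelist(sender, test_scores, whitelist=WHITELIST):
    """Sum the matched test scores, then apply the whitelist adjustment if the sender is listed."""
    total = sum(test_scores.values())
    if any(fnmatchcase(sender.strip().lower(), p.lower()) for p in whitelist):
        total += WHITELIST_SCORE
    return total

# Hypothetical newsletter hits: HTML content, sponsor ads, unsubscribe text.
newsletter_hits = {"HYPOTHETICAL_HTML_TEST": 1.0,
                   "HYPOTHETICAL_SPONSOR_ADS": 2.0,
                   "HYPOTHETICAL_UNSUBSCRIBE_TEXT": 3.5}
print(score_with_whitelist("news@newsletter.example", newsletter_hits))  # -93.5
print(score_with_whitelist("someone@else.example", newsletter_hits))     # 6.5
```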