
Fed up with wasting time sorting through your mailboxes to find legitimate mail?

Scared to open ambiguous messages because
you're afraid of what they may contain?

The solution is here...
FILTER IT!

FORGET THE ARTICLE, TAKE ME TO THE FILTERING TUTORIAL!

by Shawn C Lander on August 21, 2003

It's a new fact of life... the fight against spam will never end. For several years we have been actively pursuing and implementing methods of blocking spam. Because of the increased amount of spam you've received, you might not have realized we were doing anything.

Until now, all the resources for blocking spam were concentrated on preventing delivery of spam. However, the most effective technique for doing this, blocking email delivery from specific foreign countries, is not feasible within our working environment. (The Offices of Academic Affairs and Student Affairs need to be able to receive email from anyone residing in any country.) Recognizing the past approach wasn't working, we are now trying something new.

We'll tell you how likely we think an email message is to be spam and you can decide what to do with it.

The email messages that get through the installed spam blocks will be processed and tested for spam by a program called SpamAssassin (SA). SA is one more tool in the arsenal against spam. Hopefully, after a little tuning, it will be the tool that enables you to put the final nails in the spam coffin.

What follows is a brief description of what SA is, how it works, how you can use it to remove spam from your inbox, why we choose to configure it the way we do and how you can customize its configuration. After reading through this and using the filtering tutorial, hopefully email will once again become a useful tool instead of a daily nuisance.

Spam-A-What?
A guide to how SpamAssassin works.

SpamAssassin (SA) is an email filter used to identify spam. It currently uses over 900 rule-based tests on email headers and body text to identify and tag spam. The spam-identification tactics used include:
  • Header Analysis: spammers use a number of tricks to mask their identities, fool you into thinking they've sent a valid email, or fool you into thinking you must have subscribed to their list. SA tries to spot these tricks.
  • Text Analysis: again, spam emails often have a characteristic style and some characteristic disclaimers. How many times have you read an email where you'll receive ONE MILLION US DOLLARS if you act now and don't delay from something that is legitimate email from some organization where you either registered to be on this list or opted-in at a sponsored website but you can be removed from this list at any time? SA can spot these too.
  • Blacklists: SA supports many useful blacklists such as mail-abuse.org, ordb.org, and others.
  • RAZOR: a collaborative spam-tracking database, which works by taking a signature of spam emails. Since spam typically works by sending an identical message to hundreds of people, Razor short-circuits this by allowing the first person to receive a spam to add it to a database -- at which point everyone else will block it.
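The collaborative idea behind the last item can be sketched in a few lines of Python. This is a simplified illustration, not Razor's actual protocol or signature algorithm: it assumes an exact SHA-1 hash of the whitespace-normalized, lower-cased body, and it uses a local set as a stand-in for the shared database. The names `signature`, `report_spam`, and `is_known_spam` are hypothetical.

```python
import hashlib

# Stand-in for the shared database (an assumption for illustration;
# the real service is a network of servers, not a local set).
shared_signatures = set()

def signature(body: str) -> str:
    """Hash the body after collapsing whitespace and lower-casing it."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def report_spam(body: str) -> None:
    """The first recipient of a spam adds its signature to the database."""
    shared_signatures.add(signature(body))

def is_known_spam(body: str) -> bool:
    """Everyone else checks incoming mail against the database."""
    return signature(body) in shared_signatures

first_copy = "ACT NOW and receive   ONE MILLION US DOLLARS!"
second_copy = "Act now and receive one million US dollars!"
report_spam(first_copy)
```

Once the first copy is reported, `is_known_spam(second_copy)` is true for every later recipient of the same text, while an unrelated message is unaffected.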

Every test that SA uses has a score associated with it: positive if a match makes the message more likely to be spam, negative if a match makes it less likely to be spam. For example, if the message mentions viagra anywhere in the subject or body, almost three points will be added to the final score. Conversely, if the message originates from the Eudora or Outlook mail clients, which are not typically used by spammers, up to half a point can be removed from the final score. If the total score for all tests is over a set threshold, the message is tagged as spam.

The default threshold that SA uses is five. However, in our testing we discovered a setting that low resulted in a number of false positives. As a result, we increased the threshold to six.

At this setting, however, there is more likelihood that actual spam will not be properly tagged. This is a trade-off that will need to be tuned over time as we become more familiar with SA and the needs/desires of our customers.
View the complete list of tests and default scores.
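The scoring described above amounts to summing the scores of the matched tests and comparing the total to the threshold. Here is a minimal Python sketch of that arithmetic. The test names and point values are hypothetical, chosen only to echo the examples in the text ("almost three points" for viagra, "up to half a point" off for Eudora); they are not SA's real rule names or scores.

```python
SPAM_THRESHOLD = 6.0  # our setting; SA's default is 5

# Hypothetical matched tests with illustrative scores.
matched_tests = {
    "MENTIONS_VIAGRA": 2.9,    # positive: more likely spam
    "REMOVE_ME_NOTICE": 1.8,
    "ALL_CAPS_SUBJECT": 1.6,
    "SENT_FROM_EUDORA": -0.5,  # negative: less likely spam
}

def total_score(tests: dict) -> float:
    """Add up the scores of every test that matched the message."""
    return sum(tests.values())

def is_spam(tests: dict, threshold: float = SPAM_THRESHOLD) -> bool:
    """A message is tagged when its total is over the threshold."""
    return total_score(tests) > threshold
```

With these made-up numbers the total is about 5.8, so the message would be tagged at the default threshold of five but not at our threshold of six, which is exactly the trade-off between false positives and missed spam.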

Messages that pass the threshold are then changed in significant ways in order to protect you and your computer as well as provide easy ways for your email client to recognize and filter the message to a junk mailbox.

So That's Why My Message Looks Like That.
An explanation of the changes made by SpamAssassin

The advent of HTML email has spawned a whole variety of problems that can result from spam messages. The most obvious is exposure to obscene content. However, there are many other possibilities that you probably never thought of. The more sneaky tricks used by spammers in HTML email messages include image references that encode your email address and executable content that phones home.

Both techniques are used by spammers to try to validate your email address.

View the content preview in section #2 of this sample spam message for an example. The spammer doesn't even try to disguise what they are doing with the message as witnessed by:

URI:http://198.66.203.224/Track/track_hillary2.htm
URI:http://www.newsmax.com/popunders/hillaryscheme.jpg?||fpy^hsy(rqh||

If you were set to view HTML email and you received this message, you would have just verified to the spammer that the message was successfully delivered and guaranteed that you'd receive even more spam in the weeks to come.
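To see why merely displaying the image is enough, here is a small Python sketch that splits the second URI from the sample. When a mail client fetches the image, the spammer's web server receives the whole URL, including everything after the `?`. That this particular string identifies the recipient is an assumption based on the description above; the sketch only shows where such a token travels.

```python
from urllib.parse import urlsplit

url = "http://www.newsmax.com/popunders/hillaryscheme.jpg?||fpy^hsy(rqh||"
parts = urlsplit(url)

image_path = parts.path       # the picture the client asks for
tracking_token = parts.query  # extra data sent along with the request

print("Image requested:", image_path)
print("Token sent to the server:", tracking_token)
```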

Viruses also take advantage of HTML email to attempt to infect your computer system.

As you can see, spam can be quite tricky. The configuration settings that we've chosen are meant to protect you and your computer from what SA thinks is illegitimate and harmful email. The primary way SA does this is by including the original email message as an attachment to a new message. Configured like this, you will not be viewing the HTML content of the original spam message, so you do not risk validating your email address or being infected by a virus. The new message will contain information to help you evaluate how and why the message was tagged as spam.
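The wrapping idea can be illustrated with Python's standard `email` package. This is only a sketch of the concept, not SA's own code: the addresses, subject, and report text are made up, and the real report SA generates is more detailed.

```python
from email.message import EmailMessage

# A made-up HTML spam message standing in for the original.
original = EmailMessage()
original["From"] = "sender@example.com"
original["Subject"] = "FREE! FREE!! FREE!!!"
original.set_content(
    "<html><img src='http://example.com/t.jpg?id=abc'></html>",
    subtype="html",
)

# The new message: a plain-text report with the original attached.
wrapper = EmailMessage()
wrapper["From"] = "sender@example.com"
wrapper["Subject"] = "FREE! FREE!! FREE!!!"
wrapper["X-Spam-Flag"] = "YES"
wrapper.set_content("This message was tagged as spam. The original is attached.")
wrapper.add_attachment(original)  # stored as a message/rfc822 part

attachments = list(wrapper.iter_attachments())
```

The body a mail client displays is the plain-text report; the HTML, and any image references inside it, stays inside the attachment until you choose to open it.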

The SA tagged spam message will look like this sample spam message. It contains four distinct sections as marked in the sample and as described below.

1. HEADER:

Every message processed by SA will have some header information added to the message. The information in the header will not be readily viewable but will be useful in filtering the spam to a junk mailbox.
X-Spam-Status: A summary of whether or not SA thinks the message is spam, the score the message received, the tests that passed on the message, and the version of SA used to process the message.
X-Spam-Level: A string of asterisks ('*') representing the score the message received.
X-Spam-Checker-Version: The version of SA and the revision of the test file used. The test file will be periodically updated to reflect new methods of identifying spam much like virus definition files are updated to catch new viruses that are released.
X-Spam-Flag: Equal to YES on spam messages

The first three header values will exist on any message processed by SA; the last, X-Spam-Flag, will only exist if a message is tagged as spam.
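As a rough illustration of how a program (or a mail client's filter) can use these headers, the following Python sketch reads them from a message. The header values in the sample are invented for the example; real values will depend on the message and on the SA version.

```python
from email import message_from_string

# Illustrative message; the header values are made up.
raw = (
    "From: someone@example.com\n"
    "Subject: BETTER THAN FREE!\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Level: *******\n"
    "\n"
    "Body text\n"
)

msg = message_from_string(raw)

# Count the asterisks in X-Spam-Level to recover the rough score.
stars = msg.get("X-Spam-Level", "").count("*")

# X-Spam-Flag is only present on tagged messages, so default to "".
flagged = msg.get("X-Spam-Flag", "").strip().upper() == "YES"

print(f"level={stars} flagged={flagged}")
```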

2. CONTENT PREVIEW:

The content preview section gives you an easy method to review the first few lines of text from the email message. Given the subject, sender, and content preview you should be able to determine if the message was correctly or incorrectly tagged as spam.

3. CONTENT ANALYSIS:

A summary of the tests that passed with a description and point value. You can use the information here to help tune SA to your needs... increasing and decreasing point values on tests to help eliminate false positives and catch false negatives.

4. ATTACHMENT:

The original email message is included as an attachment. Use this as a means of viewing messages that are false positives. This section usually includes a disclaimer stating the message may be unsafe to open.

UPDATE: Configuration varies from one email client to another (Eudora, Outlook, etc.). One option users can set is to view text attachments within the body of an email message. If you have this enabled, most spam messages will be included in full in this section.

Let's Get This Spam Outta-Here!
A filtering tutorial.

This tutorial uses terminology and screenshots from Eudora since the majority of Engineering Administration uses Eudora as an email client. However, the basic concepts presented here are universal and should be applicable within any email client.

The engineering administration mail server tags messages it thinks are spam as described in the previous section. These changes and additions to the email message can then be used by your email client to remove the messages from your INBOX. We recommend that you filter the messages to a new mailbox so that you can later review them for false positives. By doing this you free yourself from immediately having to weed out illegitimate messages in your inbox and can save that task for a later time.

  1. Create a new mailbox for spam messages.
    A. From the Mailbox selection on the menubar, choose New. [Menu Mailbox Pic]

     

    B. In the New Mailbox dialog, type a name for a new mailbox and click the OK button. We've chosen the name Spam. [New Mailbox Pic]

     

    C. A new mailbox will now be displayed in the Mailboxes pane on the left side of the Eudora program window. [Mailbox Pane]
  2. Create the filter.
    A. From the Tools selection on the menubar, choose Filters. [Menu Filters Pic]

     

    B. In the Filters window, click the New button to begin editing a new filter. [New Filter Pic]

     

    C. Now click in the Header (#1) section of the Filters window and type: X-Spam-Level:
    D. Click in the next text box (#2) and type: ******
    E. In the Actions area at the bottom, click the down-arrow (#3) on the first selection box and choose Transfer To (#4) at the bottom of the menu.

     

    F. Click on the button that is labeled In (#1) and choose the mailbox you created from the selection list (#2).

     

    G. Close the Filters window and choose Yes when asked if you want to save your changes.
  3. Test the filter.

    Send yourself the following message (just copy and paste it into a new mail message). This message contains a lot of the keywords that will ensure it scores high on the SA tests.

    However, the server is configured to not scan local email with SpamAssassin... so if you just send this to yourself using your Engineering email account it will not be tagged. You will need to send this to your Engineering email account from some other mail service like Gatorlink, Yahoo Mail, Gmail, or Hotmail.

    This message is not spam!

    YOU WON'T GET RICH, YOU WON'T GET TWENTY PERCENT OF $3,200,000.00 (THREE MILLION, TWO HUNDRED THOUSAND U.S. DOLLARS), AND NONE OF YOUR BODY PARTS ARE LIKELY TO GET LARGER. BUT YOU CAN HELP TEST SPAMASSASSIN AND SPAM FILTERING ON OUR NEW MAIL SERVER. THAT'S BETTER THAN VIAGRA!!! BETTER THAN A MORTGAGE APPLICATION AND A FAST RE-FI!!! BETTER THAN FREE! FREE!! FREE!!! WEBSITE ACCESS! BETTER THAN $$$ IN YOUR MAILBOX (well, maybe I'm getting carried away). JUST BE SURE TO HANDLE THIS TRANSACTION IN CONFIDENCE.

    My wife, Jody told me that testing a spam filter was a great thing to do. But be careful to copy each name on the list below exactly.

    Click below to be removed from this list.

    To be removed from this list, send email to remove_me@hotmail.com

    This email was sent in compliance with a law that doesn't exist, but that would have made it legal to send spam if you put this notice at the bottom of it.

    If the filter is working properly you should now have your test message in the new mailbox. If the filter is not working properly, verify that you saved the filter and typed in the choices exactly as described in this tutorial.
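For readers using a client other than Eudora, the logic of the filter built in step 2 can be sketched in Python. This assumes the filter performs a "contains" match on the X-Spam-Level header, so a pattern of six asterisks also matches any longer run. The function name `choose_mailbox` and the mailbox name `Inbox` are hypothetical.

```python
from email import message_from_string

def choose_mailbox(msg, min_stars=6, spam_mailbox="Spam"):
    """Return the mailbox a message should be transferred to."""
    level = msg.get("X-Spam-Level", "")
    # "Contains" match: six asterisks match a level of six or more.
    if "*" * min_stars in level:
        return spam_mailbox
    return "Inbox"

tagged = message_from_string("X-Spam-Level: *******\n\nBuy now!\n")
clean = message_from_string("X-Spam-Level: **\n\nMeeting at 3pm.\n")

print(choose_mailbox(tagged))  # goes to the Spam mailbox
print(choose_mailbox(clean))   # stays in the inbox
```

Changing `min_stars` corresponds to typing a different number of asterisks in the second text box of the filter.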

    I'm Still Getting Spam, What Now?

    This solution will not capture 100% of all spam messages. In fact, we deliberately increased the threshold on SA so that there would be less possibility of false positives. This means you will continue to get spam... we just hope that it will be dramatically less. There are a few options for trying to better recognize and filter spam explained below.

    The first, and easiest, method would be to filter messages that score lower on the SA tests. Every message, regardless of whether SA tags it as spam, is given a score by SA and includes the X-Spam-Level, X-Spam-Status, and X-Spam-Checker-Version headers. You can choose to filter messages that SA scores at five or four by changing your filter to that number of asterisks. This is a very simple change that you can make without any change to server settings.

    The method described above is very simple and requires little knowledge or investigative work on your part. However, it will result in a larger number of false positives. The more appropriate way to tune SA is through blacklisting and changing the scoring of certain tests. Both of these methods require identifying patterns of behaviour and common characteristics in your spam messages.

    Blacklisting refers to automatically tagging messages as spam based on the sender of the message. This is not entirely effective because most spammers use randomly generated email addresses in the FROM and REPLY TO headers. However, in instances where these addresses are consistently the same it is trivial to have SA automatically tag these messages as spam. If you have an address you want blacklisted send it to mis@eng.ufl.edu.

    The best method for tuning SA is to weight tests differently from their defaults based on patterns you see in your incoming email messages. As you become familiar with the SA tests through reading the SA content analysis and the X-Spam-Status header, you will begin to see which tests consistently score in your spam messages and are missing from legitimate mail. Once you identify a test you want to change, send the test name and the new score to mis@eng.ufl.edu.

    As an example of tuning SA through the weight given to specific tests, look at the sample spam message again. Suppose that, after viewing several spam messages and several legitimate messages, I notice that I never receive legitimate email that is largely HTML. I would then give the HTML_60_70 and MIME_HTML_ONLY tests higher scores, giving them more weight in determining whether a message is spam. The default score for both tests is 0.1; I may want to score them as high as 1.0 or even higher, depending on how confident I am that I never receive legitimate HTML email.
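The effect of such a change can be worked through with a short Python sketch. The two HTML test names and their 0.1 defaults come from the example above; the third test, `REMOVE_ME_NOTICE`, and its score of 4.5 are hypothetical and stand in for whatever other tests a message happens to match.

```python
THRESHOLD = 6.0

default_scores = {
    "HTML_60_70": 0.1,
    "MIME_HTML_ONLY": 0.1,
    "REMOVE_ME_NOTICE": 4.5,  # hypothetical test and score
}

# Per-user overrides, as you might request from MIS.
custom_scores = {"HTML_60_70": 1.0, "MIME_HTML_ONLY": 1.0}

def score_message(matched, defaults, overrides=None):
    """Sum the scores of matched tests, preferring any override."""
    overrides = overrides or {}
    return sum(overrides.get(name, defaults[name]) for name in matched)

matched = ["HTML_60_70", "MIME_HTML_ONLY", "REMOVE_ME_NOTICE"]
before = score_message(matched, default_scores)
after = score_message(matched, default_scores, custom_scores)
```

With the defaults the message totals about 4.7 and slips through; with the two HTML tests raised to 1.0 it totals 6.5 and is tagged.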

    Hey! That Wasn't Spam!

    Again, SA is not always accurate. It will generate false positives and tag messages as spam that really aren't. The goal is to try to minimize this. This is accomplished using the same concepts described in the previous section. However, just changing your filter to seven or eight asterisks isn't good enough because the messages are still tagged as spam. As a result, you need to use the more advanced techniques of whitelisting (the opposite of blacklisting) and score changing.

    The most useful technique in tuning SA to not tag messages as spam will undoubtedly be whitelisting. This refers to creating a list of email addresses that SA will never tag as spam. (They basically get a free ride through the system.) If you receive legitimate email from someone or some service that is routinely tagged as spam, forward the email address to mis@eng.ufl.edu and ask us to whitelist it for you.

    A good example where whitelisting is the solution is an internet newsletter or mailing list service. Many of these newsletters use HTML email, contain ads from sponsors, and have information about unsubscribing from the newsletter (all of which are things SA tests for when tagging spam). Because the newsletter always comes from the same sender, we can whitelist the address and the newsletter will get a free pass through the system.
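How whitelisting and blacklisting interact with the score can be sketched as follows. This is a conceptual illustration only: the addresses are made up, the function name `is_tagged_as_spam` is hypothetical, and SA's own implementation of these lists may differ from the simple short-circuit shown here.

```python
# Made-up addresses for illustration.
WHITELIST = {"newsletter@example.org"}
BLACKLIST = {"offers@example.net"}

def is_tagged_as_spam(sender: str, score: float, threshold: float = 6.0) -> bool:
    """Decide whether a message is tagged, checking the lists first."""
    address = sender.strip().lower()
    if address in WHITELIST:
        return False  # free pass, whatever the score
    if address in BLACKLIST:
        return True   # always tagged, whatever the score
    return score > threshold

# An HTML newsletter with ads and unsubscribe text scores high,
# but its whitelisted sender keeps it out of the Spam mailbox.
print(is_tagged_as_spam("Newsletter@example.org", 8.2))
```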

    Additional Information.

    Additional information about working with the email server can be found on the Email Server Documentation page. This includes pages on frequently asked questions (FAQs), using the webmail client, and a list of installed utilities on the server.

    You can also always contact MIS at 392-9217 or mis@eng.ufl.edu for information.


Last Modified: Friday, 08-Aug-2008 15:24:41 EDT