Science & Engineering Node Services — UB Engineering / Natural Sciences & Mathematics — University at Buffalo
Managing SENS Email Filters
Topics: Introduction | Set Up Basic Filtering | Solving Problems | Teach the Filters | Customize Your Filters | Useful Links
Introduction
With the SENS Bulk Email Filters, mail that is recognizable as "bulk mail" (for example, mail that is sent to a larger-than-normal number of users) is caught and set aside in a separate folder, where it is held for a short time before being deleted. Filtering options range from basic bulk mail filtering to much more specific filters that you write yourself.
SENS uses two different filtering systems: The Distributed Checksum Clearinghouse, or DCC, and SpamAssassin. Each uses different methods to detect bulk email, and they make an effective combination.
There are two parts to the process: Bulk email detection, which is done by programs running on the mail server computer systems, and handling of detected bulk email, which is done by individual users based on rules defined for a program named Procmail. Experienced users can customize how bulk email is handled, but for most users the rules we've defined for them should be sufficient.
Please note that these filters are only intended for users who use the SENS email system to read email, and not for those who have their email forwarded to other accounts, as setting them will remove any forwarding that is currently in place for such users.
How to Set Up the Basic Bulk Mail Filter
Some of the following steps require a basic knowledge of how to log in to the SENS Unix timeshare systems. If you are not sure if your computer can do this, please contact SENS.
- Connect to the unix.eng.buffalo.edu or unix.nsm.buffalo.edu SENS Unix timeshare systems using a communications program such as "ssh" or "telnet" (ssh is preferred for security reasons), and log in using your UBitname and SENS password.
- At the command prompt (">" or "%") enter the command mkfilter. This command will create the files necessary to enable filtering for your account. It may ask several questions, and for most users it is OK to go ahead and answer them with "yes". For example, if you had previously used the mail filters, it will ask you if you want to rename your existing ".procmailrc" file, and it's OK to say "yes" and not worry about it any further unless you've made your own changes to it. However, if you currently use your ".forward" file to forward your mail to another account, or to perform special processing on your mail, you may need to consult with SENS about the proper way to set that file. You may log out of the timeshare when this command has completed.
- As part of the previous step, a mail folder named "IN.BULK" was created under your "Mail" folder. From now on, all mail that is recognized as bulk mail will go directly into that folder.
- Every Saturday night, all messages in the IN.BULK folder will be moved to the folder IN.BULK.lastweek. This will overwrite the previous contents of the latter folder, which means that bulk mail in it will be permanently deleted every week. Users should check both folders weekly to make sure that valid messages are not being sent to them.
- If you are using an email client such as Microsoft Outlook, Pine, Netscape, Mozilla, Thunderbird, Mulberry, etc., you may need to subscribe to the IN.BULK and IN.BULK.lastweek folders. Use the help feature of your particular client for instructions on subscribing to folders.
- REMEMBER: the mail in your IN.BULK folder, your IN.BULK.lastweek folder, and any folders you set up as part of another filter are included in your quota. Although the messages in the IN.BULK.lastweek folder are overwritten and deleted every week, it's still possible to accumulate a large amount of mail in a short time. In order to keep from exceeding your quota, it's a good idea to check these folders periodically and delete any unwanted messages.
- If you subscribe to mailing lists that do not come from the buffalo.edu domain: Most of these messages will be sent to your IN.BULK folder unless you create a separate filter for them. Instructions for doing this are listed below.
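The weekly check recommended above can be partly scripted. The following is a minimal sketch, not a SENS-provided tool. It assumes that IN.BULK and IN.BULK.lastweek are mbox-format files under ~/Mail (the usual result when a Procmail recipe delivers to a plain file name), and the function name count_messages is made up for illustration. It is written for sh, so save it to a file and run it with sh rather than typing it at the "%" prompt.

```shell
#!/bin/sh
# Count the messages in an mbox-format mail folder by counting the
# "From " separator lines that begin each message.
# Prints 0 if the folder does not exist yet.
count_messages() {
    if [ -f "$1" ]; then
        grep -c '^From ' "$1" || true
    else
        echo 0
    fi
}

# Assumed locations of the two bulk folders described above.
for folder in "$HOME/Mail/IN.BULK" "$HOME/Mail/IN.BULK.lastweek"; do
    echo "$folder: $(count_messages "$folder") message(s)"
done
```

A count that grows quickly is a hint to review the folder and delete unwanted messages before they affect your quota. The count is only an approximation of what your email client will show.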
Solving Problems
Here are some problems you may experience when setting up bulk email filters.
- Running the command mkfilter returns the error "Command not found":
- To solve this problem, enter the command use update. Then log out of the timeshare system and log back in. After you've logged back in you should be able to run the mkfilter command.
- I'm still getting bulk email in my inbox:
- Unfortunately, the filters are not perfect. Spammers are clever, and are constantly finding new ways to fool automated filtering systems. There are ways to teach the filters about new kinds of bulk email; please see the next section for more information.
- I'm not seeing messages I expect to be receiving:
- It is really important that you check your "IN.BULK" and "IN.BULK.lastweek" folders on a regular basis. The filters can sometimes get "false positives" on a piece of valid mail, and treat it as bulk email when it really isn't. This can be especially true of messages from mailing lists and automated messages from online services.
- "mkfilter" created a folder named "Mail", but I use a folder named "mail":
- Your mail files live on a Unix system, where file and directory names are case-sensitive ("Mail" and "mail" are two different names on such systems). The convention for many Unix-based email programs is to use "Mail" as the name of the folder in which you store email, but some other email clients use "mail". If you are already using a folder named "mail" (lower-case "m"), Unix links will be created from the "Mail" folder (capital "M") to the "mail" folder so that the "IN.BULK" and "IN.BULK.lastweek" mail folders will appear to be in there as well (they are the same files, just pointed to from two different locations).
- I was using custom Procmail rules and now they're gone:
- When you ran the command mkfilter, your existing ".procmailrc" file was renamed ".procmailrc.pre-bulk-YYYYMMDD" (where "YYYYMMDD" represents the date on which you ran the command). Your old rules should still be in there, and you can copy and paste them into your new file using the text editor of your choice. The comments in the new file give suggestions on where to put certain kinds of rules.
- I used the vacation program, and now I'm getting bulk email again:
- Unfortunately, both the bulk email filters and the vacation system make use of the same file in your home directory, ".forward", to redirect your incoming mail to programs designed to process those messages. We are currently looking at ways to work around this problem, but for now you can contact SENS and we will help you reset your bulk email filters.
Teaching the Filters About New Bulk Email
The SpamAssassin filters use a method known as Bayesian filtering to detect some kinds of bulk email messages. These filters build tables in a hidden directory in your home directory based on the kinds of bulk email they have found. If you get a bulk email message in your inbox, you can save it to a file in your SENS home directory (for example, "newspam"), and run this command on a SENS Unix timeshare system:
% sa-learn --spam < newspam
If the above command does not work, then you probably do not have the "/util/perl/bin" directory in your search path, and probably need to run the command "use update". After running that command you will need to either log out and log back in, or type the command "rehash". Please contact SENS if you need help correcting this problem.
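To see whether the directory mentioned above is already in your search path, a small check such as the following may help. This is only a sketch: it is written for sh (run it with sh, since the "%" prompt and the rehash command suggest a csh-style login shell), and the function name path_has_dir is hypothetical.

```shell
#!/bin/sh
# Print "yes" if directory $1 appears in the colon-separated list $2
# (defaults to the current PATH), otherwise print "no".
path_has_dir() {
    case ":${2-$PATH}:" in
        *":$1:"*) echo yes ;;
        *)        echo no ;;
    esac
}

echo "/util/perl/bin in PATH: $(path_has_dir /util/perl/bin)"
```

If the answer is "no", running use update as described above is the likely fix.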
How to Customize Your Mail Filters
When you ran the command mkfilter, a file called .procmailrc was created in your home directory. That file is the basis of all of your email filtering, because it gives specific instructions to procmail, a mail processing program that is used to automatically process and deliver incoming mail messages. In order to customize your mail filters, you will need to edit your .procmailrc file.
Editing Your .procmailrc File
Procmail is a sophisticated filtering tool. We suggest you read the manual page for procmail, and also for the .procmailrc file that lives in your home directory and which you are going to be editing if you wish to make changes to the behavior of procmail. On a SENS Unix timeshare system, you can type man procmail and man procmailrc to view these man pages, respectively. There is also a page of examples that you can view by typing man procmailex.
Extreme caution is advised if you are going to edit your .procmailrc file, because if you make a mistake you could lose email.
- Log in to a SENS Unix timeshare system. You should be in your home directory, but if not, type cd to get there.
- If you'd like, you can check to make sure the file is already in your home folder. Note that because the file name begins with a dot (.), it is hidden from normal directory listings. You can view hidden files by using the command ls with the -a flag:
% ls -a
- Open the file with your favorite UNIX text editor (e.g., pico, nano, emacs, vi). For this example, we're using vi:
% vi .procmailrc
- Make your changes, and save the file.
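Because a mistake in this file can cost you mail, it may be worth keeping a dated copy before you edit. The sketch below is one way to do that, not a SENS-provided command. The function name backup_rc and the ".bak-YYYYMMDD" suffix are hypothetical, loosely modeled on the naming that mkfilter uses, and the script is meant to be run with sh.

```shell
#!/bin/sh
# Copy a file to "<file>.bak-YYYYMMDD" and print the name of the copy.
# Returns non-zero if the file does not exist or the copy fails.
backup_rc() {
    [ -f "$1" ] || return 1
    backup="$1.bak-$(date +%Y%m%d)"
    cp -p "$1" "$backup" && echo "$backup"
}

# Back up ~/.procmailrc if it exists; do nothing otherwise.
if [ -f "$HOME/.procmailrc" ]; then
    backup_rc "$HOME/.procmailrc"
fi
```

If an edit goes wrong, copying the printed backup file back over .procmailrc returns you to the earlier rules.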
Procmail Variable Settings
The beginning of the .procmailrc file defines the variables used by the procmail program:
SHELL=/bin/sh
PATH=/eng/local/bin:/usr/local/bin:/util/bin:/usr/ucb:/usr/bin
VERBOSE=off
ORGMAIL=/var/mail/$USER
MAILDIR=/[your home directory]/Mail # This directory must exist
LOGFILE=$MAILDIR/procmail.log # recommended
LOGABSTRACT=no
In general, we discourage changing any of these lines, unless you are experienced in using and debugging Procmail.
Note that variables such as ORGMAIL above are referenced in subsequent recipes by placing a dollar sign ($) in front of them. There is also a built-in variable named $USER that represents your UBitname, so, for example, if your username is user123, the directory that is represented by $ORGMAIL is /var/mail/user123.
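The substitution described above works the same way in the sh shell, so it can be tried outside of Procmail. This is only an illustration using the example username from the text; it sets USER itself, so run it as a separate sh script rather than in your login session.

```shell
#!/bin/sh
# "user123" is the example username from the text above.
USER=user123
# Same form as the ORGMAIL line in .procmailrc.
ORGMAIL=/var/mail/$USER
echo "$ORGMAIL"    # prints /var/mail/user123
```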
Procmail Bulk Email (Spam) Recipes
The body of the file contains the rules for filtering incoming mail. Each line of each rule provides specific information, and the rules are tried from top to bottom (so yes, it matters what order you put them in).
Here's what the bulk filter section of the .procmailrc file contains when it's created:
#############################################################
#
# Start of Anti-Bulk Email rules:
#
#
# Start of DCC rules:
#
# This checks to see if a message did NOT originate inside of buffalo.edu,
# and if it passes that test it uses a regular expression to see if any of
# the DCC checksum values is greater than 50. If both conditions are true
# it puts the message in the ~/Mail/IN.BULK folder.
#
:0:
*!^From:.*buffalo\.edu
*X-DCC.*(Body|Fuz[1234])=([0-9]*[0-9][0-9][0-9]|[5-9][0-9])
IN.BULK
#
# This rule checks to see if DCC has seen the message so many times it
# has given up counting and simply returns "many" for that checksum value.
# Again, if the message did not originate in buffalo.edu and one of the
# checksum values is "many" it gets placed in the ~/Mail/IN.BULK folder.
#
:0:
*!^From:.*buffalo\.edu
*X-DCC.*(Body|Fuz[1234])=many
IN.BULK
#
# End of DCC rules.
#
#
# Start of SpamAssassin rules (for more info, check the web page
# http://www.stearns.org/doc/spamassassin-setup.current.html#procmail).
# Note that we only process messages less than 250KB so that large
# messages don't choke SpamAssassin.
#
:0fw: spamassassin.lock
* < 256000
| /usr/local/bin/spamc
#
# Ten asterisks or more is almost definitely Spam:
#
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*
IN.BULK
#
# This is based on our mail system's threshhold rule:
#
:0:
* ^X-Spam-Status: Yes
IN.BULK
#
# End of SpamAssassin rules.
#
#
# End of Anti-Bulk Email rules.
#
#############################################################
And now for some explanation:
- The first line lets procmail know that a rule is being defined. Use it as written (a zero between two colons).
- The second line tells procmail what to look for.
- The third line tells procmail where to send any mail that matches the criteria defined in the second line.
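To get a feel for what the two DCC conditions in the rules above actually match, they can be tried against sample header lines. The sketch below uses grep -E -i as a stand-in for Procmail's matching (Procmail conditions are egrep-style and ignore case by default). The sample X-DCC headers and the function name classify_dcc are made up for illustration, and the check on the From: line that the real rules also apply is left out.

```shell
#!/bin/sh
# The two DCC conditions from the rules above, copied verbatim.
DCC_COUNT='X-DCC.*(Body|Fuz[1234])=([0-9]*[0-9][0-9][0-9]|[5-9][0-9])'
DCC_MANY='X-DCC.*(Body|Fuz[1234])=many'

# Print "bulk" if the header line in $1 matches either condition,
# otherwise print "ok".
classify_dcc() {
    if printf '%s\n' "$1" | grep -E -i -q -e "$DCC_COUNT" -e "$DCC_MANY"; then
        echo bulk
    else
        echo ok
    fi
}

# Made-up sample headers; real X-DCC lines carry site-specific details.
classify_dcc 'X-DCC-Example-Metrics: mailhost 1234; Body=1 Fuz1=1 Fuz2=1'   # ok
classify_dcc 'X-DCC-Example-Metrics: mailhost 1234; Body=49 Fuz1=49'        # ok
classify_dcc 'X-DCC-Example-Metrics: mailhost 1234; Body=50 Fuz1=3'         # bulk
classify_dcc 'X-DCC-Example-Metrics: mailhost 1234; Body=1 Fuz2=many'       # bulk
```

Note that a count of exactly 50 matches the first pattern: the alternative [5-9][0-9] covers 50 through 99, and the other alternative covers any count of three or more digits.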
In addition to the rules that are already in your .procmailrc file, you can add your own to make sure that mail that you want is not sent to the "IN.BULK" folder. We've included an example in the .procmailrc file which is commented out (on a line, anything after a pound sign ("#") is ignored):
#:0:
#* ^From: .*amazon\.com
#$ORGMAIL
The first line, with the ":0:" text, denotes the start of a new recipe. The second line is the condition under which this recipe will be used; in this case, it will look in the "From:" field of an email message for the text "amazon.com". If it finds a match, the action in the third line of the recipe is performed, which in this case simply delivers the message to your inbox.
You can have as many recipes as you want, and can do interesting things with them, such as having them deliver messages from certain places to specific mailboxes in your "Mail" directory.
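Before adding a recipe of your own, it can help to check that its condition matches the messages you have in mind. The following is a rough sketch of such a check, not part of Procmail. It assumes you have saved a message to a file, it uses grep -E -i to approximate Procmail's case-insensitive matching, and it does not join headers that are folded across several lines. The function name header_matches and the sample message are made up for illustration.

```shell
#!/bin/sh
# Print "match" if the header section of the message file $2 contains a
# line matching the extended regular expression $1, otherwise "no match".
# The header section is everything up to the first blank line.
header_matches() {
    if sed '/^$/q' "$2" | grep -E -i -q -e "$1"; then
        echo match
    else
        echo "no match"
    fi
}

# A made-up saved message used only for this demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
From: Orders <orders@amazon.com>
Subject: Your order has shipped

From: nobody@example.org appears here in the body, not in a header.
EOF

header_matches '^From: .*amazon\.com' "$sample"    # match
header_matches '^From: .*example\.org' "$sample"   # no match (body only)
rm -f "$sample"
```

Once a pattern behaves as expected, it goes on the condition line of a recipe (after the "* "), in the same way as the amazon.com example above.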
When you're finished editing your .procmailrc file, save it. You can always go back later and fine-tune it if it isn't doing what you want it to do.
There are some good online resources for configuring procmail listed in the "Useful Links" section.
If you need further assistance, please email nodehelp@eng.buffalo.edu.
Useful Links
Procmail:
- Procmail home page (very old)
- A Procmail FAQ
- Mail Filtering with Procmail
- Procmail Mini-Tutorial from Linux Gazette
- Mail Management with Procmail from DevShed
SpamAssassin:
- SpamAssassin home page
- SpamAssassin FAQ
Distributed Checksum Clearinghouse (DCC)
- Distributed Checksum Clearinghouse home page
- DCC FAQ

