To be continued - Controlling SPAM in my domains


jacauc
02-09-2006, 09:58 PM
Hi,

It's a bit difficult to continue a thread that doesn't exist anymore, but I'd like to recreate it so people can share best practices on controlling spam and setting up the scores properly in SpamAssassin.

In the old thread, a user by the name of richard (he still has to join the new forums again though) discussed some methods of using the "sa-learn" command over SSH to teach the Bayesian filters about uncaught spam.

If I remember correctly, the method he described was to create a folder under your mailbox called "uncaught" or something similar, and move all spam messages that slipped through SpamAssassin into this folder.

Then, to make SA aware of this, you run the command:

sa-learn -D --mbox --spam ~/mail/yourdomain.com/yourmailbox/uncaught

After running this command from SSH, SA will know about these messages that it let through, and eventually learn to mark these with a higher spam score.

richard can run this command on his bluehost account with no problems, but unfortunately, when I run this command on box77, I get some errors:

[31364] dbg: metadata: X-Spam-Relays-Trusted:
[31364] dbg: metadata: X-Spam-Relays-Untrusted:
[31364] dbg: message: ---- MIME PARSER START ----
[31364] dbg: message: main message type: text/plain
[31364] dbg: message: parsing normal part
[31364] dbg: message: added part, type: text/plain
[31364] dbg: message: ---- MIME PARSER END ----
[31364] dbg: message: no encoding detected
[31364] dbg: bayes: DB_File module not installed, cannot use bayes
Learned tokens from 0 message(s) (1 message(s) examined)
ERROR: the Bayes learn function returned an error, please re-run with -D for more information

Can anyone else confirm if this is working on their accounts, and how to enable the particular DB_File module?

And continue sharing new ideas please :D
Thanks!
Jacauc

richard
02-10-2006, 10:34 AM
That's a good summary of where the thread was, jacauc. Me and DB_File.so are on box62. I guess I just lucked into a good one. Maybe you should take this direct to a support ticket?

jacauc
02-12-2006, 02:58 AM
OK, logged a ticket.. they might be able to help me.

Richard,
Is it necessary to run this command on my "spam" folder too (to tell SA that it has correctly identified spam - kinda making it more confident for next time :D )

And also, should I run it with the Ham option on my inbox?

Cheers
J

Mistywindow
02-12-2006, 09:32 AM
Most of you will be aware of this, but a surprising number aren't.
A lot of incoming garbage can be avoided by changing your IE privacy cookie control settings from the Windows default setting, which is not tough enough.
Either change it to Medium High at least:

http://www.mistywindow.com/security/security-images/tutori2.jpg

or click on the advanced tab and block all 3rd party cookies:

http://www.mistywindow.com/security/security-images/tutori3.jpg

Firefox doesn't give you as much control but you can check "For the originating site only" in the Options dialogue box.

richard
02-12-2006, 08:10 PM
OK, logged a ticket.. they might be able to help me.

Richard,
Is it necessary to run this command on my "spam" folder too (to tell SA that it has correctly identified spam - kinda making it more confident for next time :D )

And also, should I run it with the Ham option on my inbox?

Cheers
J

I would just answer flatly, but since we already know our boxes are somewhat different, I'll tell you how to check:

Look at the X-Spam-Status header of a spam message. Near the end of the line you should see "autolearn=spam". If you do, then spam messages are automatically being routed through the spam learner. Similarly, inspect a clean message and check for "autolearn=ham". That means it was learned as good stuff.

If you see something else, time for another support ticket! If you see what I've described, all you ever have to do is feed sa-learn the mistakes with the appropriate choice of "--spam" or "--ham". That will undo the autolearn damage and learn the message(s) in the proper category.
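A minimal sketch of that header check from the shell, for anyone who would rather not open each message by hand. It assumes one message per file (Maildir style); the function name autolearn_status and the example path are hypothetical, and the header layout is assumed to match what SpamAssassin 3.x normally writes.

```shell
# Print the autolearn=... value from the X-Spam-Status header of one message file.
# Prints "unknown" when the header or the autolearn field is missing.
autolearn_status() {
    awk '
        /^$/ { exit }                                  # a blank line ends the headers
        /^X-Spam-Status:/ { hdr = $0; inhdr = 1; next }
        inhdr && /^[ \t]/ { hdr = hdr " " $0; next }   # folded continuation line
        { inhdr = 0 }
        END {
            if (match(hdr, /autolearn=[a-z_]+/))
                print substr(hdr, RSTART + 10, RLENGTH - 10)
            else
                print "unknown"
        }
    ' "$1"
}

# Example call (hypothetical path):
#   autolearn_status ~/mail/yourdomain.com/yourmailbox/cur/SOME_MESSAGE_FILE
```

Only the header block is read, so an "autolearn=" string quoted in a message body does not confuse the result.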

Please let me know how support resolves your DB_File issue. Just curious.

jacauc
02-12-2006, 09:20 PM
Ok, got it! :D

Got a reply on my ticket on Saturday stating:
I will forward this request to our tech2 group. They will be getting back to you as soon as they can.

Will wait for this one to resolve before going ahead (might be related issues). I did quote this thread in the ticket though, so they'll see this.

I also noticed on mattheaton.com that they will be implementing a proper spam filtering system soon. :D YAY!

Shot!
J

MikeKieffer
02-13-2006, 08:55 AM
One thing I have done is create a PHP script that goes through my inbox and removes spam several times an hour. The script is really basic and relies on SpamAssassin's ability to rewrite the message's subject header. I have it put a [SPAM] tag at the beginning of the subject line, and then the script deletes messages that have that tag.

It is not a very sophisticated script, but it does the job. If you are interested in using the script, you can download it by going to http://www.lakemicro.com/spam_killer.php

In the future, I will be modifying the script to go off of the header information instead of the subject line. But that is something that is on the "Will do when I have time" list.

jacauc
02-13-2006, 09:09 AM
Well, isn't this the same as going into "email filtering" from the control panel, and setting up a filter like this:

To filter all mail that SpamAssassin has marked as spam, just choose "SpamAssassin Spam Header", "begins with", and then enter "Yes" in the box.

That'll do the same job right?
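Before letting either approach delete anything, it may help to list what would match. The sketch below is only an illustration: it assumes a Maildir layout (cur/ and new/ subfolders) and that SpamAssassin adds an "X-Spam-Flag: YES" header to tagged mail; the function name list_flagged and the example path are made up.

```shell
# List message files under a Maildir folder whose headers carry X-Spam-Flag: YES.
# Nothing is deleted; the function only prints file names for review.
list_flagged() {
    for msg in "$1"/cur/* "$1"/new/*; do
        [ -f "$msg" ] || continue                 # skip unmatched globs
        # sed prints only the header block (up to the first blank line)
        if sed '/^$/q' "$msg" | grep -qi '^X-Spam-Flag: *YES'; then
            printf '%s\n' "$msg"
        fi
    done
}

# Example call (hypothetical path):
#   list_flagged ~/mail/yourdomain.com/yourmailbox
```

If the list looks right for a few days, a delete or move filter on the same header should be reasonably safe.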

Cheers!
J

MikeKieffer
02-13-2006, 01:21 PM
I did not know that was there... Thanks for bringing it to my attention.

jacauc
02-13-2006, 09:35 PM
Guess that's what makes this forum a great place :D

jacauc
02-15-2006, 09:51 PM
Excellent, I can now run the sa-learn command successfully.

Apparently the DB_File module had become corrupt on my box as well as in my profile. Support fixed it and all is well now :D:D
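For anyone hitting the same "DB_File module not installed" message, a quick way to see whether Perl can load the module at all is sketched below. The helper name has_perl_module is hypothetical, and it checks whichever perl is first on your PATH, which is assumed (not guaranteed) to be the one sa-learn uses.

```shell
# Succeeds (exit status 0) if the perl on PATH can load the named module.
has_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Example:
#   if has_perl_module DB_File; then echo "DB_File loads"; else echo "DB_File missing or broken"; fi
```

If the check fails, fixing it is up to the host, as it was here.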

Cheers
J

nhgpga.org
03-24-2006, 07:26 PM
I have been getting a lot of spam that is not being caught by the filters.... But I don't fully understand how to teach SA what is spam and what is not... could you please start at step 1 and explain it a bit slower for the rest of us.... :confused: Do I need to get my mail via the web interface to move it? What if I am pulling it via POP?

Thanks

userwaldo
03-27-2006, 04:54 PM
I'm also interested in sa-learn, and found this. It looks like sa-learn isn't too hard to use, but I'll have to try it.

http://spamassassin.apache.org/full/3.0.x/dist/doc/sa-learn.html

richard
03-27-2006, 08:31 PM
It's pretty simple. I use imap, and I file spam that slips through in a mailbox named Junk. I run this cron job once a week (mailing myself the rather verbose output):

sa-learn -D --mbox --spam mail/mydomain.com/myaddress/Junk
cat </dev/null >mail/mydomain.com/myaddress/Junk

userwaldo
03-28-2006, 06:54 AM
Does anyone know how to enable the Spam Box? For some reason it is marked as disabled, and the button to enable the Spam Box is missing, and nothing seems to be happening, even though I've done my training.

Found the problem. I needed to set up the Email Filtering to trigger off of the SpamAssassin header and move the mail to a discard or elsewhere as desired.

userwaldo
03-28-2006, 01:50 PM
It's pretty simple. I use imap, and I file spam that slips through in a mailbox named Junk. I run this cron job once a week (mailing myself the rather verbose output):

sa-learn -D --mbox --spam mail/mydomain.com/myaddress/Junk
cat </dev/null >mail/mydomain.com/myaddress/Junk


My recommendation would be to nice the process down so as to keep the load down on the server.

nice -n 19 sa-learn -D --mbox --spam mail/mydomain.com/myaddress/Junk
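One possible refinement of the two-line cron job quoted above, sketched under assumptions: the mbox is emptied only if sa-learn exits successfully, so a failed run (like the DB_File error earlier in the thread) does not throw away the training mail. The function name learn_and_empty and the SA_LEARN override are hypothetical; the override exists only so the function can be tried without SpamAssassin installed.

```shell
# Learn an mbox file as spam at low priority, then truncate it on success only.
# SA_LEARN (hypothetical) lets you substitute another command for testing.
learn_and_empty() {
    mbox=$1
    [ -s "$mbox" ] || return 0                    # empty or missing: nothing to learn
    if nice -n 19 "${SA_LEARN:-sa-learn}" --mbox --spam "$mbox"; then
        : > "$mbox"                               # same effect as cat </dev/null >file
    else
        echo "sa-learn failed; leaving $mbox untouched" >&2
        return 1
    fi
}

# Example call (hypothetical path):
#   learn_and_empty mail/mydomain.com/myaddress/Junk
```

Mail filed into the folder between the learn step and the truncate would still be lost, so running it at a quiet hour is probably sensible.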

jansportw
03-28-2006, 10:51 PM
Does anyone know how to enable the Spam Box? For some reason it is marked as disabled, and the button to enable the Spam Box is missing, and nothing seems to be happening, even though I've done my training.

You can also enable the Spam Box by giving Bluehost a call (they disabled activating it from the web site).
Jansen

MrGibbage
04-22-2006, 06:53 AM
Resurrecting an old thread here, but it's a good one. How do you go about using sa-learn? How do you pipe your missed spams and ham messages that were classified as spam through sa-learn?

Those of you that are deleting your SA-tagged spam (rather than refiling it), how confident are you that you aren't deleting HAM? I haven't had SA misfire on a HAM in a while, but I guess the fear is still there. I guess I am getting tired of going through my spam inbox and don't really see it as that much of a time saver over seeing it in my regular inbox.

greg_at_drogens
09-27-2006, 12:29 PM
I'm curious if there's a way to use MY spam as fodder for the ENTIRE domain in training using sa-learn?

Also, is there a way to upload my already-popped emails from Outlook BACK to the server for sa-learn training?

Thanks for your help!

Greg

jansportw
12-13-2006, 04:19 PM
Some of this may be elsewhere, but I thought it best to put it all in one spot for those of us who are not experts.

History and Relevance
This is for those with lots of people with different email addresses on one BlueHost account, using different methods of checking and sending email. SpamAssassin (SA) uses rules and a Bayesian classifier to identify spam. Training SpamAssassin allows Bayes to mark your emails more intelligently, both ham and spam. The setup below lets your users redirect mislabeled messages so that SA learns them properly as either spam or ham.

Terms
You will need to change CAPITALIZED TERMS to make things work for you
USERNAME = Your Bluehost UserName
SITE.COM = Your domain
"SPAM@SITE.COM" = Email to redirect spam to (Step 2&4)
"HAM@SITE.COM" = Email to redirect ham to (Step 2&4)

1. Setting Up SpamAssassin
First, turn on SpamAssassin: cPanel->Email Manager->Spam Assassin-> "Enable Spam Assassin"
Next, turn on Bayes (auto learning for SA): "Configure Spam Assassin" -> check the box for "use-bayes".

Note: You may also want to tweak other settings like the required score (I like 4.0) and the scores assigned to each test. See http://spamassassin.apache.org/tests_3_1_x.html for a list of all tests and their default values. After training SA and Bayes, I made BAYES_40 and up worth more than their default amounts.

2. Set up Email Addresses for Spam and Ham
Create 2 email addresses, one for spam and one for ham. I will call them "SPAM@SITE.COM" and "HAM@SITE.COM" for simplicity. But it is suggested that you don't use Ham@... since a spammer might guess to send spam to that address, and that would cause SpamAssassin to start learning spam as ham.
Users will redirect/resend mismarked messages to these email accounts, and a cron job will allow SA to learn from them.

3. Setting up Cron Jobs so learning happens automatically
Set up 2 cron jobs, 1 to learn spam, 1 to learn ham. SpamAssassin will learn tokens from all the messages at "SPAM@SITE.COM" & "HAM@SITE.COM" as spam or ham respectively.
sa-learn --spam /home/USERNAME/mail/SITE.COM/SPAM/*
where "SPAM@SITE.COM" is where you direct spam. Set it up to run once a day or once a week.
sa-learn --ham /home/USERNAME/mail/SITE.COM/HAM/*
where "HAM@SITE.COM" is where you direct ham that got mislabeled as spam, or that you just want SA to learn as ham. Set it up to run once a day or once a week.

OPTION Adding "-D" after "sa-learn" (as in the commands earlier in this thread) will give you a more detailed printout of what it did. Otherwise you will just see that it learned tokens from X messages and examined Y messages. SA will only learn new tokens from emails it has not already examined. So depending on space, you might log in and empty those emails on occasion OR run this cron job to automatically delete the HAM email as often as you want.
rm ~/mail/SITE.COM/HAM/cur/*

4. Instructions for Users
Your users will then want to Redirect or Resend messages to "SPAM@SITE.COM" or "HAM@SITE.COM" appropriately. This varies depending on how your users check their email.
Web Mail with HORDE - Choose "Redirect"
Outlook (Anyone with a script so this can be done with the push of a button?)
A. Open the message (double click on it)
B. Action-->Resend This Message...
C. Say "Yes" to the warning that you were not the original sender.
D. Change the "To..." line to: "SPAM@SITE.COM" or "HAM@SITE.COM"
E. Hit "Send" (do not change the message)

IMAP - Set up an account for each email and simply move messages (or copies) into the appropriate folder

NOTE - While not on the front of your mind, it is good to teach SA Ham as well as Spam as it will decrease the risk of falsely marking ham as spam.

5. Advanced Options (like everything else was easy)
5.1 Scores - Review spam and ham yourself and look at the headers (I look at them with the web mail HORDE) to see which test results seem common for ham and spam. If lots of your ham gets caught by TEST_A, then turn it off in the SpamAssassin configuration (score TEST_A 0). If lots of spam meets rule TEST_B and none of your real mail (ham) seems to meet it, give it more points (score TEST_B 2.0). You will also notice that just because a message was flagged as spam, it was not necessarily learned as spam. You can redirect it to "SPAM@SITE.COM" to help Bayes even more.
5.2 Protection - Do not list your email addresses on your website. Mask them with ASCII or hexadecimal values that appear as normal letters to viewers but not to robots: http://homepages.comnet.co.nz/disguise.html
5.3 Advanced Learning - Hide your "SPAM@SITE.COM" in the source of your website such that humans will not see it but spam robots will. Then they will kindly send spam to "SPAM@SITE.COM", which SA will learn as spam. (Better yet, hide "X@SITE.COM", which auto-forwards to "SPAM@SITE.COM", in case the robot is smart enough not to send emails to "SPAM@SITE.COM".)
5.4 Risky - If you have a folder full of spam (like the Spam Box, which defaults to ".spam") and are certain that no ham is in it, run this cron job.
sa-learn --spam /home/USERNAME/mail/SITE.COM/EMAIL/.SPAM/*
where .SPAM = Folder where spam is stored and EMAIL = Your email name ("email@site.com").
Or to learn from ALL email accounts on your domain (only do this if you know for certain there is no ham in anyone's ".spam" folder)
sa-learn --spam /home/USERNAME/mail/SITE.COM/*/.SPAM/*
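
For the header review in 5.1, a rough way to tally which tests fire most often across a folder is sketched here. It assumes one message per file, that the X-Spam-Status header carries a "tests=" list as SpamAssassin 3.x normally writes it, and that long headers may be folded onto continuation lines. The function name list_rules and the example path are hypothetical.

```shell
# Print one rule name per line for every message file given as an argument.
list_rules() {
    for msg in "$@"; do
        awk '
            /^$/ { exit }                                  # a blank line ends the headers
            /^X-Spam-Status:/ { hdr = $0; inhdr = 1; next }
            inhdr && /^[ \t]/ { hdr = hdr " " $0; next }   # folded continuation line
            { inhdr = 0 }
            END {
                gsub(/,[ \t]+/, ",", hdr)                  # rejoin a list split at a fold
                if (match(hdr, /tests=[^ \t]*/)) {
                    n = split(substr(hdr, RSTART + 6, RLENGTH - 6), rules, ",")
                    for (i = 1; i <= n; i++) if (rules[i] != "") print rules[i]
                }
            }
        ' "$msg"
    done
}

# Example: most common tests in a spam folder (hypothetical path)
#   list_rules /home/USERNAME/mail/SITE.COM/EMAIL/.SPAM/cur/* | sort | uniq -c | sort -rn
```

Running the same tally over a folder of known ham and comparing the two lists shows which tests are candidates for a higher or lower score.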

nhgpga.org
12-20-2006, 06:47 PM
OK, with the detailed information above, I am able to get it running... and learn tokens.... but is it possible to somehow run the sa-learn command from a browser? I am running this once a week, and thinking of once a day (until I start to see a slowdown/more accurate recognition of spam/ham), but I'd like to be able to run this on the spot if I have a lot I'd like to learn, instead of editing my crontab each time....

jansportw
12-23-2006, 12:48 AM
I believe SSH/Shell Access (see cPanel) will allow you to do this.

juasiepo
12-31-2006, 11:38 PM
Hi guys!

I made a simple script for teaching your SpamAssassin daemon with ham and uncaught spam from all the email accounts of all your domains. That means with only one script you can add the "SPAM/HAM learning feature" to all the email accounts of all your domains.

Just set up the variables of the script with your preferences. Run it manually several times to check that everything is working OK, then run the script once a day using crontab and take a seat while your SpamAssassin is learning ;P

If you have LOGDEBUG enabled, the script will print something like this:

===== Checking /home/bluehostlogin/mail/elsotanillo.net/user1/.SpamHam ==========================
===== Checking /home/bluehostlogin/mail/elsotanillo.net/user1/.SpamNoCogido ==========================
===== Checking /home/bluehostlogin/mail/elsotanillo.net/user2/.SpamHam ==========================
Learned tokens from 0 message(s) (0 message(s) examined)
Learned tokens from 0 message(s) (0 message(s) examined)
===== Checking /home/bluehostlogin/mail/elsotanillo.net/user2/.SpamNoCogido ==========================
Learned tokens from 0 message(s) (0 message(s) examined)
Learned tokens from 3 message(s) (3 message(s) examined)
===== Checking /home/bluehostlogin/mail/elsotanillo.net/user3/.SpamHam ==========================
===== Checking /home/bluehostlogin/mail/elsotanillo.net/user3/.SpamNoCogido ==========================
===== Checking /home/bluehostlogin/mail/elsotanillo.net/user4/.SpamHam ==========================
===== Checking /home/bluehostlogin/mail/elsotanillo.net/user4/.SpamNoCogido ==========================
===== Checking /home/bluehostlogin/mail/the10thfloor.net/user1/.SpamHam ==========================
===== Checking /home/bluehostlogin/mail/the10thfloor.net/user1/.SpamNoCogido ==========================

Here is the script, with instructions:

#!/bin/sh
# InstruirSpamAssassin.sh
# TEACH YOUR SPAMASSASSIN script for cpanel (www.bluehost.com in this case) hosting accounts
# This script teach your SPAMASSASIN daemon with HAM and SPAMUNCAUGTH from all the emails accounts of all your domains

#"THE BEER-WARE LICENSE" (Revision 42):
#Juan Sierra Pons wrote this file. As long as you retain this notice you
#can do whatever you want with this stuff. If we meet some day, and you
#think this stuff is worth it, you can buy me a beer in return.
#Juan Sierra Pons - juan [at} elsotanillo {dot] net
#http://www.elsotanillo.net/
#Original beerware license is due to Poul-Henning Kamp.

#SPAMASSASSIN CONFIGURATION
#1.- Turn On Spamassasin: cPanel->Email Manager->Spam Assassin-> "Enable Spam Assassin"
#2.- Turn on Bayes (auto learning for SA) "Configure Spam Assassin" -> check the box for "use-bayes".

#HOW TO SET UP USERS' EMAIL ACCOUNT
#### always use the same name for HAM and UnCaughtSpam for all yours users
# 1.- Create one folder for SPAMHAM
# 2.- Create one folder for SPAMUNCAUGTH

#USERS' INSTRUCTIONS
# 1.- Move all your UnCaughtSpam messages to your UnCaughtSpam folder
# 2.- Move all your HAM messages to your HAM folder

#HOW TO USE THIS SCRIPT:
# 1.- Fill the LOGIN variable
# 2.- Fill the SPAMHAMDIRECTORY variable
# 3.- Fill the SPAMUNCAUGTH variable
# 4.- Run this script using the crontab daemon once a day for example
# 5.- After you check all is running ok for you, you can comment LOGSALEARN and LOGDEBUG lines

########################## Variables ###########################
LOGIN="loginbluehost"
SPAMHAMDIRECTORY="SpamHam"
SPAMUNCAUGTH="SpamNoCogido"
### uncomment the next line if you want to run sa-learn in verbose mode
#LOGSALEARN="-D"
### LOGDEBUG=0 if you want to see which accounts are being checked - only for debugging purposes
LOGDEBUG=0
### MOVESPAMHAMMESSAGES=0 if you want to move SPAMHAM messages to each INBOX directory after teaching your spamassassin
MOVESPAMHAMMESSAGES=1
### CLEANSPAMUNCAUGTHMESSAGES=0 if you want to clean the SPAMUNCAUGTH directory after teaching your spamassassin
CLEANSPAMUNCAUGTHMESSAGES=1
################################################################
# $LOGSALEARN is left unquoted below on purpose so that it vanishes when unset
for i in "/home/$LOGIN/mail"/*/*
do
    ##### Teach SpamAssassin with HAM messages from SPAMHAM directory
    if [ "$LOGDEBUG" = 0 ]; then echo "===== Checking $i/.$SPAMHAMDIRECTORY =========================="; fi
    if test -d "$i/.$SPAMHAMDIRECTORY/new"; then nice -n 19 sa-learn $LOGSALEARN --ham --dir "$i/.$SPAMHAMDIRECTORY/new"; fi
    if test -d "$i/.$SPAMHAMDIRECTORY/cur"; then nice -n 19 sa-learn $LOGSALEARN --ham --dir "$i/.$SPAMHAMDIRECTORY/cur"; fi
    ##### move SPAMHAM messages to each INBOX directory
    if [ "$MOVESPAMHAMMESSAGES" = 0 ]
    then mv "$i/.$SPAMHAMDIRECTORY"/new/* "$i/new"; mv "$i/.$SPAMHAMDIRECTORY"/cur/* "$i/cur"
    fi
    ##### Teach SpamAssassin from SPAMUNCAUGTH directory
    if [ "$LOGDEBUG" = 0 ]; then echo "===== Checking $i/.$SPAMUNCAUGTH =========================="; fi
    if test -d "$i/.$SPAMUNCAUGTH/new"; then nice -n 19 sa-learn $LOGSALEARN --spam --dir "$i/.$SPAMUNCAUGTH/new"; fi
    if test -d "$i/.$SPAMUNCAUGTH/cur"; then nice -n 19 sa-learn $LOGSALEARN --spam --dir "$i/.$SPAMUNCAUGTH/cur"; fi
    ##### Clean SPAMUNCAUGTH directory
    if [ "$CLEANSPAMUNCAUGTHMESSAGES" = 0 ]
    then rm "$i/.$SPAMUNCAUGTH"/new/*; rm "$i/.$SPAMUNCAUGTH"/cur/*
    fi
done


After you see all is working fine you can change MOVESPAMHAMMESSAGES and CLEANSPAMUNCAUGTHMESSAGES to move HAM to each inbox and to clean SpamUncaught directories. For security reasons they are not enabled by default.

Remember the license of the script ;)

Happy new year.

Juanillo
--
----------------------------------------------------------------------------
Linux User Registered: 257202
http://www.elsotanillo.net
----------------------------------------------------------------------------

Capitaine Caverne
04-22-2009, 01:24 PM
I however have the feeling that even if the learning process is successful, Spamassassin still uses a global bayes DB rather than the user-defined one.

Is there a way to check that indeed, SA has run the incoming mail against the local bayes database ?

Thanks
S.

jansportw
06-03-2009, 03:46 PM
1. Capitaine, sorry, I don't know the answer, except that it seems to be run against my local database.

2. I recently ran sa-learn (same cron job I've been running for a few years) and get back:
bayes: cannot open bayes databases /ramdisk/etc/spamassassin/data/bayes_* R/W: tie failed: Permission denied

Any idea what might be causing this, what changed, and what I can do about it?
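
On the earlier question of whether the local Bayes database is actually in use, one partial check is to look at how many messages your own database has learned. The sketch below assumes that "sa-learn --dump magic" prints lines ending in "non-token data: nspam" and "non-token data: nham" with the count in the third column; the function name bayes_counts is hypothetical.

```shell
# Read "sa-learn --dump magic" output on stdin and print the learned message counts.
bayes_counts() {
    awk '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END { printf "nspam=%d nham=%d\n", nspam, nham }
    '
}

# Example:
#   sa-learn --dump magic | bayes_counts
```

By default SpamAssassin does not apply Bayes scoring until at least 200 spam and 200 ham messages have been learned, so low counts here would explain mail arriving without any BAYES_* test in its X-Spam-Status header. If the counts go up after each sa-learn run and BAYES_* tests show up in new mail, that suggests (but does not prove) that the same database is used for both learning and scanning.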