[Maia-users] Bayes_00 pain

Robert LeBlanc rjl at renaissoft.com
Thu Aug 24 06:04:28 PDT 2006


Robert Hoekstra wrote:

> Awaiting FuzzyOCR results, I noticed that already two 'regular' spam
> messages have been getting through in the last 24 hours. And again it hits
> BAYES_00, like this:
> BAYES_00(-2.599)
> ADVANCE_FEE_1(0),
> DEAR_FRIEND(1.632),
> DNS_FROM_RFC_ABUSE(0.2),
> HTML_IMAGE_ONLY_32(1.052),
> HTML_MESSAGE(0.001),
> SUBJ_ALL_CAPS(0.997)
> 
> All VERY spammy tags (to me), but yet the threshold isn't triggered. (Even
> without BAYES_00 the required level of 5 wouldn't be triggered, it would
> stick at 3.882).
> 
> Am I doing something wrong here? This mail would look very classic spam to
> me, but apparently spamassassin doesn't think so.

Looking at the mail itself, there's not much spammy content; it looks
like a fairly typical business letter, apart from the items that
SpamAssassin already noted (e.g. DEAR_FRIEND, etc.).  It's long enough
that it contains a lot of non-spam tokens, so it's not difficult to
understand why your Bayes sees this as probable non-spam.

What would help in this case is having more sources of
information--network tests like Razor, Pyzor, DCC, SPF, and DomainKeys
would be useful, and some of the SARE rulesets may help as well.  In
general, when you find spam slipping through like this it's a sign that
you need to have SpamAssassin apply a wider range of tests to your mail.
SpamAssassin's greatest strength is its broad-spectrum approach: it can
use a wide range of tests that work in very different ways, so that if
one kind of test is ineffective, a different test may still catch
something interesting.

When I check that same email here (reconstructed from your
newspam.html), I get:

X-Spam-Status: Yes, score=6.853 required=5.0
tests=ADVANCE_FEE_1 (0),
      BAYES_50 (0),
      DEAR_FRIEND (1.6),
      DK_SIGNED (0),
      DNS_FROM_RFC_ABUSE (0.2),
      HTML_IMAGE_ONLY (1.052),
      HTML_MESSAGE (0.001),
      J_CHICKENPOX_32 (0.6),
      J_CHICKENPOX_53 (0.6),
      J_CHICKENPOX_62 (0.6),
      J_CHICKENPOX_64 (0.6),
      J_CHICKENPOX_83 (0.6),
      SUBJ_ALL_CAPS (1),
      UNPARSEABLE_RELAY (0)
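
To make the arithmetic behind the two reports easy to check, here is a
minimal Python sketch.  The rule names and scores are copied from this
thread; the helper names (total, is_spam, without) are my own, and the
"score >= required" comparison is my understanding of how SpamAssassin
applies the threshold, not something taken from its source.

```python
# Scores as reported in the original poster's message.
original = {
    "BAYES_00": -2.599,
    "ADVANCE_FEE_1": 0.0,
    "DEAR_FRIEND": 1.632,
    "DNS_FROM_RFC_ABUSE": 0.2,
    "HTML_IMAGE_ONLY_32": 1.052,
    "HTML_MESSAGE": 0.001,
    "SUBJ_ALL_CAPS": 0.997,
}

# Scores as displayed in the X-Spam-Status header from the re-test.
retest = {
    "ADVANCE_FEE_1": 0.0,
    "BAYES_50": 0.0,
    "DEAR_FRIEND": 1.6,
    "DK_SIGNED": 0.0,
    "DNS_FROM_RFC_ABUSE": 0.2,
    "HTML_IMAGE_ONLY": 1.052,
    "HTML_MESSAGE": 0.001,
    "J_CHICKENPOX_32": 0.6,
    "J_CHICKENPOX_53": 0.6,
    "J_CHICKENPOX_62": 0.6,
    "J_CHICKENPOX_64": 0.6,
    "J_CHICKENPOX_83": 0.6,
    "SUBJ_ALL_CAPS": 1.0,
    "UNPARSEABLE_RELAY": 0.0,
}

REQUIRED = 5.0  # the "required" threshold quoted in both reports


def total(scores):
    """Sum the rule scores, rounded to the three decimals SpamAssassin prints."""
    return round(sum(scores.values()), 3)


def is_spam(scores, required=REQUIRED):
    """Assumed decision rule: spam when the total reaches the threshold."""
    return total(scores) >= required


def without(scores, prefix):
    """Copy of the score table with every rule starting with `prefix` removed."""
    return {name: s for name, s in scores.items() if not name.startswith(prefix)}


print(total(original), is_spam(original))
print(total(without(original, "BAYES_")), is_spam(without(original, "BAYES_")))
print(total(retest), is_spam(retest))
print(total(without(retest, "J_CHICKENPOX_")))
```

Dropping the J_CHICKENPOX_ entries from the second table removes 3.0
points, which puts that result back under the threshold as well.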

So while my Bayes is less convinced than yours that this is non-spam
(BAYES_50, i.e. roughly a 50% spam probability, versus your BAYES_00),
it's still not certain enough to call the mail spam or non-spam on that
basis alone.  What made the biggest difference in this case was the extra
information from the Chickenpox ruleset (available from the SARE
website), which caught a number of sloppy sentence constructs in which
the spammer forgot to insert a space after his commas:

dbg: rules: ran body rule J_CHICKENPOX_32 ======> got hit: " job,we "
dbg: rules: ran body rule J_CHICKENPOX_83 ======> got hit: " position,you "
dbg: rules: ran body rule J_CHICKENPOX_64 ======> got hit: " Europe,Asia "
dbg: rules: ran body rule J_CHICKENPOX_62 ======> got hit: " Canada,we "
dbg: rules: ran body rule J_CHICKENPOX_53 ======> got hit: " goods,and "
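
For anyone curious what that kind of test looks like, here is a rough
Python sketch of the general idea: flag a comma wedged between two
words with no space after it.  The pattern and the function name are
hypothetical and of my own making; this is not the regular expression
used by the actual SARE Chickenpox rules, which are more specific than
this.

```python
import re

# Hypothetical approximation only, NOT the real Chickenpox pattern:
# a run of letters, a comma, then letters again with no space between.
MISSING_SPACE_AFTER_COMMA = re.compile(r"\b[A-Za-z]+,[A-Za-z]+\b")


def comma_hits(text):
    """Return every 'word,word' fragment found in the text."""
    return MISSING_SPACE_AFTER_COMMA.findall(text)


# Made-up sample sentence built from the fragments in the debug output.
sample = "We have a job,we ship to Europe,Asia and Canada,we move goods,and more."
print(comma_hits(sample))

# Correctly punctuated text and numbers such as 1,000 produce no hits.
print(comma_hits("Dear friend, the fee is 1,000 dollars."))
```

A single catch-all pattern like this would be far more prone to false
positives than a set of narrow, low-scoring rules, so treat it only as
an illustration of the principle.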

-- 
Robert LeBlanc <rjl at renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>
