[Maia-users] Spam-trap account

Robert LeBlanc rjl at renaissoft.com
Fri Apr 13 14:21:41 PDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Morton wrote:
> Craig Thompson wrote:
>> I have designated an account as a spam-trap account, but I can't find
>> any documentation on setting it up.  My goal in using a spam-trap
>> account is to turn off the non-spam cache.
> 
>> I am forwarding spam emails to the spam-trap account that Maia allowed
>> through. The system is still trying to forward the email to an account,
>> so I am getting bounces.  I was under the impression that Maia would not
>> forward these emails.  When I look at statistics for the account, all
>> emails have been confirmed as spam, which is good.
> 
> 
> That's not how spamtraps work; a spamtrap address is one that receives only spam
> directly from the spammers. You cannot forward messages to it, otherwise it will
> think you are the spammer.

Indeed, I think you misunderstand the purpose of a spam-trap.  A
spam-trap is an account that receives no legitimate mail--it must use an
email address that has never been used for any legitimate purpose, nor
ever been advertised in any way that would invite people to send
legitimate mail to it.  It may not even have been advertised at all;
thanks to dictionary attacks, spammers doing address probes will
discover addresses like "asdfghjk@example.com" eventually without any
help, as long as that address appears to accept mail.

The key feature of a spam-trap, though, is that by its careful design,
everything it receives is by definition "unsolicited," so there's no
need to analyze the mail at all: not for viruses, not for spam, not for
banned attachments or bad headers.  Everything it receives is
automatically classified as "confirmed spam," so no additional resources
are required to deal with it.
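That short-circuit can be sketched in a few lines of Python.  This is a minimal illustration, not Maia's actual code: the `SPAM_TRAP_ADDRESSES` set and the `scanner` callable (standing in for the usual virus/spam/banned-file checks) are hypothetical names chosen for the example.

```python
# Hypothetical set of spam-trap addresses (illustrative only, not a Maia setting).
SPAM_TRAP_ADDRESSES = {"asdfghjk@example.com"}


def classify(recipient, message, scanner):
    """Return a verdict for one recipient of an incoming message.

    Mail addressed to a spam-trap is filed as confirmed spam without
    invoking the (expensive) scanner at all; everything else is scanned.
    `scanner` is a hypothetical callable that returns a verdict string.
    """
    # Addresses are compared case-insensitively, as most mail systems do.
    if recipient.lower() in SPAM_TRAP_ADDRESSES:
        return "confirmed spam"
    return scanner(message)
```

The design point is simply that the spam-trap check happens before any content analysis, so trapped mail costs almost nothing to process.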

Spam-traps are useful mainly as a means of gathering spam samples for
Bayes training and reporting to collaborative networks like Razor,
Pyzor, DCC, and SpamCop.  Spam samples gathered this way are generally
considered to be "more provably" spam, given the design of the spam-trap
address mechanism, so such evidence is more likely to be accepted with
confidence by DNSBLs and other authorities.

As David points out, spam-traps are /not/ "spam-reporting" addresses,
and should never be used that way.  The main problem is that mail
forwarding/redirecting modifies the mail headers in the process, so the
evidence that gets submitted for Bayes training and spam reporting is
tainted with information from the forwarder.  Eventually your Bayes
database starts recognizing the forwarders' mail as spam, and if you've
been reporting this stuff to Razor/Pyzor/DCC/SpamCop, others around the
world will eventually start doing the same.

Using a "spam-reporting" address for reporting false negatives is a
tricky business.  To do it properly, you need to ensure that the
original email is not modified in any way.  Encapsulating it as an
attachment is one way to do this, but it requires, of course, that a
process at the receiving end know how to unpack the attachment.  It also
requires that all of your submitters know how to do the encapsulation
properly, and that they never forget to do so.
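For illustration, here is a minimal Python sketch of such an unpacker, using only the standard `email` package.  It assumes submitters use their mail client's "forward as attachment" feature, so that each spam arrives as a top-level `message/rfc822` part; `unpack_reports` is a hypothetical name, not part of Maia.

```python
import email
from email import policy
from email.message import EmailMessage


def unpack_reports(raw_report: bytes) -> list[EmailMessage]:
    """Extract every top-level message/rfc822 attachment from a report.

    Each returned message is the encapsulated original, so its headers are
    those of the spam itself rather than those added by the forwarder.
    """
    report = email.message_from_bytes(raw_report, policy=policy.default)
    originals = []
    # iter_attachments() yields only the report's immediate non-body parts,
    # so messages nested inside the attached spam are not picked up.
    for part in report.iter_attachments():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part carries exactly one sub-message.
            originals.append(part.get_payload(0))
    return originals
```

Anything a submitter pasted inline instead of attaching is simply ignored, which illustrates the fragility described above: the scheme works only if every submitter encapsulates every message correctly.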

In fact, it was this very problem with "spam-reporting" addresses that
motivated us to devise Maia's non-spam cache mechanism for reporting
false negatives (i.e., the mechanism you're so eager to get rid of).
Since Maia stores a pristine copy of the email in its database, there's
no chance that the headers will get munged during the learning/reporting
phase.  It's also simpler for end-users than consistently encapsulating
their spam as attachments, and it spares you from having to write an
attachment unpacker at the receiving end.  Admittedly, it's a new
concept to get across to users, many of whom are initially a bit
confused about what the non-spam cache is for, but in the end I think
it's a much safer and more reliable way to report false negatives.

There's another reason to use the non-spam cache, though: Bayes
training.  Even if you perfected a "spam-reporting" address mechanism,
you'd still be doing yourself a disservice by denying your Bayes
database an opportunity to learn from user-confirmed non-spam.  The
Bayes engine works best (and fastest) when it gets feedback from users
to tell it not only when it has made mistakes, but also when it has
guessed correctly; that's what the "confirmation" process is all about.
When you hit the "confirm" button in the spam quarantine or the
non-spam cache, you're not just rescuing false positives and reporting
false negatives, you're also telling SpamAssassin that it got everything
else /right/.  That increases SpamAssassin's confidence in its judgment
about the tokens in those items, so it learns faster and more
effectively.  If you disable the non-spam cache, then, it will never get
that kind of confirmation about non-spam items, so most of what it
learns will be about spam, and its ability to discriminate one from the
other will suffer.
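A toy Python sketch shows why ham confirmations matter.  It uses a simple Graham-style per-token estimate, not SpamAssassin's own Bayes code (which uses a more elaborate Robinson-style calculation), and the function name and counts are hypothetical.

```python
def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Graham-style estimate of P(spam | token) from training counts.

    spam_counts / ham_counts map a token to the number of trained messages
    of that class containing it; n_spam / n_ham are the numbers of trained
    messages in each class.  Returns None for a token never seen in training.
    """
    spam_freq = spam_counts.get(token, 0) / n_spam if n_spam else 0.0
    ham_freq = ham_counts.get(token, 0) / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return None
    return spam_freq / (spam_freq + ham_freq)
```

If 10 of 50 trained spams contain "meeting" and no ham has ever been confirmed, the estimate for "meeting" is 1.0, i.e. an everyday word looks like pure spam.  Add 50 confirmed hams, 30 of which mention "meeting", and it drops to 0.25.  Without that ham feedback the filter cannot tell spam-specific tokens from tokens that are merely common.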

- --
Robert LeBlanc <rjl at renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGH/RlGmqOER2NHewRArxdAJ0cJHdbsE2LaUndoNmEobKd13z4PQCcDqvc
XhKBe5RxVxui5ECFSLMfTnI=
=BPui
-----END PGP SIGNATURE-----

