[Maia-users] I know how it works, but...

Robert LeBlanc rjl at renaissoft.com
Wed Apr 18 02:48:52 PDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kurt Buff wrote:

> While this sounds very doable, what impact will this have on the cache? Will
> a domain/super administrator have more work to do to clear it periodically?
> Any other impacts that might have on maintenance?

Well, the first thing users will notice is that their spam quarantine
count will look wrong when they log in.  They'll see that they have, say,
320 spam items quarantined, but when they look at the actual spam
quarantine they may see a much smaller number, say 80, and this may lead
them to wonder where the other 240 items went.  This could lead to some
interesting questions coming at you from users, unless you also adjust
the SQL query on the welcome.php page to only provide the count of items
that score lower than your threshold.

The second side-effect is that the items that score above this magic
threshold will be largely neglected unless the administrators
impersonate each user's account and do the confirmation of those
high-scoring spam items for them.  Otherwise those items will hang
around until they age past the expiry threshold (and thus get deleted by
the expire-quarantine-cache script).  To make sure you at least get the
Bayes training benefit out of those items, I'd suggest that you enable
SpamAssassin's auto-learning mechanism, and set the auto-learn threshold
for spam to the same threshold you're using for this magic cutoff.  That
way everything that scores at least that high will be auto-learned by
the Bayes as spam anyway, regardless of whether administrators confirm
them.  Then all you'd really lose is the ability to /report/ that
high-scoring spam (which can only be done by confirming it).
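
As a rough sketch, assuming purely for illustration a cutoff of 15.0
and that your Bayes settings live in local.cf, the entries would look
something like:

 # 15.0 is an example value; use whatever your magic cutoff is
 bayes_auto_learn 1
 bayes_auto_learn_threshold_spam 15.0

Bear in mind that SpamAssassin computes the auto-learn score without
the Bayes rules themselves, and also wants at least 3 points each from
header and body rules before it learns a message as spam, so a few
items above the cutoff may still go unlearned.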


> I like it - it's not terribly high priority, but it sounds like a nice
> feature. Have to be careful about the rules, I would suspect. I'm guessing
> in particular that while an email might hit a rule, and the rule would
> usefully detect spam sign, the rule might not actually be sign of a
> particular category of spam.

That's true for a certain category of spam rules, but as you can see by
looking at the different *.cf files that SpamAssassin ships with, and
the different sets that SARE issues, many of these are grouped with
specific purposes in mind.  The 70_sare_stocks.cf ruleset, for instance,
is designed to identify stock-related scams, while 20_porn.cf and
70_sare_adult.cf go after porn, and 20_drugs.cf targets pill-pushers,
and 20_advance_fee.cf is aimed at "Nigerian letter"/419 scams.

The key would seem to be to allow administrators to define their own
categories and assign rules to those categories as they see fit, mapping
an action to the category as a whole.  If there is a handful of "kill on
sight" rules that should be grouped together into one category for that
purpose, so be it, but that's something for each administrator to
decide.  The concept needs a lot of refinement, but I think the basic
idea has merit.


> I'm still trying to convince HR/execs that the current web interface (which
> they haven't explored much) is a sufficient barrier to mitigate their
> concerns - I've broached the chestnut that not all corporate computer
> problems have a technical solution, and that in those cases, a
> managerial/policy/educational solution is the better answer. We'll see.

It's probably also worth noting that even if one of your users does
release some quarantined spam and decides to forward it along to a
co-worker, the mail will more than likely just end up in the receivers'
spam quarantines, so the problem your bosses are trying to solve may not
really exist after all.

You're correct to point out to them that basing this policy decision of
theirs on an abstract score threshold is not likely to achieve what
they're looking for, though.  The only way to guarantee that it will do
what they want is to boost the score values of the individual rules that
indicate the kind of content they want to prohibit, such that if even
one of those rules triggers, the score is guaranteed to be above the
magic threshold.  You'd do that by adding a bunch of "score" overrides
in your local.cf file.  If your magic threshold was, say, 15, then you'd
have entries like:

 score RULE_X 15.0
 score RULE_Y 15.0
 score RULE_Z 15.0
 ...etc...

If you wanted to get a bit more sophisticated, you could add some custom
META-rules that provide a score boost when particular combinations of
other rules trigger together.
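
A minimal sketch of one, where LOCAL_X_AND_Y is just a made-up name and
RULE_X and RULE_Y are the same placeholders as above:

 # hypothetical meta-rule; fires only when both sub-rules hit
 meta     LOCAL_X_AND_Y (RULE_X && RULE_Y)
 describe LOCAL_X_AND_Y RULE_X and RULE_Y both triggered
 score    LOCAL_X_AND_Y 15.0

The meta-rule's score is added on top of whatever RULE_X and RULE_Y
already contribute, so the boost only kicks in for the combination.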

Either way it's not pretty, and it's not really foolproof; offensive
spam is going to slip through at some (hopefully very low) false
negative rate, and that stuff won't even get quarantined--it will end up
getting delivered to the recipients' inboxes as if it were legitimate
mail.  If that sort of thing is going to get the execs up in arms, then
perhaps they need to be reminded that no spam filter on the planet is
going to be 100% effective--Maia can get you above 99%, but the harder
you drive toward 100% from there, the more false positives you invite.

- --
Robert LeBlanc <rjl at renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGJemEGmqOER2NHewRAszSAJ9n/Nn2t8Q/WrWjELY5TJJyrK+TogCgp/Pn
xuMcYHnTbT38Xks6OMzhuRA=
=6i4o
-----END PGP SIGNATURE-----

