[Maia-users] Using OCR to identify image-based spam

Robert LeBlanc rjl at renaissoft.com
Tue Aug 8 16:47:37 PDT 2006


Robert LeBlanc wrote:

> The upside to this approach is that it doesn't require any changes to
> SpamAssassin--no need for any OCR plugins or for the SpamAssassin devs
> to include OCR as a built-in feature.

All right, let me contradict myself here :)  After a bit more thought,
I've realized that modifying the mail before submitting it to
SpamAssassin would break the hashing systems (Razor, Pyzor, DCC,
SpamCop).  The modified mail would clearly hash to a different value,
and this value would not match (m)any hashes in the hashing databases,
so there would be few (or no) hits on Razor, Pyzor, DCC, and SpamCop
rules during the spam scan.
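To make the hashing problem concrete, here is a minimal Python sketch. It uses a plain SHA-1 over the whitespace-normalized body as a simplified stand-in for a hash-based checker. Razor, Pyzor, and DCC each compute their own signatures, so this is not their actual algorithm. The OCR text is a made-up example.

```python
import hashlib


def body_digest(body):
    # Simplified stand-in for a hash-based checker: SHA-1 over the
    # whitespace-normalized body. Real Razor/Pyzor/DCC signatures differ.
    normalized = " ".join(body.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


original = "Hot stock tip! See attached image."
ocr_text = "BUY XYZ NOW 500% GAINS"  # hypothetical OCR output from the image
modified = original + "\n" + ocr_text  # mail as rewritten before the scan

# The rewritten mail no longer matches the digest of the mail as it was
# originally reported to the hash databases.
print(body_digest(original) == body_digest(modified))  # False
```

Any change to the body, even text that is only appended, produces a different digest, so a lookup on the modified mail misses entries that were recorded for the original.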

Doing the OCR in amavisd-maia is fine and good, but it would also
probably mean disabling the hashing features in SpamAssassin and doing
them instead in amavisd-maia (where we can control which version of the
mail gets used for the hashing).  It's not an unthinkable scenario, but
it's a fair bit of work.
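One way to picture "controlling which version of the mail gets used for the hashing" is to run the hash lookup on the untouched message and feed the OCR-augmented copy only to the content rules. This is a rough sketch under stated assumptions. `fake_ocr`, `content_score`, and the in-memory `KNOWN_SPAM_DIGESTS` set are hypothetical placeholders for a real OCR engine, SpamAssassin's rule set, and the remote hash databases. None of them are amavisd-maia or SpamAssassin APIs.

```python
import hashlib

# Hypothetical local stand-in for the remote hash databases.
KNOWN_SPAM_DIGESTS = set()


def digest(text):
    # Simplified digest (not the real Razor/Pyzor/DCC algorithm).
    return hashlib.sha1(" ".join(text.split()).encode("utf-8")).hexdigest()


def fake_ocr(image_bytes):
    # Hypothetical placeholder: a real setup would call an OCR engine here.
    return image_bytes.decode("ascii", errors="ignore")


def content_score(text):
    # Hypothetical content rule standing in for SpamAssassin's tests.
    return 2.5 if "buy" in text.lower() else 0.0


def scan(body, images):
    # 1. Hash check on the unmodified message, so digests can still match.
    hash_hit = digest(body) in KNOWN_SPAM_DIGESTS
    # 2. OCR the images and add the text only to the copy used for content rules.
    augmented = body + "\n" + "\n".join(fake_ocr(img) for img in images)
    return {"hash_hit": hash_hit, "content_score": content_score(augmented)}


KNOWN_SPAM_DIGESTS.add(digest("Hot stock tip!"))
print(scan("Hot stock tip!", [b"BUY XYZ NOW"]))
# {'hash_hit': True, 'content_score': 2.5}
```

Both checks fire here: the hash lookup still matches because it saw the original body, and the content rule matches text that existed only inside the image.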

The alternative is to modify SpamAssassin itself to do what I was
proposing in the first place--have SpamAssassin do the OCR'ing before
subjecting the mail to its battery of tests.  This would require the
least work, but would of course mean patching SpamAssassin
(spamassassin-maia, anyone?).  There's a chance the SpamAssassin devs
might be interested in the patch, however, in which case it could get
adopted into the official distribution at some point in the future.

I'm open to other ideas, of course; as I've proven once already today,
being "too close to the problem" can make one's perspective too narrow,
so by all means share your thoughts.

-- 
Robert LeBlanc <rjl at renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>
