[Maia-users] Using OCR to identify image-based spam

Robert LeBlanc rjl at renaissoft.com
Tue Aug 8 14:28:57 PDT 2006


In my previous response I mentioned two ways that OCR support could be
added to SpamAssassin so that the extracted text could be subjected to
the full battery of SpamAssassin tests.  Pete Barnwell has since
reminded me that I was still thinking "inside the box".  There's
certainly another way to get the job done, and it involves using OCR at
the amavisd-maia stage--i.e. /before/ calling SpamAssassin.

Indeed, OCR would then become Maia's responsibility rather than
SpamAssassin's.  Just as amavisd-maia currently unpacks and decodes MIME
structures to prepare them for other tests, it stands to reason that it
could just as easily apply OCR to any image attachments and append the
extracted text to the mail body before submitting it to SpamAssassin.
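
To make the idea concrete, here is a rough sketch of that pre-processing
step.  It is illustrative only and is not amavisd-maia code: it uses
Python's standard email package, the function names are hypothetical, and
`ocr` is an assumed stand-in for whatever OCR engine would actually be
used (any callable that takes image bytes and returns text).

```python
from email.message import EmailMessage


def extract_image_text(msg, ocr):
    """Return the OCR text of every image part in `msg`.

    `ocr` is an assumed callable: image bytes in, extracted text out.
    """
    texts = []
    for part in msg.walk():
        # Only image/* parts are candidates for OCR.
        if part.get_content_maintype() != "image":
            continue
        data = part.get_payload(decode=True)  # undo base64/quoted-printable
        if not data:
            continue
        text = ocr(data).strip()
        if text:
            texts.append(text)
    return texts


def body_with_ocr_text(msg: EmailMessage, ocr) -> str:
    """Return the plain-text body with any OCR'd image text appended."""
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    # One line of separation between the body and each extracted text.
    return "\n".join([body.rstrip("\n")] + extract_image_text(msg, ocr))
```

The string returned by body_with_ocr_text() is what would be handed to
SpamAssassin in place of the original body, so the usual text rules get
a chance to fire on whatever the image says.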

The upside to this approach is that it doesn't require any changes to
SpamAssassin--no need for any OCR plugins or for the SpamAssassin devs
to include OCR as a built-in feature.  It also fits quite well, I think,
with the role of amavisd-maia in the processing chain--as an unpacker
and decoder, it's the proper place for OCR to be introduced.  I guess I
have a new enhancement ticket to open :)

-- 
Robert LeBlanc <rjl at renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>
