[Maia-users] Bayes_00 pain
Paul Westbrook
paul at westbrooks.org
Wed Aug 23 11:44:45 PDT 2006
Hello,
Is this the same as the ocrtext plugin described here?
http://www.nabble.com/GIF-Spam----Setting-up-the-%27OCR-scanner-and-image-validator-SA-plugin%27-tf2042373.html
It looks like the rules for this plugin use regular expressions to
help work around errors in the character recognition.
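
To illustrate the idea (this is only a rough sketch, not the plugin's actual rules, which I haven't reproduced here): a pattern can accept look-alike characters and stray separators so that a word still matches after OCR mangles it. The CONFUSIONS table and the function names below are hypothetical, made up for this example.

```python
import re

# Hypothetical table of look-alike characters an OCR engine might confuse.
# These substitutions are assumptions for illustration, not the plugin's rules.
CONFUSIONS = {
    "a": "a@4",
    "e": "e3",
    "i": "i1l!|",
    "l": "l1i|",
    "o": "o0",
    "s": "s5$",
    "t": "t7+",
}


def fuzzy_pattern(word):
    """Build a case-insensitive regex for `word` that tolerates look-alike
    substitutions and up to two stray separator characters between letters."""
    parts = []
    for ch in word.lower():
        chars = CONFUSIONS.get(ch, ch)
        # One character class per letter, e.g. "o" becomes [o0].
        parts.append("[" + re.escape(chars) + "]")
    # Separators are non-alphanumeric characters (spaces, dots, underscores...).
    return re.compile(r"[\W_]{0,2}".join(parts), re.IGNORECASE)


def matches(word, ocr_text):
    """Return True if `word` appears in `ocr_text` under the fuzzy pattern."""
    return fuzzy_pattern(word).search(ocr_text) is not None
```

With this sketch, matches("stock", "5t0ck a1ert") and matches("stock", "s.t o c k") both succeed, while matches("stock", "meeting notes") does not. The pattern has no word boundaries, so it will also match inside longer words; a real rule would need tuning to avoid false positives.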
--Paul
On Aug 23, 2006, at 5:33 AM, Robert LeBlanc wrote:
> Stefan G. Weichinger wrote:
>
>> Is this OCR-stuff ready for usage in "production environments"?
>> I haven't yet found/taken the time to take a real close look, as until
>> now I wasn't affected by image-spam AFAIK (at least my own
>> boxes/domains).
>>
>> I was under the impression that all this was relatively new and
>> experimental. What's your impression? Does it generate much load in
>> terms of cpu/ram-usage?
>
> While I would still call it "experimental" at this stage, that's mostly
> because it's being developed very rapidly. The version I'm using in
> production is the one I describe in the wiki (2.1c), but there are
> already beta versions in the 2.2 series, and alphas in the 2.3 series,
> with new experimental releases becoming available at a rate of one or
> two a day. Clearly this is an area receiving a lot of attention at the
> moment, and there's a mailing list called "Devel-Spam"
> <http://lists.own-hero.net/mailman/listinfo/devel-spam> you can
> subscribe to if you want to keep up with the bleeding edge of its
> development.
>
> The 2.1 series is quite stable and works quite well for most purposes.
> In terms of the extra load and resource usage, it's minor because the
> OCR plugin only gets invoked on mail that contains inline images. For
> those particular emails, it adds 2-4 seconds of processing time, but
> since those emails represent a very small fraction of the total mail
> volume, the average increase in processing time works out to a few
> milliseconds per item, or a few (i.e. < 10) extra processor-minutes
> per day.
>
> The decision to implement OCR in a production environment at this stage
> is obviously your call, but with the 2.1 stable series I don't see the
> harm in it, unless perhaps your server is very close to its resource
> limits as it is. You must also weigh this against the prevalence of
> image-spam, of course; if you haven't been receiving much of it yet, you
> probably won't feel much pressure to implement OCR. Once you /do/ start
> receiving it in larger volumes, however, the pressure may reach a
> tipping point, and you may be willing to accept a bit more risk and a
> bit more resource consumption in order to stem the tide of the
> image-spam.
>
> As image-spam becomes more pervasive, however, we're eventually /all/
> going to need to implement OCR or something equivalent. When the spam
> content is entirely within the images, and the text portion of the mail
> contains just non-spammy words and phrases, there's really very little
> else left for us to do but try to extract the spam content from the
> images.
>
> --
> Robert LeBlanc <rjl at renaissoft.com>
> Renaissoft, Inc.
> Maia Mailguard <http://www.maiamailguard.com/>
--
Paul Westbrook
paul at westbrooks.org
<http://www.westbrooks.org>