[Maia-users] Bayes_00 pain
Stefan G. Weichinger
lists at xunil.at
Wed Aug 23 06:41:47 PDT 2006
Robert LeBlanc wrote:
> While I would still call it "experimental" at this stage, that's mostly
> because it's being developed very rapidly. The version I'm using in
> production is the one I describe in the wiki (2.1c), but there are
> already beta versions in the 2.2 series, and alphas in the 2.3 series,
> with new experimental releases becoming available at a rate of one or
> two a day. Clearly this is an area receiving a lot of attention at the
> moment, and there's a mailing list called "Devel-Spam"
> <http://lists.own-hero.net/mailman/listinfo/devel-spam> you can
> subscribe to if you want to keep up with the bleeding edge of its
> development.
>
> The 2.1 series is quite stable and works quite well for most purposes.
What does "quite" mean in this context? False negatives? Crashing
binaries? Stopped mail-delivery? Need for manual intervention?
> In terms of the extra load and resource usage, it's minor because of the
> fact that the OCR plugin only gets invoked on mail that contains inline
> images. For those particular emails, it adds 2-4 seconds of processing
> time, but since those emails represent a very small fraction of the
> total mail volume, the average increase in processing time works out to a
> few milliseconds per item, or a few (i.e. < 10) extra processor-minutes
> per day.
>
> The decision to implement OCR in a production environment at this stage
> is obviously your call, but with the 2.1 stable series I don't see the
> harm in it, unless perhaps your server is very close to its resource
> limits as it is.
Ok, this means I have no problems with CPU/RAM ...
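As a rough sanity check of those figures, here is a small Python sketch of the arithmetic. The daily mail volume and the share of mails with inline images are made-up numbers for illustration only; the 3 seconds per image mail is just the middle of the 2-4 second range quoted above.

```python
# Rough load estimate for OCR scanning. The inputs passed in below are
# hypothetical, chosen only to illustrate the arithmetic.

def ocr_overhead(mails_per_day, image_fraction, ocr_seconds):
    """Return (average extra ms per mail, extra processor-minutes per day)."""
    image_mails = mails_per_day * image_fraction    # mails that trigger OCR
    extra_seconds = image_mails * ocr_seconds       # total extra CPU time per day
    avg_ms = extra_seconds / mails_per_day * 1000.0 # spread over all mails
    return avg_ms, extra_seconds / 60.0

avg_ms, minutes = ocr_overhead(
    mails_per_day=100_000,   # assumed volume
    image_fraction=0.0015,   # assumed: 0.15% of mails carry inline images
    ocr_seconds=3.0,         # midpoint of the quoted 2-4 seconds
)
print(f"{avg_ms:.1f} ms per mail, {minutes:.1f} processor-minutes per day")
```

With these assumed inputs the result is about 4.5 ms per mail and 7.5 processor-minutes per day, which is in line with "a few milliseconds per item" and "< 10 extra processor-minutes per day".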
> You must also weigh this against the prevalence of
> image-spam, of course; if you haven't been receiving much of it yet, you
> probably won't feel much pressure to implement OCR. Once you /do/ start
> receiving it in larger volumes, however, the pressure may reach a
> tipping point, and you may be willing to accept a bit more risk and a
> bit more resource consumption in order to stem the tide of the image-spam.
Correct ;-) That is exactly my point of view, although I wasn't able to
put it into words in my first mail ... y'know, English isn't my first language.
> As image-spam becomes more pervasive, however, we're eventually /all/
> going to need to implement OCR or something equivalent. When the spam
> content is entirely within the images, and the text portion of the mail
> contains just non-spammy words and phrases, there's really very little
> else left for us to do but try to extract the spam content from the images.
Yup. I think I am going to head over to your HOWTO and give it a try. From
what I have seen, the OCR functionality is switched on by that
loadplugin line in SA (SpamAssassin), so I can still decide to keep it
turned off by default while I get familiar with it.
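
To illustrate what I mean by switching it off, here is a minimal Python sketch that comments or uncomments a loadplugin line in a SpamAssassin config text. The plugin name Example::OcrPlugin and its path are hypothetical placeholders; the actual line from the HOWTO may look different.

```python
import re

def set_plugin_enabled(config_text, plugin, enabled):
    """Comment or uncomment every 'loadplugin <plugin> ...' line in config_text."""
    # Matches the line whether or not it is currently commented out with '#'.
    pattern = re.compile(
        r"^\s*#?\s*(loadplugin\s+" + re.escape(plugin) + r"(?=\s|$).*)$"
    )
    out = []
    for line in config_text.splitlines():
        m = pattern.match(line)
        if m:
            line = m.group(1) if enabled else "# " + m.group(1)
        out.append(line)
    return "\n".join(out)

# Hypothetical plugin name and path, for illustration only.
sample = "loadplugin Example::OcrPlugin /path/to/OcrPlugin.pm\nrequired_score 5.0"
disabled = set_plugin_enabled(sample, "Example::OcrPlugin", False)
print(disabled)
```

Calling the function again with enabled=True restores the original line, so the plugin can be toggled without touching the rest of the configuration.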
Thanks so far, greetings to you, Robert, and Maia ;-)
Stefan