[Maia-users] newbie problems

Dale Carstensen dlc at lampinc.com
Sun Dec 3 06:07:26 PST 2006


Ah.  I see that maiadbtool.pl is in svn trunk, but not in 1.0.1.  Thanks.

I tried to find where the Bayes data lives, and found that the tables in
the maia database with bayes and awl in their names have zero rows.
Hmm.  Then I looked here and there (for instance, with "locate bayes"),
and eventually found some files in /var/amavisd/.spamassassin that must
be the data.  The file names are auto-whitelist, bayes_seen and bayes_toks.
The auto-whitelist has a date of mid-Friday afternoon (it's Sunday morning
now) and the other two have a date of about 5 minutes ago.  By the way,
running locate was useless, since I assumed its database skips /var;
"find /var -name '*bayes*'" was the key.  Come to think of it, I'm not
sure locate indexes names beginning with "." or anything beneath them,
so maybe it does cover /var and missed these files because of the dot.
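For anyone who wants the same search from a script, here is a minimal Python sketch of what "find /var -name '*bayes*'" does. find_named is a hypothetical helper, not part of Maia or SpamAssassin. It descends into hidden directories like .spamassassin (os.walk does not skip dot-names) and quietly skips directories it cannot read.

```python
import fnmatch
import os
from typing import Iterator


def find_named(root: str, pattern: str = "*bayes*") -> Iterator[str]:
    """Yield paths under root whose base name matches pattern,
    roughly like `find root -name pattern`."""
    # os.walk ignores unreadable directories by default (onerror=None)
    # and does not treat names starting with "." specially.
    for dirpath, dirnames, filenames in os.walk(root):
        # find -name tests directory names as well as file names
        for name in dirnames + filenames:
            # fnmatchcase keeps matching case-sensitive, like find
            if fnmatch.fnmatchcase(name, pattern):
                yield os.path.join(dirpath, name)
```

Usage: `for p in find_named("/var"): print(p)` should turn up /var/amavisd/.spamassassin/bayes_seen and bayes_toks, given the layout described above.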

So then I thought I would see what's really there, maybe which words
contribute which scores.  The "perldoc sa-learn" documentation says the
data can be viewed in human-readable form with the --backup option.
It also mentions a --dump option.  Of course, I had to add another
option, --dbpath=/var/amavisd/.spamassassin, too.

I don't know what comes from "seen" and what comes from "toks" in the
--dump output, though it resembles the --backup output, where the first
column is "s" or "t".  For the "t" lines in --backup, and for all the
lines in --dump, the fourth column looks like a date in seconds since
New Year's Day 1970 (the Unix epoch), and the fifth column is 10 hex digits.
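To make the --backup output a little easier to eyeball, here is a minimal Python sketch that keeps only the "t" lines and turns the fourth column into a readable UTC date. It relies on the column layout observed above (fourth column = seconds since 1970, fifth = the hex token); reading the second and third columns as spam and ham counts is an assumption, and parse_backup_tokens is a hypothetical helper. My understanding is that the hex string is a truncated hash of the original word, so this sketch does not attempt to turn it back into text.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    spam: int        # assumed: 2nd column is the spam count
    ham: int         # assumed: 3rd column is the ham count
    atime: datetime  # 4th column: seconds since 1970-01-01 UTC
    token_hex: str   # 5th column: the 10-hex-digit token


def parse_backup_tokens(lines: Iterable[str]) -> List[Token]:
    """Collect the "t" lines from `sa-learn --backup` output.

    "s" (seen) lines and anything else that doesn't have exactly
    five fields starting with "t" are skipped."""
    tokens = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or fields[0] != "t":
            continue
        _, spam, ham, atime, token_hex = fields
        tokens.append(Token(
            spam=int(spam),
            ham=int(ham),
            atime=datetime.fromtimestamp(int(atime), tz=timezone.utc),
            token_hex=token_hex,
        ))
    return tokens
```

Usage: save the output of `sa-learn --dbpath=/var/amavisd/.spamassassin --backup` to a file, open it, and pass the file object to parse_backup_tokens; sorting the result by the spam column then shows which tokens have been seen most often in spam.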

Maybe the Bayes data is reasonable??  Maybe not.  After all, I did feed
hundreds of those miserably misclassified false negatives back through.

Then, using "od -c", I saw that auto-whitelist indeed contains bad places
like chello.nl and hundreds of other domains that definitely do
not belong in any whitelist.
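Scrolling through "od -c" output is tedious; pulling out runs of printable characters, much as the Unix strings utility does, makes names like chello.nl easier to spot. This is a minimal Python sketch under that assumption; printable_runs is a hypothetical helper, and it makes no claim about how the auto-whitelist keys are actually structured.

```python
import re
from typing import List


def printable_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII bytes,
    similar in spirit to the Unix `strings` utility."""
    # bytes %-formatting (Python 3.5+) builds e.g. rb"[\x20-\x7e]{4,}"
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Usage: open /var/amavisd/.spamassassin/auto-whitelist in binary mode, pass its contents to printable_runs, and filter the result for a substring such as "chello.nl".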

So I guess I could develop my own large samples of ham and spam,
and feed them through sa-learn with the appropriate options, --dbpath
chief among them, and get a decent Bayesian database.
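As a sketch of that training step, the helper below only assembles an sa-learn command line using the --dbpath, --ham and --spam options, so it can be checked before anything is run. sa_learn_argv is a hypothetical name, and the default dbpath simply reuses the directory found above.

```python
from typing import List, Sequence


def sa_learn_argv(kind: str, paths: Sequence[str],
                  dbpath: str = "/var/amavisd/.spamassassin") -> List[str]:
    """Build an sa-learn command that trains the given message files
    or directories as ham or spam into the Bayes database at dbpath."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    if not paths:
        raise ValueError("need at least one message file or directory")
    return ["sa-learn", "--dbpath=" + dbpath, "--" + kind, *paths]
```

Usage: `subprocess.run(sa_learn_argv("spam", ["/path/to/spam-corpus/"]), check=True)`, where the corpus path is a placeholder for your own collection of spam; do the same with "ham" for the good mail.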

My question is, can I get any useful representation of the Bayes
data?  These hex strings defy interpretation.  Well, maybe I just
need to push messages through spamd and see what gets returned --
hmm, but how do I run just the Bayes part and get its actual score?

And another question:  Is this how maia normally uses the Bayes
and AWL features in spamassassin?

  Dale

>Dale Carstensen wrote:
>> I'm working on a long reply with Robert LeBlanc's first reply fully
>> quoted.  But for now, I have a simple (I hope) question, so I'll
>> put a short quote here, then the question, then a bigger quote for
>> context.
>> 
>>> start by fixing the
>>> internal_networks and trusted_networks settings in your local.cf file.
>>> Then you'll want to wipe your Bayes and AWL databases to start fresh.
>> 
>> OK.  Now, how do I wipe the Bayes and AWL databases without just
>> starting over completely?
>
>options to maiadbtool.pl
>
>  --clear-bayes                      : empty the Bayes database
>  --clear-awl                        : empty the AWL database
