In article <op.t57xb5vmnn735j@[EMAIL PROTECTED]>,
"John H Meyers" <jhmeyers@[EMAIL PROTECTED]> wrote:
> [removed non-Eudora group]
>
> On Thu, 07 Feb 2008 20:44:18 -0600:
>
> > Periodically open your Junk folder and mark all the messages as junk
> > (after checking that there are no false positives).
>
> Is there any need to re-junk already junked mail?
>
> One could speculate whether re-junking (or re-non-junking)
> would any better train the algorithm, or whether in any cases
> it might prove counter-productive; might take a lot of research to find
> out.
Not really.
The Bayesian database used for SpamWatch is just plain text, and it
takes a few seconds to confirm that explicitly telling Eudora that a
message is or is not junk changes the token counts for words in that
message, whatever the prior classification status of the message.
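To make the point concrete, here is a minimal sketch of a token-count store in which an explicit "junk" or "not junk" marking always updates the counts, whatever the message's prior status. The class name, the tokenizer, and the in-memory layout are all assumptions for illustration; this is not SpamWatch's actual database format, and whether Eudora counts a re-marked message exactly this way is not something I have verified beyond seeing the counts change.

```python
import re
from collections import Counter


class TokenCounts:
    """Toy token-count store (hypothetical; not SpamWatch's real format)."""

    def __init__(self):
        self.junk = Counter()       # token -> number of junk messages containing it
        self.not_junk = Counter()   # token -> number of non-junk messages containing it
        self.junk_msgs = 0
        self.not_junk_msgs = 0

    @staticmethod
    def tokens(text):
        # Assumed tokenizer: lowercase words, each counted once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_junk):
        # An explicit marking updates the counts unconditionally;
        # the message's previous classification is never consulted.
        target = self.junk if is_junk else self.not_junk
        target.update(self.tokens(text))
        if is_junk:
            self.junk_msgs += 1
        else:
            self.not_junk_msgs += 1
```

In this sketch, calling `train("cheap pills now", True)` twice on the same message leaves `junk["pills"]` at 2, which is the kind of change you can look for in the plain-text database after re-junking a message.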
In principle, it is best for any Bayesian mail filtering system to be
fed correct classifications of every message it sees. In practice,
making the user classify every message by hand would defeat the purpose
of having a filter, so real filters like the one in Eudora usually
combine train-on-error with feedback auto-learning. Even so, actual user
confirmation of a classification is always a more valuable basis for
feeding the filter than the filter's own determination.
> By the way, it appears that in the Windows version,
> it may not even be possible to re-junk what's already in Junk mailbox,
> nor un-junk what isn't even in Junk mailbox,
Not so for the Mac version.
> which suggests discouragement of the practice.
Which would be unwise.
I believe, based on its behavior, that the last version of the Bayesian
filter in Mac Eudora has unspecified auto-learning thresholds for both
spam and non-spam that are different from the classification threshold.
Any similar filter that does any auto-learning should work that way,
i.e. only automatically feed back the messages that are certainly spam
or not spam, not the borderline messages.
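As a rough sketch of what I mean by separate thresholds: the function below classifies at one cutoff but only nominates a message for auto-learning when the score is far from that cutoff. The function name and all three numeric values are made up for illustration; Eudora's actual thresholds are unspecified, and I am inferring their existence from behavior only.

```python
def decide(score, classify_at=0.5, learn_junk_at=0.95, learn_ok_at=0.05):
    """Return (is_junk, auto_learn_label) for a spam probability in [0, 1].

    All thresholds are hypothetical values chosen for the example.
    auto_learn_label is None for borderline scores, which are left
    for the user to confirm rather than fed back automatically.
    """
    is_junk = score >= classify_at
    if score >= learn_junk_at:
        label = "junk"          # confident enough to train as spam
    elif score <= learn_ok_at:
        label = "not junk"      # confident enough to train as non-spam
    else:
        label = None            # borderline: classify, but do not auto-learn
    return is_junk, label
```

With these example values, a message scoring 0.7 is filed as junk but not learned from, while one scoring 0.99 is both filed and fed back. That keeps the filter from reinforcing its own borderline guesses.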
--
Now where did I hide that website...

