In article <1ic4yc1.1sgpsfh199q1j2N%peter@[EMAIL PROTECTED]>,
 peter@[EMAIL PROTECTED] (Peter Ceresole) wrote:
> Bill Cole <bill@[EMAIL PROTECTED]> wrote:
>
> > > The whole point of heuristic learning (which I believe the spam
> > > filters use) is that they derive their own rules from the examples
> > > you give them. These rules can be very complicated. This avoids
> > > human programming, which is the source of so many errors.
> >
> > That's not really how most training-based spam filters work. Most are
> > based on a system known as "Naive Bayesian" classification. That
> > approach isn't based on complex rules, but rather on a lot of very
> > simple rules with scores derived statistically. The words in the
> > message are tallied to a big list of word frequencies in spam and
> > non-spam messages, and that list of relative frequencies is used as a
> > lot of very simple scoring rules, with each rule testing the presence
> > of a word and adding a score based on the frequency of the word in
> > prior spam and non-spam.
>
> I expressed it badly; the point is that this process, involving many
> simple criteria, is in itself a complicated one.
Well, I guess so...
But only if you consider probabilistic analysis complicated.
<g,d,&r>
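
For what it's worth, the scheme Bill describes fits in a few lines of
Python. This is only a rough sketch, not how any particular filter is
implemented: the function names, the toy messages, and the add-one
smoothing are all my own assumptions.

```python
import math
from collections import Counter

def train(messages):
    """Count, per word, how many spam and non-spam messages contain it."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for label, text in messages:
        totals[label] += 1
        # set() so a word counts once per message, however often it repeats
        counts[label].update(set(text.lower().split()))
    return counts, totals

def spam_score(text, counts, totals):
    """Sum one log-odds score per word; a positive total leans spam."""
    # Start from the prior odds of spam versus non-spam.
    score = math.log(totals["spam"] / totals["ham"])
    for word in set(text.lower().split()):
        # Add-one smoothing (an assumption) so no word yields log(0).
        p_spam = (counts["spam"][word] + 1) / (totals["spam"] + 2)
        p_ham = (counts["ham"][word] + 1) / (totals["ham"] + 2)
        score += math.log(p_spam / p_ham)
    return score

# Made-up training messages, purely for illustration.
training = [
    ("spam", "cheap pills buy now"),
    ("spam", "buy cheap watches now"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch meeting tomorrow"),
]
counts, totals = train(training)
```

Each word contributes exactly one simple score, and the verdict is just
the sign of the sum: spam_score("buy cheap pills", counts, totals) comes
out positive, while "meeting tomorrow" comes out negative.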
--
Now where did I hide that website...

