added extended section on spam

diff --git a/texi/gnus.texi b/texi/gnus.texi

index 18a7d34..ca274ca 100644
--- a/texi/gnus.texi
+++ b/texi/gnus.texi
@@ -709,6 +709,7 @@ Browsing the Web
  @sc{imap}
  
  * Splitting in IMAP::           Splitting mail with nnimap.
+* Expiring in IMAP::            Expiring mail with nnimap.
  * Editing IMAP ACLs::           Limiting/enabling other users access to a mailbox.
  * Expunging mailboxes::         Equivalent of a "compress mailbox" button.
  * A note on namespaces::        How to (not) use IMAP namespace in Gnus.
@@ -846,9 +847,12 @@ Picons
  
  Thwarting Email Spam
  
+* The problem of spam::         Some background, and some solutions
  * Anti-Spam Basics::            Simple steps to reduce the amount of spam.
  * SpamAssassin::                How to use external anti-spam tools.
  * Hashcash::                    Reduce spam by burning CPU time.
+* Filtering Spam Using spam.el::  
+* Filtering Spam Using Statistics (spam-stat.el)::  
  
  Appendices
  
@@ -2768,11 +2772,12 @@ See also @code{gnus-total-expirable-newsgroups}.
  @item expiry-wait
  @cindex expiry-wait
  @vindex nnmail-expiry-wait-function
-If the group parameter has an element that looks like @code{(expiry-wait
-. 10)}, this value will override any @code{nnmail-expiry-wait} and
-@code{nnmail-expiry-wait-function} when expiring expirable messages.
-The value can either be a number of days (not necessarily an integer) or
-the symbols @code{never} or @code{immediate}.
+If the group parameter has an element that looks like
+@code{(expiry-wait . 10)}, this value will override any
+@code{nnmail-expiry-wait} and @code{nnmail-expiry-wait-function}
+(@pxref{Expiring Mail}) when expiring expirable messages.  The value
+can either be a number of days (not necessarily an integer) or the
+symbols @code{never} or @code{immediate}.
  
  @item score-file
  @cindex score file group parameter
@@ -8206,6 +8211,12 @@ positions in the alphabet, e. g. @samp{B} (letter #2) -> @samp{O} (letter
  #15).  It is sometimes referred to as ``Caesar rotate'' because Caesar
  is rumored to have employed this form of, uh, somewhat weak encryption.
  
+@item W m
+@kindex W m (Summary)
+@findex gnus-summary-morse-message
+@c @icon{gnus-summary-morse-message}
+Morse decode the article buffer (@code{gnus-summary-morse-message}).
+
  @item W t
  @item t
  @kindex W t (Summary)
@@ -13639,20 +13650,45 @@ Gnus will not delete your old, read mail.  Unless you ask it to, of
  course.
  
  To make Gnus get rid of your unwanted mail, you have to mark the
-articles as @dfn{expirable}.  This does not mean that the articles will
-disappear right away, however.  In general, a mail article will be
+articles as @dfn{expirable}.  (With the default keybindings, this means
+that you have to type @kbd{E}.)  This does not mean that the articles
+will disappear right away, however.  In general, a mail article will be
  deleted from your system if, 1) it is marked as expirable, AND 2) it is
  more than one week old.  If you do not mark an article as expirable, it
  will remain on your system until hell freezes over.  This bears
  repeating one more time, with some spurious capitalizations: IF you do
  NOT mark articles as EXPIRABLE, Gnus will NEVER delete those ARTICLES.
  
+You do not have to mark articles as expirable by hand.  Gnus provides
+two features, called `auto-expire' and `total-expire', that can help you
+with this.  In a nutshell, `auto-expire' means that Gnus hits @kbd{E}
+for you when you select an article.  And `total-expire' means that Gnus
+considers all articles that are read to be expirable.  So, in addition
+to the articles marked @samp{E}, the articles marked @samp{r},
+@samp{R}, @samp{O}, @samp{K}, @samp{Y} and so on are also considered
+expirable.
+
+When should either auto-expire or total-expire be used?  Most people
+who are subscribed to mailing lists split each list into its own group
+and then turn on auto-expire or total-expire for those groups.
+(@xref{Splitting Mail}, for more information on splitting each list
+into its own group.)
+
+Which one is better, auto-expire or total-expire?  It's not easy to
+answer.  Generally speaking, auto-expire is probably faster.  Another
+advantage of auto-expire is that you get more marks to work with: for
+the articles that are supposed to stick around, you can still choose
+between tick and dormant and read marks.  But with total-expire, you
+only have dormant and ticked to choose from.  The advantage of
+total-expire is that it works well with adaptive scoring
+(@pxref{Adaptive Scoring}).  Auto-expire works with normal scoring but
+not with adaptive scoring.
+
  @vindex gnus-auto-expirable-newsgroups
-You do not have to mark articles as expirable by hand.  Groups that
-match the regular expression @code{gnus-auto-expirable-newsgroups} will
-have all articles that you read marked as expirable automatically.  All
-articles marked as expirable have an @samp{E} in the first
-column in the summary buffer.
+Groups that match the regular expression
+@code{gnus-auto-expirable-newsgroups} will have all articles that you
+read marked as expirable automatically.  All articles marked as
+expirable have an @samp{E} in the first column in the summary buffer.
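+
+As a minimal sketch, assuming your mailing lists are split into
+hypothetical groups whose names contain @samp{mail.lists.}, you could
+turn on auto-expire for all of them like this:
+
+@example
+;; The group name pattern is an assumption; adapt the regexp
+;; to the names of your own mailing list groups.
+(setq gnus-auto-expirable-newsgroups "mail\\.lists\\.")
+@end example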
  
  By default, if you have auto expiry switched on, Gnus will mark all the
  articles you read as expirable, no matter if they were read or unread
@@ -14873,7 +14909,9 @@ summary buffer.
                       (assq (gnus-summary-article-number)
                             gnus-newsgroup-data))))))
      (if url
-        (browse-url (cdr url))
+        (progn
+          (browse-url (cdr url))
+          (gnus-summary-mark-as-read-forward 1))
        (gnus-summary-scroll-up arg))))
  
  (eval-after-load "gnus"
@@ -15190,6 +15228,7 @@ variable @code{nntp-authinfo-file} for exact syntax; also see
  
  @menu
  * Splitting in IMAP::           Splitting mail with nnimap.
+* Expiring in IMAP::            Expiring mail with nnimap.
  * Editing IMAP ACLs::           Limiting/enabling other users access to a mailbox.
  * Expunging mailboxes::         Equivalent of a "compress mailbox" button.
  * A note on namespaces::        How to (not) use IMAP namespace in Gnus.
@@ -15355,6 +15394,43 @@ Nnmail equivalent: @code{nnmail-split-fancy}.
  
  @end table
  
+@node Expiring in IMAP
+@subsection Expiring in IMAP
+@cindex expiring imap mail
+
+Even though @sc{nnimap} is not a proper @sc{nnmail} derived backend,
+it supports most features in regular expiring (@pxref{Expiring Mail}).
+Unlike splitting in IMAP (@pxref{Splitting in IMAP}), it does not clone
+the @sc{nnmail} variables (i.e., by creating
+@code{nnimap-expiry-wait}) but reuses the @sc{nnmail} variables.  What
+follows below are the variables used by the @sc{nnimap} expiry
+process.
+
+A note on how the expire mark is stored on the @sc{imap} server is
+appropriate here as well.  The expire mark is translated into a
+@sc{imap} client specific mark, @code{gnus-expire}, and stored on the
+message.  This means that likely only Gnus will understand and treat
+the @code{gnus-expire} mark properly, although other clients may allow
+you to view client specific flags on the message.  It also means that
+your server must support permanent storage of client specific flags on
+messages.  Most do, fortunately.
+
+@table @code
+
+@item nnmail-expiry-wait
+@item nnmail-expiry-wait-function
+
+These variables are fully supported.  The expire value can be a
+number, or the symbol @code{immediate} or @code{never}.
+
+@item nnmail-expiry-target
+
+This variable is supported, and internally implemented by calling the
+@sc{nnmail} functions that handle this.  It contains an optimization:
+if the destination is an IMAP group on the same server, the
+article is copied instead of appended (that is, uploaded again).
+
+@end table
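+
+For instance, the following sketch keeps expirable articles for two
+weeks and then moves them to another group instead of deleting them.
+The server and group names are hypothetical.
+
+@example
+;; Wait 14 days before expiring an expirable article.
+(setq nnmail-expiry-wait 14)
+;; Hypothetical target group on the same IMAP server, so that the
+;; article can be copied rather than uploaded again.
+(setq nnmail-expiry-target "nnimap+mail.example.org:INBOX.expired")
+@end example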
+
  @node Editing IMAP ACLs
  @subsection Editing IMAP ACLs
  @cindex editing imap acls
@@ -20915,14 +20991,85 @@ mail group, only to find two pyramid schemes, seven advertisements
  (``New!  Miracle tonic for growing full, lustrous hair on your toes!'')
  and one mail asking me to repent and find some god.
  
-This is annoying.
+This is annoying.  Here's what you can do about it.
  
  @menu
+* The problem of spam::         Some background, and some solutions
  * Anti-Spam Basics::            Simple steps to reduce the amount of spam.
  * SpamAssassin::                How to use external anti-spam tools.
  * Hashcash::                    Reduce spam by burning CPU time.
+* Filtering Spam Using spam.el::  
+* Filtering Spam Using Statistics (spam-stat.el)::  
  @end menu
  
+@node The problem of spam
+@subsection The problem of spam
+@cindex email spam
+@cindex spam filtering approaches
+@cindex filtering approaches, spam
+@cindex UCE
+@cindex unsolicited commercial email
+
+First, some background on spam.
+
+If you have access to e-mail, you are familiar with spam (technically
+termed @acronym{UCE}, Unsolicited Commercial E-mail).  Simply put, it exists
+because e-mail delivery is very cheap compared to paper mail, so only
+a very small percentage of people need to respond to a UCE to make it
+worthwhile to the advertiser.  Ironically, one of the most common
+spams is the one offering a database of e-mail addresses for further
+spamming.  Senders of spam are usually called @emph{spammers}, but terms like
+@emph{vermin}, @emph{scum}, and @emph{morons} are in common use as well.
+
+Spam comes from a wide variety of sources.  It is simply impossible to
+dispose of all spam without discarding useful messages.  A good
+example is the TMDA system, which requires senders
+unknown to you to confirm themselves as legitimate senders before
+their e-mail can reach you.  Without getting into the technical side
+of TMDA, a downside is clearly that e-mail from legitimate sources may
+be discarded if those sources can't or won't confirm themselves
+through the TMDA system.  Another problem with TMDA is that it
+requires its users to have a basic understanding of e-mail delivery
+and processing.
+
+The simplest approach to filtering spam is direct filtering.  If you get 200
+spam messages per day from @email{random-address@@vmadmin.com}, you
+block @samp{vmadmin.com}.  If you get 200 messages about
+@samp{VIAGRA}, you discard all messages with @samp{VIAGRA} in the
+message.  This, unfortunately, is a great way to discard legitimate
+e-mail.  For instance, the very informative and useful RISKS digest
+has been blocked by overzealous mail filters because it
+@strong{contained} words that were common in spam messages.
+Nevertheless, in isolated cases, with great care, direct filtering of
+mail can be useful.
+
+Another approach to filtering e-mail is distributed spam processing;
+DCC, for instance, implements such a system.  In essence,
+@code{N} systems around the world agree that a machine @samp{X} in
+China, Ghana, or California is sending out spam e-mail, and these
+@code{N} systems enter @samp{X} or the spam e-mail from @samp{X} into
+a database.  The criteria for spam detection vary: it may be the
+number of messages sent, the content of the messages, and so on.  When
+a user of the distributed processing system wants to find out if a
+message is spam, he consults one of those @code{N} systems.
+
+Distributed spam processing works very well against spammers that send
+a large number of messages at once, but it requires the user to set up
+fairly complicated checks.  There are commercial and free distributed
+spam processing systems.  Distributed spam processing has its risks as
+well.  For instance, legitimate e-mail senders have been accused of
+sending spam, and their web sites have been shut down for some time
+because of the incident.
+
+The statistical approach to spam filtering is also popular.  It is
+based on a statistical analysis of previous spam messages.  Usually
+the analysis is a simple word frequency count, with perhaps pairs of
+words or 3-word combinations thrown into the mix.  Statistical
+analysis of spam works very well in most of the cases, but it can
+classify legitimate e-mail as spam in some cases.  It takes time to
+run the analysis, the full message must be analyzed, and the user has
+to store the database of spam analyses.
+
  @node Anti-Spam Basics
  @subsection Anti-Spam Basics
  @cindex email spam
@@ -21146,6 +21293,531 @@ hashcash cookies, it is expected that this is performed by your hand
  customized mail filtering scripts.  Improvements in this area would be
  a useful contribution, however.
  
+@node Filtering Spam Using spam.el
+@subsection Filtering Spam Using spam.el
+@cindex spam filtering
+@cindex spam.el
+
+The idea behind @code{spam.el} is to have a control center for spam detection
+and filtering in Gnus.  To that end, @code{spam.el} does two things: it
+filters incoming mail, and it analyzes mail known to be spam.
+
+So, what happens when you load @code{spam.el}?  First of all, you get
+the following keyboard commands:
+
+@table @kbd
+
+@item M-d
+@itemx S x
+@kindex M-d
+@kindex S x
+@findex gnus-summary-mark-as-spam
+(@code{gnus-summary-mark-as-spam})
+
+Mark current article as spam, showing it with the @samp{H} mark.
+Whenever you see a spam article, make sure to mark its summary line
+with @kbd{M-d} before leaving the group.
+
+@item S t
+@kindex S t
+@findex spam-bogofilter-score
+(@code{spam-bogofilter-score})
+
+Display the Bogofilter @emph{spamicity} score of the current article.
+You must have bogofilter processing enabled for this command to work
+properly.
+
+@xref{Bogofilter}.
+
+@end table
+
+@strong{FIXME!  The justification for @kbd{M-d} is that this is what Paul Graham
+suggests in his original article, and what Eric Raymond's patch for Mutt
+uses.  But more importantly, that binding was still free in Summary mode!}
+
+@strong{FIXME!  Lars has not blessed the following key bindings yet.  It looks
+convenient that the score analysis command uses a sequence ending with the
+letter @kbd{t}, so it nicely parallels @kbd{B t} or @kbd{V t}.  @kbd{M-d} is a kind of
+"alternate" @kbd{d}, it is also the sequence suggested in Paul Graham's article,
+and also in Eric Raymond's patch for Mutt.  @kbd{S x} might be the more
+official key binding for @kbd{M-d}.}
+
+Gnus can learn from the spam you get.  All you have to do is collect
+your spam in one or more spam groups, and set the variable
+@code{spam-junk-mailgroups} as appropriate.  In these groups, all messages
+are considered to be spam by default: they get the @samp{H} mark.  You must
+review these messages from time to time and remove the @samp{H} mark for
+every message that is not spam after all.  When you leave a spam
+group, all messages that still have the @samp{H} mark are passed on to
+the spam-detection engine (bogofilter, ifile, and others).  To remove
+the @samp{H} mark, you can use @kbd{M-u} to "unread" the article, or @kbd{d} for
+declaring it read the non-spam way.  When you leave a group, all @samp{H}
+marked articles, saved or unsaved, are sent to Bogofilter or ifile
+(depending on @code{spam-use-bogofilter} and @code{spam-use-ifile}), which will study
+them as spam samples.
+
+Messages may also be deleted in various other ways, and unless
+@code{spam-ham-marks-form} gets overridden below, marks @samp{R} and @samp{r} for
+default read or explicit delete, marks @samp{X} and @samp{K} for automatic or
+explicit kills, as well as mark @samp{Y} for low scores, are all considered
+to be associated with articles which are not spam.  This assumption
+might be false, in particular if you use kill files or score files as
+a means of detecting genuine spam; in that case you should adjust
+@code{spam-ham-marks-form}.  When you leave a group, all @emph{unsaved} articles
+bearing any of the above marks are sent to Bogofilter or ifile, which
+will study these as not-spam samples.  If you explicitly kill a lot, you
+might sometimes end up with articles marked @samp{K} which you never saw,
+and which might accidentally contain spam.  It is best to make sure that
+real spam is marked with @samp{H}, and nothing else.
+
+All other marks do not contribute to Bogofilter or ifile
+pre-conditioning.  In particular, ticked, dormant or souped articles
+are likely to contribute later, when they will get deleted for real,
+so there is no need to use them prematurely.  Explicitly expired
+articles do not contribute; the command @kbd{E} is a way to get rid of an
+article without Bogofilter or ifile ever seeing it.
+
+@strong{TODO: @code{spam-use-ifile} does not process spam articles on group exit.
+I'm waiting for info from the author of @code{ifile-gnus.el}, because I think
+that functionality should go in @code{ifile-gnus.el} rather than @code{spam.el}.}
+
+To use the @code{spam.el} facilities for incoming mail filtering, you
+must add the following to your fancy split list
+(@code{nnmail-split-fancy} or @code{nnimap-split-fancy}):
+
+@example
+(: spam-split)
+@end example
+
+Note that the fancy split may be called @code{nnmail-split-fancy} or
+@code{nnimap-split-fancy}, depending on whether you use the nnmail or
+nnimap backends to retrieve your mail.
+
+The @code{spam-split} function will process incoming mail and send the mail
+considered to be spam into the group name given by the variable
+@code{spam-split-group}.  Usually that group name is @samp{spam}.
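+
+A minimal setup might look like the following sketch.  The fallback
+group @samp{mail.misc} is an assumption; use whatever group should
+receive mail that is not considered spam.
+
+@example
+(require 'spam)
+
+;; Group that receives mail classified as spam.
+(setq spam-split-group "spam")
+
+;; Use fancy splitting, and try the spam check first.
+(setq nnmail-split-methods 'nnmail-split-fancy)
+(setq nnmail-split-fancy
+      '(| (: spam-split)
+          "mail.misc"))
+@end example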
+
+The following are the methods you can use to control the behavior of
+@code{spam-split}:
+
+@menu
+* Blacklists and Whitelists::   
+* BBDB Whitelists::             
+* Blackholes::                  
+* Bogofilter::                  
+* Ifile spam filtering::        
+* Extending spam.el::           
+@end menu
+
+@node Blacklists and Whitelists
+@subsubsection Blacklists and Whitelists
+@cindex spam filtering
+@cindex whitelists, spam filtering
+@cindex blacklists, spam filtering
+@cindex spam.el
+
+@defvar spam-use-blacklist
+Set this variable to @code{t} (the default) if you want to use blacklists.
+@end defvar
+
+@defvar spam-use-whitelist
+Set this variable to @code{t} if you want to use whitelists.
+@end defvar
+
+Blacklists are lists of regular expressions matching addresses you
+consider to be spam senders.  For instance, to block mail from any
+sender at @samp{vmadmin.com}, you can put @samp{vmadmin.com} in your
+blacklist.  Since you start out with an empty blacklist, no harm is
+done by having the @code{spam-use-blacklist} variable set, so it is
+set by default.  Blacklist entries use the Emacs regular expression
+syntax.
+
+Conversely, whitelists tell Gnus what addresses are considered
+legitimate.  All non-whitelisted addresses are considered spammers.
+This option is probably not useful for most Gnus users unless the
+whitelist is very comprehensive.  Also see @ref{BBDB Whitelists}.
+Whitelist entries use the Emacs regular expression syntax.
+
+The blacklist and whitelist location can be customized with the
+@code{spam-directory} variable (@file{~/News/spam} by default).  The whitelist
+and blacklist files will be in that directory, named @file{whitelist} and
+@file{blacklist} respectively.
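+
+For example, to rely on a blacklist only and to keep both files in
+the default location, you could use something like this sketch
+(adjust the values to taste):
+
+@example
+(setq spam-use-blacklist t
+      spam-use-whitelist nil
+      ;; Directory holding the `blacklist' and `whitelist' files.
+      spam-directory "~/News/spam/")
+@end example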
+
+@node BBDB Whitelists
+@subsubsection BBDB Whitelists
+@cindex spam filtering
+@cindex BBDB whitelists, spam filtering
+@cindex BBDB, spam filtering
+@cindex spam.el
+
+@defvar spam-use-bbdb
+
+Analogous to @code{spam-use-whitelist} (@pxref{Blacklists and
+Whitelists}), but uses the BBDB as the source of whitelisted addresses,
+without regular expressions.  You must have the BBDB loaded for
+@code{spam-use-bbdb} to work properly.  Only addresses in the BBDB
+will be allowed through; all others will be classified as spam.
+
+@end defvar
+
+@node Blackholes
+@subsubsection Blackholes
+@cindex spam filtering
+@cindex blackholes, spam filtering
+@cindex spam.el
+
+@defvar spam-use-blackholes
+
+You can let Gnus consult the blackhole-type distributed spam
+processing systems (DCC, for instance) when you set this option.  The
+variable @code{spam-blackhole-servers} holds the list of blackhole servers
+Gnus will consult.
+
+This variable is disabled by default.  It is not recommended at this
+time because of bugs in the @code{dns.el} code.
+
+@end defvar
+
+@node Bogofilter
+@subsubsection Bogofilter
+@cindex spam filtering
+@cindex bogofilter, spam filtering
+@cindex spam.el
+
+@defvar spam-use-bogofilter
+
+Set this variable if you want to use Eric Raymond's speedy Bogofilter.
+This has been tested with a locally patched copy of version 0.4.  Make
+sure to read the installation comments in @code{spam.el}.
+
+With a minimum of care for associating the @samp{H} mark with spam
+articles only, Bogofilter training becomes fairly automatic.  You
+should do this until you get a few hundred articles in each
+category, spam or not.  The shell command @command{head -1
+~/.bogofilter/*} shows both article counts.  The command @kbd{S t} in
+summary mode, either for debugging or for curiosity, triggers
+Bogofilter into displaying in another buffer the @emph{spamicity}
+score of the current article (between 0.0 and 1.0), together with the
+article words which most significantly contribute to the score.
+
+@end defvar
+
+@node Ifile spam filtering
+@subsubsection Ifile spam filtering
+@cindex spam filtering
+@cindex ifile, spam filtering
+@cindex spam.el
+
+@defvar spam-use-ifile
+
+Enable this variable if you want to use Ifile, a statistical analyzer
+similar to Bogofilter.  Currently you must have @code{ifile-gnus.el}
+loaded.  The integration of Ifile with @code{spam.el} is not finished
+yet, but you can use @code{ifile-gnus.el} on its own if you like.
+
+@end defvar
+
+@node Extending spam.el
+@subsubsection Extending spam.el
+@cindex spam filtering
+@cindex spam.el, extending
+@cindex extending spam.el
+
+Say you want to add a new backend called blackbox.  Provide the following:
+
+@enumerate
+@item documentation
+
+@item code
+
+@example
+(defvar spam-use-blackbox nil
+  "True if blackbox should be used.")
+@end example
+
+Add
+@example
+    (spam-use-blackbox  . spam-check-blackbox)
+@end example
+to @code{spam-list-of-checks}.
+
+@item functionality
+Write the @code{spam-check-blackbox} function.  It should return
+@code{nil} or @code{spam-split-group}.  See the existing
+@code{spam-check-*} functions for examples of what you can do.
+@end enumerate
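+
+Put together, a minimal sketch of such a backend could look like
+this.  The matching logic and the domain name are purely
+illustrative, and the use of @code{message-fetch-field} assumes that
+the check runs in a buffer holding the message headers, as is the
+case during mail splitting.
+
+@example
+(require 'message)
+
+(defvar spam-use-blackbox nil
+  "True if blackbox should be used.")
+
+(defun spam-check-blackbox ()
+  "Return `spam-split-group' for mail from a made-up domain, else nil."
+  (let ((from (message-fetch-field "from")))
+    ;; Hypothetical test: flag every sender at blackbox.example.
+    (when (and from (string-match "blackbox\\.example" from))
+      spam-split-group)))
+
+;; Register the new check after the existing ones.
+(add-to-list 'spam-list-of-checks
+             '(spam-use-blackbox . spam-check-blackbox) t)
+@end example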
+
+@node Filtering Spam Using Statistics (spam-stat.el)
+@subsection Filtering Spam Using Statistics (spam-stat.el)
+@cindex Paul Graham
+@cindex Graham, Paul
+@cindex naive Bayesian spam filtering
+@cindex Bayesian spam filtering, naive
+@cindex spam filtering, naive Bayesian
+
+Paul Graham has written an excellent essay about spam filtering using
+statistics: @uref{http://www.paulgraham.com/spam.html,A Plan for
+Spam}.  In it he describes the inherent deficiency of rule-based
+filtering as used by SpamAssassin, for example: Somebody has to write
+the rules, and everybody else has to install these rules.  You are
+always late.  It would be much better, he argues, to filter mail based
+on whether it somehow resembles spam or non-spam.  One way to measure
+this is word distribution.  He then goes on to describe a solution
+that checks whether a new mail resembles any of your other spam mails
+or not.
+
+The basic idea is this:  Create two collections of your mail, one
+with spam, one with non-spam.  Count how often each word appears in
+either collection, weight this by the total number of mails in the
+collections, and store this information in a dictionary.  For every
+word in a new mail, determine its probability to belong to a spam or a
+non-spam mail.  Using the 15 most conspicuous words, compute the total
+probability of the mail being spam.  If this probability is higher
+than a certain threshold, the mail is considered to be spam.
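+
+As a rough sketch of the last step, the formula from Graham's essay
+combines the individual word probabilities as shown here.  The
+function name is made up for illustration, and whether
+@code{spam-stat.el} computes its score in exactly this way is an
+assumption.
+
+@example
+(defun my-combine-spam-probabilities (probs)
+  "Combine the word probabilities PROBS into one spam probability."
+  (let ((prod 1.0)   ; product of all p
+        (inv 1.0))   ; product of all (1 - p)
+    (dolist (p probs)
+      (setq prod (* prod p)
+            inv (* inv (- 1.0 p))))
+    (/ prod (+ prod inv))))
+
+;; Two very spammy words and one innocent-looking word.
+(my-combine-spam-probabilities '(0.99 0.99 0.2))
+@end example
+
+A few strongly spammy words dominate the result, which is why only
+the most conspicuous words need to be considered.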
+
+Gnus supports this kind of filtering.  But it needs some setting up.
+First, you need two collections of your mail, one with spam, one with
+non-spam.  Then you need to create a dictionary using these two
+collections, and save it.  And last but not least, you need to use
+this dictionary in your fancy mail splitting rules.
+
+@menu
+* Creating a spam-stat dictionary::  
+* Splitting mail using spam-stat::  
+* Low-level interface to the spam-stat dictionary::  
+@end menu
+
+@node Creating a spam-stat dictionary
+@subsubsection Creating a spam-stat dictionary
+
+Before you can begin to filter spam based on statistics, you must
+create these statistics based on two mail collections, one with spam,
+one with non-spam.  These statistics are then stored in a dictionary
+for later use.  In order for these statistics to be meaningful, you
+need several hundred emails in both collections.
+
+Gnus currently supports only the nnml backend for automated dictionary
+creation.  The nnml backend stores all mails in a directory, one file
+per mail.  Use the following functions:
+
+@defun spam-stat-process-spam-directory
+Create spam statistics for every file in this directory.  Every file
+is treated as one spam mail.
+@end defun
+
+@defun spam-stat-process-non-spam-directory
+Create non-spam statistics for every file in this directory.  Every
+file is treated as one non-spam mail.
+@end defun
+
+Usually you would call @code{spam-stat-process-spam-directory} on a
+directory such as @file{~/Mail/mail/spam} (this usually corresponds
+to the group @samp{nnml:mail.spam}), and you would call
+@code{spam-stat-process-non-spam-directory} on a directory such as
+@file{~/Mail/mail/misc} (this usually corresponds to the group
+@samp{nnml:mail.misc}).
+
+@defvar spam-stat
+This variable holds the hash-table with all the statistics -- the
+dictionary we have been talking about.  For every word in either
+collection, this hash-table stores a vector describing how often the
+word appeared in spam and how often it appeared in non-spam mails.
+
+@end defvar
+
+If you want to regenerate the statistics from scratch, you need to
+reset the dictionary.
+
+@defun spam-stat-reset
+Reset the @code{spam-stat} hash-table, deleting all the statistics.
+@end defun
+
+When you are done, you must save the dictionary.  The dictionary may
+be rather large.  If you will not update the dictionary incrementally
+(instead, you will recreate it once a month, for example), then you
+can reduce the size of the dictionary by deleting all words that did
+not appear often enough or that do not clearly belong to only spam or
+only non-spam mails.
+
+@defun spam-stat-reduce-size
+Reduce the size of the dictionary.  Use this only if you do not want
+to update the dictionary incrementally.
+@end defun
+
+@defun spam-stat-save
+Save the dictionary.
+@end defun
+
+@defvar spam-stat-file
+The filename used to store the dictionary.  This defaults to
+@file{~/.spam-stat.el}.
+@end defvar
+
+@node Splitting mail using spam-stat
+@subsubsection Splitting mail using spam-stat
+
+In order to use @code{spam-stat} to split your mail, you need to add the
+following to your @file{~/.gnus} file:
+
+@example
+(require 'spam-stat)
+(spam-stat-load)
+@end example
+
+This will load the necessary Gnus code, and the dictionary you
+created.
+
+Next, you need to adapt your fancy splitting rules:  You need to
+determine how to use @code{spam-stat}.  In the simplest case, you only have
+two groups, @samp{mail.misc} and @samp{mail.spam}.  The following expression says
+that mail is either spam or it should go into @samp{mail.misc}.  If it is
+spam, then @code{spam-stat-split-fancy} will return @samp{mail.spam}.
+
+@example
+(setq nnmail-split-fancy
+      `(| (: spam-stat-split-fancy)
+         "mail.misc"))
+@end example
+
+@defvar spam-stat-split-fancy-spam-group
+The group to use for spam.  Default is @samp{mail.spam}.
+@end defvar
+
+If you also filter mail with specific subjects into other groups, use
+the following expression.  Only the mails not matching the regular
+expression are considered potential spam.
+
+@example
+(setq nnmail-split-fancy
+      `(| ("Subject" "\\bspam-stat\\b" "mail.emacs")
+         (: spam-stat-split-fancy)
+         "mail.misc"))
+@end example
+
+If you want to filter for spam first, then you must be careful when
+creating the dictionary.  Note that @code{spam-stat-split-fancy} must
+consider both mails in @samp{mail.emacs} and in @samp{mail.misc} as
+non-spam, therefore both should be in your collection of non-spam
+mails, when creating the dictionary!
+
+@example
+(setq nnmail-split-fancy
+      `(| (: spam-stat-split-fancy)
+          ("Subject" "\\bspam-stat\\b" "mail.emacs")
+          "mail.misc"))
+@end example
+
+You can combine this with traditional filtering.  Here, we move all
+HTML-only mails into the @samp{mail.spam.filtered} group.  Note that since
+@code{spam-stat-split-fancy} will never see them, the mails in
+@samp{mail.spam.filtered} should be neither in your collection of spam mails,
+nor in your collection of non-spam mails, when creating the
+dictionary!
+
+@example
+(setq nnmail-split-fancy
+      `(| ("Content-Type" "text/html" "mail.spam.filtered")
+          (: spam-stat-split-fancy)
+          ("Subject" "\\bspam-stat\\b" "mail.emacs")
+          "mail.misc"))
+@end example
+
+
+@node Low-level interface to the spam-stat dictionary
+@subsubsection Low-level interface to the spam-stat dictionary
+
+The main interface to @code{spam-stat} consists of the following functions:
+
+@defun spam-stat-buffer-is-spam
+called in a buffer, that buffer is considered to be a new spam mail;
+use this for new mail that has not been processed before
+
+@end defun
+
+@defun spam-stat-buffer-is-non-spam
+called in a buffer, that buffer is considered to be a new non-spam
+mail; use this for new mail that has not been processed before
+
+@end defun
+
+@defun spam-stat-buffer-change-to-spam
+called in a buffer, that buffer is no longer considered to be normal
+mail but spam; use this to change the status of a mail that has
+already been processed as non-spam
+
+@end defun
+
+@defun spam-stat-buffer-change-to-non-spam
+called in a buffer, that buffer is no longer considered to be spam but
+normal mail; use this to change the status of a mail that has already
+been processed as spam
+
+@end defun
+
+@defun spam-stat-save
+save the hash table to the file; the filename used is stored in the
+variable @code{spam-stat-file}
+
+@end defun
+
+@defun spam-stat-load
+load the hash table from a file; the filename used is stored in the
+variable @code{spam-stat-file}
+
+@end defun
+
+@defun spam-stat-score-word
+return the spam score for a word
+
+@end defun
+
+@defun spam-stat-score-buffer
+return the spam score for a buffer
+
+@end defun
+
+@defun spam-stat-split-fancy
+for fancy mail splitting; add the rule @samp{(: spam-stat-split-fancy)} to
+@code{nnmail-split-fancy}
+
+This requires the following in your @file{~/.gnus} file:
+
+@example
+(require 'spam-stat)
+(spam-stat-load)
+@end example
+
+@end defun
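+
+As a minimal sketch of the low-level interface, the following scores
+a single message stored in a file.  The file name is hypothetical,
+and the sketch assumes that @code{spam-stat-score-buffer} takes no
+arguments and works on the current buffer.
+
+@example
+(require 'spam-stat)
+(spam-stat-load)
+
+(with-temp-buffer
+  ;; Hypothetical nnml article file; one file holds one mail.
+  (insert-file-contents "~/Mail/mail/misc/1")
+  (spam-stat-score-buffer))
+@end example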
+
+A typical test will involve calls to the following functions:
+
+@example
+;; Reset
+(setq spam-stat (make-hash-table :test 'equal))
+;; Learn spam
+(spam-stat-process-spam-directory "~/Mail/mail/spam")
+;; Learn non-spam
+(spam-stat-process-non-spam-directory "~/Mail/mail/misc")
+;; Save table
+(spam-stat-save)
+;; File size
+(nth 7 (file-attributes spam-stat-file))
+;; Number of words
+(hash-table-count spam-stat)
+;; Test spam
+(spam-stat-test-directory "~/Mail/mail/spam")
+;; Test non-spam
+(spam-stat-test-directory "~/Mail/mail/misc")
+;; Reduce table size
+(spam-stat-reduce-size)
+;; Save table
+(spam-stat-save)
+;; File size
+(nth 7 (file-attributes spam-stat-file))
+;; Number of words
+(hash-table-count spam-stat)
+;; Test spam
+(spam-stat-test-directory "~/Mail/mail/spam")
+;; Test non-spam
+(spam-stat-test-directory "~/Mail/mail/misc")
+@end example
+
+Here is how you would create your dictionary:
+
+@example
+;; Reset
+(setq spam-stat (make-hash-table :test 'equal))
+;; Learn spam
+(spam-stat-process-spam-directory "~/Mail/mail/spam")
+;; Learn non-spam
+(spam-stat-process-non-spam-directory "~/Mail/mail/misc")
+;; Repeat for any other non-spam group you need...
+;; Reduce table size
+(spam-stat-reduce-size)
+;; Save table
+(spam-stat-save)
+@end example
+
  @node Various Various
  @section Various Various
  @cindex mode lines
@@ -21308,7 +21980,8 @@ XEmacs is distributed as a collection of packages.  You should install
  whatever packages the Gnus XEmacs package requires.  The current
  requirements are @samp{gnus}, @samp{w3}, @samp{mh-e},
  @samp{mailcrypt}, @samp{rmail}, @samp{eterm}, @samp{mail-lib},
-@samp{xemacs-base}, and @samp{fsf-compat}.
+@samp{xemacs-base}, and @samp{fsf-compat}.  The @samp{misc-games}
+package is required for Morse decoding.
  
  
  @node History