-\f
-;;;; Training via Bogofilter. Last updated 2002-09-02.
-
-;;; See Paul Graham's article, at `http://www.paulgraham.com/spam.html'.
-
-;;; This page is for those who want to control spam with the help of Eric
-;;; Raymond's speedy Bogofilter (see http://www.tuxedo.org/~esr/bogofilter).
-;;; It has been tested with a locally patched copy of version 0.4.
-
-;;; Make sure Bogofilter is installed. Bogofilter internally uses Judy fast
-;;; associative arrays, so you need to install Judy first, and Bogofilter
-;;; next. Fetch both distributions by visiting the following links and
-;;; downloading the latest version of each:
-;;;
-;;; http://sourceforge.net/projects/judy/
-;;; http://www.tuxedo.org/~esr/bogofilter/
-;;;
-;;; Unpack the Judy distribution and enter its main directory. Then do:
-;;;
-;;; ./configure
-;;; make
-;;; make install
-;;;
-;;; You will likely need to become super-user for the last step. Then, unpack
-;;; the Bogofilter distribution, enter its main directory and do:
-;;;
-;;; make
-;;; make install
-;;;
-;;; Here as well, you need to become super-user for the last step. Now,
-;;; under your own identity, initialise your word lists:
-;;;
-;;; mkdir ~/.bogofilter
-;;; touch ~/.bogofilter/badlist
-;;; touch ~/.bogofilter/goodlist
-;;;
-;;; These two files are plain text files which you may edit, but normally
-;;; you will not need to!
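-;;;
-;;; By default, `spam-bogofilter-path' (defined below) is set to whatever
-;;; `bogofilter' program is found along your `exec-path'. If Bogofilter got
-;;; installed elsewhere, you might set it from your `.gnus' file. For
-;;; instance (the path shown is only an illustration, adjust it to your
-;;; installation):
-;;;
-;;; (setq spam-bogofilter-path "/usr/local/bin/bogofilter")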
-
-;;; The `M-d' command is added to Gnus summary mode. It marks the current
-;;; article as spam, showing it with the `H' mark. Whenever you see a spam
-;;; article, make sure to mark its summary line with `M-d' before leaving the
-;;; group. Some groups, listed in variable `spam-junk-mailgroups' below,
-;;; receive articles which Gnus splitting routed there on clues added by spam
-;;; recognisers. In these groups, at group entry, we tack an `H' mark onto
-;;; every summary line which would otherwise have no mark. Make sure to
-;;; _remove_ the `H' mark from any article which is _not_ genuine spam before
-;;; leaving such groups: you may use `M-u' to "unread" the article, or `d' to
-;;; declare it read the non-spam way. When you leave a group, all `H' marked
-;;; articles, saved or unsaved, are sent to Bogofilter, which studies them as
-;;; spam samples.
-
-;;; Messages may also be deleted in various other ways. Unless
-;;; `spam-ham-marks-form' gets overridden below, marks `R' and `r' for default
-;;; read or explicit delete, marks `X' and `K' for automatic or explicit
-;;; kills, as well as mark `Y' for low scores, are all considered to be
-;;; associated with articles which are not spam. This assumption might be
-;;; false, in particular if you use kill files or score files as a means of
-;;; detecting genuine spam; in that case, adjust `spam-ham-marks-form'. When
-;;; you leave a group, all _unsaved_ articles bearing any of the above marks
-;;; are sent to Bogofilter, which studies them as non-spam samples. If you
-;;; kill a lot explicitly, you might sometimes end up with articles marked
-;;; `K' which you never saw, and which might accidentally contain spam. It is
-;;; best to make sure that real spam is marked with `H', and nothing else.
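-;;;
-;;; For instance, if your kill files or score files already catch spam, you
-;;; might drop the kill-file mark `X' and the low-score mark `Y' from the
-;;; non-spam set. Here is a sketch for your `.gnus' file. It assumes that
-;;; `spam-ham-marks-form' holds a form which evaluates to a list of mark
-;;; characters, and it uses the standard Gnus mark variables:
-;;;
-;;; (setq spam-ham-marks-form
-;;;       ;; Keep `R', `r' and `K' as non-spam marks; leave out `X' and `Y'.
-;;;       '(list gnus-read-mark gnus-del-mark gnus-killed-mark))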
-
-;;; No other mark contributes to Bogofilter pre-conditioning. In particular,
-;;; ticked, dormant or souped articles are likely to contribute later, when
-;;; they get deleted for real, so there is no need to use them prematurely.
-;;; Explicitly expired articles do not contribute either: command `E' is a
-;;; way to get rid of an article without Bogofilter ever seeing it.
-
-;;; In short, provided you take a minimum of care to give the `H' mark to
-;;; spam articles only, Bogofilter training becomes fairly automatic. You
-;;; should keep at it until you have a few hundred articles in each category,
-;;; spam or not. The shell command `head -1 ~/.bogofilter/*' shows both
-;;; article counts. The command `S S' in summary mode, either for debugging
-;;; or out of curiosity, makes Bogofilter display in another buffer the
-;;; "spamicity" score of the current article (between 0.0 and 1.0), together
-;;; with the article words which contribute most significantly to the score.
-
-;;; The real way to use Bogofilter, however, is to have some tool like
-;;; `procmail' invoke it on message reception, then add some recognisable
-;;; header when spam is detected. Gnus splitting rules might later trip on
-;;; these added headers and react by sorting such articles into specific junk
-;;; folders as per `spam-junk-mailgroups'. Here are possible `.procmailrc'
-;;; contents (still untested -- please tell me how it goes):
-;;;
-;;; :0HBf:
-;;; * ? bogofilter
-;;; | formail -bfI "X-Spam-Status: Yes"
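-;;;
-;;; On the Gnus side, a split rule catching that header might look as
-;;; follows. This is only a sketch: the group names "spam" and "mail.misc"
-;;; are illustrations, and the junk group should also be listed in
-;;; `spam-junk-mailgroups' so that its articles get the `H' mark at entry.
-;;;
-;;; (setq nnmail-split-methods
-;;;       '(("spam" "^X-Spam-Status: Yes") ; header added by procmail above
-;;;         ("mail.misc" "")))             ; everything else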
-
-(defvar spam-output-buffer-name "*Bogofilter Output*"
-  "Name of the buffer used to display `bogofilter -v' output.")
-
-(defvar spam-spaminfo-header-regexp
-  ;; FIXME! In the following regexp, we should explain which tool produces
-  ;; which kind of header. I do not even remember them all by now. X-Junk
-  ;; (and previously X-NoSpam) are produced by the `NoSpam' tool, which has
-  ;; never been published, so it might not be reasonable to leave it in the
-  ;; list.
-  "^X-\\(jf\\|Junk\\|NoSpam\\|Spam\\|SB\\)[^:]*:"
-  "Regexp for spam markups in headers.
-Markup from spam recognisers, as well as `Xref', is to be removed from
-articles before they get registered by Bogofilter.")
-
-(defvar spam-bogofilter-path (executable-find "bogofilter")
-  "File path of the Bogofilter executable program.
-Set this variable to nil to inhibit the Bogofilter functionality.")
-
-(defun spam-check-bogofilter ()
-  ;; Dynamic spam check. I do not know how to check the exit status,
-  ;; so instead, read `bogofilter -v' output.
-  (when (and spam-use-bogofilter spam-bogofilter-path)
-    (spam-bogofilter-articles nil "-v" (list (gnus-summary-article-number)))
-    (when (with-current-buffer spam-output-buffer-name
-            (goto-char (point-min))
-            (re-search-forward "Spamicity: \\(0\\.9\\|1\\.0\\)" nil t))
-      spam-split-group)))
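-
-;;; A possible alternative to parsing `bogofilter -v' output, sketched here
-;;; and untested. Bogofilter also reports its verdict through its exit
-;;; status, 0 meaning spam and 1 meaning non-spam. The `procmail' recipe
-;;; above relies on this too; check the manual page of your version.
-;;; `call-process-region' returns that status, so a check on a buffer holding
-;;; the message could read as follows (the function name is hypothetical):
-;;;
-;;; (defun spam-bogofilter-spam-p ()
-;;;   "Return non-nil if Bogofilter classifies the current buffer as spam."
-;;;   ;; Feed the whole buffer to Bogofilter on its standard input, discard
-;;;   ;; its output (BUFFER argument nil), and keep only the exit status.
-;;;   (eq 0 (call-process-region (point-min) (point-max)
-;;;                              spam-bogofilter-path nil nil nil)))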