3 @setfilename emacs-mime
4 @settitle Emacs MIME Manual
10 * Emacs MIME: (emacs-mime). The MIME de/composition library.
15 @setchapternewpage odd
19 This file documents the Emacs MIME interface functionality.
21 Copyright (C) 1998, 1999, 2000, 2001, 2002 Free Software Foundation, Inc.
23 Permission is granted to copy, distribute and/or modify this document
24 under the terms of the GNU Free Documentation License, Version 1.1 or
25 any later version published by the Free Software Foundation; with no
26 Invariant Sections, with the Front-Cover texts being ``A GNU
27 Manual'', and with the Back-Cover Texts as in (a) below. A copy of the
28 license is included in the section entitled ``GNU Free Documentation
29 License'' in the Emacs manual.
31 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
32 this GNU Manual, like GNU software. Copies published by the Free
33 Software Foundation raise funds for GNU development.''
35 This document is part of a collection distributed under the GNU Free
36 Documentation License. If you want to distribute this document
37 separately from the collection, you can do so by adding a copy of the
38 license to the document, as described in section 6 of the license.
44 @title Emacs MIME Manual
46 @author by Lars Magne Ingebrigtsen
49 @vskip 0pt plus 1filll
50 Copyright @copyright{} 1998, 1999, 2000, 2001, 2002 Free Software
53 Permission is granted to copy, distribute and/or modify this document
54 under the terms of the GNU Free Documentation License, Version 1.1 or
55 any later version published by the Free Software Foundation; with the
56 Invariant Sections being none, with the Front-Cover texts being ``A GNU
57 Manual'', and with the Back-Cover Texts as in (a) below. A copy of the
58 license is included in the section entitled ``GNU Free Documentation
59 License'' in the Emacs manual.
61 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
62 this GNU Manual, like GNU software. Copies published by the Free
63 Software Foundation raise funds for GNU development.''
65 This document is part of a collection distributed under the GNU Free
66 Documentation License. If you want to distribute this document
67 separately from the collection, you can do so by adding a copy of the
68 license to the document, as described in section 6 of the license.
77 This manual documents the libraries used to compose and display
80 This manual is directed at users who want to modify the behaviour of
81 the MIME encoding/decoding process or want a more detailed picture of
82 how the Emacs MIME library works, and people who want to write
83 functions and commands that manipulate @sc{mime} elements.
85 @sc{mime} is short for @dfn{Multipurpose Internet Mail Extensions}.
86 This standard is documented in a number of RFCs; mainly RFC2045 (Format
87 of Internet Message Bodies), RFC2046 (Media Types), RFC2047 (Message
88 Header Extensions for Non-ASCII Text), RFC2048 (Registration
89 Procedures), RFC2049 (Conformance Criteria and Examples). It is highly
recommended that anyone who intends to write @sc{mime}-compliant software
91 read at least RFC2045 and RFC2047.
94 * Decoding and Viewing:: A framework for decoding and viewing.
95 * Composing:: MML; a language for describing @sc{mime} parts.
96 * Interface Functions:: An abstraction over the basic functions.
97 * Basic Functions:: Utility and basic parsing functions.
98 * Standards:: A summary of RFCs and working documents used.
99 * Index:: Function and variable index.
103 @node Decoding and Viewing
104 @chapter Decoding and Viewing
106 This chapter deals with decoding and viewing @sc{mime} messages on a
109 The main idea is to first analyze a @sc{mime} article, and then allow
110 other programs to do things based on the list of @dfn{handles} that are
111 returned as a result of this analysis.
114 * Dissection:: Analyzing a @sc{mime} message.
115 * Non-MIME:: Analyzing a non-@sc{mime} message.
116 * Handles:: Handle manipulations.
117 * Display:: Displaying handles.
118 * Display Customization:: Variables that affect display.
119 * New Viewers:: How to write your own viewers.
@code{mm-dissect-buffer} is the function responsible for dissecting
127 a @sc{mime} article. If given a multipart message, it will recursively
128 descend the message, following the structure, and return a tree of
129 @sc{mime} handles that describes the structure of the message.
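
As an illustration, the following sketch dissects a raw message held in
a string and reports its top-level media type. The name
@code{my-mime-top-level-type} is hypothetical, and the sketch assumes
that a multipart handle is a list whose first element is the media type
string, which this manual does not state. The non-@code{nil} argument
to @code{mm-dissect-buffer} asks it not to insist on a
@code{MIME-Version} header.

```elisp
(require 'mm-decode)

(defun my-mime-top-level-type (message-text)
  "Return the top-level media type of MESSAGE-TEXT as a string.
MESSAGE-TEXT holds a complete raw message, headers included."
  (with-temp-buffer
    (insert message-text)
    ;; The non-nil argument means a MIME-Version header is not required.
    (let ((handle (mm-dissect-buffer t)))
      (unwind-protect
          (if (stringp (car handle))
              ;; Multipart: the type string heads the list (assumption).
              (car handle)
            ;; Single part: take the type from the parsed Content-Type.
            (car (mm-handle-type handle)))
        ;; Release the buffers that hold the undecoded parts.
        (mm-destroy-parts handle)))))
```

Destroying the handles inside @code{unwind-protect} keeps the sketch
from leaking part buffers if an error occurs while inspecting them.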
134 Gnus also understands some non-@sc{mime} attachments, such as
135 postscript, uuencode, binhex, shar, forward, gnatsweb, pgp. Each of
these features can be disabled by adding an item to
137 @code{mm-uu-configure-list}. For example,
141 (add-to-list 'mm-uu-configure-list '(pgp-signed . disabled))
163 Non-@sc{mime} forwarded message.
171 PGP signed clear text.
174 @findex pgp-encrypted
175 PGP encrypted clear text.
182 @findex emacs-sources
183 Emacs source code. This item works only in the groups matching
184 @code{mm-uu-emacs-sources-regexp}.
191 A @sc{mime} handle is a list that fully describes a @sc{mime}
194 The following macros can be used to access elements in a handle:
197 @item mm-handle-buffer
198 @findex mm-handle-buffer
199 Return the buffer that holds the contents of the undecoded @sc{mime}
203 @findex mm-handle-type
204 Return the parsed @code{Content-Type} of the part.
206 @item mm-handle-encoding
207 @findex mm-handle-encoding
208 Return the @code{Content-Transfer-Encoding} of the part.
210 @item mm-handle-undisplayer
211 @findex mm-handle-undisplayer
212 Return the object that can be used to remove the displayed part (if it
215 @item mm-handle-set-undisplayer
216 @findex mm-handle-set-undisplayer
217 Set the undisplayer object.
219 @item mm-handle-disposition
220 @findex mm-handle-disposition
221 Return the parsed @code{Content-Disposition} of the part.
@item mm-handle-description
@findex mm-handle-description
225 Return the description of the part.
227 @item mm-get-content-id
228 Returns the handle(s) referred to by @code{Content-ID}.
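
The accessors above can be combined to walk a handle tree. The sketch
below collects the media type and transfer encoding of every leaf part.
@code{my-mime-leaf-summary} is a hypothetical name, and the code
assumes, beyond what this manual states, that a multipart handle is a
list whose first element is the media type string and whose remaining
elements are child handles.

```elisp
(require 'mm-decode)

(defun my-mime-leaf-summary (handle)
  "Return a list of (TYPE . ENCODING) pairs for each leaf under HANDLE.
TYPE is a media type string; ENCODING is a symbol or nil."
  (cond
   ;; Multipart (assumption): the car is the type string and the cdr
   ;; holds the child handles, so recurse into each child.
   ((stringp (car handle))
    (apply #'append (mapcar #'my-mime-leaf-summary (cdr handle))))
   ;; Single part: read the parsed Content-Type and the
   ;; Content-Transfer-Encoding through the documented accessors.
   (t
    (list (cons (car (mm-handle-type handle))
                (mm-handle-encoding handle))))))
```

The function only reads from the handles, so the caller remains
responsible for freeing them once it is done.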
236 Functions for displaying, removing and saving.
239 @item mm-display-part
240 @findex mm-display-part
244 @findex mm-remove-part
245 Remove the part (if it has been displayed).
248 @findex mm-inlinable-p
249 Say whether a @sc{mime} type can be displayed inline.
251 @item mm-automatic-display-p
252 @findex mm-automatic-display-p
253 Say whether a @sc{mime} type should be displayed automatically.
255 @item mm-destroy-part
256 @findex mm-destroy-part
257 Free all resources occupied by a part.
261 Offer to save the part in a file.
265 Offer to pipe the part to some process.
267 @item mm-interactively-view-part
268 @findex mm-interactively-view-part
269 Prompt for a mailcap method to use to view the part.
274 @node Display Customization
275 @section Display Customization
279 @item mm-inline-media-tests
280 This is an alist where the key is a @sc{mime} type, the second element
281 is a function to display the part @dfn{inline} (i.e., inside Emacs), and
282 the third element is a form to be @code{eval}ed to say whether the part
283 can be displayed inline.
285 This variable specifies whether a part @emph{can} be displayed inline,
286 and, if so, how to do it. It does not say whether parts are
287 @emph{actually} displayed inline.
289 @item mm-inlined-types
290 This, on the other hand, says what types are to be displayed inline, if
291 they satisfy the conditions set by the variable above. It's a list of
292 @sc{mime} media types.
294 @item mm-automatic-display
295 This is a list of types that are to be displayed ``automatically'', but
296 only if the above variable allows it. That is, only inlinable parts can
297 be displayed automatically.
299 @item mm-attachment-override-types
300 Some @sc{mime} agents create parts that have a content-disposition of
301 @samp{attachment}. This variable allows overriding that disposition and
302 displaying the part inline. (Note that the disposition is only
303 overridden if we are able to, and want to, display the part inline.)
305 @item mm-discouraged-alternatives
306 List of @sc{mime} types that are discouraged when viewing
307 @samp{multipart/alternative}. Viewing agents are supposed to view the
308 last possible part of a message, as that is supposed to be the richest.
309 However, users may prefer other types instead, and this list says what
310 types are most unwanted. If, for instance, @samp{text/html} parts are
very unwanted, and @samp{text/richtext} parts are somewhat unwanted,
312 you could say something like:
(setq mm-discouraged-alternatives
      '("text/html" "text/richtext")
      mm-automatic-display
      (remove "text/html" mm-automatic-display))
@item mm-inline-large-images
322 When displaying inline images that are larger than the window, XEmacs
323 does not enable scrolling, which means that you cannot see the whole
324 image. To prevent this, the library tries to determine the image size
325 before displaying it inline, and if it doesn't fit the window, the
326 library will display it externally (e.g. with @samp{ImageMagick} or
327 @samp{xv}). Setting this variable to @code{t} disables this check and
328 makes the library display all inline images as inline, regardless of
@item mm-inline-override-types
332 @code{mm-inlined-types} may include regular expressions, for example to
333 specify that all @samp{text/.*} parts be displayed inline. If a user
334 prefers to have a type that matches such a regular expression be treated
335 as an attachment, that can be accomplished by setting this variable to a
336 list containing that type. For example assuming @code{mm-inlined-types}
337 includes @samp{text/.*}, then including @samp{text/html} in this
338 variable will cause @samp{text/html} parts to be treated as attachments.
340 @item mm-inline-text-html-renderer
341 This selects the function used to render @sc{html}. The predefined
342 renderers are selected by the symbols @code{w3},
343 @code{w3m}@footnote{See @uref{http://emacs-w3m.namazu.org/} for more
344 information about emacs-w3m}, @code{links}, @code{lynx} or
345 @code{html2text}. You can also specify a function, which will be
346 called with a @sc{mime} handle as the argument.
348 @item mm-inline-text-html-with-images
Some @sc{html} mails contain @samp{<img>} tags, a trick used by
spammers. Such tags are likely intended to verify whether you
have read the mail. You can prevent your personal information from
leaking by setting this option to @code{nil} (which is the default).
353 It is currently ignored by Emacs/w3. For emacs-w3m, you may use the
354 command @kbd{t} on the image anchor to show an image even if it is
355 @code{nil}.@footnote{The command @kbd{T} will load all images. If you
356 have set the option @code{w3m-key-binding} to @code{info}, use @kbd{i}
359 @item mm-inline-text-html-with-w3m-keymap
360 You can use emacs-w3m command keys in the inlined text/html part by
361 setting this option to non-@code{nil}. The default value is @code{t}.
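
A minimal configuration sketch that combines several of the variables
above might look as follows. The particular values are illustrative
assumptions, not recommendations. In particular, the @samp{image/.*}
pattern is a hypothetical choice.

```elisp
(require 'mm-decode)

;; In multipart/alternative, avoid the HTML and richtext variants
;; when another alternative is available.
(setq mm-discouraged-alternatives '("text/html" "text/richtext"))

;; Do not display images referenced by <img> tags in HTML mail
;; (this is the documented default, stated here explicitly).
(setq mm-inline-text-html-with-images nil)

;; Show images inline even when the sender marked them as
;; attachments (illustrative pattern; applies only to inlinable parts).
(setq mm-attachment-override-types '("image/.*"))
```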
369 Here's an example viewer for displaying @code{text/enriched} inline:
(defun mm-display-enriched-inline (handle)
  (let (text)
    (with-temp-buffer
      (mm-insert-part handle)
      (save-window-excursion
        (enriched-decode (point-min) (point-max))
        (setq text (buffer-string))))
    (mm-insert-inline handle text)))
382 We see that the function takes a @sc{mime} handle as its parameter. It
383 then goes to a temporary buffer, inserts the text of the part, does some
384 work on the text, stores the result, goes back to the buffer it was
385 called from and inserts the result.
387 The two important helper functions here are @code{mm-insert-part} and
388 @code{mm-insert-inline}. The first function inserts the text of the
389 handle in the current buffer. It handles charset and/or content
390 transfer decoding. The second function just inserts whatever text you
391 tell it to insert, but it also sets things up so that the text can be
``undisplayed'' in a convenient manner.
398 @cindex MIME Composing
400 @cindex MIME Meta Language
402 Creating a @sc{mime} message is boring and non-trivial. Therefore, a
403 library called @code{mml} has been defined that parses a language called
404 MML (@sc{mime} Meta Language) and generates @sc{mime} messages.
406 @findex mml-generate-mime
407 The main interface function is @code{mml-generate-mime}. It will
408 examine the contents of the current (narrowed-to) buffer and return a
409 string containing the @sc{mime} message.
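
The following sketch wraps @code{mml-generate-mime} so that it can be
applied to MML text held in a string. The helper name
@code{my-mml-to-mime-string} is hypothetical. Loading @file{message}
is a precaution, on the assumption that the generator consults
variables defined there.

```elisp
(require 'mml)
(require 'message)

(defun my-mml-to-mime-string (mml-text)
  "Return the MIME rendering of MML-TEXT, a string of MML markup."
  (with-temp-buffer
    ;; `mml-generate-mime' examines the current buffer, so place the
    ;; MML text in a scratch buffer first.
    (insert mml-text)
    (mml-generate-mime)))
```

Because the function works on a temporary buffer, it leaves the buffer
it is called from untouched.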
412 * Simple MML Example:: An example MML document.
413 * MML Definition:: All valid MML elements.
414 * Advanced MML Example:: Another example MML document.
415 * Encoding Customization:: Variables that affect encoding.
416 * Charset Translation:: How charsets are mapped from @sc{mule} to @sc{mime}.
417 * Conversion:: Going from @sc{mime} to MML and vice versa.
418 * Flowed text:: Soft and hard newlines.
422 @node Simple MML Example
423 @section Simple MML Example
425 Here's a simple @samp{multipart/alternative}:
<#multipart type=alternative>
This is a plain text part.
<#part type=text/enriched>
<center>This is a centered enriched part</center>
<#/multipart>
435 After running this through @code{mml-generate-mime}, we get this:
438 Content-Type: multipart/alternative; boundary="=-=-="
444 This is a plain text part.
447 Content-Type: text/enriched
450 <center>This is a centered enriched part</center>
457 @section MML Definition
459 The MML language is very simple. It looks a bit like an SGML
460 application, but it's not.
462 The main concept of MML is the @dfn{part}. Each part can be of a
463 different type or use a different charset. The way to delineate a part
464 is with a @samp{<#part ...>} tag. Multipart parts can be introduced
465 with the @samp{<#multipart ...>} tag. Parts are ended by the
466 @samp{<#/part>} or @samp{<#/multipart>} tags. Parts started with the
467 @samp{<#part ...>} tags are also closed by the next open tag.
There's also the @samp{<#external ...>} tag. It introduces
@samp{message/external-body} parts.
Each tag can contain zero or more parameters of the form
473 @samp{parameter=value}. The values may be enclosed in quotation marks,
474 but that's not necessary unless the value contains white space. So
475 @samp{filename=/home/user/#hello$^yes} is perfectly valid.
477 The following parameters have meaning in MML; parameters that have no
478 meaning are ignored. The MML parameter names are the same as the
479 @sc{mime} parameter names; the things in the parentheses say which
480 header it will be used in.
484 The @sc{mime} type of the part (@code{Content-Type}).
487 Use the contents of the file in the body of the part
488 (@code{Content-Disposition}).
491 The contents of the body of the part are to be encoded in the character
set specified (@code{Content-Type}). @xref{Charset Translation}.
495 Might be used to suggest a file name if the part is to be saved
496 to a file (@code{Content-Type}).
499 Valid values are @samp{inline} and @samp{attachment}
500 (@code{Content-Disposition}).
503 Valid values are @samp{7bit}, @samp{8bit}, @samp{quoted-printable} and
504 @samp{base64} (@code{Content-Transfer-Encoding}). @xref{Charset
508 A description of the part (@code{Content-Description}).
511 RFC822 date when the part was created (@code{Content-Disposition}).
513 @item modification-date
514 RFC822 date when the part was modified (@code{Content-Disposition}).
517 RFC822 date when the part was read (@code{Content-Disposition}).
520 Who to encrypt/sign the part to. This field is used to override any
521 auto-detection based on the To/CC headers.
524 The size (in octets) of the part (@code{Content-Disposition}).
527 What technology to sign this MML part with (@code{smime}, @code{pgp}
531 What technology to encrypt this MML part with (@code{smime},
532 @code{pgp} or @code{pgpmime})
536 Parameters for @samp{application/octet-stream}:
540 Type of the part; informal---meant for human readers
541 (@code{Content-Type}).
544 Parameters for @samp{message/external-body}:
548 A word indicating the supported access mechanism by which the file may
549 be obtained. Values include @samp{ftp}, @samp{anon-ftp}, @samp{tftp},
550 @samp{localfile}, and @samp{mailserver}. (@code{Content-Type}.)
553 The RFC822 date after which the file may no longer be fetched.
554 (@code{Content-Type}.)
557 The size (in octets) of the file. (@code{Content-Type}.)
560 Valid values are @samp{read} and @samp{read-write}
561 (@code{Content-Type}).
565 Parameters for @samp{sign=smime}:
570 File containing key and certificate for signer.
574 Parameters for @samp{encrypt=smime}:
579 File containing certificate for recipient.
584 @node Advanced MML Example
585 @section Advanced MML Example
587 Here's a complex multipart message. It's a @samp{multipart/mixed} that
588 contains many parts, one of which is a @samp{multipart/alternative}.
<#multipart type=mixed>
<#part type=image/jpeg filename=~/rms.jpg disposition=inline>
<#multipart type=alternative>
This is a plain text part.
<#part type=text/enriched name=enriched.txt>
<center>This is a centered enriched part</center>
<#/multipart>
This is a new plain text part.
<#part disposition=attachment>
This plain text part is an attachment.
<#/multipart>
604 And this is the resulting @sc{mime} message:
607 Content-Type: multipart/mixed; boundary="=-=-="
615 Content-Type: image/jpeg;
617 Content-Disposition: inline;
619 Content-Transfer-Encoding: base64
621 /9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRof
622 Hh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/wAALCAAwADABAREA/8QAHwAA
623 AQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQR
624 BRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RF
625 RkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ip
626 qrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/9oACAEB
627 AAA/AO/rifFHjldNuGsrDa0qcSSHkA+gHrXKw+LtWLrMb+RgTyhbr+HSug07xNqV9fQtZrNI
628 AyiaE/NuBPOOOP0rvRNE880KOC8TbXXGCv1FPqjrF4LDR7u5L7SkTFT/ALWOP1xXgTuXfc7E
629 sx6nua6rwp4IvvEM8chCxWxOdzn7wz6V9AaB4S07w9p5itow0rDLSY5Pt9K43xO66P4xs71m
630 2QXiGCbA4yOVJ9+1aYORkdK434lyNH4ahCnG66VT9Nj15JFbPdX0MS43M4VQf5/yr2vSpLnw
631 5ZW8dlCZ8KFXjOPX0/mK6rSPEGt3Angu44fNEReHYNvIH3TzXDeKNO8RX+kSX2ouZkicTIOc
632 L+g7E810ulFjpVtv3bwgB3HJyK5L4quY/C9sVxk3ij/xx6850u7t1mtp/wDlpEw3An3Jr3Dw
633 34gsbWza4nBlhC5LDsaW6+IFgupQyCF3iHH7gA7c9R9ay7zx6t7aX9jHC4smhfBkGCvHGfrm
634 tLQ7hbnRrV1GPkAP1x1/Hr+Ncr8Vzjwrbf8AX6v/AKA9eQRyYlQk8Yx9K6XTNbkgia2ciSIn
635 7p5Ga9Atte0LTLKO6it4i7dVRFJDcZ4PvXN+JvEMF9bILVGXJLSZ4zkjivRPDaeX4b08HOTC
636 pOffmua+KkbS+GLVUGT9tT/0B68eeIpIFYjB70+OOVXyoOM9+M1eaWeCLzHPyHGO/NVWvJJm
637 jQ8KGH1NfQWhXSXmh2c8eArRLwO3HSv/2Q==
640 Content-Type: multipart/alternative; boundary="==-=-="
646 This is a plain text part.
649 Content-Type: text/enriched;
653 <center>This is a centered enriched part</center>
659 This is a new plain text part.
662 Content-Disposition: attachment
665 This plain text part is an attachment.
670 @node Encoding Customization
671 @section Encoding Customization
675 @item mm-body-charset-encoding-alist
676 @vindex mm-body-charset-encoding-alist
Mapping from MIME charsets to the encodings to use. This mapping
applies unless other requirements force a specific encoding (for
instance, digitally signed messages require 7bit encodings). The
680 default is @code{((iso-2022-jp . 7bit) (iso-2022-jp-2 . 7bit))}. As
681 an example, if you do not want to have ISO-8859-1 characters
682 quoted-printable encoded, you may add @code{(iso-8859-1 . 8bit)} to
683 this variable. You can override this setting on a per-message basis
684 by using the @code{encoding} MML tag (@pxref{MML Definition}).
686 @item mm-coding-system-priorities
687 @vindex mm-coding-system-priorities
Prioritize coding systems to use for outgoing messages. The default
is @code{nil}, which means to use the defaults in Emacs. It is a list of
coding system symbols (aliases of coding systems do not work; use
@kbd{M-x describe-coding-system} to make sure you are not specifying
an alias in this variable). For example, if you have configured Emacs
to prefer UTF-8, but want outgoing messages to be sent in
ISO-8859-1 if possible, you can set this variable to
@code{(iso-latin-1)}.
697 @item mm-content-transfer-encoding-defaults
698 @vindex mm-content-transfer-encoding-defaults
Mapping from MIME types to the encodings to use. This mapping applies
unless other requirements force a safer encoding (for instance,
digitally signed messages require 7bit encoding). Besides the normal
702 MIME encodings, @code{qp-or-base64} may be used to indicate that for
703 each case the most efficient of quoted-printable and base64 should be
704 used. You can override this setting on a per-message basis by using
705 the @code{encoding} MML tag (@pxref{MML Definition}).
707 @item mm-use-ultra-safe-encoding
708 @vindex mm-use-ultra-safe-encoding
When this is non-@code{nil}, textual parts are encoded as
quoted-printable if they contain lines longer than 76 characters or
lines starting with "From " in the body. Non-7bit encodings (8bit, binary)
are generally disallowed. This reduces the probability that a
non-8bit-clean MTA or MDA changes the message. This variable should
never be set directly, but bound by other functions when necessary (e.g., when
encoding messages that are to be digitally signed).
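
As a sketch of how the first two variables might be set in an init
file, the fragment below uses the example values mentioned in their
descriptions. It assumes that the variables are defined once
@file{mm-bodies} has been loaded.

```elisp
(require 'mm-bodies)

;; Send ISO-8859-1 text as 8bit instead of quoted-printable.
(add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))

;; Prefer ISO-8859-1 for outgoing messages whenever the text fits,
;; even if Emacs itself is configured to prefer UTF-8.
(setq mm-coding-system-priorities '(iso-latin-1))
```

Using @code{add-to-list} rather than @code{setq} for the alist keeps
the default entries for the ISO-2022-JP charsets in place.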
719 @node Charset Translation
720 @section Charset Translation
723 During translation from MML to @sc{mime}, for each @sc{mime} part which
724 has been composed inside Emacs, an appropriate charset has to be chosen.
726 @vindex mail-parse-charset
727 If you are running a non-@sc{mule} Emacs, this process is simple: If the
728 part contains any non-ASCII (8-bit) characters, the @sc{mime} charset
729 given by @code{mail-parse-charset} (a symbol) is used. (Never set this
730 variable directly, though. If you want to change the default charset,
731 please consult the documentation of the package which you use to process
733 @xref{Various Message Variables, , Various Message Variables, message,
734 Message Manual}, for example.)
735 If there are only ASCII characters, the @sc{mime} charset US-ASCII is
741 @vindex mm-mime-mule-charset-alist
742 Things are slightly more complicated when running Emacs with @sc{mule}
743 support. In this case, a list of the @sc{mule} charsets used in the
744 part is obtained, and the @sc{mule} charsets are translated to @sc{mime}
745 charsets by consulting the variable @code{mm-mime-mule-charset-alist}.
746 If this results in a single @sc{mime} charset, this is used to encode
747 the part. But if the resulting list of @sc{mime} charsets contains more
748 than one element, two things can happen: If it is possible to encode the
749 part via UTF-8, this charset is used. (For this, Emacs must support
750 the @code{utf-8} coding system, and the part must consist entirely of
751 characters which have Unicode counterparts.) If UTF-8 is not available
752 for some reason, the part is split into several ones, so that each one
753 can be encoded with a single @sc{mime} charset. The part can only be
754 split at line boundaries, though---if more than one @sc{mime} charset is
755 required to encode a single line, it is not possible to encode the part.
When running Emacs with @sc{mule} support, the preferences for which
coding system to use are inherited from Emacs itself. This means that
759 if Emacs is set up to prefer UTF-8, it will be used when encoding
760 messages. You can modify this by altering the
761 @code{mm-coding-system-priorities} variable though (@pxref{Encoding
The charset to be used can be overridden by setting the @code{charset}
765 MML tag (@pxref{MML Definition}) when composing the message.
767 The encoding of characters (quoted-printable, 8bit etc) is orthogonal
768 to the discussion here, and is controlled by the variables
769 @code{mm-body-charset-encoding-alist} and
770 @code{mm-content-transfer-encoding-defaults} (@pxref{Encoding
777 A (multipart) @sc{mime} message can be converted to MML with the
778 @code{mime-to-mml} function. It works on the message in the current
779 buffer, and substitutes MML markup for @sc{mime} boundaries.
780 Non-textual parts do not have their contents in the buffer, but instead
781 have the contents in separate buffers that are referred to from the MML
785 An MML message can be converted back to @sc{mime} by the
786 @code{mml-to-mime} function.
These functions are in certain senses ``lossy'': you will not get back
an identical message if you run @code{mime-to-mml} and then
@code{mml-to-mime}. Not only will trivial things like the order of the
headers differ, but the contents of the headers may also be different.
For instance, the original message may use base64 encoding on text,
while @code{mml-to-mime} may decide to use quoted-printable encoding, and
796 In essence, however, these two functions should be the inverse of each
797 other. The resulting contents of the message should remain equivalent,
803 @cindex format=flowed
805 The Emacs @sc{mime} library will respect the @code{use-hard-newlines}
806 variable (@pxref{Hard and Soft Newlines, ,Hard and Soft Newlines,
807 emacs, Emacs Manual}) when encoding a message, and the
808 ``format=flowed'' Content-Type parameter when decoding a message.
810 On encoding text, lines terminated by soft newline characters are
811 filled together and wrapped after the column decided by
812 @code{fill-flowed-encode-column}. This variable controls how the text
will look in a client that does not support flowed text; the default
is to wrap after 66 characters. If hard newline characters are not
815 present in the buffer, no flow encoding occurs.
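
For instance, the encoding column could be changed with a fragment like
the one below. The value 72 is an arbitrary illustration, and the
fragment assumes that the variable is defined by the @file{flow-fill}
library.

```elisp
(require 'flow-fill)

;; Wrap flowed text after column 72 instead of the default 66 when
;; encoding; this only affects how non-flowed clients see the text.
(setq fill-flowed-encode-column 72)
```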
817 On decoding flowed text, lines with soft newline characters are filled
818 together and wrapped after the column decided by
819 @code{fill-flowed-display-column}. The default is to wrap after
825 @node Interface Functions
826 @chapter Interface Functions
827 @cindex interface functions
830 The @code{mail-parse} library is an abstraction over the actual
831 low-level libraries that are described in the next chapter.
833 Standards change, and so programs have to change to fit in the new
834 mold. For instance, RFC2045 describes a syntax for the
835 @code{Content-Type} header that only allows ASCII characters in the
836 parameter list. RFC2231 expands on RFC2045 syntax to provide a scheme
837 for continuation headers and non-ASCII characters.
839 The traditional way to deal with this is just to update the library
840 functions to parse the new syntax. However, this is sometimes the wrong
841 thing to do. In some instances it may be vital to be able to understand
842 both the old syntax as well as the new syntax, and if there is only one
843 library, one must choose between the old version of the library and the
844 new version of the library.
846 The Emacs @sc{mime} library takes a different tack. It defines a
847 series of low-level libraries (@file{rfc2047.el}, @file{rfc2231.el}
and so on) that parse strictly according to the corresponding
849 standard. However, normal programs would not use the functions
850 provided by these libraries directly, but instead use the functions
851 provided by the @code{mail-parse} library. The functions in this
852 library are just aliases to the corresponding functions in the latest
853 low-level libraries. Using this scheme, programs get a consistent
interface they can use, and library developers are free to
write code that handles new standards.
857 The following functions are defined by this library:
860 @item mail-header-parse-content-type
861 @findex mail-header-parse-content-type
Parse a @code{Content-Type} header and return a list in the following
867 (attribute1 . value1)
868 (attribute2 . value2)
875 (mail-header-parse-content-type
876 "image/gif; name=\"b980912.gif\"")
877 @result{} ("image/gif" (name . "b980912.gif"))
880 @item mail-header-parse-content-disposition
881 @findex mail-header-parse-content-disposition
Parse a @code{Content-Disposition} header and return a list in the same
format as the function above.
885 @item mail-content-type-get
886 @findex mail-content-type-get
Takes two parameters: a list in the format above, and an attribute.
888 Returns the value of the attribute.
891 (mail-content-type-get
892 '("image/gif" (name . "b980912.gif")) 'name)
893 @result{} "b980912.gif"
896 @item mail-header-encode-parameter
897 @findex mail-header-encode-parameter
898 Takes a parameter string and returns an encoded version of the string.
899 This is used for parameters in headers like @code{Content-Type} and
900 @code{Content-Disposition}.
902 @item mail-header-remove-comments
903 @findex mail-header-remove-comments
904 Return a comment-free version of a header.
907 (mail-header-remove-comments
908 "Gnus/5.070027 (Pterodactyl Gnus v0.27) (Finnish Landrace)")
909 @result{} "Gnus/5.070027 "
912 @item mail-header-remove-whitespace
913 @findex mail-header-remove-whitespace
914 Remove linear white space from a header. Space inside quoted strings
915 and comments is preserved.
918 (mail-header-remove-whitespace
919 "image/gif; name=\"Name with spaces\"")
920 @result{} "image/gif;name=\"Name with spaces\""
923 @item mail-header-get-comment
924 @findex mail-header-get-comment
925 Return the last comment in a header.
928 (mail-header-get-comment
929 "Gnus/5.070027 (Pterodactyl Gnus v0.27) (Finnish Landrace)")
930 @result{} "Finnish Landrace"
933 @item mail-header-parse-address
934 @findex mail-header-parse-address
935 Parse an address and return a list containing the mailbox and the
939 (mail-header-parse-address
940 "Hrvoje Niksic <hniksic@@srce.hr>")
941 @result{} ("hniksic@@srce.hr" . "Hrvoje Niksic")
944 @item mail-header-parse-addresses
945 @findex mail-header-parse-addresses
Parse a string with a list of addresses and return a list of elements like
947 the one described above.
950 (mail-header-parse-addresses
951 "Hrvoje Niksic <hniksic@@srce.hr>, Steinar Bang <sb@@metis.no>")
952 @result{} (("hniksic@@srce.hr" . "Hrvoje Niksic")
953 ("sb@@metis.no" . "Steinar Bang"))
956 @item mail-header-parse-date
957 @findex mail-header-parse-date
958 Parse a date string and return an Emacs time structure.
960 @item mail-narrow-to-head
961 @findex mail-narrow-to-head
962 Narrow the buffer to the header section of the buffer. Point is placed
963 at the beginning of the narrowed buffer.
965 @item mail-header-narrow-to-field
966 @findex mail-header-narrow-to-field
967 Narrow the buffer to the header under point. Understands continuation
970 @item mail-header-fold-field
971 @findex mail-header-fold-field
972 Fold the header under point.
974 @item mail-header-unfold-field
975 @findex mail-header-unfold-field
976 Unfold the header under point.
978 @item mail-header-field-value
979 @findex mail-header-field-value
980 Return the value of the field under point.
982 @item mail-encode-encoded-word-region
983 @findex mail-encode-encoded-word-region
984 Encode the non-ASCII words in the region. For instance,
985 @samp{Naïve} is encoded as @samp{=?iso-8859-1?q?Na=EFve?=}.
987 @item mail-encode-encoded-word-buffer
988 @findex mail-encode-encoded-word-buffer
989 Encode the non-ASCII words in the current buffer. This function is
990 meant to be called narrowed to the headers of a message.
992 @item mail-encode-encoded-word-string
993 @findex mail-encode-encoded-word-string
994 Encode the words that need encoding in a string, and return the result.
997 (mail-encode-encoded-word-string
998 "This is naïve, baby")
999 @result{} "This is =?iso-8859-1?q?na=EFve,?= baby"
1002 @item mail-decode-encoded-word-region
1003 @findex mail-decode-encoded-word-region
1004 Decode the encoded words in the region.
1006 @item mail-decode-encoded-word-string
1007 @findex mail-decode-encoded-word-string
1008 Decode the encoded words in the string and return the result.
1011 (mail-decode-encoded-word-string
1012 "This is =?iso-8859-1?q?na=EFve,?= baby")
1013 @result{} "This is naïve, baby"
1018 Currently, @code{mail-parse} is an abstraction over @code{ietf-drums},
1019 @code{rfc2047}, @code{rfc2045} and @code{rfc2231}. These are documented
1020 in the subsequent sections.
1024 @node Basic Functions
1025 @chapter Basic Functions
1027 This chapter describes the basic, ground-level functions for parsing and
handling. Covered here are parsing @code{From} lines, removing comments
1029 from header lines, decoding encoded words, parsing date headers and so
1030 on. High-level functionality is dealt with in the next chapter
1031 (@pxref{Decoding and Viewing}).
1034 * rfc2045:: Encoding @code{Content-Type} headers.
1035 * rfc2231:: Parsing @code{Content-Type} headers.
1036 * ietf-drums:: Handling mail headers defined by RFC822bis.
1037 * rfc2047:: En/decoding encoded words in headers.
1038 * time-date:: Functions for parsing dates and manipulating time.
1039 * qp:: Quoted-Printable en/decoding.
1040 * base64:: Base64 en/decoding.
1041 * binhex:: Binhex decoding.
1042 * uudecode:: Uuencode decoding.
1043 * rfc1843:: Decoding HZ-encoded text.
1044 * mailcap:: How parts are displayed is specified by the @file{.mailcap} file
1051 RFC2045 is the ``main'' @sc{mime} document, and as such, one would
1052 imagine that there would be a lot to implement. But there isn't, since
1053 most of the implementation details are delegated to the subsequent
1056 So @file{rfc2045.el} has only a single function:
1059 @item rfc2045-encode-string
1060 @findex rfc2045-encode-string
1061 Takes a parameter and a value and returns a @samp{PARAM=VALUE} string.
1062 @var{value} will be quoted if there are non-safe characters in it.
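As an illustration, the quoting behavior can be sketched as follows.
The parameter names and values are made up for the example, and the
results are what one would expect from the description above rather
than output quoted from the library itself.

@example
(require 'rfc2045)

;; A value made up only of safe characters is passed through as is.
(rfc2045-encode-string "charset" "us-ascii")
@result{} "charset=us-ascii"

;; A value containing a space is expected to be quoted.
(rfc2045-encode-string "name" "my file.txt")
@result{} "name=\"my file.txt\""
@end example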
1069 RFC2231 defines a syntax for the @code{Content-Type} and
1070 @code{Content-Disposition} headers. Its snappy name is @dfn{MIME
1071 Parameter Value and Encoded Word Extensions: Character Sets, Languages,
1074 In short, these headers look something like this:
1077 Content-Type: application/x-stuff;
1078 title*0*=us-ascii'en'This%20is%20even%20more%20;
1079 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
1083 They usually aren't this bad, though.
1085 The following functions are defined by this library:
1088 @item rfc2231-parse-string
1089 @findex rfc2231-parse-string
1090 Parse a @code{Content-Type} header and return a list describing its
1094 (rfc2231-parse-string
1095 "application/x-stuff;
1096 title*0*=us-ascii'en'This%20is%20even%20more%20;
1097 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
1098 title*2=\"isn't it!\"")
1099 @result{} ("application/x-stuff"
1100 (title . "This is even more ***fun*** isn't it!"))
1103 @item rfc2231-get-value
1104 @findex rfc2231-get-value
Takes one of the lists in the format above and returns
1106 the value of the specified attribute.
1108 @item rfc2231-encode-string
1109 @findex rfc2231-encode-string
Encode a parameter in headers like @code{Content-Type} and
1111 @code{Content-Disposition}.
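A small sketch of how the parsed list might be queried with
@code{rfc2231-get-value}. The header value here is invented for the
example, and the attribute is assumed to be named by a symbol, as in
the parsed list shown earlier.

@example
(require 'rfc2231)

;; Look up an attribute in a list of the shape rfc2231-parse-string returns.
(rfc2231-get-value '("text/plain" (charset . "iso-8859-1")) 'charset)
@result{} "iso-8859-1"

;; The same lookup on a freshly parsed header value.
(rfc2231-get-value
 (rfc2231-parse-string "text/plain; charset=iso-8859-1")
 'charset)
@result{} "iso-8859-1"

;; An attribute that is not present yields nil.
(rfc2231-get-value '("text/plain" (charset . "iso-8859-1")) 'format)
@result{} nil
@end example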
1119 @dfn{drums} is an IETF working group that is working on the replacement
1122 The functions provided by this library include:
1125 @item ietf-drums-remove-comments
1126 @findex ietf-drums-remove-comments
1127 Remove the comments from the argument and return the results.
1129 @item ietf-drums-remove-whitespace
1130 @findex ietf-drums-remove-whitespace
1131 Remove linear white space from the string and return the results.
1132 Spaces inside quoted strings and comments are left untouched.
1134 @item ietf-drums-get-comment
1135 @findex ietf-drums-get-comment
Return the last comment from the string.
1138 @item ietf-drums-parse-address
1139 @findex ietf-drums-parse-address
1140 Parse an address string and return a list that contains the mailbox and
1141 the plain text name.
1143 @item ietf-drums-parse-addresses
1144 @findex ietf-drums-parse-addresses
1145 Parse a string that contains any number of comma-separated addresses and
1146 return a list that contains mailbox/plain text pairs.
1148 @item ietf-drums-parse-date
1149 @findex ietf-drums-parse-date
1150 Parse a date string and return an Emacs time structure.
1152 @item ietf-drums-narrow-to-header
1153 @findex ietf-drums-narrow-to-header
1154 Narrow the buffer to the header section of the current buffer.
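The following sketch shows how two of these functions might be called.
The address reuses one from the @code{mail-header-parse-addresses}
example, the comment text is invented, and the results are what the
descriptions above lead one to expect.

@example
(require 'ietf-drums)

;; A single address is split into a mailbox and a plain text name.
(ietf-drums-parse-address "Hrvoje Niksic <hniksic@@srce.hr>")
@result{} ("hniksic@@srce.hr" . "Hrvoje Niksic")

;; The text of the parenthesized comment is returned without the parens.
(ietf-drums-get-comment "hniksic@@srce.hr (Hrvoje Niksic)")
@result{} "Hrvoje Niksic"
@end example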
1162 RFC2047 (Message Header Extensions for Non-ASCII Text) specifies how
non-ASCII text in headers is to be encoded. This is actually rather
1164 complicated, so a number of variables are necessary to tweak what this
1167 The following variables are tweakable:
1170 @item rfc2047-default-charset
1171 @vindex rfc2047-default-charset
Characters in this charset should not be encoded by this library.
1173 This defaults to @code{iso-8859-1}.
@item rfc2047-header-encoding-alist
@vindex rfc2047-header-encoding-alist
1177 This is an alist of header / encoding-type pairs. Its main purpose is
1178 to prevent encoding of certain headers.
1180 The keys can either be header regexps, or @code{t}.
1182 The values can be either @code{nil}, in which case the header(s) in
1183 question won't be encoded, or @code{mime}, which means that they will be
1186 @item rfc2047-charset-encoding-alist
1187 @vindex rfc2047-charset-encoding-alist
1188 RFC2047 specifies two forms of encoding---@code{Q} (a
1189 Quoted-Printable-like encoding) and @code{B} (base64). This alist
1190 specifies which charset should use which encoding.
1192 @item rfc2047-encoding-function-alist
1193 @vindex rfc2047-encoding-function-alist
1194 This is an alist of encoding / function pairs. The encodings are
1195 @code{Q}, @code{B} and @code{nil}.
1197 @item rfc2047-q-encoding-alist
1198 @vindex rfc2047-q-encoding-alist
1199 The @code{Q} encoding isn't quite the same for all headers. Some
1200 headers allow a narrower range of characters, and that is what this
1201 variable is for. It's an alist of header regexps / allowable character
1204 @item rfc2047-encoded-word-regexp
1205 @vindex rfc2047-encoded-word-regexp
1206 When decoding words, this library looks for matches to this regexp.
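To make the shape of @code{rfc2047-header-encoding-alist} concrete,
here is a sketch of a possible setting. The choice of headers is only
an illustration: it leaves two headers alone and encodes everything
else.

@example
(setq rfc2047-header-encoding-alist
      '(("Newsgroups" . nil)   ; never encode this header
        ("Message-ID" . nil)   ; nor this one
        (t . mime)))           ; encode all other headers
@end example

The @code{t} key acts as a catch-all, so it is placed last.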
Those were the variables, and these are the functions:
1213 @item rfc2047-narrow-to-field
1214 @findex rfc2047-narrow-to-field
1215 Narrow the buffer to the header on the current line.
1217 @item rfc2047-encode-message-header
1218 @findex rfc2047-encode-message-header
1219 Should be called narrowed to the header of a message. Encodes according
1220 to @code{rfc2047-header-encoding-alist}.
1222 @item rfc2047-encode-region
1223 @findex rfc2047-encode-region
1224 Encodes all encodable words in the region specified.
1226 @item rfc2047-encode-string
1227 @findex rfc2047-encode-string
1228 Encode a string and return the results.
1230 @item rfc2047-decode-region
1231 @findex rfc2047-decode-region
1232 Decode the encoded words in the region.
1234 @item rfc2047-decode-string
1235 @findex rfc2047-decode-string
1236 Decode a string and return the results.
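As a brief sketch, here is the same word decoded from both encoding
forms. The input strings are constructed by hand for the example;
@samp{bmHvdmU=} is the base64 form of the five bytes of @samp{naïve} in
@code{iso-8859-1}.

@example
(require 'rfc2047)

;; Q encoding: the byte EF is written as =EF.
(rfc2047-decode-string "=?iso-8859-1?q?na=EFve?=")
@result{} "naïve"

;; B encoding: the whole word is base64 encoded.
(rfc2047-decode-string "=?iso-8859-1?B?bmHvdmU=?=")
@result{} "naïve"
@end example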
1244 While not really a part of the @sc{mime} library, it is convenient to
1245 document this library here. It deals with parsing @code{Date} headers
1246 and manipulating time. (Not by using tesseracts, though, I'm sorry to
1249 These functions convert between five formats: A date string, an Emacs
1250 time structure, a decoded time list, a second number, and a day number.
1252 Here's a bunch of time/date/second/day examples:
1255 (parse-time-string "Sat Sep 12 12:21:54 1998 +0200")
1256 @result{} (54 21 12 12 9 1998 6 nil 7200)
1258 (date-to-time "Sat Sep 12 12:21:54 1998 +0200")
1259 @result{} (13818 19266)
1261 (time-to-seconds '(13818 19266))
1262 @result{} 905595714.0
1264 (seconds-to-time 905595714.0)
1265 @result{} (13818 19266 0)
1267 (time-to-days '(13818 19266))
1270 (days-to-time 729644)
1271 @result{} (961933 65536)
1273 (time-since '(13818 19266))
1276 (time-less-p '(13818 19266) '(13818 19145))
1279 (subtract-time '(13818 19266) '(13818 19145))
1282 (days-between "Sat Sep 12 12:21:54 1998 +0200"
1283 "Sat Sep 07 12:21:54 1998 +0200")
1286 (date-leap-year-p 2000)
1289 (time-to-day-in-year '(13818 19266))
1292 (time-to-number-of-days
1294 (date-to-time "Mon, 01 Jan 2001 02:22:26 GMT")))
1295 @result{} 4.146122685185185
1298 And finally, we have @code{safe-date-to-time}, which does the same as
1299 @code{date-to-time}, but returns a zero time if the date is
1300 syntactically malformed.
1302 The five data representations used are the following:
1306 An RFC822 (or similar) date string. For instance: @code{"Sat Sep 12
1307 12:21:54 1998 +0200"}.
An internal Emacs time. For instance: @code{(13818 19266)}.
1313 A floating point representation of the internal Emacs time. For
1314 instance: @code{905595714.0}.
1317 An integer number representing the number of days since 00000101. For
1318 instance: @code{729644}.
1321 A list of decoded time. For instance: @code{(54 21 12 12 9 1998 6 t
1325 All the examples above represent the same moment.
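The relation between the internal time and the seconds value can be
checked by hand. A two-element Emacs time @code{(@var{high} @var{low})}
stands for @var{high} times 65536 plus @var{low} seconds, so the
arithmetic can be sketched as follows (a float is used so that the
product does not depend on the integer range of the Emacs in use):

@example
(+ (* 13818 65536.0) 19266)
@result{} 905595714.0
@end example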
1327 These are the functions available:
1331 Take a date and return a time.
1333 @item time-to-seconds
1334 Take a time and return seconds.
1336 @item seconds-to-time
1337 Take seconds and return a time.
1340 Take a time and return days.
1343 Take days and return a time.
1346 Take a date and return days.
1348 @item time-to-number-of-days
Take a time and return the number of days that it represents.
1351 @item safe-date-to-time
1352 Take a date and return a time. If the date is not syntactically valid,
return a ``zero'' time.
Take two times and say whether the first time is less (i.e., earlier)
1357 than the second time.
1360 Take a time and return a time saying how long it was since that time.
Take two times and subtract the second from the first. That is, return
1364 the time between the two times.
1367 Take two days and return the number of days between those two days.
1369 @item date-leap-year-p
1370 Take a year number and say whether it's a leap year.
1372 @item time-to-day-in-year
1373 Take a time and return the day number within the year that the time is
1382 This library deals with decoding and encoding Quoted-Printable text.
1384 Very briefly explained, qp encoding means translating all 8-bit
1385 characters (and lots of control characters) into things that look like
1386 @samp{=EF}; that is, an equal sign followed by the byte encoded as a hex
1389 The following functions are defined by the library:
1392 @item quoted-printable-decode-region
1393 @findex quoted-printable-decode-region
1394 QP-decode all the encoded text in the specified region.
1396 @item quoted-printable-decode-string
1397 @findex quoted-printable-decode-string
1398 Decode the QP-encoded text in a string and return the results.
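A minimal sketch, using plain ASCII input so that the result does not
depend on any coding system. The strings are invented for the example.

@example
(require 'qp)

;; =3D is the encoded form of the equal sign itself.
(quoted-printable-decode-string "1+1=3D2")
@result{} "1+1=2"

;; An equal sign at the end of a line is a soft line break and is removed.
(quoted-printable-decode-string "foo=\nbar")
@result{} "foobar"
@end example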
1400 @item quoted-printable-encode-region
1401 @findex quoted-printable-encode-region
1402 QP-encode all the encodable characters in the specified region. The third
1403 optional parameter @var{fold} specifies whether to fold long lines.
1404 (Long here means 72.)
1406 @item quoted-printable-encode-string
1407 @findex quoted-printable-encode-string
1408 QP-encode all the encodable characters in a string and return the
1418 Base64 is an encoding that encodes three bytes into four characters,
1419 thereby increasing the size by about 33%. The alphabet used for
1420 encoding is very resistant to mangling during transit.
1422 The following functions are defined by this library:
1425 @item base64-encode-region
1426 @findex base64-encode-region
1427 base64 encode the selected region. Return the length of the encoded
1428 text. Optional third argument @var{no-line-break} means do not break
1429 long lines into shorter lines.
1431 @item base64-encode-string
1432 @findex base64-encode-string
1433 base64 encode a string and return the result.
1435 @item base64-decode-region
1436 @findex base64-decode-region
1437 base64 decode the selected region. Return the length of the decoded
1438 text. If the region can't be decoded, return @code{nil} and don't
1441 @item base64-decode-string
1442 @findex base64-decode-string
1443 base64 decode a string and return the result. If the string can't be
1444 decoded, @code{nil} is returned.
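The three-bytes-to-four-characters mapping can be seen in a short
sketch. The input strings are arbitrary; a two-byte input shows the
@samp{=} padding that fills out the last group.

@example
;; Three bytes become exactly four characters.
(base64-encode-string "Man")
@result{} "TWFu"

;; Two bytes are padded to a full group of four.
(base64-encode-string "Ma")
@result{} "TWE="

;; Decoding reverses the mapping.
(base64-decode-string "TWFu")
@result{} "Man"
@end example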
1455 @code{binhex} is an encoding that originated in Macintosh environments.
1456 The following function is supplied to deal with these:
1459 @item binhex-decode-region
1460 @findex binhex-decode-region
1461 Decode the encoded text in the region. If given a third parameter, only
1462 decode the @code{binhex} header and return the filename.
1472 @code{uuencode} is probably still the most popular encoding of binaries
1473 used on Usenet, although @code{base64} rules the mail world.
1475 The following function is supplied by this package:
1478 @item uudecode-decode-region
1479 @findex uudecode-decode-region
1480 Decode the text in the region.
1490 RFC1843 deals with mixing Chinese and ASCII characters in messages. In
1491 essence, RFC1843 switches between ASCII and Chinese by doing this:
1494 This sentence is in ASCII.
1495 The next sentence is in GB.~@{<:Ky2;S@{#,NpJ)l6HK!#~@}Bye.
1498 Simple enough, and widely used in China.
1500 The following functions are available to handle this encoding:
1503 @item rfc1843-decode-region
1504 Decode HZ-encoded text in the region.
1506 @item rfc1843-decode-string
Decode an HZ-encoded string and return the result.
1515 The @file{~/.mailcap} file is parsed by most @sc{mime}-aware message
1516 handlers and describes how elements are supposed to be displayed.
1517 Here's an example file:
1521 audio/wav; wavplayer %s
1522 application/msword; catdoc %s ; copiousoutput ; nametemplate=%s.doc
1525 This says that all image files should be displayed with @code{gimp},
1526 that WAVE audio files should be played by @code{wavplayer}, and that
1527 MS-WORD files should be inlined by @code{catdoc}.
1529 The @code{mailcap} library parses this file, and provides functions for
1533 @item mailcap-mime-data
1534 @vindex mailcap-mime-data
1535 This variable is an alist of alists containing backup viewing rules.
1539 Interface functions:
1542 @item mailcap-parse-mailcaps
1543 @findex mailcap-parse-mailcaps
Parse the @file{~/.mailcap} file.
1546 @item mailcap-mime-info
1547 Takes a @sc{mime} type as its argument and returns the matching viewer.
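A sketch of how the two interface functions might be combined. The
@sc{mime} type is taken from the example file above. No result is
shown, since what comes back depends on the mailcap entries found on
the system and on the backup rules in @code{mailcap-mime-data}.

@example
(require 'mailcap)

;; Read the user's mailcap entries first.
(mailcap-parse-mailcaps)

;; Then ask for the viewer registered for a given type.
(mailcap-mime-info "audio/wav")
@end example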
1557 The Emacs @sc{mime} library implements handling of various elements
1558 according to a (somewhat) large number of RFCs, drafts and standards
1559 documents. This chapter lists the relevant ones. They can all be
1560 fetched from @uref{http://quimby.gnus.org/notes/}.
1565 Standard for the Format of ARPA Internet Text Messages.
1568 Standard for Interchange of USENET Messages
1571 Format of Internet Message Bodies
1577 Message Header Extensions for Non-ASCII Text
1580 Registration Procedures
1583 Conformance Criteria and Examples
1586 @sc{mime} Parameter Value and Encoded Word Extensions: Character Sets,
1587 Languages, and Continuations
1590 HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and
1593 @item draft-ietf-drums-msg-fmt-05.txt
1594 Draft for the successor of RFC822
1597 The @sc{mime} Multipart/Related Content-type
1600 The Multipart/Report Content Type for the Reporting of Mail System
1601 Administrative Messages
1604 Communicating Presentation Information in Internet Messages: The
1605 Content-Disposition Header Field
1608 Documentation of the text/plain format parameter for flowed text.
1624 @c coding: iso-8859-1