   1 @c This is part of the SXEmacs manual.
   2 @c Copyright (C) 1997 Free Software Foundation, Inc.
   3 @c See file emacs.texi for copying conditions.
   4 @node Mule, Major Modes, Windows, Top
   5 @chapter World Scripts Support
   6 @cindex MULE
   7 @cindex international scripts
   8 @cindex multibyte characters
   9 @cindex encoding of characters
  10
  11 @cindex Chinese
  12 @cindex Greek
  13 @cindex IPA
  14 @cindex Japanese
  15 @cindex Korean
  16 @cindex Cyrillic
  17 @cindex Russian
  If you compile SXEmacs with the Mule option, it supports a wide variety
of world scripts, including Latin script, as well as Arabic script,
Simplified Chinese script (for mainland China), Traditional Chinese
script (for Taiwan and Hong Kong), Greek script, Hebrew script, IPA
symbols, Japanese scripts (Hiragana, Katakana and Kanji), Korean scripts
(Hangul and Hanja) and Cyrillic script (for Byelorussian, Bulgarian,
Russian, Serbian and Ukrainian).  These features have been merged from
the modified version of Emacs known as MULE (for ``MULti-lingual
Enhancement to GNU Emacs'').
  27
  28 @menu
  29 * Mule Intro::              Basic concepts of Mule.
  30 * Language Environments::   Setting things up for the language you use.
  31 * Input Methods::           Entering text characters not on your keyboard.
  32 * Select Input Method::     Specifying your choice of input methods.
* Mule and Fonts::          Additional font-related issues.
  34 * Coding Systems::          Character set conversion when you read and
  35                               write files, and so on.
  36 * Recognize Coding::        How SXEmacs figures out which conversion to use.
  37 * Specify Coding::          Various ways to choose which conversion to use.
  38 @end menu
  39
  40 @node Mule Intro, Language Environments, Mule, Mule
  41 @section What is Mule?
  42
  43 Mule is the MUltiLingual Extension to SXEmacs.  It provides facilities
  44 not only for handling text written in many different languages, but in
  45 fact multilingual texts containing several languages in the same buffer.
  46 This goes beyond the simple facilities offered by Unicode for
  47 representation of multilingual text.  Mule also supports input methods,
  48 composing display using fonts in various different encodings, changing
  49 character syntax and other editing facilities to correspond to local
  50 language usage, and more.
  51
The most obvious problem is that of the different character coding
systems used by different languages.  ASCII supplies all the characters
needed for most computer programming languages and US English (it lacks
the currency symbol for British English), but other Western European
languages (French, Spanish, German) require more than 96 code positions
for accented characters.  In fact, even with 8 bits to represent 96 more
characters (including accented characters and symbols such as currency
symbols), some languages' alphabets remain incomplete (Croatian,
Polish).  (The 64 ``missing characters'' are reserved for control
characters.)  Furthermore, many European languages have their own
alphabets, which must occupy the same code positions as the accented
characters, since the ASCII characters are still needed for computer
interaction (error and log messages are typically in ASCII).
  65
For economy of space, historical practice has been for each language to
establish its own encoding for the characters it needs.  This allows
most European languages to be represented with one octet (byte) per
character.  However, many Asian languages have thousands of characters
and require two or more octets per character.  For multilingual
purposes, the ISO 2022 standard establishes escape codes that allow
switching encodings in midstream.  (It's also ISO 2022 that establishes
the standard that code points 0-31 and 128-159 are control codes.)
  74
  75 However, this is error-prone and complex for internal processing.  For
  76 this reason SXEmacs uses an internal coding system which can encode all
  77 of the world's scripts.  Unfortunately, for historical reasons, this
  78 code is not Unicode, although we are moving in that direction.
  79
  80 SXEmacs translates between the internal character encoding and various
  81 other coding systems when reading and writing files, when exchanging
  82 data with subprocesses, and (in some cases) in the @kbd{C-q} command
  83 (see below).  The internal encoding is never visible to the user in a
  84 production SXEmacs, but unfortunately the process cannot be completely
  85 transparent to the user.  This is because the same ranges of octets may
  86 represent 1-octet ISO-8859-1 (which is satisfactory for most Western
  87 European use prior to the introduction of the Euro currency), 1-octet
  88 ISO-8859-15 (which substitutes the Euro for the rarely used "generic
  89 currency" symbol), 1-octet ISO-8859-5 (Cyrillic), or multioctet EUC-JP
  90 (Japanese).  There's no way to tell without being able to read!
  91
Mule incorporates a number of heuristics for automatic recognition.
There are also facilities for the user to set defaults and, where
necessary (rarely, we hope), to set coding systems directly.
  95
  96 @kindex C-h h
  97 @findex view-hello-file
  98   The command @kbd{C-h h} (@code{view-hello-file}) displays the file
  99 @file{etc/HELLO}, which shows how to say ``hello'' in many languages.
 100 This illustrates various scripts.
 101
 102   Keyboards, even in the countries where these character sets are used,
 103 generally don't have keys for all the characters in them.  So SXEmacs
 104 supports various @dfn{input methods}, typically one for each script or
 105 language, to make it convenient to type them.
 106
 107 @kindex C-x RET
 108   The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
 109 to world scripts, coding systems, and input methods.
 110
 111
 112 @node Language Environments, Input Methods, Mule Intro, Mule
 113 @section Language Environments
 114 @cindex language environments
 115
  All supported character sets are available in SXEmacs buffers if it is
compiled with Mule; there is no need to select a particular language in
order to display its characters in an SXEmacs buffer.  However, it is
important to select a @dfn{language environment} in order to set various
defaults.  The language environment really represents a choice of
preferred script (more or less) rather than a choice of language.
 122
 123   The language environment controls which coding systems to recognize
 124 when reading text (@pxref{Recognize Coding}).  This applies to files,
 125 incoming mail, netnews, and any other text you read into SXEmacs.  It may
 126 also specify the default coding system to use when you create a file.
 127 Each language environment also specifies a default input method.
 128
 129 @findex set-language-environment
 130   The command to select a language environment is @kbd{M-x
 131 set-language-environment}.  It makes no difference which buffer is
 132 current when you use this command, because the effects apply globally to
 133 the SXEmacs session.  The supported language environments include:
 134
 135 @quotation
 136 ASCII, Chinese-BIG5, Chinese-GB, Croatian, Cyrillic-ALT, Cyrillic-ISO,
 137 Cyrillic-KOI8, Cyrillic-Win, Czech, English, Ethiopic, French, German,
 138 Greek, Hebrew, IPA, Japanese, Korean, Latin-1, Latin-2, Latin-3, Latin-4,
 139 Latin-5, Norwegian, Polish, Romanian, Slovenian, Thai-XTIS, Vietnamese.
 140 @end quotation
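
  To select a language environment every time SXEmacs starts, you can
call the same command from a startup file.  The following is a minimal
sketch; it assumes that @code{set-language-environment} accepts the
environment name as a string when called from Lisp, and the choice of
@samp{Latin-1} is only an illustration:

@example
;; Illustrative startup-file setting: select the Latin-1 language
;; environment for the whole session.
(set-language-environment "Latin-1")
@end example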
 141
 142   Some operating systems let you specify the language you are using by
 143 setting locale environment variables.  SXEmacs handles one common special
 144 case of this: if your locale name for character types contains the
 145 string @samp{8859-@var{n}}, SXEmacs automatically selects the
 146 corresponding language environment.
 147
 148 @kindex C-h L
 149 @findex describe-language-environment
 150   To display information about the effects of a certain language
 151 environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env}
 152 @key{RET}} (@code{describe-language-environment}).  This tells you which
 153 languages this language environment is useful for, and lists the
 154 character sets, coding systems, and input methods that go with it.  It
 155 also shows some sample text to illustrate scripts used in this language
 156 environment.  By default, this command describes the chosen language
 157 environment.
 158
 159 @node Input Methods, Select Input Method, Language Environments, Mule
 160 @section Input Methods
 161
 162 @cindex input methods
 163   An @dfn{input method} is a kind of character conversion designed
 164 specifically for interactive input.  In SXEmacs, typically each language
 165 has its own input method; sometimes several languages which use the same
 166 characters can share one input method.  A few languages support several
 167 input methods.
 168
 169   The simplest kind of input method works by mapping ASCII letters into
 170 another alphabet.  This is how the Greek and Russian input methods work.
 171
 172   A more powerful technique is composition: converting sequences of
 173 characters into one letter.  Many European input methods use composition
 174 to produce a single non-ASCII letter from a sequence that consists of a
 175 letter followed by accent characters.  For example, some methods convert
 176 the sequence @kbd{'a} into a single accented letter.
 177
 178   The input methods for syllabic scripts typically use mapping followed
 179 by composition.  The input methods for Thai and Korean work this way.
 180 First, letters are mapped into symbols for particular sounds or tone
 181 marks; then, sequences of these which make up a whole syllable are
 182 mapped into one syllable sign.
 183
 184   Chinese and Japanese require more complex methods.  In Chinese input
 185 methods, first you enter the phonetic spelling of a Chinese word (in
 186 input method @code{chinese-py}, among others), or a sequence of portions
 187 of the character (input methods @code{chinese-4corner} and
 188 @code{chinese-sw}, and others).  Since one phonetic spelling typically
 189 corresponds to many different Chinese characters, you must select one of
 190 the alternatives using special SXEmacs commands.  Keys such as @kbd{C-f},
 191 @kbd{C-b}, @kbd{C-n}, @kbd{C-p}, and digits have special definitions in
 192 this situation, used for selecting among the alternatives.  @key{TAB}
 193 displays a buffer showing all the possibilities.
 194
 195    In Japanese input methods, first you input a whole word using
 196 phonetic spelling; then, after the word is in the buffer, SXEmacs
 197 converts it into one or more characters using a large dictionary.  One
 198 phonetic spelling corresponds to many differently written Japanese
 199 words, so you must select one of them; use @kbd{C-n} and @kbd{C-p} to
 200 cycle through the alternatives.
 201
 202   Sometimes it is useful to cut off input method processing so that the
 203 characters you have just entered will not combine with subsequent
 204 characters.  For example, in input method @code{latin-1-postfix}, the
 205 sequence @kbd{e '} combines to form an @samp{e} with an accent.  What if
 206 you want to enter them as separate characters?
 207
 208   One way is to type the accent twice; that is a special feature for
 209 entering the separate letter and accent.  For example, @kbd{e ' '} gives
 210 you the two characters @samp{e'}.  Another way is to type another letter
 211 after the @kbd{e}---something that won't combine with that---and
 212 immediately delete it.  For example, you could type @kbd{e e @key{DEL}
 213 '} to get separate @samp{e} and @samp{'}.
 214
 215   Another method, more general but not quite as easy to type, is to use
 216 @kbd{C-\ C-\} between two characters to stop them from combining.  This
 217 is the command @kbd{C-\} (@code{toggle-input-method}) used twice.
 218 @ifinfo
 219 @xref{Select Input Method}.
 220 @end ifinfo
 221
  @kbd{C-\ C-\} is especially useful inside an incremental search,
because it stops waiting for more characters to combine, and starts
searching for what you have already entered.
 225
 226 @vindex input-method-verbose-flag
 227 @vindex input-method-highlight-flag
 228   The variables @code{input-method-highlight-flag} and
 229 @code{input-method-verbose-flag} control how input methods explain what
 230 is happening.  If @code{input-method-highlight-flag} is non-@code{nil},
 231 the partial sequence is highlighted in the buffer.  If
 232 @code{input-method-verbose-flag} is non-@code{nil}, the list of possible
 233 characters to type next is displayed in the echo area (but not when you
 234 are in the minibuffer).
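
  For example, to turn on both kinds of feedback from a startup file, a
sketch along these lines should work (the value @code{t} is just one
possible non-@code{nil} value):

@example
;; Highlight the partial input sequence in the buffer.
(setq input-method-highlight-flag t)
;; Show the possible next characters in the echo area.
(setq input-method-verbose-flag t)
@end example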
 235
 236 @node Select Input Method, Mule and Fonts, Input Methods, Mule
 237 @section Selecting an Input Method
 238
 239 @table @kbd
 240 @item C-\
 241 Enable or disable use of the selected input method.
 242
 243 @item C-x @key{RET} C-\ @var{method} @key{RET}
 244 Select a new input method for the current buffer.
 245
 246 @item C-h I @var{method} @key{RET}
 247 @itemx C-h C-\ @var{method} @key{RET}
 248 @findex describe-input-method
 249 @kindex C-h I
 250 @kindex C-h C-\
 251 Describe the input method @var{method} (@code{describe-input-method}).
 252 By default, it describes the current input method (if any).
 253
 254 @item M-x list-input-methods
 255 Display a list of all the supported input methods.
 256 @end table
 257
 258 @findex select-input-method
 259 @vindex current-input-method
 260 @kindex C-x RET C-\
 261   To choose an input method for the current buffer, use @kbd{C-x
 262 @key{RET} C-\} (@code{select-input-method}).  This command reads the
 263 input method name with the minibuffer; the name normally starts with the
 264 language environment that it is meant to be used with.  The variable
 265 @code{current-input-method} records which input method is selected.
 266
 267 @findex toggle-input-method
 268 @kindex C-\
 269   Input methods use various sequences of ASCII characters to stand for
 270 non-ASCII characters.  Sometimes it is useful to turn off the input
 271 method temporarily.  To do this, type @kbd{C-\}
 272 (@code{toggle-input-method}).  To reenable the input method, type
 273 @kbd{C-\} again.
 274
  If you type @kbd{C-\} and you have not yet selected an input method,
it prompts you to specify one.  This has the same effect as using
@kbd{C-x @key{RET} C-\} to specify an input method.
 278
 279 @vindex default-input-method
 280   Selecting a language environment specifies a default input method for
 281 use in various buffers.  When you have a default input method, you can
 282 select it in the current buffer by typing @kbd{C-\}.  The variable
 283 @code{default-input-method} specifies the default input method
 284 (@code{nil} means there is none).
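
  If you prefer a default input method other than the one chosen by your
language environment, you can presumably set the variable yourself in a
startup file.  This sketch assumes the variable takes the input method
name as a string, and uses @code{latin-1-postfix} purely as an example:

@example
;; Make C-\ turn on latin-1-postfix in buffers that have no
;; input method selected yet.
(setq default-input-method "latin-1-postfix")
@end example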
 285
 286 @findex quail-set-keyboard-layout
 287   Some input methods for alphabetic scripts work by (in effect)
 288 remapping the keyboard to emulate various keyboard layouts commonly used
 289 for those scripts.  How to do this remapping properly depends on your
 290 actual keyboard layout.  To specify which layout your keyboard has, use
 291 the command @kbd{M-x quail-set-keyboard-layout}.
 292
 293 @findex list-input-methods
 294   To display a list of all the supported input methods, type @kbd{M-x
 295 list-input-methods}.  The list gives information about each input
 296 method, including the string that stands for it in the mode line.
 297
 298 @node Mule and Fonts, Coding Systems, Select Input Method, Mule
 299 @section Mule and Fonts
 300 @cindex fonts
 301 @cindex font registry
 302 @cindex font encoding
 303 @cindex CCL programs
 304
 305 (This section is X11-specific.)
 306
 307 Text in SXEmacs buffers is displayed using various faces.  In addition to
 308 specifying properties of a face, such as font and color, there are some
 309 additional properties of Mule charsets that are used in text.
 310
There are currently two properties of a charset that can be adjusted by
the user: the font registry and the so-called @dfn{ccl-program}.
 313
The font registry is a regular expression matching the font registry
field for this character set.  For example, both the @code{ascii} and
@w{@code{latin-iso8859-1}} charsets use the registry @code{"ISO8859-1"}.
This field is used to choose an appropriate font when the user gives a
general font specification such as @w{@samp{-*-courier-medium-r-*-140-*}},
i.e. a 14-point upright medium-weight Courier font.

You can set the font registry for a charset using the
@code{set-charset-registry} function in one of your startup files.  This
function takes two arguments: the character set (as a symbol) and the
font registry (as a string).
 325
For example, for Cyrillic text Mule uses the
@w{@code{cyrillic-iso8859-5}} charset with @samp{"ISO8859-5"} as its
default registry.  Suppose we want to use @samp{"koi8-r"} instead,
because fonts in that encoding are installed on our system.  Use:
 330
 331 @example
 332 (set-charset-registry 'cyrillic-iso8859-5 "koi8-r")
 333 @end example
 334
(Please note that you probably also want to set the font registry for
the @samp{ascii} charset so that mixed English/Cyrillic texts are
displayed using the same font.)
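
As a sketch of what that might look like for the KOI8-R setup above
(assuming your KOI8-R fonts also cover the ASCII range, as they normally
do):

@example
;; Use the same koi8-r fonts for ASCII text as for Cyrillic text.
(set-charset-registry 'ascii "koi8-r")
@end example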
 338
``CCL programs'' are small special-purpose scripts defined within
SXEmacs or in some package.  These scripts allow SXEmacs to use fonts
whose encoding differs from the encoding that Mule uses for text in the
buffer.  Returning to the above example, we need to tell SXEmacs that
the fonts and the text use different encodings, so that it converts
characters between those encodings when displaying them.  That is what
the @code{set-charset-ccl-program} function is for.  Quite a few CCL
programs are defined within SXEmacs, but there is no comprehensive list
of them, so you currently have to consult the sources.
 348 @c FIXME: there must be a list of CCL programs
 349
There is a CCL program called @code{ccl-encode-koi8-r-font} that serves
exactly this purpose: it converts characters between the
@samp{ISO8859-5} encoding and @samp{koi8-r}.  Use:
 353
 354 @example
 355 (set-charset-ccl-program 'cyrillic-iso8859-5 'ccl-encode-koi8-r-font)
 356 @end example
 357
 358 There are several more uses for CCL programs, not related to fonts, but
 359 those uses are not described here.
 360
 361
 362 @node Coding Systems, Recognize Coding, Mule and Fonts, Mule
 363 @section Coding Systems
 364 @cindex coding systems
 365
 366   Users of various languages have established many more-or-less standard
 367 coding systems for representing them.  SXEmacs does not use these coding
 368 systems internally; instead, it converts from various coding systems to
 369 its own system when reading data, and converts the internal coding
 370 system to other coding systems when writing data.  Conversion is
 371 possible in reading or writing files, in sending or receiving from the
 372 terminal, and in exchanging data with subprocesses.
 373
 374   SXEmacs assigns a name to each coding system.  Most coding systems are
 375 used for one language, and the name of the coding system starts with the
 376 language name.  Some coding systems are used for several languages;
 377 their names usually start with @samp{iso}.  There are also special
 378 coding systems @code{binary} and @code{no-conversion} which do not
 379 convert printing characters at all.
 380
 381   In addition to converting various representations of non-ASCII
 382 characters, a coding system can perform end-of-line conversion.  SXEmacs
 383 handles three different conventions for how to separate lines in a file:
 384 newline, carriage-return linefeed, and just carriage-return.
 385
 386 @table @kbd
 387 @item C-h C @var{coding} @key{RET}
 388 Describe coding system @var{coding}.
 389
 390 @item C-h C @key{RET}
 391 Describe the coding systems currently in use.
 392
 393 @item M-x list-coding-systems
 394 Display a list of all the supported coding systems.
 395
 396 @item C-u M-x list-coding-systems
Display a comprehensive list of specific details of all supported coding
systems.
 399 @end table
 400
 401 @kindex C-x @key{RET} C
 402 @findex describe-coding-system
  The command @kbd{C-x @key{RET} C} (@code{describe-coding-system})
displays information about particular coding systems.  You can specify a
coding system name as argument; alternatively, with an empty argument, it
describes the coding systems currently selected for various purposes,
both in the current buffer and as the defaults, and the priority list
for recognizing coding systems (@pxref{Recognize Coding}).
 409
 410 @findex list-coding-systems
 411   To display a list of all the supported coding systems, type @kbd{M-x
 412 list-coding-systems}.  The list gives information about each coding
 413 system, including the letter that stands for it in the mode line
 414 (@pxref{Mode Line}).
 415
 416   Each of the coding systems that appear in this list---except for
 417 @code{binary}, which means no conversion of any kind---specifies how and
 418 whether to convert printing characters, but leaves the choice of
 419 end-of-line conversion to be decided based on the contents of each file.
 420 For example, if the file appears to use carriage-return linefeed between
 421 lines, that end-of-line conversion will be used.
 422
 423   Each of the listed coding systems has three variants which specify
 424 exactly what to do for end-of-line conversion:
 425
 426 @table @code
 427 @item @dots{}-unix
 428 Don't do any end-of-line conversion; assume the file uses
 429 newline to separate lines.  (This is the convention normally used
 430 on Unix and GNU systems.)
 431
 432 @item @dots{}-dos
 433 Assume the file uses carriage-return linefeed to separate lines,
 434 and do the appropriate conversion.  (This is the convention normally used
 435 on Microsoft systems.)
 436
 437 @item @dots{}-mac
 438 Assume the file uses carriage-return to separate lines, and do the
 439 appropriate conversion.  (This is the convention normally used on the
 440 Macintosh system.)
 441 @end table
 442
 443   These variant coding systems are omitted from the
 444 @code{list-coding-systems} display for brevity, since they are entirely
 445 predictable.  For example, the coding system @code{iso-8859-1} has
 446 variants @code{iso-8859-1-unix}, @code{iso-8859-1-dos} and
 447 @code{iso-8859-1-mac}.
 448
 449   In contrast, the coding system @code{binary} specifies no character
 450 code conversion at all---none for non-Latin-1 byte values and none for
 451 end of line.  This is useful for reading or writing binary files, tar
 452 files, and other files that must be examined verbatim.
 453
 454   The easiest way to edit a file with no conversion of any kind is with
 455 the @kbd{M-x find-file-literally} command.  This uses @code{binary}, and
 456 also suppresses other SXEmacs features that might convert the file
 457 contents before you see them.  @xref{Visiting}.
 458
 459   The coding system @code{no-conversion} means that the file contains
 460 non-Latin-1 characters stored with the internal SXEmacs encoding.  It
 461 handles end-of-line conversion based on the data encountered, and has
 462 the usual three variants to specify the kind of end-of-line conversion.
 463
 464
 465 @node Recognize Coding, Specify Coding, Coding Systems, Mule
 466 @section Recognizing Coding Systems
 467
 468 @c #### This section is out of date.  The following set-*-coding-system
 469 @c functions are known:
 470
 471 @c set-buffer-file-coding-system
 472 @c set-buffer-file-coding-system-for-read
 473 @c set-buffer-process-coding-system
 474 @c set-console-tty-coding-system
 475 @c set-console-tty-input-coding-system
 476 @c set-console-tty-output-coding-system
 477 @c set-default-buffer-file-coding-system
 478 @c set-default-coding-systems
 479 @c set-default-file-coding-system
 480 @c set-file-coding-system
 481 @c set-file-coding-system-for-read
 482 @c set-keyboard-coding-system
 483 @c set-pathname-coding-system
 484 @c set-process-coding-system
 485 @c set-process-input-coding-system
 486 @c set-process-output-coding-system
 487 @c set-terminal-coding-system
 488
 489 @c Some are marked as broken.  Agenda: (1) Update this section using
 490 @c docstrings.  Note that they may be inaccurate.  (2) Correct the
 491 @c documentation here, updating docstrings at the same time.
 492
 493 @c Document this.
 494
 495 @c set-language-environment-coding-systems
 496
 497 @c What are these?
 498
 499 @c dontusethis-set-value-file-name-coding-system-handler
 500 @c dontusethis-set-value-keyboard-coding-system-handler
 501 @c dontusethis-set-value-terminal-coding-system-handler
 502
  Most of the time, SXEmacs can recognize which coding system to use for
any given file, once you have specified your preferences.
 505
 506   Some coding systems can be recognized or distinguished by which byte
 507 sequences appear in the data.  However, there are coding systems that
 508 cannot be distinguished, not even potentially.  For example, there is no
 509 way to distinguish between Latin-1 and Latin-2; they use the same byte
 510 values with different meanings.
 511
 512   SXEmacs handles this situation by means of a priority list of coding
 513 systems.  Whenever SXEmacs reads a file, if you do not specify the coding
 514 system to use, SXEmacs checks the data against each coding system,
 515 starting with the first in priority and working down the list, until it
 516 finds a coding system that fits the data.  Then it converts the file
 517 contents assuming that they are represented in this coding system.
 518
 519   The priority list of coding systems depends on the selected language
 520 environment (@pxref{Language Environments}).  For example, if you use
 521 French, you probably want SXEmacs to prefer Latin-1 to Latin-2; if you
 522 use Czech, you probably want Latin-2 to be preferred.  This is one of
 523 the reasons to specify a language environment.
 524
 525 @findex prefer-coding-system
 526   However, you can alter the priority list in detail with the command
 527 @kbd{M-x prefer-coding-system}.  This command reads the name of a coding
 528 system from the minibuffer, and adds it to the front of the priority
 529 list, so that it is preferred to all others.  If you use this command
 530 several times, each use adds one element to the front of the priority
 531 list.
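
  The same function can be called from a startup file.  In the following
sketch, which assumes that @code{prefer-coding-system} accepts a coding
system symbol when called from Lisp, the coding system named in the last
call ends up first in the priority list:

@example
;; Each call puts its coding system at the front of the list,
;; so after both calls iso-8859-1 is tried before china-iso-8bit.
(prefer-coding-system 'china-iso-8bit)
(prefer-coding-system 'iso-8859-1)
@end example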
 532
 533 @vindex file-coding-system-alist
  Sometimes a file name indicates which coding system to use for the
file.  The variable @code{file-coding-system-alist} specifies this
correspondence.  There is a special function
@code{modify-coding-system-alist} for adding elements to this list.  For
example, to read and write all @samp{.txt} files using the coding system
@code{china-iso-8bit}, you can execute this Lisp expression:
 540
 541 @smallexample
 542 (modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit)
 543 @end smallexample
 544
 545 @noindent
 546 The first argument should be @code{file}, the second argument should be
 547 a regular expression that determines which files this applies to, and
 548 the third argument says which coding system to use for these files.
 549
 550 @vindex coding
 551   You can specify the coding system for a particular file using the
 552 @samp{-*-@dots{}-*-} construct at the beginning of a file, or a local
 553 variables list at the end (@pxref{File Variables}).  You do this by
 554 defining a value for the ``variable'' named @code{coding}.  SXEmacs does
 555 not really have a variable @code{coding}; instead of setting a variable,
 556 it uses the specified coding system for the file.  For example,
 557 @samp{-*-mode: C; coding: iso-8859-1;-*-} specifies use of the
 558 iso-8859-1 coding system, as well as C mode.
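
  The equivalent local variables list, placed at the end of the file,
might look like this.  The sketch assumes a C source file, so the list is
wrapped in C comments:

@example
/* Local Variables: */
/* mode: C */
/* coding: iso-8859-1 */
/* End: */
@end example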
 559
 560 @vindex buffer-file-coding-system
 561   Once SXEmacs has chosen a coding system for a buffer, it stores that
 562 coding system in @code{buffer-file-coding-system} and uses that coding
 563 system, by default, for operations that write from this buffer into a
 564 file.  This includes the commands @code{save-buffer} and
 565 @code{write-region}.  If you want to write files from this buffer using
 566 a different coding system, you can specify a different coding system for
 567 the buffer using @code{set-buffer-file-coding-system} (@pxref{Specify
 568 Coding}).
 569
 570
 571 @node Specify Coding,  , Recognize Coding, Mule
 572 @section Specifying a Coding System
 573
 574   In cases where SXEmacs does not automatically choose the right coding
 575 system, you can use these commands to specify one:
 576
 577 @table @kbd
 578 @item C-x @key{RET} f @var{coding} @key{RET}
 579 Use coding system @var{coding} for the visited file
 580 in the current buffer.
 581
 582 @item C-x @key{RET} c @var{coding} @key{RET}
 583 Specify coding system @var{coding} for the immediately following
 584 command.
 585
 586 @item C-x @key{RET} k @var{coding} @key{RET}
 587 Use coding system @var{coding} for keyboard input.  (This feature is
 588 non-functional and is temporarily disabled.)
 589
 590 @item C-x @key{RET} t @var{coding} @key{RET}
 591 Use coding system @var{coding} for terminal output.
 592
 593 @item C-x @key{RET} p @var{coding} @key{RET}
 594 Use coding system @var{coding} for subprocess input and output
 595 in the current buffer.
 596 @end table
 597
 598 @kindex C-x RET f
 599 @findex set-buffer-file-coding-system
  The command @kbd{C-x @key{RET} f} (@code{set-buffer-file-coding-system})
specifies the file coding system for the current buffer---in other
words, which coding system to use when saving or rereading the visited
file.  You specify which coding system using the minibuffer.  Since this
command applies to a file you have already visited, it affects only the
way the file is saved.
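
  From Lisp, the same function can also name one of the end-of-line
variants (@pxref{Coding Systems}).  A minimal sketch, assuming the
function accepts a coding system symbol as its argument:

@example
;; Save the current buffer as Latin-1 with carriage-return linefeed
;; line endings from now on.
(set-buffer-file-coding-system 'iso-8859-1-dos)
@end example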
 606
 607 @kindex C-x RET c
 608 @findex universal-coding-system-argument
 609   Another way to specify the coding system for a file is when you visit
 610 the file.  First use the command @kbd{C-x @key{RET} c}
 611 (@code{universal-coding-system-argument}); this command uses the
 612 minibuffer to read a coding system name.  After you exit the minibuffer,
 613 the specified coding system is used for @emph{the immediately following
 614 command}.
 615
 616   So if the immediately following command is @kbd{C-x C-f}, for example,
 617 it reads the file using that coding system (and records the coding
 618 system for when the file is saved).  Or if the immediately following
 619 command is @kbd{C-x C-w}, it writes the file using that coding system.
 620 Other file commands affected by a specified coding system include
 621 @kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants of
 622 @kbd{C-x C-f}.
 623
  In addition, if you invoke certain file input commands with a
@kbd{C-u} prefix, you can specify the coding system for reading in the
minibuffer.  For example, @kbd{C-u C-x C-f} reads the file using the
coding system you give (and records that coding system for when the file
is saved).  Other file commands that accept a coding system this way
include @kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window
variants of @kbd{C-x C-f}.
 631
 632 @vindex default-buffer-file-coding-system
 633   The variable @code{default-buffer-file-coding-system} specifies the
 634 choice of coding system to use when you create a new file.  It applies
 635 when you find a new file, and when you create a buffer and then save it
 636 in a file.  Selecting a language environment typically sets this
 637 variable to a good choice of default coding system for that language
 638 environment.
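
  To override that choice, you can presumably set the variable in a
startup file after selecting the language environment.  This is a sketch
only; it assumes the variable accepts a coding system symbol:

@example
;; New files are written as Latin-1 with newline line endings
;; unless something else is specified.
(setq default-buffer-file-coding-system 'iso-8859-1-unix)
@end example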
 639
 640 @kindex C-x RET t
 641 @findex set-terminal-coding-system
 642   The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system})
 643 specifies the coding system for terminal output.  If you specify a
 644 character code for terminal output, all characters output to the
 645 terminal are translated into that coding system.
 646
 647   This feature is useful for certain character-only terminals built to
 648 support specific languages or character sets---for example, European
 649 terminals that support one of the ISO Latin character sets.
 650
 651   By default, output to the terminal is not translated at all.
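
  For a terminal that displays ISO Latin-1, a startup file could request
the translation as follows (a sketch, assuming the function accepts a
coding system symbol when called from Lisp):

@example
;; Translate all terminal output into ISO Latin-1.
(set-terminal-coding-system 'iso-8859-1)
@end example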
 652
 653 @kindex C-x RET k
 654 @findex set-keyboard-coding-system
 655   The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system})
 656 specifies the coding system for keyboard input.  Character-code
 657 translation of keyboard input is useful for terminals with keys that
 658 send non-ASCII graphic characters---for example, some terminals designed
 659 for ISO Latin-1 or subsets of it.
 660
 661   By default, keyboard input is not translated at all.
 662
 663   There is a similarity between using a coding system translation for
 664 keyboard input, and using an input method: both define sequences of
 665 keyboard input that translate into single characters.  However, input
 666 methods are designed to be convenient for interactive use by humans, and
 667 the sequences that are translated are typically sequences of ASCII
 668 printing characters.  Coding systems typically translate sequences of
 669 non-graphic characters.
 670
 671 (This feature is non-functional and is temporarily disabled.)
 672
 673 @kindex C-x RET p
 674 @findex set-buffer-process-coding-system
 675   The command @kbd{C-x @key{RET} p} (@code{set-buffer-process-coding-system})
 676 specifies the coding system for input and output to a subprocess.  This
 677 command applies to the current buffer; normally, each subprocess has its
 678 own buffer, and thus you can use this command to specify translation to
 679 and from a particular subprocess by giving the command in the
 680 corresponding buffer.
 681
 682   By default, process input and output are not translated at all.
 683
 684 @vindex file-name-coding-system
 685   The variable @code{file-name-coding-system} specifies a coding system
 686 to use for encoding file names.  If you set the variable to a coding
 687 system name (as a Lisp symbol or a string), SXEmacs encodes file names
 688 using that coding system for all file operations.  This makes it
 689 possible to use non-Latin-1 characters in file names---or, at least,
 690 those non-Latin-1 characters which the specified coding system can
 691 encode.  By default, this variable is @code{nil}, which implies that you
 692 cannot use non-Latin-1 characters in file names.
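
  As an illustration, the following sketch sets the variable to the
@code{china-iso-8bit} coding system used earlier in this chapter; pick
whichever coding system matches the file names on your system:

@example
;; Encode and decode file names with china-iso-8bit.
(setq file-name-coding-system 'china-iso-8bit)
@end example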