HOW TO READ & WRITE ARABIC

94.8.4
TAKAHASHI Naoto


1. STARTING UP

1.1 INVOKING MULE

You must invoke Mule as an X client if you want to use Arabic.  Make
sure that the environment variable DISPLAY is properly set.

So far only a 16-dot font is available for Arabic.  First, set your X
resources appropriately, then invoke Mule from a shell window with the
following command to get enough line spacing.

    % mule -fsp 0+9

1.2 ENTERING AND LEAVING ARABIC-MODE

Hit C-] to enter arabic-mode.  Whenever you are in arabic-mode, you
are also in visual-mode.  Hitting C-] again leaves arabic-mode, but
you remain in visual-mode.  Hitting C-] in visual-mode brings you back
into arabic-mode.  You can exit visual-mode by hitting C-c C-c.  See
the figure below.

                            C-c C-c
        +----------------------------------------------+
        |    +--------------------+                    |
        |    |      C-c C-c       |                    |
        V    V                    |                    |
    +-------------+         +-----------+   C-]   +-----------+
    |             |   C-]   |arabic-mode| ------> |           |
    |initial state| ------> |    and    |         |visual-mode|
    |             |         |visual-mode| <------ |           |
    +-------------+         +-----------+   C-]   +-----------+

The string "Arabic L2R" or "Arabic R2L" in the mode-line means that
you are in both arabic-mode and visual-mode.  If you see "L2R" or
"R2L" but not "Arabic" in the mode-line, you are in visual-mode but
not in arabic-mode.

1.3 DISPLAY DIRECTION

Each buffer in Mule has a buffer-local variable called
"display-direction".  If this variable is set to nil (the default),
lines begin at the left edge of the screen.  If display-direction is
non-nil, lines are aligned to the right and text is written from right
to left.

If you are in visual-mode, the value of display-direction is reflected
in the mode-line: if it is nil, "L2R" is displayed; if it is non-nil,
"R2L" is displayed.

In visual-mode, you can set display-direction to nil by typing
'C-c <', and to t by typing 'C-c >'.
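
The mode transitions of section 1.2 and the mode-line rule of section
1.3 can be summarized in a small model.  This is only an illustrative
sketch in Python, not code from Mule: the names BufferState, press and
mode_line are hypothetical, and it assumes that display-direction keeps
its value when you leave visual-mode and that 'C-c <' and 'C-c >' do
nothing outside visual-mode, neither of which is stated above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BufferState:
    """Hypothetical model of one buffer; not a Mule data structure."""
    arabic: bool = False             # in arabic-mode?
    visual: bool = False             # in visual-mode?
    display_direction: bool = False  # False stands for nil, True for t


def press(state: BufferState, key: str) -> BufferState:
    """Return the state reached after typing one of the documented keys."""
    if key == "C-]":
        # C-] toggles arabic-mode; the buffer is in visual-mode afterwards
        # in either case.
        return replace(state, arabic=not state.arabic, visual=True)
    if key == "C-c C-c":
        # Leaves visual-mode (and arabic-mode) and returns to the initial
        # state.  Keeping display_direction here is an assumption.
        return replace(state, arabic=False, visual=False)
    if key in ("C-c <", "C-c >"):
        if not state.visual:
            return state  # assumption: only meaningful in visual-mode
        return replace(state, display_direction=(key == "C-c >"))
    raise ValueError(f"key not covered by this sketch: {key!r}")


def mode_line(state: BufferState) -> str:
    """Return the indicator described in sections 1.2 and 1.3."""
    if not state.visual:
        return ""  # simplification: no indicator outside visual-mode
    direction = "R2L" if state.display_direction else "L2R"
    return f"Arabic {direction}" if state.arabic else direction


if __name__ == "__main__":
    state = BufferState()
    for key in ["C-]", "C-c >", "C-]", "C-c C-c"]:
        state = press(state, key)
        print(f"{key:8} -> {mode_line(state)!r}")
```

Running the script walks once around the figure in section 1.2 and
prints the mode-line indicator after each keystroke.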

If you read a file (C-x C-f) whose name has the extension ".l2r", the
buffer automatically enters visual-mode and display-direction is set
to nil.  Likewise, if a file has the extension ".r2l", the buffer
automatically enters visual-mode and display-direction is set to t.


2. EDITING ARABIC TEXT

2.1 INPUT

In arabic-mode, you can input Arabic characters and Arabic digits from
the keyboard.  To input ASCII characters or ASCII digits, you have to
exit arabic-mode by hitting C-].

The translation table is given below.  When you are in arabic-mode,
you can display the keyboard layout by typing C-z.  Please note that
this table is by no means a fixed one; it is just a quick hack.  Your
suggestions on the Arabic keyboard layout will be greatly appreciated.

    translate table in arabic-mode
    ------------------------------
    "     isolated hamza
    a~    madda above alif
    a'    hamza above alif
    w'    hamza above waaw
    a''   hamza below alif
    y'    hamza above yaa
    a     alif
    b     baa
    o     taa marbuTa
    t     taa
    c     thaa
    j     jiim
    H     Haa
    K     khaa
    d     daal
    x     dhaal
    r     raa
    z     zaay
    s     siin
    /     shiin
    S     Saad
    D     Daad
    T     Taa
    Z     Zaa
    `     ayn
    G     ghayn
    f     faa
    q     qaaf
    k     kaaf
    l     laam
    m     miim
    n     nuun
    h     haa
    w     waaw
    A     alif maqSura
    y     yaa
    C     chim (Farsi)
    g     gaaf (Farsi)
    p     paa (Farsi)
    X     zhaa (Farsi)
    _     make connection
    |     cut connection

The appropriate ligature is generated automatically whenever a
character is input.  The special laam + alif ligature is generated
whenever an alif is input to the left of a laam.

If you want to cut the connection between two adjacent Arabic
characters, type a `|' (vertical bar) at that point in arabic-mode.
A character input after a `|' produces a glyph that is not connected
to its right-hand neighbor.  Typing a `_' (underscore) connects the
two characters at that point, if possible.

When display-direction is nil (i.e. lines are aligned to the left),
the cursor stays at the same position after an Arabic character is
inserted.  It moves to the right after an Arabic digit or an ASCII
character is inserted.

When display-direction is non-nil (i.e.
lines are aligned to the right), the cursor moves to the left after an
Arabic character is inserted.  It stays at the same position after an
Arabic digit or an ASCII character is inserted.

2.2 DELETION, KILL & YANK

Use C-d to delete the character under the cursor.  If you are in
arabic-mode, the necessary ligature is regenerated after the character
is deleted.

The DEL key behaves differently depending on the value of
display-direction: if the value is nil (aligned to the left), it
deletes the character to the left of the cursor; if the value is
non-nil (aligned to the right), it deletes the character to the right
of the cursor.  If the display direction and the input character
direction are the same, the most recently input character can be
deleted with the DEL key, no matter what the value of
display-direction is.

M-d (arabic-kill-word), M-DEL (arabic-backward-kill-word), C-k
(arabic-kill-line) and C-w (arabic-kill-region) remove the specified
stretch of text and put it in the kill ring.  M-w
(arabic-copy-region-as-kill) also puts the specified stretch of text
in the kill ring, but leaves the original text unchanged.

Strings in the kill ring can be reinserted into the buffer with C-y
(arabic-yank) and M-y (arabic-yank-pop).

Make sure that you are in arabic-mode when you kill or yank something;
otherwise ligatures are not maintained or, at worst, an unexpected
region is deleted or a garbage string is inserted into the buffer.

2.3 CURSOR MOTION

The following cursor motion commands are supplied in visual-mode and
in arabic-mode to handle bi-directional text easily.  All these
commands accept an additional numeric prefix argument.

key   command name                function
--------------------------------------------------------------------------------
C-f   visual-forward-char         move the cursor visually forward by 1 character
C-b   visual-backward-char        move the cursor visually backward by 1 character
C-p   visual-previous-line        move the cursor up by 1 line
C-n   visual-next-line            move the cursor down by 1 line
C-a   visual-beginning-of-line    move the cursor to the visual beginning of line
C-e   visual-end-of-line          move the cursor to the visual end of line
M-f   visual-forward-word         move the cursor visually forward by 1 word
M-b   visual-backward-word        move the cursor visually backward by 1 word
M-<   visual-beginning-of-buffer  move the cursor to the visual beginning of buffer
M->   visual-end-of-buffer        move the cursor to the visual end of buffer

Note that the ordinary cursor motion commands (forward-char,
backward-char, etc.) follow the logical order of the text, whilst the
commands above follow the visual order.  Compare the behavior of C-f
inside and outside visual-mode.  (You can exit visual-mode by typing
"C-c C-c".)

2.4 LR COMMANDS

Some of you may be confused by the words "forward" and "backward".
Here is a summary:

                display-direction    display-direction
                     is nil             is non-nil
    -------------------------------------------------
    forward          right                left
    backward         left                 right

If you are using the arrow keys to move the cursor, you may want the
cursor to move left/right no matter what display-direction is.
Likewise, you may want the cursor to be put on the leftmost column
when you hit C-a, and on the rightmost column when you hit C-e.  In
such cases, rewrite the key definitions in visual.el and arabic.el
with the following commands.  These commands are called "LR commands"
because they act according to the absolute direction (left or right)
rather than the relative direction (forward or backward).
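
As a hedged illustration of the forward/backward summary in this
section (it is not part of Mule), the translation from a relative
direction to a screen direction can be written as a small Python
function.  The name screen_direction is hypothetical, and
display-direction is modeled as a plain truth value.  The LR commands
listed next need no such translation, since they name the screen
direction directly.

```python
def screen_direction(relative: str, display_direction: object) -> str:
    """Translate 'forward'/'backward' into 'left'/'right'.

    display_direction mirrors the buffer-local variable: any false value
    stands for nil (L2R), any true value for non-nil (R2L).
    """
    if relative not in ("forward", "backward"):
        raise ValueError(f"expected 'forward' or 'backward', got {relative!r}")
    right_to_left = bool(display_direction)
    if relative == "forward":
        return "left" if right_to_left else "right"
    return "right" if right_to_left else "left"


if __name__ == "__main__":
    # Reproduce the summary table row by row.
    for relative in ("forward", "backward"):
        print(relative,
              screen_direction(relative, None),
              screen_direction(relative, True))
```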

** LR commands in visual-mode **

command name               function
---------------------------------------------------------------------------
visual-move-to-left-char   move the cursor to left by one character
visual-move-to-right-char  move the cursor to right by one character
visual-move-to-left-word   move the cursor to left by one word
visual-move-to-right-word  move the cursor to right by one word
visual-left-end-of-line    move the cursor to the leftmost column
visual-right-end-of-line   move the cursor to the rightmost column
visual-delete-left-char    delete the character on the left of visual point
visual-delete-right-char   delete the character on the right of visual point
visual-kill-left-word      kill one word on the left of visual point
visual-kill-right-word     kill one word on the right of visual point

** LR commands in arabic-mode **

command name               function
---------------------------------------------------------------------------
arabic-delete-left-char    do visual-delete-left-char and make Arabic ligature
arabic-delete-right-char   do visual-delete-right-char and make Arabic ligature
arabic-kill-left-word      do visual-kill-left-word and make Arabic ligature
arabic-kill-right-word     do visual-kill-right-word and make Arabic ligature


3. HARDCOPY

You can use m2ps to get a hardcopy of a file which contains Arabic
characters.  See m2ps.1 for details.

Note that input files to m2ps must be written in the *internal* coding
system.  To save the contents of a buffer in that coding system, use
the following command in Mule:

    C-u C-x C-w _filename_ RET *internal* RET

Please note that the current version of m2ps does not support the r2l
printing direction (flushright mode).  If you try to print a file
which was created under the r2l display direction, it will be printed
left-aligned.  Furthermore, you may get the wrong word order.


4. LIMITATIONS

There are many limitations in this release.  We need your help.

4.1 NON-SPACING MARKS IN ARABIC

Only two non-spacing marks, i.e., madda and hamza, are available in
this release.  Any other marks, e.g.
fatHa (short 'a'), Damma (short 'u'), kasra (short 'i'), shadda
(doubling sign), sukuun (no vowel sign), waSla (joining hamza), etc.,
cannot be displayed.

It seems that short vowels and waSla are not necessary to write
ordinary Arabic text, but shadda is often marked in printed Arabic.
Please let me know if shadda is really indispensable; in that case I
will try to implement it in some way.

4.2 FILE FORMAT

This package uses its own format (coding system) for file I/O.  You
cannot read files saved in other formats, e.g., ISO 8859-6, ISO 10646,
UNICODE, ArabTeX, xaw, etc.

As a matter of fact, I do not know which format is most widely used to
save Arabic texts.  If you have texts saved in a certain format and
would like to edit them with Mule, please send me the documentation of
your format.  I will try to implement a file I/O routine for that
format.

4.3 MISCELLANEOUS LIMITATIONS

* Tab does not work if display-direction is non-nil.

* Transpose commands and rectangle commands do not work in most cases.


5. ADDRESS

Bug reports and comments should be sent to this mailing list
(mule@etl.go.jp) or directly to me (ntakahas@etl.go.jp).  Suggestions
and requests of any kind are greatly appreciated.

TAKAHASHI Naoto
Electrotechnical Laboratory, Japan
ntakahas@etl.go.jp