\input texinfo @c -*-texinfo-*-
@setfilename internals.info
@settitle SXEmacs Internals Manual

@dircategory SXEmacs Editor
@direntry
* Internals: (internals). SXEmacs Internals Manual.
@end direntry

@ifinfo
Copyright @copyright{} 1992 - 1996 Ben Wing.
Copyright @copyright{} 1996, 1997 Sun Microsystems.
Copyright @copyright{} 1994 - 1998, 2002, 2003 Free Software Foundation.
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
Copyright @copyright{} 2005, 2006, 2007, 2008 Steve Youngs.
Copyright @copyright{} 2007, 2008 Sebastian Freundt

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission notice
identical to this one except for the removal of this paragraph (this
paragraph not being relevant to the printed manual).

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Foundation.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end ifinfo

@setchapternewpage odd

@titlepage
@title SXEmacs Internals Manual
@subtitle Version 1.4, February 2007

@author Martin Buchholz
@author Matthias Neubauer
@author Olivier Galibert
@author Sebastian Freundt

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 1992 - 1996, 2001 Ben Wing. @*
Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. @*
Copyright @copyright{} 2005, 2006, 2007 Steve Youngs. @*
Copyright @copyright{} 2007 Sebastian Freundt

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end titlepage

@node Top, A History of Emacs, (dir), (dir)

This Info file contains v1.5 of the SXEmacs Internals Manual, November 2005.

@menu
* A History of Emacs:: Times, dates, important events.
* SXEmacs From the Outside:: A broad conceptual overview.
* The Lisp Language:: An overview.
* SXEmacs From the Perspective of Building::
* SXEmacs From the Inside::
* The SXEmacs Object System (Abstractly Speaking)::
* How Lisp Objects Are Represented in C::
* Rules When Writing New C Code::
* Regression Testing SXEmacs::
* A Summary of the Various SXEmacs Modules::
* Allocation of Objects in SXEmacs Lisp::
* Events and the Event Loop::
* Asynchronous Events; Quit Checking::
* Evaluation; Stack Frames; Bindings::
* Symbols and Variables::
* Buffers and Textual Representation::
* MULE Character Sets and Encodings::
* The Lisp Reader and Compiler::
* Consoles; Devices; Frames; Windows::
* The Redisplay Mechanism::
* Interface to the X Window System::
* Categories:: A categorial approach to access objects

@detailmenu
 --- The Detailed Node Listing ---

A History of Emacs

* Through Version 18:: Unification prevails.
* Lucid Emacs:: One version 19 Emacs.
* GNU Emacs 19:: The other version 19 Emacs.
* GNU Emacs 20:: The other version 20 Emacs.
* XEmacs:: The continuation of Lucid Emacs.
* SXEmacs:: When two one true editors aren't enough.

Rules When Writing New C Code

* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
* Techniques for SXEmacs Developers::
* Character-Related Data Types::
* Working With Character and Byte Positions::
* Conversion to and from External Data::
* General Guidelines for Writing Mule-Aware Code::
* An Example of Mule-Aware Code::

Regression Testing SXEmacs

A Summary of the Various SXEmacs Modules

* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::
* Modules for Regression Testing::

Allocation of Objects in SXEmacs Lisp

* Introduction to Allocation::
* Garbage Collection::
* Garbage Collection - Step by Step::
* Integers and Characters::
* Allocation from Frob Blocks::
* Low-level allocation::
* Compiled Function::

Garbage Collection - Step by Step

* garbage_collect_1::
* sweep_lcrecords_1::
* compact_string_chars::
* sweep_bit_vectors_1::
* Data descriptions::
* Address allocation::

Events and the Event Loop

* Introduction to Events::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::
* Editor-Level Control Flow Modules::

Asynchronous Events; Quit Checking

* Control-G (Quit) Checking::
* Asynchronous Timeouts::

Evaluation; Stack Frames; Bindings

* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::

Symbols and Variables

* Introduction to Symbols::

Buffers and Textual Representation

* Introduction to Buffers:: A buffer holds a block of text such as a file.
* The Text in a Buffer:: Representation of the text in a buffer.
* Buffer Lists:: Keeping track of all buffers.
* Markers and Extents:: Tagging locations within a buffer.
* Bufbytes and Emchars:: Representation of individual characters.
* The Buffer Object:: The Lisp object corresponding to a buffer.

MULE Character Sets and Encodings

* Internal Mule Encodings::
* Japanese EUC (Extended Unix Code)::

Internal Mule Encodings

* Internal String Encoding::
* Internal Character Encoding::

* Creating an Lstream:: Creating an lstream object.
* Lstream Types:: Different sorts of things that are streamed.
* Lstream Functions:: Functions for working with lstreams.
* Lstream Methods:: Creating new lstream types.

Consoles; Devices; Frames; Windows

* Introduction to Consoles; Devices; Frames; Windows::
* The Window Object::

The Redisplay Mechanism

* Critical Redisplay Sections::
* Redisplay Piece by Piece::

* Introduction to Extents:: Extents are ranges over text, with properties.
* Extent Ordering:: How extents are ordered internally.
* Format of the Extent Info:: The extent information in a buffer or string.
* Zero-Length Extents:: A weird special case.
* Mathematics of Extent Ordering:: A rigorous foundation.
* Extent Fragments:: Cached information useful for redisplay.
@end detailmenu
@end menu

@node A History of Emacs, SXEmacs From the Outside, Top, Top
@chapter A History of Emacs
@cindex history of Emacs, a
@cindex Emacs, a history of
@cindex Hackers (Steven Levy)
@cindex ITS (Incompatible Timesharing System)
@cindex Stallman, Richard
@cindex Free Software Foundation

SXEmacs is a powerful, customizable text editor and development
environment. It was forked from the XEmacs 21.4 code base in 2004.
XEmacs began as Lucid Emacs, which was in turn derived from GNU Emacs,
a program written by Richard Stallman of the Free Software Foundation.
GNU Emacs dates back to the 1970s, and was modelled after a package
called ``Emacs'', written in 1976, that was a set of macros on top of
TECO, an old, old text editor written at MIT on the DEC PDP-10 under
one of the earliest time-sharing operating systems, ITS (Incompatible
Timesharing System). (ITS dates back well before Unix.) ITS, TECO, and
Emacs were products of a group of people at MIT who called themselves
``hackers''. They shared an idealistic belief system about the free
exchange of information and were fanatical in their devotion to and
time spent with computers. (The hacker subculture dates back to the
late 1950s at MIT and is described in detail in Steven Levy's book
@cite{Hackers}. This book also includes a lot of information about
Stallman himself and the development of Lisp, a programming language
developed at MIT that underlies Emacs.)

@menu
* Through Version 18:: Unification prevails.
* Lucid Emacs:: One version 19 Emacs.
* GNU Emacs 19:: The other version 19 Emacs.
* GNU Emacs 20:: The other version 20 Emacs.
* XEmacs:: The continuation of Lucid Emacs.
* SXEmacs:: When two one true editors aren't enough.
@end menu

@node Through Version 18
@section Through Version 18
@cindex version 18, through
@cindex Gosling, James
@cindex Great Usenet Renaming

Although the history of the early versions of GNU Emacs is unclear,
the history is well-known from the middle of 1985. A time line is:

GNU Emacs version 13 (the first public release we know of) was
released on March 20, 1985.

GNU Emacs version 15 (15.34) was released on May 7, 1985 and
shared some code with a version of Emacs written by James Gosling (the
same James Gosling who later created the Java language).

GNU Emacs version 16 (first released version was 16.56) was released on
July 15, 1985. All Gosling code was removed due to potential copyright
problems with the code.

version 16.57 released on September 16, 1985.

versions 16.58 and 16.59 released on September 17, 1985.

version 16.60 released on September 19, 1985. These later version 16s
incorporated patches from the net, especially for getting Emacs to work
under System V.

version 17.36 (first official v17 release) released on December 20,
1985. Included a TeX-able user manual. First official unpatched
version that worked on vanilla System V machines.

version 17.43 (second official v17 release) released on January 25,
1986.

version 17.45 released on January 30, 1986.

version 17.46 released on February 4, 1986.

version 17.48 released on February 10, 1986.

version 17.49 released on February 12, 1986.

version 17.55 released on March 18, 1986.

version 17.57 released on March 27, 1986.

version 17.58 released on April 4, 1986.

version 17.61 released on April 12, 1986.

version 17.63 released on May 7, 1986.

version 17.64 released on May 12, 1986.

version 18.24 (a beta version) released on October 2, 1986.

version 18.30 (a beta version) released on November 15, 1986.

version 18.31 (a beta version) released on November 23, 1986.

version 18.32 (a beta version) released on December 7, 1986.

version 18.33 (a beta version) released on December 12, 1986.

version 18.35 (a beta version) released on January 5, 1987.

version 18.36 (a beta version) released on January 21, 1987.

January 27, 1987: The Great Usenet Renaming. net.emacs became
comp.emacs.

version 18.37 (a beta version) released on February 12, 1987.

version 18.38 (a beta version) released on March 3, 1987.

version 18.39 (a beta version) released on March 14, 1987.

version 18.40 (a beta version) released on March 18, 1987.

version 18.41 (the first ``official'' release) released on March 22,
1987.

version 18.45 released on June 2, 1987.

version 18.46 released on June 9, 1987.

version 18.47 released on June 18, 1987.

version 18.48 released on September 3, 1987.

version 18.49 released on September 18, 1987.

version 18.50 released on February 13, 1988.

version 18.51 released on May 7, 1988.

version 18.52 released on September 1, 1988.

version 18.53 released on February 24, 1989.

version 18.54 released on April 26, 1989.

version 18.55 released on August 23, 1989. This is the earliest version
that is still available by FTP.

version 18.56 released on January 17, 1991.

version 18.57 released in late January 1991.

version 18.58 released ?????.

version 18.59 released October 31, 1992.

@node Lucid Emacs
@section Lucid Emacs

Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
C++ and Lisp development environments. It began when Lucid decided they
wanted to use Emacs as the editor and cornerstone of their C++
development environment (called ``Energize''). They needed many features
that were not available in the existing version of GNU Emacs (version
18.5something), in particular good and integrated support for GUI
elements such as mouse support, multiple fonts, multiple window-system
windows, etc. A branch of GNU Emacs called Epoch, written at the
University of Illinois, existed that supplied many of these features;
however, Lucid needed more than what existed in Epoch. At the time, the
Free Software Foundation was working on version 19 of Emacs (this was
sometime around 1991), which was planned to have similar features, and
so Lucid decided to work with the Free Software Foundation. Their plan
was to add the features they needed and coordinate with the FSF so
that those features would be included back into Emacs version 19.

Delays in the release of version 19 occurred, however (resulting in it
finally being released more than a year after what was initially
planned), and Lucid encountered unexpected technical resistance in
getting their changes merged back into version 19, so they decided to
release their own version of Emacs, which became Lucid Emacs 19.0.

@cindex Zawinski, Jamie
@cindex Sexton, Harlan
@cindex Devin, Matthieu
The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
and Eric Benson, and the work was later taken over by Jamie Zawinski,
who became ``Mr. Lucid Emacs'' for many releases.

A time line for Lucid Emacs is:

version 19.0 shipped with Energize 1.0, April 1992.

version 19.1 released June 4, 1992.

version 19.2 released June 19, 1992.

version 19.3 released September 9, 1992.

version 19.4 released January 21, 1993.

version 19.5 was a repackaging of 19.4 with a few bug fixes and
shipped with Energize 2.0. Never released to the net.

version 19.6 released April 9, 1993.

version 19.7 was a repackaging of 19.6 with a few bug fixes and
shipped with Energize 2.1. Never released to the net.

version 19.8 released September 6, 1993.

version 19.9 released January 12, 1994.

version 19.10 released May 27, 1994.

version 19.11 (first XEmacs) released September 13, 1994.

version 19.12 released June 23, 1995.

version 19.13 released September 1, 1995.

version 19.14 released June 23, 1996.

version 20.0 released February 9, 1997.

version 19.15 released March 28, 1997.

version 20.1 (not released to the net) April 15, 1997.

version 20.2 released May 16, 1997.

version 19.16 released October 31, 1997.

version 20.3 (the first stable version of XEmacs 20.x) released November 30,
1997.

version 20.4 released February 28, 1998.

version 21.1.2 released May 14, 1999. (The version naming scheme was
changed at this point: [a] the second version number is odd for stable
versions, even for beta versions; [b] a third version number is added,
replacing the ``beta xxx'' ending for beta versions and allowing for
periodic maintenance releases for stable versions. Therefore, 21.0 was
never ``officially'' released; similarly for 21.2, etc.)

version 21.1.3 released June 26, 1999.

version 21.1.4 released July 8, 1999.

version 21.1.6 released August 14, 1999. (There was no 21.1.5.)

version 21.1.7 released September 26, 1999.

version 21.1.8 released November 2, 1999.

version 21.1.9 released February 13, 2000.

version 21.1.10 released May 7, 2000.

version 21.1.10a released June 24, 2000.

version 21.1.11 released July 18, 2000.

version 21.1.12 released August 5, 2000.

version 21.1.13 released January 7, 2001.

version 21.1.14 released January 27, 2001.

@node GNU Emacs 19
@section GNU Emacs 19
@cindex Emacs 19, GNU
@cindex version 19, GNU Emacs

About a year after the initial release of Lucid Emacs, the FSF
released a beta of its version of Emacs 19 (referred to here as ``GNU
Emacs''). By this time, the current version of Lucid Emacs was
19.6. (Strangely, the first released beta from the FSF was GNU Emacs
19.7.) A time line for GNU Emacs version 19 is:

version 19.8 (beta) released May 27, 1993.

version 19.9 (beta) released May 27, 1993.

version 19.10 (beta) released May 30, 1993.

version 19.11 (beta) released June 1, 1993.

version 19.12 (beta) released June 2, 1993.

version 19.13 (beta) released June 8, 1993.

version 19.14 (beta) released June 17, 1993.

version 19.15 (beta) released June 19, 1993.

version 19.16 (beta) released July 6, 1993.

version 19.17 (beta) released late July, 1993.

version 19.18 (beta) released August 9, 1993.

version 19.19 (beta) released August 15, 1993.

version 19.20 (beta) released November 17, 1993.

version 19.21 (beta) released November 17, 1993.

version 19.22 (beta) released November 28, 1993.

version 19.23 (beta) released May 17, 1994.

version 19.24 (beta) released May 16, 1994.

version 19.25 (beta) released June 3, 1994.

version 19.26 (beta) released September 11, 1994.

version 19.27 (beta) released September 14, 1994.

version 19.28 (first ``official'' release) released November 1, 1994.

version 19.29 released June 21, 1995.

version 19.30 released November 24, 1995.

version 19.31 released May 25, 1996.

version 19.32 released July 31, 1996.

version 19.33 released August 11, 1996.

version 19.34 released August 21, 1996.

version 19.34b released September 6, 1996.

@cindex Mlynarik, Richard
In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
worse. Lucid soon began incorporating features from GNU Emacs 19 into
Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
working on and using GNU Emacs for a long time (back as far as version

@node GNU Emacs 20
@section GNU Emacs 20
@cindex Emacs 20, GNU
@cindex version 20, GNU Emacs

On February 2, 1997, work began on GNU Emacs to integrate Mule. The first
release was made in September of that year.

A time line for Emacs 20 is:

version 20.1 released September 17, 1997.

version 20.2 released September 20, 1997.

version 20.3 released August 19, 1998.

@node XEmacs
@section XEmacs
@cindex Sun Microsystems
@cindex University of Illinois
@cindex Illinois, University of
@cindex Andreessen, Marc
@cindex Buchholz, Martin
@cindex Kaplan, Simon
@cindex Thompson, Chuck
@cindex Amdahl Corporation
Around the time that Lucid was developing Energize, Sun Microsystems
was developing its own development environment (called ``SPARCWorks'')
and also decided to use Emacs. Sun joined forces with the Epoch team
at the University of Illinois and later with Lucid. The maintainer of
the last-released version of Epoch was Marc Andreessen, but he dropped
out, and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
away from a system administration job to become the primary Lucid Emacs
author for Epoch and Sun. Chuck's area of specialty became the
redisplay engine (he replaced the old Lucid Emacs redisplay engine with
a ported version from Epoch and then later rewrote it from scratch).
Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
to Microsoft Windows 3.1) in 1993, for what was initially a one-month
contract to fix some event problems but later became a many-year
involvement, punctuated by a six-month contract with Amdahl Corporation.

@cindex rename to XEmacs
In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
not favorable to either company); the first release called XEmacs was
version 19.11. In June 1994, Lucid folded and Jamie quit to work for
the newly formed Mosaic Communications Corp., later Netscape
Communications Corp. (co-founded by the same Marc Andreessen, who had
quit his Epoch job to work on a graphical browser for the World Wide
Web). Chuck then became the primary maintainer of XEmacs, and put out
versions 19.11 through 19.14 in conjunction with Ben. For 19.12 and
19.13, Chuck added the new redisplay and many other display improvements,
and Ben added MULE support (support for Asian and other languages) and
redesigned most of the internal Lisp subsystems to better support the
MULE work and the various other features being added to XEmacs. After
19.14, Chuck retired as primary maintainer and Steve Baur stepped in.

@cindex MULE merged XEmacs appears
Soon after 19.13 was released, work began in earnest on the MULE
internationalization code and the source tree was divided into two
development paths. The MULE version was initially called 19.20, but was
soon renamed to 20.0. In 1996 Martin Buchholz of Sun Microsystems took
over the care and feeding of it and worked on it in parallel with the
19.14 development that was occurring at the same time. After much work
by Martin, it was decided to release 20.0 ahead of 19.15 in February
1997. The source tree remained divided until 20.2 when the version 19
source was finally retired at version 19.16.

@cindex Buchholz, Martin
@cindex Niksic, Hrvoje
@cindex XEmacs goes it alone
In 1997, Sun finally dropped all pretense of support for XEmacs and
Martin Buchholz left the company in November. Since then (and for most
of the previous year as well, because Steve Baur was never paid to work
on XEmacs), XEmacs has existed solely on the contributions of volunteers
from the Free Software community. Starting from 1997, Hrvoje Niksic and
Kyle Jones have figured prominently in XEmacs development.

@cindex merging attempts
Many attempts have been made to merge XEmacs and GNU Emacs, but they
have consistently failed.

A more detailed history is contained in the SXEmacs About page.

A time line for XEmacs is:

version 19.11 (first XEmacs) released September 13, 1994.

version 19.12 released June 23, 1995.

version 19.13 released September 1, 1995.

version 19.14 released June 23, 1996.

version 20.0 released February 9, 1997.

version 19.15 released March 28, 1997.

version 20.1 (not released to the net) April 15, 1997.

version 20.2 released May 16, 1997.

version 19.16 released October 31, 1997.

version 20.3 (the first stable version of XEmacs 20.x) released November 30,
1997.

version 20.4 released February 28, 1998.

version 21.0.60 released December 10, 1998. (The version naming scheme was
changed at this point: [a] the second version number is odd for stable
versions, even for beta versions; [b] a third version number is added,
replacing the ``beta xxx'' ending for beta versions and allowing for
periodic maintenance releases for stable versions. Therefore, 21.0 was
never ``officially'' released; similarly for 21.2, etc.)

version 21.0.61 released January 4, 1999.

version 21.0.63 released February 3, 1999.

version 21.0.64 released March 1, 1999.

version 21.0.65 released March 5, 1999.

version 21.0.66 released March 12, 1999.

version 21.0.67 released March 25, 1999.

version 21.1.2 released May 14, 1999. (This is the followup to 21.0.67.
The second version number was bumped to indicate the beginning of the
``stable'' series.)

version 21.1.3 released June 26, 1999.

version 21.1.4 released July 8, 1999.

version 21.1.6 released August 14, 1999. (There was no 21.1.5.)

version 21.1.7 released September 26, 1999.

version 21.1.8 released November 2, 1999.

version 21.1.9 released February 13, 2000.

version 21.1.10 released May 7, 2000.

version 21.1.10a released June 24, 2000.

version 21.1.11 released July 18, 2000.

version 21.1.12 released August 5, 2000.

version 21.1.13 released January 7, 2001.

version 21.1.14 released January 27, 2001.

version 21.2.9 released February 3, 1999.

version 21.2.10 released February 5, 1999.

version 21.2.11 released March 1, 1999.

version 21.2.12 released March 5, 1999.

version 21.2.13 released March 12, 1999.

version 21.2.14 released May 14, 1999.

version 21.2.15 released June 4, 1999.

version 21.2.16 released June 11, 1999.

version 21.2.17 released June 22, 1999.

version 21.2.18 released July 14, 1999.

version 21.2.19 released July 30, 1999.

version 21.2.20 released November 10, 1999.

version 21.2.21 released November 28, 1999.

version 21.2.22 released November 29, 1999.

version 21.2.23 released December 7, 1999.

version 21.2.24 released December 14, 1999.

version 21.2.25 released December 24, 1999.

version 21.2.26 released December 31, 1999.

version 21.2.27 released January 18, 2000.

version 21.2.28 released February 7, 2000.

version 21.2.29 released February 16, 2000.

version 21.2.30 released February 21, 2000.

version 21.2.31 released February 23, 2000.

version 21.2.32 released March 20, 2000.

version 21.2.33 released May 1, 2000.

version 21.2.34 released May 28, 2000.

version 21.2.35 released July 19, 2000.

version 21.2.36 released October 4, 2000.

version 21.2.37 released November 14, 2000.

version 21.2.38 released December 5, 2000.

version 21.2.39 released December 31, 2000.

version 21.2.40 released January 8, 2001.

version 21.2.41 released January 17, 2001.

version 21.2.42 released January 20, 2001.

version 21.2.43 released January 26, 2001.

version 21.2.44 released February 8, 2001.

version 21.2.45 released February 23, 2001.

version 21.2.46 released March 21, 2001.

version 21.2.47 released April 14, 2001.

At this point another change in the version numbering scheme occurred.
From then on, even-numbered minor versions denote the stable (and gamma)
releases, and odd-numbered minor versions denote beta releases. This is
the same numbering scheme that the Linux kernel used prior to the 2.6.x
series.

XEmacs release time line (stable/gamma 21.4.0 to present day):

version 21.4.0 released April 16, 2001. (21.2.47 beta, promoted to gamma)

version 21.4.1 released April 19, 2001.

version 21.4.2 released May 10, 2001.

version 21.4.3 released May 17, 2001.

version 21.4.4 released July 28, 2001.

version 21.4.5 released October 23, 2001.

version 21.4.6 released December 17, 2001.

version 21.4.7 released May 4, 2002.

version 21.4.8 released May 9, 2002.

version 21.4.9 released August 23, 2002.

version 21.4.10 released November 2, 2002.

version 21.4.11 released January 3, 2003.

version 21.4.12 released January 15, 2003.

version 21.4.13 released May 25, 2003.

version 21.4.14 released September 3, 2003. (gamma promoted to stable)

version 21.4.15 released February 2, 2004.

version 21.4.16 released December 5, 2004. (SXEmacs forked from here)

version 21.4.17 released February 6, 2005.

XEmacs release time line (beta 21.5.0 to present day):

version 21.5.0 released April 18, 2001. (continuation of 21.2.47 beta)

version 21.5.1 released May 9, 2001.

version 21.5.2 released July 28, 2001.

version 21.5.3 released September 7, 2001.

version 21.5.4 released January 8, 2002.

version 21.5.5 released March 5, 2002.

version 21.5.6 released April 5, 2002.

version 21.5.7 released July 2, 2002.

version 21.5.8 released July 27, 2002.

version 21.5.9 released August 30, 2002.

version 21.5.10 released January 4, 2003.

version 21.5.11 released February 16, 2003.

version 21.5.12 released April 24, 2003.

version 21.5.13 released May 10, 2003.

version 21.5.14 released June 1, 2003.

version 21.5.15 released September 3, 2003.

version 21.5.16 released September 26, 2003.

version 21.5.17 released March 22, 2004.

version 21.5.18 released October 22, 2004.

version 21.5.19 released February 18, 2005.

version 21.5.20 released March 11, 2005.

version 21.5.21 released May 28, 2005.

version 21.5.22 released September 14, 2005.

version 21.5.23 released October 26, 2005.

@node SXEmacs
@section SXEmacs
@cindex Youngs, Steve
@cindex Freundt, Sebastian
@cindex Zajcev, Evgeny
@cindex Ferreira, Nelson

Sometime in late 2001, Steve Youngs was starting to
become more and more dissatisfied and disillusioned with the direction
that the XEmacs project was taking. Then, late one particularly dark
night, after consuming way too much coffee and smoking far too many
cigarettes (there could have been alcohol involved too; the details are
kind of fuzzy now...), he started having thoughts. Thoughts that, quite
frankly, made him question his own sanity. The crazy notion of forking
the XEmacs project was born that night.

It would be more than two years before Steve would even speak of the
idea. I think he was scared that the men in the white coats would come
and take him away if he did. But eventually he found some like-minded
people, and in the latter half of 2004 the SXEmacs Project became a
reality. At this point we still hadn't decided where in the XEmacs
code base we would fork from.

And then, on December 6th, 2004 at 03:24 (GMT) the XEmacs 21.4.16 source
code was imported into SXEmacs' main source repository. Three weeks
later, SXEmacs was announced to the world
@uref{http://www.sxemacs.org/pipermail/sxemacs-devel/2004-December/000224.html}.

It should be noted that even though some feathers were ruffled by the
fork, the dust settled very quickly. There is @emph{no} animosity
between the two projects. As a matter of fact, the SXEmacs maintainer
(Steve Youngs) is still an active member of the XEmacs Review Board.

@cindex time line, SXEmacs
A time line for SXEmacs releases to date is:

sxemacs--main--22.1.0--version-0 released December 30, 2004.

sxemacs--main--22.1.1--version-0 released January 31, 2005.

sxemacs--main--22.1.2--version-0 released May 16, 2005.

sxemacs--main--22.1.3--version-0 released December 22, 2005.

sxemacs--main--22.1.4--version-0 released May 1, 2006.

sxemacs--main--22.1.6--version-0 released December 6, 2006.

sxemacs--main--22.1.7--version-0 released August 25, 2007.

sxemacs--main--22.1.8--version-0 released February 9, 2008.

sxemacs--main--22.1.9--version-0 released June 6, 2008.
1104 @node SXEmacs From the Outside, The Lisp Language, A History of Emacs, Top
1105 @chapter SXEmacs From the Outside
1106 @cindex SXEmacs from the outside
1107 @cindex outside, SXEmacs from the
1108 @cindex read-eval-print
1110 SXEmacs appears to the outside world as an editor, but it is really a
1111 Lisp environment. At its heart is a Lisp interpreter; it also
1112 ``happens'' to contain many specialized object types (e.g. buffers,
1113 windows, frames, events) that are useful for implementing an editor.
1114 Some of these objects (in particular windows and frames) have
1115 displayable representations, and SXEmacs provides a function
1116 @code{redisplay()} that ensures that the display of all such objects
1117 matches their internal state. Most of the time, a standard Lisp
1118 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
1119 code, execute it, and print the results''. SXEmacs has a similar loop:
1125 dispatch the event (i.e. ``do it'')
1130 Reading an event is done using the Lisp function @code{next-event},
1131 which waits for something to happen (typically, the user presses a key
or moves the mouse) and returns an event object describing what happened.
1133 Dispatching an event is done using the Lisp function
1134 @code{dispatch-event}, which looks up the event in a keymap object (a
1135 particular kind of object that associates an event with a Lisp function)
1136 and calls that function. The function ``does'' what the user has
1137 requested by changing the state of particular frame objects, buffer
1138 objects, etc. Finally, @code{redisplay()} is called, which updates the
display to reflect the changes just made. Thus is an ``editor'' born.
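The read, dispatch, redisplay cycle described above can be sketched in C as a toy model. The names @code{next_event}, @code{dispatch_event}, @code{command_loop} and the array-based keymap are hypothetical stand-ins for the Lisp functions @code{next-event} and @code{dispatch-event} and for a real keymap object. The ``buffer'' and ``display'' here are plain character arrays, and input comes from a string instead of the user.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, drastically simplified stand-ins; not SXEmacs types. */
typedef struct { int key; } event;
typedef void (*command)(int key);

static char text[128];        /* the "buffer" state */
static size_t text_len;
static char shown[128];       /* what the "display" currently shows */
static command keymap[128];   /* event -> command lookup table */
static const char *pending;   /* simulated input queue */

static void self_insert(int key)
{
    if (text_len + 1 < sizeof text) {
        text[text_len++] = (char)key;
        text[text_len] = '\0';
    }
}

static void delete_backward(int key)
{
    (void)key;
    if (text_len > 0)
        text[--text_len] = '\0';
}

/* Read an event; key < 0 means no more input. */
static event next_event(void)
{
    event e;
    if (*pending == '\0') {
        e.key = -1;
    } else {
        e.key = (unsigned char)*pending;
        pending++;
    }
    return e;
}

/* Look the event up in the keymap and call the bound command, if any. */
static void dispatch_event(event e)
{
    command cmd = (e.key >= 0 && e.key < 128) ? keymap[e.key] : NULL;
    if (cmd)
        cmd(e.key);
}

/* Make the display match the internal state. */
static void redisplay(void)
{
    assert(text_len < sizeof shown);
    memcpy(shown, text, text_len + 1);
}

static void setup_keymap(void)
{
    for (int c = 'a'; c <= 'z'; c++)
        keymap[c] = self_insert;
    keymap['\b'] = delete_backward;
}

static void command_loop(const char *input)
{
    pending = input;
    for (;;) {
        event e = next_event();
        if (e.key < 0)
            break;            /* a real editor would wait for more input */
        dispatch_event(e);
        redisplay();
    }
}
```
@end verbatim

Each pass through @code{command_loop} corresponds to one iteration of the editor's loop. One event is read, one command runs and changes state, and redisplay brings the display back in line with that state.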
1141 @cindex bridge, playing
1142 @cindex taxes, doing
1143 @cindex pi, calculating
1144 Note that you do not have to use SXEmacs as an editor; you could just
1145 as well make it do your taxes, compute pi, play bridge, etc. You'd just
1146 have to write functions to do those operations in Lisp.
1148 @node The Lisp Language, SXEmacs From the Perspective of Building, SXEmacs From the Outside, Top
1149 @chapter The Lisp Language
1150 @cindex Lisp language, the
1153 @cindex Lisp vs. Java
1154 @cindex Java vs. Lisp
1155 @cindex dynamic scoping
1156 @cindex scoping, dynamic
1157 @cindex dynamic types
1158 @cindex types, dynamic
1161 @cindex Gosling, James
1163 Lisp is a general-purpose language that is higher-level than C and in
1164 many ways more powerful than C. Powerful dialects of Lisp such as
1165 Common Lisp are probably much better languages for writing very large
1166 applications than is C. (Unfortunately, for many non-technical
1167 reasons C and its successor C++ have become the dominant languages for
1168 application development. These languages are both inadequate for
1169 extremely large applications, which is evidenced by the fact that newer,
larger programs are becoming ever harder to write and are requiring ever
more programmers despite great improvements in C development environments;
1172 and by the fact that, although hardware speeds and reliability have been
1173 growing at an exponential rate, most software is still generally
1174 considered to be slow and buggy.)
1176 The new Java language holds promise as a better general-purpose
1177 development language than C. Java has many features in common with
1178 Lisp that are not shared by C (this is not a coincidence, since
1179 Java was designed by James Gosling, a former Lisp hacker). This
1180 will be discussed more later.
1182 For those used to C, here is a summary of the basic differences between
1187 Lisp has an extremely regular syntax. Every function, expression,
1188 and control statement is written in the form
1191 (@var{func} @var{arg1} @var{arg2} ...)
This is in contrast to C, which writes function calls as
1197 func(@var{arg1}, @var{arg2}, ...)
1200 but writes expressions involving operators as (e.g.)
1203 @var{arg1} + @var{arg2}
1206 and writes control statements as (e.g.)
1209 while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
1212 Lisp equivalents of the latter two would be
1215 (+ @var{arg1} @var{arg2} ...)
1221 (while @var{expr} @var{statement1} @var{statement2} ...)
1225 Lisp is a safe language. Assuming there are no bugs in the Lisp
1226 interpreter/compiler, it is impossible to write a program that ``core
1227 dumps'' or otherwise causes the machine to execute an illegal
1228 instruction. This is very different from C, where perhaps the most
1229 common outcome of a bug is exactly such a crash. A corollary of this is that
1230 the C operation of casting a pointer is impossible (and unnecessary) in
1231 Lisp, and that it is impossible to access memory outside the bounds of
1235 Programs and data are written in the same form. The
1236 parenthesis-enclosing form described above for statements is the same
1237 form used for the most common data type in Lisp, the list. Thus, it is
1238 possible to represent any Lisp program using Lisp data types, and for
1239 one program to construct Lisp statements and then dynamically
1240 @dfn{evaluate} them, or cause them to execute.
1243 All objects are @dfn{dynamically typed}. This means that part of every
1244 object is an indication of what type it is. A Lisp program can
1245 manipulate an object without knowing what type it is, and can query an
1246 object to determine its type. This means that, correspondingly,
1247 variables and function parameters can hold objects of any type and are
not normally declared as being of any particular type. This is in contrast
to the @dfn{static typing} of C, where variables can hold exactly one
1250 type of object and must be declared as such, and objects do not contain
1251 an indication of their type because it's implicit in the variables they
1252 are stored in. It is possible in C to have a variable hold different
1253 types of objects (e.g. through the use of @code{void *} pointers or
1254 variable-argument functions), but the type information must then be
1255 passed explicitly in some other fashion, leading to additional program
1259 Allocated memory is automatically reclaimed when it is no longer in use.
1260 This operation is called @dfn{garbage collection} and involves looking
1261 through all variables to see what memory is being pointed to, and
1262 reclaiming any memory that is not pointed to and is thus
``inaccessible'' and out of use. This is in contrast to C, in which
1264 allocated memory must be explicitly reclaimed using @code{free()}. If
1265 you simply drop all pointers to memory without freeing it, it becomes
1266 ``leaked'' memory that still takes up space. Over a long period of
1267 time, this can cause your program to grow and grow until it runs out of
1271 Lisp has built-in facilities for handling errors and exceptions. In C,
1272 when an error occurs, usually either the program exits entirely or the
1273 routine in which the error occurs returns a value indicating this. If
1274 an error occurs in a deeply-nested routine, then every routine currently
1275 called must unwind itself normally and return an error value back up to
1276 the next routine. This means that every routine must explicitly check
1277 for an error in all the routines it calls; if it does not do so,
1278 unexpected and often random behavior results. This is an extremely
1279 common source of bugs in C programs. An alternative would be to do a
1280 non-local exit using @code{longjmp()}, but that is often very dangerous
because the routines that were jumped past had no opportunity to clean
1282 up after themselves and may leave things in an inconsistent state,
1283 causing a crash shortly afterwards.
1285 Lisp provides mechanisms to make such non-local exits safe. When an
1286 error occurs, a routine simply signals that an error of a particular
1287 class has occurred, and a non-local exit takes place. Any routine can
1288 trap errors occurring in routines it calls by registering an error
1289 handler for some or all classes of errors. (If no handler is registered,
1290 a default handler, generally installed by the top-level event loop, is
1291 executed; this prints out the error and continues.) Routines can also
1292 specify cleanup code (called an @dfn{unwind-protect}) that will be
1293 called when control exits from a block of code, no matter how that exit
1294 occurs---i.e. even if a function deeply nested below it causes a
1295 non-local exit back to the top level.
1297 Note that this facility has appeared in some recent vintages of C, in
1298 particular Visual C++ and other PC compilers written for the Microsoft
1302 In Emacs Lisp, local variables are @dfn{dynamically scoped}. This means
1303 that if you declare a local variable in a particular function, and then
1304 call another function, that subfunction can ``see'' the local variable
1305 you declared. This is actually considered a bug in Emacs Lisp and in
1306 all other early dialects of Lisp, and was corrected in Common Lisp. (In
1307 Common Lisp, you can still declare dynamically scoped variables if you
1308 want to---they are sometimes useful---but variables by default are
1309 @dfn{lexically scoped} as in C.)
1312 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
1313 early dialect of Lisp developed at MIT (no relation to the Macintosh
1314 computer). There is a Common Lisp compatibility package available for
1315 Emacs that provides many of the features of Common Lisp.
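The error-handling facilities described in the list above (signalling an error, trapping it with a handler, and running @dfn{unwind-protect} cleanups on the way out) can be approximated in C with @code{setjmp()} and @code{longjmp()}, as long as pending cleanups run before the jump. The following is a minimal sketch of that idea only. @code{signal_error}, @code{condition_case}, @code{unwind_protect} and their data structures are hypothetical names chosen for illustration, not the SXEmacs implementation, and there is just one class of error.

@verbatim
```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical cleanup record: one per active unwind-protect. */
typedef struct cleanup {
    void (*fn)(void *);
    void *arg;
    struct cleanup *next;
} cleanup;

/* Hypothetical handler frame: one per active condition-case. */
typedef struct handler {
    jmp_buf jb;
    cleanup *saved_cleanups;  /* cleanup stack as it was when set up */
    struct handler *next;
} handler;

static handler *handlers;     /* innermost handler first */
static cleanup *cleanups;     /* innermost cleanup first */
static int error_code;

/* Signal an error: run every cleanup registered since the innermost
   handler was established, then jump to that handler. */
static void signal_error(int code)
{
    handler *h = handlers;
    assert(h != NULL);        /* a real system has a top-level handler */
    while (cleanups != h->saved_cleanups) {
        cleanup *c = cleanups;
        cleanups = c->next;
        c->fn(c->arg);
    }
    handlers = h->next;
    error_code = code;
    longjmp(h->jb, 1);
}

/* Run body; fn(fn_arg) is called however body exits. */
static void unwind_protect(void (*body)(void *), void *body_arg,
                           void (*fn)(void *), void *fn_arg)
{
    cleanup c = { fn, fn_arg, cleanups };
    cleanups = &c;
    body(body_arg);
    cleanups = c.next;        /* normal exit: pop and run it ourselves */
    fn(fn_arg);
}

/* Run body; return 0 on success, or the code of an error signalled
   anywhere beneath it. */
static int condition_case(void (*body)(void *), void *arg)
{
    handler h;
    h.saved_cleanups = cleanups;
    h.next = handlers;
    handlers = &h;
    if (setjmp(h.jb) != 0)
        return error_code;    /* already unwound by signal_error */
    body(arg);
    handlers = h.next;
    return 0;
}

/* Example: a "file" that must be closed even if the work fails. */
static int file_open;
static void close_file(void *p) { (void)p; file_open = 0; }
static void failing_work(void *p) { (void)p; file_open = 1; signal_error(42); }
static void protected_work(void *p)
{
    (void)p;
    unwind_protect(failing_work, NULL, close_file, NULL);
}
```
@end verbatim

Here @code{failing_work} signals an error two frames below the handler. The cleanup registered in between still runs before control reaches @code{condition_case}, which is exactly the guarantee a bare @code{longjmp()} does not give.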
1317 The Java language is derived in many ways from C, and shares a similar
1318 syntax, but has the following features in common with Lisp (and different
1323 Java is a safe language, like Lisp.
1325 Java provides garbage collection, like Lisp.
1327 Java has built-in facilities for handling errors and exceptions, like
1330 Java has a type system that combines the best advantages of both static
1331 and dynamic typing. Objects (except very simple types) are explicitly
1332 marked with their type, as in dynamic typing; but there is a hierarchy
1333 of types and functions are declared to accept only certain types, thus
1334 providing the increased compile-time error-checking of static typing.
1337 The Java language also has some negative attributes:
1341 Java uses the edit/compile/run model of software development. This
1342 makes it hard to use interactively. For example, to use Java like
@code{bc}, it is necessary to write a special-purpose, albeit tiny,
application. In Emacs Lisp, a calculator comes built-in without any
effort; one can always just type an expression in the @code{*scratch*}
1348 Java tries too hard to enforce, not merely enable, portability, making
1349 ordinary access to standard OS facilities painful. Java has an
1350 @dfn{agenda}. I think this is why @code{chdir} is not part of standard
1351 Java, which is inexcusable.
1354 Unfortunately, there is no perfect language. Static typing allows a
1355 compiler to catch programmer errors and produce more efficient code, but
1356 makes programming more tedious and less fun. For the foreseeable future,
1357 an Ideal Editing and Programming Environment (and that is what SXEmacs
1358 aspires to) will be programmable in multiple languages: high level ones
1359 like Lisp for user customization and prototyping, and lower level ones
1360 for infrastructure and industrial strength applications. If I had my
1361 way, SXEmacs would be friendly towards the Python, Scheme, C++, ML,
1362 etc... communities. But there are serious technical difficulties to
1363 achieving that goal.
1365 The word @dfn{application} in the previous paragraph was used
1366 intentionally. SXEmacs implements an API for programs written in Lisp
1367 that makes it a full-fledged application platform, very much like an OS
1370 @node SXEmacs From the Perspective of Building, SXEmacs From the Inside, The Lisp Language, Top
1371 @chapter SXEmacs From the Perspective of Building
1372 @cindex SXEmacs from the perspective of building
1373 @cindex building, SXEmacs from the perspective of
1375 The heart of SXEmacs is the Lisp environment, which is written in C.
1376 This is contained in the @file{src/} subdirectory. Underneath
1377 @file{src/} are two subdirectories of header files: @file{s/} (header
1378 files for particular operating systems) and @file{m/} (header files for
1379 particular machine types). In practice the distinction between the two
1380 types of header files is blurred. These header files define or undefine
1381 certain preprocessor constants and macros to indicate particular
1382 characteristics of the associated machine or operating system. As part
of the configure process, one @file{s/} file and one @file{m/} file are
1384 identified for the particular environment in which SXEmacs is being
1387 SXEmacs also contains a great deal of Lisp code. This implements the
1388 operations that make SXEmacs useful as an editor as well as just a Lisp
1389 environment, and also contains many add-on packages that allow SXEmacs to
1390 browse directories, act as a mail and Usenet news reader, compile Lisp
1391 code, etc. There is actually more Lisp code than C code associated with
1392 SXEmacs, but much of the Lisp code is peripheral to the actual operation
1393 of the editor. The Lisp code all lies in subdirectories underneath the
1394 @file{lisp/} directory.
1396 The @file{lwlib/} directory contains C code that implements a
1397 generalized interface onto different X widget toolkits and also
1398 implements some widgets of its own that behave like Motif widgets but
1399 are faster, free, and in some cases more powerful. The code in this
directory compiles into a library and is mostly independent of SXEmacs.
1402 The @file{etc/} directory contains various data files associated with
1403 SXEmacs. Some of them are actually read by SXEmacs at startup; others
1404 merely contain useful information of various sorts.
1406 The @file{lib-src/} directory contains C code for various auxiliary
1407 programs that are used in connection with SXEmacs. Some of them are used
1408 during the build process; others are used to perform certain functions
1409 that cannot conveniently be placed in the SXEmacs executable (e.g. the
1410 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
1411 which must be setgid to @file{mail} on many systems; and the
1412 @file{gnuclient} program, which allows an external script to communicate
1413 with a running SXEmacs process).
1415 The @file{man/} directory contains the sources for the SXEmacs
1416 documentation. It is mostly in a form called Texinfo, which can be
1417 converted into either a printed document (by passing it through @TeX{})
1418 or into on-line documentation called @dfn{info files}.
1420 The @file{info/} directory contains the results of formatting the SXEmacs
1421 documentation as @dfn{info files}, for on-line use. These files are
1422 used when you enter the Info system using @kbd{C-h i} or through the
1425 The other directories contain various miscellaneous code and information
1426 that is not normally used or needed.
1428 The first step of building involves running the @file{configure} program
1429 and passing it various parameters to specify any optional features you
1430 want and compiler arguments and such, as described in the @file{INSTALL}
1431 file. This determines what the build environment is, chooses the
appropriate @file{s/} and @file{m/} files, and runs a series of tests to
determine many details about your environment, such as which library
functions are available and exactly how they work. Running these tests
allows SXEmacs to be compiled on a much
1436 wider variety of platforms than those that the SXEmacs developers happen
1437 to be familiar with, including various sorts of hybrid platforms. This
1438 is especially important now that many operating systems give you a great
1439 deal of control over exactly what features you want installed, and allow
1440 for easy upgrading of parts of a system without upgrading the rest. It
1441 would be impossible to pre-determine and pre-specify the information for
1442 all possible configurations.
1444 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1445 since they contain unmaintainable platform-specific hard-coded
1446 information. SXEmacs has been moving in the direction of having all
1447 system-specific information be determined dynamically by
1448 @file{configure}. Perhaps someday we can @code{rm -rf src/s src/m}.
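As an illustration of the configure-based approach, C code can test a macro that @file{configure} writes into @file{src/config.h}, instead of relying on a hard-coded @file{s/} or @file{m/} file. This sketch assumes a macro named @code{HAVE_STRERROR}, which follows the usual Autoconf naming convention for a detected library function. Whether the SXEmacs @file{configure} checks this particular function is an assumption, and @code{error_string} is a hypothetical helper.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* In a real build this file would first do #include <config.h>, which
   configure generates; HAVE_STRERROR may or may not be defined there. */

/* Return a human-readable description of an errno value. */
static const char *error_string(int errnum)
{
#ifdef HAVE_STRERROR
    /* configure found a working strerror() on this system. */
    return strerror(errnum);
#else
    /* Fallback chosen at build time when no strerror() was detected. */
    static char buf[32];
    sprintf(buf, "Unknown error %d", errnum);
    return buf;
#endif
}
```
@end verbatim

The choice between the two branches is made from facts that @file{configure} measured on the build machine, not from knowledge about the platform recorded by hand.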
1450 When configure is done running, it generates @file{Makefile}s and
1451 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1452 the features of your system) from template files. You then run
1453 @file{make}, which compiles the auxiliary code and programs in
1454 @file{lib-src/} and @file{lwlib/} and the main SXEmacs executable in
1455 @file{src/}. The result of compiling and linking is an executable
1456 called @file{temacs}, which is @emph{not} the final SXEmacs executable.
1457 @file{temacs} by itself is not intended to function as an editor or even
1458 display any windows on the screen, and if you simply run it, it will
1459 exit immediately. The @file{Makefile} runs @file{temacs} with certain
1460 options that cause it to initialize itself, read in a number of basic
1461 Lisp files, and then dump itself out into a new executable called
1462 @file{xemacs}. This new executable has been pre-initialized and
1463 contains pre-digested Lisp code that is necessary for the editor to
1464 function (this includes most basic editing functions,
1465 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1466 primitives; some initialization code that is called when certain
1467 objects, such as frames, are created; and all of the standard
1468 keybindings and code for the actions they result in). This executable,
1469 @file{xemacs}, is the executable that you run to use the SXEmacs editor.
Although @file{temacs} is not intended to be run as an editor, it can be,
by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
This is useful when the dumping procedure described above is broken, or
when using certain program debugging tools such as Purify. These tools
get mighty confused by the tricks played by the SXEmacs build process,
such as allocating memory in one process and freeing it in the next.
1478 @node SXEmacs From the Inside, The SXEmacs Object System (Abstractly Speaking), SXEmacs From the Perspective of Building, Top
1479 @chapter SXEmacs From the Inside
1480 @cindex SXEmacs from the inside
1481 @cindex inside, SXEmacs from the
1483 Internally, SXEmacs is quite complex, and can be very confusing. To
1484 simplify things, it can be useful to think of SXEmacs as containing an
1485 event loop that ``drives'' everything, and a number of other subsystems,
1486 such as a Lisp engine and a redisplay mechanism. Each of these other
1487 subsystems exists simultaneously in SXEmacs, and each has a certain
1488 state. The flow of control continually passes in and out of these
1489 different subsystems in the course of normal operation of the editor.
1491 It is important to keep in mind that, most of the time, the editor is
1492 ``driven'' by the event loop. Except during initialization and batch
1493 mode, all subsystems are entered directly or indirectly through the
1494 event loop, and ultimately, control exits out of all subsystems back up
1495 to the event loop. This cycle of entering a subsystem, exiting back out
1496 to the event loop, and starting another iteration of the event loop
occurs once for each keystroke, mouse motion, etc.
1499 If you're trying to understand a particular subsystem (other than the
1500 event loop), think of it as a ``daemon'' process or ``servant'' that is
1501 responsible for one particular aspect of a larger system, and
1502 periodically receives commands or environment changes that cause it to
1503 do something. Ultimately, these commands and environment changes are
1504 always triggered by the event loop. For example:
1508 The window and frame mechanism is responsible for keeping track of what
1509 windows and frames exist, what buffers are in them, etc. It is
1510 periodically given commands (usually from the user) to make a change to
1511 the current window/frame state: i.e. create a new frame, delete a
1515 The buffer mechanism is responsible for keeping track of what buffers
1516 exist and what text is in them. It is periodically given commands
1517 (usually from the user) to insert or delete text, create a buffer, etc.
1518 When it receives a text-change command, it notifies the redisplay
1522 The redisplay mechanism is responsible for making sure that windows and
1523 frames are displayed correctly. It is periodically told (by the event
1524 loop) to actually ``do its job'', i.e. snoop around and see what the
1525 current state of the environment (mostly of the currently-existing
1526 windows, frames, and buffers) is, and make sure that state matches
1527 what's actually displayed. It keeps lots and lots of information around
1528 (such as what is actually being displayed currently, and what the
1529 environment was last time it checked) so that it can minimize the work
1530 it has to do. It is also helped along in that whenever a relevant
1531 change to the environment occurs, the redisplay mechanism is told about
1532 this, so it has a pretty good idea of where it has to look to find
1533 possible changes and doesn't have to look everywhere.
1536 The Lisp engine is responsible for executing the Lisp code in which most
1537 user commands are written. It is entered through a call to @code{eval}
1538 or @code{funcall}, which occurs as a result of dispatching an event from
1539 the event loop. The functions it calls issue commands to the buffer
1540 mechanism, the window/frame subsystem, etc.
1543 The Lisp allocation subsystem is responsible for keeping track of Lisp
1544 objects. It is given commands from the Lisp engine to allocate objects,
1545 garbage collect, etc.
1550 The important idea here is that there are a number of independent
1551 subsystems each with its own responsibility and persistent state, just
1552 like different employees in a company, and each subsystem is
1553 periodically given commands from other subsystems. Commands can flow
1554 from any one subsystem to any other, but there is usually some sort of
1555 hierarchy, with all commands originating from the event subsystem.
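The cooperation described above can be sketched in C. The buffer subsystem tells redisplay which part of its state changed, and redisplay later uses that hint to limit its work. The names, the fixed line array, and the single ``dirty range'' are hypothetical simplifications. The actual SXEmacs redisplay keeps far more state than this.

@verbatim
```c
#include <assert.h>
#include <string.h>

#define NLINES 8

/* Hypothetical model: one buffer shown in one window, one line per slot. */
static char buffer_lines[NLINES][32];   /* buffer subsystem state */
static char screen_lines[NLINES][32];   /* what is currently displayed */
static int dirty_lo = NLINES;           /* changed range; empty when */
static int dirty_hi = -1;               /* dirty_lo > dirty_hi */
static int lines_redrawn;               /* counts work done by redisplay */

/* Buffer subsystem: change text, then notify redisplay of the change. */
static void buffer_set_line(int line, const char *s)
{
    strncpy(buffer_lines[line], s, sizeof buffer_lines[line] - 1);
    if (line < dirty_lo) dirty_lo = line;
    if (line > dirty_hi) dirty_hi = line;
}

/* Redisplay subsystem: when told to do its job, examine only the lines
   it was told about, and redraw only those that actually differ. */
static void redisplay(void)
{
    for (int i = dirty_lo; i <= dirty_hi; i++) {
        if (strcmp(buffer_lines[i], screen_lines[i]) != 0) {
            strcpy(screen_lines[i], buffer_lines[i]);
            lines_redrawn++;
        }
    }
    dirty_lo = NLINES;
    dirty_hi = -1;
}
```
@end verbatim

Changing lines 2 and 5 and then calling @code{redisplay} redraws exactly two lines. A second call with no intervening change does no work at all.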
1557 SXEmacs is entered in @code{main()}, which is in @file{emacs.c}. When
1558 this is called the first time (in a properly-invoked @file{temacs}), it
1563 It does some very basic environment initializations, such as determining
1564 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1565 and setting up signal handlers.
1567 It initializes the entire Lisp interpreter.
1569 It sets the initial values of many built-in variables (including many
1570 variables that are visible to Lisp programs), such as the global keymap
1571 object and the built-in faces (a face is an object that describes the
1572 display characteristics of text). This involves creating Lisp objects
1573 and thus is dependent on step (2).
1575 It performs various other initializations that are relevant to the
1576 particular environment it is running in, such as retrieving environment
1577 variables, determining the current date and the user who is running the
1578 program, examining its standard input, creating any necessary file
1581 At this point, the C initialization is complete. A Lisp program that
was specified on the command line (usually @file{loadup.el}) is loaded
1583 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
1584 @file{loadup.el} loads all of the other Lisp files that are needed for
1585 the operation of the editor, calls the @code{dump-emacs} function to
1586 write out @file{xemacs}, and then kills the temacs process.
1589 When @file{xemacs} is then run, it only redoes steps (1) and (4)
1590 above; all variables already contain the values they were set to when
1591 the executable was dumped, and all memory that was allocated with
1592 @code{malloc()} is still around. (SXEmacs knows whether it is being run
1593 as @file{xemacs} or @file{temacs} because it sets the global variable
1594 @code{initialized} to 1 after step (4) above.) At this point,
1595 @file{xemacs} calls a Lisp function to do any further initialization,
1596 which includes parsing the command-line (the C code can only do limited
1597 command-line parsing, which includes looking for the @samp{-batch} and
1598 @samp{-l} flags and a few other flags that it needs to know about before
1599 initialization is complete), creating the first frame (or @dfn{window}
1600 in standard window-system parlance), running the user's init file
1601 (usually the file @file{.emacs} in the user's home directory), etc. The
1602 function to do this is usually called @code{normal-top-level};
1603 @file{loadup.el} tells the C code about this function by setting its
1604 name as the value of the Lisp variable @code{top-level}.
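The temacs/xemacs split described above hinges on the @code{initialized} flag. In this hypothetical sketch, the step functions are empty placeholders that only count how often they run, and @code{emacs_startup} is an invented name. In the real program the value of @code{initialized} survives because it is saved in the dumped executable. Here that is merely simulated by calling the start-up routine twice in one process.

@verbatim
```c
#include <assert.h>

static int initialized;    /* 0 in temacs, 1 once dumped */
static int steps_run[5];   /* how often each numbered step ran */

static void init_environment(void)      { steps_run[1]++; } /* step (1) */
static void init_lisp_interpreter(void) { steps_run[2]++; } /* step (2) */
static void init_builtin_vars(void)     { steps_run[3]++; } /* step (3) */
static void init_runtime_env(void)      { steps_run[4]++; } /* step (4) */

static void emacs_startup(void)
{
    init_environment();
    if (!initialized) {
        /* Only temacs does these; their results are dumped. */
        init_lisp_interpreter();
        init_builtin_vars();
    }
    init_runtime_env();
    initialized = 1;
    /* The real program now hands control to Lisp: loadup.el in temacs,
       or the function named by `top-level' in the dumped executable. */
}
```
@end verbatim

The first call plays the role of @file{temacs} and runs all four steps. The second plays the dumped @file{xemacs} and repeats only steps (1) and (4).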
1606 When the Lisp initialization code is done, the C code enters the event
1607 loop, and stays there for the duration of the SXEmacs process. The code
1608 for the event loop is contained in @file{cmdloop.c}, and is called
1609 @code{Fcommand_loop_1()}. Note that this event loop could very well be
1610 written in Lisp, and in fact a Lisp version exists; but apparently,
1611 doing this makes SXEmacs run noticeably slower.
1613 Notice how much of the initialization is done in Lisp, not in C.
1614 In general, SXEmacs tries to move as much code as is possible
1615 into Lisp. Code that remains in C is code that implements the
1616 Lisp interpreter itself, or code that needs to be very fast, or
code that needs to make system calls or perform other operations that
can only be done in C, or code that needs to have access to
1619 ``forbidden'' structures. (One conscious aspect of the design of
1620 Lisp under SXEmacs is a clean separation between the external
1621 interface to a Lisp object's functionality and its internal
1622 implementation. Part of this design is that Lisp programs
1623 are forbidden from accessing the contents of the object other
1624 than through using a standard API. In this respect, SXEmacs Lisp
1625 is similar to modern Lisp dialects but differs from GNU Emacs,
1626 which tends to expose the implementation and allow Lisp
1627 programs to look at it directly. The major advantage of
1628 hiding the implementation is that it allows the implementation
1629 to be redesigned without affecting any Lisp programs, including
1630 those that might want to be ``clever'' by looking directly at
1631 the object's contents and possibly manipulating them.)
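The same separation between interface and implementation can be expressed in C with an opaque type. This is an analogy rather than SXEmacs code. The @code{marker} type and its functions are hypothetical, and in a real program the structure definition would live in a separate @file{.c} file so that callers could not see it.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Public interface: callers see only an incomplete type and accessors.
   (In practice this part would live in a header file.) */
typedef struct marker marker;
marker *make_marker(long pos);
long marker_position(const marker *m);
void marker_adjust_for_insert(marker *m, long at, long len);
void free_marker(marker *m);

/* Private implementation: it could be redesigned without changing any
   caller, because callers never touch the fields directly. */
struct marker {
    long pos;
};

marker *make_marker(long pos)
{
    marker *m = malloc(sizeof *m);
    if (m != NULL)
        m->pos = pos;
    return m;
}

long marker_position(const marker *m)
{
    return m->pos;
}

/* Text inserted strictly before the marker pushes it forward. */
void marker_adjust_for_insert(marker *m, long at, long len)
{
    if (at < m->pos)
        m->pos += len;
}

void free_marker(marker *m)
{
    free(m);
}
```
@end verbatim

Because every caller goes through these functions, the representation of @code{struct marker} can change freely, which is the advantage the paragraph above describes for Lisp objects.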
1633 Moving code into Lisp makes the code easier to debug and maintain and
1634 makes it much easier for people who are not SXEmacs developers to
1635 customize SXEmacs, because they can make a change with much less chance
1636 of obscure and unwanted interactions occurring than if they were to
1639 @node The SXEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, SXEmacs From the Inside, Top
1640 @chapter The SXEmacs Object System (Abstractly Speaking)
1641 @cindex SXEmacs object system (abstractly speaking), the
1642 @cindex object system (abstractly speaking), the SXEmacs
1644 At the heart of the Lisp interpreter is its management of objects.
1645 SXEmacs Lisp contains many built-in objects, some of which are
1646 simple and others of which can be very complex; and some of which
1647 are very common, and others of which are rarely used or are only
1648 used internally. (Since the Lisp allocation system, with its
1649 automatic reclamation of unused storage, is so much more convenient
1650 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1651 in its internal operations.)
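Because every Lisp object carries an indication of its own type, code can handle objects generically and query their types at run time. The toy representation below illustrates that idea with a tagged structure. The names are hypothetical, and the layout is not the real SXEmacs object representation, which is covered in the chapter on how Lisp objects are represented in C. Allocated memory is simply never freed here, whereas SXEmacs would rely on garbage collection.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Toy tagged object: the type travels with the value. */
typedef enum { T_INT, T_FLOAT, T_CONS } obj_type;

typedef struct obj obj;
struct obj {
    obj_type type;
    union {
        long i;
        double f;
        struct { obj *car, *cdr; } cons;
    } u;
};

static obj *alloc_obj(obj_type t)
{
    obj *o = malloc(sizeof *o);
    if (o == NULL)
        abort();
    o->type = t;
    return o;
}

static obj *make_int(long i)     { obj *o = alloc_obj(T_INT);   o->u.i = i; return o; }
static obj *make_float(double f) { obj *o = alloc_obj(T_FLOAT); o->u.f = f; return o; }

static obj *cons(obj *car, obj *cdr)
{
    obj *o = alloc_obj(T_CONS);
    o->u.cons.car = car;
    o->u.cons.cdr = cdr;
    return o;
}

/* Run-time type query, in the spirit of Lisp's numberp. */
static int numberp(const obj *o)
{
    return o->type == T_INT || o->type == T_FLOAT;
}

/* Generic addition: accepts any numeric object, whatever its type. */
static obj *plus(const obj *a, const obj *b)
{
    assert(numberp(a) && numberp(b));  /* Lisp would signal an error */
    if (a->type == T_INT && b->type == T_INT)
        return make_int(a->u.i + b->u.i);
    double x = (a->type == T_INT) ? (double)a->u.i : a->u.f;
    double y = (b->type == T_INT) ? (double)b->u.i : b->u.f;
    return make_float(x + y);
}
```
@end verbatim

Adding two integers yields an integer and mixing an integer with a float yields a float. The caller never declared which it expected, because @code{plus} inspects the type tags itself.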
1653 The basic Lisp objects are
1657 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1658 reason for this is described below when the internal Lisp object
1659 representation is described.
1661 Same precision as a double in C.
1663 A simple container for two Lisp objects, used to implement lists and
1664 most other data structures in Lisp.
1666 An object representing a single character of text; chars behave like
1667 integers in many ways but are logically considered text rather than
numbers and have a different read syntax. (The read syntax for a char
1669 contains the char itself or some textual encoding of it---for example,
1670 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1671 ISO-2022 encoding standard---rather than the numerical representation
1672 of the char; this way, if the mapping between chars and integers
1673 changes, which is quite possible for Kanji characters and other extended
1674 characters, the same character will still be created. Note that some
1675 primitives confuse chars and integers. The worst culprit is @code{eq},
1676 which makes a special exception and considers a char to be @code{eq} to
1677 its integer equivalent, even though in no other case are objects of two
1678 different types @code{eq}. The reason for this monstrosity is
1679 compatibility with existing code; the separation of char from integer
1680 came fairly recently.)
1682 An object that contains Lisp objects and is referred to by name;
1683 symbols are used to implement variables and named functions
1684 and to provide the equivalent of preprocessor constants in C.
1686 A one-dimensional array of Lisp objects providing constant-time access
1687 to any of the objects; access to an arbitrary object in a vector is
1688 faster than for lists, but the operations that can be done on a vector
1691 Self-explanatory; behaves much like a vector of chars
1692 but has a different read syntax and is stored and manipulated
1695 A vector of bits; similar to a string in spirit.
1696 @item compiled-function
1697 An object containing compiled Lisp code, known as @dfn{byte code}.
1699 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1703 Note that there is no basic ``function'' type, as in more powerful
1704 versions of Lisp (where it's called a @dfn{closure}). SXEmacs Lisp does
1705 not provide the closure semantics implemented by Common Lisp and Scheme.
1706 The guts of a function in SXEmacs Lisp are represented in one of four
1707 ways: a symbol specifying another function (when one function is an
1708 alias for another), a list (whose first element must be the symbol
1709 @code{lambda}) containing the function's source code, a
1710 compiled-function object, or a subr object. (In other words, given a
1711 symbol specifying the name of a function, calling @code{symbol-function}
1712 to retrieve the contents of the symbol's function cell will return one
1713 of these types of objects.)
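Calling a function by name therefore means looking in the symbol's function cell. If that cell holds another symbol, the alias is followed until one of the other three representations is reached. Emacs Lisp exposes this lookup as @code{indirect-function}. The C sketch below models only that lookup. Its types and field names are hypothetical, and lambda lists and compiled-function objects are represented by a placeholder field.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* The four kinds of object a function cell can hold, per the text. */
typedef enum { FN_SYMBOL, FN_LAMBDA, FN_COMPILED, FN_SUBR } fn_kind;

typedef struct symbol symbol;
typedef struct {
    fn_kind kind;
    symbol *alias;            /* FN_SYMBOL: name of another function */
    long (*subr)(long);       /* FN_SUBR: the C implementation */
    const char *description;  /* FN_LAMBDA/FN_COMPILED: placeholder */
} function_cell;

struct symbol {
    const char *name;
    function_cell function;
};

/* Follow alias chains until the cell holds something other than a
   symbol. Give up after a fixed depth so an alias cycle cannot loop
   forever. */
static const function_cell *indirect_function(const symbol *sym)
{
    const function_cell *f = &sym->function;
    for (int depth = 0; f->kind == FN_SYMBOL; depth++) {
        if (depth > 100)
            return NULL;      /* alias cycle */
        f = &f->alias->function;
    }
    return f;
}

/* A sample "subr" implemented in C. */
static long one_plus(long x)
{
    return x + 1;
}
```
@end verbatim

A symbol @code{increment} whose function cell names @code{1+} resolves, through @code{indirect_function}, to the subr for @code{1+}. Two symbols that alias each other resolve to nothing.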
1715 SXEmacs Lisp also contains numerous specialized objects used to implement
1720 Stores text like a string, but is optimized for insertion and deletion
1721 and has certain other properties that can be set.
1723 An object with various properties whose displayable representation is a
1724 @dfn{window} in window-system parlance.
1726 A section of a frame that displays the contents of a buffer;
1727 often called a @dfn{pane} in window-system parlance.
1728 @item window-configuration
1729 An object that represents a saved configuration of windows in a frame.
1731 An object representing a screen on which frames can be displayed;
1732 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1735 An object specifying the appearance of text or graphics; it has
1736 properties such as font, foreground color, and background color.
1738 An object that refers to a particular position in a buffer and moves
1739 around as text is inserted and deleted to stay in the same relative
1740 position to the text around it.
1742 Similar to a marker but covers a range of text in a buffer; can also
1743 specify properties of the text, such as a face in which the text is to
1744 be displayed, whether the text is invisible or unmodifiable, etc.
1746 Generated by calling @code{next-event} and contains information
1747 describing a particular event happening in the system, such as the user
1748 pressing a key or a process terminating.
1750 An object that maps from events (described using lists, vectors, and
1751 symbols rather than with an event object because the mapping is for
1752 classes of events, rather than individual events) to functions to
1753 execute or other events to recursively look up; the functions are
1754 described by name, using a symbol, or using lists to specify the
1757 An object that describes the appearance of an image (e.g. pixmap) on
1758 the screen; glyphs can be attached to the beginning or end of extents
1759 and in some future version of SXEmacs will be able to be inserted
1760 directly into a buffer.
1762 An object that describes a connection to an externally-running process.
1765 There are some other, less-commonly-encountered general objects:
1769 An object that maps from an arbitrary Lisp object to another arbitrary
1770 Lisp object, using hashing for fast lookup.
1772 A limited form of hash-table that maps from strings to symbols; obarrays
1773 are used to look up a symbol given its name and are not actually their
1774 own object type but are kludgily represented using vectors with hidden
1775 fields (this representation derives from GNU Emacs).
1777 A complex object used to specify the value of a display property; a
1778 default value is given and different values can be specified for
1779 particular frames, buffers, windows, devices, or classes of device.
1781 An object that maps from chars or classes of chars to arbitrary Lisp
1782 objects; internally char tables use a complex nested-vector
1783 representation that is optimized to the way characters are represented
1786 An object that maps from ranges of integers to arbitrary Lisp objects.
1789 And some strange special-purpose objects:
1793 @itemx coding-system
1794 Objects used when MULE, or multi-lingual/Asian-language, support is
1796 @item color-instance
1797 @itemx font-instance
1798 @itemx image-instance
1799 An object that encapsulates a window-system resource; instances are
1800 mostly used internally but are exposed on the Lisp level for cleanness
1801 of the specifier model and because it's occasionally useful for Lisp
1802 programs to create or query the properties of instances.
1804 An object that encapsulates a @dfn{subwindow} resource, i.e. a
1805 window-system child window that is drawn into by an external process;
1806 this object should be integrated into the glyph system but isn't yet,
1807 and may change form when this is done.
1808 @item toolbar-button
1809 An object used in conjunction with the toolbar.
1812 And objects that are only used internally:
1816 A generic object for encapsulating arbitrary memory; this allows you the
1817 generality of @code{malloc()} and the convenience of the Lisp object
1820 A buffering I/O stream, used to provide a unified interface to anything
1821 that can accept output or provide input, such as a file descriptor, a
1822 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1823 it's a Lisp object to make its memory management more convenient.
1824 @item char-table-entry
1825 Subsidiary objects in the internal char-table representation.
1826 @item extent-auxiliary
1829 Various special-purpose objects that are basically just used to
1830 encapsulate memory for particular subsystems, similar to the more
1831 general ``opaque'' object.
1832 @item symbol-value-forward
1833 @itemx symbol-value-buffer-local
1834 @itemx symbol-value-varalias
1835 @itemx symbol-value-lisp-magic
1836 Special internal-only objects that are placed in the value cell of a
1837 symbol to indicate that there is something special about this variable --
1838 e.g. it has no value, it mirrors another variable, or it mirrors some C
1839 variable; there is really only one kind of object, called a
1840 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1841 semi-different object types.
1844 @cindex permanent objects
1845 @cindex temporary objects
1846 Some types of objects are @dfn{permanent}, meaning that once created,
1847 they do not disappear until explicitly destroyed, using a function such
1848 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1849 Others will disappear once they are no longer used, through the garbage
1850 collection mechanism. Buffers, frames, windows, devices, and processes
1851 are among the objects that are permanent. Note that some objects can go
1852 both ways: Faces can be created either way; extents are normally
1853 permanent, but detached extents (extents not referring to any text, as
1854 happens to some extents when the text they are referring to is deleted)
1855 are temporary. Note that some permanent objects, such as faces and
1856 coding systems, cannot be deleted. Note also that windows are unique in
1857 that they can be @emph{undeleted} after having previously been
1858 deleted. (This happens as a result of restoring a window configuration.)
1861 Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1862 specifying an object of that type in Lisp code. When you load a Lisp
1863 file, or type in code to be evaluated, what really happens is that the
1864 function @code{read} is called, which reads some text and creates an object
1865 based on the syntax of that text; then @code{eval} is called, which
1866 possibly does something special; then this loop repeats until there's
1867 no more text to read. (@code{eval} only actually does something special
1868 with symbols, which causes the symbol's value to be returned,
1869 similar to referencing a variable; and with conses [i.e. lists],
1870 which cause a function invocation. All other values are returned
1879 converts to an integer whose value is 17297.
1885 converts to a float whose value is 1.983e-4, or .0001983.
1891 converts to a char that represents the lowercase letter b.
1897 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1898 particular Kanji character when using an ISO2022-based coding system for
1899 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1900 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1901 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1902 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1903 of characters [subtract 33 from the ASCII value of each character to get
1904 the corresponding index]; @samp{ESC (} is a class of escape sequences
1905 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
1906 to US ASCII''. It is a coincidence that the letter @samp{B} is used to
1907 denote both Japanese Kanji and US ASCII. If the first @samp{B} were
1908 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1909 from the GB2312 character set.)
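The index arithmetic in the parenthetical above can be checked with a few lines of C. This is only a sketch: the function names are hypothetical and are not part of SXEmacs, and the code does nothing more than subtract 33 from each byte to locate @samp{#} and @samp{&} in the 94-by-94 array (row 2, column 5, counting from zero).

```c
#include <assert.h>

/* Hypothetical helper: convert the two bytes that follow "ESC $ ( B"
   into 0-based row and column indices of a 94x94 character set.
   Valid bytes lie in 0x21 ('!', ASCII 33) .. 0x7E ('~', ASCII 126). */
static void iso2022_94x94_index(unsigned char b1, unsigned char b2,
				int *row, int *col)
{
	assert(b1 >= 0x21 && b1 <= 0x7E);
	assert(b2 >= 0x21 && b2 <= 0x7E);
	*row = b1 - 33;
	*col = b2 - 33;
}

/* Hypothetical helper: linear position in the 94-by-94 table. */
static int iso2022_94x94_linear(unsigned char b1, unsigned char b2)
{
	int row, col;

	iso2022_94x94_index(b1, b2, &row, &col);
	return row * 94 + col;
}
```

For the example above, @code{iso2022_94x94_index('#', '&', &row, &col)} yields row 2 and column 5, i.e. position 193 in the flattened table.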
1915 converts to a string.
1921 converts to a symbol whose name is @code{"foobar"}. This is done by
1922 looking up the string equivalent in the global variable
1923 @code{obarray}, whose contents should be an obarray. If no symbol
1924 is found, a new symbol with the name @code{"foobar"} is automatically
1925 created and added to @code{obarray}; this process is called
1926 @dfn{interning} the symbol.
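Interning is essentially a lookup-or-create operation on a table keyed by name. The following sketch models it with a toy, fixed-size table of hash buckets; the types and names (@code{toy_symbol}, @code{toy_intern}) are hypothetical, and the real obarray (a vector with hidden fields) and real symbols (with value, function, and property-list cells) are considerably more involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy obarray: a fixed number of hash buckets, each
   holding a chain of symbols.  For illustration only. */
#define TOY_OBARRAY_SIZE 31

struct toy_symbol {
	char *name;
	struct toy_symbol *next;	/* next symbol in the same bucket */
};

static struct toy_symbol *toy_obarray[TOY_OBARRAY_SIZE];

static unsigned toy_hash(const char *s)
{
	unsigned h = 0;

	while (*s != '\0')
		h = h * 31 + (unsigned char) *s++;
	return h % TOY_OBARRAY_SIZE;
}

/* Return the symbol called NAME, creating it and adding it to the
   obarray if it is not there yet ("interning" it). */
static struct toy_symbol *toy_intern(const char *name)
{
	unsigned bucket = toy_hash(name);
	struct toy_symbol *sym;

	for (sym = toy_obarray[bucket]; sym != NULL; sym = sym->next)
		if (strcmp(sym->name, name) == 0)
			return sym;	/* already interned */

	sym = malloc(sizeof *sym);
	if (sym == NULL)
		abort();
	sym->name = malloc(strlen(name) + 1);
	if (sym->name == NULL)
		abort();
	strcpy(sym->name, name);
	sym->next = toy_obarray[bucket];
	toy_obarray[bucket] = sym;
	return sym;
}
```

Because reading @code{foobar} twice yields the same symbol object, symbols can be compared for identity simply by comparing pointers.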
1933 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1939 converts to a three-element list containing the specified objects
1940 (note that a list is actually a set of nested conses; see the
1941 SXEmacs Lisp Reference).
1947 converts to a three-element vector containing the specified objects.
1953 converts to a compiled-function object (the actual contents are not
1954 shown since they are not relevant here; look at a file that ends with
1955 @file{.elc} for examples).
1961 converts to a bit-vector.
1964 #s(hash-table ... ...)
1967 converts to a hash table (the actual contents are not shown).
1970 #s(range-table ... ...)
1973 converts to a range table (the actual contents are not shown).
1976 #s(char-table ... ...)
1979 converts to a char table (the actual contents are not shown).
1981 Note that the @code{#s()} syntax is the general syntax for structures,
1982 which are not really implemented in SXEmacs Lisp but should be.
1984 When an object is printed out (using @code{print} or a related
1985 function), the read syntax is used, so that the same object can be read
1988 The other objects do not have read syntaxes, usually because it does not
1989 really make sense to create them in this fashion (e.g. processes, where
1990 it doesn't make sense to have a subprocess created as a side effect of
1991 reading some Lisp code), or because they can't be created at all
1992 (e.g. subrs). Permanent objects, as a rule, do not have a read syntax;
1993 nor do most complex objects, which contain too much state to be easily
1994 initialized through a read syntax.
1996 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The SXEmacs Object System (Abstractly Speaking), Top
1997 @chapter How Lisp Objects Are Represented in C
1998 @cindex Lisp objects are represented in C, how
1999 @cindex objects are represented in C, how Lisp
2000 @cindex represented in C, how Lisp objects are
2002 Lisp objects are represented in C using a 32-bit or 64-bit machine word
2003 (depending on the processor; e.g. DEC Alphas and other 64-bit processors use
2004 64-bit Lisp objects, while 32-bit processors use 32-bit Lisp objects). The representation
2005 stuffs a pointer together with a tag, as follows:
2008 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
2009 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
2011 <---------------------------------------------------------> <->
2012 a pointer to a structure, or an integer tag
2015 A tag of 00 is used for all pointer object types, a tag of 10 is used
2016 for characters, and the other two tags 01 and 11 are joined together to
2017 form the integer object type. This representation gives us 31-bit
2018 integers and 30-bit characters, while pointers are represented directly
2019 without any bit masking or shifting. This representation, though,
2020 assumes that pointers to structs are always aligned to multiples of 4,
2021 so the lower 2 bits are always zero.
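To make the tagging scheme concrete, here is a minimal model of it in C, assuming the layout just described (tag 00 for pointers, 10 for characters, and any tag with the low bit set for integers). The names are hypothetical and do not correspond to the actual SXEmacs macros; integer extraction relies on an arithmetic right shift, as discussed below for @code{XINT()}.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the tag scheme; not the real SXEmacs macros. */
typedef uintptr_t LispObj;

enum { TAG_MASK = 3, TAG_POINTER = 0, TAG_CHAR = 2, INT_BIT = 1 };

/* Integers use both tags with the low bit set (01 and 11). */
static int obj_is_int(LispObj o) { return (o & INT_BIT) != 0; }
static int obj_is_char(LispObj o) { return (o & TAG_MASK) == TAG_CHAR; }
static int obj_is_pointer(LispObj o) { return (o & TAG_MASK) == TAG_POINTER; }

static LispObj obj_make_int(intptr_t n)
{
	/* Shift as unsigned to avoid undefined behavior for negative n. */
	return ((uintptr_t) n << 1) | INT_BIT;
}

static intptr_t obj_int_value(LispObj o)
{
	assert(obj_is_int(o));
	/* Assumes >> on a negative signed value is an arithmetic shift. */
	return (intptr_t) o >> 1;
}

static LispObj obj_make_char(uint32_t c)
{
	return ((uintptr_t) c << 2) | TAG_CHAR;
}

static uint32_t obj_char_value(LispObj o)
{
	assert(obj_is_char(o));
	return (uint32_t) (o >> 2);
}

static LispObj obj_make_pointer(void *p)
{
	/* Needs at least 4-byte alignment so the two tag bits are free. */
	assert(((uintptr_t) p & TAG_MASK) == 0);
	return (uintptr_t) p;	/* stored as-is, no shifting or masking */
}

static void *obj_pointer_value(LispObj o)
{
	assert(obj_is_pointer(o));
	return (void *) o;
}
```

Note that pointers pass through unchanged, which is the point of giving them the all-zero tag.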
2023 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
2024 used for the Lisp object can vary. It is a simple type (@code{long} on
2025 the DEC Alpha, @code{int} on other machines).
2027 Various macros are used to convert between Lisp_Objects and the
2028 corresponding C type. Macros such as @code{XINT()}, @code{XCHAR()},
2029 @code{XSTRING()}, and @code{XSYMBOL()} do any required bit shifting and/or
2030 masking and cast the result to the appropriate type. @code{XINT()} needs to be
2031 a bit tricky so that negative numbers are properly sign-extended. Since
2032 integers are stored left-shifted, if the right-shift operator does an
2033 arithmetic shift (i.e. it leaves the most-significant bit as-is rather
2034 than shifting in a zero, so that it mimics a divide-by-two even for
2035 negative numbers) the shift to remove the tag bit is enough. This is
2036 the case on all the systems we support.
2038 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
2039 macros become more complicated---they check the tag bits and/or the
2040 type field in the first four bytes of a record type to ensure that the
2041 object is really of the correct type. This is great for catching places
2042 where an incorrect type is being dereferenced---this typically results
2043 in a pointer being dereferenced as the wrong type of structure, with
2044 unpredictable (and sometimes not easily traceable) results.
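The following sketch shows the idea behind a checked converter. It is not the real record implementation: the toy record types, the @code{TOY_TYPECHECK} flag (standing in for @code{ERROR_CHECK_TYPECHECK}), and @code{xcons()} are all hypothetical. With checking enabled, the converter verifies the tag bits and the type field stored at the start of the record before casting; without it, the converter is a bare cast.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy records; the real SXEmacs record machinery differs.
   Each record begins with a header whose field records its type. */
enum toy_type { TOY_CONS, TOY_STRING };

struct toy_header {
	enum toy_type type;
};

struct toy_cons {
	struct toy_header header;	/* must be the first member */
	uintptr_t car, cdr;
};

struct toy_string {
	struct toy_header header;	/* must be the first member */
	const char *data;
};

typedef uintptr_t ToyObject;	/* a pointer with tag bits 00 */

/* Stands in for configuring with ERROR_CHECK_TYPECHECK. */
#define TOY_TYPECHECK 1

/* Predicate: tag bits say "pointer" and the header says "cons". */
static int toy_consp(ToyObject o)
{
	return (o & 3) == 0
		&& ((const struct toy_header *) o)->type == TOY_CONS;
}

#ifdef TOY_TYPECHECK
/* Checked converter: fail loudly on a type mismatch instead of
   silently reinterpreting the memory as the wrong structure. */
static struct toy_cons *xcons(ToyObject o)
{
	assert(toy_consp(o));
	return (struct toy_cons *) o;
}
#else
/* Unchecked converter: just a cast. */
#define xcons(o) ((struct toy_cons *) (o))
#endif
```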
2046 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
2047 object. These macros are of the form @code{XSET@var{TYPE}
2048 (@var{lvalue}, @var{result})}, i.e. they must be used as statements rather
2049 than in expressions. The reason for this is that standard C
2050 doesn't let you ``construct'' a structure (but GCC does). Granted, this
2051 sometimes isn't too convenient; for the case of integers, at least, you
2052 can use the function @code{make_int()}, which constructs and
2053 @emph{returns} an integer Lisp object. Note that the
2054 @code{XSET@var{TYPE}()} macros are also affected by
2055 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
2056 right type in the case of record types, where the type is contained in
2059 The C programmer is responsible for @strong{guaranteeing} that a
2060 Lisp_Object is the correct type before using the @code{X@var{TYPE}}
2061 macros. This is especially important in the case of lists. Use
2062 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
2063 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not
2064 Lisp code. On the other hand, if SXEmacs has an internal logic error,
2065 it's better to crash immediately, so sprinkle @code{assert()}s and
2066 ``unreachable'' @code{abort()}s liberally about the source code. Where
2067 performance is an issue, use @code{type_checking_assert},
2068 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
2069 nothing unless the corresponding configure error checking flag was
2072 @node Rules When Writing New C Code, Regression Testing SXEmacs, How Lisp Objects Are Represented in C, Top
2073 @chapter Rules When Writing New C Code
2074 @cindex writing new C code, rules when
2075 @cindex C code, rules when writing new
2076 @cindex code, rules when writing new C
2078 The SXEmacs C code is extremely complex and intricate, and there are many
2079 rules that are more or less consistently followed throughout the code.
2080 Many of these rules are not obvious, so they are explained here. It is
2081 of the utmost importance that you follow them. If you don't, you may
2082 get something that appears to work, but which will crash in odd
2083 situations, often in code far away from where the actual breakage is.
2086 * A Readers Guide to SXEmacs Coding Conventions::
2087 * General Coding Rules::
2088 * Writing Lisp Primitives::
2089 * Writing Good Comments::
2090 * Adding Global Lisp Variables::
2091 * Proper Use of Unsigned Types::
2093 * Techniques for SXEmacs Developers::
2096 @node A Readers Guide to SXEmacs Coding Conventions
2097 @section A Readers Guide to SXEmacs Coding Conventions
2098 @cindex coding conventions
2099 @cindex readers guide
2100 @cindex coding rules, naming
2102 Of course the low-level implementation language of SXEmacs is C, but much
2103 of that uses the Lisp engine to do its work. However, because the code
2104 is ``inside'' of the protective containment shell around the ``reactor
2105 core,'' you'll see lots of complex ``plumbing'' needed to do the work
2106 and ``safety mechanisms,'' whose failure results in a meltdown. This
2107 section provides a quick overview (or review) of the various components
2108 of the implementation of Lisp objects.
2110 Two typographic conventions help to identify C objects that implement
2111 Lisp objects. The first is that capitalized identifiers, especially
2112 beginning with the letters @samp{Q}, @samp{V}, @samp{F}, and @samp{S},
2113 for C variables and functions, and C macros beginning with the
2114 letter @samp{X}, are used to implement Lisp. The second is that where
2115 Lisp uses the hyphen @samp{-} in symbol names, the corresponding C
2116 identifiers use the underscore @samp{_}. Of course, since SXEmacs Lisp
2117 contains interfaces to many external libraries, those external names
2118 will follow the coding conventions their authors chose, and may overlap
2119 the ``SXEmacs name space.'' However these cases are usually pretty
2122 All Lisp objects are handled indirectly. The @code{Lisp_Object}
2123 type is usually a pointer to a structure, except for a very small number
2124 of types with immediate representations (currently characters and
2125 integers). However, these types cannot be directly operated on in C
2126 code, either, so they can also be considered indirect. Types that do
2127 not have an immediate representation always have a C typedef
2128 @code{Lisp_@var{type}} for a corresponding structure.
2129 @c #### mention l(c)records here?
2131 In older code, it was common practice to pass around pointers to
2132 @code{Lisp_@var{type}}, but this is now deprecated in favor of using
2133 @code{Lisp_Object} for all function arguments and return values that are
2134 Lisp objects. The @code{X@var{type}} macro is used to extract the
2135 pointer and cast it to @code{(Lisp_@var{type} *)} for the desired type.
2137 @strong{Convention}: macros whose names begin with @samp{X} operate on
2138 @code{Lisp_Object}s and do no type-checking. Many such macros are type
2139 extractors, but others implement Lisp operations in C (@emph{e.g.},
2140 @code{XCAR} implements the Lisp @code{car} function). These are unsafe,
2141 and must only be used where types of all data have already been checked.
2142 Such macros are only applied to @code{Lisp_Object}s. In internal
2143 implementations where the pointer has already been converted, the
2144 structure is operated on directly using the C @code{->} member access
2147 The @code{@var{type}P}, @code{CHECK_@var{type}}, and
2148 @code{CONCHECK_@var{type}} macros are used to test types. The first
2149 returns a Boolean value, and the other two signal errors. (The
2150 @samp{CONCHECK} variety allows execution to be CONtinued under some
2151 circumstances, thus the name.) Functions which expect to be passed user
2152 data invariably call @samp{CHECK} macros on arguments.
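The difference between a predicate and a @samp{CHECK} macro can be sketched as follows. All names here are hypothetical, and @code{setjmp}/@code{longjmp} merely stand in for the Lisp error mechanism that the real @samp{CHECK} macros use to signal an error. The predicate only answers a question; the @samp{CHECK} macro never returns normally when the argument has the wrong type.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

/* Hypothetical names throughout; integers use the low tag bit. */
typedef uintptr_t ToyObject;

static jmp_buf toy_catch;		/* stands in for a Lisp error handler */
static const char *toy_last_error;	/* predicate that failed */

#define TOY_INTP(o) (((o) & 1) != 0)

/* Stand-in for signalling a wrong-type error: never returns. */
static void toy_wrong_type(const char *pred)
{
	toy_last_error = pred;
	longjmp(toy_catch, 1);
}

#define TOY_CHECK_INT(o) do {			\
	if (!TOY_INTP(o))			\
		toy_wrong_type("integerp");	\
} while (0)

/* A function receiving user data checks its argument first. */
static intptr_t toy_add1(ToyObject n)
{
	TOY_CHECK_INT(n);
	return ((intptr_t) n >> 1) + 1;
}

/* Returns 1 and stores the result, or 0 if an error was signalled. */
static int toy_call_add1(ToyObject n, intptr_t *result)
{
	if (setjmp(toy_catch) != 0)
		return 0;
	*result = toy_add1(n);
	return 1;
}
```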
2154 There are many types of specialized Lisp objects implemented in C, but
2155 the most pervasive type is the @dfn{symbol}. Symbols are used as
2156 identifiers, variables, and functions.
2158 @strong{Convention}: Global variables whose names begin with @samp{Q}
2159 are constants whose value is a symbol. The name of the variable should
2160 be derived from the name of the symbol using the same rules as for Lisp
2161 primitives. Such variables allow the C code to check whether a
2162 particular @code{Lisp_Object} is equal to a given symbol. Symbols are
2163 Lisp objects, so these variables may be passed to Lisp primitives. (An
2164 alternative to the use of @samp{Q...} variables is to call the
2165 @code{intern} function at initialization in the
2166 @code{vars_of_@var{module}} function, which is hardly less efficient.)
2168 @strong{Convention}: Global variables whose names begin with @samp{V}
2169 are variables that contain Lisp objects. The convention here is that
2170 all global variables of type @code{Lisp_Object} begin with @samp{V}, and
2171 no others do (not even integer and boolean variables that have Lisp
2172 equivalents). Most of the time, these variables have equivalents in
2173 Lisp, which are defined via the @samp{DEFVAR} family of macros, but some
2174 don't. Since the variable's value is a @code{Lisp_Object}, it can be
2175 passed to Lisp primitives.
2177 The implementation of Lisp primitives is more complex.
2178 @strong{Convention}: Global variables with names beginning with @samp{S}
2179 contain a structure that allows the Lisp engine to identify and call a C
2180 function. In modern versions of SXEmacs, these identifiers are almost
2181 always completely hidden in the @code{DEFUN} and @code{SUBR} macros, but
2182 you will encounter them if you look at very old versions of SXEmacs or at
2183 GNU Emacs. @strong{Convention}: Functions with names beginning with
2184 @samp{F} implement Lisp primitives. Of course all their arguments and
2185 their return values must be Lisp_Objects. (This is hidden in the
2186 @code{DEFUN} macro.)
2189 @node General Coding Rules
2190 @section General Coding Rules
2191 @cindex coding rules, general
2193 @xref{Coding Style,,,sppm}, which is a good preamble for this section.
2195 Every module includes @file{<config.h>} (angle brackets so that
2196 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
2197 the same directory as the C sources) and @file{lisp.h}. @file{config.h}
2198 must always be included before any other header files (including
2199 system header files) to ensure that certain tricks played by various
2200 @file{s/} and @file{m/} files work out correctly.
2202 When including header files, always use angle brackets, not double
2203 quotes, except when the file to be included is always in the same
2204 directory as the including file. If either file is a generated file,
2205 then that is not likely to be the case. In order to understand why we
2206 have this rule, imagine what happens when you do a build in the source
2207 directory using @samp{./configure} and another build in another
2208 directory using @samp{../work/configure}. There will be two different
2209 @file{config.h} files. Which one will be used if you @samp{#include
2212 Almost every module contains a @code{syms_of_*()} function and a
2213 @code{vars_of_*()} function. The former declares any Lisp primitives
2214 you have defined and defines any symbols you will be using. The latter
2215 declares any global Lisp variables you have added and initializes global
2216 C variables in the module. @strong{Important}: There are stringent
2217 requirements on exactly what can go into these functions. See the
2218 comment in @file{emacs.c}. The reason for this is to avoid obscure
2219 unwanted interactions during initialization. If you don't follow these
2220 rules, you'll be sorry! If you want to do anything that isn't allowed,
2221 create a @code{complex_vars_of_*()} function for it. Doing this is
2222 tricky, though: you have to make sure your function is called at the
2223 right time so that all the initialization dependencies work out.
2225 Declare each function of these kinds in @file{symsinit.h}. Make sure
2226 it's called in the appropriate place in @file{emacs.c}. You never need
2227 to include @file{symsinit.h} directly, because it is included by
2230 @strong{All global and static variables that are to be modifiable must
2231 be declared uninitialized.} This means that you may not use the
2232 ``declare with initializer'' form for these variables, such as @code{int
2233 some_variable = 0;}. The reason for this has to do with some kludges
2234 done during the dumping process: If possible, the initialized data
2235 segment is re-mapped so that it becomes part of the (unmodifiable) code
2236 segment in the dumped executable. This allows this memory to be shared
2237 among multiple running SXEmacs processes. SXEmacs is careful to place as
2238 much constant data as possible into initialized variables during the
2239 @file{temacs} phase.
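A minimal sketch of the rule, for a hypothetical module @code{toy}: modifiable variables are declared without initializers and receive their starting values from the module's initialization function (here a hypothetical @code{vars_of_toy()}), while data that is never written may still be initialized statically.

```c
#include <assert.h>

/* Hypothetical module "toy".  Modifiable globals are declared
   WITHOUT an initializer ... */
static int toy_counter;		/* not: static int toy_counter = 0; */
static const char *toy_greeting;

/* ... whereas constant data may be initialized, since it is never
   written. */
static const char toy_default_greeting[] = "hello";

/* Hypothetical stand-in for vars_of_toy(): assign the initial values
   here rather than in the declarations. */
static void vars_of_toy(void)
{
	toy_counter = 0;
	toy_greeting = toy_default_greeting;
}

static int toy_bump(void)
{
	return ++toy_counter;
}
```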
2241 @cindex copy-on-write
2242 @strong{Please note:} This kludge only works on a few systems nowadays,
2243 and is rapidly becoming irrelevant because most modern operating systems
2244 provide @dfn{copy-on-write} semantics. All data is initially shared
2245 between processes, and a private copy is automatically made (on a
2246 page-by-page basis) when a process first attempts to write to a page of
2249 Formerly, there was a requirement that static variables not be declared
2250 inside of functions. This had to do with another hack in the same
2251 vein as the one just described: old USG systems put statically-declared
2252 variables in the initialized data space, so those systems' header files had a
2253 @code{#define static} declaration. (That way, the data-segment remapping
2254 described above could still work.) This fails badly on static variables
2255 inside of functions, which suddenly become automatic variables;
2256 therefore, you weren't supposed to have any of them. This awful kludge
2257 has been removed in SXEmacs because
2261 almost all of the systems that used this kludge ended up having
2262 to disable the data-segment remapping anyway;
2264 the only systems that didn't were extremely outdated ones;
2266 this hack completely messed up inline functions.
2269 The C source code makes heavy use of C preprocessor macros. One popular
2273 #define FOO(var, value) do @{ \
2274 Lisp_Object FOO_value = (value); \
2275 ... /* compute using FOO_value */ \
2280 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
2281 statement semantics, so that it can safely be used within an @code{if}
2282 statement in C, for example. Multiple evaluation is prevented by
2283 copying a supplied argument into a local variable, so that
2284 @code{FOO(var,fun(1))} only calls @code{fun} once.
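Here is a compilable variant of the pattern, using @code{int} instead of @code{Lisp_Object} so that it stands alone; @code{fun()} and @code{pick()} are hypothetical helpers. The @code{if}/@code{else} in @code{pick()} compiles only because the macro expands to a single statement (a plain brace block followed by the semicolon would leave a dangling @code{else}), and the call counter confirms that the argument is evaluated once.

```c
#include <assert.h>

static int fun_calls;	/* counts how often fun() runs */

/* Hypothetical helper with a visible side effect. */
static int fun(int x)
{
	fun_calls++;
	return x * 10;
}

/* Statement-like macro in the style described above.  VALUE is
   evaluated exactly once by copying it into a local variable, even
   though the body uses it twice. */
#define FOO(var, value) do {			\
	int FOO_value = (value);		\
	(var) = FOO_value + FOO_value;		\
} while (0)

/* Hypothetical caller using FOO inside an if/else. */
static int pick(int flag)
{
	int result = -1;

	if (flag)
		FOO(result, fun(1));
	else
		result = 0;
	return result;
}
```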
2286 Lisp lists are popular data structures in the C code as well as in
2287 Elisp. There are two sets of macros that iterate over lists.
2288 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
2289 supplied by the user, and cannot be trusted to be acyclic and
2290 @code{nil}-terminated. A @code{malformed-list} or @code{circular-list} error
2291 will be generated if the list being iterated over is not entirely
2292 kosher. @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
2293 safe, and can be used only on trusted lists.
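The difference between the two families can be illustrated with a toy singly linked list in C. The names are hypothetical, and the cycle check shown (Floyd's tortoise-and-hare algorithm) is just one way to detect circularity; it is not necessarily how the @code{EXTERNAL_LIST_LOOP} macros are implemented. A @code{NULL} cdr stands in for @code{nil}, so this toy cannot represent the improper (dotted) lists that would trigger @code{malformed-list}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy cons cells; a NULL cdr stands for nil. */
struct toy_cons {
	int car;
	struct toy_cons *cdr;
};

/* Trusted list, in the spirit of GET_LIST_LENGTH: assumes the list
   is acyclic and terminated, so nothing is checked. */
static size_t trusted_length(const struct toy_cons *l)
{
	size_t n = 0;

	for (; l != NULL; l = l->cdr)
		n++;
	return n;
}

/* Untrusted list, in the spirit of GET_EXTERNAL_LIST_LENGTH: FAST
   advances two cells per step and SLOW one; if they ever meet, the
   list is circular and -1 is returned instead of looping forever. */
static long external_length(const struct toy_cons *l)
{
	const struct toy_cons *slow = l, *fast = l;
	long n = 0;

	while (fast != NULL) {
		fast = fast->cdr;
		n++;
		if (fast == NULL)
			break;
		fast = fast->cdr;
		n++;
		slow = slow->cdr;
		if (fast == slow)
			return -1;	/* circular list */
	}
	return n;
}
```

Running @code{trusted_length()} on a circular list would never terminate, which is exactly why the unchecked macros must be reserved for lists built by trusted C code.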
2295 Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
2296 @code{GET_LIST_LENGTH}, which calculate the length of a list and, in the
2297 case of @code{GET_EXTERNAL_LIST_LENGTH}, validate the properness of
2298 the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
2299 @code{LIST_LOOP_DELETE_IF} delete elements from a Lisp list satisfying some
2302 @node Writing Lisp Primitives
2303 @section Writing Lisp Primitives
2304 @cindex writing Lisp primitives
2305 @cindex Lisp primitives, writing
2306 @cindex primitives, writing Lisp
2308 Lisp primitives are Lisp functions implemented in C. The details of
2309 interfacing the C function so that Lisp can call it are handled by a few
2310 C macros. The only way to really understand how to write new C code is
2311 to read the source, but we can explain some things here.
2313 An example of a special form is the definition of @code{prog1}, from
2314 @file{eval.c}. (An ordinary function would have the same general
2317 @cindex garbage collection protection
2320 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
2321 Similar to `progn', but the value of the first form is returned.
2322 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
2323 The value of FIRST is saved during evaluation of the remaining args,
2324 whose values are discarded.
2328 /* This function can GC */
2329 REGISTER Lisp_Object val, form, tail;
2330 struct gcpro gcpro1;
2332 val = Feval (XCAR (args));
2336 LIST_LOOP_3 (form, XCDR (args), tail)
2345 Let's start with a precise explanation of the arguments to the
2346 @code{DEFUN} macro. Here is a template for them:
2350 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
2359 This string is the name of the Lisp symbol to define as the function
2360 name; in the example above, it is @code{"prog1"}.
2363 This is the C function name for this function. This is the name that is
2364 used in C code for calling the function. The name is, by convention,
2365 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
2366 Lisp name changed to underscores. Thus, to call this function from C
2367 code, call @code{Fprog1}. Remember that the arguments are of type
2368 @code{Lisp_Object}; various macros and functions for creating values of
2369 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
2371 Primitives whose names are special characters (e.g. @code{+} or
2372 @code{<}) are named by spelling out, in some fashion, the special
2373 character: e.g. @code{Fplus()} or @code{Flss()}. Primitives whose names
2374 begin with normal alphanumeric characters but also contain special
2375 characters are spelled out in some creative way, e.g. @code{let*}
2376 becomes @code{FletX()}.
2378 Each function also has an associated structure that holds the data for
2379 the subr object that represents the function in Lisp. This structure
2380 conveys the Lisp symbol name to the initialization routine that will
2381 create the symbol and store the subr object as its definition. The C
2382 variable name of this structure is always @samp{S} prepended to the
2383 @var{fname}. You hardly ever need to be aware of the existence of this
2384 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
2388 This is the minimum number of arguments that the function requires. The
2389 function @code{prog1} allows a minimum of one argument.
2392 This is the maximum number of arguments that the function accepts, if
2393 there is a fixed maximum. Alternatively, it can be @code{UNEVALLED},
2394 indicating a special form that receives unevaluated arguments, or
2395 @code{MANY}, indicating an unlimited number of evaluated arguments (the
2396 C equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY}
2397 are macros. If @var{max_args} is a number, it may not be less than
2398 @var{min_args} and it may not be greater than 8. (If you need to add a
2399 function with more than 8 arguments, use the @code{MANY} form. Resist
2400 the urge to edit the definition of @code{DEFUN} in @file{lisp.h}. If
2401 you do it anyway, make sure to also add another clause to the switch
2402 statement in @code{primitive_funcall()}.)
2405 This is an interactive specification, a string such as might be used as
2406 the argument of @code{interactive} in a Lisp function. In the case of
2407 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
2408 cannot be called interactively. A value of @code{""} indicates a
2409 function that should receive no arguments when called interactively.
2412 This is the documentation string. It is written just like a
2413 documentation string for a function defined in Lisp; in particular, the
2414 first line should be a single sentence. Note that the documentation
2415 string is enclosed in a comment, that none of the documentation is placed on
2416 the same lines as the comment-start and comment-end characters, and that the
2417 comment-start characters are on the same line as the interactive
2418 specification. @file{make-docfile}, which scans the C files for
2419 documentation strings, is very particular about what it looks for, and
2420 will not properly extract the doc string if it's not in this exact format.
2422 In order to make both @file{etags} and @file{make-docfile} happy, make
2423 sure that the @code{DEFUN} line contains the @var{lname} and
2424 @var{fname}, and that the comment-start characters for the doc string
2425 are on the same line as the interactive specification, and put a newline
2426 directly after them (and before the comment-end characters).
2429 This is the comma-separated list of arguments to the C function. For a
2430 function with a fixed maximum number of arguments, provide a C argument
2431 for each Lisp argument. In this case, unlike regular C functions, the
2432 types of the arguments are not declared; they are simply always of type
2435 The names of the C arguments will be used as the names of the arguments
2436 to the Lisp primitive as displayed in its documentation, modulo the same
2437 concerns described above for @code{F...} names (in particular,
2438 underscores in the C arguments become dashes in the Lisp arguments).

There is one additional kludge: a trailing @samp{_} on the C argument is
discarded when forming the Lisp argument. This allows C language
reserved words (like @code{default}) or global symbols (like
@code{dirname}) to be used as argument names without compiler warnings
or errors.

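As an illustration of the naming rule just described, here is a minimal
sketch of how a C argument name could be turned into the name shown in
the documentation. The helper @code{c_arg_to_lisp_arg} is hypothetical
and is not taken from @file{make-docfile}; it only encodes the two rules
stated above (underscores become dashes, and a single trailing
underscore is dropped).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modelling the documented naming rule; this is
   not the actual make-docfile code.  OUTSIZE must be at least 1. */
static void
c_arg_to_lisp_arg (const char *c_name, char *out, size_t outsize)
{
  size_t len = strlen (c_name);
  size_t i;

  /* Drop one trailing underscore, e.g. "default_" -> "default". */
  if (len > 0 && c_name[len - 1] == '_')
    len--;

  /* Leave room for the terminating NUL. */
  if (len >= outsize)
    len = outsize - 1;

  /* Every remaining underscore becomes a dash. */
  for (i = 0; i < len; i++)
    out[i] = (c_name[i] == '_') ? '-' : c_name[i];
  out[len] = '\0';
}
```

With this rule, a C argument named @code{start_pos} is documented as
@code{start-pos}, and @code{dirname_} as @code{dirname}.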
2446 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
2447 @w{@dfn{special form}}; its arguments are not evaluated. Instead it
2448 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
2449 unevaluated arguments, conventionally named @code{(args)}.

When a Lisp function has no upper limit on the number of arguments,
specify @w{@var{max_args} = @code{MANY}}. In this case its implementation in
C actually receives exactly two arguments: the number of Lisp arguments
(an @code{int}) and the address of a block containing their values (a
@w{@code{Lisp_Object *}}). This is the only case in which the C types
are specified in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.

@end table

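The @code{MANY} calling convention can be sketched in plain C. In the
sketch below @code{Lisp_Object} is simply a @code{long} and the
arguments are summed directly; real code would check each argument's
type (e.g. with @code{CHECK_INT}) and unbox it. The function name
@code{Fsum_sketch} is hypothetical.

```c
/* Stand-in for the real tagged Lisp_Object type. */
typedef long Lisp_Object;

/* Sketch of the C side of a MANY primitive: it receives a count and
   a pointer to a block of values, and must not assume any fixed
   number of arguments (zero is allowed). */
static Lisp_Object
Fsum_sketch (int nargs, Lisp_Object *args)
{
  Lisp_Object total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```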
2460 Within the function @code{Fprog1} itself, note the use of the macros
2461 @code{GCPRO1} and @code{UNGCPRO}. @code{GCPRO1} is used to ``protect''
2462 a variable from garbage collection---to inform the garbage collector
2463 that it must look in that variable and regard the object pointed at by
2464 its contents as an accessible object. This is necessary whenever you
2465 call @code{Feval} or anything that can directly or indirectly call
2466 @code{Feval} (this includes the @code{QUIT} macro!). At such a time,
2467 any Lisp object that you intend to refer to again must be protected
somehow. @code{UNGCPRO} cancels the protection of the variables that
are protected in the current function. It is necessary to do this
explicitly.

2472 The macro @code{GCPRO1} protects just one local variable. If you want
2473 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
2474 not work. Macros @code{GCPRO3} and @code{GCPRO4} also exist.
2476 These macros implicitly use local variables such as @code{gcpro1}; you
2477 must declare these explicitly, with type @code{struct gcpro}. Thus, if
2478 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
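To make the @code{GCPRO} bookkeeping concrete, the following is a
simplified, self-contained model of the protection chain. It assumes a
@code{struct gcpro} with @code{next}, @code{var} and @code{nvars}
fields and a global list head @code{gcprolist}. This mirrors the
general shape of the macros in @file{lisp.h}, but it is a sketch rather
than a copy of them, and the helper functions are hypothetical.

```c
#include <stddef.h>

typedef long Lisp_Object;   /* stand-in for the real tagged type */

/* One record per GCPRO: address of the protected variable(s) plus a
   link to the record that was on top of the chain before. */
struct gcpro
{
  struct gcpro *next;
  Lisp_Object *var;   /* address of the first protected variable */
  int nvars;          /* number of consecutive variables */
};

static struct gcpro *gcprolist;

/* Simplified versions of the macros; they use the locals gcpro1 and
   gcpro2, which the caller must declare. */
#define GCPRO1(a)                                                \
  (gcpro1.next = gcprolist, gcpro1.var = &(a), gcpro1.nvars = 1, \
   gcprolist = &gcpro1)
#define GCPRO2(a, b)                                             \
  (GCPRO1 (a),                                                   \
   gcpro2.next = gcprolist, gcpro2.var = &(b), gcpro2.nvars = 1, \
   gcprolist = &gcpro2)
#define UNGCPRO (gcprolist = gcpro1.next)

/* How many variables a collector walking the chain would treat as roots. */
static int
count_gcpro_roots (void)
{
  int n = 0;
  struct gcpro *p;

  for (p = gcprolist; p; p = p->next)
    n += p->nvars;
  return n;
}

/* A function that protects two locals around "dangerous" code. */
static int
use_two_objects (Lisp_Object x, Lisp_Object y, int *roots_inside)
{
  struct gcpro gcpro1, gcpro2;   /* must be declared explicitly */

  GCPRO2 (x, y);
  /* ... code that may call Feval and thus trigger a GC ... */
  *roots_inside = count_gcpro_roots ();
  UNGCPRO;
  return (int) (x + y);
}
```

Because @code{GCPRO2} reuses @code{gcpro1} for its first variable,
calling @code{GCPRO1} twice would simply overwrite the same record. This
is why repeating @code{GCPRO1} does not work.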
2480 @cindex caller-protects (@code{GCPRO} rule)
2481 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2482 only responsible for protecting those Lisp objects that you create. Any
2483 objects passed to you as arguments should have been protected by whoever
2484 created them, so you don't in general have to protect them.
2486 In particular, the arguments to any Lisp primitive are always
2487 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
bytecode. So only a few Lisp primitives that are called frequently from
C code, such as @code{Fprogn}, protect their arguments as a service to
their caller. You don't need to protect your arguments when writing a
new @code{DEFUN}.

2493 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2494 SXEmacs coding. It is @strong{extremely} important that you get this
2495 right and use a great deal of discipline when writing this code.
2496 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2498 What @code{DEFUN} actually does is declare a global structure of type
2499 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2500 contains information about the primitive (e.g. a pointer to the
2501 function, its minimum and maximum allowed arguments, a string describing
2502 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2503 using the @code{F...} name. The Lisp subr object that is the function
2504 definition of a primitive (i.e. the object in the function slot of the
2505 symbol that names the primitive) actually points to this @samp{SF}
2506 structure; when @code{Feval} encounters a subr, it looks in the
2507 structure to find out how to call the C function.
2509 Defining the C function is not enough to make a Lisp primitive
2510 available; you must also create the Lisp symbol for the primitive (the
2511 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2512 object in its function cell. (If you don't do this, the primitive won't
2513 be seen by Lisp code.) The code looks like this:
@example
DEFSUBR (@var{fname});
@end example

Here @var{fname} is the same name you used as the second argument to
@code{DEFUN}.

2523 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2524 at the end of the module. If no such function exists, create it and
2525 make sure to also declare it in @file{symsinit.h} and call it from the
2526 appropriate spot in @code{main()}. @xref{General Coding Rules}.
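The relationship between the @samp{SF} structure, the @code{F...}
function and @code{DEFSUBR} can be modelled in a few lines. In this
sketch the macros @code{DEFUN2_SKETCH} and @code{DEFSUBR_SKETCH}, the
structure @code{struct subr_sketch} and the lookup table standing in
for the obarray are all hypothetical simplifications. The real
@code{Lisp_Subr} has more fields, and the real @code{DEFSUBR} interns a
symbol and stores the subr in its function cell.

```c
#include <stddef.h>
#include <string.h>

typedef long Lisp_Object;   /* stand-in for the real tagged type */

/* Only the fields mentioned in the text above. */
struct subr_sketch
{
  const char *name;         /* Lisp name */
  short min_args, max_args;
  Lisp_Object (*fn) (Lisp_Object, Lisp_Object);
};

/* Hypothetical two-argument DEFUN: declare the SF... structure, then
   begin an ordinary C function definition named F... */
#define DEFUN2_SKETCH(lname, Fname, minargs, maxargs)                   \
  static Lisp_Object Fname (Lisp_Object, Lisp_Object);                \
  static struct subr_sketch S##Fname = { lname, minargs, maxargs, Fname }; \
  static Lisp_Object Fname

DEFUN2_SKETCH ("sketch-add", Fsketch_add, 2, 2) (Lisp_Object a, Lisp_Object b)
{
  return a + b;
}

/* Stand-in for the obarray: a tiny table of registered subrs. */
static struct subr_sketch *function_cells[16];
static int n_function_cells;

/* Stand-in for DEFSUBR: make the primitive visible by name. */
#define DEFSUBR_SKETCH(Fname) \
  (function_cells[n_function_cells++] = &S##Fname)

static void
syms_of_sketch (void)
{
  DEFSUBR_SKETCH (Fsketch_add);
}

static const struct subr_sketch *
find_subr (const char *lname)
{
  int i;

  for (i = 0; i < n_function_cells; i++)
    if (strcmp (function_cells[i]->name, lname) == 0)
      return function_cells[i];
  return NULL;
}
```

Until @code{syms_of_sketch()} runs, @code{find_subr ("sketch-add")}
finds nothing, which models the point above: defining the C function
alone does not make the primitive visible to Lisp.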
2528 Note that C code cannot call functions by name unless they are defined
2529 in C. The way to call a function written in Lisp from C is to use
2530 @code{Ffuncall}, which embodies the Lisp function @code{funcall}. Since
2531 the Lisp function @code{funcall} accepts an unlimited number of
2532 arguments, in C it takes two: the number of Lisp-level arguments, and a
2533 one-dimensional array containing their values. The first Lisp-level
2534 argument is the Lisp function to call, and the rest are the arguments to
2535 pass to it. Since @code{Ffuncall} can call the evaluator, you must
2536 protect pointers from garbage collection around the call to
@code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
its parameters, so you don't have to protect any pointers passed as
parameters to it.)
2541 The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
2542 provide handy ways to call a Lisp function conveniently with a fixed
2543 number of arguments. They work by calling @code{Ffuncall}.
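A rough model of the @code{Ffuncall} convention is shown below, with
@code{args[0]} holding the function and the remaining elements holding
its arguments. Here a ``function'' is just an index into a small table
and @code{Lisp_Object} is a @code{long}. @code{Ffuncall_sketch} and
@code{call2_sketch} are hypothetical stand-ins that omit the type
dispatch and garbage-collection protection performed by the real
functions.

```c
typedef long Lisp_Object;   /* stand-in for the real tagged type */
typedef Lisp_Object (*lisp_fn) (int nargs, Lisp_Object *args);

/* A toy MANY-style function: subtract all later arguments from the first. */
static Lisp_Object
sketch_minus (int nargs, Lisp_Object *args)
{
  Lisp_Object r = nargs > 0 ? args[0] : 0;
  int i;

  for (i = 1; i < nargs; i++)
    r -= args[i];
  return r;
}

/* In this model a "function object" is an index into this table. */
static lisp_fn function_table[] = { sketch_minus };

/* Model of the Ffuncall convention: args[0] is the function,
   args[1] .. args[nargs - 1] are the arguments passed to it. */
static Lisp_Object
Ffuncall_sketch (int nargs, Lisp_Object *args)
{
  lisp_fn fn = function_table[args[0]];
  return fn (nargs - 1, args + 1);
}

/* Model of call2: pack the function and two arguments into an array
   and hand the whole block to the Ffuncall-style entry point. */
static Lisp_Object
call2_sketch (Lisp_Object fn, Lisp_Object arg1, Lisp_Object arg2)
{
  Lisp_Object args[3];

  args[0] = fn;
  args[1] = arg1;
  args[2] = arg2;
  return Ffuncall_sketch (3, args);
}
```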
@file{eval.c} is a very good file to look through for examples;
@file{lisp.h} contains the definitions for important macros and
functions.
2549 @node Writing Good Comments
2550 @section Writing Good Comments
2551 @cindex writing good comments
2552 @cindex comments, writing good
2554 Comments are a lifeline for programmers trying to understand tricky
2555 code. In general, the less obvious it is what you are doing, the more
2556 you need a comment, and the more detailed it needs to be. You should
2557 always be on guard when you're writing code for stuff that's tricky, and
2558 should constantly be putting yourself in someone else's shoes and asking
2559 if that person could figure out without much difficulty what's going
2560 on. (Assume they are a competent programmer who understands the
2561 essentials of how the SXEmacs code is structured but doesn't know much
2562 about the module you're working on or any algorithms you're using.) If
you're not sure whether they would be able to, add a comment. Always
err on the side of more comments rather than fewer.
2566 Generally, when making comments, there is no need to attribute them with
2567 your name or initials. This especially goes for small,
2568 easy-to-understand, non-opinionated ones. Also, comments indicating
2569 where, when, and by whom a file was changed are @emph{strongly}
2570 discouraged, and in general will be removed as they are discovered.
2571 This is exactly what @file{ChangeLogs} are there for. However, it can
2572 occasionally be useful to mark exactly where (but not when or by whom)
2573 changes are made, particularly when making small changes to a file
2574 imported from elsewhere. These marks help when later on a newer version
2575 of the file is imported and the changes need to be merged. (If
2576 everything were always kept in CVS, there would be no need for this.
2577 But in practice, this often doesn't happen, or the CVS repository is
2578 later on lost or unavailable to the person doing the update.)
2580 When putting in an explicit opinion in a comment, you should
2581 @emph{always} attribute it with your name, and optionally the date.
2582 This also goes for long, complex comments explaining in detail the
2583 workings of something -- by putting your name there, you make it
2584 possible for someone who has questions about how that thing works to
2585 determine who wrote the comment so they can write to them. Preferably,
2586 use your actual name and not your initials, unless your initials are
2587 generally recognized (e.g. @samp{jwz}). You can use only your first
2588 name if it's obvious who you are; otherwise, give first and last name.
If you're not a regular contributor, you might consider putting your
email address in -- it may be in the ChangeLog, but after a while
ChangeLogs have a tendency to disappear or get muddled. (E.g. your
comment may get copied somewhere else or even into another program, and
tracking down the proper ChangeLog may be very difficult.)
2596 If you come across an opinion that is not or no longer valid, or you
2597 come across any comment that no longer applies but you want to keep it
2598 around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment
2599 afterwards explaining why the preceding comment is no longer valid. Put
2600 your name on this comment, as explained above.
2602 Just as comments are a lifeline to programmers, incorrect comments are
2603 death. If you come across an incorrect comment, @strong{immediately}
2604 correct it or flag it as incorrect, as described in the previous
2605 paragraph. Whenever you work on a section of code, @emph{always} make
sure to update any comments to be correct -- or, at the very least, flag
them as incorrect.

To indicate a ``todo'' or other problem, use four pound signs --
i.e. @samp{####}.
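The following fragment is a sketch of how the comment conventions above
might look in practice. The function, the author name, the date and the
opinions are all invented for illustration. Only the @samp{[[ ]]}
marker for obsolete comments and the four-pound-sign marker come from
the text above.

```c
/* Illustration only: author, date and opinions below are invented. */

/* Count the strictly positive entries of ITEMS.
   A linear scan is fine here; I don't think this is ever called in a
   hot loop.  --Jane Doe, 2008 */
static int
count_positive_items (const int *items, int n)
{
  int i, count = 0;

  /* [[ ITEMS is always sorted in decreasing order, so we could stop at
     the first entry <= 0. ]]  Not true any more: ITEMS now comes from
     user input and is unsorted, so every entry is checked.
     --Jane Doe */
  for (i = 0; i < n; i++)
    if (items[i] > 0)
      count++;

  /* #### Zero and negative entries are silently ignored; should this
     signal an error instead? */
  return count;
}
```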
2612 @node Adding Global Lisp Variables
2613 @section Adding Global Lisp Variables
2614 @cindex global Lisp variables, adding
2615 @cindex variables, adding global Lisp
2617 Global variables whose names begin with @samp{Q} are constants whose
2618 value is a symbol of a particular name. The name of the variable should
2619 be derived from the name of the symbol using the same rules as for Lisp
2620 primitives. These variables are initialized using a call to
2621 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2622 interns a symbol, sets the C variable to the resulting Lisp object, and
2623 calls @code{staticpro()} on the C variable to tell the
2624 garbage-collection mechanism about this variable. What
2625 @code{staticpro()} does is add a pointer to the variable to a large
2626 global array; when garbage-collection happens, all pointers listed in
2627 the array are used as starting points for marking Lisp objects. This is
2628 important because it's quite possible that the only current reference to
2629 the object is the C variable. In the case of symbols, the
2630 @code{staticpro()} doesn't matter all that much because the symbol is
2631 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2632 However, it's possible that a naughty user could do something like
2633 uninterning the symbol out of @code{obarray} or even setting
@code{obarray} to a different value [although this is likely to make
SXEmacs crash!].)
2637 @strong{Please note:} It is potentially deadly if you declare a
2638 @samp{Q...} variable in two different modules. The two calls to
2639 @code{defsymbol()} are no problem, but some linkers will complain about
2640 multiply-defined symbols. The most insidious aspect of this is that
2641 often the link will succeed anyway, but then the resulting executable
2642 will sometimes crash in obscure ways during certain operations!
2644 To avoid this problem, declare any symbols with common names (such as
2645 @code{text}) that are not obviously associated with this particular
2646 module in the file @file{general-slots.h}. The ``-slots'' suffix
2647 indicates that this is a file that is included multiple times in
2648 @file{general.c}. Redefinition of preprocessor macros allows the
2649 effects to be different in each context, so this is actually more
2650 convenient and less error-prone than doing it in your module.
2652 Global variables whose names begin with @samp{V} are variables that
2653 contain Lisp objects. The convention here is that all global variables
2654 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2655 (including integer and boolean variables that have Lisp
2656 equivalents). Most of the time, these variables have equivalents in
2657 Lisp, but some don't. Those that do are declared this way by a call to
2658 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2659 module. What this does is create a special @dfn{symbol-value-forward}
2660 Lisp object that contains a pointer to the C variable, intern a symbol
2661 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2662 its value to the symbol-value-forward Lisp object; it also calls
2663 @code{staticpro()} on the C variable to tell the garbage-collection
2664 mechanism about the variable. When @code{eval} (or actually
2665 @code{symbol-value}) encounters this special object in the process of
2666 retrieving a variable's value, it follows the indirection to the C
2667 variable and gets its value. @code{setq} does similar things so that
2668 the C variable gets changed.
2670 Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2671 initialize it in the @code{vars_of_*()} function; otherwise it will end
2672 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2673 this is probably not what you want. Also, if the variable is not
2674 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2675 C variable in the @code{vars_of_*()} function. Otherwise, the
2676 garbage-collection mechanism won't know that the object in this variable
2677 is in use, and will happily collect it and reuse its storage for another
2678 Lisp object, and you will be the one who's unhappy when you can't figure
2679 out how your variable got overwritten.
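The effect of @code{staticpro()} can be modelled as a table of
addresses that the collector reads at marking time. The sketch below
assumes a fixed-size table and uses hypothetical names throughout. It
shows why the variable's address, rather than its current value, is
what gets registered, and why the variable has to be initialized
explicitly in the @code{vars_of_*()} function.

```c
typedef long Lisp_Object;   /* stand-in for the real tagged type */

#define MAX_STATICPROS 64
static Lisp_Object *staticpro_table[MAX_STATICPROS];
static int staticpro_count;

/* Register the ADDRESS of a C variable as a GC root. */
static void
staticpro_sketch (Lisp_Object *varaddress)
{
  /* A real implementation must never drop a root; this sketch simply
     ignores registrations beyond its fixed capacity. */
  if (staticpro_count < MAX_STATICPROS)
    staticpro_table[staticpro_count++] = varaddress;
}

/* Stand-in for the marking phase: copy the value currently held by
   each registered variable into OUT; returns how many were found. */
static int
collect_static_roots (Lisp_Object *out, int max)
{
  int i, n = 0;

  for (i = 0; i < staticpro_count && n < max; i++)
    out[n++] = *staticpro_table[i];
  return n;
}

static Lisp_Object Vsketch_value;   /* hypothetical V... variable */

static void
vars_of_sketch (void)
{
  Vsketch_value = 42;                /* initialize explicitly */
  staticpro_sketch (&Vsketch_value);
}
```

Because the table holds the variable's address, a later assignment to
@code{Vsketch_value} is seen by the next marking pass without any
further registration.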
2681 @node Proper Use of Unsigned Types
2682 @section Proper Use of Unsigned Types
2683 @cindex unsigned types, proper use of
2684 @cindex types, proper use of unsigned
2686 Avoid using @code{unsigned int} and @code{unsigned long} whenever
possible. Unsigned types are viral -- in any arithmetic or comparison
that mixes signed and unsigned operands of the same rank (such as
@code{int} and @code{unsigned int}), the signed operand is converted to
unsigned, which is almost certainly not what you want. Many subtle and
hard-to-find bugs are created by careless use of unsigned types. In
general, you should almost @emph{never} use an unsigned type to hold a
regular quantity of any sort. The only exceptions are

@itemize @bullet
@item
When there's a reasonable possibility you will actually need all 32 or
64 bits to store the quantity.
@item
When calling existing APIs that require unsigned types. In this case,
you should still do all manipulation using signed types, and do the
conversion at the very threshold of the API call.
@item
In existing code that you don't want to modify because you don't
maintain it.
@item
In bit-field structures.
@end itemize

2709 Other reasonable uses of @code{unsigned int} and @code{unsigned long}
2710 are representing non-quantities -- e.g. bit-oriented flags and such.
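The following small functions illustrate the conversion rule behind the
warning above. They rely only on standard C: when an @code{int} meets an
@code{unsigned int} in a comparison, the @code{int} is converted to
@code{unsigned int} first, so @code{-1} compares as the largest unsigned
value. The helper names are hypothetical.

```c
/* Both operands signed: the comparison behaves as expected. */
static int
signed_less_than (int a, int b)
{
  return a < b;
}

/* Mixed operands: A is converted to unsigned int before comparing,
   so a negative A becomes a huge value. */
static int
mixed_less_than (int a, unsigned int b)
{
  return a < b;
}

/* Recommended pattern: keep the quantity signed and convert only at
   the boundary of an API that demands an unsigned type. */
static unsigned int
to_api_length (int len)
{
  return len < 0 ? 0u : (unsigned int) len;
}
```

Here @code{signed_less_than (-1, 1)} is true, but
@code{mixed_less_than (-1, 1u)} is false.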
2712 @node Coding for Mule
2713 @section Coding for Mule
2714 @cindex coding for Mule
2715 @cindex Mule, coding for
Although Mule support is not compiled in by default in SXEmacs, many people
2718 are using it, and we consider it crucial that new code works correctly
2719 with multibyte characters. This is not hard; it is only a matter of
2720 following several simple user-interface guidelines. Even if you never
2721 compile with Mule, with a little practice you will find it quite easy
2722 to code Mule-correctly.
2724 Note that these guidelines are not necessarily tied to the current Mule
2725 implementation; they are also a good idea to follow on the grounds of
2726 code generalization for future I18N work.
@menu
* Character-Related Data Types::
* Working With Character and Byte Positions::
* Conversion to and from External Data::
* General Guidelines for Writing Mule-Aware Code::
* An Example of Mule-Aware Code::
@end menu
2736 @node Character-Related Data Types
2737 @subsection Character-Related Data Types
2738 @cindex character-related data types
2739 @cindex data types, character-related
2741 First, let's review the basic character-related datatypes used by
2742 SXEmacs. Note that the separate @code{typedef}s are not mandatory in the
2743 current implementation (all of them boil down to @code{unsigned char} or
2744 @code{int}), but they improve clarity of code a great deal, because one
glance at the declaration can tell the intended use of the variable.

@table @code
@item Emchar
@cindex Emchar
An @code{Emchar} holds a single Emacs character.

Obviously, the equality between characters and bytes is lost in the Mule
world. Characters can be represented by one or more bytes in the
buffer, and @code{Emchar} is the C type large enough to hold any
character.

Without Mule support, an @code{Emchar} is equivalent to an
@code{unsigned char}.

@item Bufbyte
@cindex Bufbyte
The data representing the text in a buffer or string is logically a
sequence of @code{Bufbyte}s.

SXEmacs does not work with the same character formats all the time; when
reading characters from the outside, it decodes them to an internal
format, and likewise encodes them when writing. @code{Bufbyte} (in fact
@code{unsigned char}) is the basic unit of the internal format used for
SXEmacs buffers and strings. A @code{Bufbyte *} is the type that points
at text encoded in the variable-width internal encoding.

One character can correspond to one or more @code{Bufbyte}s. In the
current Mule implementation, an ASCII character is represented by the
same @code{Bufbyte}, and other characters are represented by a sequence
of two or more @code{Bufbyte}s.

Without Mule support, there are exactly 256 characters, implicitly
Latin-1, and each character is represented using one @code{Bufbyte}, and
there is a one-to-one correspondence between @code{Bufbyte}s and
@code{Emchar}s.

@item Bufpos
@itemx Charcount
@cindex Bufpos
@cindex Charcount
A @code{Bufpos} represents a character position in a buffer or string.
A @code{Charcount} represents a number (count) of characters.
Logically, subtracting two @code{Bufpos} values yields a
@code{Charcount} value. Although both of these are @code{typedef}ed to
@code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
it clear what sort of position is being used.

@code{Bufpos} and @code{Charcount} values are the only ones that are
ever visible to Lisp.

@item Bytind
@itemx Bytecount
@cindex Bytind
@cindex Bytecount
A @code{Bytind} represents a byte position in a buffer or string. A
@code{Bytecount} represents the distance between two positions, in bytes.
The relationship between @code{Bytind} and @code{Bytecount} is the same
as the relationship between @code{Bufpos} and @code{Charcount}.

@item Extbyte
@itemx Extcount
@cindex Extbyte
@cindex Extcount
When dealing with the outside world, SXEmacs works with @code{Extbyte}s,
which are equivalent to @code{unsigned char}. Obviously, an
@code{Extcount} is the distance between two @code{Extbyte}s. Extbytes
and Extcounts are not all that frequent in SXEmacs code.
@end table
2815 @node Working With Character and Byte Positions
2816 @subsection Working With Character and Byte Positions
2817 @cindex character and byte positions, working with
2818 @cindex byte positions, working with character and
2819 @cindex positions, working with character and byte
2821 Now that we have defined the basic character-related types, we can look
2822 at the macros and functions designed for work with them and for
2823 conversion between them. Most of these macros are defined in
2824 @file{buffer.h}, and we don't discuss all of them here, but only the
most important ones. Examining the existing code is the best way to
learn about them.

2829 @item MAX_EMCHAR_LEN
2830 @cindex MAX_EMCHAR_LEN
This preprocessor constant is the maximum number of buffer bytes needed
to represent an Emacs character in the variable-width internal encoding.
It is useful when allocating temporary strings to keep a known number of
characters. For instance:

@example
@group
/* Allocate place for @var{cclen} characters. */
Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
@end group
@end example

If you followed the previous section, you can guess that, logically,
multiplying a @code{Charcount} value by @code{MAX_EMCHAR_LEN} produces
a @code{Bytecount} value.
2852 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2853 Without Mule, it is 1.
2855 @item charptr_emchar
2856 @itemx set_charptr_emchar
2857 @cindex charptr_emchar
2858 @cindex set_charptr_emchar
2859 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2860 returns the @code{Emchar} stored at that position. If it were a
2861 function, its prototype would be:
@example
Emchar charptr_emchar (Bufbyte *p);
@end example
2867 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2868 position. It returns the number of bytes stored:
@example
Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
@end example
2874 It is important to note that @code{set_charptr_emchar} is safe only for
2875 appending a character at the end of a buffer, not for overwriting a
2876 character in the middle. This is because the width of characters
2877 varies, and @code{set_charptr_emchar} cannot resize the string if it
writes, say, a two-byte character where a single-byte character used to
reside.

A typical use of @code{set_charptr_emchar} can be demonstrated by this
example, which copies characters from buffer @var{buf} to a temporary
string of @code{Bufbyte}s pointed to by @code{p}.

@example
@group
@{
  Bufpos pos;
  for (pos = beg; pos < end; pos++)
    @{
      Emchar c = BUF_FETCH_CHAR (buf, pos);
      p += set_charptr_emchar (p, c);
    @}
@}
@end group
@end example

Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
and advance the pointer @code{p} at the same time.
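The contract of @code{charptr_emchar} and @code{set_charptr_emchar} can
be tried out with a stand-in variable-width encoding. The sketch below
uses UTF-8 purely because it is familiar and also needs at most 4 bytes
per character. It is @emph{not} the Mule internal encoding, and the
@code{_sketch} functions and types are hypothetical. It assumes valid
input: characters up to U+10FFFF and pointers at character boundaries.

```c
/* Stand-ins for the SXEmacs types; the encoding is UTF-8, used only
   as an example of a variable-width format (NOT Mule's encoding). */
typedef unsigned char Bufbyte;
typedef int Emchar;
typedef long Bytecount;
#define MAX_EMCHAR_LEN_SKETCH 4

/* Store C at P and return the number of bytes written. */
static Bytecount
set_charptr_emchar_sketch (Bufbyte *p, Emchar c)
{
  if (c < 0x80)
    {
      p[0] = (Bufbyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (Bufbyte) (0xC0 | (c >> 6));
      p[1] = (Bufbyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (Bufbyte) (0xE0 | (c >> 12));
      p[1] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (Bufbyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (Bufbyte) (0xF0 | (c >> 18));
  p[1] = (Bufbyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (Bufbyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Decode the character whose first byte is at P. */
static Emchar
charptr_emchar_sketch (const Bufbyte *p)
{
  if (p[0] < 0x80)
    return p[0];
  if (p[0] < 0xE0)
    return ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
  if (p[0] < 0xF0)
    return ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
  return ((p[0] & 0x07) << 18) | ((p[1] & 0x3F) << 12)
         | ((p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}

/* Append NCHARS characters to OUT, advancing the pointer by the byte
   count returned for each one; OUT must have room for
   NCHARS * MAX_EMCHAR_LEN_SKETCH bytes.  Returns the bytes used. */
static Bytecount
copy_chars_sketch (const Emchar *chars, int nchars, Bufbyte *out)
{
  Bufbyte *p = out;
  int i;

  for (i = 0; i < nchars; i++)
    p += set_charptr_emchar_sketch (p, chars[i]);
  return p - out;
}
```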

@item INC_CHARPTR
@itemx DEC_CHARPTR
@cindex INC_CHARPTR
@cindex DEC_CHARPTR
These two macros increment and decrement a @code{Bufbyte} pointer,
respectively. They will adjust the pointer by the appropriate number of
bytes according to the byte length of the character stored there. Both
macros assume that the memory address is located at the beginning of a
valid character.

2911 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2912 simply expand to @code{p++} and @code{p--}, respectively.
2914 @item bytecount_to_charcount
2915 @cindex bytecount_to_charcount
2916 Given a pointer to a text string and a length in bytes, return the
2917 equivalent length in characters.
@example
Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
@end example
2923 @item charcount_to_bytecount
2924 @cindex charcount_to_bytecount
2925 Given a pointer to a text string and a length in characters, return the
2926 equivalent length in bytes.
@example
Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
@end example
2932 @item charptr_n_addr
2933 @cindex charptr_n_addr
2934 Return a pointer to the beginning of the character offset @var{cc} (in
2935 characters) from @var{p}.
@example
Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
@end example
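The pointer-stepping macros and the count conversions can be modelled
with a stand-in variable-width encoding. The sketch below uses UTF-8
only as a familiar example (it is @emph{not} the Mule internal format),
all @code{_sketch} names are hypothetical, and every pointer is assumed
to sit at a character boundary. Note that converting a byte count to a
character count requires walking the text, since with a variable-width
encoding the answer cannot be computed from the byte count alone.

```c
/* Stand-in types; UTF-8 is only an example encoding here. */
typedef unsigned char Bufbyte;
typedef long Bytecount;
typedef long Charcount;

/* Byte length of the character whose first byte is FIRST_BYTE. */
static int
char_len_sketch (Bufbyte first_byte)
{
  if (first_byte < 0x80) return 1;
  if (first_byte < 0xE0) return 2;
  if (first_byte < 0xF0) return 3;
  return 4;
}

/* In UTF-8, continuation bytes have the bit pattern 10xxxxxx. */
#define IS_CONTINUATION_BYTE(b) (((b) & 0xC0) == 0x80)

/* Models of INC_CHARPTR / DEC_CHARPTR. */
#define INC_CHARPTR_SKETCH(p) ((p) += char_len_sketch (*(p)))
#define DEC_CHARPTR_SKETCH(p) \
  do { (p)--; } while (IS_CONTINUATION_BYTE (*(p)))

static Charcount
bytecount_to_charcount_sketch (const Bufbyte *p, Bytecount bc)
{
  const Bufbyte *end = p + bc;
  Charcount cc = 0;

  while (p < end)
    {
      INC_CHARPTR_SKETCH (p);
      cc++;
    }
  return cc;
}

static Bytecount
charcount_to_bytecount_sketch (const Bufbyte *p, Charcount cc)
{
  const Bufbyte *start = p;

  for (; cc > 0; cc--)
    INC_CHARPTR_SKETCH (p);
  return p - start;
}

static const Bufbyte *
charptr_n_addr_sketch (const Bufbyte *p, Charcount cc)
{
  return p + charcount_to_bytecount_sketch (p, cc);
}
```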
2942 @node Conversion to and from External Data
2943 @subsection Conversion to and from External Data
2944 @cindex conversion to and from external data
2945 @cindex external data, conversion to and from
2947 When an external function, such as a C library function, returns a
2948 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
This is because these returned strings may contain 8-bit characters which
can be misinterpreted by SXEmacs and cause a crash. Likewise, when
2951 exporting a piece of internal text to the outside world, you should
2952 always convert it to an appropriate external encoding, lest the internal
2953 stuff (such as the infamous \201 characters) leak out.
The interface to conversion between the internal and external
representations of text consists of the numerous conversion macros
defined in @file{buffer.h}. There used to be a fixed set of external
formats supported by these macros, but now any coding system can be used
with these macros. The coding system alias mechanism is used to create
the following logical coding systems, which replace the fixed external
formats. The @code{dontusethis-set-symbol-value-handler} mechanism was
enhanced to make this possible (more work on that is needed, such as
removing the @code{dontusethis-} prefix).

@table @code
@item Qbinary
This is the simplest format and is what we use in the absence of a more
appropriate format. This converts according to the @code{binary} coding
system:

@enumerate a
@item
On input, bytes 0--255 are converted into (implicitly Latin-1)
characters 0--255. A non-Mule SXEmacs doesn't really know about
different character sets and the fonts to display them, so the bytes can
be treated as text in different 1-byte encodings by simply setting the
appropriate fonts. So in a sense, non-Mule SXEmacs is a multilingual
editor if, for example, different fonts are used to display text in
different buffers, faces, or windows. The specifier mechanism gives the
user complete control over this kind of behavior.
@item
On output, characters 0--255 are converted into bytes 0--255 and other
characters are converted into `~'.
@end enumerate

@item Qfile_name
Format used for filenames. This is user-definable via either the
@code{file-name-coding-system} or @code{pathname-coding-system} (now
obsolete) variables.

@item Qnative
Format used for the external Unix environment: @code{argv[]}, stuff
from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
Currently this is the same as @code{Qfile_name}. The two should be
distinguished for clarity and possible future separation.

@item Qctext
Compound text format. This is the standard X11 format used for data
stored in properties, selections, and the like. This is an 8-bit
no-lock-shift ISO2022 coding system. This is a real coding system,
unlike @code{Qfile_name}, which is user-definable.
@end table
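The two conversion rules listed for @code{Qbinary} are simple enough to
state as code. This models the documented behavior only and is not the
SXEmacs encoder; @code{Emchar} and @code{Extbyte} are redefined locally
as stand-ins, and the function names are hypothetical.

```c
#include <stddef.h>

typedef int Emchar;             /* stand-in for the SXEmacs type */
typedef unsigned char Extbyte;  /* stand-in for the SXEmacs type */

/* Output rule: characters 0-255 map to the same byte, anything else
   becomes '~'. */
static void
encode_binary_sketch (const Emchar *in, size_t n, Extbyte *out)
{
  size_t i;

  for (i = 0; i < n; i++)
    out[i] = (in[i] >= 0 && in[i] <= 255) ? (Extbyte) in[i] : (Extbyte) '~';
}

/* Input rule: every byte 0-255 becomes the character with the same
   code (implicitly Latin-1). */
static void
decode_binary_sketch (const Extbyte *in, size_t n, Emchar *out)
{
  size_t i;

  for (i = 0; i < n; i++)
    out[i] = in[i];
}
```

The round trip is lossy only for characters above 255, which come back
as @samp{~}.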

There are two fundamental macros to convert between external and
internal format.

3007 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
3008 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments
3009 each of these receives are a source type, a source, a sink type, a sink,
3010 and a coding system (or a symbol naming a coding system).
3012 A typical call looks like
@example
TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
@end example

which means that the contents of the Lisp string @code{str} are written
to a malloc'ed memory area which will be pointed to by @code{ptr} once
the conversion is complete. The conversion will be done using the
3020 @code{file-name} coding system, which will be controlled by the user
3021 indirectly by setting or binding the variable
3022 @code{file-name-coding-system}.
3024 Some sources and sinks require two C variables to specify. We use some
3025 preprocessor magic to allow different source and sink types, and even
different numbers of arguments to specify different types of sources and
sinks.

So we can have a call that looks like

@example
TO_INTERNAL_FORMAT (DATA, (ptr, len),
                    MALLOC, (ptr, len),
                    coding_system);
@end example

The parenthesized argument pairs are required to make the preprocessor
magic work.

Here are the different source and sink types:

@table @asis
@item @code{DATA, (ptr, len),}
input data is a fixed buffer of size @var{len} at address @var{ptr}
@item @code{ALLOCA, (ptr, len),}
output data is placed in an @code{alloca()}ed buffer of size @var{len} pointed to by @var{ptr}
@item @code{MALLOC, (ptr, len),}
output data is in a @code{malloc()}ed buffer of size @var{len} pointed to by @var{ptr}
@item @code{C_STRING_ALLOCA, ptr,}
equivalent to @code{ALLOCA (ptr, len_ignored)} on output
@item @code{C_STRING_MALLOC, ptr,}
equivalent to @code{MALLOC (ptr, len_ignored)} on output
@item @code{C_STRING, ptr,}
equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input
@item @code{LISP_STRING, string,}
input or output is a Lisp_Object of type string
@item @code{LISP_BUFFER, buffer,}
output is written to @code{(point)} in Lisp buffer @var{buffer}
@item @code{LISP_LSTREAM, lstream,}
input or output is a Lisp_Object of type lstream
@item @code{LISP_OPAQUE, object,}
input or output is a Lisp_Object of type opaque
@end table

3064 Often, the data is being converted to a '\0'-byte-terminated string,
3065 which is the format required by many external system C APIs. For these
3066 purposes, a source type of @code{C_STRING} or a sink type of
3067 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
3068 Otherwise, we should try to keep SXEmacs '\0'-byte-clean, which means
3069 using (ptr, len) pairs.
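The reason for preferring (ptr, len) pairs is easy to demonstrate: any
code path that goes through a '\0'-terminated string silently truncates
data containing a zero byte. The sketch below uses only the standard C
library; the helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Data with an embedded zero byte, e.g. a fragment of a binary file. */
static const char sample[] = { 'a', 'b', '\0', 'c', 'd' };

/* C_STRING-style copy: stops at the first '\0', like strcpy.
   Returns the number of bytes that survived. */
static size_t
copy_as_c_string (char *dst, const char *src)
{
  strcpy (dst, src);
  return strlen (dst);
}

/* (ptr, len)-style copy: every byte survives, including '\0'. */
static size_t
copy_as_pair (char *dst, const char *src, size_t len)
{
  memcpy (dst, src, len);
  return len;
}
```

Copying @code{sample} the first way keeps only @samp{ab}; the second
way keeps all five bytes.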
3071 The sinks to be specified must be lvalues, unless they are the lisp
3072 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
3074 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
3075 resulting text is stored in a stack-allocated buffer, which is
3076 automatically freed on returning from the function. However, the sink
3077 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
memory. The caller is responsible for freeing this memory using
@code{xfree()}.

3081 Note that it doesn't make sense for @code{LISP_STRING} to be a source
3082 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
3083 You'll get an assertion failure if you try.
3086 @node General Guidelines for Writing Mule-Aware Code
3087 @subsection General Guidelines for Writing Mule-Aware Code
3088 @cindex writing Mule-aware code, general guidelines for
3089 @cindex Mule-aware code, general guidelines for writing
3090 @cindex code, general guidelines for writing Mule-aware
3092 This section contains some general guidance on how to write Mule-aware
code, as well as some pitfalls you should avoid.

@table @strong
@item Never use @code{char} and @code{char *}.
In SXEmacs, the use of @code{char} and @code{char *} is almost always a
mistake. If you want to manipulate an Emacs character from ``C'', use
@code{Emchar}. If you want to examine a specific octet in the internal
format, use @code{Bufbyte}. If you want a Lisp-visible character, use a
@code{Lisp_Object} and @code{make_char}. If you want a pointer to move
through the internal text, use @code{Bufbyte *}. Also note that you
almost certainly do not need @code{Emchar *}.

@item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
The whole point of using different types is to avoid confusion about the
use of certain variables. Lest this effect be nullified, you need to be
careful about using the right types.

@item Always convert external data
It is extremely important to always convert external data, because
SXEmacs can crash if unexpected 8-bit sequences are copied to its
internal buffers.

This means that when a system function, such as @code{readdir}, returns
a string, you may need to convert it using one of the conversion macros
described in the previous section, before passing it further to Lisp.

Actually, most of the basic system functions that accept '\0'-terminated
string arguments, like @code{stat()} and @code{open()}, have been
@strong{encapsulated} so that they @emph{always} do internal to
external conversion themselves. This means you must pass internally
encoded data, typically the @code{XSTRING_DATA} of a Lisp_String, to
these functions. This is actually a design bug, since it unexpectedly
changes the semantics of the system functions. A better design would be
to provide separate versions of these system functions that accepted
Lisp_Objects which were Lisp strings in place of their current
@code{char *} arguments.

@example
int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
@end example

Also note that many internal functions, such as @code{make_string},
accept Bufbytes, which removes the need for them to convert the data
they receive. This increases efficiency because that way external data
needs to be decoded only once, when it is read. After that, it is
passed around in internal format.
@end table
3141 @node An Example of Mule-Aware Code
3142 @subsection An Example of Mule-Aware Code
3143 @cindex code, an example of Mule-aware
3144 @cindex Mule-aware code, an example of
3146 As an example of Mule-aware code, we will analyze the @code{string}
3147 function, which conses up a Lisp string from the character arguments it
3148 receives. Here is the definition, pasted from @code{alloc.c}:
3152 DEFUN ("string", Fstring, 0, MANY, 0, /*
3153 Concatenate all the argument characters and make the result a string.
3155 (int nargs, Lisp_Object *args))
3157 Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
3158 Bufbyte *p = storage;
3160 for (; nargs; nargs--, args++)
3162 Lisp_Object lisp_char = *args;
3163 CHECK_CHAR_COERCE_INT (lisp_char);
3164 p += set_charptr_emchar (p, XCHAR (lisp_char));
3166 return make_string (storage, p - storage);
3171 Now we can analyze the source line by line.
Obviously, the string will contain as many characters as there are
arguments to the function. Since each character may need up to
@code{MAX_EMCHAR_LEN} bytes in the internal format, we allocate
@code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
worst-case number of bytes needed for @var{nargs} @code{Emchar}s to fit
in the string.
3178 Then, the loop checks that each element is a character, converting
3179 integers in the process. Like many other functions in SXEmacs, this
3180 function silently accepts integers where characters are expected, for
3181 historical and compatibility reasons. Unless you know what you are
3182 doing, @code{CHECK_CHAR} will also suffice. @code{XCHAR (lisp_char)}
3183 extracts the @code{Emchar} from the @code{Lisp_Object}, and
3184 @code{set_charptr_emchar} stores it to storage, increasing @code{p} in
3187 Other instructive examples of correct coding under Mule can be found all
3188 over the SXEmacs code. For starters, I recommend
3189 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have
understood this section of the manual and studied the examples, you can
proceed to write new Mule-aware code.
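
The buffer-sizing pattern used by @code{Fstring} (reserve the worst case
of @code{MAX_EMCHAR_LEN} bytes per character, then advance a write
pointer by however many bytes each character really took) can be tried
in a small standalone program. This is only an analogy under stated
assumptions: it uses UTF-8, which is variable-width like the Mule
internal format but is @emph{not} the format SXEmacs uses internally.
@code{SKETCH_MAX_CHAR_LEN}, @code{put_char_utf8} and @code{encode_chars}
are hypothetical stand-ins for @code{MAX_EMCHAR_LEN},
@code{set_charptr_emchar} and the loop in @code{Fstring}. The buffer
comes from @code{malloc()} rather than @code{alloca()} because it is
returned to the caller.

@example
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for MAX_EMCHAR_LEN: UTF-8 needs at most
   4 bytes per character.  */
#define SKETCH_MAX_CHAR_LEN 4

/* Store code point C at P and return the number of bytes written,
   in the spirit of set_charptr_emchar.  Assumes C is a valid
   Unicode scalar value.  */
static size_t
put_char_utf8 (unsigned char *p, uint32_t c)
@{
  if (c < 0x80)
    @{
      p[0] = (unsigned char) c;
      return 1;
    @}
  if (c < 0x800)
    @{
      p[0] = (unsigned char) (0xC0 | (c >> 6));
      p[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    @}
  if (c < 0x10000)
    @{
      p[0] = (unsigned char) (0xE0 | (c >> 12));
      p[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    @}
  p[0] = (unsigned char) (0xF0 | (c >> 18));
  p[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
@}

/* Encode NARGS characters into a buffer sized for the worst case and
   store the number of bytes actually used in *LEN.  The extra byte
   avoids a zero-sized malloc when NARGS is 0.  Caller frees.  */
static unsigned char *
encode_chars (const uint32_t *chars, size_t nargs, size_t *len)
@{
  unsigned char *storage = malloc (nargs * SKETCH_MAX_CHAR_LEN + 1);
  unsigned char *p = storage;

  if (storage == NULL)
    return NULL;
  for (; nargs; nargs--, chars++)
    p += put_char_utf8 (p, *chars);
  *len = (size_t) (p - storage);
  return storage;
@}
@end example

As in @code{Fstring}, the final length is simply the distance the write
pointer moved, so no separate byte count has to be maintained.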
3193 @node Techniques for SXEmacs Developers
3194 @section Techniques for SXEmacs Developers
3195 @cindex techniques for SXEmacs developers
3196 @cindex developers, techniques for SXEmacs
3200 To make a purified SXEmacs, do: @code{make puremacs}.
3201 To make a quantified SXEmacs, do: @code{make quantmacs}.
You simply can't dump Quantified and Purified images (unless using the
portable dumper). Purify gets confused when SXEmacs frees memory in one
process that was allocated in a @emph{different} process on a different
machine! Run it like so:
3208 temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
3211 @cindex error checking
3212 Before you go through the trouble, are you compiling with all
3213 debugging and error-checking off? If not, try that first. Be warned
3214 that while Quantify is directly responsible for quite a few
3215 optimizations which have been made to SXEmacs, doing a run which
3216 generates results which can be acted upon is not necessarily a trivial
Also, if you're still willing to do some runs, make sure you configure
3220 with the @samp{--quantify} flag. That will keep Quantify from starting
3221 to record data until after the loadup is completed and will shut off
3222 recording right before it shuts down (which generates enough bogus data
3223 to throw most results off). It also enables three additional elisp
3224 commands: @code{quantify-start-recording-data},
3225 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
3227 If you want to make SXEmacs faster, target your favorite slow benchmark,
3228 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
3229 out where the cycles are going. In many cases you can localize the
3230 problem (because a particular new feature or even a single patch
3231 elicited it). Don't hesitate to use brute force techniques like a
3232 global counter incremented at strategic places, especially in
3233 combination with other performance indications (@emph{e.g.}, degree of
3234 buffer fragmentation into extents).
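
Such a brute-force counter needs little more than a global variable, an
increment at the suspicious spot, and a report when the process exits.
The standalone sketch below uses the standard C @code{atexit()}
function. @code{hot_path_calls}, @code{suspicious_routine},
@code{report_counters} and @code{install_counter_report} are
hypothetical names, and inside SXEmacs you might just as well inspect
the counter from a debugger instead of printing it.

@example
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter for a routine suspected of being called far
   more often than expected.  */
static unsigned long hot_path_calls;

static void
report_counters (void)
@{
  fprintf (stderr, "suspicious_routine called %lu times\n",
           hot_path_calls);
@}

/* Stand-in for the low-level routine under investigation.  */
static int
suspicious_routine (int x)
@{
  hot_path_calls++;             /* the only line added for profiling */
  return x * 2;
@}

/* Call once, early in startup, so the total is printed at exit.  */
static void
install_counter_report (void)
@{
  atexit (report_counters);
@}
@end example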
3240 Make the garbage collector faster. Figure out how to write an
3241 incremental garbage collector.
3243 Write a compiler that takes bytecode and spits out C code.
3244 Unfortunately, you will then need a C compiler and a more fully
3245 developed module system.
3249 Speed up syntax highlighting. It was suggested that ``maybe moving some
3250 of the syntax highlighting capabilities into C would make a
3251 difference.'' Wrong idea, I think. When processing one 400kB file a
3252 particular low-level routine was being called 40 @emph{million} times
3253 simply for @emph{one} call to @code{newline-and-indent}. Syntax
3254 highlighting needs to be rewritten to use a reliable, fast parser, then
3255 to trust the pre-parsed structure, and only do re-highlighting locally
3256 to a text change. Modern machines are fast enough to implement such
3257 parsers in Lisp; but no machine will ever be fast enough to deal with
3258 quadratic (or worse) algorithms!
3260 Implement tail recursion in Emacs Lisp (hard!).
3263 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function
3264 calls in elisp are especially expensive. Iterating over a long list is
3265 going to be 30 times faster implemented in C than in Elisp.
3267 Heavily used small code fragments need to be fast. The traditional way
3268 to implement such code fragments in C is with macros. But macros in C
3269 are known to be broken.
3271 @cindex macro hygiene
3272 Macro arguments that are repeatedly evaluated may suffer from repeated
3273 side effects or suboptimal performance.
Variable names used in macros may collide with the caller's variables,
3276 causing (at least) unwanted compiler warnings.
3278 In order to solve these problems, and maintain statement semantics, one
3279 should use the @code{do @{ ... @} while (0)} trick while trying to
3280 reference macro arguments exactly once using local variables.
3282 Let's take a look at this poor macro definition:
3285 #define MARK_OBJECT(obj) \
3286 if (!marked_p (obj)) mark_object (obj), did_mark = 1
3289 This macro evaluates its argument twice, and also fails if used like this:
3291 if (flag) MARK_OBJECT (obj); else do_something();
3294 A much better definition is
3297 #define MARK_OBJECT(obj) do @{ \
3298 Lisp_Object mo_obj = (obj); \
3299 if (!marked_p (mo_obj)) \
3301 mark_object (mo_obj); \
3307 Notice the elimination of double evaluation by using the local variable
3308 with the obscure name. Writing safe and efficient macros requires great
care. The one problem with macros that cannot be worked around portably
is that, since a C block has no value, a macro used as an expression
rather than as a statement cannot use the techniques just described to
avoid multiple evaluation.
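
The double-evaluation problem and its @code{do @{ ... @} while (0)}
cure can be observed in a few lines of standalone C. In this sketch
@code{calls}, @code{next_value}, @code{BAD_TWICE} and @code{GOOD_TWICE}
are hypothetical names; @code{next_value} simply counts how often it is
evaluated. Because the good version is a statement, it has to deliver
its result through an output argument instead of a value, which is
exactly the expression-macro limitation just described.

@example
/* Counts how often the macro argument below gets evaluated.  */
static int calls;

static int
next_value (void)
@{
  calls++;
  return 21;
@}

/* Poor: ARG appears twice in the expansion, so it runs twice.  */
#define BAD_TWICE(arg) ((arg) + (arg))

/* Better: ARG is evaluated once into a local with an obscure name,
   and do ... while (0) makes the whole macro a single statement.  */
#define GOOD_TWICE(arg, result) do @{ \
  int gt_arg = (arg);                 \
  (result) = gt_arg + gt_arg;         \
@} while (0)
@end example

With @code{calls} reset to zero, @code{r = BAD_TWICE (next_value ());}
leaves @code{calls} at 2, while
@code{if (ok) GOOD_TWICE (next_value (), r); else r = -1;} parses as
intended and leaves it at 1.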
3314 @cindex inline functions
3315 In most cases where a macro has function semantics, an inline function
3316 is a better implementation technique. Modern compiler optimizers tend
3317 to inline functions even if they have no @code{inline} keyword, and
3318 configure magic ensures that the @code{inline} keyword can be safely
used as an additional compiler hint. Inline functions used in a single
@file{.c} file are easy. The function must already be defined to be
@code{static}. Just add an @code{inline} keyword to the
3326 heavily_used_small_function (int arg)
3332 Inline functions in header files are trickier, because we would like to
3333 make the following optimization if the function is @emph{not} inlined
3334 (for example, because we're compiling for debugging). We would like the
3335 function to be defined externally exactly once, and each calling
translation unit to create an external reference to the function,
3337 instead of including a definition of the inline function in the object
3338 code of every translation unit that uses it. This optimization is
3339 currently only available for gcc. But you don't have to worry about the
3340 trickiness; just define your inline functions in header files using this
3345 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
3347 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
3353 The declaration right before the definition is to prevent warnings when
3354 compiling with @code{gcc -Wmissing-declarations}. I consider issuing
3355 this warning for inline functions a gcc bug, but the gcc maintainers disagree.
3357 @cindex inline functions, headers
3358 @cindex header files, inline functions
3359 Every header which contains inline functions, either directly by using
3360 @code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
3361 be added to @file{inline.c}'s includes to make the optimization
described above work. (Optimization note: if all @code{INLINE_HEADER}
functions are in fact inlined in all translation units, then the linker
can just discard @file{inline.o}, since it contains only unreferenced
code.)
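
For comparison, standard C99 offers a portable way to get the same
``one external definition'' effect. The sketch below illustrates the
C99 rule only, not SXEmacs' own @code{INLINE_HEADER} machinery, and
assumes a compiler in C99 (or later) mode; GCC's older gnu89 inline
semantics differ. A header would carry the plain @code{inline}
definition, and exactly one @file{.c} file, in the role that
@file{inline.c} plays, would add an @code{extern} declaration, which
makes that translation unit emit the single external definition. To
keep the example self-contained both parts appear in one file, and the
function name is hypothetical.

@example
/* Part 1: what a header (say, a hypothetical foo.h) would contain.
   A C99 inline definition on its own does not emit an external
   symbol, so every .c file may include it.  */
inline int
double_plus_one (int arg)
@{
  return 2 * arg + 1;
@}

/* Part 2: what exactly one .c file adds.  This extern declaration
   makes this translation unit provide the single external definition
   that non-inlined calls in other translation units link against.  */
extern int double_plus_one (int arg);
@end example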
3366 To get started debugging SXEmacs, take a look at the @file{gdbinit} file
3367 in the @file{src} directory. See the section in the SXEmacs FAQ on How
3368 to Debug an SXEmacs problem with a debugger.
3370 After making source code changes, run @code{make check} to ensure that
you haven't introduced any regressions. If you want to make SXEmacs more
3372 reliable, please improve the test suite in @file{tests/automated}.
3374 Did you make sure you didn't introduce any new compiler warnings?
3376 Before submitting a patch, please try compiling at least once with
3379 configure --enable-mule --enable-debug
3382 Here are things to know when you create a new source file:
3386 All @file{.c} files should @code{#include <config.h>} first. Almost all
3387 @file{.c} files should @code{#include "lisp.h"} second.
3390 Generated header files should be included using the @samp{#include <...>}
3391 syntax, not the @samp{#include "..."} syntax. The generated headers are:
3393 @file{config.h sheap-adjust.h paths.h Emacs.ad.h}
The basic rule is that you should assume builds using @samp{--srcdir},
so the @samp{#include <...>} syntax needs to be used whenever the
generated file to be included may be in a different directory
@emph{at compile time}. The non-obvious C rule is that
@samp{#include "..."} means to search for the included file first in
the directory of the including file, @emph{not} in the current
directory. Normally this is not a problem, but when building with
@samp{--srcdir}, @file{make} will search the @samp{VPATH} for you,
while the C compiler knows nothing about it.
Header files should @emph{not} include @samp{<config.h>} or
3407 @samp{"lisp.h"}. It is the responsibility of the @file{.c} files that
3412 @cindex Lisp object types, creating
3413 @cindex creating Lisp object types
3414 @cindex object types, creating Lisp
3415 Here is a checklist of things to do when creating a new lisp object type
3424 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
3426 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
3428 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
3430 add definitions of macros like @code{CHECK_@var{FOO}} and
3431 @code{@var{FOO}P} to @file{@var{foo}.h}
3433 add the new type index to @code{enum lrecord_type}
add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c}
add an @code{INIT_LRECORD_IMPLEMENTATION} call to @code{syms_of_@var{foo}} in @file{@var{foo}.c}
3441 @node Regression Testing SXEmacs, A Summary of the Various SXEmacs Modules, Rules When Writing New C Code, Top
3442 @chapter Regression Testing SXEmacs
3443 @cindex testing, regression
3445 The source directory @file{tests/automated} contains SXEmacs' automated
3446 test suite. The usual way of running all the tests is running
3447 @code{make check} from the top-level source directory.
3449 The test suite is unfinished and it's still lacking some essential
3450 features. It is nevertheless recommended that you run the tests to
3451 confirm that SXEmacs behaves correctly.
3453 If you want to run a specific test case, you can do it from the
3454 command-line like this:
$ sxemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE
3460 If something goes wrong, you can run the test suite interactively by
3461 loading @file{test-harness.el} into a running SXEmacs and typing
3462 @kbd{M-x test-emacs-test-file RET <filename> RET}. You will see a log of
3463 passed and failed tests, which should allow you to investigate the
3464 source of the error and ultimately fix the bug.
Adding a new test file is trivial: just create a new file in @file{tests/automated} and it
3467 will be run. There is no need to byte-compile any of the files in
3468 this directory---the test-harness will take care of any necessary
Look at the existing test cases for examples of how to write test cases.
3472 It all boils down to your imagination and judicious use of the macros
3473 @code{Assert}, @code{Check-Error}, @code{Check-Error-Message}, and
3474 @code{Check-Message}.
3476 Here's a simple example checking case-sensitive and case-insensitive
3477 comparisons from @file{case-tests.el}.
3481 (insert "Test Buffer")
3482 (let ((case-fold-search t))
3483 (goto-char (point-min))
3484 (Assert (eq (search-forward "test buffer" nil t) 12))
3485 (goto-char (point-min))
3486 (Assert (eq (search-forward "Test buffer" nil t) 12))
3487 (goto-char (point-min))
3488 (Assert (eq (search-forward "Test Buffer" nil t) 12))
3490 (setq case-fold-search nil)
3491 (goto-char (point-min))
3492 (Assert (not (search-forward "test buffer" nil t)))
3493 (goto-char (point-min))
3494 (Assert (not (search-forward "Test buffer" nil t)))
3495 (goto-char (point-min))
3496 (Assert (eq (search-forward "Test Buffer" nil t) 12))))
3499 This example could be inserted in a file in @file{tests/automated}, and
3500 it would be a complete test, automatically executed when you run
3501 @kbd{make check} after building SXEmacs. More complex tests may require
3502 substantial temporary scaffolding to create the environment that elicits
3503 the bugs, but the top-level Makefile and @file{test-harness.el} handle
3504 the running and collection of results from the @code{Assert},
3505 @code{Check-Error}, @code{Check-Error-Message}, and @code{Check-Message}
3508 In general, you should avoid using functionality from packages in your
3509 tests, because you can't be sure that everyone will have the required
3510 package. However, if you've got a test that works, by all means add it.
Simply wrap the test in an appropriate conditional, add a notice that the test
was skipped, and update the @code{skipped-test-reasons} hash table.
3513 Here's an example from @file{syntax-tests.el}:
3516 ;; Test forward-comment at buffer boundaries
3519 ;; try to use exactly what you need: featurep, boundp, fboundp
3520 (if (not (fboundp 'c-mode))
3522 ;; We should provide a standard function for this boilerplate,
3523 ;; probably called `Skip-Test' -- check for that API with C-h f
3524 (let* ((reason "c-mode unavailable")
3525 (count (gethash reason skipped-test-reasons)))
3526 (puthash reason (if (null count) 1 (1+ count))
3527 skipped-test-reasons)
3528 (Print-Skip "comment and parse-partial-sexp tests" reason))
3530 ;; and here's the test code
3532 (insert "// comment\n")
3533 (forward-comment -2)
3534 (Assert (eq (point) (point-min)))
3535 (let ((point (point)))
3536 (insert "/* comment */")
3539 (Assert (eq (point) (point-max)))
3540 (parse-partial-sexp point (point-max)))))
3543 @code{Skip-Test} is intended for use with features that are normally
3544 present in typical configurations. For truly optional features, or
tests that apply to one of several alternative implementations (e.g., to
3546 GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply
3547 silently omit the test.
3550 @node A Summary of the Various SXEmacs Modules, Allocation of Objects in SXEmacs Lisp, Regression Testing SXEmacs, Top
3551 @chapter A Summary of the Various SXEmacs Modules
3552 @cindex modules, a summary of the various SXEmacs
3554 @c Holy crap! You have got to be kidding. Somebody, PLEASE update
3556 This is accurate as of XEmacs 20.0.
3559 * Low-Level Modules::
3560 * Basic Lisp Modules::
3561 * Modules for Standard Editing Operations::
3562 * Modules for the Basic Displayable Lisp Objects::
3563 * Modules for other Display-Related Lisp Objects::
3564 * Modules for the Redisplay Mechanism::
3565 * Modules for Interfacing with the File System::
3566 * Modules for Other Aspects of the Lisp Interpreter and Object System::
3567 * Modules for Interfacing with the Operating System::
3568 * Modules for Interfacing with X Windows::
3569 * Modules for Internationalization::
3570 * Modules for Regression Testing::
3573 @node Low-Level Modules
3574 @section Low-Level Modules
3575 @cindex low-level modules
3576 @cindex modules, low-level
3582 This is automatically generated from @file{config.h.in} based on the
3583 results of configure tests and user-selected optional features and
3584 contains preprocessor definitions specifying the nature of the
3585 environment in which SXEmacs is being compiled.
3593 This is automatically generated from @file{paths.h.in} based on supplied
3594 configure values, and allows for non-standard installed configurations
3595 of the SXEmacs directories. It's currently broken, though.
3604 @file{emacs.c} contains @code{main()} and other code that performs the most
3605 basic environment initializations and handles shutting down the SXEmacs
3606 process (this includes @code{kill-emacs}, the normal way that SXEmacs is
3607 exited; @code{dump-emacs}, which is used during the build process to
3608 write out the SXEmacs executable; @code{run-emacs-from-temacs}, which can
3609 be used to start SXEmacs directly when temacs has finished loading all
3610 the Lisp code; and emergency code to handle crashes [SXEmacs tries to
3611 auto-save all files before it crashes]).
3613 Low-level code that directly interacts with the Unix signal mechanism,
3614 however, is in @file{signal.c}. Note that this code does not handle system
3615 dependencies in interfacing to signals; that is handled using the
@file{syssignal.h} header file, described in @ref{Modules for Interfacing with the Operating System}, below.
These modules contain code for dumping out the SXEmacs executable on various
3641 different systems. (This process is highly machine-specific and
3642 requires intimate knowledge of the executable format and the memory map
3643 of the process.) Only one of these modules is actually used; this is
3644 chosen by @file{configure}.
3654 These modules are used in conjunction with the dump mechanism. On some
3655 systems, an alternative version of the C startup code (the actual code
3656 that receives control from the operating system when the process is
3657 started, and which calls @code{main()}) is required so that the dumping
3658 process works properly; @file{crt0.c} provides this.
3660 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
3661 very last file linked, respectively. (Actually, this is not really true.
3662 @file{lastfile.c} should be after all Emacs modules whose initialized
3663 data should be made constant, and before all other Emacs files and all
3664 libraries. In particular, the allocation modules @file{gmalloc.c},
3665 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
3666 all of the files that implement Xt widget classes @emph{must} be placed
3667 after @file{lastfile.c} because they contain various structures that
3668 must be statically initialized and into which Xt writes at various
3669 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
3670 that are used to determine the start and end of SXEmacs' initialized
3671 data space when dumping.
3686 These handle basic C allocation of memory. @file{alloca.c} is an emulation of
3687 the stack allocation function @code{alloca()} on machines that lack
3688 this. (SXEmacs makes extensive use of @code{alloca()} in its code.)
3690 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
3691 functions @code{malloc()}, @code{realloc()} and @code{free()}. They are
3692 often used in place of the standard system-provided @code{malloc()}
3693 because they usually provide a much faster implementation, at the
3694 expense of additional memory use. @file{gmalloc.c} is a newer implementation
3695 that is much more memory-efficient for large allocations than @file{malloc.c},
3696 and should always be preferred if it works. (At one point, @file{gmalloc.c}
3697 didn't work on some systems where @file{malloc.c} worked; but this should be
3700 @cindex relocating allocator
3701 @file{ralloc.c} is the @dfn{relocating allocator}. It provides
3702 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
3703 that allocate memory that can be dynamically relocated in memory. The
3704 advantage of this is that allocated memory can be shuffled around to
3705 place all the free memory at the end of the heap, and the heap can then
3706 be shrunk, releasing the memory back to the operating system. The use
3707 of this can be controlled with the configure option @code{--rel-alloc};
3708 if enabled, memory allocated for buffers will be relocatable, so that if
3709 a very large file is visited and the buffer is later killed, the memory
3710 can be released to the operating system. (The disadvantage of this
3711 mechanism is that it can be very slow. On systems with the
3712 @code{mmap()} system call, the SXEmacs version of @file{ralloc.c} uses
3713 this to move memory around without actually having to block-copy it,
3714 which can speed things up; but it can still cause noticeable performance
3717 @file{free-hook.c} contains some debugging functions for checking for invalid
3718 arguments to @code{free()}.
3720 @file{vm-limit.c} contains some functions that warn the user when memory is
3721 getting low. These are callback functions that are called by @file{gmalloc.c}
3722 and @file{malloc.c} at appropriate times.
3724 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
3725 page in virtual memory. @file{mem-limits.h} provides a uniform interface for
3726 retrieving the total amount of available virtual memory. Both are
similar in spirit to the @file{sys*.h} files described in @ref{Modules for Interfacing with the Operating System}, below.
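
The key idea behind @file{ralloc.c}, that callers hold a @emph{handle}
(the address of their own pointer variable) instead of a raw pointer so
the allocator can move a block and patch the pointer, can be shown with
a toy arena. This is a simplified sketch under stated assumptions: a
fixed static arena stands in for the heap, lowering @code{arena_top}
stands in for returning memory to the operating system, and the
@code{r_*_sketch} functions are hypothetical, not the actual
@file{ralloc.c} interface.

@example
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 1024
#define MAX_BLOCKS 16

/* One allocated block.  HANDLE is the caller's pointer variable,
   which the allocator rewrites whenever it moves the block.  */
struct rblock
@{
  unsigned char **handle;       /* NULL once the block is freed */
  size_t size;
@};

static unsigned char arena[ARENA_SIZE];
static struct rblock blocks[MAX_BLOCKS];  /* kept in address order */
static size_t nblocks, arena_top;

/* Allocate SIZE bytes and store their address in *HANDLE.  */
static int
r_alloc_sketch (unsigned char **handle, size_t size)
@{
  if (nblocks == MAX_BLOCKS || ARENA_SIZE - arena_top < size)
    return -1;
  blocks[nblocks].handle = handle;
  blocks[nblocks].size = size;
  nblocks++;
  *handle = arena + arena_top;
  arena_top += size;
  return 0;
@}

/* Release the block owned by HANDLE; its space is reclaimed later.  */
static void
r_free_sketch (unsigned char **handle)
@{
  size_t i;
  for (i = 0; i < nblocks; i++)
    if (blocks[i].handle == handle)
      @{
        blocks[i].handle = NULL;
        *handle = NULL;
        return;
      @}
@}

/* Slide live blocks down over the holes, fix up their handles, and
   lower arena_top so that all free space ends up at the end.  */
static void
r_compact_sketch (void)
@{
  size_t i, kept = 0, dest = 0;
  for (i = 0; i < nblocks; i++)
    @{
      if (blocks[i].handle == NULL)
        continue;
      memmove (arena + dest, *blocks[i].handle, blocks[i].size);
      *blocks[i].handle = arena + dest;
      dest += blocks[i].size;
      blocks[kept++] = blocks[i];
    @}
  nblocks = kept;
  arena_top = dest;
@}
@end example

Because every live block is reached only through its handle, compaction
can move data freely; the price is the copying, which is the slowness
mentioned above.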
3737 These implement a couple of basic C data types to facilitate memory
3738 allocation. The @code{Blocktype} type efficiently manages the
3739 allocation of fixed-size blocks by minimizing the number of times that
3740 @code{malloc()} and @code{free()} are called. It allocates memory in
3741 large chunks, subdivides the chunks into blocks of the proper size, and
3742 returns the blocks as requested. When blocks are freed, they are placed
3743 onto a linked list, so they can be efficiently reused. This data type
3744 is not much used in SXEmacs currently, because it's a fairly new
3747 @cindex dynamic array
3748 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
3749 similar to a standard C array but has no fixed limit on the number of
3750 elements it can contain. Dynamic arrays can hold elements of any type,
3751 and when you add a new element, the array automatically resizes itself
3752 if it isn't big enough. Dynarrs are extensively used in the redisplay
3761 This module is used in connection with inline functions (available in
3762 some compilers). Often, inline functions need to have a corresponding
3763 non-inline function that does the same thing. This module is where they
3764 reside. It contains no actual code, but defines some special flags that
3765 cause inline functions defined in header files to be rendered as actual
3766 functions. It then includes all header files that contain any inline
3767 function definitions, so that each one gets a real function equivalent.
3776 These functions provide a system for doing internal consistency checks
3777 during code development. This system is not currently used; instead the
3778 simpler @code{assert()} macro is used along with the various checks
3779 provided by the @samp{--error-check-*} configuration options.
3787 This is not currently used.
3791 @node Basic Lisp Modules
3792 @section Basic Lisp Modules
3793 @cindex Lisp modules, basic
3794 @cindex modules, basic Lisp
3803 These are the basic header files for all SXEmacs modules. Each module
3804 includes @file{lisp.h}, which brings the other header files in.
3805 @file{lisp.h} contains the definitions of the structures and extractor
3806 and constructor macros for the basic Lisp objects and various other
3807 basic definitions for the Lisp environment, as well as some
3808 general-purpose definitions (e.g. @code{min()} and @code{max()}).
3809 @file{lisp.h} includes @file{lisp-disunion.h}. These files define the
3810 typedef of the Lisp object itself (as described above) and the low-level
3811 macros that hide the actual implementation of the Lisp object. All
3812 extractor and constructor macros for particular types of Lisp objects
3813 are defined in terms of these low-level macros.
3815 As a general rule, all typedefs should go into the typedefs section of
3816 @file{lisp.h} rather than into a module-specific header file even if the
3817 structure is defined elsewhere. This allows function prototypes that
3818 use the typedef to be placed into other header files. Forward structure
3819 declarations (i.e. a simple declaration like @code{struct foo;} where
3820 the structure itself is defined elsewhere) should be placed into the
3821 typedefs section as necessary.
3823 @file{lrecord.h} contains the basic structures and macros that implement
3824 all record-type Lisp objects---i.e. all objects whose type is a field
3825 in their C structure, which includes all objects except the few most
3828 @file{lisp.h} contains prototypes for most of the exported functions in
3829 the various modules. Lisp primitives defined using @code{DEFUN} that
3830 need to be called by C code should be declared using @code{EXFUN}.
3831 Other function prototypes should be placed either into the appropriate
section of @file{lisp.h}, or into a module-specific header file,
depending on how general-purpose the function is and whether it has
special-purpose argument types requiring definitions not in
@file{lisp.h}. All initialization functions are prototyped in
3844 The large module @file{alloc.c} implements all of the basic allocation and
3845 garbage collection for Lisp objects. The most commonly used Lisp
3846 objects are allocated in chunks, similar to the Blocktype data type
3847 described above; others are allocated in individually @code{malloc()}ed
3848 blocks. This module provides the foundation on which all other aspects
3849 of the Lisp environment sit, and is the first module initialized at
3852 Note that @file{alloc.c} provides a series of generic functions that are
3853 not dependent on any particular object type, and interfaces to
3854 particular types of objects using a standardized interface of
3855 type-specific methods. This scheme is a fundamental principle of
3856 object-oriented programming and is heavily used throughout SXEmacs. The
3857 great advantage of this is that it allows for a clean separation of
3858 functionality into different modules---new classes of Lisp objects, new
3859 event interfaces, new device types, new stream interfaces, etc. can be
3860 added transparently without affecting code anywhere else in SXEmacs.
3861 Because the different subsystems are divided into general and specific
3862 code, adding a new subtype within a subsystem will in general not
3863 require changes to the generic subsystem code or affect any of the other
3864 subtypes in the subsystem; this provides a great deal of robustness to
3873 This module contains all of the functions to handle the flow of control.
3874 This includes the mechanisms of defining functions, calling functions,
3875 traversing stack frames, and binding variables; the control primitives
3876 and other special forms such as @code{while}, @code{if}, @code{eval},
3877 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3878 non-local exits, unwind-protects, and exception handlers; entering the
3879 debugger; methods for the subr Lisp object type; etc. It does
3880 @emph{not} include the @code{read} function, the @code{print} function,
3881 or the handling of symbols and obarrays.
3883 @file{backtrace.h} contains some structures related to stack frames and the
3892 This module implements the Lisp reader and the @code{read} function,
3893 which converts text into Lisp objects, according to the read syntax of
3894 the objects, as described above. This is similar to the parser that is
3895 a part of all compilers.
3903 This module implements the Lisp print mechanism and the @code{print}
3904 function and related functions. This is the inverse of the Lisp reader
3905 -- it converts Lisp objects to a printed, textual representation.
3906 (Hopefully something that can be read back in using @code{read} to get
3907 an equivalent object.)
3917 @file{symbols.c} implements the handling of symbols, obarrays, and
3918 retrieving the values of symbols. Much of the code is devoted to
3919 handling the special @dfn{symbol-value-magic} objects that define
3920 special types of variables---this includes buffer-local variables,
3921 variable aliases, variables that forward into C variables, etc. This
3922 module is initialized extremely early (right after @file{alloc.c}),
3923 because it is here that the basic symbols @code{t} and @code{nil} are
3924 created, and those symbols are used everywhere throughout SXEmacs.
3926 @file{symeval.h} contains the definitions of symbol structures and the
3927 @code{DEFVAR_LISP()} and related macros for declaring variables.
3937 These modules implement the methods and standard Lisp primitives for all
3938 the basic Lisp object types other than symbols (which are described
3939 above). @file{data.c} contains all the predicates (primitives that return
3940 whether an object is of a particular type); the integer arithmetic
3941 functions; and the basic accessor and mutator primitives for the various
object types. @file{fns.c} contains all the standard primitives for working
with sequences (where, abstractly speaking, a sequence is an ordered set
of objects, and can be represented by a list, string, vector, or
bit-vector); it also contains @code{equal}, perhaps on the grounds that
the bulk of the operation of @code{equal} is comparing sequences.
3947 @file{floatfns.c} contains methods and primitives for floats and floating-point
3957 @file{bytecode.c} implements the byte-code interpreter and
3958 compiled-function objects, and @file{bytecode.h} contains associated
3959 structures. Note that the byte-code @emph{compiler} is written in Lisp.
3964 @node Modules for Standard Editing Operations
3965 @section Modules for Standard Editing Operations
3966 @cindex modules for standard editing operations
3967 @cindex editing operations, modules for standard
3975 @file{buffer.c} implements the @dfn{buffer} Lisp object type. This
3976 includes functions that create and destroy buffers; retrieve buffers by
3977 name or by other properties; manipulate lists of buffers (remember that
3978 buffers are permanent objects and stored in various ordered lists);
3979 retrieve or change buffer properties; etc. It also contains the
3980 definitions of all the built-in buffer-local variables (which can be
3981 viewed as buffer properties). It does @emph{not} contain code to
3982 manipulate buffer-local variables (that's in @file{symbols.c}, described
3983 above); or code to manipulate the text in a buffer.
3985 @file{buffer.h} defines the structures associated with a buffer and the various
3986 macros for retrieving text from a buffer and special buffer positions
3987 (e.g. @code{point}, the default location for text insertion). It also
3988 contains macros for working with buffer positions and converting between
3989 their representations as character offsets and as byte offsets (under
3990 MULE, they are different, because characters can be multi-byte). It is
3991 one of the largest header files.
3993 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3994 the built-in buffer-local variables. It is its own header file because
3995 it is included many times in @file{buffer.c}, as a way of iterating over all
3996 the built-in buffer-local variables.
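Including the same list many times, as is done with @file{bufslots.h}, is a technique often called an X-macro. The following is a minimal, self-contained sketch of the idea, not the actual SXEmacs code: the slot names and the @code{BUFFER_SLOTS} list macro are hypothetical, and a list macro stands in for the separate header so that the example compiles on its own. Each expansion defines the per-slot macro differently, so one list generates several pieces of code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slot list, playing the role of bufslots.h: one
   MARKED_SLOT(name) entry per built-in buffer-local variable. */
#define BUFFER_SLOTS(MARKED_SLOT) \
  MARKED_SLOT(filename)           \
  MARKED_SLOT(directory)          \
  MARKED_SLOT(major_mode)

/* Expansion 1: one structure field per slot. */
struct buffer_sketch {
#define DECLARE_SLOT(name) const char *name;
  BUFFER_SLOTS(DECLARE_SLOT)
#undef DECLARE_SLOT
};

/* Expansion 2: a table of slot names, for lookup by name. */
static const char *const slot_names[] = {
#define NAME_SLOT(name) #name,
  BUFFER_SLOTS(NAME_SLOT)
#undef NAME_SLOT
};

/* Expansion 3: give every slot the same initial value. */
static void init_buffer_slots(struct buffer_sketch *b, const char *value)
{
#define INIT_SLOT(name) b->name = value;
  BUFFER_SLOTS(INIT_SLOT)
#undef INIT_SLOT
}

/* Return the index of the slot called NAME, or -1 if there is none. */
static int find_slot(const char *name)
{
  for (size_t i = 0; i < sizeof slot_names / sizeof slot_names[0]; i++)
    if (strcmp(slot_names[i], name) == 0)
      return (int) i;
  return -1;
}
```

Presumably the real code gets the same effect by defining the per-slot macro immediately before each @code{#include} of the header; adding a slot then means editing only the list.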
4005 @file{insdel.c} contains low-level functions for inserting and deleting text in
4006 a buffer, keeping track of changed regions for use by redisplay, and
4007 calling any before-change and after-change functions that may have been
4008 registered for the buffer. It also contains the actual functions that
4009 convert between byte offsets and character offsets.
4011 @file{insdel.h} contains associated headers.
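To make the byte/character distinction concrete, here is a minimal sketch of the two conversions for a variable-width encoding. It assumes UTF-8 as a stand-in for the internal multi-byte representation (which is not necessarily UTF-8), well-formed text, and a single contiguous array; the function names are hypothetical.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 character whose lead byte is B
   (UTF-8 stands in for the internal encoding; input assumed valid). */
static size_t char_byte_len(unsigned char b)
{
  if (b < 0x80)
    return 1;
  if ((b & 0xE0) == 0xC0)
    return 2;
  if ((b & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Character offset -> byte offset: step over CHARPOS characters. */
static size_t charpos_to_bytepos(const unsigned char *text, size_t charpos)
{
  size_t bytepos = 0;

  while (charpos-- > 0)
    bytepos += char_byte_len(text[bytepos]);
  return bytepos;
}

/* Byte offset -> character offset: count the lead bytes before BYTEPOS
   (continuation bytes have the form 10xxxxxx).  BYTEPOS must fall on a
   character boundary. */
static size_t bytepos_to_charpos(const unsigned char *text, size_t bytepos)
{
  size_t charpos = 0;

  for (size_t i = 0; i < bytepos; i++)
    if ((text[i] & 0xC0) != 0x80)
      charpos++;
  return charpos;
}
```

Both functions scan from the start of the text, so their cost grows with the offset; an editor would likely cache recently converted position pairs to avoid repeated scans.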
4019 This module implements the @dfn{marker} Lisp object type, which
4020 conceptually is a pointer to a text position in a buffer that moves
4021 around as text is inserted and deleted, so as to remain in the same
4022 relative position. This module doesn't actually move the markers around
4023 -- that's handled in @file{insdel.c}. This module just creates them and
4024 implements the primitives for working with them. As markers are simple
4025 objects, this does not entail much.
4027 Note that the standard arithmetic primitives (e.g. @code{+}) accept
4028 markers in place of integers and automatically substitute the value of
4029 @code{marker-position} for the marker, i.e. an integer describing the
4030 current buffer position of the marker.
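A sketch of how an arithmetic primitive might perform that substitution. The object representation here is hypothetical and deliberately simplified (the real @code{Lisp_Object} encoding is different), and where this sketch returns a failure flag, a real primitive would signal a @code{wrong-type-argument} error.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified object representation. */
enum obj_type { OBJ_INT, OBJ_MARKER, OBJ_STRING };

struct marker {
  long position;              /* kept current as text is inserted/deleted */
};

struct obj {
  enum obj_type type;
  union {
    long i;
    const struct marker *m;
    const char *s;
  } u;
};

/* Coerce an arithmetic argument to an integer, substituting the
   marker's current position for a marker. */
static bool coerce_to_integer(struct obj o, long *out)
{
  switch (o.type) {
  case OBJ_INT:
    *out = o.u.i;
    return true;
  case OBJ_MARKER:
    *out = o.u.m->position;
    return true;
  default:
    return false;             /* a real primitive would signal an error */
  }
}

/* Two-argument sketch of the + primitive. */
static bool plus2(struct obj a, struct obj b, long *sum)
{
  long x, y;

  if (!coerce_to_integer(a, &x) || !coerce_to_integer(b, &y))
    return false;
  *sum = x + y;
  return true;
}
```

Because the position is read at call time, the same marker yields a different sum after text before it has been inserted or deleted.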
4039 This module implements the @dfn{extent} Lisp object type, which is like
4040 a marker that works over a range of text rather than a single position.
4041 Extents are also much more complex and powerful than markers and have a
4042 more efficient (and more algorithmically complex) implementation. The
4043 implementation is described in detail in comments in @file{extents.c}.
4045 The code in @file{extents.c} works closely with @file{insdel.c} so that
4046 extents are properly moved around as text is inserted and deleted.
4047 There is also code in @file{extents.c} that provides information needed
4048 by the redisplay mechanism for efficient operation. (Remember that
4049 extents can have display properties that affect [sometimes drastically,
4050 as in the @code{invisible} property] the display of the text they
4059 @file{editfns.c} contains the standard Lisp primitives for working with
4060 a buffer's text, and calls the low-level functions in @file{insdel.c}.
4061 It also contains primitives for working with @code{point} (the default
4062 buffer insertion location).
4064 @file{editfns.c} also contains functions for retrieving various
4065 characteristics from the external environment: the current time, the
4066 process ID of the running SXEmacs process, the name of the user who ran
4067 this SXEmacs process, etc. It's not clear why this code is in
These modules implement the basic @dfn{interactive} commands,
i.e. user-callable functions. Unlike other functions, commands have
special ways of getting their parameters interactively (by querying the
user) instead of having them passed in a normal function invocation.
Many commands are not really meant to be called from other Lisp
functions, because they modify global state in ways that are often
undesirable when they are invoked from other Lisp code.
4087 @file{callint.c} implements the mechanism for querying the user for
4088 parameters and calling interactive commands. The bulk of this module is
4089 code that parses the interactive spec that is supplied with an
4090 interactive command.
4092 @file{cmds.c} implements the basic, most commonly used editing commands:
4093 commands to move around the current buffer and insert and delete
4094 characters. These commands are implemented using the Lisp primitives
4095 defined in @file{editfns.c}.
4097 @file{commands.h} contains associated structure definitions and prototypes.
4107 @file{search.c} implements the Lisp primitives for searching for text in
4108 a buffer, and some of the low-level algorithms for doing this. In
4109 particular, the fast fixed-string Boyer-Moore search algorithm is
4110 implemented in @file{search.c}. The low-level algorithms for doing
4111 regular-expression searching, however, are implemented in @file{regex.c}
4112 and @file{regex.h}. These two modules are largely independent of
4113 SXEmacs, and are similar to (and based upon) the regular-expression
4114 routines used in @file{grep} and other GNU utilities.
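The fixed-string search can be illustrated with Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that keeps only the ``bad character'' shift table. This sketch is not the code in @file{search.c}: it searches a plain contiguous byte array and ignores complications an editor faces, such as case folding and multi-byte characters. The function name is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Return the byte offset of the first occurrence of PAT (length M) in
   TEXT (length N), or -1 if there is none. */
static long bmh_search(const unsigned char *text, size_t n,
                       const unsigned char *pat, size_t m)
{
  size_t shift[UCHAR_MAX + 1];

  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* A byte absent from the pattern lets the pattern slide its full
     length past the current window. */
  for (size_t c = 0; c <= UCHAR_MAX; c++)
    shift[c] = m;
  /* Bytes in the pattern (except its last position) allow a shorter
     slide that aligns their rightmost occurrence. */
  for (size_t i = 0; i + 1 < m; i++)
    shift[pat[i]] = m - 1 - i;

  for (size_t pos = 0; pos + m <= n; pos += shift[text[pos + m - 1]]) {
    /* Compare right to left, as Boyer-Moore does. */
    size_t j = m;

    while (j > 0 && text[pos + j - 1] == pat[j - 1])
      j--;
    if (j == 0)
      return (long) pos;
  }
  return -1;
}
```

The speed comes from the shift table: on a mismatch the search can often skip several bytes at once instead of advancing by one.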
@file{doprnt.c} implements formatted-string processing, similar to
the @code{printf()} function in C.
4131 This module implements the undo mechanism for tracking buffer changes.
4132 Most of this could be implemented in Lisp.
4136 @node Modules for the Basic Displayable Lisp Objects
4137 @section Modules for the Basic Displayable Lisp Objects
4138 @cindex modules for the basic displayable Lisp objects
4139 @cindex displayable Lisp objects, modules for the basic
4140 @cindex Lisp objects, modules for the basic displayable
4141 @cindex objects, modules for the basic displayable Lisp
4154 These modules implement the @dfn{console} Lisp object type. A console
4155 contains multiple display devices, but only one keyboard and mouse.
4156 Most of the time, a console will contain exactly one device.
Consoles are the top of a Lisp object inclusion hierarchy. Consoles
4159 contain devices, which contain frames, which contain windows.
4171 These modules implement the @dfn{device} Lisp object type. This
4172 abstracts a particular screen or connection on which frames are
4173 displayed. As with Lisp objects, event interfaces, and other
4174 subsystems, the device code is separated into a generic component that
4175 contains a standardized interface (in the form of a set of methods) onto
4176 particular device types.
4178 The device subsystem defines all the methods and provides method
4179 services for not only device operations but also for the frame, window,
4180 menubar, scrollbar, toolbar, and other displayable-object subsystems.
4181 The reason for this is that all of these subsystems have the same
4182 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
4194 Each device contains one or more frames in which objects (e.g. text) are
4195 displayed. A frame corresponds to a window in the window system;
4196 usually this is a top-level window but it could potentially be one of a
4197 number of overlapping child windows within a top-level window, using the
4198 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
4201 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
4202 provide the generic and device-type-specific operations on frames
4203 (e.g. raising, lowering, resizing, moving, etc.).
4212 @cindex window (in Emacs)
4214 Each frame consists of one or more non-overlapping @dfn{windows} (better
4215 known as @dfn{panes} in standard window-system terminology) in which a
4216 buffer's text can be displayed. Windows can also have scrollbars
4217 displayed around their edges.
4219 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
4220 object type and provide code to manage windows. Since windows have no
4221 associated resources in the window system (the window system knows only
4222 about the frame; no child windows or anything are used for SXEmacs
4223 windows), there is no device-type-specific code here; all of that code
4224 is part of the redisplay mechanism or the code for particular object
4225 types such as scrollbars.
4229 @node Modules for other Display-Related Lisp Objects
4230 @section Modules for other Display-Related Lisp Objects
4231 @cindex modules for other display-related Lisp objects
4232 @cindex display-related Lisp objects, modules for other
4233 @cindex Lisp objects, modules for other display-related
4303 This file provides C support for syntax highlighting---i.e.
4304 highlighting different syntactic constructs of a source file in
4305 different colors, for easy reading. The C support is provided so that
4308 As of 21.4.10, bugs introduced at the very end of the 21.2 series in the
4309 ``syntax properties'' code were fixed, and highlighting is acceptably
4310 quick again. However, presumably more improvements are possible, and
4311 the places to look are probably here, in the defun-traversing code, and
4312 in @file{syntax.c}, in the comment-traversing code.
These modules decoded GIF-format image files, for use with glyphs, but
were removed due to Unisys patent infringement concerns.
4327 @node Modules for the Redisplay Mechanism
4328 @section Modules for the Redisplay Mechanism
4329 @cindex modules for the redisplay mechanism
4330 @cindex redisplay mechanism, modules for the
4341 These files provide the redisplay mechanism. As with many other
4342 subsystems in SXEmacs, there is a clean separation between the general
4343 and device-specific support.
4345 @file{redisplay.c} contains the bulk of the redisplay engine. These
4346 functions update the redisplay structures (which describe how the screen
4347 is to appear) to reflect any changes made to the state of any
4348 displayable objects (buffer, frame, window, etc.) since the last time
4349 that redisplay was called. These functions are highly optimized to
4350 avoid doing more work than necessary (since redisplay is called
4351 extremely often and is potentially a huge time sink), and depend heavily
4352 on notifications from the objects themselves that changes have occurred,
4353 so that redisplay doesn't explicitly have to check each possible object.
4354 The redisplay mechanism also contains a great deal of caching to further
4355 speed things up; some of this caching is contained within the various
4356 displayable objects.
4358 @file{redisplay-output.c} goes through the redisplay structures and converts
4359 them into calls to device-specific methods to actually output the screen
4362 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
4363 of these redisplay output methods, for X frames and TTY frames,
4372 This module contains various functions and Lisp primitives for
4373 converting between buffer positions and screen positions. These
4374 functions call the redisplay mechanism to do most of the work, and then
4375 examine the redisplay structures to get the necessary information. This
4386 These files contain functions for working with the termcap (BSD-style)
4387 and terminfo (System V style) databases of terminal capabilities and
4388 escape sequences, used when SXEmacs is displaying in a TTY.
4397 These files provide some miscellaneous TTY-output functions and should
4398 probably be merged into @file{redisplay-tty.c}.
4402 @node Modules for Interfacing with the File System
4403 @section Modules for Interfacing with the File System
4404 @cindex modules for interfacing with the file system
4405 @cindex interfacing with the file system, modules for
4406 @cindex file system, modules for interfacing with the
4413 These modules implement the @dfn{stream} Lisp object type. This is an
4414 internal-only Lisp object that implements a generic buffering stream.
4415 The idea is to provide a uniform interface onto all sources and sinks of
4416 data, including file descriptors, stdio streams, chunks of memory, Lisp
4417 buffers, Lisp strings, etc. That way, I/O functions can be written to
4418 the stream interface and can transparently handle all possible sources
4419 and sinks. (For example, the @code{read} function can read data from a
4420 file, a string, a buffer, or even a function that is called repeatedly
4421 to return data, without worrying about where the data is coming from or
4422 what-size chunks it is returned in.)
4425 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
4426 streams'') to distinguish them from other kinds of streams, e.g. stdio
4427 streams and C++ I/O streams.
4429 Similar to other subsystems in SXEmacs, lstreams are separated into
4430 generic functions and a set of methods for the different types of
4431 lstreams. @file{lstream.c} provides implementations of many different
4432 types of streams; others are provided, e.g., in @file{file-coding.c}.
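The split between generic lstream functions and per-type methods can be sketched as a method table. The structure and function names below are hypothetical, not the actual @file{lstream.c} interface. The memory-backed reader deliberately returns at most four bytes per call, to show that generic code need not care what size chunks a source produces.

```c
#include <stddef.h>
#include <string.h>

struct lstream;

/* Hypothetical per-type method table: each stream type supplies its own
   reader, which returns the number of bytes produced (0 at end of data,
   negative on error). */
struct lstream_methods {
  const char *name;
  long (*reader)(struct lstream *stream, unsigned char *buf, size_t size);
};

/* Generic stream object: a method table plus type-specific state. */
struct lstream {
  const struct lstream_methods *methods;
  void *data;
};

/* State and reader for a stream over a chunk of memory. */
struct memory_source {
  const unsigned char *next;
  size_t left;
};

static long memory_reader(struct lstream *stream, unsigned char *buf,
                          size_t size)
{
  struct memory_source *src = stream->data;
  size_t n = size < src->left ? size : src->left;

  if (n > 4)
    n = 4;                    /* deliberately small chunks */
  memcpy(buf, src->next, n);
  src->next += n;
  src->left -= n;
  return (long) n;
}

static const struct lstream_methods memory_methods = {
  "memory", memory_reader
};

/* Generic code: drains any stream through its method table, without
   knowing what kind of source lies behind it. */
static size_t lstream_read_all(struct lstream *stream, unsigned char *out,
                               size_t cap)
{
  size_t total = 0;
  long got;

  while (total < cap
         && (got = stream->methods->reader(stream, out + total,
                                           cap - total)) > 0)
    total += (size_t) got;
  return total;
}
```

A file-descriptor or Lisp-string stream would add only its own state structure, reader, and method table; @code{lstream_read_all} stays unchanged.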
4440 This implements the basic primitives for interfacing with the file
4441 system. This includes primitives for reading files into buffers,
4442 writing buffers into files, checking for the presence or accessibility
4443 of files, canonicalizing file names, etc. Note that these primitives
are usually not invoked directly by the user: there is a great deal of
higher-level Lisp code that implements the user commands such as
@code{find-file} and @code{save-buffer}. This is similar to the
distinction between the lower-level primitives in @file{editfns.c} and
the higher-level user commands in @file{cmds.c} and
4457 This file provides functions for detecting clashes between different
4458 processes (e.g. SXEmacs and some external process, or two different
4459 SXEmacs processes) modifying the same file. (SXEmacs can optionally use
4460 the @file{lock/} subdirectory to provide a form of ``locking'' between
4461 different SXEmacs processes.) This module is also used by the low-level
4462 functions in @file{insdel.c} to ensure that, if the first modification
4463 is being made to a buffer whose corresponding file has been externally
4464 modified, the user is made aware of this so that the buffer can be
4465 synched up with the external changes if necessary.
4472 This file provides some miscellaneous functions that construct a
4473 @samp{rwxr-xr-x}-type permissions string (as might appear in an
4474 @file{ls}-style directory listing) given the information returned by the
4475 @code{stat()} system call.
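A minimal sketch of that conversion using the POSIX @code{S_IS*()} macros and permission bits. The function name is hypothetical, and setuid, setgid, and sticky bits are ignored for brevity.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Fill OUT (at least 11 bytes) with an ls-style string such as
   "drwxr-xr-x" describing MODE, the st_mode field filled in by stat(). */
static void mode_string(mode_t mode, char *out)
{
  static const struct { mode_t bit; char ch; } perms[9] = {
    { S_IRUSR, 'r' }, { S_IWUSR, 'w' }, { S_IXUSR, 'x' },
    { S_IRGRP, 'r' }, { S_IWGRP, 'w' }, { S_IXGRP, 'x' },
    { S_IROTH, 'r' }, { S_IWOTH, 'w' }, { S_IXOTH, 'x' },
  };

  /* File type character. */
  if (S_ISDIR(mode))
    out[0] = 'd';
  else if (S_ISLNK(mode))
    out[0] = 'l';
  else if (S_ISCHR(mode))
    out[0] = 'c';
  else if (S_ISBLK(mode))
    out[0] = 'b';
  else if (S_ISFIFO(mode))
    out[0] = 'p';
  else
    out[0] = '-';

  /* Owner, group, and other permission bits, in ls order. */
  for (int i = 0; i < 9; i++)
    out[i + 1] = (mode & perms[i].bit) ? perms[i].ch : '-';
  out[10] = '\0';
}
```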
4484 These files implement the SXEmacs interface to directory searching. This
4485 includes a number of primitives for determining the files in a directory
4486 and for doing filename completion. (Remember that generic completion is
4487 handled by a different mechanism, in @file{minibuf.c}.)
4489 @file{ndir.h} is a header file used for the directory-searching
emulation functions provided in @file{sysdep.c} (described below),
4491 for systems that don't provide any directory-searching functions. (On
4492 those systems, directories can be read directly as files, and parsed.)
4500 This file provides an implementation of the @code{realpath()} function
4501 for expanding symbolic links, on systems that don't implement it or have
4502 a broken implementation.
4506 @node Modules for Other Aspects of the Lisp Interpreter and Object System
4507 @section Modules for Other Aspects of the Lisp Interpreter and Object System
4508 @cindex modules for other aspects of the Lisp interpreter and object system
4509 @cindex Lisp interpreter and object system, modules for other aspects of the
4510 @cindex interpreter and object system, modules for other aspects of the Lisp
4511 @cindex object system, modules for other aspects of the Lisp interpreter and
4520 These files provide two implementations of hash tables. Files
4521 @file{hash.c} and @file{hash.h} provide a generic C implementation of
4522 hash tables which can stand independently of SXEmacs. Files
4523 @file{elhash.c} and @file{elhash.h} provide a separate implementation of
hash tables that can store only Lisp objects, know about Lispy
things like garbage collection, and implement the @dfn{hash-table} Lisp
4534 This module implements the @dfn{specifier} Lisp object type. This is
4535 primarily used for displayable properties, and allows for values that
4536 are specific to a particular buffer, window, frame, device, or device
class, as well as a default value. This is used, for example,
4538 to control the height of the horizontal scrollbar or the appearance of
4539 the @code{default}, @code{bold}, or other faces. The specifier object
4540 consists of a number of specifications, each of which maps from a
4541 buffer, window, etc. to a value. The function @code{specifier-instance}
4542 looks up a value given a window (from which a buffer, frame, and device
4552 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
4553 Lisp object type, which maps from characters or certain sorts of
4554 character ranges to Lisp objects. The implementation of this object
4555 type is optimized for the internal representation of characters. Char
4556 tables come in different types, which affect the allowed object types to
4557 which a character can be mapped and also dictate certain other
4558 properties of the char table.
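One common way to make such a mapping fast for character codes is a two-level (paged) table. The sketch below is only an illustration under stated assumptions: it covers code points below 0x10000, uses @code{long} values in place of Lisp objects, and allocates a page only when some character in it is set. All names are hypothetical, and the real @file{chartab.c} layout may differ.

```c
#include <stdlib.h>

#define CT_PAGE_BITS  8
#define CT_PAGE_SIZE  (1u << CT_PAGE_BITS)   /* characters per page */
#define CT_PAGE_COUNT 256u                   /* pages: covers 0..0xFFFF */

/* Hypothetical paged char table; long values stand in for Lisp objects. */
struct char_table_sketch {
  long default_value;
  long *pages[CT_PAGE_COUNT];     /* NULL: whole page has the default */
};

static void ct_init(struct char_table_sketch *ct, long default_value)
{
  ct->default_value = default_value;
  for (unsigned i = 0; i < CT_PAGE_COUNT; i++)
    ct->pages[i] = NULL;
}

/* Lookup is two array indexings: page number, then offset in page. */
static long ct_get(const struct char_table_sketch *ct, unsigned c)
{
  const long *page;

  if (c >= CT_PAGE_COUNT * CT_PAGE_SIZE)
    return ct->default_value;
  page = ct->pages[c >> CT_PAGE_BITS];
  return page ? page[c & (CT_PAGE_SIZE - 1)] : ct->default_value;
}

/* Store VALUE for character C, allocating its page on first use.
   Returns 0 on success, -1 if C is out of range or memory runs out. */
static int ct_put(struct char_table_sketch *ct, unsigned c, long value)
{
  long **slot;

  if (c >= CT_PAGE_COUNT * CT_PAGE_SIZE)
    return -1;
  slot = &ct->pages[c >> CT_PAGE_BITS];
  if (*slot == NULL) {
    *slot = malloc(CT_PAGE_SIZE * sizeof **slot);
    if (*slot == NULL)
      return -1;
    for (unsigned i = 0; i < CT_PAGE_SIZE; i++)
      (*slot)[i] = ct->default_value;
  }
  (*slot)[c & (CT_PAGE_SIZE - 1)] = value;
  return 0;
}

static void ct_free(struct char_table_sketch *ct)
{
  for (unsigned i = 0; i < CT_PAGE_COUNT; i++) {
    free(ct->pages[i]);
    ct->pages[i] = NULL;
  }
}
```

An unallocated page acts as a crude range entry: every character in it maps to the default without using any per-character storage.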
4561 @file{casetab.c} implements one sort of char table, the @dfn{case
4562 table}, which maps characters to other characters of possibly different
4563 case. These are used by SXEmacs to implement case-changing primitives
4564 and to do case-insensitive searching.
4574 This module implements @dfn{syntax tables}, another sort of char table
4575 that maps characters into syntax classes that define the syntax of these
4576 characters (e.g. a parenthesis belongs to a class of @samp{open}
4577 characters that have corresponding @samp{close} characters and can be
4578 nested). This module also implements the Lisp @dfn{scanner}, a set of
4579 primitives for scanning over text based on syntax tables. This is used,
4580 for example, to find the matching parenthesis in a command such as
4581 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
4584 @c #### Break this out into a separate node somewhere!
4585 Syntax codes are implemented as bitfields in an int. Bits 0-6 contain
4586 the syntax code itself, bit 7 is a special prefix flag used for Lisp,
4587 and bits 16-23 contain comment syntax flags. From the Lisp programmer's
point of view, there are 11 flags: 2 styles @times{} 2 characters @times{} @{start,
4589 end@} flags for two-character comment delimiters, 2 style flags for
4590 one-character comment delimiters, and the prefix flag.
4592 Internally, however, the characters used in multi-character delimiters
4593 will have non-comment-character syntax classes (@emph{e.g.}, the
4594 @samp{/} in C's @samp{/*} comment-start delimiter has ``punctuation''
(here meaning ``operator-like'') class in C modes). Thus a mixed
comment style, such as C++'s @samp{//} to end of line, is represented by
giving @samp{/} the ``punctuation'' class and the ``style b first
character of start sequence'' and ``style b second character of start
sequence'' flags. The fact that the class is @emph{not} a comment class
allows the syntax scanner to recognize that this is a multi-character
4601 delimiter. The @samp{newline} character is given (single-character)
4602 ``comment-end'' @emph{class} and the ``style b first character of end
4603 sequence'' @emph{flag}. The ``comment-end'' class allows the scanner to
4604 determine that no second character is needed to terminate the comment.
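The following sketch packs a syntax entry the way the text describes: class in bits 0-6, prefix flag in bit 7, comment flags in bits 16-23. The class numbers and the assignment of individual comment flags to bits 16-23 are assumptions made for illustration, and only the two-character-delimiter flags are modeled. It encodes the C++ @samp{//} example: @samp{/} keeps a non-comment class and carries both style-b start flags, while newline has the comment-end class.

```c
#include <stdbool.h>

/* Layout described in the text. */
#define SYNTAX_CLASS_MASK 0x7Fu       /* bits 0-6: syntax class */
#define SYNTAX_PREFIX_BIT (1u << 7)   /* bit 7: Lisp prefix flag */

/* Illustrative class numbers (assumed, not necessarily SXEmacs's). */
enum syntax_class { SC_WHITESPACE, SC_PUNCT, SC_WORD, SC_COMMENT_END };

/* Assumed placement of the two-character-delimiter comment flags in
   bits 16-23: style {a,b} x {start,end} x {first,second} character. */
#define SF_A_START_1 (1u << 16)
#define SF_A_START_2 (1u << 17)
#define SF_A_END_1   (1u << 18)
#define SF_A_END_2   (1u << 19)
#define SF_B_START_1 (1u << 20)
#define SF_B_START_2 (1u << 21)
#define SF_B_END_1   (1u << 22)
#define SF_B_END_2   (1u << 23)

static unsigned make_syntax(enum syntax_class cls, unsigned flags)
{
  return ((unsigned) cls & SYNTAX_CLASS_MASK) | flags;
}

static enum syntax_class syntax_class_of(unsigned code)
{
  return (enum syntax_class) (code & SYNTAX_CLASS_MASK);
}

static bool has_flag(unsigned code, unsigned flag)
{
  return (code & flag) != 0;
}

/* Style-b two-character comment start: FIRST carries the "first
   character of start sequence" flag and SECOND the "second character"
   flag; neither needs a comment class. */
static bool starts_style_b_comment(unsigned first, unsigned second)
{
  return has_flag(first, SF_B_START_1) && has_flag(second, SF_B_START_2);
}
```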
4606 There used to be a syntax class @samp{Sextword}. A character of
4607 @samp{Sextword} class is a word-constituent but a word boundary may
4608 exist between two such characters. Ken'ichi HANDA <handa@@etl.go.jp>
4609 explains the purpose of the Sextword syntax category:
4612 Japanese words are not separated by spaces, which makes finding word
4613 boundaries very difficult. Theoretically it's impossible without
4614 using natural language processing techniques. But, by defining
4615 pseudo-words as below (much simplified for letting you understand it
4616 easily) for Japanese, we can have a convenient forward-word function
4620 A Japanese word is a sequence of characters that consists of
4621 zero or more Kanji characters followed by zero or more
4622 Hiragana characters.
4625 Then, the problem is that now we can't say that a sequence of
4626 word-constituents makes up a word. For instance, both Hiragana "A"
4627 and Kanji "KAN" are word-constituents but the sequence of these two
4628 letters can't be a single word.
4630 So, we introduced Sextword for Japanese letters.
4633 There seems to have been some controversy about this category, as it has
been removed, re-added, and removed again. Currently neither GNU Emacs
4635 (21.3.99) nor XEmacs (21.5.17) seems to use it.
4642 This module implements various Lisp primitives for upcasing, downcasing
4643 and capitalizing strings or regions of buffers.
4651 This module implements the @dfn{range table} Lisp object type, which
4652 provides for a mapping from ranges of integers to arbitrary Lisp
4662 This module implements the @dfn{opaque} Lisp object type, an
4663 internal-only Lisp object that encapsulates an arbitrary block of memory
4664 so that it can be managed by the Lisp allocation system. To create an
4665 opaque object, you call @code{make_opaque()}, passing a pointer to a
4666 block of memory. An object is created that is big enough to hold the
4667 memory, which is copied into the object's storage. The object will then
4668 stick around as long as you keep pointers to it, after which it will be
4669 automatically reclaimed.
4672 Opaque objects can also have an arbitrary @dfn{mark method} associated
4673 with them, in case the block of memory contains other Lisp objects that
4674 need to be marked for garbage-collection purposes. (If you need other
4675 object methods, such as a finalize method, you should just go ahead and
4676 create a new Lisp object type---it's not hard.)
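A minimal sketch of the idea: a header followed by a copy of the caller's memory, plus an optional mark callback. The structure, the constructor, and the mark-function type are hypothetical, and @code{malloc()} stands in for the Lisp allocator, so unlike a real opaque object this one must be freed by hand.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical mark callback: lets the collector reach any Lisp
   objects stored inside the block. */
typedef void (*opaque_mark_fn)(void *data);

/* Hypothetical opaque object: header plus a copy of the caller's block
   in a flexible array member. */
struct opaque_sketch {
  size_t size;
  opaque_mark_fn mark;        /* NULL if the block holds no Lisp objects */
  unsigned char data[];
};

/* make_opaque()-style constructor: allocate an object big enough for
   SIZE bytes and copy DATA into it, so the caller's block may be reused
   or freed afterwards.  malloc() stands in for the Lisp allocator. */
static struct opaque_sketch *make_opaque_sketch(const void *data, size_t size,
                                                opaque_mark_fn mark)
{
  struct opaque_sketch *o = malloc(sizeof *o + size);

  if (o == NULL)
    return NULL;
  o->size = size;
  o->mark = mark;
  if (size > 0)
    memcpy(o->data, data, size);
  return o;
}

/* What a collector's mark phase might do with an opaque object. */
static void mark_opaque_sketch(struct opaque_sketch *o)
{
  if (o->mark != NULL)
    o->mark(o->data);
}
```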
This module provides a few primitives for doing dynamic abbreviation
4685 expansion. In SXEmacs, most of the code for this has been moved into
4686 Lisp. Some C code remains for speed and because the primitive
4687 @code{self-insert-command} (which is executed for all self-inserting
4688 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
4689 is itself in C only for speed.)
This module provides primitives for retrieving the documentation
4698 strings of functions and variables. These documentation strings contain
4699 certain special markers that get dynamically expanded (e.g. a
4700 reverse-lookup is performed on some named functions to retrieve their
4701 current key bindings). Some documentation strings (in particular, for
4702 the built-in primitives and pre-loaded Lisp functions) are stored
4703 externally in a file @file{DOC} in the @file{lib-src/} directory and
4704 need to be fetched from that file. (Part of the build stage involves
4705 building this file, and another part involves constructing an index for
4706 this file and embedding it into the executable, so that the functions in
4707 @file{doc.c} do not have to search the entire @file{DOC} file to find
4708 the appropriate documentation string.)
This module provides a Lisp primitive that implements the MD5
message-digest algorithm, which computes a 128-bit hash value of a string
of data such that the data cannot practically be derived from the hash
value. MD5 has been used for various security applications on the
Internet, although it is no longer considered collision-resistant.
4724 @node Modules for Interfacing with the Operating System
4725 @section Modules for Interfacing with the Operating System
4726 @cindex modules for interfacing with the operating system
4727 @cindex interfacing with the operating system, modules for
4728 @cindex operating system, modules for interfacing with the
4736 These modules allow SXEmacs to spawn and communicate with subprocesses
4737 and network connections.
4739 @cindex synchronous subprocesses
4740 @cindex subprocesses, synchronous
4741 @file{callproc.c} implements (through the @code{call-process}
4742 primitive) what are called @dfn{synchronous subprocesses}. This means
4743 that SXEmacs runs a program, waits till it's done, and retrieves its
4744 output. A typical example might be calling the @file{ls} program to get
4745 a directory listing.
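A synchronous subprocess in miniature: run a program, wait for it, and collect its output. This sketch uses the POSIX @code{popen()} and @code{pclose()} functions rather than the lower-level process code in @file{callproc.c}, and the function name is hypothetical. The exit status is decoded with the standard @code{WIFEXITED()} and @code{WEXITSTATUS()} macros, the kind of interface @file{syswait.h} smooths over.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Run COMMAND via the shell, wait for it to finish, and copy up to CAP-1
   bytes of its standard output into OUT (NUL-terminated when CAP > 0).
   Returns the exit code, or -1 if the command could not be run or did
   not exit normally (e.g. it was killed by a signal). */
static int run_sync(const char *command, char *out, size_t cap)
{
  FILE *pipe_in = popen(command, "r");
  size_t len = 0;
  int status;

  if (pipe_in == NULL)
    return -1;

  /* Drain the pipe to EOF so the child never blocks on a full pipe,
     keeping only what fits in OUT. */
  for (;;) {
    char chunk[256];
    size_t got = fread(chunk, 1, sizeof chunk, pipe_in);

    if (got == 0)
      break;
    if (cap > 0 && len < cap - 1) {
      size_t room = cap - 1 - len;
      size_t take = got < room ? got : room;

      memcpy(out + len, chunk, take);
      len += take;
    }
  }
  if (cap > 0)
    out[len] = '\0';

  status = pclose(pipe_in);   /* waits for the child to exit */
  if (status == -1 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

For example, @code{run_sync("ls", buf, sizeof buf)} would block until @file{ls} exits and leave the listing in @code{buf}, which is the behavior of a synchronous subprocess.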
4747 @cindex asynchronous subprocesses
4748 @cindex subprocesses, asynchronous
4749 @file{process.c} and @file{process.h} implement @dfn{asynchronous
4750 subprocesses}. This means that SXEmacs starts a program and then
4751 continues normally, not waiting for the process to finish. Data can be
4752 sent to the process or retrieved from it as it's running. This is used
4753 for the @code{shell} command (which provides a front end onto a shell
4754 program such as @file{csh}), the mail and news readers implemented in
4755 SXEmacs, etc. The result of calling @code{start-process} to start a
4756 subprocess is a process object, a particular kind of object used to
4757 communicate with the subprocess. You can send data to the process by
passing the process object and the data to @code{process-send-string}, and you
4759 can specify what happens to data retrieved from the process by setting
4760 properties of the process object. (When the process sends data, SXEmacs
4761 receives a process event, which says that there is data ready. When
4762 @code{dispatch-event} is called on this event, it reads the data from
4763 the process and does something with it, as specified by the process
4764 object's properties. Typically, this means inserting the data into a
4765 buffer or calling a function.) Another property of the process object is
4766 called the @dfn{sentinel}, which is a function that is called when the
4769 @cindex network connections
4770 Process objects are also used for network connections (connections to a
4771 process running on another machine). Network connections are started
4772 with @code{open-network-stream} but otherwise work just like
4775 @cindex network server
Process objects are also used for network server connections (connections
to SXEmacs from processes running on the same or another
machine). A network server consists of a listening
process started with @code{open-network-server-stream}. When an inbound
connection arrives, it is accepted and an individual process is created
for it; from then on, that process works just like a
network connection started with @code{open-network-stream}. To give
control over each accepted connection process,
@code{open-network-server-stream} accepts as parameters a function that
is called upon accept to perform special connection setup, such as setting
up an SSL layer or exchanging credentials, as well as the @dfn{filter}
and @dfn{sentinel} functions for the accepted connection.
4795 These modules implement most of the low-level, messy operating-system
4796 interface code. This includes various device control (ioctl) operations
for file descriptors, TTYs, pseudo-terminals, etc. (usually this stuff
4798 is fairly system-dependent; thus the name of this module), and emulation
4799 of standard library functions and system calls on systems that don't
4800 provide them or have broken versions.
4816 These header files provide consistent interfaces onto system-dependent
4817 header files and system calls. The idea is that, instead of including a
4818 standard header file like @file{<sys/param.h>} (which may or may not
exist on various systems) or having to worry about whether all systems
provide a particular preprocessor constant, or having to deal with the
four different paradigms for manipulating signals, you just include the
appropriate @file{sys*.h} header file, which includes all the right
system header files, defines any missing preprocessor constants,
4824 provides a uniform interface onto system calls, etc.
4826 @file{sysdir.h} provides a uniform interface onto directory-querying
4827 functions. (In some cases, this is in conjunction with emulation
4828 functions in @file{sysdep.c}.)
4830 @file{sysfile.h} includes all the necessary header files for standard
4831 system calls (e.g. @code{read()}), ensures that all necessary
4832 @code{open()} and @code{stat()} preprocessor constants are defined, and
4833 possibly (usually) substitutes sugared versions of @code{read()},
4834 @code{write()}, etc. that automatically restart interrupted I/O
4837 @file{sysfloat.h} includes the necessary header files for floating-point
4840 @file{sysproc.h} includes the necessary header files for calling
4841 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4842 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4843 manipulations are available.
4845 @file{syspwd.h} includes the necessary header files for obtaining
4846 information from @file{/etc/passwd} (the functions are emulated under
4849 @file{syssignal.h} includes the necessary header files for
4850 signal-handling and provides a uniform interface onto the different
4851 signal-handling and signal-blocking paradigms.
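As an example of one of those paradigms, this sketch uses POSIX @code{sigprocmask()} to run a critical section with a signal blocked; a signal raised meanwhile stays pending and is delivered when the old mask is restored. The helper names are hypothetical, and @code{SIGUSR1} is used only as a convenient test signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static volatile sig_atomic_t usr1_handled = 0;

/* Example handler: just records that SIGUSR1 was delivered. */
static void on_usr1(int sig)
{
  (void) sig;
  usr1_handled = 1;
}

/* Install the example handler with sigaction(). */
static bool install_usr1_handler(void)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_usr1;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGUSR1, &sa, NULL) == 0;
}

/* Run FN(ARG) with SIG blocked, then restore the previous mask.  A SIG
   raised meanwhile stays pending until the mask is restored. */
static bool with_signal_blocked(int sig, void (*fn)(void *), void *arg)
{
  sigset_t block, old;

  if (sigemptyset(&block) != 0 || sigaddset(&block, sig) != 0)
    return false;
  if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
    return false;
  fn(arg);
  return sigprocmask(SIG_SETMASK, &old, NULL) == 0;
}

/* Example critical section: raise SIGUSR1, then record whether its
   handler has run yet (it has not, because the signal is blocked). */
static void raise_usr1(void *arg)
{
  raise(SIGUSR1);
  *(int *) arg = usr1_handled;
}
```

Systems with older signal interfaces need different calls to get the same effect, which is the kind of variation a header like @file{syssignal.h} hides.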
4853 @file{systime.h} includes the necessary header files and provides
4854 uniform interfaces for retrieving the time of day, setting file
4855 access/modification times, getting the amount of time used by the SXEmacs
4858 @file{systty.h} buffers against the infinitude of different ways of
4861 @file{syswait.h} provides a uniform way of retrieving the exit status
4862 from a @code{wait()}ed-on process (some systems use a union, others use
4879 These files implement the ability to play various sounds on some types
4880 of computers. You have to configure your SXEmacs with sound support in
4881 order to get this capability.
4883 @file{sound.c} provides the generic interface. It implements various
4884 Lisp primitives and variables that let you specify which sounds should
be played in certain conditions. (The conditions are identified by
symbols, which are passed to @code{ding} to make a sound.) Various
standard functions call @code{ding} at certain times; if sound support
does not exist, a simple beep results.
4890 @cindex native sound
4891 @cindex sound, native
4892 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
@file{linuxplay.c} interface to the machine's speaker for various
kinds of machines. This is called @dfn{native} sound.
4896 @cindex sound, network
4897 @cindex network sound
4899 @file{nas.c} interfaces to a computer somewhere else on the network
4900 using the NAS (Network Audio Server) protocol, playing sounds on that
4901 machine. This allows you to run SXEmacs on a remote machine, with its
4902 display set to your local machine, and have the sounds be made on your
4903 local machine, provided that you have a NAS server running on your local
4906 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4907 additional functions for playing sound on a Sun SPARC but are not
4916 This module provides the ability to retrieve the system's current load
4917 average. (The way to do this is highly system-specific, unfortunately,
4918 and requires a lot of special-case code.)
4926 This module provides a small amount of code used internally at Sun to
4927 keep statistics on the usage of XEmacs.
4938 These files provide replacement functions and prototypes to fix numerous
4939 bugs in early releases of SunOS 4.1.
4947 This module provides some terminal-control code necessary on versions of
4952 @node Modules for Interfacing with X Windows
4953 @section Modules for Interfacing with X Windows
4954 @cindex modules for interfacing with X Windows
4955 @cindex interfacing with X Windows, modules for
4956 @cindex X Windows, modules for interfacing with
4962 A file generated from @file{Emacs.ad}, which contains SXEmacs-supplied
4963 fallback resources (so that SXEmacs has pretty defaults).
4973 These modules implement an Xt widget class that encapsulates a frame.
4974 This is for ease in integrating with Xt. The EmacsFrame widget covers
4975 the entire X window except for the menubar; the scrollbars are
4976 positioned on top of the EmacsFrame widget.
4978 @strong{Warning:} Abandon hope, all ye who enter here. This code took
4979 an ungodly amount of time to get right, and is likely to fall apart
4980 mercilessly at the slightest change. Such is life under Xt.
These modules implement a simple Xt manager (i.e. composite) widget
class that lets its children set whatever geometry they want.
It's amazing that Xt doesn't provide this as standard, but on second
thought, it makes sense, considering how amazingly broken Xt is.
5003 These modules implement two Xt widget classes that are subclasses of
5004 the TopLevelShell and TransientShell classes. This is necessary to deal
with more brokenness that Xt has sadistically thrust onto the backs of
developers.
These modules provide functions for maintaining and caching GCs
(graphics contexts) under the X Window System. This code is junky and
needs to be rewritten.
5029 This module provides an interface to the X Window System's concept of
@dfn{selections}, the standard way for X applications to communicate
with each other.
5042 These header files are similar in spirit to the @file{sys*.h} files and buffer
5043 against different implementations of Xt and Motif.
5047 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
5049 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
5051 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
5053 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
5063 These files provide an emulation of the Xmu library for those systems
5064 (i.e. HPUX) that don't provide it as a standard part of X.
5069 ExternalClient-Xlib.c
5082 @cindex external widget
5083 These files provide the @dfn{external widget} interface, which allows an
5084 SXEmacs frame to appear as a widget in another application. To do this,
5085 you have to configure with @samp{--external-widget}.
@file{ExternalShell*} provides the server (SXEmacs) side of the
connection.
5090 @file{ExternalClient*} provides the client (other application) side of
5091 the connection. These files are not compiled into SXEmacs but are
5092 compiled into libraries that are then linked into your application.
5094 @file{extw-*} is common code that is used for both the client and server.
5096 Don't touch this code; something is liable to break if you do.
5100 @node Modules for Internationalization
5101 @section Modules for Internationalization
5102 @cindex modules for internationalization
5103 @cindex internationalization, modules for
5118 These files implement the MULE (Asian-language) support. Note that MULE
5119 actually provides a general interface for all sorts of languages, not
5120 just Asian languages (although they are generally the most complicated
5121 to support). This code is still in beta.
5123 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the
5124 SXEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset}
5125 Lisp object type, which encapsulates a character set (an ordered one- or
two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
Kanji).
5129 @file{file-coding.*} implements the @dfn{coding-system} Lisp object
5130 type, which encapsulates a method of converting between different
5131 encodings. An encoding is a representation of a stream of characters,
5132 possibly from multiple character sets, using a stream of bytes or words,
5133 and defines (e.g.) which escape sequences are used to specify particular
5134 character sets, how the indices for a character are converted into bytes
5135 (sometimes this involves setting the high bit; sometimes complicated
rearranging of the values takes place, as in the Shift-JIS encoding),
etc.
5139 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
5140 interpreter. CCL is similar in spirit to Lisp byte code and is used to
5141 implement converters for custom encodings.
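To make the Shift-JIS ``rearranging'' mentioned above concrete, here is a
sketch of the well-known arithmetic mapping between a JIS X 0208 byte
pair and a Shift-JIS byte pair. It is not taken from @file{file-coding.c}
or from any CCL program; the function names @code{jis_to_sjis} and
@code{sjis_to_jis} are hypothetical, and the input is assumed to be
already split into 7-bit JIS bytes.

```c
#include <assert.h>

/* Hypothetical helper: convert one JIS X 0208 character, given as its
   two 7-bit bytes (each 0x21..0x7E), to its two Shift-JIS bytes.
   Out-of-range bytes trip the assertion. */
static void jis_to_sjis(unsigned j1, unsigned j2, unsigned *s1, unsigned *s2)
{
	assert(j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);

	/* Two 94-cell JIS rows fold into one Shift-JIS lead byte; lead
	   bytes skip 0xA0..0xDF, which holds single-byte kana. */
	*s1 = ((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0);
	if (j1 & 1) {                  /* odd row: trail bytes 0x40..0x9E */
		*s2 = j2 + 0x1F;
		if (*s2 >= 0x7F)       /* 0x7F is never used as a trail byte */
			(*s2)++;
	} else {                       /* even row: trail bytes 0x9F..0xFC */
		*s2 = j2 + 0x7E;
	}
}

/* Hypothetical helper: the inverse mapping, Shift-JIS back to JIS. */
static void sjis_to_jis(unsigned s1, unsigned s2, unsigned *j1, unsigned *j2)
{
	*j1 = (s1 - (s1 <= 0x9F ? 0x70 : 0xB0)) * 2;
	if (s2 < 0x9F) {               /* first half: came from an odd row */
		(*j1)--;
		*j2 = s2 - 0x1F;
		if (s2 >= 0x80)        /* undo the skip over 0x7F */
			(*j2)--;
	} else {                       /* second half: even row */
		*j2 = s2 - 0x7E;
	}
}
```

Because two JIS rows share one Shift-JIS lead byte, the trail byte has
to encode whether the JIS row was odd or even, which is exactly the kind
of value rearrangement a coding system must undo when decoding.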
5143 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
5144 external programs used to implement the Canna and WNN input methods,
5145 respectively. This is currently in beta.
5147 @file{mule-mcpath.c} provides some functions to allow for pathnames
5148 containing extended characters. This code is fragmentary, obsolete, and
5149 completely non-working. Instead, @code{pathname-coding-system} is used
5150 to specify conversions of names of files and directories. The standard
C I/O functions like @samp{open()} are wrapped so that conversion occurs
automatically.
5154 @file{mule.c} contains a few miscellaneous things. It currently seems
5155 to be unused and probably should be removed.
5163 This provides some miscellaneous internationalization code for
5164 implementing message translation and interfacing to the Ximp input
5165 method. None of this code is currently working.
5173 This contains leftover code from an earlier implementation of
5174 Asian-language support, and is not currently used.
5179 @node Modules for Regression Testing
5180 @section Modules for Regression Testing
5181 @cindex modules for regression testing
5182 @cindex regression testing, modules for
5187 byte-compiler-tests.el
5203 @file{test-harness.el} defines the macros @code{Assert},
5204 @code{Check-Error}, @code{Check-Error-Message}, and
@code{Check-Message}. The other files are test files, testing various
SXEmacs modules.
5210 @node Allocation of Objects in SXEmacs Lisp, Dumping, A Summary of the Various SXEmacs Modules, Top
5211 @chapter Allocation of Objects in SXEmacs Lisp
5212 @cindex allocation of objects in SXEmacs Lisp
5213 @cindex objects in SXEmacs Lisp, allocation of
5214 @cindex Lisp objects, allocation of in SXEmacs
5217 * Introduction to Allocation::
5218 * Garbage Collection::
5220 * Garbage Collection - Step by Step::
5221 * Integers and Characters::
5222 * Allocation from Frob Blocks::
5224 * Low-level allocation::
5231 * Compiled Function::
5234 @node Introduction to Allocation
5235 @section Introduction to Allocation
5236 @cindex allocation, introduction to
5238 Emacs Lisp, like all Lisps, has garbage collection. This means that
5239 the programmer never has to explicitly free (destroy) an object; it
5240 happens automatically when the object becomes inaccessible. Most
5241 experts agree that garbage collection is a necessity in a modern,
5242 high-level language. Its omission from C stems from the fact that C was
5243 originally designed to be a nice abstract layer on top of assembly
language, for writing kernels and basic system utilities rather than
large applications.
5247 Lisp objects can be created by any of a number of Lisp primitives.
5248 Most object types have one or a small number of basic primitives
5249 for creating objects. For conses, the basic primitive is @code{cons};
5250 for vectors, the primitives are @code{make-vector} and @code{vector}; for
5251 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
5252 Some Lisp objects, especially those that are primarily used internally,
5253 have no corresponding Lisp primitives. Every Lisp object, though,
5254 has at least one C primitive for creating it.
Recall (@pxref{How Lisp Objects Are Represented in C}) that a Lisp
object, as stored in a 32-bit or
5257 64-bit word, has a few tag bits, and a ``value'' that occupies the
5258 remainder of the bits. We can separate the different Lisp object types
5259 into three broad categories:
(a) Those for which the value directly represents the contents of the
Lisp object. Only two types are in this category: integers and
characters. No special allocation or garbage collection is necessary
for such objects. Lisp objects of these types do not need to be
@code{GCPRO}ed.
5270 In the remaining two categories, the type is stored in the object
5271 itself. The tag for all such objects is the generic @dfn{lrecord}
(Lisp_Type_Record) tag. The first bytes of the object's structure are an
integer (actually a char) characterizing the object's type and some
flags, in particular the mark bit used for garbage collection. A
structure describing the type is accessible through the
5276 lrecord_implementation_table indexed with said integer. This structure
5277 includes the method pointers and a pointer to a string naming the type.
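The split between directly-tagged values and lrecords can be pictured
with the following toy model. It is only a sketch: the two-bit tag
layout, @code{struct record_header}, @code{impl_table} and the helper
names are hypothetical stand-ins for the real macros and
@code{lrecord_implementation_table}, and only non-negative integers are
handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low two bits of a word are the tag.
   SXEmacs's real macros and bit assignments differ. */
typedef uintptr_t lisp_word;

enum tag { TAG_RECORD = 0, TAG_INT = 1, TAG_CHAR = 2 };
#define TAG_BITS 2
#define TAG_MASK ((lisp_word)3)

/* First bytes of every record: a type index plus flags. */
struct record_header {
	unsigned char type;   /* index into impl_table */
	unsigned char mark;   /* mark bit used by the collector */
};

struct record_implementation {
	const char *name;     /* real entries also hold method pointers */
};

static const struct record_implementation impl_table[] = {
	{ "cons" }, { "string" }, { "vector" },
};

/* An example record type: the header must come first. */
struct toy_cons {
	struct record_header header;
	lisp_word car, cdr;
};

/* Category (a): the value lives in the word itself. */
static lisp_word make_int(uintptr_t n) { return (n << TAG_BITS) | TAG_INT; }
static uintptr_t xint(lisp_word w) { return w >> TAG_BITS; }
static enum tag tag_of(lisp_word w) { return (enum tag)(w & TAG_MASK); }

/* Categories (b) and (c): the word is a pointer to a record. */
static lisp_word wrap_record(struct record_header *h)
{
	/* Needs at least 4-byte alignment so the tag bits are free. */
	assert(((lisp_word)h & TAG_MASK) == 0);
	return (lisp_word)h | TAG_RECORD;
}

static const char *type_name(lisp_word w)
{
	switch (tag_of(w)) {
	case TAG_INT:
		return "integer";
	case TAG_CHAR:
		return "character";
	default:      /* the type is stored in the record itself */
		return impl_table[((struct record_header *)(w & ~TAG_MASK))->type].name;
	}
}
```

Only the record case needs a memory access to find the type, which is
why integers and characters need no allocation or marking.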
5281 (b) Those lrecords that are allocated in frob blocks (see above). This
5282 includes the objects that are most common and relatively small, and
5283 includes conses, strings, subrs, floats, compiled functions, symbols,
5284 extents, events, and markers. With the cleanup of frob blocks done in
5285 19.12, it's not terribly hard to add more objects to this category, but
it's a bit trickier than adding an object type to category (c) (esp. if the
5287 object needs a finalization method), and is not likely to save much
5288 space unless the object is small and there are many of them. (In fact,
5289 if there are very few of them, it might actually waste space.)
5291 (c) Those lrecords that are individually @code{malloc()}ed. These are
5292 called @dfn{lcrecords}. All other types are in this category. Adding a
5293 new type to this category is comparatively easy, and all types added
5294 since 19.8 (when the current allocation scheme was devised, by Richard
Mlynarik), with the exception of the character type, have been in this
category.
5299 Note that bit vectors are a bit of a special case. They are
5300 simple lrecords as in category (b), but are individually @code{malloc()}ed
5301 like vectors. You can basically view them as exactly like vectors
5302 except that their type is stored in lrecord fashion rather than
5303 in directly-tagged fashion.
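A rough sketch of what allocation ``in frob blocks'' means for a small
fixed-size type follows. The block size, @code{struct frob_block},
@code{alloc_small} and @code{free_small} are hypothetical; the real
allocator in @file{alloc.c} is generated by macros and also handles
marking and finalization, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define OBJS_PER_BLOCK 64            /* hypothetical block size */

/* A small object; while on the free list its storage holds the link. */
struct small_obj {
	union {
		struct { long car, cdr; } live;  /* contents while in use */
		struct small_obj *next_free;     /* link while free */
	} u;
};

struct frob_block {
	struct frob_block *prev;         /* older blocks, for sweeping */
	int used;                        /* slots handed out so far */
	struct small_obj slots[OBJS_PER_BLOCK];
};

static struct frob_block *current_block;
static struct small_obj *free_list;

static struct small_obj *alloc_small(void)
{
	struct small_obj *o;

	if (free_list != NULL) {         /* reuse a swept object first */
		o = free_list;
		free_list = o->u.next_free;
		return o;
	}
	if (current_block == NULL || current_block->used == OBJS_PER_BLOCK) {
		/* One malloc() serves OBJS_PER_BLOCK objects. */
		struct frob_block *b = malloc(sizeof *b);
		assert(b != NULL);
		b->prev = current_block;
		b->used = 0;
		current_block = b;
	}
	return &current_block->slots[current_block->used++];
}

/* What the sweep phase would do with an unmarked object. */
static void free_small(struct small_obj *o)
{
	o->u.next_free = free_list;
	free_list = o;
}
```

The point of the design is that one @code{malloc()} call serves many
objects, which is why small, numerous types benefit while rare types may
leave most of a block unused.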
5306 @node Garbage Collection
5307 @section Garbage Collection
5308 @cindex garbage collection
5310 @cindex mark and sweep
5311 Garbage collection is simple in theory but tricky to implement.
5312 Emacs Lisp uses the oldest garbage collection method, called
5313 @dfn{mark and sweep}. Garbage collection begins by starting with
5314 all accessible locations (i.e. all variables and other slots where
5315 Lisp objects might occur) and recursively traversing all objects
5316 accessible from those slots, marking each one that is found.
We then go through all of memory, freeing each object that is
not marked and unmarking each object that is marked. Note
5319 that ``all of memory'' means all currently allocated objects.
5320 Traversing all these objects means traversing all frob blocks,
5321 all vectors (which are chained in one big list), and all
5322 lcrecords (which are likewise chained).
5324 Garbage collection can be invoked explicitly by calling
5325 @code{garbage-collect} but is also called automatically by @code{eval},
5326 once a certain amount of memory has been allocated since the last
5327 garbage collection (according to @code{gc-cons-threshold}).
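The mark and sweep cycle described above can be reduced to a few dozen
lines. In this sketch, which assumes a hypothetical @code{struct obj}
with at most two outgoing references, every object is chained on one
list the way lcrecords are, the mark phase starts from an explicit array
of roots, and the sweep phase frees unmarked objects while clearing the
mark on survivors.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy heap: each object is malloc()ed separately and
   chained on one list of all allocated objects. */
struct obj {
	struct obj *next_alloc;   /* chain of every allocated object */
	struct obj *ref[2];       /* outgoing references, may be NULL */
	int marked;
};

static struct obj *all_objects;

static struct obj *new_obj(struct obj *a, struct obj *b)
{
	struct obj *o = malloc(sizeof *o);
	assert(o != NULL);
	o->ref[0] = a;
	o->ref[1] = b;
	o->marked = 0;
	o->next_alloc = all_objects;
	all_objects = o;
	return o;
}

/* Mark phase: mark O and everything reachable from it.  The second
   reference is followed by looping instead of recursing. */
static void mark(struct obj *o)
{
	while (o != NULL && !o->marked) {
		o->marked = 1;
		mark(o->ref[0]);
		o = o->ref[1];
	}
}

/* Sweep phase: free every unmarked object and unmark the survivors,
   so the next collection starts from a clean state. */
static int sweep(void)
{
	int freed = 0;
	struct obj **p = &all_objects;

	while (*p != NULL) {
		struct obj *o = *p;
		if (o->marked) {
			o->marked = 0;
			p = &o->next_alloc;
		} else {
			*p = o->next_alloc;   /* unlink, then release */
			free(o);
			freed++;
		}
	}
	return freed;
}

/* One full collection; returns the number of objects freed. */
static int collect(struct obj **roots, int nroots)
{
	for (int i = 0; i < nroots; i++)
		mark(roots[i]);
	return sweep();
}
```

An object that points at live data but is not itself reachable from a
root is still garbage; only reachability from the roots matters.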
5331 @section @code{GCPRO}ing
5332 @cindex @code{GCPRO}ing
5333 @cindex garbage collection protection
5334 @cindex protection, garbage collection
5336 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
5337 internals. The basic idea is that whenever garbage collection
5338 occurs, all in-use objects must be reachable somehow or
5339 other from one of the roots of accessibility. The roots
5340 of accessibility are:
5344 All objects that have been @code{staticpro()}d or
5345 @code{staticpro_nodump()}ed. This is used for any global C variables
5346 that hold Lisp objects. A call to @code{staticpro()} happens implicitly
5347 as a result of any symbols declared with @code{defsymbol()} and any
5348 variables declared with @code{DEFVAR_FOO()}. You need to explicitly
5349 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
for other global C variables holding Lisp objects. (This typically
includes internal lists and such things.) Use
@code{staticpro_nodump()} only in the rare cases when you do not want
the variable's value to be saved at dump time but would rather recompute it at
load time.
5356 Note that @code{obarray} is one of the @code{staticpro()}d things.
5357 Therefore, all functions and variables get marked through this.
5359 Any shadowed bindings that are sitting on the @code{specpdl} stack.
5361 Any objects sitting in currently active (Lisp) stack frames,
5362 catches, and condition cases.
A couple of special-case places where active objects are
located.
5367 Anything currently marked with @code{GCPRO}.
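The key property of @code{staticpro()} is that it records the
@emph{address} of a C variable, not its current value, so later
assignments to the variable remain visible to the collector. The sketch
below models this with a fixed-size table; the table size,
@code{Vmy_internal_list}, @code{vars_of_mymodule} and @code{record_mark}
are hypothetical, and the real @code{staticpros} dynarr grows on demand.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;            /* stand-in for the real type */

#define MAX_STATICPROS 16            /* hypothetical fixed limit */
static Lisp_Object *staticpros[MAX_STATICPROS];
static size_t staticidx;

/* Remember where a global Lisp_Object lives. */
static void staticpro(Lisp_Object *varaddress)
{
	assert(staticidx < MAX_STATICPROS);
	staticpros[staticidx++] = varaddress;
}

static Lisp_Object Vmy_internal_list;    /* hypothetical module global */

/* Hypothetical module initialization, in the style of vars_of_foo(). */
static void vars_of_mymodule(void)
{
	Vmy_internal_list = 0;               /* 0 stands in for Qnil */
	staticpro(&Vmy_internal_list);
}

/* During the mark phase: mark whatever each variable holds now. */
static void mark_staticpros(void (*mark)(Lisp_Object))
{
	for (size_t i = 0; i < staticidx; i++)
		mark(*staticpros[i]);
}

/* A trivial marker for demonstration: remember the last value seen. */
static Lisp_Object last_marked;
static void record_mark(Lisp_Object obj)
{
	last_marked = obj;
}
```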
Marking with @code{GCPRO} is necessary because some C functions (quite
a lot, in fact) allocate objects during their operation. Quite
frequently, there will be no other pointer to the object while the
function is running, and if a garbage collection occurs and the object
needs to be referenced again, bad things will happen. The solution is
to mark those objects with @code{GCPRO}. Unfortunately this is easy to
forget, and there is basically no way around this problem. Here are
some rules, though:
5381 For every @code{GCPRO@var{n}}, there have to be declarations of
5382 @code{struct gcpro gcpro1, gcpro2}, etc.
5385 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
5386 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed. Getting
5387 either of these wrong will lead to crashes, often in completely random
5388 places unrelated to where the problem lies.
5391 The way this actually works is that all currently active @code{GCPRO}s
5392 are chained through the @code{struct gcpro} local variables, with the
5393 variable @samp{gcprolist} pointing to the head of the list and the nth
5394 local @code{gcpro} variable pointing to the first @code{gcpro} variable
5395 in the next enclosing stack frame. Each @code{GCPRO}ed thing is an
5396 lvalue, and the @code{struct gcpro} local variable contains a pointer to
5397 this lvalue. This is why things will mess up badly if you don't pair up
5398 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with
5399 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
5400 @code{Lisp_Object} variables in no-longer-active stack frames.
5403 It is actually possible for a single @code{struct gcpro} to
5404 protect a contiguous array of any number of values, rather than
5405 just a single lvalue. To effect this, call @code{GCPRO@var{n}} as usual on
5406 the first object in the array and then set @code{gcpro@var{n}.nvars}.
5409 @strong{Strings are relocated.} What this means in practice is that the
5410 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
5411 time, and you should never keep it around past any function call, or
5412 pass it as an argument to any function that might cause a garbage
5413 collection. This is why a number of functions accept either a
5414 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
5415 and only access the Lisp string's data at the very last minute. In some
5416 cases, you may end up having to @code{alloca()} some space and copy the
5417 string's data into it.
By convention, if you have to nest @code{GCPRO}s, use @code{NGCPRO@var{n}}
5421 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
5422 etc. This avoids compiler warnings about shadowed locals.
5425 It is @emph{always} better to err on the side of extra @code{GCPRO}s
5426 rather than too few. The extra cycles spent on this are
almost never going to make a whit of difference in the
speed of anything.
5431 The general rule to follow is that caller, not callee, @code{GCPRO}s.
5432 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
5433 that are passed in as parameters.
One exception to this rule is if you ever plan to change the parameter
value and store a new object in it. In that case, you @emph{must}
@code{GCPRO} the parameter, because otherwise the new object will not be
protected.
5440 So, if you create any Lisp objects (remember, this happens in all sorts
5441 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
5442 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
5443 there's no possibility that a garbage-collection can occur while you
5444 need to use the object. Even then, consider @code{GCPRO}ing.
5447 A garbage collection can occur whenever anything calls @code{Feval}, or
5448 whenever a QUIT can occur where execution can continue past
5449 this. (Remember, this is almost anywhere.)
5452 If you have the @emph{least smidgeon of doubt} about whether
5453 you need to @code{GCPRO}, you should @code{GCPRO}.
5456 Beware of @code{GCPRO}ing something that is uninitialized. If you have
5457 any shade of doubt about this, initialize all your variables to @code{Qnil}.
5460 Be careful of traps, like calling @code{Fcons()} in the argument to
5461 another function. By the ``caller protects'' law, you should be
5462 @code{GCPRO}ing the newly-created cons, but you aren't. A certain
5463 number of functions that are commonly called on freshly created stuff
5464 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
5465 law and go ahead and @code{GCPRO} their arguments so as to simplify
things, but make sure to check whether it's OK whenever doing something like
this.
5470 Once again, remember to @code{GCPRO}! Bugs resulting from insufficient
5471 @code{GCPRO}ing are intermittent and extremely difficult to track down,
5472 often showing up in crashes inside of @code{garbage-collect} or in
5473 weirdly corrupted objects or even in incorrect values in a totally
5474 different section of code.
5477 If you don't understand whether to @code{GCPRO} in a particular
5478 instance, ask on the mailing lists. A general hint is that @code{prog1}
5479 is the canonical example.
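The chaining of @code{struct gcpro} variables can be modelled with a
cut-down version of the macros. This is a sketch, not the definitions
from the SXEmacs headers: @code{Lisp_Object} is a stand-in integer type,
only @code{GCPRO1}, @code{GCPRO2} and @code{UNGCPRO} are shown, and
@code{count_protected} merely stands for the collector walking
@code{gcprolist}. It also shows protecting an array through a single
@code{struct gcpro} by setting @code{nvars}.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;            /* stand-in for the real type */

struct gcpro {
	struct gcpro *next;   /* enclosing (older) gcpro in the chain */
	Lisp_Object *var;     /* address of the protected lvalue */
	int nvars;            /* how many consecutive objects at VAR */
};

static struct gcpro *gcprolist;      /* head: innermost active gcpro */

/* Simplified macros; they expect locals named gcpro1, gcpro2. */
#define GCPRO1(a)                                                   \
	(gcpro1.next = gcprolist, gcpro1.var = &(a), gcpro1.nvars = 1,  \
	 gcprolist = &gcpro1)
#define GCPRO2(a, b)                                                \
	(gcpro1.next = gcprolist, gcpro1.var = &(a), gcpro1.nvars = 1,  \
	 gcpro2.next = &gcpro1, gcpro2.var = &(b), gcpro2.nvars = 1,    \
	 gcprolist = &gcpro2)
#define UNGCPRO (gcprolist = gcpro1.next)

/* Stands for the collector: count every slot reachable via the chain. */
static int count_protected(void)
{
	int n = 0;
	for (struct gcpro *p = gcprolist; p != NULL; p = p->next)
		n += p->nvars;
	return n;
}

/* Protect a whole array with one struct gcpro by setting nvars. */
static int protect_array_and_count(void)
{
	Lisp_Object args[3] = { 0, 0, 0 };
	struct gcpro gcpro1;
	int n;

	GCPRO1(args[0]);
	gcpro1.nvars = 3;
	assert(gcprolist == &gcpro1);
	n = count_protected();           /* a GC here would see all three */
	UNGCPRO;
	return n;
}
```

Note how @code{UNGCPRO} restores @code{gcprolist} from
@code{gcpro1.next}; returning from the function without it would leave
@code{gcprolist} pointing into a dead stack frame.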
5481 @cindex garbage collection, conservative
5482 @cindex conservative garbage collection
Given the extremely error-prone nature of the @code{GCPRO} scheme and
the difficulty of tracking down the resulting bugs, it should be
considered a deficiency in the SXEmacs code. A solution to this problem
would involve
5486 implementing so-called @dfn{conservative} garbage collection for the C
5487 stack. That involves looking through all of stack memory and treating
5488 anything that looks like a reference to an object as a reference. This
5489 will result in a few objects not getting collected when they should, but
5490 it obviates the need for @code{GCPRO}ing, and allows garbage collection
5491 to happen at any point at all, such as during object allocation.
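The core test of conservative scanning is ``does this word look like a
pointer to an object?'' The sketch below applies that test to an array
standing in for the C stack; walking the real stack and registers is
platform-specific and omitted. The fixed @code{heap} array and
@code{conservative_scan} are hypothetical, and only pointers to the
start of an object are recognized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_OBJECTS 8

struct object {
	int marked;
	int payload;
};

static struct object heap[HEAP_OBJECTS];   /* hypothetical object area */

/* Treat every word that points at the start of a heap object as a
   reference to it and mark that object. */
static void conservative_scan(const uintptr_t *words, size_t nwords)
{
	uintptr_t lo = (uintptr_t)&heap[0];
	uintptr_t hi = (uintptr_t)&heap[HEAP_OBJECTS];

	assert(words != NULL || nwords == 0);
	for (size_t i = 0; i < nwords; i++) {
		uintptr_t w = words[i];
		if (w >= lo && w < hi && (w - lo) % sizeof(struct object) == 0)
			heap[(w - lo) / sizeof(struct object)].marked = 1;
	}
}
```

An integer that happens to equal an object's address would keep that
object alive; this is the ``few objects not getting collected'' cost
mentioned above.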
5493 @node Garbage Collection - Step by Step
5494 @section Garbage Collection - Step by Step
5495 @cindex garbage collection - step by step
5499 * garbage_collect_1::
5502 * sweep_lcrecords_1::
5503 * compact_string_chars::
5505 * sweep_bit_vectors_1::
5509 @subsection Invocation
5510 @cindex garbage collection, invocation
5512 The first thing that anyone should know about garbage collection is:
5513 when and how the garbage collector is invoked. One might think that this
5514 could happen every time new memory is allocated, e.g. new objects are
created, but this is @emph{not} the case. Instead, we have the following
situation:
5518 The entry point of any process of garbage collection is an invocation
5519 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
5520 invocation can occur @emph{explicitly} by calling the function
5521 @code{Fgarbage_collect} (in addition this function provides information
about the freed memory), or can occur @emph{implicitly} in four different
ways:
In function @code{main_1} in file @code{emacs.c}. This function is called
at each startup of SXEmacs. The garbage collection is invoked after all
initial creations are completed, but only if the internal
error-checking constant @code{ERROR_CHECK_GC} is defined.
5531 In function @code{disksave_object_finalization} in file
@code{alloc.c}. The only purpose of this function is to clear from
memory the objects that need not be stored with SXEmacs when we dump out
an executable. This is only done by @code{Fdump_emacs} or by
@code{Fdump_emacs_data} (both in @code{emacs.c}). The
actual clearing is accomplished by making these objects unreachable and
starting a garbage collection. The function is only used while building
SXEmacs.
In function @code{Feval / eval} in file @code{eval.c}. Each time the
well-known and often-used function @code{eval} is called to evaluate a form,
one of the first things that can happen is a potential call of
@code{garbage_collect_1}. There exist three global variables,
5544 @code{consing_since_gc} (counts the created cons-cells since the last
5545 garbage collection), @code{gc_cons_threshold} (a specified threshold
5546 after which a garbage collection occurs) and @code{always_gc}. If
5547 @code{always_gc} is set or if the threshold is exceeded, the garbage
5548 collection will start.
5550 In function @code{Ffuncall / funcall} in file @code{eval.c}. This
function evaluates calls of elisp functions and works according to
@code{Feval}.
The upshot is that garbage collection can basically occur wherever
@code{Feval} or @code{Ffuncall} is used, either directly or
5557 through another function. Since calls to these two functions are hidden
5558 in various other functions, many calls to @code{garbage_collect_1} are
5559 not obviously foreseeable, and therefore unexpected. Instances where
5560 they are used that are worth remembering are various elisp commands, as
5561 for example @code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
5562 @code{setq}, etc., miscellaneous @code{gui_item_...} functions,
5563 everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
...) and inside @code{Fsignal}. The latter is used to handle signals, for
example the ones raised by every @code{QUIT} macro triggered after
pressing @kbd{C-g}.
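The implicit trigger in @code{eval} amounts to a counter compared with a
threshold. The following sketch uses the variable names from the text
but is otherwise hypothetical; in particular it counts abstract
allocation units, and the real @code{garbage_collect_1} does far more
than reset the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Names follow the text; the bodies are hypothetical simplifications. */
static long consing_since_gc;        /* allocation since the last GC */
static long gc_cons_threshold = 1000;
static bool always_gc;               /* debugging aid: GC on every eval */
static bool gc_in_progress;
static int gc_count;                 /* only for illustration */

static void garbage_collect_1(void)
{
	if (gc_in_progress)              /* never re-enter the collector */
		return;
	gc_in_progress = true;
	/* ... mark and sweep would happen here ... */
	gc_count++;
	consing_since_gc = 0;
	gc_in_progress = false;
}

/* Called by every allocator in this model. */
static void note_allocation(long amount)
{
	assert(amount >= 0);
	consing_since_gc += amount;
}

/* The check made near the start of evaluating each form. */
static void maybe_gc_in_eval(void)
{
	if (always_gc || consing_since_gc > gc_cons_threshold)
		garbage_collect_1();
}
```

Because the check sits in @code{eval} rather than in the allocators, a
C function that allocates heavily without evaluating Lisp never
triggers a collection itself; the collection happens at the next
evaluation.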
5568 @node garbage_collect_1
5569 @subsection @code{garbage_collect_1}
5570 @cindex @code{garbage_collect_1}
We can now describe exactly what happens after the invocation takes
place.
5576 There are several cases in which the garbage collector is left immediately:
5577 when we are already garbage collecting (@code{gc_in_progress}), when
5578 the garbage collection is somehow forbidden
5579 (@code{gc_currently_forbidden}), when we are currently displaying something
5580 (@code{in_display}) or when we are preparing for the armageddon of the
5581 whole system (@code{preparing_for_armageddon}).
5583 Next the correct frame in which to put
5584 all the output occurring during garbage collecting is determined. In
5585 order to be able to restore the old display's state after displaying the
5586 message, some data about the current cursor position has to be
saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
care of that.
5590 The state of @code{gc_currently_forbidden} must be restored after
5591 the garbage collection, no matter what happens during the process. We
5592 accomplish this by @code{record_unwind_protect}ing the suitable function
5593 @code{restore_gc_inhibit} together with the current value of
5594 @code{gc_currently_forbidden}.
If SXEmacs is running as an interactive session, the next step
is simply to show the garbage collector's cursor/message.
5599 The following steps are the intrinsic steps of the garbage collector,
5600 therefore @code{gc_in_progress} is set.
5602 For debugging purposes, it is possible to copy the current C stack
5603 frame. However, this seems to be a currently unused feature.
5605 Before actually starting to go over all live objects, references to
5606 objects that are no longer used are pruned. We only have to do this for events
5607 (@code{clear_event_resource}) and for specifiers
5608 (@code{cleanup_specifiers}).
Now the mark phase begins and marks all accessible elements. In order to
start from all slots that serve as roots of accessibility, the function
5613 @code{mark_object} is called for each root individually to go out from
5614 there to mark all reachable objects. All roots that are traversed are
5615 shown in their processed order:
5618 all constant symbols and static variables that are registered via
5619 @code{staticpro}@ in the dynarr @code{staticpros}.
5620 @xref{Adding Global Lisp Variables}.
all Lisp objects that are created in C functions and that must be
protected from being freed. They are registered in the global
list @code{gcprolist}.
5627 all local variables (i.e. their name fields @code{symbol} and old
5628 values @code{old_values}) that are bound during the evaluation by the Lisp
5629 engine. They are stored in @code{specbinding} structs pushed on a stack
5630 called @code{specpdl}.
5631 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
5633 all catch blocks that the Lisp engine encounters during the evaluation
5634 cause the creation of structs @code{catchtag} inserted in the list
@code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
5636 are freshly created objects and therefore have to be marked.
5637 @xref{Catch and Throw}.
5639 every function application pushes new structs @code{backtrace}
5640 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
5641 parts that have to be marked are the fields for each function
5642 (@code{function}) and all their arguments (@code{args}).
5645 all objects that are used by the redisplay engine that must not be freed
5646 are marked by a special function called @code{mark_redisplay} (in
5647 @code{redisplay.c}).
5649 all objects created for profiling purposes are allocated by C functions
instead of using the Lisp allocation mechanisms. So that the right ones
survive the sweep phase, they also have to be marked
manually. That is done by the function @code{mark_profiling_info}.
Hash tables in SXEmacs belong to a special kind of object that
makes use of a concept often called ``weak pointers''.
To make a long story short, these pointers are not followed
while determining the live objects during garbage collection.
Any object referenced only by weak pointers is collected
anyway, and the reference to it is cleared. Hash tables use weak
pointers in several patterns, manifesting in different types of hash
tables, namely ``non-weak'', ``weak'', ``key-weak'' and ``value-weak''
(internally also ``key-car-weak'' and ``value-car-weak'') hash tables, each
clearing entries depending on different conditions. More information can
be found in the documentation of the function @code{make-hash-table}.
5667 Because there are complicated dependency rules about when and what to
5668 mark while processing weak hash tables, the standard @code{marker}
5669 method is only active if it is marking non-weak hash tables. As soon as
a weak component is in the table, the hash table entries are ignored
while marking. Instead, they are marked separately by the
function @code{finish_marking_weak_hash_tables}. This function iterates
over each hash table entry @code{hentries} for each weak hash table in
@code{Vall_weak_hash_tables}. Depending on the type of a table, the
appropriate action is performed.
If a table is acting as @code{HASH_TABLE_KEY_WEAK} and a key is already marked,
everything reachable from the @code{value} component is marked. If it is
acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is
already marked, marking starts only from the
@code{key} component.
5681 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
5682 of the key entry is already marked, we mark both the @code{key} and
5683 @code{value} components.
5684 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
5685 and the car of the value components is already marked, again both the
5686 @code{key} and the @code{value} components get marked.
Similarly, there are lists with comparable properties called weak
lists. They come in different types called
@code{simple}, @code{assoc}, @code{key-assoc} and
@code{value-assoc}. You can find further details about them in the
documentation of the function @code{make-weak-list}. The scheme of their
marking is similar: all weak lists are listed in @code{Vall_weak_lists},
therefore we iterate over them. The marking is advanced until we hit an
already marked pair. Then we know that during a former run all
5696 the rest has been marked completely. Again, depending on the special
5697 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
5698 and the elem is marked, we mark the @code{cons} part. If it is a
5699 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
5700 cdr, we mark the @code{cons} and the @code{elem}. If it is a
5701 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
5702 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
5703 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
5704 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
Since marking objects reachable from weak hash tables and weak lists
can cause other weak entries to become eligible for marking, both
finishing functions are rerun as long as they mark any
previously unmarked objects.
5712 After completing the special marking for the weak hash tables and for the weak
5713 lists, all entries that point to objects that are going to be swept in
5714 the further process are useless, and therefore have to be removed from
5715 the table or the list.
5717 The function @code{prune_weak_hash_tables} does the job for weak hash
5718 tables. Totally unmarked hash tables are removed from the list
@code{Vall_weak_hash_tables}. The other ones are treated more carefully
by scanning over all entries and removing each entry whose @code{key}
or @code{value} component is unmarked.
5723 The same idea applies to the weak lists. It is accomplished by
5724 @code{prune_weak_lists}: An unmarked list is pruned from
5725 @code{Vall_weak_lists} immediately. A marked list is treated more
5726 carefully by going over it and removing just the unmarked pairs.
5729 The function @code{prune_specifiers} checks all listed specifiers held
in @code{Vall_specifiers} and removes from the list the ones that are
unmarked.
5734 All syntax tables are stored in a list called
5735 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
5736 through it and unlinks the tables that are unmarked.
Next comes the sweep phase proper, performed by the function
@code{gc_sweep}.
First, all the variables with respect to garbage collection are
reset. @code{consing_since_gc}, the counter of the created cells since
the last garbage collection, is set back to 0, and
@code{gc_in_progress} is set to false.
In case the session is interactive, the displayed cursor and message are
removed again.
The state of @code{gc_currently_forbidden} is restored to the former value by
5751 unwinding the stack.
5753 A small memory reserve is always held back that can be reached by
@code{breathing_space}. If nothing more is left, we create a new reserve
and exit.
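The interplay between @code{finish_marking_weak_hash_tables} and
@code{prune_weak_hash_tables} for a key-weak table can be sketched with
objects represented as small integers. Everything here (the
@code{marked} and @code{refs} arrays, @code{struct entry}, and the
function names) is a hypothetical model; it only shows why the
finishing step has to be repeated until it marks nothing new, and how
entries are pruned afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NOBJ 6

static bool marked[NOBJ];
/* Each object has at most one outgoing reference; -1 means none. */
static int refs[NOBJ] = { -1, -1, -1, -1, -1, -1 };

struct entry {
	int key, value;
	bool live;           /* false once the entry has been pruned */
};

/* Mark O and everything reachable from it; report whether anything
   was newly marked. */
static bool mark_object(int o)
{
	bool changed = false;

	while (o >= 0) {
		assert(o < NOBJ);
		if (marked[o])
			break;
		marked[o] = true;
		changed = true;
		o = refs[o];
	}
	return changed;
}

/* Key-weak rule: if the key is already marked, mark the value.
   Returns true if anything new got marked, so the caller loops. */
static bool finish_marking_key_weak(struct entry *t, size_t n)
{
	bool did_mark = false;

	for (size_t i = 0; i < n; i++)
		if (t[i].live && marked[t[i].key] && mark_object(t[i].value))
			did_mark = true;
	return did_mark;
}

/* After marking: drop entries whose key or value is still unmarked. */
static void prune_key_weak(struct entry *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].live && (!marked[t[i].key] || !marked[t[i].value]))
			t[i].live = false;
}
```

An entry whose key only becomes marked because of another entry's value
is handled on a later pass, which is why a single pass is not enough.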
5759 @subsection @code{mark_object}
5760 @cindex @code{mark_object}
5762 The first thing that is checked while marking an object is whether the
5763 object is a real Lisp object @code{Lisp_Type_Record} or just an integer
5764 or a character. Integers and characters are the only two types that are
stored directly, without another level of indirection, and therefore they
5766 don't have to be marked and collected.
5767 @xref{How Lisp Objects Are Represented in C}.
The second case is the one we have to handle: we are
dealing with a pointer to a Lisp object. But even then, there are three
cases in which nothing is done while marking. The
object may be read-only (@code{C_READONLY_RECORD_HEADER}), which
prevents it from being garbage collected, so it need not be marked.
The object in question may already be marked, and need not be marked a
second time (checked by @code{MARKED_RECORD_HEADER_P}). Or it may be a
special, unmarkable object (@code{UNMARKABLE_RECORD_HEADER_P});
apparently, these are objects that sit in some const space and
therefore cannot be marked; see @code{this_one_is_unmarkable} in
@code{alloc.c}.
Otherwise, the object is actually marked. The macro
@code{MARK_RECORD_HEADER} marks the object itself (actually it sets the
special flag in the lrecord header), and then the object's
@code{marker} method is called, if it has one. The marker method marks
every other object reachable from the current object. Note that, to
limit the recursion depth, a marker method should not call
@code{mark_object} on all of its sub-objects; instead, it should return
one of them, from which further marking is then performed.
5789 In case another object was returned, as mentioned before, we reiterate
5790 the whole @code{mark_object} process beginning with this next object.
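The following C fragment is a minimal sketch of this loop, not the
SXEmacs implementation. All names in it (@code{sketch_obj},
@code{sketch_mark_object}, @code{mark_pair}) are hypothetical; the mark
and read-only flags are plain struct fields, and a null pointer stands
in for integers and characters, which need no marking. The pair marker
marks its first field by calling the marking function and hands its
second field back, so walking a long chain does not deepen the C stack.

```c
#include <stddef.h>

/* Hypothetical object model, for illustration only. */
struct sketch_obj;
typedef struct sketch_obj *(*sketch_marker)(struct sketch_obj *);

struct sketch_obj {
    int marked;              /* stands in for the mark bit in the header */
    int c_readonly;          /* read-only objects are never marked */
    sketch_marker marker;    /* NULL: nothing further to mark */
    struct sketch_obj *car;  /* fields used by the pair-like type */
    struct sketch_obj *cdr;
};

static void sketch_mark_object(struct sketch_obj *obj);

/* Marker for a pair: mark one field directly and return the other,
   so that a long cdr chain is walked by the caller's loop. */
static struct sketch_obj *mark_pair(struct sketch_obj *pair)
{
    sketch_mark_object(pair->car);
    return pair->cdr;
}

static void sketch_mark_object(struct sketch_obj *obj)
{
    /* Iterate on the object returned by the marker instead of recursing. */
    while (obj != NULL) {
        if (obj->c_readonly || obj->marked)
            return;                 /* nothing to do for this object */
        obj->marked = 1;            /* MARK_RECORD_HEADER analog */
        obj = obj->marker ? obj->marker(obj) : NULL;
    }
}
```

Because the returned object is processed by the loop, a chain of a
hundred thousand pairs is marked without recursion along the chain;
recursion still happens through the first field, just as it would for
a sub-object that a real marker method marks directly.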
5793 @subsection @code{gc_sweep}
5794 @cindex @code{gc_sweep}
5796 The job of this function is to free all unmarked records from memory. As
5797 we know, there are different types of objects implemented and managed, and
5798 consequently different ways to free them from memory.
5799 @xref{Introduction to Allocation}.
We start with all objects stored as @code{lcrecords}. All bulkier
objects are allocated and handled using this scheme: each object is
@code{malloc}ed separately instead of being placed in one of the
contiguous frob blocks. The types currently allocated with
@code{alloc_lcrecord} and @code{make_lcrecord_list} are: vectors,
buffers, char-table, char-table-entry, console, weak-list, database,
device, ldap, hash-table, command-builder, extent-auxiliary,
extent-info, face, coding-system, frame, image-instance, glyph,
popup-data, gui-item, keymap, charset, color_instance, font_instance,
opaque, opaque-list, process, range-table, specifier,
symbol-value-buffer-local, symbol-value-lisp-magic,
symbol-value-varalias, toolbar-button, window, and
window-configuration. We take care of them first so that the items
stored in them can be handled and finalized more easily. The function
@code{sweep_lcrecords_1}, described below, does the whole job.
@xref{lrecords}, for a description of the internals.
Our next candidates are the objects that behave quite differently from
everything else: the strings. They consist of two parts: a fixed-size
portion (@code{struct Lisp_String}) holding the string's length, its
property list, and a pointer to the second part; and the actual string
data, which is stored in string-chars blocks comparable to frob blocks.
In these blocks, the data is not only freed, but the resulting holes
are also compacted, i.e. all live strings are moved next to each other.
@xref{String}. This compaction is performed by the function
@code{compact_string_chars}, and the actual sweeping by the function
@code{sweep_strings}; both are described below.
After that, the other types are swept one by one using the functions
@code{sweep_conses}, @code{sweep_bit_vectors_1},
@code{sweep_compiled_functions}, @code{sweep_floats},
@code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers}, and
@code{sweep_events}. Apart from bit vectors, these are the fixed-size
types cons, float, compiled-function, symbol, marker, extent, and event,
which are stored in so-called ``frob blocks''; therefore the same
procedure works for every one of them, using the same macros, which are
defined solely to handle fixed-size blocks. The only fixed-size type
not handled here is the fixed-size portion of strings, because it was
taken care of earlier.
The only big exception is bit vectors, which are stored differently and
are therefore handled separately by the function
@code{sweep_bit_vectors_1}, described below.
First, we need some background on how these fixed-size types are
managed in general, in order to understand how the sweeping is done.
They all have a fixed size and are therefore stored in big blocks of
memory, each allocated at once and able to hold a certain number of
objects of one type. The macro @code{DECLARE_FIXED_TYPE_ALLOC} creates
the suitable structures for every type. More precisely, we have the
block struct (holding a pointer to the previous block, @code{prev}, and
the objects in @code{block[]}), a pointer to the current block
(@code{current_..._block}) and its last index
(@code{current_..._block_index}), and a pointer to the free list that
will be created. In addition, a macro @code{FIXED_TYPE_FROM_BLOCK} and
some related macros exist that are used to obtain a new object, either
from the free list (@code{ALLOCATE_FIXED_TYPE_1}) if there is an unused
object of that type, or from the current block, allocating a completely
new block if necessary (@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}).
The sweep functions themselves work as follows: each of them defines a
macro @code{UNMARK_...} that is used to unmark the object, and a macro
@code{ADDITIONAL_FREE_...} that describes any additional work needed
when an object goes from in use to not in use (markers, for instance,
use it to unchain themselves from their buffer). Then they all call the
macro @code{SWEEP_FIXED_TYPE_BLOCK}, instantiated with their type name
and their struct name.
In particular, this macro does the following: it goes over all blocks,
starting with the current one and moving towards the oldest. For each
block, it looks at every object in it. If the object is already free
(checked with @code{FREE_STRUCT_P} using the first pointer of the
object) or is read-only (@code{C_READONLY_RECORD_HEADER_P}), nothing
needs to be done. If it is unmarked (checked with
@code{MARKED_RECORD_HEADER_P}), it is put on the free list and freed
(using the macro @code{FREE_FIXED_TYPE}); otherwise it stays in the
block but is unmarked (by @code{UNMARK_...}). While going through a
block, we note whether the whole block is empty. If so, the whole block
is freed (using @code{xfree}) and the free list is reset to the state it
had before this block was handled.
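The sweep of the fixed-size blocks can be sketched in C as follows.
This is an illustration under simplifying assumptions, not the
@code{SWEEP_FIXED_TYPE_BLOCK} macro itself: the structures
@code{fixed_obj} and @code{fixed_block} are hypothetical, a plain
@code{is_free} field stands in for @code{FREE_STRUCT_P}, every block is
assumed to be full, and the free list is rebuilt from scratch during the
sweep, with already-free objects simply chained onto it again.

```c
#include <stdlib.h>

#define OBJS_PER_BLOCK 4   /* tiny value, for illustration */

struct fixed_obj {
    int marked;                  /* set during the mark phase */
    int c_readonly;              /* read-only objects always survive */
    int is_free;                 /* stands in for FREE_STRUCT_P */
    struct fixed_obj *next_free; /* free-list chain */
};

struct fixed_block {
    struct fixed_block *prev;    /* next older block */
    struct fixed_obj objs[OBJS_PER_BLOCK];
};

/* Sweep all blocks, newest first.  Rebuilds *free_list, unmarks the
   survivors, and releases blocks that hold no live objects.
   Returns the number of objects still in use. */
static size_t sweep_fixed_blocks(struct fixed_block **current,
                                 struct fixed_obj **free_list)
{
    struct fixed_block **link = current;
    size_t in_use_total = 0;

    *free_list = NULL;
    while (*link != NULL) {
        struct fixed_block *b = *link;
        struct fixed_obj *list_before_block = *free_list;
        size_t in_use = 0;

        for (size_t i = 0; i < OBJS_PER_BLOCK; i++) {
            struct fixed_obj *o = &b->objs[i];
            if (o->is_free || (!o->c_readonly && !o->marked)) {
                /* Already free, or garbage: chain it onto the free list. */
                o->is_free = 1;
                o->next_free = *free_list;
                *free_list = o;
            } else {
                /* Survivor: clear its mark for the next cycle. */
                if (!o->c_readonly)
                    o->marked = 0;
                in_use++;
            }
        }

        if (in_use == 0) {
            /* Entire block is garbage: forget its chunks and free it. */
            *free_list = list_before_block;
            *link = b->prev;
            free(b);
        } else {
            in_use_total += in_use;
            link = &b->prev;
        }
    }
    return in_use_total;
}
```

Remembering the free-list head before each block is what makes it cheap
to drop an entirely empty block: all of that block's chunks were pushed
on top of the list, so restoring the saved head removes exactly them.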
5887 @node sweep_lcrecords_1
5888 @subsection @code{sweep_lcrecords_1}
5889 @cindex @code{sweep_lcrecords_1}
After resetting the lcrecord statistics, we go over all lcrecords twice.
They are all chained together in a list whose head is called
@code{all_lcrecords}.
The first loop calls each object's @code{finalizer} method, but only if
the object is not read-only (@code{C_READONLY_RECORD_HEADER_P}), is not
marked (@code{MARKED_RECORD_HEADER_P}), is not already on a free list (a
list of freed objects; field @code{free}), and actually has a finalizer
method.
The second loop iterates through the whole list again and actually
frees the appropriate objects. If an object is read-only or marked, it
has to persist; otherwise it is unlinked and freed by calling
@code{xfree}. During this loop, the lcrecord statistics are kept up to
date by calling @code{tick_lcrecord_stats} with the appropriate
arguments.
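A sketch of the two passes in C, using a hypothetical
@code{sketch_lcrecord} structure with explicit flag fields in place of
the real lcrecord header and macros, and omitting the statistics
bookkeeping. The example finalizer @code{free_extra} is also
hypothetical; it follows the advice given for finalize methods and
zeroes its pointer after freeing it. One plausible benefit of the
separate first pass is that every finalizer runs while all dying
objects still exist.

```c
#include <stdlib.h>

/* Hypothetical lcrecord: each one is malloc()ed separately and
   chained through `next` from the list head. */
struct sketch_lcrecord {
    struct sketch_lcrecord *next;
    int marked;
    int c_readonly;
    int is_free;                               /* already on a free list */
    void (*finalizer)(struct sketch_lcrecord *);
    char *extra;                               /* extra malloc()ed data */
};

/* Example finalizer: release extra memory and zero the pointer. */
static void free_extra(struct sketch_lcrecord *r)
{
    if (r->extra != NULL) {
        free(r->extra);
        r->extra = NULL;
    }
}

/* Two passes over the list.  Returns the number of records freed. */
static size_t sketch_sweep_lcrecords(struct sketch_lcrecord **all)
{
    size_t freed = 0;

    /* Pass 1: finalize every record that is about to be freed. */
    for (struct sketch_lcrecord *r = *all; r != NULL; r = r->next)
        if (!r->c_readonly && !r->marked && !r->is_free && r->finalizer)
            r->finalizer(r);

    /* Pass 2: unlink and free the dead records, unmark the survivors. */
    struct sketch_lcrecord **link = all;
    while (*link != NULL) {
        struct sketch_lcrecord *r = *link;
        if (r->c_readonly || r->marked) {
            r->marked = 0;
            link = &r->next;
        } else {
            *link = r->next;
            free(r);
            freed++;
        }
    }
    return freed;
}
```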
5908 @node compact_string_chars
5909 @subsection @code{compact_string_chars}
5910 @cindex @code{compact_string_chars}
The purpose of this function is to compact all the data parts of the
strings that are held in so-called @code{string_chars_block}s, i.e. the
strings that do not exceed a certain maximum length.
The procedure works as follows. We keep two positions in the
@code{string_chars_block}s, using two pointer/integer pairs:
@code{from_sb}/@code{from_pos} and @code{to_sb}/@code{to_pos}. They hold
the current source and destination positions for copying the string
being handled.
While going over all chained @code{string_chars_block}s and the strings
they hold, starting at @code{first_string_chars_block}, both positions
are advanced and, depending on the status of the string at hand, the
string may be copied from @code{from_sb} to @code{to_sb}.
5927 More precisely, we can distinguish between the following actions.
The string at @code{from_sb}'s position may be marked as free. This is
indicated by an invalid value in the pointer that normally points back
to the fixed-size string object, and is checked by
@code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
pair is advanced to the next string, and nothing has to be copied.
5936 Also, if a string object itself is unmarked, nothing has to be
5937 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5938 pair as described above.
In all other cases, we have a marked string at hand. The string data
must be moved from the from-position to the to-position. If there is
not enough space left in the current @code{to_sb} block, that pointer
is first advanced to the beginning of the next block. If the from and
to positions differ, the data is copied with the library function
@code{memmove}.
After compacting, the pointer to the current
@code{string_chars_block}, stored in @code{current_string_chars_block},
is reset to the last block to which a string was moved, i.e.
@code{to_sb}, and all remaining blocks (which we know hold only
garbage) are explicitly @code{xfree}d.
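The compaction can be illustrated with a simplified C sketch that works
on a single block instead of a chain of @code{string_chars_block}s. The
layout is hypothetical: each string's bytes are preceded by a small
header holding a back-pointer to the fixed-size string object (a null
back-pointer marks freed data, standing in for @code{FREE_STRUCT_P})
and the data length, and entries are padded so that every header stays
aligned. Live strings are slid towards the start of the block with
@code{memmove}, and their data pointers are updated to the new
location.

```c
#include <stddef.h>
#include <string.h>

struct sketch_string {             /* fixed-size part (hypothetical) */
    int marked;
    size_t len;
    char *data;                    /* points into the chars block */
};

struct chars_header {              /* stored in front of each string */
    struct sketch_string *owner;   /* NULL means the data was freed */
    size_t len;
};

#define HDR sizeof(struct chars_header)
/* Pad each entry to a multiple of the header size to keep alignment. */
#define ENTRY_SIZE(n) (HDR + (((n) + HDR - 1) / HDR) * HDR)

struct chars_block {
    size_t fill;                   /* bytes in use */
    struct chars_header space[64]; /* raw, suitably aligned storage */
};

/* Append a string's data; returns -1 if the block is full. */
static int chars_append(struct chars_block *blk, struct sketch_string *s,
                        const char *bytes, size_t len)
{
    unsigned char *base = (unsigned char *)blk->space;
    size_t size = ENTRY_SIZE(len);
    if (blk->fill + size > sizeof blk->space)
        return -1;
    struct chars_header *h = (struct chars_header *)(base + blk->fill);
    h->owner = s;
    h->len = len;
    s->len = len;
    s->data = (char *)(base + blk->fill + HDR);
    memcpy(s->data, bytes, len);
    blk->fill += size;
    return 0;
}

static void compact_chars(struct chars_block *blk)
{
    unsigned char *base = (unsigned char *)blk->space;
    size_t from = 0, to = 0;

    while (from < blk->fill) {
        const struct chars_header *h =
            (const struct chars_header *)(base + from);
        struct sketch_string *s = h->owner;
        size_t size = ENTRY_SIZE(h->len);

        if (s != NULL && s->marked) {
            /* Live string: move header and bytes down, then re-point. */
            if (from != to)
                memmove(base + to, base + from, size);
            s->data = (char *)(base + to + HDR);
            to += size;
        }
        /* Freed or unmarked strings are skipped. */
        from += size;
    }
    blk->fill = to;
}
```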
5955 @subsection @code{sweep_strings}
5956 @cindex @code{sweep_strings}
Sweeping the fixed-size string objects works essentially the same way
as for all other fixed-size types. As before, objects are freed into
the appropriate free list by the macro @code{SWEEP_FIXED_TYPE_BLOCK},
after suitable definitions of @code{UNMARK_string} and
@code{ADDITIONAL_FREE_string}. These two definitions are slightly
special compared to the ones used for the other fixed-size types.
@code{UNMARK_string} is defined the same way, except for some
additional code that updates the bookkeeping information.
@code{ADDITIONAL_FREE_string} has extra work to do: if the string was
not allocated in a @code{string_chars_block} because it exceeded the
maximum length, and was therefore @code{malloc}ed separately, its data
has to be released with @code{xfree} as well.
5975 @node sweep_bit_vectors_1
5976 @subsection @code{sweep_bit_vectors_1}
5977 @cindex @code{sweep_bit_vectors_1}
Bit vectors are another of the rare types that are @code{malloc}ed
individually. Consequently, while sweeping, all bit vectors that are no
longer needed must be freed by hand. This is done the expected way:
since they are all registered in a list called @code{all_bit_vectors},
all elements of that list are traversed; unmarked bit vectors are
unlinked and released with @code{xfree}, and the surviving ones are
unmarked. In addition, the bookkeeping information reported by the
garbage collector is updated.
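A sketch of this traversal in C, assuming a hypothetical
@code{sketch_bitvec} structure that stores its bits in a flexible array
member and is chained through a @code{next} field. Walking the list
through a pointer to the link field lets unmarked vectors be unlinked
without keeping a separate ``previous'' pointer.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical, individually malloc()ed bit vector. */
struct sketch_bitvec {
    struct sketch_bitvec *next;    /* chain of all bit vectors */
    int marked;
    size_t nbits;
    unsigned long bits[];          /* flexible array member */
};

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(n) (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Allocate a zeroed bit vector and register it on the list. */
static struct sketch_bitvec *make_bitvec(size_t nbits,
                                         struct sketch_bitvec **all)
{
    struct sketch_bitvec *v =
        calloc(1, sizeof *v + WORDS_FOR(nbits) * sizeof(unsigned long));
    if (v != NULL) {
        v->nbits = nbits;
        v->next = *all;
        *all = v;
    }
    return v;
}

/* Free unmarked vectors, unmark the survivors, and report how many
   survivors there are and how many bytes they occupy. */
static void sweep_bitvecs(struct sketch_bitvec **all,
                          size_t *live_count, size_t *live_bytes)
{
    *live_count = 0;
    *live_bytes = 0;
    while (*all != NULL) {
        struct sketch_bitvec *v = *all;
        if (!v->marked) {
            *all = v->next;        /* unlink, then free */
            free(v);
        } else {
            v->marked = 0;
            *live_count += 1;
            *live_bytes += sizeof *v + WORDS_FOR(v->nbits) * sizeof(unsigned long);
            all = &v->next;
        }
    }
}
```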
5989 @node Integers and Characters
5990 @section Integers and Characters
5991 @cindex integers and characters
5992 @cindex characters, integers and
5994 Integer and character Lisp objects are created from integers using the
5995 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5996 functions @code{make_int()} and @code{make_char()}. (These are actually
5997 macros on most systems.) These functions basically just do some moving
5998 of bits around, since the integral value of the object is stored
5999 directly in the @code{Lisp_Object}.
6001 @code{XSETINT()} and the like will truncate values given to them that
6002 are too big; i.e. you won't get the value you expected but the tag bits
6003 will at least be correct.
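The bit moving can be sketched as follows. The layout here is an
assumption made for illustration, not necessarily the one SXEmacs uses:
two low-order tag bits, an integer tag value of 1, and the value in the
remaining bits. The names @code{sketch_make_int} and @code{sketch_xint}
are hypothetical counterparts of @code{make_int()} and @code{XINT()}.
Converting back sign-extends the payload with unsigned arithmetic,
which avoids relying on how the compiler shifts negative numbers.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical word layout: two low tag bits, payload in the rest. */
typedef uintptr_t sketch_word;

#define SKETCH_TAG_BITS 2
#define SKETCH_TAG_MASK ((sketch_word)((1u << SKETCH_TAG_BITS) - 1))
#define SKETCH_TAG_INT  ((sketch_word)1)
#define SKETCH_VALBITS  (sizeof(sketch_word) * CHAR_BIT - SKETCH_TAG_BITS)

/* Like XSETINT(): shift the value up and OR in the tag.  Bits shifted
   out of the top are lost, which is where truncation comes from. */
static sketch_word sketch_make_int(intptr_t v)
{
    return ((sketch_word)v << SKETCH_TAG_BITS) | SKETCH_TAG_INT;
}

/* Like XINT(): drop the tag and sign-extend the payload. */
static intptr_t sketch_xint(sketch_word obj)
{
    sketch_word payload = obj >> SKETCH_TAG_BITS;   /* logical shift */
    sketch_word sign = (sketch_word)1 << (SKETCH_VALBITS - 1);
    sketch_word mask = (sign << 1) - 1;

    if (payload & sign)            /* negative payload */
        return -(intptr_t)(~payload & mask) - 1;
    return (intptr_t)payload;
}
```

With this layout, @code{INTPTR_MAX} does not fit in the payload: the
tag bits of the result are still correct, but the value read back is
@minus{}1.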
6005 @node Allocation from Frob Blocks
6006 @section Allocation from Frob Blocks
6007 @cindex allocation from frob blocks
6008 @cindex frob blocks, allocation from
The uninitialized memory required by a @code{Lisp_Object} of a
particular type is allocated using @code{ALLOCATE_FIXED_TYPE()}. This
only occurs inside the
6013 lowest-level object-creating functions in @file{alloc.c}:
6014 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
6015 @code{Fmake_symbol()}, @code{allocate_extent()},
6016 @code{allocate_event()}, @code{Fmake_marker()}, and
6017 @code{make_uninit_string()}. The idea is that, for each type, there are
6018 a number of frob blocks (each 2K in size); each frob block is divided up
6019 into object-sized chunks. Each frob block will have some of these
6020 chunks that are currently assigned to objects, and perhaps some that are
6021 free. (If a frob block has nothing but free chunks, it is freed at the
6022 end of the garbage collection cycle.) The free chunks are stored in a
6023 free list, which is chained by storing a pointer in the first four bytes
6024 of the chunk. (Except for the free chunks at the end of the last frob
6025 block, which are handled using an index which points past the end of the
6026 last-allocated chunk in the last frob block.)
6027 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
6028 free list; if that fails, it calls
6029 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
6030 last frob block for space, and creates a new frob block if there is
6031 none. (There are actually two versions of these macros, one of which is
6032 more defensive but less efficient and is used for error-checking.)
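The allocation path can be sketched in C as follows, for a hypothetical
two-pointer object. The 2K block size is taken from the text above; the
names (@code{frob_alloc}, @code{frob_release}, @code{union frob_chunk})
are invented for this sketch. A union lets a free chunk reuse the
object's own storage for the free-list link, and a block index marks
where the unused tail of the newest block starts.

```c
#include <stdlib.h>

struct frob_cons { void *car; void *cdr; };   /* hypothetical object */

/* A free chunk reuses the object's storage for the chain pointer. */
union frob_chunk {
    struct frob_cons obj;
    union frob_chunk *next_free;
};

#define FROB_BLOCK_BYTES 2048
#define CHUNKS_PER_BLOCK \
    ((FROB_BLOCK_BYTES - sizeof(void *)) / sizeof(union frob_chunk))

struct frob_block {
    struct frob_block *prev;                  /* older block */
    union frob_chunk chunks[CHUNKS_PER_BLOCK];
};

static struct frob_block *current_frob_block;
static size_t current_frob_block_index = CHUNKS_PER_BLOCK;  /* "full" */
static union frob_chunk *frob_free_list;

/* Free list first, then the unused tail of the newest block,
   then a brand-new block. */
static struct frob_cons *frob_alloc(void)
{
    if (frob_free_list != NULL) {
        union frob_chunk *c = frob_free_list;
        frob_free_list = c->next_free;
        return &c->obj;
    }
    if (current_frob_block_index == CHUNKS_PER_BLOCK) {
        struct frob_block *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;       /* xmalloc() would exit non-locally */
        b->prev = current_frob_block;
        current_frob_block = b;
        current_frob_block_index = 0;
    }
    return &current_frob_block->chunks[current_frob_block_index++].obj;
}

/* Put a chunk back on the free list, as sweeping does for garbage. */
static void frob_release(struct frob_cons *p)
{
    union frob_chunk *c = (union frob_chunk *)p;
    c->next_free = frob_free_list;
    frob_free_list = c;
}
```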
6038 [see @file{lrecord.h}]
6040 All lrecords have at the beginning of their structure a @code{struct
6041 lrecord_header}. This just contains a type number and some flags,
6042 including the mark bit. All builtin type numbers are defined as
6043 constants in @code{enum lrecord_type}, to allow the compiler to generate
more efficient code for @code{@var{type}P}. The type number, through the
@code{lrecord_implementations_table}, gives access to a @code{struct
lrecord_implementation}, which is a structure containing method pointers
6047 and such. There is one of these for each type, and it is a global,
6048 constant, statically-declared structure that is declared in the
6049 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
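A miniature of this scheme in C, with invented names throughout
(@code{sketch_lrecord_header}, @code{sketch_implementations_table}, and
so on); the real @code{struct lrecord_header} and @code{struct
lrecord_implementation} contain more fields and methods. Because the
header is the first member, a pointer to the object is also a pointer
to its header, and a type predicate reduces to comparing the type
number with an enumeration constant.

```c
#include <stddef.h>

enum sketch_lrecord_type {
    sketch_lrecord_type_float,
    sketch_lrecord_type_marker,
    sketch_lrecord_type_count
};

struct sketch_lrecord_header {
    unsigned int type : 8;          /* index into the table below */
    unsigned int mark : 1;          /* the GC mark bit */
    unsigned int c_readonly : 1;
};

struct sketch_lrecord_implementation {
    const char *name;               /* Lisp-level type name */
    size_t static_size;
};

struct sketch_float {
    struct sketch_lrecord_header lheader;   /* must come first */
    double value;
};

static const struct sketch_lrecord_implementation sketch_float_impl =
    { "float", sizeof(struct sketch_float) };
static const struct sketch_lrecord_implementation sketch_marker_impl =
    { "marker", 0 };                /* size omitted in this sketch */

static const struct sketch_lrecord_implementation *const
sketch_implementations_table[sketch_lrecord_type_count] = {
    [sketch_lrecord_type_float] = &sketch_float_impl,
    [sketch_lrecord_type_marker] = &sketch_marker_impl,
};

/* A FOOP-style predicate is a comparison against a constant. */
#define SKETCH_FLOATP(h) ((h)->type == sketch_lrecord_type_float)

static const struct sketch_lrecord_implementation *
sketch_implementation(const struct sketch_lrecord_header *h)
{
    return sketch_implementations_table[h->type];
}
```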
6051 Simple lrecords (of type (b) above) just have a @code{struct
6052 lrecord_header} at their beginning. lcrecords, however, actually have a
6053 @code{struct lcrecord_header}. This, in turn, has a @code{struct
6054 lrecord_header} at its beginning, so sanity is preserved; but it also
6055 has a pointer used to chain all lcrecords together, and a special ID
field used to distinguish one lcrecord from another. (This field is used
only for debugging and could be removed, but the space gain is not
significant.)
6060 Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
6061 like for other frob blocks. The only change is that the implementation
6062 pointer must be initialized correctly. (The implementation structure for
6063 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
6064 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
6066 lcrecords are created using @code{alloc_lcrecord()}. This takes a
6067 size to allocate and an implementation pointer. (The size needs to be
6068 passed because some lcrecords, such as window configurations, are of
6069 variable size.) This basically just @code{malloc()}s the storage,
6070 initializes the @code{struct lcrecord_header}, and chains the lcrecord
6071 onto the head of the list of all lcrecords, which is stored in the
variable @code{all_lcrecords}. The calls to @code{alloc_lcrecord()}
generally occur in the lowest-level allocation function for each lrecord
type.
6076 Whenever you create an lrecord, you need to call either
6077 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
6078 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be
6079 specified in a @file{.c} file, at the top level. What this actually
6080 does is define and initialize the implementation structure for the
lrecord. (It may also declare a function @code{error_check_foo()} that
implements the @code{XFOO()} macro when error-checking is enabled.) The
6083 arguments to the macros are the actual type name (this is used to
6084 construct the C variable name of the lrecord implementation structure
6085 and related structures using the @samp{##} macro concatenation
6086 operator), a string that names the type on the Lisp level (this may not
6087 be the same as the C type name; typically, the C type name has
6088 underscores, while the Lisp string has dashes), various method pointers,
6089 and the name of the C structure that contains the object. The methods
6090 are used to encapsulate type-specific information about the object, such
6091 as how to print it or mark it for garbage collection, so that it's easy
6092 to add new object types without having to add a specific case for each
6093 new type in a bunch of different places.
6095 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
6096 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
6097 used for fixed-size object types and the latter is for variable-size
6098 object types. Most object types are fixed-size; some complex
6099 types, however (e.g. window configurations), are variable-size.
6100 Variable-size object types have an extra method, which is called
6101 to determine the actual size of a particular object of that type.
6102 (Currently this is only used for keeping allocation statistics.)
6104 For the purpose of keeping allocation statistics, the allocation
6105 engine keeps a list of all the different types that exist. Note that,
6106 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
6107 specified at top-level, there is no way for it to initialize the global
6108 data structures containing type information, like
@code{lrecord_implementations_table}. For this reason, a call to
@code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file
that contains @code{DEFINE_LRECORD_IMPLEMENTATION}, not at top level but
inside one of the init functions, typically @code{syms_of_@var{foo}()}
in @file{@var{foo}.c}. @code{INIT_LRECORD_IMPLEMENTATION} must be called
before any object of this type is used.
6116 The type number is also used to index into an array holding the number
6117 of objects of each type and the total memory allocated for objects of
6118 that type. The statistics in this array are computed during the sweep
6119 stage. These statistics are returned by the call to
6120 @code{garbage-collect}.
6122 Note that for every type defined with a @code{DEFINE_LRECORD_*()}
6123 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
6124 somewhere in a @file{.h} file, and this @file{.h} file needs to be
6125 included by @file{inline.c}.
6127 Furthermore, there should generally be a set of @code{XFOOBAR()},
6128 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
file. To create one of these, copy an existing model and modify it as
necessary.
6132 @strong{Please note:} If you define an lrecord in an external
6133 dynamically-loaded module, you must use @code{DECLARE_EXTERNAL_LRECORD},
6134 @code{DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION}, and
6135 @code{DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION} instead of the
6136 non-EXTERNAL forms. These macros will dynamically add new type numbers
6137 to the global enum that records them, whereas the non-EXTERNAL forms
6138 assume that the programmer has already inserted the correct type numbers
6139 into the enum's code at compile-time.
6141 The various methods in the lrecord implementation structure are:
6146 A @dfn{mark} method. This is called during the marking stage and passed
6147 a function pointer (usually the @code{mark_object()} function), which is
6148 used to mark an object. All Lisp objects that are contained within the
6149 object need to be marked by applying this function to them. The mark
method should also return a Lisp object, which should be either
@code{nil} or an object to mark. (This can be used in lieu of calling
@code{mark_object()} on that object, to reduce the recursion depth;
consequently, it should be the most heavily nested sub-object, such as a
long list.)
6156 @strong{Please note:} When the mark method is called, garbage collection
6157 is in progress, and special precautions need to be taken when accessing
6158 objects; see section (B) above.
If your mark method does not need to do anything, it can be
@code{NULL}.
6164 A @dfn{print} method. This is called to create a printed representation
6165 of the object, whenever @code{princ}, @code{prin1}, or the like is
6166 called. It is passed the object, a stream to which the output is to be
6167 directed, and an @code{escapeflag} which indicates whether the object's
6168 printed representation should be @dfn{escaped} so that it is
6169 readable. (This corresponds to the difference between @code{princ} and
6170 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
6171 quotes around them and confusing characters in the strings such as
6172 quotes, backslashes, and newlines will be backslashed; and that special
6173 care will be taken to make symbols print in a readable fashion
6174 (e.g. symbols that look like numbers will be backslashed). Other
6175 readable objects should perhaps pass @code{escapeflag} on when
6176 sub-objects are printed, so that readability is preserved when necessary
6177 (or if not, always pass in a 1 for @code{escapeflag}). Non-readable
6178 objects should in general ignore @code{escapeflag}, except that some use
6179 it as an indication that more verbose output should be given.
6181 Sub-objects are printed using @code{print_internal()}, which takes
6182 exactly the same arguments as are passed to the print method.
6184 Literal C strings should be printed using @code{write_c_string()},
6185 or @code{write_string_1()} for non-null-terminated strings.
6187 Functions that do not have a readable representation should check the
6188 @code{print_readably} flag and signal an error if it is set.
6190 If you specify NULL for the print method, the
6191 @code{default_object_printer()} will be used.
6194 A @dfn{finalize} method. This is called at the beginning of the sweep
6195 stage on lcrecords that are about to be freed, and should be used to
6196 perform any extra object cleanup. This typically involves freeing any
6197 extra @code{malloc()}ed memory associated with the object, releasing any
6198 operating-system and window-system resources associated with the object
6199 (e.g. pixmaps, fonts), etc.
6201 The finalize method can be NULL if nothing needs to be done.
WARNING #1: The finalize method is also called at the end of the dump
phase, this time with the @code{for_disksave} parameter set to non-zero. The
6205 object is @emph{not} about to disappear, so you have to make sure to
6206 @emph{not} free any extra @code{malloc()}ed memory if you're going to
6207 need it later. (Also, signal an error if there are any operating-system
6208 and window-system resources here, because they can't be dumped.)
6210 Finalize methods should, as a rule, set to zero any pointers after
6211 they've been freed, and check to make sure pointers are not zero before
6212 freeing. Although I'm pretty sure that finalize methods are not called
6213 twice on the same object (except for the @code{for_disksave} proviso),
6214 we've gotten nastily burned in some cases by not doing this.
6216 WARNING #2: The finalize method is @emph{only} called for
lcrecords, @emph{not} for simple lrecords. If you need a
6218 finalize method for simple lrecords, you have to stick
6219 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
6221 WARNING #3: Things are in an @emph{extremely} bizarre state
6222 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
6223 be incredibly careful when writing one of these functions.
6224 See the comment in @code{gc_sweep()}. If you ever have to add
6225 one of these, consider using an lcrecord or dealing with
6226 the problem in a different fashion.
6229 An @dfn{equal} method. This compares the two objects for similarity,
6230 when @code{equal} is called. It should compare the contents of the
6231 objects in some reasonable fashion. It is passed the two objects and a
6232 @dfn{depth} value, which is used to catch circular objects. To compare
6233 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
by one. If this value gets too high, a @code{circular-object} error is
signaled.
If this is NULL, objects are @code{equal} only when they are @code{eq},
i.e. identical.
6241 A @dfn{hash} method. This is used to hash objects when they are to be
6242 compared with @code{equal}. The rule here is that if two objects are
6243 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
6244 function should use some subset of the sub-fields of the object that are
6245 compared in the ``equal'' method. If you specify this method as
6246 @code{NULL}, the object's pointer will be used as the hash, which will
6247 @emph{fail} if the object has an @code{equal} method, so don't do this.
6249 To hash a sub-Lisp-object, call @code{internal_hash()}. Bump the
6250 depth by one, just like in the ``equal'' method.
6252 To convert a Lisp object directly into a hash value (using
6253 its pointer), use @code{LISP_HASH()}. This is what happens when
6254 the hash method is NULL.
6256 To hash two or more values together into a single value, use
6257 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
6260 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
6261 These are used for object types that have properties. I don't feel like
6262 documenting them here. If you create one of these objects, you have to
6263 use different macros to define them,
6264 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
6265 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
A @dfn{size_in_bytes} method, needed when the object is of variable
size (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro). This
should simply return the object's size in bytes, exactly as you might
expect.
6271 For an example, see the methods for window configurations and opaques.
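To illustrate the rules for the @dfn{equal} and @dfn{hash} methods,
here is a sketch for a hypothetical node type with a label, a weight,
and one sub-object. The depth limit, the combining function
@code{sketch_hash2}, and the return value @minus{}1 in place of
signaling a @code{circular-object} error are all assumptions of this
sketch. The @code{cache} field is ignored by both functions, so two
objects that differ only in it are @code{equal} and hash to the same
value.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DEPTH 200          /* hypothetical recursion limit */

/* Hypothetical object type; `cache` is ignored by both methods. */
struct sketch_node {
    const char *label;
    int weight;
    struct sketch_node *child;        /* sub-object, may be NULL */
    int cache;
};

/* Hypothetical combiner in the spirit of HASH2(). */
static unsigned long sketch_hash2(unsigned long a, unsigned long b)
{
    return a * 65599UL + b;
}

/* Returns 1 if equal, 0 if not, -1 if the depth limit was exceeded
   (where the real code would signal a circular-object error). */
static int sketch_equal(const struct sketch_node *a,
                        const struct sketch_node *b, int depth)
{
    if (depth > SKETCH_MAX_DEPTH)
        return -1;
    if (a == b)
        return 1;
    if (a == NULL || b == NULL)
        return 0;
    if (a->weight != b->weight || strcmp(a->label, b->label) != 0)
        return 0;
    return sketch_equal(a->child, b->child, depth + 1);  /* bump depth */
}

/* Uses only fields that sketch_equal() compares, so equal objects
   always hash to the same value. */
static unsigned long sketch_hash(const struct sketch_node *n, int depth)
{
    unsigned long h = 0;
    if (n == NULL || depth > SKETCH_MAX_DEPTH)
        return 0;
    for (const char *p = n->label; *p != '\0'; p++)
        h = sketch_hash2(h, (unsigned char)*p);
    h = sketch_hash2(h, (unsigned long)n->weight);
    return sketch_hash2(h, sketch_hash(n->child, depth + 1));
}
```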
6274 @node Low-level allocation
6275 @section Low-level allocation
6276 @cindex low-level allocation
6277 @cindex allocation, low-level
6279 Memory that you want to allocate directly should be allocated using
6280 @code{xmalloc()} rather than @code{malloc()}. This implements
6281 error-checking on the return value, and once upon a time did some more
6282 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
6283 Free using @code{xfree()}, and realloc using @code{xrealloc()}. Note
6284 that @code{xmalloc()} will do a non-local exit if the memory can't be
6285 allocated. (Many functions, however, do not expect this, and thus SXEmacs
6286 will likely crash if this happens. @strong{This is a bug.} If you can,
6287 you should strive to make your function handle this OK. However, it's
6288 difficult in the general circumstance, perhaps requiring extra
6289 unwind-protects and such.)
6291 Note that SXEmacs provides two separate replacements for the standard
6292 @code{malloc()} library function. These are called @dfn{old GNU malloc}
6293 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
6294 respectively. New GNU malloc is better in pretty much every way than
6295 old GNU malloc, and should be used if possible. (It used to be that on
6296 some systems, the old one worked but the new one didn't. I think this
6297 was due specifically to a bug in SunOS, which the new one now works
6298 around; so I don't think the old one ever has to be used any more.) The
6299 primary difference between both of these mallocs and the standard system
6300 malloc is that they are much faster, at the expense of increased space.
6301 The basic idea is that memory is allocated in fixed chunks of powers of
6302 two. This allows for basically constant malloc time, since the various
6303 chunks can just be kept on a number of free lists. (The standard system
6304 malloc typically allocates arbitrary-sized chunks and has to spend some
6305 time, sometimes a significant amount of time, walking the heap looking
6306 for a free block to use and cleaning things up.) The new GNU malloc
6307 improves on things by allocating large objects in chunks of 4096 bytes
6308 rather than in ever larger powers of two, which results in ever larger
wastage. There is a slight speed loss here, but it's of doubtful
significance.
6312 NOTE: Apparently there is a third-generation GNU malloc that is
6313 significantly better than the new GNU malloc, and should probably
6314 be included in SXEmacs.
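The difference between the two rounding rules described above can be
made concrete with a small calculation. The functions below are a
hypothetical illustration, not code from @file{malloc.c} or
@file{gmalloc.c}: small requests are rounded up to a power of two with
an assumed minimum of 8 bytes, each power having its own free list,
while requests larger than 4096 bytes are rounded up to a multiple of
4096.

```c
#include <stddef.h>

#define SKETCH_MIN_CHUNK 8            /* assumed smallest chunk */
#define SKETCH_PAGE 4096

/* Size actually handed out for a request of n bytes. */
static size_t sketch_rounded_size(size_t n)
{
    if (n > SKETCH_PAGE)
        return ((n + SKETCH_PAGE - 1) / SKETCH_PAGE) * SKETCH_PAGE;
    size_t size = SKETCH_MIN_CHUNK;
    while (size < n)
        size <<= 1;                   /* next power of two */
    return size;
}

/* Index of the free list a small request is served from
   (8 -> 0, 16 -> 1, ..., 4096 -> 9); -1 for the page-based path. */
static int sketch_bucket(size_t n)
{
    size_t size = sketch_rounded_size(n);
    int index = 0;
    if (size > SKETCH_PAGE)
        return -1;
    while (size > SKETCH_MIN_CHUNK) {
        size >>= 1;
        index++;
    }
    return index;
}
```

For example, a 9000-byte request occupies 12288 bytes under the
page-multiple rule, whereas rounding up to the next power of two would
take 16384.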
6316 There is also the relocating allocator, @file{ralloc.c}. This actually
moves blocks of memory around so that the @code{sbrk()} pointer can be
shrunk and virtual memory released back to the system. On some systems,
6319 this is a big win. On all systems, it causes a noticeable (and
6320 sometimes huge) speed penalty, so I turn it off by default.
6321 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
There are also two versions of @file{ralloc.c}, one of which uses @code{mmap()}
6323 rather than block copies to move data around. This purports to
6324 be faster, although that depends on the amount of data that would
6325 have had to be block copied and the system-call overhead for
6326 @code{mmap()}. I don't know exactly how this works, except that the
6327 relocating-allocation routines are pretty much used only for
6328 the memory allocated for a buffer, which is the biggest consumer
6329 of space, esp. of space that may get freed later.
6331 Note that the GNU mallocs have some ``memory warning'' facilities.
6332 SXEmacs taps into them and issues a warning through the standard
6333 warning system, when memory gets to 75%, 85%, and 95% full.
6334 (On some systems, the memory warnings are not functional.)
6336 Allocated memory that is going to be used to make a Lisp object
6337 is created using @code{allocate_lisp_storage()}. This just calls
6338 @code{xmalloc()}. It used to verify that the pointer to the memory can
6339 fit into a Lisp word, before the current Lisp object representation was
6340 introduced. @code{allocate_lisp_storage()} is called by
6341 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector
6342 and bit-vector creation routines. These routines also call
6343 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps
6344 statistics on how much memory is allocated, so that garbage-collection
6345 can be invoked when the threshold is reached.
6351 Conses are allocated in standard frob blocks. The only thing to
6352 note is that conses can be explicitly freed using @code{free_cons()}
6353 and associated functions @code{free_list()} and @code{free_alist()}. This
6354 immediately puts the conses onto the cons free list, and decrements
6355 the statistics on memory allocation appropriately. This is used
6356 to good effect by some extremely commonly-used code, to avoid
6357 generating extra objects and thereby triggering GC sooner.
6358 However, you have to be @emph{extremely} careful when doing this.
If you mess this up, you will get BADLY BURNED, and it has happened
before.
6366 As mentioned above, each vector is @code{malloc()}ed individually, and
6367 all are threaded through the variable @code{all_vectors}. Vectors are
6368 marked strangely during garbage collection, by kludging the size field.
6369 Note that the @code{struct Lisp_Vector} is declared with its
6370 @code{contents} field being a @emph{stretchy} array of one element. It
6371 is actually @code{malloc()}ed with the right size, however, and access
6372 to any element through the @code{contents} array works fine.
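
The ``stretchy array'' idiom can be sketched as below. The sketch swaps
in the C99 flexible array member, the standard-conforming form of the
one-element array trick; @code{struct sketch_vector},
@code{all_sketch_vectors} and @code{sketch_make_vector()} are
hypothetical names, and the element type is simplified to @code{long}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct Lisp_Vector.  A C99 flexible array
   member replaces the old one-element "stretchy" array. */
struct sketch_vector {
	struct sketch_vector *next;   /* like the all_vectors chain */
	size_t size;
	long contents[];
};

static struct sketch_vector *all_sketch_vectors = NULL;

static struct sketch_vector *
sketch_make_vector(size_t size, long init)
{
	/* Header plus exactly SIZE elements, in one malloc() call. */
	size_t nbytes = offsetof(struct sketch_vector, contents)
		+ size * sizeof(long);
	struct sketch_vector *v = malloc(nbytes);

	if (v == NULL)
		return NULL;
	v->size = size;
	for (size_t i = 0; i < size; i++)
		v->contents[i] = init;
	v->next = all_sketch_vectors;   /* thread onto the global list */
	all_sketch_vectors = v;
	return v;
}
```
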
6379 Bit vectors work exactly like vectors, except for more complicated
6380 code to access an individual bit, and except for the fact that bit
6381 vectors are lrecords while vectors are not. (The only difference here is
6382 that there's an lrecord implementation pointer at the beginning and the
6383 tag field in bit vector Lisp words is ``lrecord'' rather than
6390 Symbols are also allocated in frob blocks. Symbols in the awful
6391 horrible obarray structure are chained through their @code{next} field.
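
As an illustration of chaining through a @code{next} field, here is a
minimal hash-bucket sketch. The bucket count, the hash function and the
names @code{sketch_intern()} and @code{struct sketch_symbol} are
assumptions for illustration, not the real obarray code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OBARRAY_SIZE 7   /* assumed bucket count; real obarrays are larger */

struct sketch_symbol {
	const char *name;
	struct sketch_symbol *next;   /* chain within one bucket */
};

static struct sketch_symbol *obarray[OBARRAY_SIZE];

static unsigned
sketch_hash(const char *s)
{
	unsigned h = 0;
	while (*s)
		h = h * 31 + (unsigned char) *s++;
	return h % OBARRAY_SIZE;
}

/* Like intern: return the existing symbol, or create one and push it
   onto the front of its bucket's chain. */
static struct sketch_symbol *
sketch_intern(const char *name)
{
	unsigned b = sketch_hash(name);

	for (struct sketch_symbol *s = obarray[b]; s != NULL; s = s->next)
		if (strcmp(s->name, name) == 0)
			return s;

	struct sketch_symbol *sym = malloc(sizeof *sym);
	if (sym == NULL)
		return NULL;
	sym->name = name;   /* the sketch does not copy the name */
	sym->next = obarray[b];
	obarray[b] = sym;
	return sym;
}
```
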
6393 Remember that @code{intern} looks up a symbol in an obarray, creating
Markers are allocated in frob blocks, as usual. Each buffer keeps its
markers unordered, in a doubly-linked list, so that they can easily be
removed. (Formerly this was a singly-linked list,
6403 but in some cases garbage collection took an extraordinarily
6404 long time due to the O(N^2) time required to remove lots of
6405 markers from a buffer.) Markers are removed from a buffer in
6406 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
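
A sketch of why the doubly-linked list makes removal cheap: unlinking a
marker needs no search for its predecessor. The structures and
functions here are hypothetical simplifications, not the real marker
code.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_marker {
	long charpos;
	struct sketch_marker *prev, *next;
};

struct sketch_buffer {
	struct sketch_marker *markers;   /* unordered list head */
};

static void
sketch_add_marker(struct sketch_buffer *buf, struct sketch_marker *m)
{
	m->prev = NULL;
	m->next = buf->markers;
	if (buf->markers != NULL)
		buf->markers->prev = m;
	buf->markers = m;
}

/* O(1) unlink: both neighbours are directly reachable. */
static void
sketch_remove_marker(struct sketch_buffer *buf, struct sketch_marker *m)
{
	if (m->prev != NULL)
		m->prev->next = m->next;
	else
		buf->markers = m->next;
	if (m->next != NULL)
		m->next->prev = m->prev;
	m->prev = m->next = NULL;
}
```

With a singly-linked list, each removal would need a walk from the head
to find the predecessor, which is what made removing many markers
quadratic.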
As mentioned above, strings are a special case. A string logically
consists of two parts: a fixed-size object (containing the length,
property list, and a pointer to the actual data) and the actual data in
the string. The fixed-size object is a @code{struct Lisp_String} and is
allocated in frob blocks, as usual. The actual data is stored in special
@dfn{string-chars blocks}, which are 8K blocks of memory. Currently
allocated strings are simply laid end to end in these string-chars
blocks, with a pointer back to the @code{struct Lisp_String} stored
before each string in the string-chars block. When a new string
6421 needs to be allocated, the remaining space at the end of the last
6422 string-chars block is used if there's enough, and a new string-chars
6423 block is created otherwise.
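
The layout described above can be sketched as follows, under
simplifying assumptions: @code{struct sketch_string} stands in for
@code{struct Lisp_String}, the back pointer is stored with
@code{memcpy()} to sidestep alignment questions, and each chunk
reserves a terminating NUL byte. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define STRING_CHARS_BLOCK_SIZE 8192
#define ALIGN_UP(n, a) (((n) + (a) - 1) / (a) * (a))

/* Simplified stand-in for struct Lisp_String. */
struct sketch_string {
	size_t size;
	char *data;
};

struct sketch_chars_block {
	struct sketch_chars_block *prev;   /* earlier, already-full blocks */
	size_t pos;                        /* first unused byte */
	char chars[STRING_CHARS_BLOCK_SIZE];
};

static struct sketch_chars_block *current_block = NULL;

/* Back pointer + string bytes + NUL, padded so that the next back
   pointer stays aligned. */
static size_t
sketch_chunk_size(size_t len)
{
	return ALIGN_UP(sizeof(struct sketch_string *) + len + 1,
			sizeof(struct sketch_string *));
}

static char *
sketch_alloc_string_chars(struct sketch_string *s, size_t len)
{
	size_t need = sketch_chunk_size(len);

	/* Larger strings would be "big strings", malloc()ed separately. */
	assert(need <= STRING_CHARS_BLOCK_SIZE);
	if (current_block == NULL
	    || current_block->pos + need > STRING_CHARS_BLOCK_SIZE) {
		struct sketch_chars_block *b = malloc(sizeof *b);
		if (b == NULL)
			return NULL;
		b->prev = current_block;
		b->pos = 0;
		current_block = b;
	}
	char *chunk = current_block->chars + current_block->pos;
	memcpy(chunk, &s, sizeof s);   /* store the back pointer */
	current_block->pos += need;
	s->size = len;
	s->data = chunk + sizeof s;
	s->data[len] = '\0';
	return s->data;
}
```
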
Thanks to the string compaction and relocation that happens at the end
of garbage collection, the string-chars blocks contain no holes once a
collection finishes. During the sweep stage of garbage collection, when
objects are reclaimed, the garbage collector goes through all
string-chars blocks, looking for unused strings. Each chunk of string
data is preceded by a pointer to the corresponding @code{struct
Lisp_String}, which indicates both whether the string is used and how
big the string is, i.e.@: how to get to the next chunk of string data.
Holes are squeezed out by
6433 block-copying the next string into the empty space and relocating the
6434 pointer stored in the corresponding @code{struct Lisp_String}.
6435 @strong{This means you have to be careful with strings in your code.}
6436 See the section above on @code{GCPRO}ing.
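
A minimal sketch of the compaction pass over a single string-chars
block, assuming a @code{live} flag in place of the real GC mark and a
tiny block size; the names are hypothetical. Each chunk's back pointer
yields its size, live chunks are slid down with @code{memmove()}, and
the data pointer in the header is updated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 256   /* tiny block, for illustration only */
#define ALIGN_UP(n, a) (((n) + (a) - 1) / (a) * (a))

struct sk_string {
	size_t size;
	char *data;
	int live;         /* stands in for the GC mark bit */
};

#define PTR_SIZE sizeof(struct sk_string *)

static char block[BLOCK_BYTES];
static size_t block_fill = 0;

static size_t
chunk_bytes(size_t len)
{
	return ALIGN_UP(PTR_SIZE + len + 1, PTR_SIZE);
}

static void
sk_alloc(struct sk_string *s, const char *text)
{
	size_t len = strlen(text);
	char *chunk = block + block_fill;

	assert(block_fill + chunk_bytes(len) <= BLOCK_BYTES);
	memcpy(chunk, &s, PTR_SIZE);                 /* back pointer */
	memcpy(chunk + PTR_SIZE, text, len + 1);
	s->size = len;
	s->data = chunk + PTR_SIZE;
	s->live = 1;
	block_fill += chunk_bytes(len);
}

/* Slide each live chunk down over any holes and fix the data pointer
   in its header.  Dead headers must still be readable here, since
   their size is what tells us where the next chunk starts. */
static void
sk_compact(void)
{
	size_t from = 0, to = 0;

	while (from < block_fill) {
		struct sk_string *s;
		memcpy(&s, block + from, PTR_SIZE);
		size_t n = chunk_bytes(s->size);
		if (s->live) {
			if (to != from)
				memmove(block + to, block + from, n);
			s->data = block + to + PTR_SIZE;
			to += n;
		}
		from += n;
	}
	block_fill = to;
}
```

Any pointer into string data held elsewhere becomes stale after
@code{sk_compact()}, which is the hazard the warning above refers to.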
6438 Note that there is one situation not handled: a string that is too big
6439 to fit into a string-chars block. Such strings, called @dfn{big
strings}, are all @code{malloc()}ed as their own block. (#### It would
make more sense for the big-string threshold to be somewhat lower,
e.g.@: 1/2 or 1/4 of the size of a string-chars block. This appears to
have been the case formerly (the threshold was set at 1/8), but Mly
forgot about this when rewriting things for 19.8.)
6446 Note also that the string data in string-chars blocks is padded as
6447 necessary so that proper alignment constraints on the @code{struct
6448 Lisp_String} back pointers are maintained.
6450 Finally, strings can be resized. This happens in Mule when a
6451 character is substituted with a different-length character, or during
modeline frobbing. (This could also be exported to Lisp, but currently
it is not.) Resizing a string is a potentially tricky process.
If the change is small enough that the padding can absorb it, nothing
other than a simple memory move needs to be done. Keep in mind,
however, that the string can't shrink too much, because the offset to
the next string in the string-chars block is computed by looking at the
length and rounding up to the next multiple of four or eight. If the
6459 string would shrink or expand beyond the correct padding, new string
6460 data needs to be allocated at the end of the last string-chars block and
6461 the data moved appropriately. This leaves some dead string data, which
6462 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
6463 Lisp_String} pointer before the data (there's no real @code{struct
6464 Lisp_String} to point to and relocate), and storing the size of the dead
6465 string data (which would normally be obtained from the now-non-existent
6466 @code{struct Lisp_String}) at the beginning of the dead string data gap.
6467 The string compactor recognizes this special 0xFFFFFFFF marker and
6468 handles it correctly.
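
The resizing protocol can be sketched as below. This is a simplified
model, not the real code: the names are hypothetical, the dead-gap
sentinel is stored in a pointer-sized slot, and a grown string's new
tail bytes are left for the caller to fill. The scan at the end shows
how a compactor skips a gap by reading the sentinel and then the stored
gap size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT sizeof(uintptr_t)
#define ALIGN_UP(n, a) (((n) + (a) - 1) / (a) * (a))
/* Stand-in for the special 0xFFFFFFFF marker meaning "no struct
   Lisp_String here, just dead data". */
#define DEAD_MARKER ((uintptr_t) 0xFFFFFFFFu)

struct rs_string {
	size_t size;
	char *data;
};

static char rs_block[512];
static size_t rs_fill = 0;

static size_t
rs_chunk(size_t len)
{
	return ALIGN_UP(SLOT + len + 1, SLOT);
}

static char *
rs_alloc(struct rs_string *s, size_t len)
{
	uintptr_t back = (uintptr_t) s;
	char *chunk = rs_block + rs_fill;

	assert(rs_fill + rs_chunk(len) <= sizeof rs_block);
	memcpy(chunk, &back, SLOT);       /* back pointer */
	rs_fill += rs_chunk(len);
	s->size = len;
	s->data = chunk + SLOT;
	s->data[len] = '\0';
	return s->data;
}

/* Resize S.  If the padding absorbs the change, adjust in place;
   otherwise move the data to fresh space at the end and leave a
   dead gap behind. */
static void
rs_resize(struct rs_string *s, size_t newlen)
{
	size_t oldchunk = rs_chunk(s->size);
	if (rs_chunk(newlen) == oldchunk) {
		s->size = newlen;
		s->data[newlen] = '\0';
		return;
	}
	char *old = s->data;
	size_t keep = s->size < newlen ? s->size : newlen;
	char *fresh = rs_alloc(s, newlen);   /* updates s->size, s->data */
	memcpy(fresh, old, keep);
	/* Mark the abandoned chunk: sentinel in the back-pointer slot,
	   gap size at the start of the old data (payload >= SLOT bytes). */
	uintptr_t marker = DEAD_MARKER;
	memcpy(old - SLOT, &marker, SLOT);
	memcpy(old, &oldchunk, sizeof oldchunk);
}

/* The compactor's scan, reduced to counting live chunks. */
static int
rs_count_live(void)
{
	int live = 0;
	size_t pos = 0;

	while (pos < rs_fill) {
		uintptr_t slot;
		memcpy(&slot, rs_block + pos, SLOT);
		if (slot == DEAD_MARKER) {
			size_t gap;
			memcpy(&gap, rs_block + pos + SLOT, sizeof gap);
			pos += gap;
		} else {
			live++;
			pos += rs_chunk(((struct rs_string *) slot)->size);
		}
	}
	return live;
}
```
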
6470 @node Compiled Function
6471 @section Compiled Function
6472 @cindex compiled function
6473 @cindex function, compiled
6478 @node Dumping, Events and the Event Loop, Allocation of Objects in SXEmacs Lisp, Top
6482 @section What is dumping and its justification
6483 @cindex dumping and its justification, what is
6485 The C code of SXEmacs is just a Lisp engine with a lot of built-in
6486 primitives useful for writing an editor. The editor itself is written
mostly in Lisp, and represents around 100K lines of code. Loading and
executing the initialization of all this code takes a fair amount of
time (five to ten times the usual startup time of current xemacs) and requires
6490 having all the lisp source files around. Having to reload them each
6491 time the editor is started would not be acceptable.
6493 The traditional solution to this problem is called dumping: the build
6494 process first creates the lisp engine under the name @file{temacs}, then
6495 runs it until it has finished loading and initializing all the lisp
6496 code, and eventually creates a new executable called @file{xemacs}
6497 including both the object code in @file{temacs} and all the contents of
6498 the memory after the initialization.
This solution, while it works, has a huge problem: creating the new
executable from the actual contents of memory is an extremely
system-specific and error-prone process, and it interferes with a lot
of system libraries (like malloc). It is getting even worse nowadays
with libraries that use constructors, which are called automatically
when the program starts (even before @code{main()}) and which tend to
crash when they are called multiple times, once before dumping and
once after (IRIX 6.x @file{libz.so} pulls in some C++ image libraries
through dependencies that have this problem). Writing the dumper is
also one of the most difficult parts of porting SXEmacs to a new
operating system. Basically, ``dumping'' is an operation that is simply
not officially supported on many operating systems.
6513 The aim of the portable dumper is to solve the same problem as the
system-specific dumper, that is, to be able to reload quickly, using only
6515 a small number of files, the fully initialized lisp part of the editor,
6516 without any system-specific hacks.
6520 * Data descriptions::
6523 * Remaining issues::
6528 @cindex dumping overview
6530 The portable dumping system has to:
6534 At dump time, write all initialized, non-quickly-rebuildable data to a
6535 file [Note: currently named @file{xemacs.dmp}, but the name will
change], along with all the information needed for reloading.
6539 When starting xemacs, reload the dump file, relocate it to its new
6540 starting address if needed, and reinitialize all pointers to this
6541 data. Also, rebuild all the quickly rebuildable data.
6544 @node Data descriptions
6545 @section Data descriptions
6546 @cindex dumping data descriptions
The most complex task of the dumper is to be able to write Lisp objects
6549 (lrecords) and C structs to disk and reload them at a different address,
6550 updating all the pointers they include in the process. This is done by
6551 using external data descriptions that give information about the layout
6552 of the structures in memory.
The specification of these descriptions is in @file{lrecord.h}. A
description of an lrecord is an array of @code{struct lrecord_description}.
Each of these structs includes a type, an offset in the structure and
some optional parameters depending on the type. For instance, here is the string
6561 static const struct lrecord_description string_description[] = @{
6562 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @},
6563 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
6564 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @},
The first line indicates a member of type Bytecount, which is used by
the next, indirect directive. The second means ``there is a pointer to
some opaque data in the field @code{data}''. The length of said data is
given by the expression @code{XD_INDIRECT(0, 1)}, which means ``the value
in the 0th line of the description (welcome to C) plus one''. The third
line means ``there is a Lisp_Object member @code{plist} in the Lisp_String
structure''. @code{XD_END} then ends the description.
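
To make the indirection concrete, here is a reduced, hypothetical model
of a description and a walker over it. The @code{SK_}-prefixed names
and the encoding of @code{SK_XD_INDIRECT} are assumptions and do not
match @file{lrecord.h}; only the idea (a count taken from an earlier
line of the description, plus a delta) follows the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical model of the description machinery. */
enum sk_xd_type {
	SK_XD_BYTECOUNT, SK_XD_OPAQUE_DATA_PTR, SK_XD_LISP_OBJECT, SK_XD_END
};

/* "Value of description line LINE plus DELTA" (0 <= DELTA < 256). */
#define SK_INDIRECT_FLAG 0x10000L
#define SK_XD_INDIRECT(line, delta) \
	(SK_INDIRECT_FLAG | ((long) (line) << 8) | (long) (delta))

struct sk_description {
	enum sk_xd_type type;
	size_t offset;
	long data1;
};

typedef void *sk_lisp_object;   /* stand-in for Lisp_Object */

struct sk_string {
	long size;                  /* the Bytecount member */
	unsigned char *data;
	sk_lisp_object plist;
};

static const struct sk_description sk_string_description[] = {
	{ SK_XD_BYTECOUNT,       offsetof(struct sk_string, size),  0 },
	{ SK_XD_OPAQUE_DATA_PTR, offsetof(struct sk_string, data),  SK_XD_INDIRECT(0, 1) },
	{ SK_XD_LISP_OBJECT,     offsetof(struct sk_string, plist), 0 },
	{ SK_XD_END, 0, 0 }
};

/* Resolve a count that may refer to an earlier line of DESC. */
static long
sk_resolve_count(const struct sk_description *desc, const void *obj, long code)
{
	if (!(code & SK_INDIRECT_FLAG))
		return code;
	int line = (int) ((code >> 8) & 0xFF);
	long delta = code & 0xFF;
	assert(desc[line].type == SK_XD_BYTECOUNT);
	return *(const long *) ((const char *) obj + desc[line].offset) + delta;
}

/* Total opaque bytes reachable from OBJ, and its Lisp_Object slots. */
static void
sk_walk(const struct sk_description *desc, const void *obj,
	long *opaque_bytes, int *lisp_slots)
{
	*opaque_bytes = 0;
	*lisp_slots = 0;
	for (int i = 0; desc[i].type != SK_XD_END; i++) {
		switch (desc[i].type) {
		case SK_XD_OPAQUE_DATA_PTR:
			*opaque_bytes += sk_resolve_count(desc, obj, desc[i].data1);
			break;
		case SK_XD_LISP_OBJECT:
			(*lisp_slots)++;
			break;
		default:   /* a Bytecount line only supplies values */
			break;
		}
	}
}
```

For a five-character string the walker reports six opaque bytes (the
size plus one for the trailing NUL) and one Lisp slot.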
6577 This gives us all the information we need to move around what is pointed
6578 to by a structure (C or lrecord) and, by transitivity, everything that
6579 it points to. The only missing information for dumping is the size of
6580 the structure. For lrecords, this is part of the
6581 lrecord_implementation, so we don't need to duplicate it. For C
6582 structures we use a struct struct_description, which includes a size
6583 field and a pointer to an associated array of lrecord_description.
6586 @section Dumping phase
6587 @cindex dumping phase
Dumping is done by calling the function @code{pdump()} (in @file{dumper.c}),
which is invoked from @code{Fdump_emacs} (in @file{emacs.c}). This function performs a number
6594 * Object inventory::
6595 * Address allocation::
6598 * Pointers dumping::
6601 @node Object inventory
6602 @subsection Object inventory
6603 @cindex dumping object inventory
6605 The first task is to build the list of the objects to dump. This
6613 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
6614 of C structs are kept together) which includes a pointer to the first
6615 object of the group, the per-object size and the count of objects in the
6616 group, along with some other information which is initialized later.
6618 These entries are linked together in @code{pdump_entry_list} structures
and can be enumerated through any of:
6623 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
6624 per lrecord type, indexed by type number.
6627 the @code{pdump_opaque_data_list}, used for the opaque data which does
6628 not include pointers, and hence does not need descriptions.
6631 the @code{pdump_struct_table}, which is a vector of
6632 @code{struct_description}/@code{pdump_entry_list} pairs, used for
6633 non-opaque C structures.
6636 This uses a marking strategy similar to the garbage collector. Some
6641 We do not use the mark bit (which does not exist for C structures
6642 anyway); we use a big hash table instead.
6645 We do not use the mark function of lrecords but instead rely on the
6646 external descriptions. This happens essentially because we need to
6647 follow pointers to C structures and opaque data in addition to
6648 Lisp_Object members.
6651 This is done by @code{pdump_register_object()}, which handles Lisp_Object
6652 variables, and @code{pdump_register_struct()} which handles C structures,
6653 which both delegate the description management to @code{pdump_register_sub()}.
The hash table doubles as a map from objects to
@code{pdump_entry_list_elmt}s (i.e.@: it allows us to look up the
@code{pdump_entry_list_elmt} that points to a given object). Entries
are added with @code{pdump_add_entry()} and looked up with
@code{pdump_get_entry()}. There is no need for entry removal. The hash
value is computed quite simply from the object pointer by
@code{pdump_make_hash()}.
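
A minimal sketch of such a pointer-keyed table, assuming open
addressing with linear probing and a fixed table size; whether the real
@code{pdump_get_entry()} works this way is not stated here, and all
names are hypothetical. Because entries are never removed, lookups can
stop at the first empty slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_TABLE_SIZE 64   /* power of two; assumed, never resized */

struct sk_entry {          /* stand-in for pdump_entry_list_elmt */
	const void *obj;
	size_t size;
	int count;
};

static struct sk_entry *sk_table[SK_TABLE_SIZE];

/* Like pdump_make_hash(): derive a slot from the pointer itself,
   dropping low bits that alignment usually keeps at zero. */
static size_t
sk_make_hash(const void *obj)
{
	return ((uintptr_t) obj >> 3) & (SK_TABLE_SIZE - 1);
}

static struct sk_entry *
sk_get_entry(const void *obj)
{
	size_t pos = sk_make_hash(obj);
	while (sk_table[pos] != NULL) {
		if (sk_table[pos]->obj == obj)
			return sk_table[pos];
		pos = (pos + 1) & (SK_TABLE_SIZE - 1);
	}
	return NULL;
}

/* The sketch assumes the table never fills up. */
static void
sk_add_entry(struct sk_entry *e)
{
	size_t pos = sk_make_hash(e->obj);
	while (sk_table[pos] != NULL)
		pos = (pos + 1) & (SK_TABLE_SIZE - 1);
	sk_table[pos] = e;
}
```
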
6662 The roots for the marking are:
6666 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
6667 call for protected variables we do not want to dump).
6670 the variables registered via @code{dump_add_root_object}
6671 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
6672 @code{dump_add_root_object()}).
6675 the variables registered via @code{dump_add_root_struct_ptr}, each of
6676 which points to a C structure.
This does not include the GCPRO'ed variables, the specbinds, the
catchtags, the backlist, the redisplay or the profiling info, since we
do not want to rebuild the actual chain of Lisp calls that led up to
the dump-emacs call, only the global variables.
6684 Weak lists and weak hash tables are dumped as if they were their
6685 non-weak equivalent (without changing their type, of course). This has
6686 not yet been a problem.
6688 @node Address allocation
6689 @subsection Address allocation
6690 @cindex dumping address allocation
6693 The next step is to allocate the offsets of each of the objects in the
6694 final dump file. This is done by @code{pdump_allocate_offset()} which
6695 is called indirectly by @code{pdump_scan_by_alignment()}.
6697 The strategy to deal with alignment problems uses these facts:
real-world alignment requirements are powers of two.
the C compiler is required to adjust the size of a struct so that you
can have an array of them next to each other. This means you can get an
upper bound on the alignment requirement of a given structure by
finding the largest power of two that its size is a multiple of.
6710 the non-variant part of variable size lrecords has an alignment
Hence, for each lrecord type, C struct type or opaque data block, the
6715 alignment requirement is computed as a power of two, with a minimum of
6716 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the
6717 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
6718 first. This ensures the best packing.
6720 The maximum alignment requirement we take into account is 2^8.
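
The alignment rule can be sketched as below, together with the linear
allocation that @code{pdump_allocate_offset()} performs. The names are
hypothetical, sizes are rounded up to their alignment (one way of being
careful with odd-sized lrecords, assumed here), and groups are
processed from the highest alignment down so that every offset stays
aligned.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_ALIGN 256   /* 2^8, the cap stated above */

/* Largest power of two dividing SIZE, clamped to
   [MIN_ALIGN, SK_MAX_ALIGN]. */
static size_t
sk_alignment(size_t size, size_t min_align)
{
	size_t align = size ? (size & (~size + 1)) : SK_MAX_ALIGN;
	if (align > SK_MAX_ALIGN)
		align = SK_MAX_ALIGN;
	if (align < min_align)
		align = min_align;
	return align;
}

/* Hand out offsets linearly, highest alignment first, starting at 256.
   Returns the first unused offset. */
static size_t
sk_allocate_offsets(const size_t *sizes, const size_t *min_aligns,
		    size_t *offsets, int n)
{
	size_t next = 256;

	for (size_t align = SK_MAX_ALIGN; align >= 1; align /= 2) {
		for (int i = 0; i < n; i++) {
			if (sk_alignment(sizes[i], min_aligns[i]) != align)
				continue;
			assert(next % align == 0);
			offsets[i] = next;
			/* Round up so later, less-aligned groups stay aligned. */
			next += (sizes[i] + align - 1) / align * align;
		}
	}
	return next;
}
```
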
6722 @code{pdump_allocate_offset()} only has to do a linear allocation,
6723 starting at offset 256 (this leaves room for the header and keeps the
6727 @subsection The header
6728 @cindex dumping, the header
The next step creates the file and writes a header with a signature and
some miscellaneous information in it. The @code{reloc_address} field, which
6732 indicates at which address the file should be loaded if we want to avoid
6733 post-reload relocation, is set to 0. It then seeks to offset 256 (base
6734 offset for the objects).
6737 @subsection Data dumping
6738 @cindex data dumping
6739 @cindex dumping, data
The data is dumped by @code{pdump_dump_data()}, called from
@code{pdump_scan_by_alignment()}, in the same order in which the
addresses were allocated. This function copies the data to a temporary
buffer, relocates all pointers in the object to the addresses allocated
in the Address Allocation step, and writes it to the file. Using the
same order means that, if we are careful with lrecords whose size is
not a multiple of 4, we are guaranteed that each object is written at
the file offset allocated to it in the Address Allocation step.
6750 @node Pointers dumping
6751 @subsection Pointers dumping
6752 @cindex pointers dumping
6753 @cindex dumping, pointers
Several tables needed to properly reassign the global pointers are then
written. They are:
the @code{pdump_root_struct_ptrs} dynarr
the @code{pdump_opaques} dynarr
6764 a vector of all the offsets to the objects in the file that include a
6765 description (for faster relocation at reload time)
the @code{pdump_root_objects} and @code{pdump_weak_object_chains} dynarrs.
6770 For each of the dynarrs we write both the pointer to the variables and
6771 the relocated offset of the object they point to. Since these variables
6772 are global, the pointers are still valid when restarting the program and
6773 are used to regenerate the global pointers.
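
At reload time, each saved pair (variable address, object offset) is
enough to point the global back into the loaded image. A minimal
sketch, with a hypothetical @code{struct sk_root} standing in for the
dynarr entries:

```c
#include <assert.h>
#include <stddef.h>

/* One saved pair: where a global variable lives, and the file offset
   of the object it pointed to at dump time. */
struct sk_root {
	void **varaddress;
	size_t offset;
};

/* Point each global back into the image loaded at IMAGE_BASE.  The
   variable addresses are still valid because the globals live in the
   executable, not in the dump file. */
static void
sk_restore_roots(const struct sk_root *roots, int n, char *image_base)
{
	for (int i = 0; i < n; i++)
		*roots[i].varaddress = image_base + roots[i].offset;
}
```
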
6775 The @code{pdump_weak_object_chains} dynarr is a special case. The
variables it points to are the heads of weak linked lists of Lisp objects
6777 of the same type. Not all objects of this list are dumped so the
6778 relocated pointer we associate with them points to the first dumped
6779 object of the list, or Qnil if none is available. This is also the
6780 reason why they are not used as roots for the purpose of object
Some very important information, like the @code{staticpros} and
@code{lrecord_implementations_table}, is handled indirectly using
@code{dump_add_opaque} or @code{dump_add_root_struct_ptr}.
6787 This is the end of the dumping part.
6789 @node Reloading phase
6790 @section Reloading phase
6791 @cindex reloading phase
6792 @cindex dumping, reloading phase
6794 @subsection File loading
6795 @cindex dumping, file loading
The file is mmap'ed into memory (which ensures PAGESIZE alignment, at
least 4096 bytes) or, if mmap is unavailable or fails, a 256-byte-aligned
block is malloc'ed and the file is read into it.
6801 Some variables are reinitialized from the values found in the header.
6803 The difference between the actual loading address and the reloc_address
6804 is computed and will be used for all the relocations.
6807 @subsection Putting back the pdump_opaques
6808 @cindex dumping, putting back the pdump_opaques
6810 The memory contents are restored in the obvious and trivial way.
6813 @subsection Putting back the pdump_root_struct_ptrs
6814 @cindex dumping, putting back the pdump_root_struct_ptrs
6816 The variables pointed to by pdump_root_struct_ptrs in the dump phase are
6817 reset to the right relocated object addresses.
6820 @subsection Object relocation
6821 @cindex dumping, object relocation
6823 All the objects are relocated using their description and their offset
6824 by @code{pdump_reloc_one}. This step is unnecessary if the
6825 reloc_address is equal to the file loading address.
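
A minimal sketch of per-object relocation, assuming that dumped pointer
fields hold file offsets (the header's @code{reloc_address} is 0) and
that a null field stays null; @code{struct sk_reloc_desc} and
@code{sk_reloc_one()} are hypothetical, much-reduced stand-ins for the
real descriptions and @code{pdump_reloc_one}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced description: offsets of the pointer fields in an object. */
struct sk_reloc_desc {
	size_t n_ptrs;
	size_t ptr_offsets[4];
};

/* Add DELTA (load address minus reloc_address) to every non-null
   pointer field named by DESC. */
static void
sk_reloc_one(char *obj, const struct sk_reloc_desc *desc, ptrdiff_t delta)
{
	if (delta == 0)
		return;   /* loaded where expected: nothing to do */
	for (size_t i = 0; i < desc->n_ptrs; i++) {
		uintptr_t p;
		memcpy(&p, obj + desc->ptr_offsets[i], sizeof p);
		if (p != 0)
			p += (uintptr_t) delta;
		memcpy(obj + desc->ptr_offsets[i], &p, sizeof p);
	}
}
```
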
6828 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains
6829 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains
This works the same way as putting back the @code{pdump_root_struct_ptrs}.
6834 @subsection Reorganize the hash tables
6835 @cindex dumping, reorganize the hash tables
Since some of the hash values in the Lisp hash tables are
address-dependent, their layout is now wrong. So we go through each of
them and have them reorganized by calling @code{pdump_reorganize_hash_table}.
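
A minimal sketch of such a reorganization for an address-keyed table
with linear probing. The table layout, hash function and names are
hypothetical; the point is that every entry is re-placed using a hash
computed from its new address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_HT_SIZE 8

struct sk_hentry {
	const void *key;   /* NULL means empty */
	int value;
};

static size_t
sk_addr_hash(const void *key)
{
	return ((uintptr_t) key >> 3) % SK_HT_SIZE;
}

/* Entries sit where their pre-relocation addresses hashed; clear the
   table and insert every entry again at its new position. */
static void
sk_reorganize(struct sk_hentry *table)
{
	struct sk_hentry old[SK_HT_SIZE];

	for (size_t i = 0; i < SK_HT_SIZE; i++) {
		old[i] = table[i];
		table[i].key = NULL;
		table[i].value = 0;
	}
	for (size_t i = 0; i < SK_HT_SIZE; i++) {
		if (old[i].key == NULL)
			continue;
		size_t pos = sk_addr_hash(old[i].key);
		while (table[pos].key != NULL)
			pos = (pos + 1) % SK_HT_SIZE;
		table[pos] = old[i];
	}
}

static const struct sk_hentry *
sk_lookup(const struct sk_hentry *table, const void *key)
{
	size_t pos = sk_addr_hash(key);

	for (size_t n = 0; n < SK_HT_SIZE && table[pos].key != NULL; n++) {
		if (table[pos].key == key)
			return &table[pos];
		pos = (pos + 1) % SK_HT_SIZE;
	}
	return NULL;
}
```
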
6841 @node Remaining issues
6842 @section Remaining issues
6843 @cindex dumping, remaining issues
The build process will have to start a post-dump xemacs, ask it for the
loading address (which will, hopefully, always be the same between
different xemacs invocations) and relocate the file to the new address.
6848 This way the object relocation phase will not have to be done, which
6849 means no writes in the objects and that, because of the use of mmap, the
6850 dumped data will be shared between all the xemacs running on the
Some executable signature will be necessary to ensure that a given dump
file is really associated with a given executable, or random crashes
will occur. This could be a random number set at compile or configure
time through a define. It would also allow for having
differently-compiled xemacsen on the same system (mule and no-mule come
to mind).
6859 The DOC file contents should probably end up in the dump file.
6862 @node Events and the Event Loop, Asynchronous Events; Quit Checking, Dumping, Top
6863 @chapter Events and the Event Loop
6864 @cindex events and the event loop
6865 @cindex event loop, events and the
6868 * Introduction to Events::
6870 * Specifics of the Event Gathering Mechanism::
6871 * Specifics About the Emacs Event::
6873 * Event Stream Callback Routines::
6874 * Other Event Loop Functions::
6876 * Converting Events::
6877 * Dispatching Events; The Command Builder::
6879 * Editor-Level Control Flow Modules::
6882 @node Introduction to Events
6883 @section Introduction to Events
6884 @cindex events, introduction to
6886 An event is an object that encapsulates information about an
6887 interesting occurrence in the operating system. Events are
6888 generated either by user action, direct (e.g. typing on the
6889 keyboard or moving the mouse) or indirect (moving another
6890 window, thereby generating an expose event on an Emacs frame),
6891 or as a result of some other typically asynchronous action happening,
6892 such as output from a subprocess being ready or a timer expiring.
6893 Events come into the system in an asynchronous fashion (typically
6894 through a callback being called) and are converted into a
6895 synchronous event queue (first-in, first-out) in a process that
6896 we will call @dfn{collection}.
6898 Note that each application has its own event queue. (It is
6899 immaterial whether the collection process directly puts the
6900 events in the proper application's queue, or puts them into
6901 a single system queue, which is later split up.)
6903 The most basic level of event collection is done by the
6904 operating system or window system. Typically, SXEmacs does
6905 its own event collection as well. Often there are multiple
6906 layers of collection in SXEmacs, with events from various
6907 sources being collected into a queue, which is then combined
6908 with other sources to go into another queue (i.e. a second
6909 level of collection), with perhaps another level on top of
6912 SXEmacs has its own types of events (called @dfn{Emacs events}),
which provide an abstract layer on top of the system-dependent
6914 nature of the most basic events that are received. Part of the
6915 complex nature of the SXEmacs event collection process involves
6916 converting from the operating-system events into the proper
6917 Emacs events---there may not be a one-to-one correspondence.
6919 Emacs events are documented in @file{events.h}; I'll discuss them
6925 @cindex events, main loop
6927 The @dfn{command loop} is the top-level loop that the editor is always
6928 running. It loops endlessly, calling @code{next-event} to retrieve an
6929 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
6930 the appropriate thing with non-user events (process, timeout,
6931 magic, eval, mouse motion); this involves calling a Lisp handler
6932 function, redrawing a newly-exposed part of a frame, reading
6933 subprocess output, etc. For user events, @code{dispatch-event}
6934 looks up the event in relevant keymaps or menubars; when a
6935 full key sequence or menubar selection is reached, the appropriate
6936 function is executed. @code{dispatch-event} may have to keep state
6937 across calls; this is done in the ``command-builder'' structure
6938 associated with each console (remember, there's usually only
6939 one console), and the engine that looks up keystrokes and
6940 constructs full key sequences is called the @dfn{command builder}.
6941 This is documented elsewhere.
6943 The guts of the command loop are in @code{command_loop_1()}. This
6944 function doesn't catch errors, though---that's the job of
6945 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
6946 wrapper around @code{command_loop_1()}. @code{command_loop_1()} never
6947 returns, but may get thrown out of.
6949 When an error occurs, @code{cmd_error()} is called, which usually
6950 invokes the Lisp error handler in @code{command-error}; however, a
6951 default error handler is provided if @code{command-error} is @code{nil}
6952 (e.g. during startup). The purpose of the error handler is simply to
6953 display the error message and do associated cleanup; it does not need to
6954 throw anywhere. When the error handler finishes, the condition-case in
6955 @code{command_loop_2()} will finish and @code{command_loop_2()} will
6956 reinvoke @code{command_loop_1()}.
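
The control flow of these functions can be modelled with
@code{setjmp()} and @code{longjmp()}, as in the sketch below. This is a
hypothetical model only: the real code uses the Lisp condition-case and
catch/throw machinery, and the command counter exists only so that the
sketch terminates.

```c
#include <assert.h>
#include <setjmp.h>

#define SK_MAX_COMMANDS 5   /* so that the sketch terminates */

static jmp_buf error_catch;       /* the condition-case in command_loop_2 */
static jmp_buf top_level_catch;   /* the catch for top-level */
static int commands_run = 0;
static int errors_handled = 0;

/* Stand-in for signalling an error: unwind to the nearest handler. */
static void
sk_signal_error(void)
{
	longjmp(error_catch, 1);
}

/* Like command_loop_1(): never returns, but may be thrown out of. */
static void
sk_command_loop_1(void)
{
	for (;;) {
		if (commands_run == SK_MAX_COMMANDS)
			longjmp(top_level_catch, 1);   /* throw to top-level */
		commands_run++;
		if (commands_run % 2 == 0)
			sk_signal_error();   /* every second command fails */
	}
}

/* Like cmd_error(): report and clean up; no throw needed. */
static void
sk_cmd_error(void)
{
	errors_handled++;
}

/* Like command_loop_2(): trap errors, then re-enter the inner loop. */
static void
sk_command_loop_2(void)
{
	for (;;) {
		if (setjmp(error_catch) == 0)
			sk_command_loop_1();
		else
			sk_cmd_error();
	}
}

/* Like initial_command_loop(): set up the top-level catch. */
static void
sk_initial_command_loop(void)
{
	if (setjmp(top_level_catch) == 0)
		sk_command_loop_2();
}
```

Each error unwinds only as far as the wrapper, which handles it and
re-enters the inner loop; only the throw to top-level leaves both loops.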
6958 @code{command_loop_2()} is invoked from three places: from
6959 @code{initial_command_loop()} (called from @code{main()} at the end of
6960 internal initialization), from the Lisp function @code{recursive-edit},
6961 and from @code{call_command_loop()}.
6963 @code{call_command_loop()} is called when a macro is started and when
6964 the minibuffer is entered; normal termination of the macro or minibuffer
causes a throw out of the recursive command loop (to
@code{execute-kbd-macro} for macros and to @code{exit} for minibuffers).
(Note also that the low-level minibuffer-entering function,
6968 @code{read-minibuffer-internal}, provides its own error handling and
6969 does not need @code{command_loop_2()}'s error encapsulation; so it tells
6970 @code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
6972 Note that both read-minibuffer-internal and recursive-edit set up a
6973 catch for @code{exit}; this is why @code{abort-recursive-edit}, which
6974 throws to this catch, exits out of either one.
6976 @code{initial_command_loop()}, called from @code{main()}, sets up a
6977 catch for @code{top-level} when invoking @code{command_loop_2()},
6978 allowing functions to throw all the way to the top level if they really
6979 need to. Before invoking @code{command_loop_2()},
6980 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
6981 all of the startup stuff (creating the initial frame, handling the
6982 command-line options, loading the user's @file{.emacs} file, etc.). The
6983 function that actually does this is in Lisp and is pointed to by the
6984 variable @code{top-level}; normally this function is
6985 @code{normal-top-level}. @code{top_level_1()} is just an error-handling
6986 wrapper similar to @code{command_loop_2()}. Note also that
6987 @code{initial_command_loop()} sets up a catch for @code{top-level} when
6988 invoking @code{top_level_1()}, just like when it invokes
6989 @code{command_loop_2()}.
6991 @node Specifics of the Event Gathering Mechanism
6992 @section Specifics of the Event Gathering Mechanism
6993 @cindex event gathering mechanism, specifics of the
Here is an approximate diagram of the collection processes
at work in SXEmacs under TTYs (TTYs are simpler than X,
so we'll look at them first):
7001 asynch. asynch. asynch. asynch. [Collectors in
7002 kbd events kbd events process process the OS]
7005 | | | | SIGINT, [signal handlers
7006 | | | | SIGQUIT, in SXEmacs]
7008 file file file file SIGALRM
7009 desc. desc. desc. desc. |
7010 (TTY) (TTY) (pipe) (pipe) |
7011 | | | | fake timeouts
7019 ------>-----------<----------------<----------------
7022 | [collected using select() in emacs_tty_next_event()
7023 | and converted to the appropriate Emacs event]
7026 V (above this line is TTY-specific)
7027 Emacs -----------------------------------------------
7028 event (below this line is the generic event mechanism)
7031 was there if not, call
7032 a SIGINT? emacs_tty_next_event()
7039 | [collected in event_stream_next_event();
7040 | SIGINT is converted using maybe_read_quit_event()]
7045 \---->------>----- maybe_kbd_translate() ---->---\
7049 command event queue |
7051 (contains events that were event queue, call
7052 read earlier but not processed, event_stream_next_event()
7053 typically when waiting in a |
7054 sit-for, sleep-for, etc. for |
7055 a particular event to be received) |
7059 ---->------------------------------------<----
7062 | next_event_internal()]
7064 unread- unread- event from |
7065 command- command- keyboard else, call
7066 events event macro next_event_internal()
7071 --------->----------------------<------------
7073 | [collected in `next-event', which may loop
7074 | more than once if the event it gets is on
7075 | a dead frame, device, etc.]
7079 feed into top-level event loop,
7080 which repeatedly calls `next-event'
7081 and then dispatches the event
7082 using `dispatch-event'
7085 Notice the separation between TTY-specific and generic event mechanism.
7086 When using the Xt-based event loop, the TTY-specific stuff is replaced
7087 but the rest stays the same.
It's also important to realize that only one kind of system-specific
event loop can be operating at a time, and it must be able
to receive all kinds of events simultaneously. For the two existing
7092 event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
7093 respectively), the TTY event loop @emph{only} handles TTY consoles,
7094 while the Xt event loop handles @emph{both} TTY and X consoles. This
7095 situation is different from all of the output handlers, where you simply
7096 have one per console type.
7098 Here's the Xt Event Loop Diagram (notice that below a certain point,
7099 it's the same as the above diagram):
7102 asynch. asynch. asynch. asynch. [Collectors in
7103 kbd kbd process process the OS]
7104 events events output output
7106 | | | | asynch. asynch. [Collectors in the
7107 | | | | X X OS and X Window System]
7108 | | | | events events
7111 | | | | | | SIGINT, [signal handlers
7112 | | | | | | SIGQUIT, in SXEmacs]
7113 | | | | | | SIGWINCH,
7117 | | | | | | | timeouts
7122 file file file file file file file |
7123 desc. desc. desc. desc. desc. desc. desc. |
7124 (TTY) (TTY) (pipe) (pipe) (socket) (socket) (pipe) |
7129 --->----------------------------------------<---------<------
7131 | | |[collected using select() in
7132 | | | _XtWaitForSomething(), called
7133 | | | from XtAppProcessEvent(), called
7134 | | | in emacs_Xt_next_event();
7135 | | | dispatched to various callbacks]
7138 emacs_Xt_ p_s_callback(), | [popup_selection_callback]
7139 event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_
7140 | x_u_h_s_callback(),| callback]
7141 | search_callback() | [x_update_horizontal_scrollbar_
7145 enqueue_Xt_ signal_special_ |
7146 dispatch_event() Xt_user_event() |
7151 | dispatch_event() |
7158 dispatch Xt_what_callback()
7165 ---->-----------<--------
7168 | [collected and converted as appropriate in
7169 | emacs_Xt_next_event()]
7172 V (above this line is Xt-specific)
7173 Emacs ------------------------------------------------
7174 event (below this line is the generic event mechanism)
7177 was there if not, call
7178 a SIGINT? emacs_Xt_next_event()
7185 | [collected in event_stream_next_event();
7186 | SIGINT is converted using maybe_read_quit_event()]
7191 \---->------>----- maybe_kbd_translate() -->-----\
7195 command event queue |
7197 (contains events that were event queue, call
7198 read earlier but not processed, event_stream_next_event()
7199 typically when waiting in a |
7200 sit-for, sleep-for, etc. for |
7201 a particular event to be received) |
7205 ---->----------------------------------<------
7208 | next_event_internal()]
7210 unread- unread- event from |
7211 command- command- keyboard else, call
7212 events event macro next_event_internal()
7217 --------->----------------------<------------
7219 | [collected in `next-event', which may loop
7220 | more than once if the event it gets is on
7221 | a dead frame, device, etc.]
7225 feed into top-level event loop,
7226 which repeatedly calls `next-event'
7227 and then dispatches the event
7228 using `dispatch-event'
7231 @node Specifics About the Emacs Event
7232 @section Specifics About the Emacs Event
7233 @cindex event, specifics about the Lisp object
7236 @section Event Queues
7237 @cindex event queues
7238 @cindex queues, event
There are two event queues here: the command event queue (#### which
should be called the ``deferred event queue'' and is in my glyph ws) and
the dispatch event queue. Under X, it's possible to selectively process
7243 events such that we take all the user events before the non-user
The dispatch queue (which used to be duplicated inside each event
implementation) is used for events that have been read from the
window-system event queue(s) and not yet processed by
@code{next_event_internal()}. It exists for two reasons: (1) in many
implementations, events often come from the window system by way of
callbacks, which need to push the event to be returned onto a queue; (2)
in order to handle QUIT in a guaranteed correct fashion without
resorting to weird implementation-specific hacks that may or may not
work well, we need to drain the window-system event queues and then look
through them to see if there is an event matching quit-char (usually
^G). The drained events need to go onto a queue. (There are other,
similar cases where we need to drain the pending events so we can look
ahead, for example checking for pending expose events under X to avoid
excessive

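
The draining step in reason (2) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not SXEmacs code: the event type, the fixed-size queue, the window-system stand-in and the function @code{drain_and_check_quit()} are all hypothetical. Every pending window-system event is moved onto the dispatch queue in its original order, and the function reports whether any of them matched the quit character.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event: just a character code. */
typedef struct { int ch; } sketch_event;

/* Hypothetical bounded FIFO standing in for the dispatch queue. */
#define QMAX 64
typedef struct {
    sketch_event items[QMAX];
    size_t head, count;
} sketch_queue;

static bool queue_push(sketch_queue *q, sketch_event ev)
{
    if (q->count == QMAX)
        return false;                          /* queue full */
    q->items[(q->head + q->count) % QMAX] = ev;
    q->count++;
    return true;
}

static bool queue_pop(sketch_queue *q, sketch_event *out)
{
    if (q->count == 0)
        return false;                          /* queue empty */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QMAX;
    q->count--;
    return true;
}

/* Hypothetical stand-in for the window system's pending events. */
typedef struct {
    const int *chars;
    size_t len, pos;
} ws_source;

static bool ws_next_pending(ws_source *ws, sketch_event *out)
{
    if (ws->pos >= ws->len)
        return false;                          /* nothing pending */
    out->ch = ws->chars[ws->pos++];
    return true;
}

/* Move every pending window-system event onto the dispatch queue, in
   order, and report whether any of them is QUIT_CHAR.  The events are
   kept; deciding what to do about the quit is left to the caller. */
static bool drain_and_check_quit(ws_source *ws, sketch_queue *dispatch,
                                 int quit_char)
{
    bool saw_quit = false;
    sketch_event ev;

    while (ws_next_pending(ws, &ev)) {
        if (ev.ch == quit_char)
            saw_quit = true;
        bool pushed = queue_push(dispatch, ev);
        assert(pushed);        /* the sketch assumes no overflow */
        (void)pushed;
    }
    return saw_quit;
}
```
@end verbatim

With ^G (character code 7) as the quit character, draining the pending characters @samp{a}, ^G, @samp{b} leaves all three on the dispatch queue, in order, and reports that a quit character was seen.
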
The command event queue is used @strong{AFTER} an event has been read from
@code{next_event_internal()}, when it needs to be pushed back. This
happens, for example, in @code{accept-process-output}, @code{sleep-for}
and @code{wait_delaying_user_input()}. Eval events and the like,
generated by @code{enqueue-eval-event},
@code{enqueue_magic_eval_event()}, etc., are also pushed onto this queue.
Some events generated by callbacks are also pushed onto this queue,
although #### perhaps they should not be.

The command queue takes precedence over the dispatch queue: when both
hold events, the command queue is read first.

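
A minimal sketch of this precedence rule follows. It uses hypothetical names (@code{fifo_t}, @code{sketch_next_event()}), not the real SXEmacs queue code. The command queue is always consulted first, and the dispatch queue is read only when the command queue is empty.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event carrying an identifying number. */
typedef struct { int id; } ev_t;

/* Hypothetical bounded FIFO used for both queues. */
#define CAP 16
typedef struct {
    ev_t buf[CAP];
    size_t head, count;
} fifo_t;

static void fifo_push(fifo_t *q, ev_t e)
{
    assert(q->count < CAP);            /* sketch: no overflow handling */
    q->buf[(q->head + q->count) % CAP] = e;
    q->count++;
}

static bool fifo_pop(fifo_t *q, ev_t *out)
{
    if (q->count == 0)
        return false;
    *out = q->buf[q->head];
    q->head = (q->head + 1) % CAP;
    q->count--;
    return true;
}

/* Events pushed back after being read (command queue) win over events
   drained from the window system (dispatch queue). */
static bool sketch_next_event(fifo_t *command_q, fifo_t *dispatch_q,
                              ev_t *out)
{
    if (fifo_pop(command_q, out))
        return true;
    return fifo_pop(dispatch_q, out);
}
```
@end verbatim

An event pushed back onto the command queue is returned before events that were already waiting on the dispatch queue.
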
#### It is worth investigating whether both queues are really
needed, and how exactly they should be used. @code{enqueue-eval-event},
for example, could certainly push onto the dispatch queue, and perhaps
all callbacks should. @code{wait_delaying_user_input()} seems to need
both queues, since it can take events from the dispatch queue and push
them onto the command queue; but it could perhaps be rewritten to avoid
this. #### In general we need to review the handling of these two
queues, figure out exactly what ought to be happening, and document it.

@node Event Stream Callback Routines
@section The Event Stream Callback Routines
@cindex event stream callback routines, the
@cindex callback routines, the event stream

There is one object called an @code{event_stream}. This object contains
callback functions for the window-system-dependent operations that
XEmacs requires.

If XEmacs is compiled with support for X11 and the X Toolkit, then this
@code{event_stream} structure will contain functions that can cope with
input on SXEmacs windows on multiple displays, as well as input from
dumb ttys.

If SXEmacs is to be able to open frames on the displays of multiple
heterogeneous machines (X11 and SunView, or X11 and NeXT, for example),
then it will be necessary to construct an @code{event_stream} structure
that can cope with the given types. Currently, the only implemented
event streams are for dumb ttys, and for X11 plus dumb ttys.

Implementing this for one window system is relatively simple.
Implementing it for multiple window systems is trickier and may not be
possible in all situations, but it has been done for X and TTY.

Note that these callbacks are @strong{NOT} console methods. That is
because the routines are not specific to a particular console type, but
must be able to cope with all allowable console types simultaneously.

The slots of the @code{event_stream} structure:

A function which fills in a @code{SXEmacs_event} structure with the next
event available. If there is no event available, then this should
block.

IMPORTANT: timer events and especially process events @strong{must not}
be returned if there are events of other types available; otherwise you
can end up with an infinite loop in @code{Fdiscard_input()}.

@item event_pending_cb
A function which says whether there are events to be read. If called
with an argument of 0, then this should say whether calling the
@code{next_event_cb} will block. If called with a non-zero argument,
then this should say whether there are at least that many user-generated
events pending (that is, keypresses, mouse clicks, dialog-box selection
events, etc.). (This is used for redisplay optimization, among other
things.) The difference is that the former includes process events and
timer events, but the latter doesn't.

If this function is not sure whether there are events to be read, it
@strong{must} return 0. Otherwise various undesirable effects will
occur, such as redisplay not occurring until the next event occurs.

@item handle_magic_event_cb
SXEmacs calls this with an event structure which contains window-system
dependent information that SXEmacs doesn't need to know about, but which
must be processed in order. If the @code{next_event_cb} never returns an
event of type ``magic'', this will never be used.

@item format_magic_event_cb
Called with a magic event; print a representation of the innards of the
event to @var{PSTREAM}.

@item compare_magic_event_cb
Called with two magic events; return non-zero if the innards of the two
are equal, zero otherwise.

@item hash_magic_event_cb
Called with a magic event; return a hash of the innards of the event.

@item add_timeout_cb
Called with an @code{EMACS_TIME}, the absolute time at which a wakeup
event should be generated, and a @code{void *}, which is an arbitrary
value that will be returned in the timeout event. The timeouts generated
by this function should be one-shots: they fire once and then disappear.
This callback should return an int id-number which uniquely identifies
this wakeup. If an implementation doesn't have microsecond or millisecond
granularity, it should round up to the closest value it can deal with.

@item remove_timeout_cb
Called with an int, the id number of a wakeup to discard. This id
number must have been returned by the @code{add_timeout_cb}. If the given
wakeup has already expired, this should do nothing.

@item select_process_cb
@itemx unselect_process_cb
These callbacks tell the underlying implementation to add or remove a
file descriptor from the list of fds which are polled for
inferior-process input. When input becomes available on the given
process connection, an event of type ``process'' should be generated.

@item select_console_cb
@itemx unselect_console_cb
These callbacks tell the underlying implementation to add or remove a
console from the list of consoles which are polled for user input.

@item select_device_cb
@itemx unselect_device_cb
These callbacks are used by Unixoid event loops (those that use
@code{select()} and file descriptors and have a separate input fd per
device).

@item create_io_streams_cb
@itemx delete_io_streams_cb
These callbacks are called by process code to create the input and
output lstreams which are used for subprocess I/O.

A handler function called from the @code{QUIT} macro which should check
whether the quit character has been typed. On systems with SIGIO, this
will not be called unless the @code{sigio_happened} flag is true (it is
set from the SIGIO handler).

SXEmacs has its own event structures, which are distinct from the event
structures used by X or any other window system. It is the job of the
@code{event_stream} layer to translate to this format.

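
To make the shape of this interface concrete, here is a toy sketch in C. It is not the SXEmacs definition. @code{struct sk_event_stream}, the simplified signatures (for example, an absolute time in milliseconds instead of an @code{EMACS_TIME}) and all the @code{toy_*} functions are hypothetical. A real @code{next_event_cb} would also block instead of returning false. The sketch models four slots: fetching the next event while preferring user events over timer and process events, the two modes of the pending check, and one-shot timeouts identified by unique ids.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified event kinds and event. */
typedef enum { EV_KEY, EV_TIMEOUT, EV_PROCESS } ev_kind;
typedef struct { ev_kind kind; int data; } sk_event;

/* A callback table in the spirit of the event_stream structure;
   only four slots are modelled, with invented signatures. */
struct sk_event_stream {
    bool (*next_event_cb)(sk_event *out);
    bool (*event_pending_p)(int how_many_user);
    int  (*add_timeout_cb)(long abs_msec, void *arg);
    void (*remove_timeout_cb)(int id);
};

/* Toy state: queued events and a small table of one-shot timeouts. */
#define MAXEV 8
#define MAXTO 8
static sk_event pending[MAXEV];
static size_t npending;
static struct { int id; long when; void *arg; bool live; } timeouts[MAXTO];
static int next_timeout_id = 1;

static bool toy_next_event(sk_event *out)
{
    size_t pick = 0;                 /* oldest event by default */
    if (npending == 0)
        return false;                /* a real implementation would block */
    /* Timer and process events must not be returned while other
       (here: key) events are available. */
    for (size_t i = 0; i < npending; i++)
        if (pending[i].kind == EV_KEY) { pick = i; break; }
    *out = pending[pick];
    for (size_t i = pick + 1; i < npending; i++)
        pending[i - 1] = pending[i];
    npending--;
    return true;
}

static bool toy_event_pending_p(int how_many_user)
{
    if (how_many_user == 0)
        return npending > 0;         /* any event, timers included */
    int user = 0;
    for (size_t i = 0; i < npending; i++)
        if (pending[i].kind == EV_KEY)
            user++;
    return user >= how_many_user;    /* user-generated events only */
}

static int toy_add_timeout(long abs_msec, void *arg)
{
    for (size_t i = 0; i < MAXTO; i++)
        if (!timeouts[i].live) {
            timeouts[i].id = next_timeout_id++;
            timeouts[i].when = abs_msec;
            timeouts[i].arg = arg;
            timeouts[i].live = true;
            return timeouts[i].id;
        }
    assert(!"timeout table full");
    return -1;
}

static void toy_remove_timeout(int id)
{
    /* Unknown or already-fired ids are silently ignored. */
    for (size_t i = 0; i < MAXTO; i++)
        if (timeouts[i].live && timeouts[i].id == id)
            timeouts[i].live = false;
}

/* Simulate time advancing to NOW: each expired timeout becomes exactly
   one timeout event and is then forgotten (one-shot semantics). */
static void toy_fire_timeouts(long now)
{
    for (size_t i = 0; i < MAXTO; i++)
        if (timeouts[i].live && timeouts[i].when <= now
            && npending < MAXEV) {
            pending[npending++] = (sk_event){ EV_TIMEOUT, timeouts[i].id };
            timeouts[i].live = false;
        }
}

static const struct sk_event_stream toy_stream = {
    toy_next_event, toy_event_pending_p, toy_add_timeout, toy_remove_timeout
};
```
@end verbatim

Because the event stream is a table of function pointers, generic code only ever calls through the table (as in @code{toy_stream.event_pending_p(0)}). Supporting a different window system means filling in a different table.
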
@node Other Event Loop Functions
@section Other Event Loop Functions
@cindex event loop functions, other

@code{detect_input_pending()} and @code{input-pending-p} look for
input by calling @code{event_stream->event_pending_p} and looking in
@code{[V]unread-command-event} and the @code{command_event_queue} (they
do not check for an executing keyboard macro, though).

@code{discard-input} cancels any pending command events (and any
keyboard macros currently executing), and puts the other pending events
onto the @code{command_event_queue}. There is a comment about a ``race
condition'', which is not a good sign.

@code{next-command-event} and @code{read-char} are higher-level
interfaces to @code{next-event}. @code{next-command-event} gets the
next @dfn{command} event (i.e. keypress, mouse event, menu selection,
or scrollbar action), calling @code{dispatch-event} on any others.
@code{read-char} calls @code{next-command-event} and uses
@code{event_to_character()} to return the character equivalent. With
the right kind of input method support, it is possible for
@code{(read-char)} to return a Kanji character.

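
The relationship between @code{next-command-event} and @code{read-char} can be sketched as follows. The event kinds, the scripted event source and every function here are hypothetical stand-ins. In particular, returning -1 for a command event with no character equivalent is a choice made for this sketch, not a description of what @code{read-char} does.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; the first four count as command events. */
typedef enum { K_KEY, K_MOUSE, K_MENU, K_SCROLLBAR,
               K_TIMEOUT, K_PROCESS, K_MAGIC } kind_t;
typedef struct { kind_t kind; int ch; } evt;

static bool is_command_event(const evt *e)
{
    return e->kind == K_KEY || e->kind == K_MOUSE ||
           e->kind == K_MENU || e->kind == K_SCROLLBAR;
}

/* Hypothetical scripted event source and counting dispatcher. */
static evt script[8];
static size_t script_len, script_pos, dispatched;

static evt sketch_next_event(void)
{
    assert(script_pos < script_len);   /* the script must not run dry */
    return script[script_pos++];
}

static void sketch_dispatch_event(const evt *e)
{
    (void)e;
    dispatched++;      /* a real dispatcher would run timers, filters... */
}

/* Like next-command-event: dispatch non-command events and return
   the first command event. */
static evt sketch_next_command_event(void)
{
    for (;;) {
        evt e = sketch_next_event();
        if (is_command_event(&e))
            return e;
        sketch_dispatch_event(&e);
    }
}

/* Like read-char: the character equivalent of the next command event;
   in this sketch, -1 when there is none (e.g. a mouse click). */
static int sketch_read_char(void)
{
    evt e = sketch_next_command_event();
    return e.kind == K_KEY ? e.ch : -1;
}
```
@end verbatim
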
@section Stream Pairs
@cindex stream pairs
@cindex pairs, stream

Since there are many possible process/event loop combinations, the
event code is responsible for creating an appropriate lstream type. The
process implementation does not need to know which lstream type is used.

The stream pair creation function is passed two @code{void *} values,
which identify process-dependent ``handles''. The process implementation
uses these handles to communicate with child processes. The function
must be prepared to receive the handle types of any process
implementation. Since only one process implementation exists in a
particular XEmacs configuration, conditional compilation (preprocessing)
is used to compile in support for the particular handle types in use.

For example, a unixoid event loop, which relies on file descriptors,
may be asked to create a pair of streams by a Unix-style process
implementation. In this case, the handles passed are Unix file
descriptors, and the code may deal with them directly. However, the
same code may also be used on a Win32 system running X. In that case,
the Win32 process implementation passes handles of type @code{HANDLE},
and the @code{create_io_streams} function must call the appropriate
function to obtain file descriptors from the @code{HANDLE}s, so that
these descriptors can be passed to @code{XtAddInput}.

A handle may have a special ``denying'' value, in which case the
corresponding lstream should not be created.

The return value of the function is a unique stream identifier (USID).
It is used by the platform-independent part of the process
implementation: the @code{get_process_from_usid} function returns the
process object given its USID. The event stream is responsible for
converting its internal handle into a USID.

An example is the TTY event stream. When a file descriptor signals
input, the event loop must determine which process the input is
destined for. The implementation therefore uses the process's input
stream file descriptor as the USID, simply casting the fd value to the
USID type.

There are two special USID values. One, @code{USID_ERROR}, indicates
that the stream pair cannot be created. The other,
@code{USID_DONTHASH}, indicates that the streams were created, but the
event stream does not need to be able to find the process by its
USID. Specifically, if an event stream implementation never calls
@code{get_process_from_usid}, it should always return this value, to
avoid accumulating useless USID-to-process mapping information.

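
The following is a sketch of the USID bookkeeping described above. All of its names and values are hypothetical: @code{sk_usid}, @code{SK_USID_ERROR}, @code{SK_USID_DONTHASH}, the ``denying'' descriptor @code{SK_DENY_FD} and the @code{want_lookup} flag are invented for the example. Treating a request for neither stream as an error is this sketch's own convention. As in the TTY event stream, the input file descriptor is cast to serve as the USID.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical definitions, for illustration only.  Both special
   values are negative, so they can never collide with a real fd. */
typedef long sk_usid;
#define SK_USID_ERROR    ((sk_usid)-1)
#define SK_USID_DONTHASH ((sk_usid)-2)
#define SK_DENY_FD       (-1)        /* the "denying" handle value */

struct sk_process { const char *name; };

/* Small USID-to-process table consulted by sk_get_process_from_usid. */
#define MAXP 8
static struct { sk_usid usid; struct sk_process *proc; } usid_table[MAXP];
static size_t usid_count;

/* Create the stream pair for IN_FD and OUT_FD (the actual lstream
   creation is omitted).  A descriptor equal to SK_DENY_FD means that
   stream is not created.  WANT_LOOKUP says whether this event stream
   ever needs to map a USID back to its process. */
static sk_usid sk_create_io_streams(int in_fd, int out_fd,
                                    struct sk_process *proc,
                                    int want_lookup)
{
    if (in_fd == SK_DENY_FD && out_fd == SK_DENY_FD)
        return SK_USID_ERROR;         /* this sketch's notion of failure */
    if (!want_lookup || in_fd == SK_DENY_FD)
        return SK_USID_DONTHASH;      /* nothing worth remembering */
    /* As in the TTY event stream: the input fd doubles as the USID. */
    sk_usid usid = (sk_usid)in_fd;
    assert(usid_count < MAXP);
    usid_table[usid_count].usid = usid;
    usid_table[usid_count].proc = proc;
    usid_count++;
    return usid;
}

static struct sk_process *sk_get_process_from_usid(sk_usid usid)
{
    for (size_t i = 0; i < usid_count; i++)
        if (usid_table[i].usid == usid)
            return usid_table[i].proc;
    return NULL;                      /* unknown USID */
}
```
@end verbatim
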
@node Converting Events
@section Converting Events
@cindex converting events
@cindex events, converting

@code{character_to_event()}, @code{event_to_character()},
@code{event-to-character}, and @code{character-to-event} convert between
characters and the keypress events corresponding to them. If the
event is not a keypress, @code{event_to_character()} returns -1 and
@code{event-to-character} returns @code{nil}. These functions convert
between the character representation and the split-up event
representation (keysym plus modifier keys).

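
The round trip between a character and the split-up representation can be sketched like this. The @code{sk_*} names, the single control modifier and the ASCII assumption are invented for illustration; the real functions handle many more keysyms and modifiers. As with @code{event_to_character()}, a non-keypress yields -1.

@verbatim
```c
#include <assert.h>

/* Hypothetical split-up representation of a keypress (ASCII only). */
enum { MOD_CONTROL = 1 };
typedef enum { SK_KEYPRESS, SK_BUTTON } sk_kind;
typedef struct { sk_kind kind; int keysym; int modifiers; } sk_key_event;

/* Character -> keypress: C-a .. C-z (codes 1..26) become a letter
   keysym plus the control modifier; other characters map directly. */
static sk_key_event sk_character_to_event(int c)
{
    sk_key_event e = { SK_KEYPRESS, c, 0 };
    if (c >= 1 && c <= 26) {
        e.keysym = 'a' + c - 1;
        e.modifiers = MOD_CONTROL;
    }
    return e;
}

/* Keypress -> character, or -1 when there is no equivalent. */
static int sk_event_to_character(const sk_key_event *e)
{
    if (e->kind != SK_KEYPRESS)
        return -1;
    if (e->modifiers == MOD_CONTROL && e->keysym >= 'a' && e->keysym <= 'z')
        return e->keysym - 'a' + 1;
    if (e->modifiers == 0)
        return e->keysym;
    return -1;
}
```
@end verbatim

For example, ^G (code 7) becomes the keysym @samp{g} with the control modifier, and converts back to 7.
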
@node Dispatching Events; The Command Builder
@section Dispatching Events; The Command Builder
@cindex dispatching events; the command builder
@cindex events; the command builder, dispatching
@cindex command builder, dispatching events; the

@node Focus Handling
@section Focus Handling
@cindex focus handling

Ben's capsule lecture on focus:

In GNU Emacs, @code{select-frame} never changes the window-manager frame
focus. All it does is change the ``selected frame''. This is similar to
what happens when we call @code{select-device} or @code{select-console}.
Whenever an event comes in (including a keyboard event), its frame is
selected. Therefore, evaluating @code{select-frame} in @samp{*scratch*}
has no lasting effect, because the next event received (in the same
frame) will cause a switch back to the frame displaying
@samp{*scratch*}.

Whenever a focus-change event is received from the window manager, it
generates a @code{switch-frame} event, which causes the Lisp function
@code{handle-switch-frame} to be run. This basically just runs
@code{select-frame} (see below, however).

In GNU Emacs, if you want to have an operation run when a frame is
selected, you supply an event binding for @code{switch-frame} (and then
perhaps call @code{handle-switch-frame}, or something similar).

In XEmacs, we @strong{do} change the window-manager frame focus as a
result of @code{select-frame}, but not until the next time an event is
received, so that a function that momentarily changes the selected frame
won't cause WM focus flashing. (#### There's something not quite right
here; this is causing the wrong-cursor-focus problems that you
occasionally see. But the general idea is correct.) This approach is
winning for people who use the explicit-focus model, but is trickier to
implement.

We also don't make the @code{switch-frame} event visible, but instead
have @code{select-frame-hook}, which is a better approach.

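
The deferred focus change can be modelled in a few lines. The names in this sketch are hypothetical, and the real code also has to deal with devices, consoles and the window manager itself. Here @code{select-frame} only records the selection, and the window-manager focus is synchronized just before the next event is processed. A function that briefly selects another frame and then switches back therefore causes no WM focus change at all.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frames and focus state. */
struct sk_frame { const char *name; };

static struct sk_frame *selected_frame;  /* what Lisp code sees */
static struct sk_frame *wm_focus_frame;  /* what the window manager shows */
static int wm_focus_changes;             /* requests sent to the WM */

/* select-frame only records the new selection ... */
static void sk_select_frame(struct sk_frame *f)
{
    assert(f != NULL);
    selected_frame = f;
}

/* ... and the WM focus is brought in line just before the next event
   is processed, so a momentary selection never reaches the WM. */
static void sk_before_next_event(void)
{
    if (selected_frame != NULL && selected_frame != wm_focus_frame) {
        wm_focus_frame = selected_frame;
        wm_focus_changes++;
    }
}
```
@end verbatim
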
There is the problem of surrogate minibuffers: when you enter the
minibuffer, you essentially want to switch the WM focus temporarily to
the frame with the minibuffer, and switch it back when you exit the
minibuffer.

GNU Emacs solves this with the crockish @code{redirect-frame-focus},
which says ``for keyboard events received from FRAME, act like they're
coming from FOCUS-FRAME''. I think what this means is that, when a
keyboard event comes in and the event manager is about to select the
event's frame, if that frame has its focus redirected, the redirected-to
frame is selected instead. That way, if you're in a minibufferless
frame and enter the minibuffer, then all Lisp functions that run see the
selected frame as the minibuffer's frame rather than the minibufferless
frame you came from, so that (e.g.) your typing actually appears in the
minibuffer's frame and things behave sanely.

There's also some weird logic that switches the redirected frame focus
from one frame to another if Lisp code explicitly calls
@code{select-frame} (but not if @code{handle-switch-frame} is called),
and saves and restores the frame focus in window configurations,
etc. All of this logic is heavily @code{#if 0}'d, with lots of
comments saying ``No, this approach doesn't seem to work, so I'm trying
this ... is it reasonable? Well, I'm not sure ...'' that are a red flag
indicating crockishness.

Because of our way of doing things, we can avoid all this crock.
Keyboard events never cause a select-frame (who cares what frame they're
associated with? They come from a console, only). We change the actual
WM focus to a surrogate minibuffer frame, so we don't have to do any
internal redirection. In order to get the focus back, I took the
approach in @file{minibuf.el} of just checking to see if the frame we
moved to is still the selected frame, and moving back to the old one if
so. Conceivably we might have to do the weird ``tracking'' that GNU
Emacs does when @code{select-frame} is called, but I don't think so. If
the selected frame moved away from the minibuffer frame, then we just
leave it there, figuring that someone knows what they're doing. Because
we don't have any redirection recorded anywhere, it's safe to do this,
and we don't end up with unwanted redirection.

@node Editor-Level Control Flow Modules
@section Editor-Level Control Flow Modules
@cindex control flow modules, editor-level
@cindex modules, editor-level control flow

@file{event-stream.c}

These implement the handling of events (user input and other system
notifications).

@file{events.c} and @file{events.h} define the @dfn{event} Lisp object
type and primitives for manipulating it.

@file{event-stream.c} implements the basic functions for working with
event queues, dispatching an event by looking it up in relevant keymaps
and such, and handling timeouts; this includes the primitives
@code{next-event} and @code{dispatch-event}, as well as related
primitives such as @code{sit-for}, @code{sleep-for}, and
@code{accept-process-output}. (@file{event-stream.c} is one of the
hairiest and trickiest modules in XEmacs. Beware! You can easily mess
things up here.)

@file{event-Xt.c} and @file{event-tty.c} implement the low-level
interfaces onto retrieving events from Xt (the X toolkit) and from TTYs
(using @code{read()} and @code{select()}), respectively. The event
interface enforces a clean separation between the specific code for
interfacing with the operating system and the generic code for working
with events, by defining an API of basic, low-level event methods;
@file{event-Xt.c} and @file{event-tty.c} are two different
implementations of this API. To add support for a new operating system
(e.g. NeXTstep), one merely needs to provide another implementation of
those API functions.

Note that the choice of whether to use @file{event-Xt.c} or
@file{event-tty.c} is made at compile time (or, at the very latest, at
startup time). @file{event-Xt.c} handles events for @emph{both} X and
TTY frames; @file{event-tty.c} is only used when X support is not
compiled into XEmacs. The reason for this is that there is only one
event loop in XEmacs, so it needs to be able to receive events from all
the different kinds of frames.

@file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
type and associated methods and primitives. (Remember that keymaps are
objects that associate event descriptions with functions to be called to
``execute'' those events; @code{dispatch-event} looks up events in the
relevant keymaps.)

@file{cmdloop.c} contains functions that implement the actual editor
command loop, i.e. the event loop that cyclically retrieves and
dispatches events. This code is also rather tricky, just like
@file{event-stream.c}.

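
How a quit throws execution back to the innermost enclosing command loop can be approximated in plain C with @code{setjmp()} and @code{longjmp()}. This is only an analogy. The real command loop establishes a Lisp @code{condition-case} rather than using a bare @code{jmp_buf}, and @code{sk_command_loop()}, @code{sk_signal_quit()} and the example commands are hypothetical. A quit abandons only the command that was running, and the loop carries on with the next one.

@verbatim
```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for the condition-case that every command
   loop establishes: a quit longjmps back to the innermost loop. */
static jmp_buf *innermost_loop;

static void sk_signal_quit(void)
{
    assert(innermost_loop != NULL);
    longjmp(*innermost_loop, 1);
}

/* Hypothetical commands; a NULL entry ends the sequence. */
typedef void (*sk_command)(void);

static int commands_finished, quits_caught;

/* Run each command in turn, as the command loop dispatches one command
   per event; a quit aborts only the command that was running. */
static void sk_command_loop(sk_command const *cmds)
{
    jmp_buf here;
    jmp_buf *outer = innermost_loop;   /* support nested loops */
    volatile size_t i = 0;             /* must survive longjmp */

    innermost_loop = &here;
    for (; cmds[i] != NULL; i++) {
        if (setjmp(here) != 0) {
            quits_caught++;            /* abandon this command only */
            continue;
        }
        cmds[i]();
        commands_finished++;
    }
    innermost_loop = outer;
}

/* Two example commands for trying the loop out. */
static void sk_cmd_ok(void) { }
static void sk_cmd_quits(void) { sk_signal_quit(); }
```
@end verbatim
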
These two modules contain the basic code for defining keyboard macros.
These functions don't actually do much; most of the code that handles
keyboard macros is mixed in with the event-handling code in
@file{event-stream.c}.

This contains some miscellaneous code related to the minibuffer (most of
the minibuffer code was moved into Lisp by Richard Mlynarik). This
includes the primitives for completion (although filename completion is
in @file{dired.c}), the lowest-level interface to the minibuffer (if the
command loop were cleaned up, this too could be in Lisp), and code for
dealing with the echo area (this, too, was mostly moved into Lisp, and
the only code remaining is code to call out to Lisp or provide simple
bootstrapping implementations early in temacs, before the echo-area Lisp
code is loaded).

@node Asynchronous Events; Quit Checking, Evaluation; Stack Frames; Bindings, Events and the Event Loop, Top
@chapter Asynchronous Events; Quit Checking
@cindex asynchronous events; quit checking
@cindex asynchronous events

* Control-G (Quit) Checking::
* Asynchronous Timeouts::

@node Signal Handling
@section Signal Handling
@cindex signal handling

@node Control-G (Quit) Checking
@section Control-G (Quit) Checking
@cindex Control-g checking
@cindex C-g checking
@cindex quit checking
@cindex QUIT checking
@cindex critical quit

@emph{Note}: The code to handle QUIT is divided between @file{lisp.h}
and @file{signal.c}. There is also some special-case code in the async
timer code in @file{event-stream.c} to notice when the poll-for-quit
(and poll-for-sigchld) timers have gone off.

Here's an overview of how this convoluted stuff works:

Scattered throughout the XEmacs core code are calls to the macro
@code{QUIT}. This macro checks to see whether a @kbd{C-g} has recently
been pressed and not yet handled, and if so, it handles the @kbd{C-g} by
calling @code{signal_quit()}, which invokes the standard @code{Fsignal()}
code, with the error being @code{Qquit}. Lisp code can establish
handlers for this (using @code{condition-case}), but normally there is
no handler, and so execution is thrown back to the innermost enclosing
event loop. (One of the things that happens when entering an event loop
is that a @code{condition-case} is established that catches @strong{all}
calls to @code{signal}, including this one.)

How does the @code{QUIT} macro check to see whether @kbd{C-g} has been
pressed? Obviously this needs to be extremely fast. Now for some
history. In early Lemacs, as inherited from the FSF going back 15 years
or more, there was a great fondness for using SIGIO (which is sent
whenever there is I/O available on a given socket, tty, etc.). In fact,
in GNU Emacs, perhaps even today, all reading of events from the X
server occurs inside the SIGIO handler! This is crazy, but not
completely relevant. What is relevant is that similar stuff happened
inside the SIGIO handler for @kbd{C-g}: it searched through all the
pending (i.e. not yet delivered to XEmacs) X events for one that
matched @kbd{C-g}. When it saw a match, it set @code{Vquit_flag} to
@code{Qt}. On TTYs, @kbd{C-g} is actually mapped to be the interrupt
character (i.e. it generates SIGINT), and XEmacs's handler for this
signal sets @code{Vquit_flag} to @code{Qt}. Then, some time later, after
the signal handlers had finished and a @code{QUIT} macro was called,
the macro noticed the setting of @code{Vquit_flag} and used this as an
indication to call @code{signal_quit()}. What @code{signal_quit()}
actually does is set @code{Vquit_flag} to @code{Qnil} (so that we won't
get repeated interruptions from a single @kbd{C-g} press) and then call
the equivalent of @code{(signal 'quit nil)}.

Another complication is introduced in that @code{Vquit_flag} is actually
exported to Lisp as @code{quit-flag}. This allows users some level of
control over whether and when @kbd{C-g} is processed as quit, esp. in
combination with @code{inhibit-quit}. This is another Lisp variable,
and if set to non-nil, it inhibits @code{signal_quit()} from getting
called, meaning that the @kbd{C-g} gets essentially ignored. But not
completely: because the resetting of @code{quit-flag} happens only in
@code{signal_quit()}, which isn't getting called, the @kbd{C-g} press is
still noticed, and as soon as @code{inhibit-quit} is set back to nil, a
quit will be signalled at the next @code{QUIT} macro. Thus, what
@code{inhibit-quit} really does is defer quits until after the
quit-inhibited period.

Another consideration, introduced by XEmacs, is critical quitting. If
you press @kbd{Control-Shift-G} instead of just @kbd{C-g},
@code{quit-flag} is set to @code{critical} instead of to t. When
@code{QUIT} processes this value, it @strong{ignores} the value of
@code{inhibit-quit}. This allows you to quit even out of a
quit-inhibited section of code! Furthermore, when @code{signal_quit()}
notices that it was invoked as a result of a critical quit, it
automatically invokes the debugger (which otherwise would only happen
when @code{debug-on-quit} is set to t).

Well, I explained above how @code{quit-flag} gets set correctly, but I
began with a disclaimer stating that this was the old way of doing
things. What's done now? Well, first of all, the SIGIO handler (which
formerly checked all pending events to see if there's a @kbd{C-g}) now
does nothing but set a flag, or actually two flags:
@code{something_happened} and @code{quit_check_signal_happened}. There
are two flags because the @code{QUIT} macro is now used for more than
just handling QUIT; it's also used for running asynchronous timeout
handlers that have recently expired, and perhaps other things. The idea
here is that the @code{QUIT} macros occur extremely often in the code,
but only at places that are relatively safe; in particular, if an error
occurs, nothing will get completely trashed.

Now, let's look at @code{QUIT} again.

UNFINISHED. Note, however, that as of the point when this comment got
committed to CVS (mid-2001), the interaction between reading @kbd{C-g}
as an event and processing it as QUIT was overhauled to (for the first
time) be understandable and actually work correctly. Now, the way
things work is that if @kbd{C-g} is pressed while XEmacs is blocking at
the top level, waiting for a user event, it will be read as an event;
otherwise, it will cause QUIT. (This includes times when XEmacs is
blocking, but not waiting for a user event,
e.g. @code{accept-process-output} and
@code{wait_delaying_user_input()}.) Formerly, this was supposed to
happen, but didn't always, due to a bizarre and broken scheme, documented
in @code{next_event_internal} like this:

If we read a @kbd{C-g}, then set @code{quit-flag} but do not discard the
@kbd{C-g}. The callers of @code{next_event_internal()} will do one of
two things:

set @code{Vquit_flag} to Qnil. (@code{next-event} does this.) This will
cause the ^G to be treated as a normal keystroke.

not change @code{Vquit_flag} but attempt to enqueue the ^G, at which
point it will be discarded. The next time QUIT is called, it will
notice that @code{Vquit_flag} was set.

This required weirdness in @code{enqueue_command_event_1} like this:

put the event on the typeahead queue, unless the event is the quit char,
in which case the @code{QUIT} which will occur on the next trip through
this loop is all the processing we should do; leaving it on the queue
would cause the quit to be processed twice.

And further weirdness elsewhere, none of which made any sense, and
didn't work, because (e.g.) it required that QUIT never happen anywhere
inside @code{next_event_internal()} or any callers when @kbd{C-g} should
be read as a user event, which was impossible to implement in practice.

Now what we do is fairly simple. Callers of
@code{next_event_internal()} that want @kbd{C-g} read as a user event
call @code{begin_dont_check_for_quit()}. @code{next_event_internal()},
when it gets a @kbd{C-g}, simply sets @code{Vquit_flag} (just as when a
@kbd{C-g} is detected during the operation of @code{QUIT} or
@code{QUITP}), and then tries to @code{QUIT}. This will fail if blocked
by the previous call, at which point @code{next_event_internal()} will
return the @kbd{C-g} as an event. To unblock things, first set
@code{Vquit_flag} to nil (it was set to t when the @kbd{C-g} was read,
and if we don't reset it, the next call to @code{QUIT} will quit), and
then @code{unbind_to()} the depth returned by
@code{begin_dont_check_for_quit()}. It makes no difference if
@code{QUIT} is called a zillion times in @code{next_event_internal()} or
anywhere else, because it's blocked and will never signal.

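
The flags described in this section can be summarized with a small single-threaded model. Everything in it is hypothetical and simplified:

@itemize @bullet
@item
@code{sk_quit()} stands in for the @code{QUIT} macro.
@item
@code{sk_check_quit()} stands in for the expensive check of pending input.
@item
@code{pending_ctrl_g} represents a @kbd{C-g} waiting in the window system.
@item
The counters stand in for actually signalling @code{quit} and entering the debugger.
@end itemize

The model shows the cheap flag test, deferral under @code{inhibit-quit}, a critical quit overriding @code{inhibit-quit}, and @code{begin_dont_check_for_quit()} (modelled as a simple depth counter) blocking @code{QUIT} entirely.

@verbatim
```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical model of the quit machinery; not SXEmacs code. */
typedef enum { QF_NIL, QF_T, QF_CRITICAL } quit_flag_t;

static volatile sig_atomic_t quit_check_signal_happened; /* set by handler */
static quit_flag_t quit_flag;     /* models quit-flag (Vquit_flag) */
static bool inhibit_quit;         /* models inhibit-quit */
static int dont_check_for_quit;   /* depth of begin_dont_check_for_quit() */
static bool pending_ctrl_g;       /* a C-g waiting in the window system */
static int quits_signalled, debugger_entries;

/* The signal handler only sets a flag; no real work happens there. */
static void sk_sigio_handler(void)
{
    quit_check_signal_happened = 1;
}

/* The expensive part, reached only after the cheap flag test:
   look through pending input for a C-g. */
static void sk_check_quit(void)
{
    quit_check_signal_happened = 0;
    if (pending_ctrl_g) {
        pending_ctrl_g = false;
        if (quit_flag != QF_CRITICAL)
            quit_flag = QF_T;
    }
}

static void sk_signal_quit(void)
{
    bool critical = (quit_flag == QF_CRITICAL);
    quit_flag = QF_NIL;        /* one C-g produces one quit */
    quits_signalled++;         /* stands in for (signal 'quit nil) */
    if (critical)
        debugger_entries++;    /* critical quit also enters the debugger */
}

/* Stand-in for the QUIT macro. */
static void sk_quit(void)
{
    if (dont_check_for_quit)
        return;                /* blocked: never checks, never signals */
    if (quit_check_signal_happened)
        sk_check_quit();
    if (quit_flag == QF_CRITICAL ||
        (quit_flag == QF_T && !inhibit_quit))
        sk_signal_quit();
}
```
@end verbatim

A quit deferred by @code{inhibit-quit} is not lost: @code{quit_flag} stays set until @code{sk_signal_quit()} finally runs.
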
@subsection Reentrancy Problems due to QUIT Checking

Checking for QUIT can do quite a lot of things: since it pumps the
event loop, it may cause arbitrary code to be executed. (Garbage
collection, however, cannot happen, because it is inhibited.) This has
led to crashes when functions get called reentrantly when they are not
expecting it. Example:

@subheading Crash -- reentrant @code{re_match_2()}

/* dont_check_for_quit is set in three circumstances:
(1) when we are in the process of changing the window
configuration. The frame might be in an inconsistent state,
which will cause assertion failures if we check for QUIT.
(2) when we are reading events, and want to read the C-g
as an event. The normal check for quit will discard the C-g,
(3) when we're going down with a fatal error. we're most likely
in an inconsistent state, and we definitely don't want to be
/* We should *not* conditionalize on Vinhibit_quit, or
critical-quit (Control-Shift-G) won't work right. */
/* WARNING: Even calling check_quit(), without actually dispatching
a quit signal, can result in arbitrary Lisp code getting executed
-- at least under Windows. (Not to mention obvious Lisp
invocations like asynchronous timer callbacks.) Here's a sample
stack trace to demonstrate:
NTDLL! DbgBreakPoint@@0 address 0x77f9eea9
assert_failed(const char * 0x012d036c, int 4596, const char * 0x012d0354) line 3478
re_match_2_internal(re_pattern_buffer * 0x012d6780, const unsigned char * 0x00000000, int 0, const unsigned char * 0x022f9328, int 34, int 0, re_registers * 0x012d53d0 search_regs, int 34) line 4596 + 41 bytes
re_search_2(re_pattern_buffer * 0x012d6780, const char * 0x00000000, int 0, const char * 0x022f9328, int 34, int 0, int 34, re_registers * 0x012d53d0 search_regs, int 34) line 4269 + 37 bytes
re_search(re_pattern_buffer * 0x012d6780, const char * 0x022f9328, int 34, int 0, int 34, re_registers * 0x012d53d0 search_regs) line 4031 + 37 bytes
string_match_1(long 31222628, long 30282164, long 28377092, buffer * 0x022fde00, int 0) line 413 + 69 bytes
Fstring_match(long 31222628, long 30282164, long 28377092, long 28377092) line 436 + 34 bytes
Ffuncall(int 3, long * 0x008297f8) line 3488 + 168 bytes
execute_optimized_program(const unsigned char * 0x020ddc50, int 6, long * 0x020ddf50) line 744 + 16 bytes
funcall_compiled_function(long 34407748, int 1, long * 0x00829aec) line 516 + 53 bytes
Ffuncall(int 2, long * 0x00829ae8) line 3523 + 17 bytes
execute_optimized_program(const unsigned char * 0x020ddc90, int 4, long * 0x020ddf90) line 744 + 16 bytes
funcall_compiled_function(long 34407720, int 1, long * 0x00829e28) line 516 + 53 bytes
Ffuncall(int 2, long * 0x00829e24) line 3523 + 17 bytes
mapcar1(long 15, long * 0x00829e48, long 34447820, long 34187868) line 2929 + 11 bytes
Fmapcar(long 34447820, long 34187868) line 3035 + 21 bytes
Ffuncall(int 3, long * 0x00829f20) line 3488 + 93 bytes
execute_optimized_program(const unsigned char * 0x020c2b70, int 7, long * 0x020dd010) line 744 + 16 bytes
funcall_compiled_function(long 34407580, int 2, long * 0x0082a210) line 516 + 53 bytes
Ffuncall(int 3, long * 0x0082a20c) line 3523 + 17 bytes
execute_optimized_program(const unsigned char * 0x020cf810, int 6, long * 0x020cfb10) line 744 + 16 bytes
funcall_compiled_function(long 34407524, int 0, long * 0x0082a580) line 516 + 53 bytes
Ffuncall(int 1, long * 0x0082a57c) line 3523 + 17 bytes
run_hook_with_args_in_buffer(buffer * 0x022fde00, int 1, long * 0x0082a57c, int 0) line 3980 + 13 bytes
run_hook_with_args(int 1, long * 0x0082a57c, int 0) line 3993 + 23 bytes
Frun_hooks(int 1, long * 0x0082a57c) line 3847 + 19 bytes
run_hook(long 34447484) line 4094 + 11 bytes
unsafe_handle_wm_initmenu_1(frame * 0x01dbb000) line 736 + 11 bytes
unsafe_handle_wm_initmenu(long 28377092) line 807 + 11 bytes
condition_case_1(long 28377116, long (long)* 0x0101c827 unsafe_handle_wm_initmenu(long), long 28377092, long (long, long)* 0x01005fa4 mswindows_modal_loop_error_handler(long, long), long 28377092) line 1692 + 7 bytes
mswindows_protect_modal_loop(long (long)* 0x0101c827 unsafe_handle_wm_initmenu(long), long 28377092) line 1194 + 32 bytes
mswindows_handle_wm_initmenu(HMENU__ * 0x00010199, frame * 0x01dbb000) line 826 + 17 bytes
mswindows_wnd_proc(HWND__ * 0x000501da, unsigned int 278, unsigned int 65945, long 0) line 3089 + 31 bytes
USER32! UserCallWinProc@@20 + 24 bytes
USER32! DispatchClientMessage@@20 + 47 bytes
USER32! __fnDWORD@@4 + 34 bytes
NTDLL! KiUserCallbackDispatcher@@12 + 19 bytes
USER32! DispatchClientMessage@@20 address 0x77e163cc
USER32! DefWindowProcW@@16 + 34 bytes
qxeDefWindowProc(HWND__ * 0x000501da, unsigned int 274, unsigned int 61696, long 98) line 1188 + 22 bytes
mswindows_wnd_proc(HWND__ * 0x000501da, unsigned int 274, unsigned int 61696, long 98) line 3362 + 21 bytes
USER32! UserCallWinProc@@20 + 24 bytes
USER32! DispatchClientMessage@@20 + 47 bytes
USER32! __fnDWORD@@4 + 34 bytes
NTDLL! KiUserCallbackDispatcher@@12 + 19 bytes
USER32! DispatchClientMessage@@20 address 0x77e163cc
USER32! DefWindowProcW@@16 + 34 bytes
qxeDefWindowProc(HWND__ * 0x000501da, unsigned int 262, unsigned int 98, long 540016641) line 1188 + 22 bytes
mswindows_wnd_proc(HWND__ * 0x000501da, unsigned int 262, unsigned int 98, long 540016641) line 3362 + 21 bytes
7921 USER32! UserCallWinProc@@20 + 24 bytes
7922 USER32! DispatchMessageWorker@@8 + 244 bytes
7923 USER32! DispatchMessageW@@4 + 11 bytes
7924 qxeDispatchMessage(const tagMSG * 0x0082c684 @{msg=0x00000106 wp=0x00000062 lp=0x20300001@}) line 989 + 10 bytes
7925 mswindows_drain_windows_queue() line 1345 + 9 bytes
7926 emacs_mswindows_quit_p() line 3947
7927 event_stream_quit_p() line 666
7928 check_quit() line 686
7929 check_what_happened() line 437
7930 re_match_2_internal(re_pattern_buffer * 0x012d5a18, const unsigned char * 0x00000000, int 0, const unsigned char * 0x02235000, int 23486, int 14645, re_registers * 0x012d53d0 search_regs, int 23486) line 4717 + 14 bytes
7931 re_search_2(re_pattern_buffer * 0x012d5a18, const char * 0x02235000, int 23486, const char * 0x0223b38e, int 0, int 14645, int 8841, re_registers * 0x012d53d0 search_regs, int 23486) line 4269 + 37 bytes
7932 search_buffer(buffer * 0x022fde00, long 29077572, long 13789, long 23487, long 1, int 1, long 28377092, long 28377092, int 0) line 1224 + 89 bytes
7933 search_command(long 29077572, long 46975, long 28377116, long 28377092, long 28377092, int 1, int 1, int 0) line 1054 + 151 bytes
7934 Fre_search_forward(long 29077572, long 46975, long 28377116, long 28377092, long 28377092) line 2147 + 31 bytes
7935 Ffuncall(int 4, long * 0x0082ceb0) line 3488 + 216 bytes
7936 execute_optimized_program(const unsigned char * 0x02047810, int 13, long * 0x02080c10) line 744 + 16 bytes
7937 funcall_compiled_function(long 34187208, int 3, long * 0x0082d1b8) line 516 + 53 bytes
7938 Ffuncall(int 4, long * 0x0082d1b4) line 3523 + 17 bytes
7939 execute_optimized_program(const unsigned char * 0x01e96a10, int 6, long * 0x020ae510) line 744 + 16 bytes
7940 funcall_compiled_function(long 34186676, int 3, long * 0x0082d4a0) line 516 + 53 bytes
7941 Ffuncall(int 4, long * 0x0082d49c) line 3523 + 17 bytes
7942 execute_optimized_program(const unsigned char * 0x02156b50, int 4, long * 0x020c2db0) line 744 + 16 bytes
7943 funcall_compiled_function(long 34186564, int 2, long * 0x0082d780) line 516 + 53 bytes
7944 Ffuncall(int 3, long * 0x0082d77c) line 3523 + 17 bytes
7945 execute_optimized_program(const unsigned char * 0x0082d964, int 3, long * 0x020c2d70) line 744 + 16 bytes
7946 Fbyte_code(long 29405156, long 34352480, long 7) line 2392 + 38 bytes
7947 Feval(long 34354440) line 3290 + 187 bytes
7948 condition_case_1(long 34354572, long (long)* 0x01087232 Feval(long), long 34354440, long (long, long)* 0x01084764 run_condition_case_handlers(long, long), long 28377092) line 1692 + 7 bytes
7949 condition_case_3(long 34354440, long 28377092, long 34354572) line 1779 + 27 bytes
7950 execute_rare_opcode(long * 0x0082dc7c, const unsigned char * 0x01b090af, int 143) line 1269 + 19 bytes
7951 execute_optimized_program(const unsigned char * 0x01b09090, int 6, long * 0x020ae590) line 654 + 17 bytes
7952 funcall_compiled_function(long 34186620, int 0, long * 0x0082df68) line 516 + 53 bytes
7953 Ffuncall(int 1, long * 0x0082df64) line 3523 + 17 bytes
7954 execute_optimized_program(const unsigned char * 0x02195470, int 1, long * 0x020c2df0) line 744 + 16 bytes
7955 funcall_compiled_function(long 34186508, int 0, long * 0x0082e23c) line 516 + 53 bytes
7956 Ffuncall(int 1, long * 0x0082e238) line 3523 + 17 bytes
7957 execute_optimized_program(const unsigned char * 0x01e5d410, int 6, long * 0x0207d410) line 744 + 16 bytes
7958 funcall_compiled_function(long 34186312, int 1, long * 0x0082e524) line 516 + 53 bytes
7959 Ffuncall(int 2, long * 0x0082e520) line 3523 + 17 bytes
7960 execute_optimized_program(const unsigned char * 0x02108fb0, int 2, long * 0x020c2e30) line 744 + 16 bytes
7961 funcall_compiled_function(long 34186340, int 0, long * 0x0082e7fc) line 516 + 53 bytes
7962 Ffuncall(int 1, long * 0x0082e7f8) line 3523 + 17 bytes
7963 execute_optimized_program(const unsigned char * 0x020fe150, int 2, long * 0x01e6f510) line 744 + 16 bytes
7964 funcall_compiled_function(long 31008124, int 0, long * 0x0082ebd8) line 516 + 53 bytes
7965 Ffuncall(int 1, long * 0x0082ebd4) line 3523 + 17 bytes
7966 run_hook_with_args_in_buffer(buffer * 0x022fde00, int 1, long * 0x0082ebd4, int 0) line 3980 + 13 bytes
7967 run_hook_with_args(int 1, long * 0x0082ebd4, int 0) line 3993 + 23 bytes
7968 Frun_hooks(int 1, long * 0x0082ebd4) line 3847 + 19 bytes
7969 Ffuncall(int 2, long * 0x0082ebd0) line 3509 + 14 bytes
7970 execute_optimized_program(const unsigned char * 0x01ef2210, int 5, long * 0x01da8e10) line 744 + 16 bytes
7971 funcall_compiled_function(long 31020440, int 2, long * 0x0082eeb8) line 516 + 53 bytes
7972 Ffuncall(int 3, long * 0x0082eeb4) line 3523 + 17 bytes
7973 execute_optimized_program(const unsigned char * 0x0082f09c, int 3, long * 0x01d89390) line 744 + 16 bytes
7974 Fbyte_code(long 31102388, long 30970752, long 7) line 2392 + 38 bytes
7975 Feval(long 31087568) line 3290 + 187 bytes
7976 condition_case_1(long 30961240, long (long)* 0x01087232 Feval(long), long 31087568, long (long, long)* 0x01084764 run_condition_case_handlers(long, long), long 28510180) line 1692 + 7 bytes
7977 condition_case_3(long 31087568, long 28510180, long 30961240) line 1779 + 27 bytes
7978 execute_rare_opcode(long * 0x0082f450, const unsigned char * 0x01ef23ec, int 143) line 1269 + 19 bytes
7979 execute_optimized_program(const unsigned char * 0x01ef2310, int 6, long * 0x01da8f10) line 654 + 17 bytes
7980 funcall_compiled_function(long 31020412, int 1, long * 0x0082f740) line 516 + 53 bytes
7981 Ffuncall(int 2, long * 0x0082f73c) line 3523 + 17 bytes
7982 execute_optimized_program(const unsigned char * 0x020fe650, int 3, long * 0x01d8c490) line 744 + 16 bytes
7983 funcall_compiled_function(long 31020020, int 2, long * 0x0082fa14) line 516 + 53 bytes
7984 Ffuncall(int 3, long * 0x0082fa10) line 3523 + 17 bytes
7985 Fcall_interactively(long 29685180, long 28377092, long 28377092) line 1008 + 22 bytes
7986 Fcommand_execute(long 29685180, long 28377092, long 28377092) line 2929 + 17 bytes
7987 execute_command_event(command_builder * 0x01be1900, long 36626492) line 4048 + 25 bytes
7988 Fdispatch_event(long 36626492) line 4341 + 70 bytes
7989 Fcommand_loop_1() line 582 + 9 bytes
7990 command_loop_1(long 28377092) line 495
7991 condition_case_1(long 28377188, long (long)* 0x01064fb9 command_loop_1(long), long 28377092, long (long, long)* 0x010649d0 cmd_error(long, long), long 28377092) line 1692 + 7 bytes
7992 command_loop_3() line 256 + 35 bytes
7993 command_loop_2(long 28377092) line 269
7994 internal_catch(long 28457612, long (long)* 0x01064b20 command_loop_2(long), long 28377092, int * volatile 0x00000000) line 1317 + 7 bytes
7995 initial_command_loop(long 28377092) line 305 + 25 bytes
7996 STACK_TRACE_EYE_CATCHER(int 1, char * * 0x01b63ff0, char * * 0x01ca5300, int 0) line 2501
7997 main(int 1, char * * 0x01b63ff0, char * * 0x01ca5300) line 2938
7998 XEMACS! mainCRTStartup + 180 bytes
8000 KERNEL32! BaseProcessStart@@4 + 115547 bytes
8003 [explain dont_check_for_quit() et al]
We implement our own profiling scheme so that we can determine
things like which Lisp functions are occupying the most time.
Standard OS-provided profiling works on C functions, which is often
not what we want; it is also inconvenient, since it requires compiling
with profiling support and its data can't be retrieved dynamically, as XEmacs is
The basic idea is simple. We set a profiling timer using @code{setitimer()}
(@code{ITIMER_PROF}), which generates a @code{SIGPROF} at regular intervals.
(This timer runs not in real time but only while the process is executing
or the system is running on behalf of the process, at least under
Unix. Under MS Windows and Cygwin, there is no @code{setitimer()}, so we
simulate it using multimedia timers, which run in real time. To make
the results a bit more realistic, we ignore ticks that go off while
blocking on an event wait. Note that Cygwin does provide a simulation
of @code{setitimer()}, but it runs in real time anyway, since Windows doesn't
provide a way to have process-time timers; furthermore, it's broken,
so we don't use it.) When the signal goes off, we determine which function
is currently executing and add 1 to the count associated with that function.
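As an illustration of the timer mechanism just described, here is a minimal POSIX sketch, not the actual profiler code. It assumes a Unix system with @code{setitimer()} and @code{sigaction()}; the names @code{current_function}, @code{profile_counts}, @code{start_profiling()} and @code{stop_profiling()} are hypothetical, and a small integer stands in for the function that the real handler finds on the backtrace.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical: each "function" is identified by a small integer.
   The real profiler instead inspects the top of the Lisp backtrace. */
#define N_FUNCS 4
static volatile sig_atomic_t current_function = 0;
static volatile unsigned long profile_counts[N_FUNCS];

/* The handler only touches preallocated storage: no malloc, no Lisp
   allocation, nothing that could be caught in an inconsistent state. */
static void sigprof_handler(int signo)
{
    int f = current_function;
    (void) signo;
    if (f >= 0 && f < N_FUNCS)
        profile_counts[f]++;
}

/* Deliver SIGPROF every interval_usec microseconds of process time
   (user plus system CPU time, not wall-clock time). */
static int start_profiling(long interval_usec)
{
    struct sigaction sa;
    struct itimerval it;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigprof_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = interval_usec;
    it.it_value = it.it_interval;
    return setitimer(ITIMER_PROF, &it, NULL);
}

/* Disarm the timer by setting a zero interval and value. */
static void stop_profiling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_PROF, &off, NULL);
}
```

Because @code{ITIMER_PROF} measures CPU time, a process that is blocked or idle accumulates no ticks, which is why the real-time Windows simulation has to discard ticks that occur during an event wait.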
8030 It would be nice to use the Lisp allocation mechanism etc. to keep track
8031 of the profiling information (i.e. to use Lisp hash tables), but we
8032 can't because that's not safe -- updating the timing information happens
8033 inside of a signal handler, so we can't rely on not being in the middle
8034 of Lisp allocation, garbage collection, @code{malloc()}, etc. Trying to make
8035 it work would be much more work than it's worth. Instead we use a basic
8036 (non-Lisp) hash table, which will not conflict with garbage collection
or anything else as long as it doesn't try to resize itself. Resizing
itself, however (which happens as a result of a @code{puthash()}), could be
deadly. To avoid this, we make sure, at points where it's safe
(e.g. @code{profile_record_about_to_call()}, which records entry into a
function call), that the table always has some breathing room in it, so
that a certain number of further items can be added without triggering a
resize. This is safe because any new item that the SIGPROF handler adds
will normally have had @code{profile_record_about_to_call()} called for it
just beforehand, and that call checks the breathing room.
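The breathing-room idea can be sketched as a small open-addressing table whose insert operation refuses to grow, paired with a separate ``ensure room'' step that is only run at safe points. This is a hypothetical illustration: all names are invented here, and the load-factor limit of 3/4 is an arbitrary choice, not the table actually used by the profiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical non-Lisp hash table mapping a function identity (a
   pointer) to a tick count, using open addressing with linear probing.
   The capacity is always a power of two. */
struct prof_entry { const void *key; unsigned long count; };
struct prof_table { struct prof_entry *slots; size_t capacity, used; };

static size_t prof_hash(const void *p, size_t capacity)
{
    uintptr_t h = (uintptr_t) p;
    h ^= h >> 7;
    return (size_t) (h * 2654435761u) & (capacity - 1);
}

static int prof_table_init(struct prof_table *t, size_t capacity)
{
    t->slots = calloc(capacity, sizeof *t->slots);
    t->capacity = capacity;
    t->used = 0;
    return t->slots ? 0 : -1;
}

/* Never resizes, so it is the only operation the signal handler uses.
   Returns 0 on success, -1 if adding KEY would exceed the load limit. */
static int prof_table_increment(struct prof_table *t, const void *key)
{
    size_t mask = t->capacity - 1, i = prof_hash(key, t->capacity), n;

    for (n = 0; n < t->capacity; n++, i = (i + 1) & mask) {
        if (t->slots[i].key == key) {
            t->slots[i].count++;
            return 0;
        }
        if (t->slots[i].key == NULL) {
            if (t->used + 1 > t->capacity * 3 / 4)
                return -1;              /* no breathing room left */
            t->slots[i].key = key;
            t->slots[i].count = 1;
            t->used++;
            return 0;
        }
    }
    return -1;
}

/* Called only at safe points, never from the handler: guarantee that at
   least EXTRA new keys can be added before a resize would be needed. */
static int prof_table_ensure_room(struct prof_table *t, size_t extra)
{
    size_t need = t->used + extra, newcap = t->capacity, i, j;
    struct prof_table bigger;

    if (need <= t->capacity * 3 / 4)
        return 0;
    while (need > newcap * 3 / 4)
        newcap *= 2;
    if (prof_table_init(&bigger, newcap) != 0)
        return -1;
    for (i = 0; i < t->capacity; i++) {   /* rehash every live entry */
        if (t->slots[i].key == NULL)
            continue;
        j = prof_hash(t->slots[i].key, newcap);
        while (bigger.slots[j].key != NULL)
            j = (j + 1) & (newcap - 1);
        bigger.slots[j] = t->slots[i];
        bigger.used++;
    }
    free(t->slots);
    *t = bigger;
    return 0;
}
```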
In general, any entry that the sigprof handler puts into the table comes
from a backtrace frame (except ``Processing Events at Top Level'', and
8049 there's only one of those). Either that backtrace frame was added when
8050 profiling was on (in which case @code{profile_record_about_to_call()} was
8051 called and the breathing space updated), or when it was off -- and in
8052 this case, no such frames can have been added since the last time
8053 @code{start-profile} was called, so when @code{start-profile} is called we make
8054 sure there is sufficient breathing room to account for all entries
8055 currently on the stack.
8057 Jan 1998: In addition to timing info, I have added code to remember call
8058 counts of Lisp funcalls. The @code{profile_increase_call_count()}
function is called from @code{Ffuncall()}, and serves to add data to
@code{Vcall_count_profile_table}. This mechanism is much simpler and
8061 independent of the SIGPROF-driven one. It uses the Lisp allocation
8062 mechanism normally, since it is not called from a handler. It may
8063 even be useful to provide a way to turn on only one profiling
8064 mechanism, but I haven't done so yet. --hniksic
8066 Dec 2002: Total overhaul of the interface, making it sane and easier to
8069 Feb 2003: Lots of rewriting of the internal code. Add GC-consing-usage,
8070 total GC usage, and total timing to the information tracked. Track
profiling overhead, and allow internal sections
8072 (e.g. internal-external conversion, byte-char conversion) that are
8073 treated like Lisp functions for the purpose of profiling. --ben
8075 BEWARE: If you are modifying this file, be @strong{very} careful. Correctly
implementing the ``total'' values is very tricky due to the possibility of
8077 recursion and of functions already on the stack when starting to
8078 profile/still on the stack when stopping.
8080 @node Asynchronous Timeouts
8081 @section Asynchronous Timeouts
8082 @cindex asynchronous timeouts
8091 @cindex exits, expected and unexpected
8092 @cindex unexpected exits
8093 @cindex expected exits
8095 Ben's capsule summary about expected and unexpected exits from XEmacs.
8097 Expected exits occur when the user directs XEmacs to exit, for example
8098 by pressing the close button on the only frame in XEmacs, or by typing
8099 @kbd{C-x C-c}. This runs @code{save-buffers-kill-emacs}, which saves
8100 any necessary buffers, and then exits using the primitive
8103 However, unexpected exits occur in a few different ways:
8107 A memory access violation or other hardware-generated exception occurs.
8108 This is the worst possible problem to deal with, because the fault can
8109 occur while XEmacs is in any state whatsoever, even quite unstable ones.
8110 As a result, we need to be @strong{extremely} careful what we do.
8113 We are using one X display (or if we've used more, we've closed the
8114 others already), and some hardware or other problem happens and
8115 suddenly we've lost our connection to the display. In this situation,
things are not so dire as in the previous one; our code itself isn't
8117 trashed, so we can continue execution as normal, after having set
8118 things up so that we can exit at the appropriate time. Our exit
8119 still needs to be of the emergency nature; we have no displays, so
8120 any attempts to use them will fail. We simply want to auto-save
8121 (the single most important thing to do during shut-down), do minimal
8122 cleanup of stuff that has an independent existence outside of XEmacs,
8126 Currently, both unexpected exit scenarios described above set
8127 @code{preparing_for_armageddon} to indicate that nonessential and possibly
8128 dangerous things should not be done, specifically:
8132 no garbage collection.
8136 no messages of any sort from autosaving.
8138 autosaving tries harder, ignoring certain failures.
8140 existing frames are not deleted.
8143 (Also, all places that set @code{preparing_for_armageddon} also
8144 set @code{dont_check_for_quit}. This happens separately because it's
8145 also necessary to set other variables to make absolutely sure
8146 no quitting happens.)
8148 In the first scenario above (the access violation), we also set
@code{fatal_error_in_progress}. This has the following additional effects:
8153 assertion failures do not abort.
8155 printing code does not do code conversion or gettext when
8156 printing to stdout/stderr.
8159 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Asynchronous Events; Quit Checking, Top
8160 @chapter Evaluation; Stack Frames; Bindings
8161 @cindex evaluation; stack frames; bindings
8162 @cindex stack frames; bindings, evaluation;
8163 @cindex bindings, evaluation; stack frames;
8167 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
8168 * Simple Special Forms::
@code{Feval()} evaluates the form (a Lisp object) that is passed to
it. Note that evaluation is only non-trivial for two types of objects:
symbols and conses; every other object evaluates to itself. A symbol is
evaluated simply by calling @code{symbol-value} on it and returning the value.
Evaluating a cons means calling a function. First, @code{eval} checks
whether garbage collection is necessary, and calls
8183 @code{garbage_collect_1()} if so. It then increases the evaluation
8184 depth by 1 (@code{lisp_eval_depth}, which is always less than
8185 @code{max_lisp_eval_depth}) and adds an element to the linked list of
8186 @code{struct backtrace}'s (@code{backtrace_list}). Each such structure
8187 contains a pointer to the function being called plus a list of the
8188 function's arguments. Originally these values are stored unevalled, and
8189 as they are evaluated, the backtrace structure is updated. Garbage
8190 collection pays attention to the objects pointed to in the backtrace
8191 structures (garbage collection might happen while a function is being
8192 called or while an argument is being evaluated, and there could easily
8193 be no other references to the arguments in the argument list; once an
8194 argument is evaluated, however, the unevalled version is not needed by
8195 eval, and so the backtrace structure is changed).
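The bookkeeping around a call can be sketched as follows. This is a simplified, hypothetical model: @code{eval_call()} and @code{backtrace_length()} are invented names, the function is represented by a string, the depth limit of 8 is arbitrary, and running out of depth returns -1 instead of signaling a Lisp error. In this sketch, each frame lives in the C stack frame of the call that pushed it.

```c
#include <stddef.h>

/* Hypothetical, simplified versions of the structures described above. */
struct backtrace {
    struct backtrace *next;
    const char *function;       /* real code: a Lisp_Object */
};

static struct backtrace *backtrace_list;
static int lisp_eval_depth;
static int max_lisp_eval_depth = 8;   /* small value for demonstration */

/* Run BODY (ARG) as one "evaluation", maintaining the depth counter and
   the backtrace chain.  Returns -1 if the depth limit would be exceeded. */
static int eval_call(const char *function, int (*body)(int), int arg)
{
    struct backtrace frame;
    int result;

    if (lisp_eval_depth + 1 > max_lisp_eval_depth)
        return -1;                  /* real code signals an error */
    lisp_eval_depth++;
    frame.next = backtrace_list;
    frame.function = function;
    backtrace_list = &frame;        /* push our frame */

    result = body(arg);

    backtrace_list = frame.next;    /* pop our frame */
    lisp_eval_depth--;
    return result;
}

/* Count the frames currently on the backtrace. */
static int backtrace_length(void)
{
    const struct backtrace *b;
    int n = 0;
    for (b = backtrace_list; b; b = b->next)
        n++;
    return n;
}
```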
8197 At this point, the function to be called is determined by looking at
8198 the car of the cons (if this is a symbol, its function definition is
8199 retrieved and the process repeated). The function should then consist
8200 of either a @code{Lisp_Subr} (built-in function written in C), a
8201 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
8202 symbols @code{autoload}, @code{macro} or @code{lambda}.
If the function is a @code{Lisp_Subr}, the Lisp object points to a
8205 @code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
8206 pointer to the C function, a minimum and maximum number of arguments
8207 (or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
8208 pointer to the symbol referring to that subr, and a couple of other
8209 things. If the subr wants its arguments @code{UNEVALLED}, they are
8210 passed raw as a list. Otherwise, an array of evaluated arguments is
8211 created and put into the backtrace structure, and either passed whole
8212 (@code{MANY}) or each argument is passed as a C argument.
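The argument-passing logic for subrs might look roughly like the following hypothetical sketch. @code{Lisp_Object} is modeled as a @code{long}, nil as 0, and @code{MANY} as an arbitrary negative constant; @code{UNEVALLED} subrs are not modeled, only up to two fixed arguments are handled, and in this sketch missing optional arguments are passed as nil.

```c
#include <stddef.h>

typedef long Lisp_Object;          /* stand-in; the real type differs */
#define Qnil ((Lisp_Object) 0)     /* stand-in for nil */
#define MANY (-2)                  /* value chosen for this sketch only */

typedef Lisp_Object (*generic_fn)(void);
typedef Lisp_Object (*subr0)(void);
typedef Lisp_Object (*subr1)(Lisp_Object);
typedef Lisp_Object (*subr2)(Lisp_Object, Lisp_Object);
typedef Lisp_Object (*subr_many)(int, Lisp_Object *);

struct subr_sketch {
    generic_fn fn;                 /* cast to the right type before calling */
    int min_args, max_args;        /* max_args may be MANY */
};

/* Call a subr with NARGS already-evaluated arguments.  Returns 0 and
   stores into *RESULT, or -1 on a wrong number of arguments. */
static int call_subr(const struct subr_sketch *s, int nargs,
                     Lisp_Object *args, Lisp_Object *result)
{
    Lisp_Object a[2] = { Qnil, Qnil };
    int i;

    if (nargs < s->min_args || (s->max_args != MANY && nargs > s->max_args))
        return -1;                              /* wrong number of args */
    if (s->max_args == MANY) {
        *result = ((subr_many) s->fn)(nargs, args);  /* pass whole array */
        return 0;
    }
    if (s->max_args > 2)
        return -1;                              /* beyond this sketch */
    for (i = 0; i < nargs; i++)                 /* missing ones stay nil */
        a[i] = args[i];
    switch (s->max_args) {                      /* one C argument each */
    case 0:  *result = ((subr0) s->fn)(); break;
    case 1:  *result = ((subr1) s->fn)(a[0]); break;
    default: *result = ((subr2) s->fn)(a[0], a[1]); break;
    }
    return 0;
}
```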
8214 If the function is a @code{Lisp_Compiled_Function},
8215 @code{funcall_compiled_function()} is called. If the function is a
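lambda expression, @code{funcall_lambda()} is called. If the function is a
macro, the macro's expander function (the cdr of the @code{macro} cons) is
applied to the unevaluated arguments, and the resulting expansion is then
evaluated. If the function is an autoload, @code{do_autoload()} is called
to load the definition, and then eval starts over with the newly loaded
definition.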
8221 When @code{Feval()} exits, the evaluation depth is reduced by one, the
8222 debugger is called if appropriate, and the current backtrace structure
8223 is removed from the list.
8225 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
8226 to go through the list of formal parameters to the function and bind
8227 them to the actual arguments, checking for @code{&rest} and
8228 @code{&optional} symbols in the formal parameters and making sure the
8229 number of actual arguments is correct.
8230 @code{funcall_compiled_function()} can do this a little more
8231 efficiently, since the formal parameter list can be checked for sanity
8232 when the compiled function object is created.
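Checking the formal parameter list reduces to computing a minimum and maximum argument count and comparing the actual count against them. The sketch below is hypothetical: parameters are represented as C strings, and @code{ARGS_UNLIMITED} is an invented marker for a list containing @code{&rest}.

```c
#include <string.h>

#define ARGS_UNLIMITED (-1)   /* hypothetical marker for &rest */

/* Compute the minimum and maximum number of actual arguments accepted by
   a formal parameter list such as "a &optional b &rest c".
   Returns 0 on success, -1 if the list is malformed. */
static int lambda_arity(const char *const *params, int nparams,
                        int *min_args, int *max_args)
{
    int i, optional = 0;

    *min_args = *max_args = 0;
    for (i = 0; i < nparams; i++) {
        if (strcmp(params[i], "&optional") == 0) {
            if (optional)
                return -1;
            optional = 1;
        } else if (strcmp(params[i], "&rest") == 0) {
            /* exactly one ordinary parameter must follow &rest */
            if (i + 2 != nparams || params[i + 1][0] == '&')
                return -1;
            *max_args = ARGS_UNLIMITED;
            return 0;
        } else {
            if (!optional)
                ++*min_args;        /* required parameter */
            ++*max_args;
        }
    }
    return 0;
}

/* Is NARGS an acceptable number of actual arguments? */
static int arity_ok(int nargs, int min_args, int max_args)
{
    return nargs >= min_args
        && (max_args == ARGS_UNLIMITED || nargs <= max_args);
}
```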
8234 @code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
8237 @code{funcall_compiled_function()} calls the real byte-code interpreter
8238 @code{execute_optimized_program()} on the byte-code instructions, which
8239 are converted into an internal form for faster execution.
8241 When a compiled function is executed for the first time by
8242 @code{funcall_compiled_function()}, or during the dump phase of building
8243 SXEmacs, the byte-code instructions are converted from a
8244 @code{Lisp_String} (which is inefficient to access, especially in the
8245 presence of MULE) into a @code{Lisp_Opaque} object containing an array
8246 of unsigned char, which can be directly executed by the byte-code
8247 interpreter. At this time the byte code is also analyzed for validity
8248 and transformed into a more optimized form, so that
8249 @code{execute_optimized_program()} can really fly.
8251 Here are some of the optimizations performed by the internal byte-code
8255 References to the @code{constants} array are checked for out-of-range
8256 indices, so that the byte interpreter doesn't have to.
8258 References to the @code{constants} array that will be used as a Lisp
variable are checked to be non-constant symbols (i.e. not @code{t},
@code{nil}, or a keyword), so that the byte interpreter
8263 The maximum number of variable bindings in the byte-code is
8264 pre-computed, so that space on the @code{specpdl} stack can be
8265 pre-reserved once for the whole function execution.
8267 All byte-code jumps are relative to the current program counter instead
8268 of the start of the program, thereby saving a register.
8270 One-byte relative jumps are converted from the byte-code form of unsigned
8271 chars offset by 127 to machine-friendly signed chars.
8274 Of course, this transformation of the @code{instructions} should not be
8275 visible to the user, so @code{Fcompiled_function_instructions()} needs
8276 to know how to convert the optimized opaque object back into a Lisp
8277 string that is identical to the original string from the @file{.elc}
8278 file. (Actually, the resulting string may (rarely) contain slightly
8279 different, yet equivalent, byte code.)
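Two of the transformations listed above are simple enough to show directly. In this hypothetical sketch, @code{decode_rel1()} removes the bias of 127 from a one-byte jump operand, and @code{absolute_to_relative()} rewrites an absolute target as an offset from a program-counter position; exactly which position the real interpreter uses as its base is an assumption here.

```c
#include <limits.h>

/* A one-byte relative jump operand is stored as an unsigned char biased
   by 127; decode it into a signed char.  Returns 0 on success, or -1 if
   the offset would not fit in a signed char. */
static int decode_rel1(unsigned char operand, signed char *offset)
{
    int value = (int) operand - 127;
    if (value < SCHAR_MIN || value > SCHAR_MAX)
        return -1;
    *offset = (signed char) value;
    return 0;
}

/* Turn an absolute jump target (an index from the start of the program)
   into an offset relative to the program counter PC, so that executing
   the jump is simply "pc += offset". */
static int absolute_to_relative(int target, int pc)
{
    return target - pc;
}
```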
8281 @code{Ffuncall()} implements Lisp @code{funcall}. @code{(funcall fun
8282 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
8283 x2) (quote x3) ...))}. @code{Ffuncall()} contains its own code to do
8284 the evaluation, however, and is very similar to @code{Feval()}.
8286 From the performance point of view, it is worth knowing that most of the
8287 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
8288 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
8291 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
8292 @code{funcall} except that if the last argument is a list, the result is the
8293 same as if each of the arguments in the list had been passed separately.
@code{Fapply()} does some work to expand the last argument if it's a
8295 list, then calls @code{Ffuncall()} to do the work.
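The ``expand the last argument'' step amounts to flattening the final list onto the other arguments before calling @code{Ffuncall()}. Here is a hypothetical sketch, with integers standing in for Lisp objects and a bare @code{struct cons} standing in for a Lisp list; @code{expand_apply_args()} is an invented name.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: an argument is an integer, and a Lisp list is
   a chain of cons cells ending in NULL. */
struct cons { long car; struct cons *cdr; };

/* Build the argument vector for apply: the NFIXED leading arguments are
   copied as-is and the elements of LAST (a list) are appended.  The
   caller frees *OUT.  Returns the total count, or -1 on failure. */
static int expand_apply_args(const long *fixed, int nfixed,
                             const struct cons *last, long **out)
{
    const struct cons *c;
    int n = nfixed, i;
    long *v;

    for (c = last; c; c = c->cdr)      /* count the list elements */
        n++;
    v = malloc((n > 0 ? n : 1) * sizeof *v);
    if (!v)
        return -1;
    for (i = 0; i < nfixed; i++)
        v[i] = fixed[i];
    for (c = last; c; c = c->cdr)      /* splice in the list elements */
        v[i++] = c->car;
    *out = v;
    return n;
}
```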
8297 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
8298 @code{call3()} call a function, passing it the argument(s) given (the
8299 arguments are given as separate C arguments rather than being passed as
8300 an array). @code{apply1()} uses @code{Fapply()} while the others use
8301 @code{Ffuncall()} to do the real work.
8303 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
8304 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
8305 @cindex dynamic binding; the specbinding stack; unwind-protects
8306 @cindex binding; the specbinding stack; unwind-protects, dynamic
8307 @cindex specbinding stack; unwind-protects, dynamic binding; the
8308 @cindex unwind-protects, dynamic binding; the specbinding stack;
8314 Lisp_Object old_value;
8315 Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
8319 @code{struct specbinding} is used for local-variable bindings and
8320 unwind-protects. @code{specpdl} holds an array of @code{struct specbinding}'s,
8321 @code{specpdl_ptr} points to the beginning of the free bindings in the
8322 array, @code{specpdl_size} specifies the total number of binding slots
8323 in the array, and @code{max_specpdl_size} specifies the maximum number
8324 of bindings the array can be expanded to hold. @code{grow_specpdl()}
increases the size of the @code{specpdl} array by doubling it, but never
beyond @code{max_specpdl_size} (except that if that maximum is less
than 400, it is first raised to 400).
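The sizing rule can be written as a small pure function. This is a hypothetical helper, not the actual @code{grow_specpdl()}; in this sketch it returns 0 when the array is already at the maximum and leaves it to the caller to report an error.

```c
#include <stddef.h>

/* Double SIZE, but never beyond the maximum, which is first raised to
   400 if it is smaller.  Returns the new size, or 0 if the array is
   already at the maximum. */
static size_t next_specpdl_size(size_t size, size_t *max_specpdl_size)
{
    size_t new_size;

    if (*max_specpdl_size < 400)
        *max_specpdl_size = 400;
    if (size >= *max_specpdl_size)
        return 0;                       /* cannot grow any further */
    new_size = size * 2;
    if (new_size > *max_specpdl_size)
        new_size = *max_specpdl_size;   /* clamp to the maximum */
    return new_size;
}
```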
@code{specbind()} binds a symbol to a value and is used for local
variables and @code{let} forms. The symbol and its old value (which
might be @code{Qunbound}, indicating no prior value) are recorded in the
next free slot of the @code{specpdl} array, and @code{specpdl_ptr} is
advanced by one.
8334 @code{record_unwind_protect()} implements an @dfn{unwind-protect},
8335 which, when placed around a section of code, ensures that some specified
8336 cleanup routine will be executed even if the code exits abnormally
8337 (e.g. through a @code{throw} or quit). @code{record_unwind_protect()}
8338 simply adds a new specbinding to the @code{specpdl} array and stores the
8339 appropriate information in it. The cleanup routine can either be a C
8340 function, which is stored in the @code{func} field, or a @code{progn}
8341 form, which is stored in the @code{old_value} field.
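The following hypothetical miniature shows how binding, recording an unwind-protect, and unwinding interact on one stack. The @code{mini_} names are invented, symbol values are plain @code{long}s, the stack has a fixed size instead of growing, and only C cleanup functions are modeled (not @code{progn} forms). The unwinding loop distinguishes entries the way @code{unbind_to()} is described as doing: a non-null @code{func} means an unwind-protect, and otherwise a non-null @code{symbol} means a variable binding.

```c
#include <stddef.h>

/* Hypothetical miniature of the specpdl. */
struct symbol { const char *name; long value; };

struct specbinding {
    struct symbol *symbol;
    long old_value;
    void (*func) (long);        /* for unwind-protect */
};

#define SPECPDL_MAX 64
static struct specbinding specpdl[SPECPDL_MAX];
static struct specbinding *specpdl_ptr = specpdl;

static int specpdl_depth(void) { return (int) (specpdl_ptr - specpdl); }

/* Bind SYM to VALUE, remembering the old value.  Returns -1 if full. */
static int mini_specbind(struct symbol *sym, long value)
{
    if (specpdl_ptr == specpdl + SPECPDL_MAX)
        return -1;              /* real code grows the array instead */
    specpdl_ptr->symbol = sym;
    specpdl_ptr->old_value = sym->value;
    specpdl_ptr->func = NULL;
    specpdl_ptr++;
    sym->value = value;
    return 0;
}

/* Arrange for FUNC (ARG) to run when this entry is unwound. */
static int mini_record_unwind_protect(void (*func) (long), long arg)
{
    if (specpdl_ptr == specpdl + SPECPDL_MAX)
        return -1;
    specpdl_ptr->symbol = NULL;
    specpdl_ptr->old_value = arg;
    specpdl_ptr->func = func;
    specpdl_ptr++;
    return 0;
}

/* Pop entries, most recent first, until the depth is COUNT. */
static void mini_unbind_to(int count)
{
    while (specpdl_depth() > count) {
        struct specbinding *b = --specpdl_ptr;
        if (b->func)
            b->func(b->old_value);              /* C cleanup function */
        else if (b->symbol)
            b->symbol->value = b->old_value;    /* restore old binding */
    }
}
```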
8343 @code{unbind_to()} removes specbindings from the @code{specpdl} array
8344 until the specified position is reached. Each specbinding can be one of
8349 an unwind-protect with a C cleanup function (@code{func} is not 0, and
8350 @code{old_value} holds an argument to be passed to the function);
8352 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
8353 is @code{nil}, and @code{old_value} holds the form to be executed with
8354 @code{Fprogn()}); or
8356 a local-variable binding (@code{func} is 0, @code{symbol} is not
8357 @code{nil}, and @code{old_value} holds the old value, which is stored as
8358 the symbol's value).
8361 @node Simple Special Forms
8362 @section Simple Special Forms
8363 @cindex special forms, simple
8365 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
8366 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
8367 @code{let*}, @code{let}, @code{while}
8369 All of these are very simple and work as expected, calling
8370 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
8371 @code{let} and @code{let*}) using @code{specbind()} to create bindings
8372 and @code{unbind_to()} to undo the bindings when finished.
8374 Note that, with the exception of @code{Fprogn}, these functions are
8375 typically called in real life only in interpreted code, since the byte
8376 compiler knows how to convert calls to these functions directly into
8379 @node Catch and Throw
8380 @section Catch and Throw
8381 @cindex catch and throw
8382 @cindex throw, catch and
8389 struct catchtag *next;
8390 struct gcpro *gcpro;
8392 struct backtrace *backlist;
8393 int lisp_eval_depth;
8398 @code{catch} is a Lisp function that places a catch around a body of
8399 code. A catch is a means of non-local exit from the code. When a catch
8400 is created, a tag is specified, and executing a @code{throw} to this tag
8401 will exit from the body of code caught with this tag, and its value will
8402 be the value given in the call to @code{throw}. If there is no such
8403 call, the code will be executed normally.
8405 Information pertaining to a catch is held in a @code{struct catchtag},
8406 which is placed at the head of a linked list pointed to by
8407 @code{catchlist}. @code{internal_catch()} is passed a C function to
8408 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
8409 give it, and places a catch around the function. Each @code{struct
8410 catchtag} is held in the stack frame of the @code{internal_catch()}
8411 instance that created the catch.
8413 @code{internal_catch()} is fairly straightforward. It stores into the
8414 @code{struct catchtag} the tag name and the current values of
8415 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
8416 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
8417 (storing the jump point into the @code{struct catchtag}), and calls the
8418 function. Control will return to @code{internal_catch()} either when
8419 the function exits normally or through a @code{_longjmp()} to this jump
8420 point. In the latter case, @code{throw} will store the value to be
8421 returned into the @code{struct catchtag} before jumping. When it's
8422 done, @code{internal_catch()} removes the @code{struct catchtag} from
8423 the catchlist and returns the proper value.
8425 @code{Fthrow()} goes up through the catchlist until it finds one with
8426 a matching tag. It then calls @code{unbind_catch()} to restore
8427 everything to what it was when the appropriate catch was set, stores the
8428 return value in the @code{struct catchtag}, and jumps (with
8429 @code{_longjmp()}) to its jump point.
8431 @code{unbind_catch()} removes all catches from the catchlist until it
8432 finds the correct one. Some of the catches might have been placed for
8433 error-trapping, and if so, the appropriate entries on the handlerlist
8434 must be removed (see ``errors''). @code{unbind_catch()} also restores
8435 the values of @code{gcprolist}, @code{backtrace_list}, and
@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
8437 created since the catch.
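The core of this control flow can be sketched with @code{setjmp()} and @code{longjmp()}. In this hypothetical miniature, tags are @code{long}s compared with @code{==} (standing in for @code{EQ}), and the catch tag records only the tag, the value, the link and the jump buffer; the real @code{struct catchtag} also saves the GC protection list, backtrace list, evaluation depth and specpdl position, which @code{unbind_catch()} restores. The @code{val} field is @code{volatile} because it is modified between @code{setjmp()} and @code{longjmp()}.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical miniature of struct catchtag. */
struct mini_catchtag {
    long tag;
    volatile long val;          /* set by the throw before longjmp */
    struct mini_catchtag *next;
    jmp_buf jmp;
};

static struct mini_catchtag *catchlist;

/* Call FUNC (ARG) with a catch for TAG placed around it.  The catch tag
   lives in this function's stack frame. */
static long mini_internal_catch(long tag, long (*func) (long), long arg)
{
    struct mini_catchtag c;

    c.tag = tag;
    c.val = 0;
    c.next = catchlist;
    catchlist = &c;
    if (setjmp(c.jmp) == 0)
        c.val = func(arg);      /* normal exit */
    /* Reached after a normal exit or after mini_throw() jumps here. */
    catchlist = c.next;
    return c.val;
}

/* Throw VALUE to the innermost catch for TAG.  Does not return if such a
   catch exists; returns -1 otherwise (real code signals an error). */
static int mini_throw(long tag, long value)
{
    struct mini_catchtag *c;

    for (c = catchlist; c; c = c->next)
        if (c->tag == tag) {
            catchlist = c;      /* discard inner catches */
            c->val = value;
            longjmp(c->jmp, 1);
        }
    return -1;
}
```

A throw that passes through an inner catch with a different tag simply discards that inner catch, which is the job done by @code{unbind_catch()} in the real code.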
8440 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
8441 @chapter Symbols and Variables
8442 @cindex symbols and variables
8443 @cindex variables, symbols and
8446 * Introduction to Symbols::
8451 @node Introduction to Symbols
8452 @section Introduction to Symbols
8453 @cindex symbols, introduction to
8455 A symbol is basically just an object with four fields: a name (a
8456 string), a value (some Lisp object), a function (some Lisp object), and
a property list (usually a list of alternating property/value pairs).
8458 What makes symbols special is that there is usually only one symbol with
8459 a given name, and the symbol is referred to by name. This makes a
8460 symbol a convenient way of calling up data by name, i.e. of implementing
8461 variables. (The variable's value is stored in the @dfn{value slot}.)
8462 Similarly, functions are referenced by name, and the definition of the
8463 function is stored in a symbol's @dfn{function slot}. This means that
8464 there can be a distinct function and variable with the same name. The
8465 property list is used as a more general mechanism of associating
8466 additional values with particular names, and once again the namespace is
8467 independent of the function and variable namespaces.
8473 The identity of symbols with their names is accomplished through a
8474 structure called an obarray, which is just a poorly-implemented hash
8475 table mapping from strings to symbols whose name is that string. (I say
8476 ``poorly implemented'' because an obarray appears in Lisp as a vector
8477 with some hidden fields rather than as its own opaque type. This is an
8478 Emacs Lisp artifact that should be fixed.)
8480 Obarrays are implemented as a vector of some fixed size (which should
8481 be a prime for best results), where each ``bucket'' of the vector
contains zero or more symbols, threaded through a hidden @code{next}
8483 field in the symbol. Lookup of a symbol in an obarray, and adding a
8484 symbol to an obarray, is accomplished through standard hash-table
8487 The standard Lisp function for working with symbols and obarrays is
8488 @code{intern}. This looks up a symbol in an obarray given its name; if
8489 it's not found, a new symbol is automatically created with the specified
8490 name, added to the obarray, and returned. This is what happens when the
8491 Lisp reader encounters a symbol (or more precisely, encounters the name
of a symbol) in some text that it is reading. There is a standard
obarray called @code{obarray} that is used for this purpose, although
the Lisp programmer is free to create other obarrays and @code{intern}
symbols into them.
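The bucket structure and the behaviour of @code{intern} can be sketched as a small chained hash table in C. This is a minimal sketch, not the SXEmacs implementation. The simplified @code{struct symbol} holds only a name and the hidden @code{next} link. The hash function, the bucket count and the C helpers @code{intern} and @code{intern_soft} are all illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified symbol: just a name plus the hidden
   `next' link that threads symbols through one bucket. */
struct symbol {
    char *name;
    struct symbol *next;
};

/* Bucket count; a prime, as recommended above. */
#define OBARRAY_SIZE 211

struct obarray {
    struct symbol *buckets[OBARRAY_SIZE];
};

/* Illustrative string hash (not the one SXEmacs uses). */
static size_t hash_name(const char *name)
{
    size_t h = 0;
    while (*name != '\0')
        h = h * 31 + (unsigned char)*name++;
    return h % OBARRAY_SIZE;
}

/* Like `intern-soft': return the symbol named NAME, or NULL. */
struct symbol *intern_soft(const struct obarray *ob, const char *name)
{
    struct symbol *s;
    for (s = ob->buckets[hash_name(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Like `intern': return the existing symbol, or create one and push
   it onto the front of its bucket's chain. */
struct symbol *intern(struct obarray *ob, const char *name)
{
    struct symbol *s = intern_soft(ob, name);
    size_t bucket;

    if (s != NULL)
        return s;
    s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = malloc(strlen(name) + 1);
    assert(s->name != NULL);
    strcpy(s->name, name);
    bucket = hash_name(name);
    s->next = ob->buckets[bucket];
    ob->buckets[bucket] = s;
    return s;
}
```

Because @code{intern} only creates a symbol after the lookup fails, one obarray can never hold two symbols with the same name.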
Note that once a symbol is in an obarray, it stays there until it is
explicitly removed, and the standard obarray @code{obarray} itself never
goes away. So once you use any particular variable name, a
corresponding symbol will stay around in @code{obarray} until you exit
SXEmacs.
8503 Note that @code{obarray} itself is a variable, and as such there is a
8504 symbol in @code{obarray} whose name is @code{"obarray"} and which
8505 contains @code{obarray} as its value.
Note also that this implicit call to @code{intern} happens only when the
Lisp reader reads the code, not when the code is executed (at which
point the symbol already exists, stored as such in the definition of
the function).
You can create your own obarray using @code{make-vector} (this is
horrible, but it is an artifact) and intern symbols into that obarray.
Doing that can result in two or more symbols with the same name.
However, at most one of these symbols is in the standard @code{obarray}:
you cannot have two symbols of the same name in any particular obarray.
Note that you cannot add a symbol to an obarray in any fashion other
than using @code{intern}, i.e. you can't take an existing symbol and put
it in an existing obarray. Nor can you change the name of an existing
symbol. (Since obarrays are vectors, you can violate the consistency of
things by storing directly into the vector, but let's ignore that
possibility.)
8523 Usually symbols are created by @code{intern}, but if you really want,
8524 you can explicitly create a symbol using @code{make-symbol}, giving it
8525 some name. The resulting symbol is not in any obarray (i.e. it is
8526 @dfn{uninterned}), and you can't add it to any obarray. Therefore its
8527 primary purpose is as a symbol to use in macros to avoid namespace
8528 pollution. It can also be used as a carrier of information, but cons
8529 cells could probably be used just as well.
You can also use @code{intern-soft} to look up a symbol without creating
a new one, and @code{unintern} to remove a symbol from an obarray. This
returns the removed symbol. (Remember: you can't put the symbol back
into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
in an obarray.
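Removal and traversal over chained buckets can be sketched as follows. This sketch follows the description above in having @code{unintern} return the removed symbol. The structures and the helper @code{chain_in}, which exists only to populate the table here, are hypothetical. Unlinking goes through a pointer to the previous link, so removing the head of a bucket needs no special case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol and obarray (a chained hash table). */
struct symbol {
    const char *name;
    struct symbol *next;
};

#define OBARRAY_SIZE 7

struct obarray {
    struct symbol *buckets[OBARRAY_SIZE];
};

static size_t hash_name(const char *name)
{
    size_t h = 0;
    while (*name != '\0')
        h = h * 31 + (unsigned char)*name++;
    return h % OBARRAY_SIZE;
}

/* Sketch-only helper to populate the table (in Lisp, only `intern'
   can add a symbol to an obarray). */
void chain_in(struct obarray *ob, struct symbol *s)
{
    size_t b;
    assert(s != NULL);
    b = hash_name(s->name);
    s->next = ob->buckets[b];
    ob->buckets[b] = s;
}

/* Like `unintern': unlink the symbol named NAME from its bucket chain
   and return it, or return NULL if it is not present. */
struct symbol *unintern(struct obarray *ob, const char *name)
{
    struct symbol **link = &ob->buckets[hash_name(name)];
    while (*link != NULL) {
        struct symbol *s = *link;
        if (strcmp(s->name, name) == 0) {
            *link = s->next;   /* splice it out of the chain */
            s->next = NULL;
            return s;
        }
        link = &s->next;
    }
    return NULL;
}

/* Like `mapatoms': call FN on every symbol in every bucket. */
void mapatoms(const struct obarray *ob,
              void (*fn)(struct symbol *, void *), void *arg)
{
    size_t i;
    struct symbol *s;
    for (i = 0; i < OBARRAY_SIZE; i++)
        for (s = ob->buckets[i]; s != NULL; s = s->next)
            fn(s, arg);
}

/* Example callback: count the symbols visited. */
void count_symbol(struct symbol *s, void *arg)
{
    (void)s;
    ++*(int *)arg;
}
```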
8538 @section Symbol Values
8539 @cindex symbol values
8540 @cindex values, symbol
8542 The value field of a symbol normally contains a Lisp object. However,
a symbol can be @dfn{unbound}, meaning that it logically has no value.
This is indicated internally by storing a special Lisp object, called
@dfn{the unbound marker}, which is held in the global variable
@code{Qunbound}. The unbound marker is of a special Lisp object type
8547 called @dfn{symbol-value-magic}. It is impossible for the Lisp
8548 programmer to directly create or access any object of this type.
8550 @strong{You must not let any ``symbol-value-magic'' object escape to
8551 the Lisp level.} Printing any of these objects will cause the message
8552 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
8553 (You may see this normally when you call @code{debug_print()} from the
debugger on a Lisp object.) If you let one of these objects escape to
the Lisp level, you will violate a number of assumptions made by the C
code, and the unbound marker will no longer work correctly.
When a symbol is created, its value field and function field are both
set to @code{Qunbound}. The Lisp programmer can restore these conditions
later using @code{makunbound} or @code{fmakunbound}. The programmer can
also query whether the value or function field is @dfn{bound} (i.e. has
a value other than @code{Qunbound}) using @code{boundp} and
@code{fboundp}. The fields are set to a normal Lisp object using
@code{set} (or @code{setq}) and @code{fset}.
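The unbound-marker idea can be sketched in C under a drastically simplified object model. Here a Lisp object is just a pointer, and @code{Qunbound} is the address of one private sentinel, so no ordinary value can compare equal to it. The helper names @code{sym_boundp}, @code{sym_set} and so on are hypothetical stand-ins for @code{boundp}, @code{set} and friends, not real SXEmacs functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Drastically simplified: a "Lisp object" is just a pointer. */
typedef const void *Lisp_Object;

/* The unbound marker: the address of a private sentinel object. */
static const char unbound_marker_storage = 0;
#define Qunbound ((Lisp_Object)&unbound_marker_storage)

struct symbol {
    Lisp_Object value;
    Lisp_Object function;
};

/* A new symbol starts with both slots unbound. */
void sym_init(struct symbol *sym)
{
    sym->value = Qunbound;
    sym->function = Qunbound;
}

/* Hypothetical analogues of `boundp' and `fboundp'. */
bool sym_boundp(const struct symbol *sym)  { return sym->value != Qunbound; }
bool sym_fboundp(const struct symbol *sym) { return sym->function != Qunbound; }

/* Analogues of `set' and `fset': ordinary assignments never pass the
   marker in; only the makunbound functions below store it. */
void sym_set(struct symbol *sym, Lisp_Object v)
{
    assert(v != Qunbound);
    sym->value = v;
}

void sym_fset(struct symbol *sym, Lisp_Object f)
{
    assert(f != Qunbound);
    sym->function = f;
}

/* Analogues of `makunbound' and `fmakunbound'. */
void sym_makunbound(struct symbol *sym)  { sym->value = Qunbound; }
void sym_fmakunbound(struct symbol *sym) { sym->function = Qunbound; }
```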
8566 Other symbol-value-magic objects are used as special markers to
8567 indicate variables that have non-normal properties. This includes any
8568 variables that are tied into C variables (setting the variable magically
8569 sets some global variable in the C code, and likewise for retrieving the
8570 variable's value), variables that magically tie into slots in the
8571 current buffer, variables that are buffer-local, etc. The
8572 symbol-value-magic object is stored in the value cell in place of
8573 a normal object, and the code to retrieve a symbol's value
8574 (i.e. @code{symbol-value}) knows how to do special things with them.
8575 This means that you should not just fetch the value cell directly if you
8576 want a symbol's value.
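The following hypothetical sketch shows why the value cell must not be read directly. A value cell holds either an ordinary value or a magic marker that forwards to a C global. Only the accessor knows how to follow the marker, so the marker itself never reaches the caller. The types and function names are assumptions for illustration only; the real mechanism handles many more cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value cell: an ordinary value, or a magic marker that
   forwards to a C global (much simpler than the real thing). */
enum cell_kind { CELL_NORMAL, CELL_FORWARD_TO_C_LONG };

struct value_cell {
    enum cell_kind kind;
    union {
        long normal;        /* CELL_NORMAL: the value itself */
        long *c_variable;   /* CELL_FORWARD_TO_C_LONG: where it lives */
    } u;
};

struct symbol {
    const char *name;
    struct value_cell value;
};

/* Analogue of `symbol-value': interpret the magic, so that callers
   only ever see ordinary values. */
long symbol_value(const struct symbol *sym)
{
    assert(sym != NULL);
    if (sym->value.kind == CELL_FORWARD_TO_C_LONG)
        return *sym->value.u.c_variable;
    return sym->value.u.normal;
}

/* Analogue of `set': writing through a forwarding cell updates the
   C variable rather than the cell itself. */
void set_symbol_value(struct symbol *sym, long v)
{
    assert(sym != NULL);
    if (sym->value.kind == CELL_FORWARD_TO_C_LONG)
        *sym->value.u.c_variable = v;
    else
        sym->value.u.normal = v;
}
```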
8578 The exact workings of this are rather complex and involved and are
well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
@file{lisp.h}.
8582 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
8583 @chapter Buffers and Textual Representation
8584 @cindex buffers and textual representation
8585 @cindex textual representation, buffers and
@menu
* Introduction to Buffers::     A buffer holds a block of text such as a file.
* The Text in a Buffer::        Representation of the text in a buffer.
* Buffer Lists::                Keeping track of all buffers.
* Markers and Extents::         Tagging locations within a buffer.
* Bufbytes and Emchars::        Representation of individual characters.
* The Buffer Object::           The Lisp object corresponding to a buffer.
@end menu
8596 @node Introduction to Buffers
8597 @section Introduction to Buffers
8598 @cindex buffers, introduction to
8600 A buffer is logically just a Lisp object that holds some text.
8601 In this, it is like a string, but a buffer is optimized for
frequent insertion and deletion, while a string is not. Furthermore:

@enumerate
@item
Buffers are @dfn{permanent} objects, i.e. once you create them, they
remain around, and need to be explicitly deleted before they go away.
@item
Each buffer has a unique name, which is a string. Buffers are
normally referred to by name. In this respect, they are like
symbols.
@item
Buffers have a default insertion position, called @dfn{point}.
Inserting text (unless you explicitly give a position) goes at point,
and moves point forward past the text. This is what is going on when
you type text into Emacs.
@item
Buffers have lots of extra properties associated with them.
@item
Buffers can be @dfn{displayed}. This means that there exist a number of
@dfn{windows}, which are objects that correspond to some visible section
of your display. Each window has an associated buffer, and the current
contents of that buffer are shown in that section of the display. The
redisplay mechanism (which takes care of doing this) knows how to look
at the text of a buffer and come up with some reasonable way of
displaying it. Many of the properties of a buffer control how the
buffer's text is displayed.
@item
One buffer is distinguished and called the @dfn{current buffer}. It is
stored in the variable @code{current_buffer}. Buffer operations operate
on this buffer by default. When you are typing text into a buffer, the
buffer you are typing into is always @code{current_buffer}. Switching
to a different window changes the current buffer. Note that Lisp code
can temporarily change the current buffer using @code{set-buffer} (often
enclosed in a @code{save-excursion} so that the former current buffer
gets restored when the code is finished). However, calling
@code{set-buffer} will NOT cause a permanent change in the current
buffer. This is because the top-level event loop sets
@code{current_buffer} to the buffer of the selected window each time
it finishes executing a user command.
@end enumerate
8644 Make sure you understand the distinction between @dfn{current buffer}
8645 and @dfn{buffer of the selected window}, and the distinction between
8646 @dfn{point} of the current buffer and @dfn{window-point} of the selected
window. (This latter distinction is explained in detail in the section
on windows.)
8650 @node The Text in a Buffer
8651 @section The Text in a Buffer
8652 @cindex text in a buffer, the
8653 @cindex buffer, the text in a
8655 The text in a buffer consists of a sequence of zero or more
8656 characters. A @dfn{character} is an integer that logically represents
8657 a letter, number, space, or other unit of text. Most of the characters
8658 that you will typically encounter belong to the ASCII set of characters,
8659 but there are also characters for various sorts of accented letters,
special symbols, Chinese and Japanese characters (e.g. Kanji and
Katakana), Cyrillic and Greek letters, etc. The actual number of
possible characters is quite large.
8664 For now, we can view a character as some non-negative integer that
8665 has some shape that defines how it typically appears (e.g. as an
8666 uppercase A). (The exact way in which a character appears depends on the
8667 font used to display the character.) The internal type of characters in
8668 the C code is an @code{Emchar}; this is just an @code{int}, but using a
8669 symbolic type makes the code clearer.
Between every two characters in a buffer, as well as at the beginning
and end of the buffer, is a @dfn{buffer position} or @dfn{character
position}. We can speak of the character before or after a particular
buffer position. When you insert a character at a particular position,
all characters after that position end up at new positions. When we
speak of the character @dfn{at} a position, we really mean the character
after the position. (This ambiguity between a buffer position being
``between'' characters and ``on'' a character is rampant in Emacs.)
8680 Buffer positions are numbered starting at 1. This means that
8681 position 1 is before the first character, and position 0 is not
8682 valid. If there are N characters in a buffer, then buffer
8683 position N+1 is after the last one, and position N+2 is not valid.
8685 The internal makeup of the Emchar integer varies depending on whether
8686 we have compiled with MULE support. If not, the Emchar integer is an
8687 8-bit integer with possible values from 0 - 255. 0 - 127 are the
8688 standard ASCII characters, while 128 - 255 are the characters from the
8689 ISO-8859-1 character set. If we have compiled with MULE support, an
8690 Emchar is a 19-bit integer, with the various bits having meanings
8691 according to a complex scheme that will be detailed later. The
8692 characters numbered 0 - 255 still have the same meanings as for the
8693 non-MULE case, though.
8695 Internally, the text in a buffer is represented in a fairly simple
8696 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
in the middle. Although the gap may be many bytes in size, it contains
no text: from the perspective of the text in the buffer, it does not
exist. The gap logically sits at some buffer
8700 position, between two characters (or possibly at the beginning or end of
8701 the buffer). Insertion of text in a buffer at a particular position is
8702 always accomplished by first moving the gap to that position
8703 (i.e. through some block moving of text), then writing the text into the
8704 beginning of the gap, thereby shrinking the gap. If the gap shrinks
8705 down to nothing, a new gap is created. (What actually happens is that a
8706 new gap is ``created'' at the end of the buffer's text, which requires
8707 nothing more than changing a couple of indices; then the gap is
8708 ``moved'' to the position where the insertion needs to take place by
8709 moving up in memory all the text after that position.) Similarly,
8710 deletion occurs by moving the gap to the place where the text is to be
8711 deleted, and then simply expanding the gap to include the deleted text.
8712 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
8713 just that the internal indices that keep track of where the gap is
8714 located are changed.)
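The insertion and deletion scheme can be sketched as a small non-MULE gap buffer in C, where one byte is one character. The structure, its field names and the growth policy (a fixed 64 bytes of slack) are assumptions for illustration. Growing follows the description above: move the gap to the end of the text, extend it there, then move it to the insertion point.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal non-MULE gap buffer (one byte per character).  Offsets are
   0-based internally; the API takes 1-based buffer positions. */
struct gap_buffer {
    char *mem;        /* text before the gap, the gap, text after it */
    size_t size;      /* bytes allocated */
    size_t gpt;       /* 0-based offset where the gap starts */
    size_t gap_size;  /* bytes in the gap */
};

size_t buf_length(const struct gap_buffer *b)
{
    return b->size - b->gap_size;
}

void buf_init(struct gap_buffer *b, size_t initial)
{
    b->mem = malloc(initial);
    assert(b->mem != NULL);
    b->size = initial;
    b->gpt = 0;
    b->gap_size = initial;
}

/* Move the gap to 0-based character offset POS by block-moving the
   text that lies between the old and new gap locations. */
static void move_gap(struct gap_buffer *b, size_t pos)
{
    if (pos < b->gpt)
        memmove(b->mem + pos + b->gap_size, b->mem + pos, b->gpt - pos);
    else if (pos > b->gpt)
        memmove(b->mem + b->gpt, b->mem + b->gpt + b->gap_size, pos - b->gpt);
    b->gpt = pos;
}

/* Enlarge the gap by at least NEEDED bytes: move it to the end of the
   text, then extend the allocation there. */
static void make_gap(struct gap_buffer *b, size_t needed)
{
    size_t extra = needed + 64;   /* arbitrary slack */
    char *p;

    move_gap(b, buf_length(b));
    p = realloc(b->mem, b->size + extra);
    assert(p != NULL);
    b->mem = p;
    b->size += extra;
    b->gap_size += extra;
}

/* Insert LEN bytes of TEXT at 1-based position POS: move the gap
   there, then write into the beginning of the gap. */
void buf_insert(struct gap_buffer *b, size_t pos, const char *text, size_t len)
{
    assert(pos >= 1 && pos <= buf_length(b) + 1);
    if (b->gap_size < len)
        make_gap(b, len - b->gap_size);
    move_gap(b, pos - 1);
    memcpy(b->mem + b->gpt, text, len);
    b->gpt += len;
    b->gap_size -= len;
}

/* Delete the text between 1-based positions FROM and TO: move the gap
   to FROM, then widen it over the deleted bytes. */
void buf_delete(struct gap_buffer *b, size_t from, size_t to)
{
    assert(from >= 1 && from <= to && to <= buf_length(b) + 1);
    move_gap(b, from - 1);
    b->gap_size += to - from;
}

/* Character after 1-based position POS, skipping over the gap. */
char buf_char_at(const struct gap_buffer *b, size_t pos)
{
    size_t i = pos - 1;
    assert(pos >= 1 && pos <= buf_length(b));
    return b->mem[i < b->gpt ? i : i + b->gap_size];
}
```

Repeated insertions at the same spot leave the gap in place, so only the first one pays for moving text.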
Note that the total amount of memory allocated for a buffer's text never
decreases while the buffer is live. Therefore, if you load up a
8718 20-megabyte file and then delete all but one character, there will be a
8719 20-megabyte gap, which won't get any smaller (except by inserting
8720 characters back again). Once the buffer is killed, the memory allocated
8721 for the buffer text will be freed, but it will still be sitting on the
8722 heap, taking up virtual memory, and will not be released back to the
8723 operating system. (However, if you have compiled SXEmacs with rel-alloc,
8724 the situation is different. In this case, the space @emph{will} be
8725 released back to the operating system. However, this tends to result in a
8726 noticeable speed penalty.)
8728 Astute readers may notice that the text in a buffer is represented as
8729 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
8730 a 19-bit integer, which clearly cannot fit in a byte. This means (of
8731 course) that the text in a buffer uses a different representation from
8732 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
8733 four bytes. The conversion between these two representations is complex
8734 and will be described later.
8736 In the non-MULE case, everything is very simple: An Emchar
8737 is an 8-bit value, which fits neatly into one byte.
If we are given a buffer position and want to retrieve the
character at that position, we need to follow these steps:

@enumerate
@item
Pretend there's no gap, and convert the buffer position into a @dfn{byte
index} that indexes to the appropriate byte in the buffer's stream of
textual bytes. By convention, byte indices begin at 1, just like buffer
positions. In the non-MULE case, byte indices and buffer positions are
identical, since one character equals one byte.
@item
Convert the byte index into a @dfn{memory index}, which takes the gap
into account. The memory index is a direct index into the block of
memory that stores the text of a buffer. This basically just involves
checking to see if the byte index is past the gap, and if so, adding the
size of the gap to it. By convention, memory indices begin at 1, just
like buffer positions and byte indices, and when referring to the
position that is @dfn{at} the gap, we always use the memory position at
the @emph{beginning}, not at the end, of the gap.
@item
Fetch the appropriate bytes at the determined memory position.
@item
Convert these bytes into an Emchar.
@end enumerate
In the non-Mule case, (3) and (4) boil down to a simple one-byte
memory fetch.
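Steps (1) through (4) can be sketched for the non-MULE case. The @code{struct text_view} type and the function names are hypothetical, and memory indices are 1-based as described above. The sketch assumes two conventions. A @emph{position} at the gap maps to the memory index at the beginning of the gap. @emph{Fetching} the character after that position, however, has to skip over the gap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical typedefs mirroring the three position types. */
typedef long Bufpos;
typedef long Bytind;
typedef long Memind;

/* Minimal view of a buffer's text.  Memory indices are 1-based, so
   memory index k is mem[k - 1]. */
struct text_view {
    const unsigned char *mem;
    Bytind gpt;       /* byte index at which the gap sits */
    long gap_size;
};

/* Step 1 (non-MULE): one byte per character. */
Bytind bufpos_to_bytind(Bufpos pos)
{
    assert(pos >= 1);
    return pos;
}

/* Step 2 for a position: one at the gap maps to the beginning of it. */
Memind bytind_to_memind(const struct text_view *t, Bytind x)
{
    return x > t->gpt ? x + t->gap_size : x;
}

/* Steps 2-4 for fetching the character after position POS: that
   character lies past the gap when POS is at or after the gap. */
unsigned char fetch_char(const struct text_view *t, Bufpos pos)
{
    Bytind x = bufpos_to_bytind(pos);
    Memind m = x >= t->gpt ? x + t->gap_size : x;
    return t->mem[m - 1];
}
```

For the text @samp{abcdef} with a three-byte gap after @samp{c}, position 4 maps to memory index 4 (the start of the gap). The character at position 4, @samp{d}, is fetched from memory index 7.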
Note that we have defined three types of positions in a buffer:

@enumerate
@item
@dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
@item
@dfn{byte indices}, typedef @code{Bytind}
@item
@dfn{memory indices}, typedef @code{Memind}
@end enumerate
8778 All three typedefs are just @code{int}s, but defining them this way makes
8779 things a lot clearer.
8781 Most code works with buffer positions. In particular, all Lisp code
8782 that refers to text in a buffer uses buffer positions. Lisp code does
8783 not know that byte indices or memory indices exist.
8785 Finally, we have a typedef for the bytes in a buffer. This is a
8786 @code{Bufbyte}, which is an unsigned char. Referring to them as
8787 Bufbytes underscores the fact that we are working with a string of bytes
in the internal Emacs buffer representation rather than in one of a
number of possible alternative representations (e.g. EUC-encoded text,
etc.).
@node Buffer Lists
@section Buffer Lists
8794 @cindex buffer lists
Recall that buffers are @dfn{permanent} objects, i.e. that they
remain around until explicitly deleted. This implies that there is
a list of all the buffers in existence. This list is actually an
8799 assoc-list (mapping from the buffer's name to the buffer) and is stored
8800 in the global variable @code{Vbuffer_alist}.
8802 The order of the buffers in the list is important: the buffers are
8803 ordered approximately from most-recently-used to least-recently-used.
8804 Switching to a buffer using @code{switch-to-buffer},
8805 @code{pop-to-buffer}, etc. and switching windows using
8806 @code{other-window}, etc. usually brings the new current buffer to the
8807 front of the list. @code{switch-to-buffer}, @code{other-buffer},
8808 etc. look at the beginning of the list to find an alternative buffer to
8809 suggest. You can also explicitly move a buffer to the end of the list
8810 using @code{bury-buffer}.
8812 In addition to the global ordering in @code{Vbuffer_alist}, each frame
8813 has its own ordering of the list. These lists always contain the same
8814 elements as in @code{Vbuffer_alist} although possibly in a different
8815 order. @code{buffer-list} normally returns the list for the selected
8816 frame. This allows you to work in separate frames without things
8817 interfering with each other.
8819 The standard way to look up a buffer given a name is
8820 @code{get-buffer}, and the standard way to create a new buffer is
8821 @code{get-buffer-create}, which looks up a buffer with a given name,
creating a new one if necessary. These operations correspond exactly
to the symbol operations @code{intern-soft} and @code{intern},
8824 respectively. You can also force a new buffer to be created using
8825 @code{generate-new-buffer}, which takes a name and (if necessary) makes
8826 a unique name from this by appending a number, and then creates the
8827 buffer. This is basically like the symbol operation @code{gensym}.
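The uniquifying step can be sketched as follows. It tries the name itself, then appends @samp{<2>}, @samp{<3>} and so on, which is the suffix style @code{generate-new-buffer-name} uses. The function names and the array of existing names are hypothetical stand-ins for a real lookup with @code{get-buffer}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for `get-buffer': is CANDIDATE among the
   COUNT names in NAMES? */
static bool name_taken(const char *candidate,
                       const char *const *names, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        if (strcmp(candidate, names[i]) == 0)
            return true;
    return false;
}

/* Write into OUT the first of NAME, NAME<2>, NAME<3>, ... that is not
   already taken (OUTSIZE must be large enough for the result). */
void unique_buffer_name(const char *name,
                        const char *const *names, size_t count,
                        char *out, size_t outsize)
{
    int n;
    assert(outsize > 0);
    snprintf(out, outsize, "%s", name);
    for (n = 2; name_taken(out, names, count); n++)
        snprintf(out, outsize, "%s<%d>", name, n);
}
```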
8829 @node Markers and Extents
8830 @section Markers and Extents
8831 @cindex markers and extents
8832 @cindex extents, markers and
8834 Among the things associated with a buffer are things that are
8835 logically attached to certain buffer positions. This can be used to
8836 keep track of a buffer position when text is inserted and deleted, so
8837 that it remains at the same spot relative to the text around it; to
assign properties to particular sections of text; etc. There are two
kinds of objects that are useful in this regard: @dfn{markers} and
@dfn{extents}.
8842 A @dfn{marker} is simply a flag placed at a particular buffer
8843 position, which is moved around as text is inserted and deleted.
8844 Markers are used for all sorts of purposes, such as the @code{mark} that
8845 is the other end of textual regions to be cut, copied, etc.
8847 An @dfn{extent} is similar to two markers plus some associated
8848 properties, and is used to keep track of regions in a buffer as text is
8849 inserted and deleted, and to add properties (e.g. fonts) to particular
regions of text. The external interface of extents is explained
elsewhere.
8853 The important thing here is that markers and extents simply contain
8854 buffer positions in them as integers, and every time text is inserted or
8855 deleted, these positions must be updated. In order to minimize the
8856 amount of shuffling that needs to be done, the positions in markers and
8857 extents (there's one per marker, two per extent) are stored in Meminds.
8858 This means that they only need to be moved when the text is physically
8859 moved in memory; since the gap structure tries to minimize this, it also
8860 minimizes the number of marker and extent indices that need to be
8861 adjusted. Look in @file{insdel.c} for the details of how this works.
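The adjustment can be sketched as follows, assuming memory indices as described above: a marker at the gap stores the memory index at the beginning of the gap. When the gap moves, only markers pointing into the block of text that was physically moved change, and each changes by exactly the gap size. The function name and signature are hypothetical and not taken from @file{insdel.c}.

```c
#include <assert.h>
#include <stddef.h>

typedef long Memind;

/* The gap (GAP_SIZE bytes) moves from byte index OLD_GPT to NEW_GPT.
   Adjust the memory indices of N markers accordingly. */
void adjust_markers_for_gap_move(Memind *markers, size_t n,
                                 long old_gpt, long new_gpt, long gap_size)
{
    size_t i;
    assert(gap_size >= 0);
    for (i = 0; i < n; i++) {
        Memind m = markers[i];
        if (new_gpt < old_gpt) {
            /* Text between NEW_GPT and OLD_GPT moved up past the gap. */
            if (m > new_gpt && m <= old_gpt)
                markers[i] = m + gap_size;
        } else if (new_gpt > old_gpt) {
            /* Text just after the gap moved down in front of it. */
            if (m > old_gpt + gap_size && m <= new_gpt + gap_size)
                markers[i] = m - gap_size;
        }
    }
}
```

With a three-byte gap at byte index 4, markers at positions 2, 4, 5 and 7 hold memory indices 2, 4, 8 and 10. Moving the gap to 2 changes only the marker at position 4, which becomes 7.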
8863 One other important distinction is that markers are @dfn{temporary}
8864 while extents are @dfn{permanent}. This means that markers disappear as
8865 soon as there are no more pointers to them, and correspondingly, there
8866 is no way to determine what markers are in a buffer if you are just
8867 given the buffer. Extents remain in a buffer until they are detached
8868 (which could happen as a result of text being deleted) or the buffer is
8869 deleted, and primitives do exist to enumerate the extents in a buffer.
8871 @node Bufbytes and Emchars
8872 @section Bufbytes and Emchars
8873 @cindex Bufbytes and Emchars
8874 @cindex Emchars, Bufbytes and
8878 @node The Buffer Object
8879 @section The Buffer Object
8880 @cindex buffer object, the
8881 @cindex object, the buffer
8883 Buffers contain fields not directly accessible by the Lisp programmer.
8884 We describe them here, naming them by the names used in the C code.
8885 Many are accessible indirectly in Lisp programs via Lisp primitives.
8889 The buffer name is a string that names the buffer. It is guaranteed to
be unique. @xref{Buffer Names,,, lispref, SXEmacs Lisp Reference
Manual}.
8894 This field contains the time when the buffer was last saved, as an
integer. @xref{Buffer Modification,,, lispref, SXEmacs Lisp Reference
Manual}.
8899 This field contains the modification time of the visited file. It is
8900 set when the file is written or read. Every time the buffer is written
8901 to the file, this field is compared to the modification time of the
file. @xref{Buffer Modification,,, lispref, SXEmacs Lisp Reference
Manual}.
8905 @item auto_save_modified
8906 This field contains the time when the buffer was last auto-saved.
8908 @item last_window_start
8909 This field contains the @code{window-start} position in the buffer as of
8910 the last time the buffer was displayed in a window.
8913 This field points to the buffer's undo list. @xref{Undo,,, lispref,
8914 SXEmacs Lisp Reference Manual}.
8916 @item syntax_table_v
8917 This field contains the syntax table for the buffer. @xref{Syntax
8918 Tables,,, lispref, SXEmacs Lisp Reference Manual}.
8920 @item downcase_table
8921 This field contains the conversion table for converting text to lower
8922 case. @xref{Case Tables,,, lispref, SXEmacs Lisp Reference Manual}.
8925 This field contains the conversion table for converting text to upper
8926 case. @xref{Case Tables,,, lispref, SXEmacs Lisp Reference Manual}.
8928 @item case_canon_table
8929 This field contains the conversion table for canonicalizing text for
case-folding search. @xref{Case Tables,,, lispref, SXEmacs Lisp
Reference Manual}.
8933 @item case_eqv_table
8934 This field contains the equivalence table for case-folding search.
8935 @xref{Case Tables,,, lispref, SXEmacs Lisp Reference Manual}.
8938 This field contains the buffer's display table, or @code{nil} if it
doesn't have one. @xref{Display Tables,,, lispref, SXEmacs Lisp
Reference Manual}.
8943 This field contains the chain of all markers that currently point into
8944 the buffer. Deletion of text in the buffer, and motion of the buffer's
8945 gap, must check each of these markers and perhaps update it.
8946 @xref{Markers,,, lispref, SXEmacs Lisp Reference Manual}.
8949 This field is a flag that tells whether a backup file has been made for
8950 the visited file of this buffer.
8953 This field contains the mark for the buffer. The mark is a marker,
8954 hence it is also included on the list @code{markers}. @xref{The Mark,,,
8955 lispref, SXEmacs Lisp Reference Manual}.
8958 This field is non-@code{nil} if the buffer's mark is active.
8960 @item local_var_alist
8961 This field contains the association list describing the variables local
8962 in this buffer, and their values, with the exception of local variables
8963 that have special slots in the buffer object. (Those slots are omitted
from this table.) @xref{Buffer-Local Variables,,, lispref, SXEmacs Lisp
Reference Manual}.
8967 @item modeline_format
8968 This field contains a Lisp object which controls how to display the mode
line for this buffer. @xref{Modeline Format,,, lispref, SXEmacs Lisp
Reference Manual}.
This field holds the buffer's base buffer (if it is an indirect buffer),
or @code{nil}.
8977 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
8978 @chapter MULE Character Sets and Encodings
8979 @cindex Mule character sets and encodings
8980 @cindex character sets and encodings, Mule
8981 @cindex encodings, Mule character sets and
8983 Recall that there are two primary ways that text is represented in
8984 SXEmacs. The @dfn{buffer} representation sees the text as a series of
8985 bytes (Bufbytes), with a variable number of bytes used per character.
8986 The @dfn{character} representation sees the text as a series of integers
8987 (Emchars), one per character. The character representation is a cleaner
8988 representation from a theoretical standpoint, and is thus used in many
8989 cases when lots of manipulations on a string need to be done. However,
8990 the buffer representation is the standard representation used in both
8991 Lisp strings and buffers, and because of this, it is the ``default''
8992 representation that text comes in. The reason for using this
8993 representation is that it's compact and is compatible with ASCII.
@menu
* Character Sets::
* Internal Mule Encodings::
@end menu
9002 @node Character Sets
9003 @section Character Sets
9004 @cindex character sets
9006 A character set (or @dfn{charset}) is an ordered set of characters. A
9007 particular character in a charset is indexed using one or more
9008 @dfn{position codes}, which are non-negative integers. The number of
9009 position codes needed to identify a particular character in a charset is
9010 called the @dfn{dimension} of the charset. In SXEmacs/Mule, all charsets
9011 have dimension 1 or 2, and the size of all charsets (except for a few
9012 special cases) is either 94, 96, 94 by 94, or 96 by 96. The range of
9013 position codes used to index characters from any of these types of
9014 character sets is as follows:
@example
Charset type            Position code 1         Position code 2
------------------------------------------------------------
94                      33 - 126                N/A
96                      32 - 127                N/A
94x94                   33 - 126                33 - 126
96x96                   32 - 127                32 - 127
@end example
9025 Note that in the above cases position codes do not start at an
expected value such as 0 or 1. The reason for this will become clear
later.
9029 For example, Latin-1 is a 96-character charset, and JISX0208 (the
9030 Japanese national character set) is a 94x94-character charset.
9032 [Note that, although the ranges above define the @emph{valid} position
9033 codes for a charset, some of the slots in a particular charset may in
fact be empty. This is the case for JISX0208, for example, where
all the slots whose first position code is in the range 117 - 126 are
empty.]
9038 There are three charsets that do not follow the above rules. All of
9039 them have one dimension, and have ranges of position codes as follows:
@example
Charset name            Position code 1
------------------------------------
ASCII                   0 - 127
Control-1               0 - 31
Composite               0 - some large number
@end example
9049 (The upper bound of the position code for composite characters has not
9050 yet been determined, but it will probably be at least 16,383).
9052 ASCII is the union of two subsidiary character sets: Printing-ASCII
9053 (the printing ASCII character set, consisting of position codes 33 -
9054 126, like for a standard 94-character charset) and Control-ASCII (the
non-printing characters that would appear in a binary file with codes
0 - 32 and 127).
9058 Control-1 contains the non-printing characters that would appear in a
9059 binary file with codes 128 - 159.
9061 Composite contains characters that are generated by overstriking one
9062 or more characters from other charsets.
9064 Note that some characters in ASCII, and all characters in Control-1,
9065 are @dfn{control} (non-printing) characters. These have no printed
9066 representation but instead control some other function of the printing
(e.g. TAB, code 9, moves the current character position to the next tab
9068 stop). All other characters in all charsets are @dfn{graphic}
9069 (printing) characters.
9071 When a binary file is read in, the bytes in the file are assigned to
9072 character sets as follows:
@example
Bytes           Character set           Range
--------------------------------------------------
0 - 127         ASCII                   0 - 127
128 - 159       Control-1               0 - 31
160 - 255       Latin-1                 32 - 127
@end example
9082 This is a bit ad-hoc but gets the job done.
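The byte-to-charset assignment in the table can be expressed directly in C. The enum and structure names below are hypothetical and used only for this sketch.

```c
#include <assert.h>

/* Hypothetical tags for the three charsets in the table above. */
enum charset { CS_ASCII, CS_CONTROL_1, CS_LATIN_1 };

struct char_id {
    enum charset charset;
    int position_code;
};

/* Assign one byte of a binary file to a charset and position code. */
struct char_id classify_binary_byte(unsigned char byte)
{
    struct char_id id;
    if (byte < 128) {
        id.charset = CS_ASCII;
        id.position_code = byte;          /* 0 - 127 */
    } else if (byte < 160) {
        id.charset = CS_CONTROL_1;
        id.position_code = byte - 128;    /* 0 - 31 */
    } else {
        id.charset = CS_LATIN_1;
        id.position_code = byte - 128;    /* 32 - 127 */
    }
    return id;
}
```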
9086 @cindex encodings, Mule
9087 @cindex Mule encodings
9089 An @dfn{encoding} is a way of numerically representing characters from
9090 one or more character sets. If an encoding only encompasses one
9091 character set, then the position codes for the characters in that
9092 character set could be used directly. This is not possible, however, if
9093 more than one character set is to be used in the encoding.
9095 For example, the conversion detailed above between bytes in a binary
9096 file and characters is effectively an encoding that encompasses the
three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
bytes.
9100 Thus, an encoding can be viewed as a way of encoding characters from a
9101 specified group of character sets using a stream of bytes, each of which
contains a fixed number of bits (but not necessarily 8, as in the common
usage of ``byte'').
Here are descriptions of a couple of common
encodings:
@menu
* Japanese EUC (Extended Unix Code)::
@end menu
9113 @node Japanese EUC (Extended Unix Code)
9114 @subsection Japanese EUC (Extended Unix Code)
9115 @cindex Japanese EUC (Extended Unix Code)
9116 @cindex EUC (Extended Unix Code), Japanese
9117 @cindex Extended Unix Code, Japanese EUC
This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
Japanese-JISX0212, and Japanese-JISX0201-Kana (half-width katakana, the
right half of JISX0201). It uses 8-bit bytes.

Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.
The encoding is as follows:

@example
Character set            Representation (PC=position-code)
-------------            --------------
Printing-ASCII           PC1
Japanese-JISX0201-Kana   0x8E | PC1 + 0x80
Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212        0x8F | PC1 + 0x80 | PC2 + 0x80
@end example
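A one-character EUC encoder can be sketched in C. To keep it short, it covers only Printing-ASCII, Japanese-JISX0201-Kana and Japanese-JISX0208; the enum and function names are assumptions. For instance, the JISX0208 character with position codes 0x24 0x22 (hiragana @samp{a}) comes out as the bytes 0xA4 0xA2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical charset tags; JISX0212 is left out of this sketch. */
enum euc_charset { EUC_ASCII, EUC_JISX0201_KANA, EUC_JISX0208 };

/* Encode one character into OUT (at least 2 bytes) following the
   Japanese EUC table; returns the number of bytes written. */
size_t euc_encode(enum euc_charset cs, int pc1, int pc2, unsigned char *out)
{
    switch (cs) {
    case EUC_ASCII:
        assert(pc1 >= 0 && pc1 <= 127);
        out[0] = (unsigned char)pc1;
        return 1;
    case EUC_JISX0201_KANA:
        assert(pc1 >= 33 && pc1 <= 126);
        out[0] = 0x8E;                          /* prefix byte */
        out[1] = (unsigned char)(pc1 + 0x80);
        return 2;
    case EUC_JISX0208:
        assert(pc1 >= 33 && pc1 <= 126 && pc2 >= 33 && pc2 <= 126);
        out[0] = (unsigned char)(pc1 + 0x80);   /* both bytes get the */
        out[1] = (unsigned char)(pc2 + 0x80);   /* high bit set       */
        return 2;
    }
    return 0;
}
```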
9142 This encompasses the character sets Printing-ASCII,
9143 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
9144 is very similar to Printing-ASCII and is a 94-character charset),
9145 Japanese-JISX0208, and Japanese-JISX0201-Kana. It uses 7-bit bytes.
9147 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
9148 means that there are multiple states that the encoding can
9149 be in, which affect how the bytes are to be interpreted.
9150 Special sequences of bytes (called @dfn{escape sequences})
9151 are used to change states.
9153 The encoding is as follows:
@example
Character set              Representation (PC=position-code)
-------------              --------------
Printing-ASCII             PC1
Japanese-JISX0201-Roman    PC1
Japanese-JISX0201-Kana     PC1
Japanese-JISX0208          PC1 PC2

Escape sequence   ASCII equivalent   Meaning
---------------   ----------------   -------
0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
@end example
9172 Initially, Printing-ASCII is invoked.
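Because JIS7 is modal, an encoder must remember which charset is currently invoked and emit an escape sequence only when that changes. The sketch below starts in Printing-ASCII, as stated above, and uses the escape sequences from the table. The types and names are hypothetical. The sketch also does not switch back to Printing-ASCII at the end of the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical charset tags for the JIS7 encoding. */
enum jis7_charset { J_ASCII, J_ROMAN, J_KANA, J_JISX0208 };

struct jis7_char {
    enum jis7_charset cs;
    unsigned char pc1, pc2;   /* pc2 is used only for J_JISX0208 */
};

/* Escape sequence that invokes each charset. */
static const unsigned char jis7_escape[4][3] = {
    [J_ASCII]    = { 0x1B, 0x28, 0x42 },   /* ESC ( B */
    [J_ROMAN]    = { 0x1B, 0x28, 0x4A },   /* ESC ( J */
    [J_KANA]     = { 0x1B, 0x28, 0x49 },   /* ESC ( I */
    [J_JISX0208] = { 0x1B, 0x24, 0x42 },   /* ESC $ B */
};

/* Encode N characters into OUT (which must be large enough), emitting
   an escape sequence only when the invoked charset changes.
   Returns the number of bytes written. */
size_t jis7_encode(const struct jis7_char *chars, size_t n, unsigned char *out)
{
    enum jis7_charset state = J_ASCII;   /* initial state */
    size_t i, len = 0;

    assert(chars != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (chars[i].cs != state) {
            memcpy(out + len, jis7_escape[chars[i].cs], 3);
            len += 3;
            state = chars[i].cs;
        }
        out[len++] = chars[i].pc1;
        if (state == J_JISX0208)
            out[len++] = chars[i].pc2;
    }
    return len;
}
```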
9174 @node Internal Mule Encodings
9175 @section Internal Mule Encodings
9176 @cindex internal Mule encodings
9177 @cindex Mule encodings, internal
9178 @cindex encodings, internal Mule
9180 In SXEmacs/Mule, each character set is assigned a unique number, called a
9181 @dfn{leading byte}. This is used in the encodings of a character.
9182 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
9183 a leading byte of 0), although some leading bytes are reserved.
9185 Charsets whose leading byte is in the range 0x80 - 0x9F are called
9186 @dfn{official} and are used for built-in charsets. Other charsets are
9187 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
9188 these are user-defined charsets.
The leading bytes are allocated as follows:

@example
Character set           Leading byte
-------------           ------------
ASCII                   0
Composite               0x80
Dimension-1 Official    0x81 - 0x8D
                          (0x8E is free)
Control-1               0x8F
Dimension-2 Official    0x90 - 0x99
                          (0x9A - 0x9D are free;
                           0x9E and 0x9F are reserved)
Dimension-1 Private     0xA0 - 0xEF
Dimension-2 Private     0xF0 - 0xFF
@end example
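The official and private ranges can be turned into a small classifier. This sketch uses hypothetical names and recognizes only the dimension-1 and dimension-2 official and private ranges. ASCII, Control-1 and the composite charset are special cases that it reports as not valid, as are the free and reserved values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one leading byte. */
struct lb_info {
    bool valid;       /* false for free, reserved or special values */
    bool official;    /* true for official (0x80 - 0x9F) charsets */
    int dimension;    /* 1 or 2 when valid */
};

struct lb_info classify_leading_byte(unsigned int lb)
{
    struct lb_info info = { false, false, 0 };

    if (lb >= 0x81 && lb <= 0x8D)
        info = (struct lb_info){ true, true, 1 };
    else if (lb >= 0x90 && lb <= 0x99)
        info = (struct lb_info){ true, true, 2 };
    else if (lb >= 0xA0 && lb <= 0xEF)
        info = (struct lb_info){ true, false, 1 };
    else if (lb >= 0xF0 && lb <= 0xFF)
        info = (struct lb_info){ true, false, 2 };
    return info;
}
```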
9207 There are two internal encodings for characters in SXEmacs/Mule. One is
9208 called @dfn{string encoding} and is an 8-bit encoding that is used for
9209 representing characters in a buffer or string. It uses 1 to 4 bytes per
9210 character. The other is called @dfn{character encoding} and is a 19-bit
9211 encoding that is used for representing characters individually in a
9214 (In the following descriptions, we'll ignore composite characters for
9215 the moment. We also give a general (structural) overview first,
9216 followed later by the exact details.)
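Using only the leading-byte ranges given earlier in this section, a classifier might look like the following sketch. The names are hypothetical. Leading bytes outside the listed ranges (free, reserved, or simply not covered here) are reported as unassigned.

```c
/* Hypothetical classification of leading bytes; names are illustrative. */
enum lb_class {
    LB_ASCII,          /* leading byte 0 */
    LB_OFFICIAL_DIM1,  /* 0x81 - 0x8D */
    LB_OFFICIAL_DIM2,  /* 0x90 - 0x99 */
    LB_PRIVATE_DIM1,   /* 0xA0 - 0xEF */
    LB_PRIVATE_DIM2,   /* 0xF0 - 0xFF */
    LB_UNASSIGNED      /* free, reserved (0x9A - 0x9F), or not covered here */
};

enum lb_class classify_leading_byte(unsigned int lb)
{
    if (lb == 0)                  return LB_ASCII;
    if (lb >= 0x81 && lb <= 0x8D) return LB_OFFICIAL_DIM1;
    if (lb >= 0x90 && lb <= 0x99) return LB_OFFICIAL_DIM2;
    if (lb >= 0xA0 && lb <= 0xEF) return LB_PRIVATE_DIM1;
    if (lb >= 0xF0 && lb <= 0xFF) return LB_PRIVATE_DIM2;
    return LB_UNASSIGNED;
}

/* Private (user-defined) charsets use leading bytes 0xA0 - 0xFF. */
int leading_byte_private_p(unsigned int lb)
{
    return lb >= 0xA0 && lb <= 0xFF;
}

/* Number of position codes per character: 1 or 2, or 0 if unassigned. */
int leading_byte_dimension(unsigned int lb)
{
    switch (classify_leading_byte(lb)) {
    case LB_ASCII:
    case LB_OFFICIAL_DIM1:
    case LB_PRIVATE_DIM1:
        return 1;
    case LB_OFFICIAL_DIM2:
    case LB_PRIVATE_DIM2:
        return 2;
    default:
        return 0;
    }
}
```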
9219 * Internal String Encoding::
9220 * Internal Character Encoding::
9223 @node Internal String Encoding
9224 @subsection Internal String Encoding
9225 @cindex internal string encoding
9226 @cindex string encoding, internal
9227 @cindex encoding, internal string
9229 ASCII characters are encoded using their position code directly. Other
9230 characters are encoded using their leading byte followed by their
9231 position code(s) with the high bit set. Characters in private character
9232 sets have their leading byte prefixed with a @dfn{leading byte prefix},
9233 which is either 0x9E or 0x9F. (No character sets are ever assigned these
9234 leading bytes.) Specifically:
9237 Character set Encoding (PC=position-code, LB=leading-byte)
9238 ------------- --------
9240 Control-1 LB | PC1 + 0xA0 |
9241 Dimension-1 official LB | PC1 + 0x80 |
9242 Dimension-1 private 0x9E | LB | PC1 + 0x80 |
9243 Dimension-2 official LB | PC1 + 0x80 | PC2 + 0x80 |
9244 Dimension-2 private 0x9F | LB | PC1 + 0x80 | PC2 + 0x80
9247 The basic characteristic of this encoding is that the first byte
9248 of all characters is in the range 0x00 - 0x9F, and the second and
9249 following bytes of all characters are in the range 0xA0 - 0xFF.
9250 This means that it is impossible to get out of sync, or more
9255 Given any byte position, the beginning of the character it is
9256 within can be determined in constant time.
9258 Given any byte position at the beginning of a character, the
9259 beginning of the next character can be determined in constant
9262 Given any byte position at the beginning of a character, the
9263 beginning of the previous character can be determined in constant
9266 Textual searches can simply treat encoded strings as if they
9267 were encoded in a one-byte-per-character fashion rather than
9268 the actual multi-byte encoding.
9271 None of the standard non-modal encodings meet all of these
9272 conditions. For example, EUC satisfies only (2) and (3), while
9273 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
9274 non-modal encodings must satisfy (2), in order to be unambiguous.)
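The following sketch shows the string encoding from the table above, together with the constant-time resynchronization that the first-byte/following-byte split allows. It is an illustration under stated assumptions, not SXEmacs code. The function names are hypothetical, control-1 and composite characters are not handled, and the leading-byte ranges are the ones listed earlier in this chapter.

```c
#include <stddef.h>

/* Encode one character into the internal string encoding.  LB is the
   leading byte (0 for ASCII); PC1/PC2 are position codes (PC2 is ignored
   for dimension-1 charsets).  Returns the number of bytes written to OUT
   (at most 4), or 0 if LB is outside the ranges this sketch handles. */
size_t string_encode_char(unsigned lb, unsigned pc1, unsigned pc2,
                          unsigned char out[4])
{
    size_t n = 0;
    int dim2 = (lb >= 0x90 && lb <= 0x99) || (lb >= 0xF0 && lb <= 0xFF);

    if (lb == 0) {                       /* ASCII: the position code itself */
        out[n++] = (unsigned char) pc1;
        return n;
    }
    if (lb >= 0xA0 && lb <= 0xEF)        /* dimension-1 private: prefix 0x9E */
        out[n++] = 0x9E;
    else if (lb >= 0xF0 && lb <= 0xFF)   /* dimension-2 private: prefix 0x9F */
        out[n++] = 0x9F;
    else if (!((lb >= 0x81 && lb <= 0x8D) || dim2))
        return 0;                        /* not an official range either */

    out[n++] = (unsigned char) lb;
    out[n++] = (unsigned char) (pc1 + 0x80);
    if (dim2)
        out[n++] = (unsigned char) (pc2 + 0x80);
    return n;
}

/* First bytes are 0x00 - 0x9F and following bytes 0xA0 - 0xFF, so the
   start of the character containing POS is found by backing up over
   following bytes (at most three in well-formed text). */
size_t string_char_start(const unsigned char *buf, size_t pos)
{
    while (pos > 0 && buf[pos] >= 0xA0)
        pos--;
    return pos;
}

/* Given POS at the start of a character, return the start of the next
   one (or LEN at the end of the buffer). */
size_t string_next_char(const unsigned char *buf, size_t len, size_t pos)
{
    pos++;
    while (pos < len && buf[pos] >= 0xA0)
        pos++;
    return pos;
}
```

Note that a private leading byte (0xA0 or above) is itself a "following" byte; the 0x9E or 0x9F prefix is what keeps the first byte of every character below 0xA0.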
9276 @node Internal Character Encoding
9277 @subsection Internal Character Encoding
9278 @cindex internal character encoding
9279 @cindex character encoding, internal
9280 @cindex encoding, internal character
9282 One 19-bit word represents a single character. The word is
9283 separated into three fields:
9286 Bit number: 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
9287 <------------> <------------------> <------------------>
9291 Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
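As a rough sketch of this layout, the fields can be packed and unpacked with shifts and masks. The helper names are hypothetical, not SXEmacs macros. The @code{charset_char_code} helper applies the field assignments from the table that follows and handles only the four charset classes listed there.

```c
#include <stdint.h>

#define CHAR_CODE_INVALID 0xFFFFFFFFu   /* hypothetical "not handled" value */

/* Field 1: bits 18-14 (5 bits); field 2: bits 13-7; field 3: bits 6-0. */
uint32_t make_char_code(unsigned f1, unsigned f2, unsigned f3)
{
    return ((uint32_t) (f1 & 0x1F) << 14)
         | ((uint32_t) (f2 & 0x7F) << 7)
         | (uint32_t) (f3 & 0x7F);
}

unsigned char_field1(uint32_t c) { return (c >> 14) & 0x1F; }
unsigned char_field2(uint32_t c) { return (c >> 7) & 0x7F; }
unsigned char_field3(uint32_t c) { return c & 0x7F; }

/* Build the 19-bit code for a character with leading byte LB and
   position codes PC1 (and PC2 for dimension-2 charsets). */
uint32_t charset_char_code(unsigned lb, unsigned pc1, unsigned pc2)
{
    if ((lb >= 0x81 && lb <= 0x8D) || (lb >= 0xA0 && lb <= 0xEF))
        return make_char_code(0, lb - 0x80, pc1);    /* dimension 1 */
    if (lb >= 0x90 && lb <= 0x99)
        return make_char_code(lb - 0x8F, pc1, pc2);  /* dimension-2 official */
    if (lb >= 0xF0 && lb <= 0xFF)
        return make_char_code(lb - 0xE1, pc1, pc2);  /* dimension-2 private */
    return CHAR_CODE_INVALID;
}
```

A zero field 1 therefore marks a dimension-1 character, and the two non-overlapping field-1 ranges (01 - 0A and 0F - 1E) separate official from private dimension-2 charsets.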
9294 Character set Field 1 Field 2 Field 3
9295 ------------- ------- ------- -------
9300 Dimension-1 official 0 LB - 0x80 PC1
9301 range: (01 - 0D) (20 - 7F)
9302 Dimension-1 private 0 LB - 0x80 PC1
9303 range: (20 - 6F) (20 - 7F)
9304 Dimension-2 official LB - 0x8F PC1 PC2
9305 range: (01 - 0A) (20 - 7F) (20 - 7F)
9306 Dimension-2 private LB - 0xE1 PC1 PC2
9307 range: (0F - 1E) (20 - 7F) (20 - 7F)
9311 Note that character codes 0 - 255 are the same as the ``binary encoding''
9320 CCL_PROGRAM := (CCL_MAIN_BLOCK
9323 CCL_MAIN_BLOCK := CCL_BLOCK
9324 CCL_EOF_BLOCK := CCL_BLOCK
9326 CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
9328 SET | IF | BRANCH | LOOP | REPEAT | BREAK
9331 SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
9334 EXPRESSION := ARG | (EXPRESSION OP ARG)
9336 IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
9337 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
9338 LOOP := (loop STATEMENT [STATEMENT ...])
9341 | (write-repeat [REG | INT-OR-CHAR | string])
9342 | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
9343 READ := (read REG) | (read REG REG)
9344 | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
9345 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
9346 WRITE := (write REG) | (write REG REG)
9347 | (write INT-OR-CHAR) | (write STRING) | STRING
9351 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
9352 ARG := REG | INT-OR-CHAR
9353 OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
9354 | < | > | == | <= | >= | !=
9356 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
9357 ARRAY := '[' INT-OR-CHAR ... ']'
9358 INT-OR-CHAR := INT | CHAR
9362 The machine code consists of a vector of 32-bit words.
9363 The first such word specifies the start of the EOF section of the code;
9364 this is the code executed to handle any stuff that needs to be done
9365 (e.g. designating back to ASCII and left-to-right mode) after all
9366 other encoded/decoded data has been written out. This is not used for
9367 charset CCL programs.
9369 REGISTER: 0..7 -- referred by RRR or rrr
9371 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
9372 TTTTT (5-bit): operator type
9373 RRR (3-bit): register number
9374 XXXXXXXXXXXXXXXX (15-bit):
9375 CCCCCCCCCCCCCCC: constant or address
9376 000000000000rrr: register number
9403 OPERATORS: TTTTT RRR XX..
9405 SetCS: 00000 RRR C...C RRR = C...C
9406 SetCL: 00001 RRR ..... RRR = c...c
9408 SetR: 00010 RRR ..rrr RRR = rrr
9409 SetA: 00011 RRR ..rrr RRR = array[rrr]
9410 C.............C size of array = C...C
9411 c.............c contents = c...c
9413 Jump: 00100 000 c...c jump to c...c
9414 JumpCond: 00101 RRR c...c if (!RRR) jump to c...c
9415 WriteJump: 00110 RRR c...c Write1 RRR, jump to c...c
9416 WriteReadJump: 00111 RRR c...c Write1, Read1 RRR, jump to c...c
9417 WriteCJump: 01000 000 c...c Write1 C...C, jump to c...c
9419 WriteCReadJump: 01001 RRR c...c Write1 C...C, Read1 RRR,
9420 C.............C and jump to c...c
9421 WriteSJump: 01010 000 c...c WriteS, jump to c...c
9425 WriteSReadJump: 01011 RRR c...c WriteS, Read1 RRR, jump to c...c
9429 WriteAReadJump: 01100 RRR c...c WriteA, Read1 RRR, jump to c...c
9430 C.............C size of array = C...C
9431 c.............c contents = c...c
9433 Branch: 01101 RRR C...C if (RRR >= 0 && RRR < C..)
9434 c.............c branch to (RRR+1)th address
9435 Read1: 01110 RRR ... read 1-byte to RRR
9436 Read2: 01111 RRR ..rrr read 2-byte to RRR and rrr
9437 ReadBranch: 10000 RRR C...C Read1 and Branch
9440 Write1: 10001 RRR ..... write 1-byte RRR
9441 Write2: 10010 RRR ..rrr write 2-byte RRR and rrr
9442 WriteC: 10011 000 ..... write 1-char C...CC
9444 WriteS: 10100 000 ..... write C..-byte of string
9448 WriteA: 10101 RRR ..... write array[RRR]
9449 C.............C size of array = C...C
9450 c.............c contents = c...c
9452 End: 10110 000 ..... terminate the execution
9454 SetSelfCS: 10111 RRR C...C RRR AAAAA= C...C
9456 SetSelfCL: 11000 RRR ..... RRR AAAAA= c...c
9459 SetSelfR: 11001 RRR ..Rrr RRR AAAAA= rrr
9461 SetExprCL: 11010 RRR ..Rrr RRR = rrr AAAAA c...c
9464 SetExprR: 11011 RRR ..rrr RRR = rrr AAAAA Rrr
9467 JumpCondC: 11100 RRR c...c if !(RRR AAAAA C..) jump to c...c
9470 JumpCondR: 11101 RRR c...c if !(RRR AAAAA rrr) jump to c...c
9473 ReadJumpCondC: 11110 RRR c...c Read1 and JumpCondC
9476 ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR
9481 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
9482 @chapter The Lisp Reader and Compiler
9483 @cindex Lisp reader and compiler, the
9484 @cindex reader and compiler, the Lisp
9485 @cindex compiler, the Lisp reader and
9489 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
9493 An @dfn{lstream} is an internal Lisp object that provides a generic
9494 buffering stream implementation. Conceptually, you send data to the
9495 stream or read data from the stream, not caring what's on the other end
9496 of the stream. The other end could be another stream, a file
9497 descriptor, a stdio stream, a fixed block of memory, a reallocating
9498 block of memory, etc. The main purpose of the stream is to provide a
9499 standard interface and to do buffering. Macros are defined to read or
9500 write characters, so the calling functions do not have to worry about
9501 blocking data together in order to achieve efficiency.
9504 * Creating an Lstream:: Creating an lstream object.
9505 * Lstream Types:: Different sorts of things that are streamed.
9506 * Lstream Functions:: Functions for working with lstreams.
9507 * Lstream Methods:: Creating new lstream types.
9510 @node Creating an Lstream
9511 @section Creating an Lstream
9512 @cindex lstream, creating an
9514 Lstreams come in different types, depending on what is being interfaced
9515 to. Although the primitive for creating new lstreams is
9516 @code{Lstream_new()}, generally you do not call this directly. Instead,
9517 you call some type-specific creation function, which creates the lstream
9518 and initializes it as appropriate for the particular type.
9520 All lstream creation functions take a @var{mode} argument, specifying
9521 what mode the lstream should be opened in. This controls whether the
9522 lstream is for input or output, and optionally whether data should be
9523 blocked up in units of MULE characters. Note that some types of
9524 lstreams can only be opened for input; others only for output; and
9525 others can be opened either way. #### Richard Mlynarik thinks that
9526 there should be a strict separation between input and output streams,
9527 and he's probably right.
9529 @var{mode} is a string, one of
9537 Open for reading, but ``read'' never returns partial MULE characters.
9539 Open for writing, but never writes partial MULE characters.
9543 @section Lstream Types
9544 @cindex lstream types
9545 @cindex types, lstream
9556 @item resizing-buffer
9569 @node Lstream Functions
9570 @section Lstream Functions
9571 @cindex lstream functions
9573 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
9574 Allocate and return a new Lstream. This function is not really meant to
9575 be called directly; rather, each stream type should provide its own
9576 stream creation function, which creates the stream and does any other
9577 necessary creation stuff (e.g. opening a file).
9580 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
9581 Change the buffering of a stream. See @file{lstream.h}. By default the
9582 buffering is @code{STREAM_BLOCK_BUFFERED}.
9585 @deftypefun int Lstream_flush (Lstream *@var{lstr})
9586 Flush out any pending unwritten data in the stream. Clear any buffered
9587 input data. Returns 0 on success, -1 on error.
9590 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
9591 Write out one byte to the stream. This is a macro and so it is very
9592 efficient. The @var{c} argument is only evaluated once but the @var{stream}
9593 argument is evaluated more than once. Returns 0 on success, -1 on
9597 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
9598 Read one byte from the stream. This is a macro and so it is very
9599 efficient. The @var{stream} argument is evaluated more than once. Return
9600 value is -1 for EOF or error.
9603 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
9604 Push one byte back onto the input queue. This will be the next byte
9605 read from the stream. Any number of bytes can be pushed back and will
9606 be read in the reverse order they were pushed back---most recent
9607 first. (This is necessary for consistency---if there are a number of
9608 bytes that have been unread and I read and unread a byte, it needs to be
9609 the first to be read again.) This is a macro and so it is very
9610 efficient. The @var{c} argument is only evaluated once but the @var{stream}
9611 argument is evaluated more than once.
9614 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
9615 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
9616 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
9617 Function equivalents of the above macros.
9620 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
9621 Read up to @var{size} bytes from the stream into @var{data}. Return the number
9622 of bytes read. 0 means EOF. -1 means an error occurred and no bytes
9626 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
9627 Write @var{size} bytes of @var{data} to the stream. Return the number
9628 of bytes written. -1 means an error occurred and no bytes were written.
9631 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
9632 Push back @var{size} bytes of @var{data} onto the input queue. The next
9633 call to @code{Lstream_read()} with the same size will read the same
9634 bytes back. Note that this will be the case even if there is other
9635 pending unread data.
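The ordering rules for @code{Lstream_ungetc()} and @code{Lstream_unread()} amount to a LIFO stack onto which a block is pushed last byte first. The following stand-alone sketch models only that pushback buffer. The structure and function names are hypothetical and do not describe the actual @code{Lstream} layout.

```c
#include <stddef.h>

#define UNGET_MAX 64   /* hypothetical fixed capacity */

struct pushback {
    unsigned char buf[UNGET_MAX];
    size_t n;          /* number of bytes currently pushed back */
};

/* Like Lstream_ungetc: the pushed byte is the next one read. */
int pb_ungetc(struct pushback *pb, int c)
{
    if (pb->n >= UNGET_MAX)
        return -1;
    pb->buf[pb->n++] = (unsigned char) c;
    return 0;
}

/* Like Lstream_getc, restricted to pushed-back data; -1 when empty. */
int pb_getc(struct pushback *pb)
{
    return pb->n > 0 ? pb->buf[--pb->n] : -1;
}

/* Like Lstream_unread: push SIZE bytes so that the next SIZE reads return
   them in their original order, even if other unread data is pending.
   Pushing the last byte first is what makes this work on a LIFO stack. */
int pb_unread(struct pushback *pb, const void *data, size_t size)
{
    const unsigned char *p = data;

    if (size > UNGET_MAX - pb->n)
        return -1;
    while (size > 0)
        pb->buf[pb->n++] = p[--size];
    return 0;
}
```

With this layout, a byte that is read and immediately unread always comes back first, which is the consistency property required of @code{Lstream_ungetc()}.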
9638 @deftypefun int Lstream_close (Lstream *@var{stream})
9639 Close the stream. All data will be flushed out.
9642 @deftypefun void Lstream_reopen (Lstream *@var{stream})
9643 Reopen a closed stream. This enables I/O on it again. This is not
9644 meant to be called except from a wrapper routine that reinitializes
9645 variables and such---the close routine may well have freed some
9646 necessary storage structures, for example.
9649 @deftypefun void Lstream_rewind (Lstream *@var{stream})
9650 Rewind the stream to the beginning.
9653 @node Lstream Methods
9654 @section Lstream Methods
9655 @cindex lstream methods
9657 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
9658 Read some data from the stream's end and store it into @var{data}, which
9659 can hold @var{size} bytes. Return the number of bytes read. A return
9660 value of 0 means no bytes can be read at this time. This may be because
9661 of an EOF, or because there is a granularity greater than one byte that
9662 the stream imposes on the returned data, and @var{size} is less than
9663 this granularity. (This will happen frequently for streams that need to
9664 return whole characters, because @code{Lstream_read()} calls the reader
9665 function repeatedly until it has the number of bytes it wants or until 0
9666 is returned.) The lstream functions do not treat a 0 return as EOF or
9667 do anything special; however, the calling function will interpret any 0
9668 it gets back as EOF. This will normally not happen unless the caller
9669 calls @code{Lstream_read()} with a very small size.
9671 This function can be @code{NULL} if the stream is output-only.
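The interaction between a reader method and the caller loop described above can be modelled with a hypothetical memory-backed source. Its reader only returns whole 2-byte units and so stands in for a stream that must return whole characters. This is a sketch of the contract, not the code of @code{Lstream_read()}.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical source whose reader hands out only whole 2-byte units. */
struct mem_source {
    const unsigned char *data;
    size_t len, pos;
};

/* Reader contract: copy at most SIZE bytes and return the count; 0 means
   nothing can be delivered now (EOF, or SIZE below the granularity). */
ssize_t mem_reader(struct mem_source *s, unsigned char *data, size_t size)
{
    size_t avail = s->len - s->pos;
    size_t n = size < avail ? size : avail;

    n -= n % 2;                       /* only whole 2-byte units */
    memcpy(data, s->data + s->pos, n);
    s->pos += n;
    return (ssize_t) n;
}

/* Caller loop: call the reader until SIZE bytes are collected or it
   returns 0.  A real reader could also fail; that is reported as -1
   only if nothing was read. */
ssize_t read_all(struct mem_source *s, unsigned char *data, size_t size)
{
    size_t got = 0;

    while (got < size) {
        ssize_t n = mem_reader(s, data + got, size - got);
        if (n < 0)
            return got > 0 ? (ssize_t) got : -1;
        if (n == 0)
            break;
        got += (size_t) n;
    }
    return (ssize_t) got;
}
```

A request for a single byte returns 0 even though data remains, which is exactly the "very small size" case in which a caller would mistake the 0 for EOF.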
9674 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
9675 Send some data to the stream's end. Data to be sent is in @var{data}
9676 and is @var{size} bytes. Return the number of bytes sent. This
9677 function can send and return fewer bytes than are passed in; in that
9678 case, the function will just be called again until there is no data left
9679 or 0 is returned. A return value of 0 means that no more data can be
9680 currently stored, but there is no error; the data will be squirreled
9681 away until the writer can accept data. (This is useful, e.g., if you're
9682 dealing with a non-blocking file descriptor and are getting
9683 @code{EWOULDBLOCK} errors.) This function can be @code{NULL} if the
9684 stream is input-only.
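A hypothetical sink that takes at most four bytes per call and holds 16 bytes in total shows how a writer that accepts partial writes interacts with a caller that keeps calling it until all data is sent or 0 comes back. The names are illustrative. In a real lstream the unsent remainder would stay in the stream's buffer and be retried later.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical sink standing in for, e.g., a non-blocking descriptor. */
struct small_sink {
    unsigned char buf[16];
    size_t used;
};

/* Writer contract: take as many bytes as possible (here at most 4 per
   call) and return how many were taken; 0 means "nothing can be stored
   right now", which is not an error. */
ssize_t sink_writer(struct small_sink *k, const unsigned char *data, size_t size)
{
    size_t room = sizeof k->buf - k->used;
    size_t n = size < room ? size : room;

    if (n > 4)
        n = 4;
    memcpy(k->buf + k->used, data, n);
    k->used += n;
    return (ssize_t) n;
}

/* Caller loop: keep calling the writer until everything is sent or it
   returns 0 (or, for a real writer, a negative error).  Returns the
   number of bytes the sink accepted. */
size_t write_pending(struct small_sink *k, const unsigned char *data, size_t size)
{
    size_t sent = 0;

    while (sent < size) {
        ssize_t n = sink_writer(k, data + sent, size - sent);
        if (n <= 0)
            break;
        sent += (size_t) n;
    }
    return sent;
}
```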
9687 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
9688 Rewind the stream. If this is @code{NULL}, the stream is not seekable.
9691 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
9692 Indicate whether this stream is seekable---i.e. it can be rewound.
9693 This method is ignored if the stream does not have a rewind method. If
9694 this method is not present, the result is determined by whether a rewind
9698 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
9699 Perform any additional operations necessary to flush the data in this
9703 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
9706 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
9707 Perform any additional operations necessary to close this stream down.
9708 May be @code{NULL}. This function is called when @code{Lstream_close()}
9709 is called or when the stream is garbage-collected. When this function
9710 is called, all pending data in the stream will already have been written
9714 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
9715 Mark this object for garbage collection. Same semantics as a standard
9716 @code{Lisp_Object} marker. This function can be @code{NULL}.
9719 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
9720 @chapter Consoles; Devices; Frames; Windows
9721 @cindex consoles; devices; frames; windows
9722 @cindex devices; frames; windows, consoles;
9723 @cindex frames; windows, consoles; devices;
9724 @cindex windows, consoles; devices; frames;
9727 * Introduction to Consoles; Devices; Frames; Windows::
9729 * Window Hierarchy::
9730 * The Window Object::
9733 @node Introduction to Consoles; Devices; Frames; Windows
9734 @section Introduction to Consoles; Devices; Frames; Windows
9735 @cindex consoles; devices; frames; windows, introduction to
9736 @cindex devices; frames; windows, introduction to consoles;
9737 @cindex frames; windows, introduction to consoles; devices;
9738 @cindex windows, introduction to consoles; devices; frames;
9740 A window-system window that you see on the screen is called a
9741 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or
9742 more non-overlapping panes, called (confusingly) @dfn{windows}. Each
9743 window displays the text of a buffer in it. (See above on Buffers.) Note
9744 that buffers and windows are independent entities: Two or more windows
9745 can be displaying the same buffer (potentially in different locations),
9746 and a buffer can be displayed in no windows.
9748 A single display screen that contains one or more frames is called
9749 a @dfn{display}. Under most circumstances, there is only one display.
9750 However, more than one display can exist, for example if you have
9751 a @dfn{multi-headed} console, i.e. one with a single keyboard but
9752 multiple displays. (Typically in such a situation, the various
9753 displays act like one large display, in that the mouse is only
9754 in one of them at a time, and moving the mouse off of one moves
9755 it into another.) In some cases, the different displays will
9756 have different characteristics, e.g. one color and one mono.
9758 SXEmacs can display frames on multiple displays. It can even deal
9759 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
9760 SXEmacs terminology). Here is one case where this might be useful: You
9761 are using SXEmacs on your workstation at work, and leave it running.
9762 Then you go home and dial in on a TTY line, and you can use the
9763 already-running SXEmacs process to display another frame on your local
9766 Thus, there is a hierarchy console -> display -> frame -> window.
9767 There is a separate Lisp object type for each of these four concepts.
9768 Furthermore, there is logically a @dfn{selected console},
9769 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
9770 Each of these objects is distinguished in various ways, such as being the
9771 default object for various functions that act on objects of that type.
9772 Note that every containing object remembers the ``selected'' object
9773 among the objects that it contains: e.g. not only is there a selected
9774 window, but every frame remembers the last window in it that was
9775 selected, and changing the selected frame causes the remembered window
9776 within it to become the selected window. Similar relationships apply
9777 for consoles to devices and devices to frames.
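The remembered-selection relationship can be pictured with the following deliberately simplified structures. They are hypothetical stand-ins, not the SXEmacs console, device, frame, and window objects. The only point is that the selected window is found by following each container's remembered choice, so selecting a different frame changes the selected window without touching any window.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified objects: each container remembers
   which of its children was selected last. */
struct window  { int id; };
struct frame   { struct window *selected_window; };
struct device  { struct frame  *selected_frame; };
struct console { struct device *selected_device; };

/* Follow the remembered selections down from a console. */
struct window *selected_window_of(const struct console *con)
{
    const struct device *d = con->selected_device;
    const struct frame *f = d ? d->selected_frame : NULL;
    return f ? f->selected_window : NULL;
}

/* Selecting a frame makes the window it remembers the selected window. */
void select_frame(struct device *d, struct frame *f)
{
    d->selected_frame = f;
}
```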
9783 Recall that every buffer has a current insertion position, called
9784 @dfn{point}. Now, two or more windows may be displaying the same buffer,
9785 and the text cursor in the two windows (i.e. @code{point}) can be in
9786 two different places. You may ask, how can that be, since each
9787 buffer has only one value of @code{point}? The answer is that each window
9788 also has a value of @code{point} that is squirreled away in it. There
9789 is only one selected window, and the value of ``point'' in that buffer
9790 corresponds to that window. When the selected window is changed
9791 from one window to another displaying the same buffer, the old
9792 value of @code{point} is stored into the old window's ``point'' and the
9793 value of @code{point} from the new window is retrieved and made the
9794 value of @code{point} in the buffer. This means that @code{window-point}
9795 for the selected window is potentially inaccurate, and if you
9796 want to retrieve the correct value of @code{point} for a window,
9797 you must special-case on the selected window and retrieve the
9798 buffer's point instead. This is related to why @code{save-window-excursion}
9799 does not save the selected window's value of @code{point}.
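A minimal model of this point swapping, assuming that a buffer holds a single @code{point} and each window a stashed copy (called @code{pointm} here). The names are hypothetical. The @code{window_point} helper shows the special case on the selected window mentioned above.

```c
/* Hypothetical, simplified model of per-window point. */
struct buffer { long point; };
struct window { struct buffer *buffer; long pointm; };

static struct window *selected;   /* the selected window, if any */

/* Correct point for W: the live buffer point if W is selected,
   otherwise W's stashed value. */
long window_point(const struct window *w)
{
    return w == selected ? w->buffer->point : w->pointm;
}

/* Change the selected window, saving and restoring point. */
void select_window(struct window *w)
{
    if (selected)
        selected->pointm = selected->buffer->point;  /* stash old point */
    selected = w;
    w->buffer->point = w->pointm;                    /* install new point */
}
```

While a window is selected, its stashed @code{pointm} goes stale as point moves, which is why reading it directly for the selected window gives the wrong answer.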
9801 @node Window Hierarchy
9802 @section Window Hierarchy
9803 @cindex window hierarchy
9804 @cindex hierarchy of windows
9806 If a frame contains multiple windows (panes), they are always created
9807 by splitting an existing window along the horizontal or vertical axis.
9808 Terminology is a bit confusing here: to @dfn{split a window
9809 horizontally} means to create two side-by-side windows, i.e. to make a
9810 @emph{vertical} cut in a window. Likewise, to @dfn{split a window
9811 vertically} means to create two windows, one above the other, by making
9812 a @emph{horizontal} cut.
9814 If you split a window and then split again along the same axis, you
9815 will end up with a number of panes all arranged along the same axis.
9816 The precise way in which the splits were made should not be important,
9817 and this is reflected internally. Internally, all windows are arranged
9818 in a tree, consisting of two types of windows, @dfn{combination} windows
9819 (which have children, and are covered completely by those children) and
9820 @dfn{leaf} windows, which have no children and are visible. Every
9821 combination window has two or more children, all arranged along the same
9822 axis. There are (logically) two subtypes of combination windows, depending on
9823 whether their children are horizontally or vertically arrayed. There is
9824 always one root window, which is either a leaf window (if the frame
9825 contains only one window) or a combination window (if the frame contains
9826 more than one window). In the latter case, the root window will have
9827 two or more children, either horizontally or vertically arrayed, and
9828 each of those children will be either a leaf window or another
9831 Here are some rules:
9835 Horizontal combination windows can never have children that are
9836 horizontal combination windows; same for vertical.
9839 Only leaf windows can be split (obviously) and this splitting does one
9840 of two things: (a) turns the leaf window into a combination window and
9841 creates two new leaf children, or (b) turns the leaf window into one of
9842 the two new leaves and creates the other leaf. Rule (1) dictates which
9843 of these two outcomes happens.
9846 Every combination window must have at least two children.
9849 Leaf windows can never become combination windows. They can be deleted,
9850 however. If this results in a violation of (3), the parent combination
9851 window also gets deleted.
9854 All functions that accept windows must be prepared to accept combination
9855 windows, and do something sane (e.g. signal an error if so).
9856 Combination windows @emph{do} escape to the Lisp level.
9859 All windows have three fields governing their contents:
9860 these are @dfn{hchild} (a list of horizontally-arrayed children),
9861 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
9862 (the buffer contained in a leaf window). Exactly one of
9863 these will be non-@code{nil}. Remember that @dfn{horizontally-arrayed}
9864 means ``side-by-side'' and @dfn{vertically-arrayed} means
9865 ``one above the other''.
9868 Leaf windows also have markers in their @code{start} (the
9869 first buffer position displayed in the window) and @code{pointm}
9870 (the window's stashed value of @code{point}---see above) fields,
9871 while combination windows have @code{nil} in these fields.
9874 The list of children for a window is threaded through the
9875 @code{next} and @code{prev} fields of each child window.
9878 @strong{Deleted windows can be undeleted}. This happens as a result of
9879 restoring a window configuration, and is unlike frames, displays, and
9880 consoles, which, once deleted, can never be restored. Deleting a window
9881 does nothing except set a special @code{dead} bit to 1 and clear out the
9882 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
9886 Most frames actually have two top-level windows---one for the
9887 minibuffer and one (the @dfn{root}) for everything else. The modeline
9888 (if present) separates these two. The @code{next} field of the root
9889 points to the minibuffer, and the @code{prev} field of the minibuffer
9890 points to the root. The other @code{next} and @code{prev} fields are
9891 @code{nil}, and the frame points to both of these windows.
9892 Minibuffer-less frames have no minibuffer window, and the @code{next}
9893 and @code{prev} of the root window are @code{nil}. Minibuffer-only
9894 frames have no root window, and the @code{next} of the minibuffer window
9895 is @code{nil} but the @code{prev} points to itself. (#### This is an
9896 artifact that should be fixed.)
9899 @node The Window Object
9900 @section The Window Object
9901 @cindex window object, the
9902 @cindex object, the window
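The window-tree rules from the previous section can be checked mechanically. The following hypothetical sketch keeps only the @code{hchild}, @code{vchild}, @code{buffer}, @code{next}, and @code{prev} fields, with C null pointers standing in for @code{nil}. It verifies the rules while counting leaf windows. It is an illustration, not the SXEmacs window structure.

```c
#include <stddef.h>

/* Hypothetical window node: exactly one of hchild, vchild, buffer is set;
   sibling lists are threaded through next (prev is unused here). */
struct window {
    struct window *hchild;   /* first horizontally-arrayed child */
    struct window *vchild;   /* first vertically-arrayed child */
    const void *buffer;      /* non-NULL only in leaf windows */
    struct window *next, *prev;
};

/* Number of leaf windows under W, or -1 if a tree rule is broken. */
int count_leaves(const struct window *w)
{
    int set = (w->hchild != NULL) + (w->vchild != NULL) + (w->buffer != NULL);
    if (set != 1)
        return -1;                    /* exactly one must be non-nil */
    if (w->buffer)
        return 1;                     /* a leaf window */

    int horizontal = w->hchild != NULL;
    int nchildren = 0, leaves = 0;

    for (const struct window *ch = horizontal ? w->hchild : w->vchild;
         ch; ch = ch->next) {
        /* No horizontal combination inside a horizontal one; same for vertical. */
        if ((horizontal && ch->hchild) || (!horizontal && ch->vchild))
            return -1;
        int n = count_leaves(ch);
        if (n < 0)
            return -1;
        leaves += n;
        nchildren++;
    }
    return nchildren >= 2 ? leaves : -1;  /* combinations need 2+ children */
}
```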
9904 Windows have the following accessible fields:
9908 The frame that this window is on.
9911 Non-@code{nil} if this window is a minibuffer window.
9914 The buffer that the window is displaying. This may change often during
9915 the life of the window.
9918 Non-@code{nil} if this window is dedicated to its buffer.
9921 @cindex window point internals
9922 This is the value of point in the current buffer when this window is
9923 selected; when it is not selected, it retains its previous value.
9926 The position in the buffer that is the first character to be displayed
9930 If this flag is non-@code{nil}, it says that the window has been
9931 scrolled explicitly by the Lisp program. This affects what the next
9932 redisplay does if point is off the screen: instead of scrolling the
9933 window to show the text around point, it moves point to a location that
9937 The @code{modified} field of the window's buffer, as of the last time
9938 a redisplay completed in this window.
9941 The buffer's value of point, as of the last time
9942 a redisplay completed in this window.
9945 This is the left-hand edge of the window, measured in columns. (The
9946 leftmost column on the screen is @w{column 0}.)
9949 This is the top edge of the window, measured in lines. (The top line on
9950 the screen is @w{line 0}.)
9953 The height of the window, measured in lines.
9956 The width of the window, measured in columns.
9959 This is the window that is the next in the chain of siblings. It is
9960 @code{nil} in a window that is the rightmost or bottommost of a group of
9964 This is the window that is the previous in the chain of siblings. It is
9965 @code{nil} in a window that is the leftmost or topmost of a group of
9969 Internally, SXEmacs arranges windows in a tree; each group of siblings has
9970 a parent window whose area includes all the siblings. This field points
9971 to a window's parent.
9973 Parent windows do not display buffers, and play little role in display
9974 except to shape their child windows. Emacs Lisp programs usually have
9975 no access to the parent windows; they operate on the windows at the
9976 leaves of the tree, which actually display buffers.
9979 This is the number of columns that the display in the window is scrolled
9980 horizontally to the left. Normally, this is 0.
9983 This is the last time that the window was selected. The function
9984 @code{get-lru-window} uses this field.
9987 The window's display table, or @code{nil} if none is specified for it.
9989 @item update_mode_line
9990 Non-@code{nil} means this window's mode line needs to be updated.
9992 @item base_line_number
9993 The line number of a certain position in the buffer, or @code{nil}.
9994 This is used for displaying the line number of point in the mode line.
9997 The position in the buffer for which the line number is known, or
9998 @code{nil} meaning none is known.
10000 @item region_showing
10001 If the region (or part of it) is highlighted in this window, this field
10002 holds the mark position that made one end of that region. Otherwise,
10003 this field is @code{nil}.
10006 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
10007 @chapter The Redisplay Mechanism
10008 @cindex redisplay mechanism, the
10010 The redisplay mechanism is one of the most complicated sections of
10011 SXEmacs, especially from a conceptual standpoint. This is doubly so
10012 because, unlike for the basic aspects of the Lisp interpreter, the
10013 computer science theories of how to efficiently handle redisplay are not
10016 When working with the redisplay mechanism, remember the Golden Rules
10021 It Is Better To Be Correct Than Fast.
10023 Thou Shalt Not Run Elisp From Within Redisplay.
10025 It Is Better To Be Fast Than Not To Be.
10029 * Critical Redisplay Sections::
10030 * Line Start Cache::
10031 * Redisplay Piece by Piece::
10034 @node Critical Redisplay Sections
10035 @section Critical Redisplay Sections
10036 @cindex redisplay sections, critical
10037 @cindex critical redisplay sections
10039 Within this section, we are defenseless and assume that the
10040 following cannot happen:
10046 Lisp code evaluation
10051 We ensure (3) by calling @code{hold_frame_size_changes()}, which
10052 will cause any pending frame size changes to be put on hold
10053 until after the end of the critical section. (1) follows
10054 automatically if (2) is met. #### Unfortunately, there are
10055 some places where Lisp code can be called within this section.
10056 We need to remove them.
10058 If @code{Fsignal()} is called during this critical section, we
10059 will @code{abort()}.
10061 If garbage collection is called during this critical section,
10062 we simply return. #### We should abort instead.
10064 #### If a frame-size change does occur we should probably
10065 actually be preempting redisplay.
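The deferral performed by @code{hold_frame_size_changes()} follows a common pattern. While a flag is set, size changes are recorded instead of applied. When the critical section ends, the recorded change is applied (in this sketch, only the most recent one). The sketch illustrates that pattern only. Its names and data layout are hypothetical and do not reproduce the SXEmacs implementation.

```c
/* Hypothetical frame with room for one deferred size change. */
struct frame {
    int width, height;                  /* current size */
    int pending_width, pending_height;  /* deferred size, if any */
    int size_change_pending;
};

static int size_changes_held;           /* nonzero inside the critical section */

static void apply_size(struct frame *f, int w, int h)
{
    f->width = w;
    f->height = h;
}

/* Called when the window system reports a new size. */
void request_frame_size(struct frame *f, int w, int h)
{
    if (size_changes_held) {            /* defer: redisplay must not see it */
        f->pending_width = w;
        f->pending_height = h;
        f->size_change_pending = 1;
    } else {
        apply_size(f, w, h);
    }
}

void hold_size_changes(void) { size_changes_held = 1; }

/* Leave the critical section and apply any change that arrived meanwhile. */
void unhold_size_changes(struct frame *f)
{
    size_changes_held = 0;
    if (f->size_change_pending) {
        f->size_change_pending = 0;
        apply_size(f, f->pending_width, f->pending_height);
    }
}
```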
10067 @node Line Start Cache
10068 @section Line Start Cache
10069 @cindex line start cache
10071 The traditional scrolling code in Emacs breaks in a variable height
10072 world. It depends on the key assumption that the number of lines that
10073 can be displayed at any given time is fixed. This led to a complete
10074 separation of the scrolling code from the redisplay code. In order to
10075 fully support variable height lines, the scrolling code must actually be
10076 tightly integrated with redisplay. Only redisplay can determine how
10077 many lines will be displayed on a screen for any given starting point.
10079 What is ideally wanted is a complete list of the starting buffer
10080 position for every possible display line of a buffer along with the
10081 height of that display line. Maintaining such a full list would be very
10082 expensive. We settle for having it include information for all areas
10083 which we happen to generate anyhow (i.e. the region currently being
10084 displayed) and for those areas we need to work with.
10086 In order to ensure that the cache accurately represents what redisplay
10087 would actually show, it is necessary to invalidate it in many
10088 situations. If the buffer changes, the starting positions may no longer
10089 be correct. If a face or an extent has changed then the line heights
10090 may have altered. These events happen frequently enough that the cache
10091 can end up being constantly disabled. With this potentially constant
10092 invalidation, when is the cache ever useful?
10094 Even if the cache is invalidated before every single usage, it is
10095 necessary. Scrolling often requires knowledge about display lines which
10096 are actually above or below the visible region. The cache provides a
10097 convenient light-weight method of storing this information for multiple
10098 display regions. This knowledge is necessary for the scrolling code to
10099 always obey the First Golden Rule of Redisplay.
10101 If the cache already contains all of the information that the scrolling
10102 routines need, so that they do not have to generate it themselves, then
10103 we are able to obey the Third Golden Rule of Redisplay. The first thing
10104 we do to help out the cache is to always add the displayed region. This
10105 region had to be generated anyway, so the cache ends up getting the
10106 information basically for free. In those cases where a user is simply
10107 scrolling around viewing a buffer there is a high probability that this
10108 is sufficient to always provide the needed information. The second
10109 thing we can do is be smart about invalidating the cache.
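
A minimal sketch of such a cache, assuming it is simply a sorted array of (start position, line height) pairs. @code{LineStartCache}, @code{add_region}, @code{lookup}, and @code{invalidate_from} are hypothetical names. The partial invalidation shown corresponds to the second TODO item below and is not existing behavior. It discards the line containing the modification and every line after it.

```python
import bisect


class LineStartCache:
    """Hypothetical line start cache: sorted display-line starts and heights."""

    def __init__(self):
        self.starts = []   # buffer positions where display lines start, sorted
        self.heights = []  # pixel height of the display line at each start

    def add_region(self, lines):
        """Record (start, height) pairs that redisplay generated anyway."""
        for start, height in lines:
            i = bisect.bisect_left(self.starts, start)
            if i < len(self.starts) and self.starts[i] == start:
                self.heights[i] = height          # refresh an existing entry
            else:
                self.starts.insert(i, start)
                self.heights.insert(i, height)

    def lookup(self, start):
        """Return the cached height of the line starting at START, or None."""
        i = bisect.bisect_left(self.starts, start)
        if i < len(self.starts) and self.starts[i] == start:
            return self.heights[i]
        return None

    def invalidate_from(self, pos):
        """Drop the line containing POS and every line below it."""
        i = max(bisect.bisect_right(self.starts, pos) - 1, 0)
        del self.starts[i:]
        del self.heights[i:]
```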
10111 TODO---Be smart about invalidating the cache. Potential places:
10115 Insertions at end-of-line which don't cause line-wraps do not alter the
10116 starting positions of any display lines. These types of buffer
10117 modifications should not invalidate the cache. This is actually a large
10118 optimization for redisplay speed as well.
10120 Buffer modifications frequently only affect the display of lines at and
10121 below where they occur. In these situations we should only invalidate
10122 the part of the cache starting at where the modification occurs.
10125 In case you're wondering, the Second Golden Rule of Redisplay is not
10128 @node Redisplay Piece by Piece
10129 @section Redisplay Piece by Piece
10130 @cindex redisplay piece by piece
10132 As you can begin to see, redisplay is complex and also not well
10133 documented. Chuck no longer works on XEmacs, so this section is my take
10134 on the workings of redisplay.
10136 Redisplay happens in three phases:
10140 Determine the desired display in the area that needs redisplay.
10141 Implemented by @code{redisplay.c}.
10143 Compare the desired display with the current display.
10144 Implemented by @code{redisplay-output.c}.
10146 Output the changes. Implemented by @code{redisplay-output.c},
10147 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}.
10150 Steps 1 and 2 are device-independent and relatively complex. Step 3 is
10151 mostly device-dependent.
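
The three phases can be sketched as a small Python pipeline. Everything here is hypothetical: a ``display line'' is reduced to a string clipped to the window width, and @code{device_write} stands in for the device-specific output code. The real structures are described below.

```python
def determine_desired(buffer_lines, width):
    """Phase 1 (device independent): build the desired display lines."""
    return [line[:width] for line in buffer_lines]


def compare(current, desired):
    """Phase 2 (device independent): indices of lines that differ."""
    changed = []
    for i in range(max(len(current), len(desired))):
        old = current[i] if i < len(current) else None
        new = desired[i] if i < len(desired) else None
        if old != new:
            changed.append(i)
    return changed


def output_changes(device_write, desired, changed):
    """Phase 3 (device dependent): DEVICE_WRITE stands in for a backend."""
    for i in changed:
        device_write(i, desired[i] if i < len(desired) else "")


def redisplay(current, buffer_lines, width, device_write):
    desired = determine_desired(buffer_lines, width)
    output_changes(device_write, desired, compare(current, desired))
    return desired  # becomes the new current display
```

Only phase 3 touches the device, which is why it is the only part split across per-device files.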
10153 @subsection Determining the Desired Display
10155 Display attributes are stored in @code{display_line} structures. Each
10156 @code{display_line} consists of a set of @code{display_block}s and each
10157 @code{display_block} contains a number of @code{rune}s. Generally,
10158 each window holds dynarrs of @code{display_line}s representing
10159 the current display and the desired display.
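
A simplified Python model of this nesting follows. The class and field names (@code{Rune}, @code{DisplayBlock}, @code{bufpos}, @code{ascent}, and so on) are assumptions chosen for illustration. The real C structures carry many more fields.

```python
class Rune:
    """One displayable element, e.g. a character or glyph, at a buffer position."""

    def __init__(self, bufpos, char, width):
        self.bufpos = bufpos
        self.char = char
        self.width = width          # in pixels


class DisplayBlock:
    """A group of runes within one display line (e.g. the text area)."""

    def __init__(self, kind):
        self.kind = kind
        self.runes = []


class DisplayLine:
    """One screen line: a set of display blocks plus its vertical metrics."""

    def __init__(self, start, ascent, descent):
        self.start = start          # buffer position of the first rune
        self.ascent = ascent
        self.descent = descent
        self.blocks = []

    def height(self):
        return self.ascent + self.descent

    def width(self):
        return sum(r.width for b in self.blocks for r in b.runes)


class Window:
    """Each window holds arrays of display lines for both displays."""

    def __init__(self):
        self.current_display_lines = []
        self.desired_display_lines = []
```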
10161 The @code{display_line} structures are tightly tied to buffers, which
10162 presents a problem for redisplay as this connection is bogus for the
10163 modeline. Hence the @code{display_line} generation routines are
10164 duplicated for generating the modeline. This means that the modeline
10165 display code has many bugs that the standard redisplay code does not.
10167 The guts of @code{display_line} generation are in
10168 @code{create_text_block}, which creates a single display line for the
10169 desired locale. This incrementally parses the characters on the current
10170 line and generates redisplay structures for each.
10172 Gutter redisplay is different. Because the data to display is stored in
10173 a string, we cannot use @code{create_text_block}. Instead we use
10174 @code{create_text_string_block} which performs the same function as
10175 @code{create_text_block} but for strings. Many of the complexities of
10176 @code{create_text_block} to do with cursor handling and selective
10177 display have been removed.
10179 @node Extents, Faces, The Redisplay Mechanism, Top
10184 * Introduction to Extents:: Extents are ranges over text, with properties.
10185 * Extent Ordering:: How extents are ordered internally.
10186 * Format of the Extent Info:: The extent information in a buffer or string.
10187 * Zero-Length Extents:: A weird special case.
10188 * Mathematics of Extent Ordering:: A rigorous foundation.
10189 * Extent Fragments:: Cached information useful for redisplay.
10192 @node Introduction to Extents
10193 @section Introduction to Extents
10194 @cindex extents, introduction to
10196 Extents are regions over a buffer, with a start and an end position
10197 denoting the region of the buffer included in the extent. In
10198 addition, either end can be closed or open, meaning that the endpoint
10199 is or is not logically included in the extent. Insertion of a character
10200 at a closed endpoint causes the character to go inside the extent;
10201 insertion at an open endpoint causes the character to go outside.
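
A minimal sketch of the insertion rule follows. It assumes plain character positions instead of the memory indices described below, and a hypothetical @code{Extent} class with one open/closed flag per endpoint.

```python
class Extent:
    """Hypothetical extent: [start, end] with per-endpoint open/closed flags."""

    def __init__(self, start, end, start_open=False, end_open=False):
        self.start = start
        self.end = end
        self.start_open = start_open
        self.end_open = end_open


def adjust_for_insertion(extent, pos, n):
    """Shift EXTENT's endpoints after inserting N characters at POS.

    A closed endpoint at POS keeps the new text inside the extent; an open
    endpoint at POS leaves it outside.
    """
    if extent.start > pos or (extent.start == pos and extent.start_open):
        extent.start += n
    if extent.end > pos or (extent.end == pos and not extent.end_open):
        extent.end += n
```

Applied to a zero-length extent, the same rule reproduces the behavior described later (@pxref{Zero-Length Extents}). An open-open zero-length extent would end up with its start after its end, which is one way to see why such extents are barred.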
10203 Extent endpoints are stored using memory indices (see @file{insdel.c}),
10204 to minimize the amount of adjusting that needs to be done when
10205 characters are inserted or deleted.
10207 (Formerly, extent endpoints at the gap could be either before or
10208 after the gap, depending on the open/closedness of the endpoint.
10209 The intent of this was to make it so that insertions would
10210 automatically go inside or out of extents as necessary with no
10211 further work needing to be done. It didn't work out that way,
10212 however, and just ended up complexifying and buggifying all the
10215 @node Extent Ordering
10216 @section Extent Ordering
10217 @cindex extent ordering
10219 Extents are compared using memory indices. There are two orderings
10220 for extents and both orders are kept current at all times. The normal
10221 or @dfn{display} order is as follows:
10224 Extent A is ``less than'' extent B,
10225 that is, earlier in the display order,
10226 if: A-start < B-start,
10227 or if: A-start = B-start, and A-end > B-end
10230 So if two extents begin at the same position, the larger of them is the
10231 earlier one in the display order (@code{EXTENT_LESS} is true).
10233 For the e-order, the same thing holds:
10236 Extent A is ``less than'' extent B in e-order,
10237 that is, earlier in the e-order,
10239 or if: A-end = B-end, and A-start > B-start
10242 So if two extents end at the same position, the smaller of them is the
10243 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
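
The two orderings can be expressed as Python sort keys. Representing an extent as a @code{(start, end)} tuple is an assumption of this sketch. @code{extent_less} and @code{extent_e_less} only mirror the rules stated for @code{EXTENT_LESS} and @code{EXTENT_E_LESS}; they are not the C macros.

```python
def display_key(extent):
    """Display order: by start; on a tie the larger extent comes first."""
    start, end = extent
    return (start, -end)


def e_order_key(extent):
    """E-order: by end; on a tie the smaller extent comes first."""
    start, end = extent
    return (end, -start)


def extent_less(a, b):
    return display_key(a) < display_key(b)


def extent_e_less(a, b):
    return e_order_key(a) < e_order_key(b)
```

Swapping start and end, and negating the tie-breaker, turns one key into the other, which is the complementarity described next.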
10245 The display order and the e-order are complementary orders: any
10246 theorem about the display order also applies to the e-order if you swap
10247 all occurrences of ``display order'' and ``e-order'', ``less than'' and
10248 ``greater than'', and ``extent start'' and ``extent end''.
10250 @node Format of the Extent Info
10251 @section Format of the Extent Info
10252 @cindex extent info, format of the
10254 An extent-info structure consists of a list of the buffer or string's
10255 extents and a @dfn{stack of extents} that lists all of the extents over
10256 a particular position. The stack-of-extents info is used for
10257 optimization purposes---it basically caches some info that might
10258 be expensive to compute. Certain otherwise hard computations are easy
10259 given the stack of extents over a particular position, and if the
10260 stack of extents over a nearby position is known (because it was
10261 calculated at some prior point in time), it's easy to move the stack
10262 of extents to the proper position.
10264 Given that the stack of extents is an optimization, and given that
10265 it requires memory, a string's stack of extents is wiped out each
10266 time a garbage collection occurs. Therefore, any time you retrieve
10267 the stack of extents, it might not be there. If you need it to
10268 be there, use the @code{_force} version.
10270 Similarly, a string may or may not have an extent_info structure.
10271 (Generally it won't if there haven't been any extents added to the
10272 string.) So use the @code{_force} version if you need the extent_info
10273 structure to be there.
10275 A list of extents is maintained as a double gap array: one gap array
10276 is ordered by start index (the @dfn{display order}) and the other is
10277 ordered by end index (the @dfn{e-order}). Note that positions in an
10278 extent list should logically be conceived of as referring @emph{to} a
10279 particular extent (as is the norm in programs) rather than sitting
10280 between two extents. Note also that callers of these functions should
10281 not be aware of the fact that the extent list is implemented as an
10282 array, except for the fact that positions are integers (this should be
10283 generalized to handle integers and linked lists equally well).
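
A gap array keeps its unused slots in one contiguous gap that moves to wherever the next insertion happens, so a run of nearby insertions only moves a few elements. The Python class below is a minimal, hypothetical illustration of the data structure, not the SXEmacs implementation. An extent list would keep two such arrays, one per ordering.

```python
class GapArray:
    """Minimal gap array: a list with a movable gap of unused slots."""

    def __init__(self, capacity=8):
        self.data = [None] * capacity
        self.gap_start = 0          # first unused slot
        self.gap_end = capacity     # one past the last unused slot

    def __len__(self):
        return len(self.data) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos):
        # Shift elements across the gap so that the gap begins at POS.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.data[self.gap_end] = self.data[self.gap_start]
        while self.gap_start < pos:
            self.data[self.gap_start] = self.data[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self):
        # Widen the gap in place when it has been used up.
        extra = max(len(self.data), 1)
        self.data[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, value):
        if self.gap_start == self.gap_end:
            self._grow()
        self._move_gap(pos)
        self.data[self.gap_start] = value
        self.gap_start += 1

    def __getitem__(self, pos):
        # Logical positions skip over the gap.
        if pos < self.gap_start:
            return self.data[pos]
        return self.data[pos + (self.gap_end - self.gap_start)]

    def to_list(self):
        return self.data[:self.gap_start] + self.data[self.gap_end:]
```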
10285 @node Zero-Length Extents
10286 @section Zero-Length Extents
10287 @cindex zero-length extents
10288 @cindex extents, zero-length
10290 Extents can be zero-length, and will end up that way if their endpoints
10291 are explicitly set that way or if their detachable property is @code{nil}
10292 and all the text in the extent is deleted. (The exception is open-open
10293 zero-length extents, which are barred from existing because there is
10294 no sensible way to define their properties. Deletion of the text in
10295 an open-open extent causes it to be converted into a closed-open
10296 extent.) Zero-length extents are primarily used to represent
10297 annotations, and behave as follows:
10301 Insertion at the position of a zero-length extent expands the extent
10302 if both endpoints are closed; goes after the extent if it is closed-open;
10303 and goes before the extent if it is open-closed.
10306 Deletion of a character on a side of a zero-length extent whose
10307 corresponding endpoint is closed causes the extent to be detached if
10308 it is detachable; if the extent is not detachable or the corresponding
10309 endpoint is open, the extent remains in the buffer, moving as necessary.
10312 Note that closed-open, non-detachable zero-length extents behave
10313 exactly like markers and that open-closed, non-detachable zero-length
10314 extents behave like the ``point-type'' marker in Mule.
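
The rules above can be encoded in a small Python model. @code{ZeroLengthExtent}, @code{collapse}, and the string results are hypothetical. The model tracks only the extent's position and endpoint flags and ignores everything else an extent carries.

```python
def collapse(start_open, end_open):
    """Endpoint flags of a non-detachable extent after all its text is deleted."""
    if start_open and end_open:
        return (False, True)    # open-open is converted to closed-open
    return (start_open, end_open)


class ZeroLengthExtent:
    """Hypothetical model of a zero-length extent sitting at position POS."""

    def __init__(self, pos, start_open, end_open, detachable):
        if start_open and end_open:
            raise ValueError("open-open zero-length extents are not allowed")
        self.pos = pos
        self.start_open = start_open
        self.end_open = end_open
        self.detachable = detachable
        self.detached = False

    def insert(self, n):
        """Insert N characters at POS and report where they went."""
        if not self.start_open and not self.end_open:
            return "inside"    # closed-closed: the extent grows over the text
        if self.end_open:
            return "after"     # closed-open: the extent stays in front of the text
        self.pos += n
        return "before"        # open-closed: the extent ends up after the text

    def delete_adjacent(self, side):
        """Delete the character on SIDE ('before' or 'after') of the extent."""
        closed = not (self.start_open if side == "before" else self.end_open)
        if closed and self.detachable:
            self.detached = True
        elif side == "before":
            self.pos -= 1      # the extent moves back along with the text
```

The closed-open, non-detachable case never detaches and always keeps inserted text after it, which is the marker-like behavior noted above.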
10316 @node Mathematics of Extent Ordering
10317 @section Mathematics of Extent Ordering
10318 @cindex mathematics of extent ordering
10319 @cindex extent mathematics
10320 @cindex extent ordering
10322 @cindex display order of extents
10323 @cindex extents, display order
10324 The extents in a buffer are ordered by ``display order'' because that
10325 is the order in which the redisplay mechanism needs to process them.
10326 The e-order is an auxiliary ordering used to facilitate operations
10327 over extents. The operations that can be performed on the ordered
10328 list of extents in a buffer are
10332 Locate where an extent would go if inserted into the list.
10334 Insert an extent into the list.
10336 Remove an extent from the list.
10338 Map over all the extents that overlap a range.
10341 (4) requires being able to determine the first and last extents
10342 that overlap a range.
10344 NOTE: @dfn{overlap} is used as follows:
10348 two ranges overlap if they have at least one point in common.
10349 Whether the endpoints are open or closed makes a difference here.
10351 a point overlaps a range if the point is contained within the
10352 range; this is equivalent to treating a point @math{P} as the range
10355 In the case of an @emph{extent} overlapping a point or range, the extent
10356 is normally treated as having closed endpoints. This applies
10357 consistently in the discussion of stacks of extents and such below.
10358 Note that this definition of overlap is not necessarily consistent with
10359 the extents that @code{map-extents} maps over, since @code{map-extents}
10360 sometimes pays attention to whether the endpoints of an extent are open
10361 or closed. But for our purposes, it greatly simplifies things to treat
10362 all extents as having closed endpoints.
10365 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
10366 to mean comparison according to the display order. Comparison between
10367 an extent @math{E} and an index @math{I} means comparison between
10368 @math{E} and the range @math{[I, I]}.
10370 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
10371 according to the e-order.
10373 For any range @math{R}, define @math{R(0)} to be the starting index of
10374 the range and @math{R(1)} to be the ending index of the range.
10376 For any extent @math{E}, define @math{E(next)} to be the extent directly
10377 following @math{E}, and @math{E(prev)} to be the extent directly
10378 preceding @math{E}. Assume @math{E(next)} and @math{E(prev)} can be
10379 determined from @math{E} in constant time. (This is because we store
10380 the extent list as a doubly linked list.)
10382 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
10383 extents directly following and preceding @math{E} in the e-order.
10387 Let @math{R} be a range.
10388 Let @math{F} be the first extent overlapping @math{R}.
10389 Let @math{L} be the last extent overlapping @math{R}.
10391 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
10392 i.e. @math{L <= R(1) < L(next)}.
10394 This follows easily from the definition of display order. The
10395 basic reason that this theorem applies is that the display order
10396 sorts by increasing starting index.
10398 Therefore, we can determine @math{L} just by looking at where we would
10399 insert @math{R(1)} into the list, and if we know @math{F} and are moving
10400 forward over extents, we can easily determine when we've hit @math{L} by
10401 comparing the extent we're at to @math{R(1)}.
10404 Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
10407 This is the analog of Theorem 1, and applies because the e-order
10408 sorts by increasing ending index.
10410 Therefore, @math{F} can be found in the same amount of time as
10411 operation (1), i.e. the time that it takes to locate where an extent
10412 would go if inserted into the e-order list.
10414 If the lists were stored as balanced binary trees, then operation (1)
10415 would take logarithmic time, which is usually quite fast. However,
10416 currently they're stored as simple doubly-linked lists, and instead we
10417 do some caching to try to speed things up.
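
Operation (1) can be illustrated with binary search over sorted arrays, which gives the logarithmic lookup mentioned above without a balanced tree. This is a swapped-in technique for illustration, not how SXEmacs stores extents. The hypothetical @code{OrderedExtentList} keeps sort keys in two Python lists, so lookup is logarithmic but insertion and removal still shift elements. It also assumes no two extents have identical endpoints.

```python
import bisect


class OrderedExtentList:
    """Hypothetical extent list kept in display order and in e-order."""

    def __init__(self):
        self.display = []   # keys (start, -end), sorted
        self.eorder = []    # keys (end, -start), sorted

    def locate(self, extent):
        """Operation (1): where EXTENT would go in each order."""
        start, end = extent
        return (bisect.bisect_left(self.display, (start, -end)),
                bisect.bisect_left(self.eorder, (end, -start)))

    def insert(self, extent):
        """Operation (2)."""
        start, end = extent
        bisect.insort(self.display, (start, -end))
        bisect.insort(self.eorder, (end, -start))

    def remove(self, extent):
        """Operation (3); assumes EXTENT is present."""
        i, j = self.locate(extent)
        del self.display[i]
        del self.eorder[j]
```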
10419 Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
10420 (ordered in the display order) that overlap an index @math{I}, together
10421 with the SOE's @dfn{previous} extent, which is an extent that precedes
10422 @math{I} in the e-order. (Hopefully there will not be very many extents
10423 between @math{I} and the previous extent.)
10427 Let @math{I} be an index, let @math{S} be the stack of extents on
10428 @math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
10429 be @math{S}'s previous extent.
10431 Theorem 3: The first extent in @math{S} is the first extent that overlaps
10432 any range @math{[I, J]}.
10434 Proof: Any extent that overlaps @math{[I, J]} but does not include
10435 @math{I} must have a start index @math{> I}, and thus be greater than
10436 any extent in @math{S}.
10438 Therefore, finding the first extent that overlaps a range @math{R} is
10439 the same as finding the first extent that overlaps @math{R(0)}.
10441 Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
10442 @math{F2} be the first extent that overlaps @math{I2}. Then, either
10443 @math{F2} is in @math{S} or @math{F2} is greater than any extent in
10446 Proof: If @math{F2} does not include @math{I} then its start index is
10447 greater than @math{I} and thus it is greater than any extent in
10448 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I}
10449 and thus is in @math{S}, and thus @math{F2 >= F}.
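
Theorems 3 and 4 can be checked by brute force on a small example. The helpers below are hypothetical and deliberately naive. @code{stack_of_extents} recomputes the set of extents over an index from scratch, which is exactly the work a cached SOE is meant to avoid. Extents are @code{(start, end)} tuples with closed endpoints.

```python
def display_key(extent):
    start, end = extent
    return (start, -end)


def stack_of_extents(extents, i):
    """Extents overlapping index I (closed endpoints), in display order."""
    return sorted((e for e in extents if e[0] <= i <= e[1]), key=display_key)


def first_overlapping(extents, lo, hi):
    """First extent in display order overlapping the closed range [LO, HI]."""
    hits = [e for e in extents if e[0] <= hi and lo <= e[1]]
    return min(hits, key=display_key) if hits else None


def check_theorems(extents, limit):
    """Brute-force check of Theorems 3 and 4 for indices 0..LIMIT."""
    for i in range(limit + 1):
        s = stack_of_extents(extents, i)
        if not s:
            continue
        for j in range(i, limit + 1):
            # Theorem 3: the first extent in S is first over any [I, J].
            if first_overlapping(extents, i, j) != s[0]:
                return False
        for i2 in range(i + 1, limit + 1):
            s2 = stack_of_extents(extents, i2)
            # Theorem 4: F2 is in S or greater than every extent in S.
            if s2 and not (s2[0] in s or
                           all(display_key(s2[0]) > display_key(e) for e in s)):
                return False
    return True
```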
10451 @node Extent Fragments
10452 @section Extent Fragments
10453 @cindex extent fragments
10454 @cindex fragments, extent
10456 Imagine that the buffer is divided up into contiguous, non-overlapping
10457 @dfn{runs} of text such that no extent starts or ends within a run
10458 (extents that abut the run don't count).
10460 An extent fragment is a structure that holds data about the run that
10461 contains a particular buffer position (if the buffer position is at the
10462 junction of two runs, the run after the position is used)---the
10463 beginning and end of the run, a list of all of the extents in that run,
10464 the @dfn{merged face} that results from merging all of the faces
10465 corresponding to those extents, the begin and end glyphs at the
10466 beginning of the run, etc. This is the information that redisplay needs
10467 in order to display this run.
10469 Extent fragments have to be very quick to update to a new buffer
10470 position when moving linearly through the buffer. They rely on the
10471 stack-of-extents code, which does the heavy-duty algorithmic work of
10472 determining which extents overlap a particular position.
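
A minimal sketch of how runs and the fragment lookup relate follows. It assumes extents are @code{(start, end)} tuples and runs are half-open character ranges, and it ignores faces and glyphs. @code{run_boundaries} and @code{fragment_at} are hypothetical. The real code updates a fragment incrementally using the stack of extents instead of rescanning every extent.

```python
import bisect


def run_boundaries(extents, buf_start, buf_end):
    """Split [BUF_START, BUF_END) into runs: no extent starts or ends inside one."""
    points = set([buf_start, buf_end])
    for start, end in extents:
        for p in (start, end):
            if buf_start < p < buf_end:
                points.add(p)
    return sorted(points)


def fragment_at(extents, bounds, pos):
    """Return (run_start, run_end, extents_in_run) for the run containing POS.

    At a junction of two runs the run after POS is used.  Requires
    bounds[0] <= POS < bounds[-1].
    """
    i = bisect.bisect_right(bounds, pos) - 1
    run_start, run_end = bounds[i], bounds[i + 1]
    inside = [e for e in extents if e[0] <= run_start and run_end <= e[1]]
    return run_start, run_end, inside
```

The merged face for a fragment would be computed from the extents returned in @code{inside}.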
10474 @node Faces, Glyphs, Extents, Top
10478 Not yet documented.
10480 @node Glyphs, Specifiers, Faces, Top
10484 Glyphs are graphical elements that can be displayed in SXEmacs buffers or
10485 gutters. We use the term graphical element here in the broadest possible
10486 sense since glyphs can be as mundane as text or as arcane as a native
10489 In SXEmacs, glyphs represent the uninstantiated state of graphical
10490 elements, i.e. they hold all the information necessary to produce an
10491 image on-screen but the image need not exist at this stage, and multiple
10492 screen images can be instantiated from a single glyph.
10494 @c #### find a place for this discussion
10495 @c The decision to make image specifiers a separate type is debatable.
10496 @c In fact, the design decision to create a separate image specifier
10497 @c type, rather than make glyphs themselves be specifiers, is
10498 @c debatable---the other properties of glyphs are rarely used and could
10499 @c conceivably have been incorporated into the glyph's instantiator.
10500 @c The rarely used glyph types (buffer, pointer, icon) could also have
10501 @c been incorporated into the instantiator.
10503 Glyphs are lazily instantiated by calling one of the glyph
10504 functions. This usually occurs within redisplay when
10505 @code{Fglyph_height} is called. Instantiation causes an image-instance
10506 to be created and cached. This cache is on a per-device basis for all glyphs
10507 except widget-glyphs, and on a per-window basis for widget-glyphs. The
10508 caching is done by @code{image_instantiate} and is necessary because it
10509 is generally possible to display an image-instance in multiple
10510 domains. For instance, if we create a Pixmap, we can actually display
10511 it in multiple windows, even though we only need a single Pixmap
10512 instance to do this. If caching were not done, it would be necessary
10513 to create image-instances for every displayable occurrence of a glyph,
10514 and for every usage, which would be extremely memory- and CPU-intensive.
10516 Widget-glyphs (a.k.a.@: native widgets) are not cached in this way. This is
10517 because widget-glyph image-instances on screen are toolkit windows, and
10518 thus cannot be reused in multiple SXEmacs domains. Instead, widget-glyphs
10519 are cached on a per-window basis.
10521 Any action on a glyph first consults the cache before actually
10522 instantiating a widget.
10524 @section Glyph Instantiation
10525 @cindex glyph instantiation
10526 @cindex instantiation, glyph
10528 Glyph instantiation is a hairy topic and requires some explanation. The
10529 guts of glyph instantiation are contained within
10530 @code{image_instantiate}. A glyph contains an image which is a
10531 specifier. When a glyph function, for instance @code{Fglyph_height},
10532 asks for a property of the glyph that can only be determined from its
10533 instantiated state, the glyph image is instantiated and an image
10534 instance is created. The instantiation process is governed by the specifier
10535 code and goes through a series of steps:
10539 Validation. Instantiation of image instances happens dynamically, often
10540 within the guts of redisplay. Thus it is often not feasible to catch
10541 instantiator errors at instantiation time. Instead, the instantiator is
10542 validated at the time it is added to the image specifier. Validation
10543 is implemented by @code{image_validate}, which at a simple level validates
10544 keyword-value pairs.
10546 Duplication. The specifier code by default takes a copy of the
10547 instantiator. This is reasonable for most specifiers, but in the case of
10548 widget-glyphs it can be problematic, since some of the properties in the
10549 instantiator, for instance callbacks, could cause infinite recursion
10550 in the copying process. Thus the image code defines a function,
10551 @code{image_copy_instantiator}, which will selectively copy values.
10552 This is controlled by the way that a keyword is defined, using either
10553 @code{IIFORMAT_VALID_KEYWORD} or
10554 @code{IIFORMAT_VALID_NONCOPY_KEYWORD}. Note that the image caching and
10555 redisplay code relies on instantiator copying to ensure that current and
10556 new instantiators are actually different rather than referring to the
10559 Normalization. Once the instantiator has been copied it must be
10560 converted into a form that is viable at instantiation time. This can
10561 involve no changes at all, but typically involves things like converting
10562 file names to the actual data. This step is implemented by
10563 @code{image_going_to_add} and @code{normalize_image_instantiator}.
10565 Instantiation. When an image instance is actually required for display
10566 it is instantiated using @code{image_instantiate}. This involves calling
10567 instantiate methods that are specific to the type of image being
10571 The final instantiation phase also involves a number of steps. In order
10572 to understand these we need to describe a number of concepts.
10574 An image is instantiated in a @dfn{domain}, where a domain can be any
10575 one of a device, frame, window or image-instance. The domain gives the
10576 image-instance its context and identity, and properties that affect the
10577 appearance of the image-instance may differ for the same glyph
10578 instantiated in different domains. An example is the face used to
10579 display the image-instance.
10581 Although an image is instantiated in a particular domain, the
10582 instantiation domain is not necessarily the domain in which the
10583 image-instance is cached. For example, a pixmap can be instantiated in a
10584 window but actually be cached on a per-device basis. The domain in which
10585 the image-instance is actually cached is called the
10586 @dfn{governing-domain}. A governing-domain is currently either a device
10587 or a window. Widget-glyphs and text-glyphs have a window as a
10588 governing-domain; all other image-instances have a device as the
10589 governing-domain. The governing domain for an image-instance is
10590 determined using the @code{governing_domain} image-instance method.
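
The governing-domain rule can be sketched as a cache keyed by the governing domain rather than the instantiation domain. The classes, the @code{WINDOW_GOVERNED} tuple, and @code{ImageInstanceCache} are hypothetical stand-ins. The real caching lives in @code{image_instantiate} and uses the @code{governing_domain} method.

```python
class Device:
    def __init__(self, name):
        self.name = name


class Window:
    def __init__(self, name, device):
        self.name = name
        self.device = device


# Assumed: these image types are window-governed; all others device-governed.
WINDOW_GOVERNED = ("widget", "text")


def governing_domain(image_type, window):
    """Return the domain whose cache holds instances of IMAGE_TYPE."""
    return window if image_type in WINDOW_GOVERNED else window.device


class ImageInstanceCache:
    """Hypothetical cache keyed by governing domain, not instantiation domain."""

    def __init__(self):
        self.cache = dict()
        self.instantiations = 0     # counts the expensive creation step

    def instantiate(self, image_type, instantiator, window):
        key = (governing_domain(image_type, window), image_type, instantiator)
        if key not in self.cache:
            self.instantiations += 1
            self.cache[key] = (image_type, instantiator, self.instantiations)
        return self.cache[key]
```

Instantiating the same pixmap in two windows on one device returns a single cached instance, while the same widget in two windows yields two instances.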
10592 @section Widget-Glyphs
10593 @cindex widget-glyphs
10595 @section Widget-Glyphs in the MS-Windows Environment
10596 @cindex widget-glyphs in the MS-Windows environment
10597 @cindex MS-Windows environment, widget-glyphs in the
10601 @section Widget-Glyphs in the X Environment
10602 @cindex widget-glyphs in the X environment
10603 @cindex X environment, widget-glyphs in the
10605 Widget-glyphs under X make heavy use of lwlib (@pxref{Lucid Widget
10606 Library}) for manipulating the native toolkit objects. This is primarily
10607 so that different toolkits can be supported for widget-glyphs, just as
10608 they are supported for features such as menubars.
10610 Lwlib is extremely poorly documented and quite hairy, so here is my
10611 understanding of what goes on.
10613 Lwlib maintains a set of @code{widget_instance}s which mirror the hierarchical
10614 state of Xt widgets. I think this is so that widgets can be updated and
10615 manipulated generically by the lwlib library. For instance,
10616 @code{update_one_widget_instance} can cope with multiple types of widget and
10617 multiple types of toolkit. Each element in the widget hierarchy is updated
10618 from its corresponding widget_instance by walking the widget_instance
10621 This has desirable properties, such as @code{lw_modify_all_widgets}, which is
10622 called from @file{glyphs-x.c} and updates all the properties of a widget
10623 without having to know what the widget is or what toolkit it is from.
10624 Unfortunately, this also has hairy properties, such as making the lwlib
10625 code quite complex. And of course lwlib has to know at some level what
10626 the widget is and how to set its properties.
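
A sketch of the kind of generic tree walk that lets one call reach every widget in a hierarchy. It assumes a hypothetical Python @code{WidgetValue} node with the @code{contents} (first child) and @code{next} (sibling) links described for lwlib's @code{widget_value} (@pxref{Generic Widget Interface}). It is not lwlib code.

```python
class WidgetValue:
    """Hypothetical node: CONTENTS is the first child, NEXT the next sibling."""

    def __init__(self, name, contents=None, nxt=None):
        self.name = name
        self.contents = contents
        self.next = nxt


def walk(value, fn):
    """Apply FN to VALUE, its following siblings, and all their descendants."""
    while value is not None:
        fn(value)
        walk(value.contents, fn)
        value = value.next
```

A function such as @code{lw_modify_all_widgets} can then update every node without knowing the widget's type, provided each node knows how to update itself.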
10628 @node Specifiers, Menus, Glyphs, Top
10629 @chapter Specifiers
10632 Not yet documented.
10634 @node Menus, Subprocesses, Specifiers, Top
10638 A menu is set by setting the value of the variable
10639 @code{current-menubar} (which may be buffer-local) and then calling
10640 @code{set-menubar-dirty-flag} to signal a change. This will cause the
10641 menu to be redrawn at the next redisplay. The format of the data in
10642 @code{current-menubar} is described in @file{menubar.c}.
10644 Internally the data in @code{current-menubar} is parsed into a tree of
10645 @code{widget_value}s (defined in @file{lwlib.h}); this is accomplished
10646 by the recursive function @code{menu_item_descriptor_to_widget_value()},
10647 called by @code{compute_menubar_data()}. Such a tree is deallocated
10648 using @code{free_widget_value()}.
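
A simplified sketch of the recursive conversion follows. The nested Python lists and @code{(name, callback)} tuples are an assumption standing in for the Lisp format documented in @file{menubar.c}. @code{descriptor_to_widget_value} only mimics the role of @code{menu_item_descriptor_to_widget_value()}.

```python
class WidgetValue:
    """Hypothetical widget_value node."""

    def __init__(self, name, call_data=None):
        self.name = name
        self.call_data = call_data   # callback for leaf items
        self.contents = None         # first child
        self.next = None             # next sibling


def descriptor_to_widget_value(desc):
    """Recursively convert a nested menu descriptor into a widget_value tree.

    A submenu is a list [NAME, ITEM...]; a leaf item is a tuple (NAME, CALLBACK).
    """
    if isinstance(desc, tuple):
        name, callback = desc
        return WidgetValue(name, callback)
    node = WidgetValue(desc[0])
    prev = None
    for item in desc[1:]:
        child = descriptor_to_widget_value(item)
        if prev is None:
            node.contents = child    # first child hangs off CONTENTS
        else:
            prev.next = child        # later children are chained through NEXT
        prev = child
    return node
```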
10650 @code{update_screen_menubars()} is one of the external entry points.
10651 This checks to see, for each screen, if that screen's menubar needs to
10652 be updated. This is the case if
10656 @code{set-menubar-dirty-flag} was called since the last redisplay. (This
10657 function sets the C variable menubar_has_changed.)
10659 The buffer displayed in the screen has changed.
10661 The screen has no menubar currently displayed.
10664 @code{set_screen_menubar()} is called for each such screen. This
10665 function calls @code{compute_menubar_data()} to create the tree of
10666 @code{widget_value}s, then calls @code{lw_create_widget()},
10667 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
10668 to create the X-Toolkit widget associated with the menu.
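
The per-screen test above amounts to the following check, sketched in Python with hypothetical @code{Screen} fields. @code{set_screen_menubar} is passed in as a callback so that the sketch stays self-contained, and @code{menubar_has_changed} is passed as an argument rather than read from a global.

```python
class Screen:
    """Hypothetical screen record with just the fields this check needs."""

    def __init__(self, buffer):
        self.buffer = buffer            # buffer currently displayed
        self.menubar_buffer = None      # buffer the menubar was built for
        self.has_menubar = False


def update_screen_menubars(screens, menubar_has_changed, set_screen_menubar):
    """Call SET_SCREEN_MENUBAR for each screen whose menubar needs updating."""
    updated = []
    for s in screens:
        if (menubar_has_changed
                or s.buffer != s.menubar_buffer
                or not s.has_menubar):
            set_screen_menubar(s)
            s.menubar_buffer = s.buffer
            s.has_menubar = True
            updated.append(s)
    return updated
```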
10670 @code{update_psheets()}, the other external entry point, actually
10671 changes the menus being displayed. It uses the widgets fixed by
10672 @code{update_screen_menubars()} and calls various X functions to ensure
10673 that the menus are displayed properly.
10675 The menubar widget is set up so that @code{pre_activate_callback()} is
10676 called when the menu is first selected (i.e. mouse button goes down),
10677 and @code{menubar_selection_callback()} is called when an item is
10678 selected. @code{pre_activate_callback()} calls the functions in
10679 @code{activate-menubar-hook}, which can change the menubar (this is described
10680 in @file{menubar.c}). If the menubar is changed,
10681 @code{set_screen_menubars()} is called.
10682 @code{menubar_selection_callback()} enqueues a menu event, putting in it
10683 a function to call (either @code{eval} or @code{call-interactively}) and
10684 its argument, which is the callback function or form given in the menu's
10687 @node Subprocesses, Interface to the X Window System, Menus, Top
10688 @chapter Subprocesses
10689 @cindex subprocesses
10691 The fields of a process are:
10695 A string, the name of the process.
10698 A list containing the command arguments that were used to start this
10702 A function used to accept output from the process instead of a buffer,
10706 A function called whenever the process receives a signal, or @code{nil}.
10709 The associated buffer of the process.
10712 An integer, the Unix process @sc{id}.
10715 A flag, non-@code{nil} if this is really a child process.
10716 It is @code{nil} for a network connection.
10719 A marker indicating the position of the end of the last output from this
10720 process inserted into the buffer. This is often but not always the end
10723 @item kill_without_query
10724 If this is non-@code{nil}, killing SXEmacs while this process is still
10725 running does not ask for confirmation about killing the process.
10727 @item raw_status_low
10728 @itemx raw_status_high
10729 These two fields record 16 bits each of the process status returned by
10730 the @code{wait} system call.
10733 The process status, as @code{process-status} should return it.
10737 If these two fields are not equal, a change in the status of the process
10738 needs to be reported, either by running the sentinel or by inserting a
10739 message in the process buffer.
10742 Non-@code{nil} if communication with the subprocess uses a @sc{pty};
10743 @code{nil} if it uses a pipe.
10746 The file descriptor for input from the process.
10749 The file descriptor for output to the process.
10752 The file descriptor for the terminal that the subprocess is using. (On
10753 some systems, there is no need to record this, so the value is
10757 The name of the terminal that the subprocess is using,
10758 or @code{nil} if it is using pipes.
10761 @node Interface to the X Window System, Categories, Subprocesses, Top
10762 @chapter Interface to the X Window System
10763 @cindex X Window System, interface to the
10765 Mostly undocumented.
10768 * Lucid Widget Library:: An interface to various widget sets.
10771 @node Lucid Widget Library
10772 @section Lucid Widget Library
10773 @cindex Lucid Widget Library
10774 @cindex widget library, Lucid
10775 @cindex library, Lucid Widget
10777 Lwlib is extremely poorly documented and quite hairy. The author(s)
10778 blame that on X, Xt, and Motif, with some justice, but also sufficient
10779 hypocrisy to avoid drawing the obvious conclusion about their own work.
10781 The Lucid Widget Library is composed of two more or less independent
10782 pieces. The first, as the name suggests, is a set of widgets. These
10783 widgets are intended to resemble and improve on widgets provided in the
10784 Motif toolkit but not in the Athena widgets, including menubars and
10785 scrollbars. Recent additions by Andy Piper integrate some ``modern''
10786 widgets by Edward Falk, including checkboxes, radio buttons, progress
10787 gauges, and index tab controls (aka notebooks).
10789 The second piece of the Lucid widget library is a generic interface to
10790 several toolkits for X (including Xt, the Athena widget set, and Motif,
10791 as well as the Lucid widgets themselves) so that core SXEmacs code need
10792 not know which widget set has been used to build the graphical user
10796 * Generic Widget Interface:: The lwlib generic widget interface.
10799 * Checkboxes and Radio Buttons::
@node Generic Widget Interface
@subsection Generic Widget Interface
@cindex widget interface, generic

In general, in any toolkit a widget may be a composite object. In Xt,
all widgets have an X window that they manage, but typically a complex
widget will have widget children, each of which manages a subwindow of
the parent widget's X window. These children may themselves be
composite widgets. Thus a widget is actually a tree or hierarchy of
widgets.

For each toolkit widget, lwlib maintains a tree of @code{widget_value}s
which mirrors the hierarchical state of Xt widgets (including Motif,
Athena, 3D Athena, and Falk's widget sets). Each @code{widget_value}
has a @code{contents} member, which points to the head of a linked list
of its children. The linked list of siblings is chained through the
@code{next} member of @code{widget_value}.
@example
@group
   +-----------+
   | composite |
   +-----------+
         |
         | contents
         V
     +-------+ next +-------+ next +-------+
     | child |----->| child |----->| child |
     +-------+      +-------+      +-------+
                                       |
                                       | contents
                                       V
                                +-------------+ next +-------------+
                                | grand child |----->| grand child |
                                +-------------+      +-------------+

The @code{widget_value} hierarchy of a composite widget with two simple
children and one composite child.
@end group
@end example
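A minimal C sketch of how the @code{contents} and @code{next} links are walked. The type @code{widget_value_sketch} and the function @code{count_values} are hypothetical: they model only the two links described above, whereas the real lwlib @code{widget_value} carries many more members. Descending through @code{contents} and then following each child's @code{next} chain visits every node below the starting one exactly once.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for lwlib's widget_value: only the two
   links discussed in the text are modelled. */
typedef struct widget_value_sketch {
	const char *name;
	struct widget_value_sketch *contents; /* head of the children list */
	struct widget_value_sketch *next;     /* next sibling */
} widget_value_sketch;

/* Count the nodes of the tree rooted at WV.  WV's own siblings are not
   counted: the walk descends through `contents' and then follows each
   child's `next' chain, recursing into composite children. */
static size_t
count_values(const widget_value_sketch *wv)
{
	size_t n = 1;
	const widget_value_sketch *child;

	for (child = wv->contents; child != NULL; child = child->next)
		n += count_values(child);
	return n;
}
```

Built for the hierarchy in the figure (one composite, three children, two grand children), @code{count_values} applied to the composite yields 6, and applied to the composite child yields 3.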
The @code{widget_instance} structure maintains the inverse view of the
tree. As with the @code{widget_value}, siblings are chained through the
@code{next} member. However, rather than naming children, the
@code{widget_instance} tree links to parents.
@example
@group
   +-----------+
   | composite |
   +-----------+
         ^
         | parent
         |
     +-------+ next +-------+ next +-------+
     | child |----->| child |----->| child |
     +-------+      +-------+      +-------+
                                       ^
                                       | parent
                                       |
                                +-------------+ next +-------------+
                                | grand child |----->| grand child |
                                +-------------+      +-------------+

The @code{widget_instance} hierarchy of a composite widget with two
simple children and one composite child.
@end group
@end example
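The upward view can be sketched the same way. Here @code{widget_instance_sketch}, @code{instance_root} and @code{instance_depth} are hypothetical names; the sketch assumes each node stores a @code{parent} pointer and a @code{next} sibling pointer and nothing else, which is a simplification of lwlib's @code{widget_instance}. Following @code{parent} from any node ends at the root, and counting the steps gives the node's depth.

```c
#include <stddef.h>

/* Hypothetical stand-in for lwlib's widget_instance: siblings are chained
   through `next', and each node points up to its parent instead of down
   to its children. */
typedef struct widget_instance_sketch {
	const char *name;
	struct widget_instance_sketch *parent; /* NULL at the root */
	struct widget_instance_sketch *next;   /* next sibling */
} widget_instance_sketch;

/* Follow the parent links until reaching the node that has no parent. */
static const widget_instance_sketch *
instance_root(const widget_instance_sketch *wi)
{
	while (wi->parent != NULL)
		wi = wi->parent;
	return wi;
}

/* Number of parent links between WI and the root (the root has depth 0). */
static int
instance_depth(const widget_instance_sketch *wi)
{
	int depth = 0;

	while (wi->parent != NULL) {
		wi = wi->parent;
		depth++;
	}
	return depth;
}
```

With only upward links, the children of a node cannot be enumerated from the node itself in this sketch; a downward walk needs the @code{widget_value} tree.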
This permits widgets derived from different toolkits to be updated and
manipulated generically by the lwlib library. For instance,
@code{update_one_widget_instance} can cope with multiple types of widget
and multiple types of toolkit. Each element in the widget hierarchy is
updated from its corresponding @code{widget_value} by walking the
@code{widget_value} tree. This has desirable properties. For example,
@code{lw_modify_all_widgets} is called from @file{glyphs-x.c} and
updates all the properties of a widget without having to know what the
widget is or what toolkit it is from. Unfortunately, this also has its
hairy properties; the lwlib code is quite complex. And of course lwlib
has to know at some level what the widget is and how to set its
properties.
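The generic update can be sketched as a tree walk that dispatches on a per-node toolkit tag through a table of function pointers. All names here (@code{node_sketch}, the @code{TK_} tags, @code{update_hooks}, @code{update_all}) are hypothetical, and this is not a description of how @code{update_one_widget_instance} is implemented; it only illustrates the split the text describes, where the walker knows nothing about toolkits and each hook knows everything about one.

```c
#include <stddef.h>

/* Hypothetical toolkit tags (not lwlib's own enumeration). */
enum toolkit_sketch { TK_ATHENA, TK_MOTIF, TK_LUCID, TK_COUNT };

/* Hypothetical stand-in for one widget together with the value it mirrors. */
typedef struct node_sketch {
	enum toolkit_sketch toolkit;   /* toolkit that built this widget */
	int enabled;                   /* property taken from the value */
	int shown;                     /* property as pushed to the widget */
	struct node_sketch *contents;  /* first child */
	struct node_sketch *next;      /* next sibling */
} node_sketch;

/* How many times each toolkit hook ran (for inspection only). */
static int calls[TK_COUNT];

/* Per-toolkit hooks: the only place that would know how a given toolkit
   sets a property.  Here each one just copies the flag and counts. */
static void athena_update(node_sketch *n) { n->shown = n->enabled; calls[TK_ATHENA]++; }
static void motif_update(node_sketch *n)  { n->shown = n->enabled; calls[TK_MOTIF]++; }
static void lucid_update(node_sketch *n)  { n->shown = n->enabled; calls[TK_LUCID]++; }

static void (*const update_hooks[TK_COUNT])(node_sketch *) = {
	[TK_ATHENA] = athena_update,
	[TK_MOTIF]  = motif_update,
	[TK_LUCID]  = lucid_update,
};

/* Generic walker: visits N, its descendants and its later siblings,
   dispatching on each node's toolkit tag without knowing any toolkit. */
static void
update_all(node_sketch *n)
{
	for (; n != NULL; n = n->next) {
		update_hooks[n->toolkit](n);
		update_all(n->contents);
	}
}
```

In this sketch, supporting another toolkit means adding one tag and one hook; @code{update_all} itself does not change.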

The @code{widget_instance} structure also contains a pointer to the root
of its tree. Widget instances are further confi
@subsection Scrollbars

@subsection Menubars

@node Checkboxes and Radio Buttons
@subsection Checkboxes and Radio Buttons
@cindex checkboxes and radio buttons
@cindex radio buttons, checkboxes and
@cindex buttons, checkboxes and radio

@node Progress Bars
@subsection Progress Bars
@cindex progress bars
@cindex bars, progress

@subsection Tab Controls
@cindex tab controls
@node Categories, Index, Interface to the X Window System, Top
@chapter Categories

@cindex doubly-linked lists
Doubly-linked lists (dllists) are widely used throughout the whole
source tree. While part of their API lives at the Lisp level, they are
more common at the C level, where they serve as queues, as auxiliary
links between items, or as free lists, to name just a few uses.

The head, tail and size of a dllist are protected by mutexes (when
pthreads are available), which makes operations on a dllist
thread-safe.

Dllists can link together arbitrary objects at the C level. They
provide a @code{void*} slot per item, so storing @code{Lisp_Object}s
requires a cast to @code{void*}. Nevertheless, dllists obey the normal
rules of garbage collection: they can be marked and collected. Marking
an ordinary dllist induces a traversal through all the linked objects
(called items), where each item is marked as though it were a
@code{Lisp_Object}.

So, to keep the GC's mark phase away from dllists that link arbitrary
(non-Lisp) objects, there is a special form of allocation, called
@samp{noseeum_dllist}s. Consequently, noseeum dllists have to be
tracked and freed manually when necessary.
At the moment the following functionality is provided:

@example
Lisp_Dllist *make_dllist(void);
Lisp_Dllist *noseeum_make_dllist(void);
void noseeum_free_dllist(Lisp_Dllist*);
@end example

The @code{noseeum_} functions are intended to operate independently of
the garbage collector; a list obtained from @code{noseeum_make_dllist}
has to be released explicitly with @code{noseeum_free_dllist}.
@example
void *dllist_car(Lisp_Dllist*);
void *dllist_rac(Lisp_Dllist*);
@end example

Without altering the dllist in any way, these return the element in the
head cell or the tail cell, respectively. Note that these two elements
are the only ones accessible in general: because of its slightly more
complex navigation information, a dllist is not as flexible as an
ordinary Lisp list made up of cons cells.
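A minimal, single-threaded sketch of the two peek operations follows. The layouts of @code{dllist_item_t} and @code{Lisp_Dllist} (members @code{item}, @code{prev}, @code{next}, @code{first}, @code{last}, @code{size}) are assumptions made for illustration, not the SXEmacs definitions; the mutex locking is left out, and returning @code{NULL} for an empty list is likewise an assumption.

```c
#include <stddef.h>

/* Assumed layouts for illustration only; not the SXEmacs definitions. */
typedef struct dllist_item_s dllist_item_t;
struct dllist_item_s {
	void *item;            /* the element carried by this cell */
	dllist_item_t *prev;   /* navigation information */
	dllist_item_t *next;
};

typedef struct {
	dllist_item_t *first;  /* head cell */
	dllist_item_t *last;   /* tail cell */
	size_t size;
} Lisp_Dllist;

/* Element in the head cell, or NULL for an empty list (assumption).
   The list itself is left untouched. */
void *
dllist_car(Lisp_Dllist *dl)
{
	return dl->first != NULL ? dl->first->item : NULL;
}

/* Element in the tail cell, or NULL for an empty list (assumption). */
void *
dllist_rac(Lisp_Dllist *dl)
{
	return dl->last != NULL ? dl->last->item : NULL;
}
```

Any other element can only be reached by following the @code{next} or @code{prev} links by hand.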
@example
void dllist_prepend_item(Lisp_Dllist*, dllist_item_t*);
void dllist_prepend(Lisp_Dllist*, void*);
void dllist_append_item(Lisp_Dllist*, dllist_item_t*);
void dllist_append(Lisp_Dllist*, void*);
@end example

These are modifier functions which add items or elements to the head or
the tail of the dllist, respectively. An item here is a
@code{dllist_item_t} object which carries the actual element (of type
@code{void*}) together with the navigation information. The navigation
cell passed in need not carry valid navigation information; it is set
accordingly in the body of @code{dllist_prepend_item} or
@code{dllist_append_item}. In these terms, a @code{dllist_prepend} is
rewritten to a @code{dllist_prepend_item} (and likewise for
@code{dllist_append}) after the data element (of type @code{void*}) has
been wrapped into a newly allocated navigation cell.
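The relationship between the item and element variants can be sketched as follows. The struct layouts are assumed for illustration (they are not the SXEmacs definitions), locking is omitted, and plain @code{malloc} stands in for whatever allocator the real code uses. Note how the @code{_item} functions overwrite the navigation fields of the cell they receive, and how @code{dllist_prepend} and @code{dllist_append} merely wrap the element before delegating.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed layouts for illustration only; not the SXEmacs definitions. */
typedef struct dllist_item_s dllist_item_t;
struct dllist_item_s {
	void *item;            /* the element carried by this cell */
	dllist_item_t *prev;   /* navigation information */
	dllist_item_t *next;
};

typedef struct {
	dllist_item_t *first;  /* head cell */
	dllist_item_t *last;   /* tail cell */
	size_t size;
} Lisp_Dllist;

/* Link ITEM in as the new head cell.  Whatever ITEM's navigation fields
   held on entry is overwritten here. */
void
dllist_prepend_item(Lisp_Dllist *dl, dllist_item_t *item)
{
	item->prev = NULL;
	item->next = dl->first;
	if (dl->first != NULL)
		dl->first->prev = item;
	else
		dl->last = item;       /* list was empty: ITEM is also the tail */
	dl->first = item;
	dl->size++;
}

/* Link ITEM in as the new tail cell; mirror image of the above. */
void
dllist_append_item(Lisp_Dllist *dl, dllist_item_t *item)
{
	item->next = NULL;
	item->prev = dl->last;
	if (dl->last != NULL)
		dl->last->next = item;
	else
		dl->first = item;      /* list was empty: ITEM is also the head */
	dl->last = item;
	dl->size++;
}

/* Wrap ELEMENT into a freshly allocated navigation cell, then prepend
   that cell. */
void
dllist_prepend(Lisp_Dllist *dl, void *element)
{
	dllist_item_t *cell = malloc(sizeof *cell);

	if (cell == NULL)
		abort();               /* out of memory: give up in this sketch */
	cell->item = element;
	dllist_prepend_item(dl, cell);
}

/* Same as dllist_prepend, but at the tail. */
void
dllist_append(Lisp_Dllist *dl, void *element)
{
	dllist_item_t *cell = malloc(sizeof *cell);

	if (cell == NULL)
		abort();
	cell->item = element;
	dllist_append_item(dl, cell);
}
```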
@example
inline dllist_item_t *dllist_transfer_car(Lisp_Dllist*);
void *dllist_pop_car(Lisp_Dllist*);
inline dllist_item_t *dllist_transfer_rac(Lisp_Dllist*);
void *dllist_pop_rac(Lisp_Dllist*);
@end example

These are destructive accessors and, in a way, the inverse operations of
@code{dllist_append_item}, @code{dllist_append}, etc. The
@code{dllist_transfer} form cuts off the entire head cell or tail cell
of the dllist, that is, including the navigation information, and
returns it. In contrast, the @code{dllist_pop} form essentially does
the same, but extracts the actual data element (cast to @code{void*}).
Moreover, since the navigation cell on its own is quite useless, it is
freed after the element has been extracted.
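A matching sketch of the destructive accessors, under the same kind of assumptions: hypothetical struct layouts rather than the SXEmacs ones, no locking, and navigation cells obtained from @code{malloc} so that @code{free} is the right way to release them. In this sketch an empty list makes both forms return @code{NULL}, which for the @code{dllist_pop} form cannot be told apart from a stored @code{NULL} element.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed layouts for illustration only; not the SXEmacs definitions. */
typedef struct dllist_item_s dllist_item_t;
struct dllist_item_s {
	void *item;            /* the element carried by this cell */
	dllist_item_t *prev;   /* navigation information */
	dllist_item_t *next;
};

typedef struct {
	dllist_item_t *first;  /* head cell */
	dllist_item_t *last;   /* tail cell */
	size_t size;
} Lisp_Dllist;

/* Detach the head cell, navigation information included, and hand it
   to the caller.  Returns NULL for an empty list. */
dllist_item_t *
dllist_transfer_car(Lisp_Dllist *dl)
{
	dllist_item_t *cell = dl->first;

	if (cell == NULL)
		return NULL;
	dl->first = cell->next;
	if (dl->first != NULL)
		dl->first->prev = NULL;
	else
		dl->last = NULL;       /* that was the only cell */
	dl->size--;
	cell->prev = cell->next = NULL;
	return cell;
}

/* Mirror image of dllist_transfer_car for the tail cell. */
dllist_item_t *
dllist_transfer_rac(Lisp_Dllist *dl)
{
	dllist_item_t *cell = dl->last;

	if (cell == NULL)
		return NULL;
	dl->last = cell->prev;
	if (dl->last != NULL)
		dl->last->next = NULL;
	else
		dl->first = NULL;      /* that was the only cell */
	dl->size--;
	cell->prev = cell->next = NULL;
	return cell;
}

/* Detach the head cell, free it, and return only the element it carried. */
void *
dllist_pop_car(Lisp_Dllist *dl)
{
	dllist_item_t *cell = dllist_transfer_car(dl);
	void *element;

	if (cell == NULL)
		return NULL;
	element = cell->item;
	free(cell);                /* the bare navigation cell is useless now */
	return element;
}

/* Same as dllist_pop_car, but at the tail. */
void *
dllist_pop_rac(Lisp_Dllist *dl)
{
	dllist_item_t *cell = dllist_transfer_rac(dl);
	void *element;

	if (cell == NULL)
		return NULL;
	element = cell->item;
	free(cell);
	return element;
}
```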
@example
Lisp_Dllist *copy_dllist(Lisp_Dllist*);
void dllist_map_inplace(Lisp_Object, Lisp_Object);
void dllist_map_inplace_C(void*(*)(void*), Lisp_Dllist*);
void dllist_map_C(void(*)(void*), Lisp_Dllist*);
@end example

Auxiliary stuff, to be documented later.
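The mapping functions are not documented yet. Judging only from their prototypes, a plausible reading is that @code{dllist_map_C} calls a function on every element for its side effects, while @code{dllist_map_inplace_C} replaces every element by the function's result. The sketch below implements that reading on assumed struct layouts; treat it as a guess at the semantics rather than a description of SXEmacs. The callbacks @code{count_visit}, @code{add_int} and @code{next_int} are hypothetical examples.

```c
#include <stddef.h>

/* Assumed layouts for illustration only; not the SXEmacs definitions. */
typedef struct dllist_item_s dllist_item_t;
struct dllist_item_s {
	void *item;            /* the element carried by this cell */
	dllist_item_t *prev;   /* navigation information */
	dllist_item_t *next;
};

typedef struct {
	dllist_item_t *first;  /* head cell */
	dllist_item_t *last;   /* tail cell */
	size_t size;
} Lisp_Dllist;

/* Call FN on every element, head to tail, for its side effects only. */
void
dllist_map_C(void (*fn)(void *), Lisp_Dllist *dl)
{
	dllist_item_t *cell;

	for (cell = dl->first; cell != NULL; cell = cell->next)
		fn(cell->item);
}

/* Replace every element by FN's result; the cells stay where they are. */
void
dllist_map_inplace_C(void *(*fn)(void *), Lisp_Dllist *dl)
{
	dllist_item_t *cell;

	for (cell = dl->first; cell != NULL; cell = cell->next)
		cell->item = fn(cell->item);
}

/* Hypothetical example callbacks used in the usage note below. */
static int visited;
static int sum;

static void count_visit(void *element) { (void)element; visited++; }
static void add_int(void *element) { sum += *(int *)element; }
static void *next_int(void *element) { return (int *)element + 1; }
```

For a list holding @code{&arr[0]} and @code{&arr[2]} of an @code{int} array, @code{dllist_map_inplace_C(next_int, dl)} leaves the cells pointing at @code{arr[1]} and @code{arr[3]}, and a subsequent @code{dllist_map_C(add_int, dl)} adds those two values to @code{sum}.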
@include index.texi

@c Print the tables of contents