@c This 'ignore' section fools texinfo-all-menus-update into creating
@c proper menus for this chapter.
This chapter gives an overview of @semantic{} and its goals.

With Emacs, regular expressions (and syntax tables) are the basis of
identifying components in a programming language source for purposes
such as color highlighting. This approach has proved its usefulness,
but it has limitations.

@semantic{} provides a new infrastructure that goes far beyond text
analysis based on regular expressions.

@semantic{} uses @dfn{parsers} to analyze programming language
sources. For languages that can be described using a context-free
grammar, parsers can be based on the grammar of the language.
Alternatively, they can be @dfn{external parsers} implemented by any
other means. This allows the use of a regular expression parser for
languages that are hard to describe with a grammar, or of external
programs for speed.

@semantic{} provides extensive tools to help support a new language.
An original @acronym{LL} parser and a Bison-like @acronym{LALR}
parser are included. So, for a conventional programming language, all
that the developer needs to do is write a grammar file along with
appropriate semantic rules.

@semantic{} allows a uniform representation of language components,
and provides a common @acronym{API} so that programmers can develop
applications that work for all languages. The distribution includes
a good set of tools and examples for application writers that
demonstrate the usefulness of @semantic{}.

The following diagram illustrates the benefits of using @semantic{}:

The words in all capitals are those that @semantic{} itself provides.
The others are current or future languages or applications that are not
distributed with @semantic{}.

            +---------------+    +--------+    +--------+
      C --->| C PARSER      |--->|        |    |        |
            +---------------+    |        |    |        |
            +---------------+    | COMMON |    | COMMON |<--- SPEEDBAR
   Java --->| JAVA PARSER   |--->|        |    |        |
            +---------------+    | PARSE  |    | PARSE  |<--- SENATOR
            +---------------+    |        |    |        |
 Python --->| PYTHON PARSER |--->| TREE   |    | TREE   |<--- DOCUMENT
            +---------------+    |        |    |        |
            +---------------+    | FORMAT |    | API    |<--- SEMANTICDB
 Scheme --->| SCHEME PARSER |--->|        |    |        |
            +---------------+    |        |    |        |<--- jdee
            +---------------+    |        |    |        |
Texinfo --->| TEXI. PARSER  |--->|        |    |        |<--- ecb
            +---------------+    |        |    |        |
            +---------------+    |        |    |        |<--- app. 1
Lang. A --->| A Parser      |--->|        |    |        |
            +---------------+    |        |    |        |<--- app. 2
            +---------------+    |        |    |        |
Lang. B --->| B Parser      |--->|        |    |        |<--- app. 3
            +---------------+    |        |    |        |
            +---------------+    |        |    |        |
Lang. Y --->| Y Parser      |--->|        |    |        |<--- app. ?
            +---------------+    |        |    |        |
            +---------------+    |        |    |        |<--- app. ?
Lang. Z --->| Z Parser      |--->|        |    |        |
            +---------------+    +--------+    +--------+

This is from the Overview chapter of the original semantic.texi.

Semantic is a tool primarily for the Emacs Lisp programmer.
However, it comes with ``applications'' that non-programmers might
find useful.
This chapter is mostly for the benefit of these non-programmers,
as it gives brief descriptions of basic concepts such as
grammars, parsers, compiler-compilers, parse trees, etc.

The grammar of a natural language defines rules by which valid phrases
and sentences can be composed using words, the fundamental units with
which all sentences are created.
@cindex context-free grammar
In a similar fashion, a ``context-free grammar'' defines the rules by which
programs can be composed using the fundamental units of the language,
i.e., numbers, symbols, punctuation, etc.
Context-free grammars are often specified in a well-known form called
@cindex Backus-Naur Form
Backus-Naur Form, BNF for short.
This is a systematic way of representing context-free grammars
such that programs can read files with grammars written in BNF
and generate code for a ``parser'' of that language.

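As a minimal sketch, assuming a made-up two-rule grammar, the following Python code shows how each BNF rule maps onto one function of a hand-written recursive-descent parser. A compiler-compiler would generate such a parser from the grammar instead; the grammar and the functions @code{tokenize}, @code{parse_expr}, @code{parse_term}, and @code{parse} are hypothetical illustrations and are not part of Semantic.

```python
import re

# Hypothetical toy grammar, written in BNF:
#   <expr> ::= <term> | <term> "+" <expr>
#   <term> ::= NUMBER | IDENT
# Each nonterminal becomes one function of a recursive-descent parser.

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\+))")


def tokenize(text):
    """Split TEXT into (kind, value) pairs of kind NUMBER, IDENT or PLUS."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        number, ident, _plus = m.groups()
        if number is not None:
            tokens.append(("NUMBER", int(number)))
        elif ident is not None:
            tokens.append(("IDENT", ident))
        else:
            tokens.append(("PLUS", "+"))
        pos = m.end()
    return tokens


def parse_expr(tokens, i):
    """<expr> ::= <term> | <term> "+" <expr>; returns (tree, next index)."""
    left, i = parse_term(tokens, i)
    if i < len(tokens) and tokens[i][0] == "PLUS":
        right, i = parse_expr(tokens, i + 1)
        return ("+", left, right), i
    return left, i


def parse_term(tokens, i):
    """<term> ::= NUMBER | IDENT"""
    if i < len(tokens) and tokens[i][0] in ("NUMBER", "IDENT"):
        return tokens[i][1], i + 1
    raise SyntaxError(f"expected NUMBER or IDENT at token {i}")


def parse(text):
    """Parse a whole expression and reject leftover tokens."""
    tokens = tokenize(text)
    tree, i = parse_expr(tokens, 0)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token at {i}")
    return tree
```

Because the rule for <expr> is right-recursive, @code{parse("1 + x + 2")} yields the nested tuple @code{("+", 1, ("+", "x", 2))}.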
@cindex compiler-compiler
YACC (Yet Another Compiler Compiler) is one such program that has been
part of UNIX operating systems since the 1970s.
YACC is pronounced the same as ``yak'', the long-haired ox found in Asia.
The parser generated by YACC is usually a C program.

@uref{http://www.gnu.org/software/bison/bison.html , Bison}
is also a ``compiler-compiler'' that takes BNF grammars and produces
parsers in C.
The difference between YACC and Bison is that Bison is
@cindex free software
@uref{http://www.gnu.org/philosophy/free-sw.html , free software}
and upward-compatible with YACC.
It also comes with an excellent manual.

Semantic is similar in spirit to YACC and Bison.

Semantic, however, is referred to as a @dfn{bovinator} rather than
a parser, because it is a lesser cousin of YACC and Bison.
It is lesser in that it does not perform a full parse.
Instead, it @dfn{bovinates}.
``Bovination'' refers to partial parsing, which
creates @dfn{parse trees} of only the topmost
expressions rather than parsing every nested expression.
This is sufficient for the purposes for which Semantic was designed.
Semantic is meant to be used within Emacs for providing
editor-related features such as code browsers and translators, rather
than for compiling, which requires far more complex and complete parsers.
Semantic is not designed to create full parse trees.

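The idea of bovination can be approximated with a short sketch. Assuming C-like source in which bodies are delimited by braces, the hypothetical function @code{top_level_functions} below records only top-level function definitions and skips everything nested inside them by counting braces. It is written in Python for illustration and does not reflect how Semantic itself is implemented.

```python
import re

# Matches "name(args)" at the very end of the text preceding a top-level "{".
HEADER_RE = re.compile(r"([A-Za-z_]\w*)\s*\([^()]*\)\s*$")


def top_level_functions(source):
    """Return (name, start, end) for each top-level function definition.

    Function bodies are skipped by counting braces, so nothing nested
    inside them is examined.  Braces inside strings or comments are not
    handled by this sketch.
    """
    tags = []
    depth = 0
    segment_start = 0  # start of the text between top-level blocks
    pending = None     # (name, start) of the block currently being skipped
    for i, ch in enumerate(source):
        if ch == "{":
            if depth == 0:
                # Look only at the header text before this block.
                m = HEADER_RE.search(source, segment_start, i)
                pending = (m.group(1), m.start(1)) if m else None
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                if pending:
                    tags.append((pending[0], pending[1], i + 1))
                pending = None
                segment_start = i + 1
    return tags
```

Nested blocks such as an @code{if} body are never inspected, and top-level braces that are not preceded by a function header (a @code{struct}, for instance) are skipped without producing a tag.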
One key benefit of Semantic is that it creates parse trees
(the term @dfn{bovine tree} may be more accurate)
with the same structure regardless of the type of language involved.
Higher-level applications written to work with bovine trees
will then work with any language for which a grammar is available.
For example, a code browser written today that supports C, C++, and
Java may work without any change on other languages that do not even
exist yet.
All one has to do is write the BNF specification for the new language.
The rest of the work is done by Semantic.
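To illustrate the language-independent structure, here is a sketch assuming a hypothetical @code{Tag} record with a name, a class, and start and end offsets. Two toy regexp-based ``parsers'', one for Python and one for C, produce the same kind of records, and a single @code{outline} function works on either. None of these names come from Semantic's API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One top-level construct, described the same way for every language."""
    name: str
    tag_class: str  # "function" or "variable" in this sketch
    start: int      # character offsets into the source text
    end: int


def _collect(pattern, source):
    """Turn regexp matches into Tags: group 1 = function, group 2 = variable."""
    tags = []
    for m in re.finditer(pattern, source, re.MULTILINE):
        if m.group(1):
            tags.append(Tag(m.group(1), "function", m.start(), m.end()))
        else:
            tags.append(Tag(m.group(2), "variable", m.start(), m.end()))
    return tags


def parse_python(source):
    """Toy Python 'parser': unindented `def name(` and `name = ...` lines."""
    return _collect(r"^def (\w+)\(.*$|^(\w+)\s*=.*$", source)


def parse_c(source):
    """Toy C 'parser': unindented `type name(` and `type name = ...;` lines."""
    return _collect(r"^\w+\s+(\w+)\s*\(.*$|^\w+\s+(\w+)\s*=.*;$", source)


def outline(tags):
    """A language-independent 'application': one line per tag."""
    return [f"{tag.tag_class}: {tag.name}" for tag in tags]
```

The design point is that @code{outline} never looks at the source language; adding support for another language means writing one more function that returns @code{Tag} records.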
For certain languages, it is hard if not impossible to specify the syntax
in BNF, e.g.,
@uref{http://www.texinfo.org ,texinfo}
and other document-oriented languages.
Nevertheless, Semantic provides a parser for texinfo.
Instead of a BNF grammar, texinfo files are ``parsed'' using
@ref{Regexps,regular-expressions,regular-expressions,emacs}.

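A sketch of the regular-expression approach follows, using Python's @code{re} module as a stand-in for Emacs regexps. The hypothetical function @code{texinfo_headings} finds chapter, section, and subsection commands and returns their titles, nesting levels, and line numbers, with no grammar involved.

```python
import re

# Nesting level of each Texinfo sectioning command handled by this sketch.
LEVELS = {"chapter": 1, "section": 2, "subsection": 3}
HEADING_RE = re.compile(r"^@(chapter|section|subsection)\s+(.+?)\s*$",
                        re.MULTILINE)


def texinfo_headings(source):
    """Return (title, level, line_number) for each sectioning command."""
    headings = []
    for m in HEADING_RE.finditer(source):
        # 1-based line number of the match.
        line_number = source.count("\n", 0, m.start()) + 1
        headings.append((m.group(2), LEVELS[m.group(1)], line_number))
    return headings
```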
Semantic comes with grammars for these languages:

Several tools that use Semantic to provide user-visible features
are listed in @ref{Tools}.

* Semantic Components::

@node Semantic Components
@section Semantic Components

This section gives an overview of the major components of @semantic{} and
how they interact with each other to perform its job.

The first step of parsing is to break up the input file into its
fundamental components. This step is called lexical analysis. The
output of the lexical analyzer is a list of tokens that make up the
file.

   syntax table, keywords list, and options
                       |
                       v
    input file ----> Lexer ----> token stream

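A minimal lexer sketch, assuming hand-coded character classification in place of a real syntax table and a small hypothetical keyword list. It returns a token stream of (kind, text, start, end) tuples; the token kind names are loosely modeled on common lexer categories and are not Semantic's own definitions.

```python
# Hypothetical keyword list; a real language definition supplies its own.
KEYWORDS = frozenset({"if", "else", "while", "return", "int"})


def lex(text, keywords=KEYWORDS):
    """Turn TEXT into a list of (kind, text, start, end) tokens.

    Character classes stand in for a syntax table: letters and '_' start
    symbols, digits start numbers, brackets are parens, and any other
    non-blank character is punctuation.  Whitespace is skipped.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        if ch.isalpha() or ch == "_":
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1
            kind = "keyword" if text[start:i] in keywords else "symbol"
        elif ch.isdigit():
            while i < n and text[i].isdigit():
                i += 1
            kind = "number"
        elif ch in "([{":
            i += 1
            kind = "open-paren"
        elif ch in ")]}":
            i += 1
            kind = "close-paren"
        else:
            i += 1
            kind = "punctuation"
        tokens.append((kind, text[start:i], start, i))
    return tokens
```

The @code{keywords} parameter plays the role of the keywords list in the diagram: passing a different set changes which symbols are reported as keywords without touching the lexer code.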
The next step is parsing, shown below.

    token stream ----> Parser ----> parse tree

The end result, the parse tree, is created based on the parser tables,
which are the internal representation of the language grammar used by
the parser.

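The role of parser tables can be sketched with a textbook LL(1) table for a tiny expression grammar, assuming tokens arrive as (kind, value) pairs. The table, not the code, encodes the grammar: each (nonterminal, lookahead) entry names the production to expand. This is a generic technique shown in Python for illustration, not Semantic's bovine or @acronym{LALR} parser; @code{TABLE} and @code{parse} are hypothetical.

```python
# LL(1) parse table for a tiny, hypothetical expression grammar:
#   E  -> T E'
#   E' -> "+" T E' | (empty)
#   T  -> id | num
# Key: (nonterminal, lookahead token kind) -> right-hand side to expand.
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E", "num"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "id"): ["id"],
    ("T", "num"): ["num"],
}
NONTERMINALS = {"E", "E'", "T"}


def parse(tokens, start="E"):
    """Build a parse tree from (kind, value) tokens using TABLE.

    Interior nodes are (nonterminal, children); leaves are the tokens.
    """
    tokens = list(tokens) + [("$", None)]  # "$" marks end of input
    pos = 0

    def expand(symbol):
        nonlocal pos
        kind = tokens[pos][0]
        if symbol not in NONTERMINALS:  # terminal: must match lookahead
            if kind != symbol:
                raise SyntaxError(f"expected {symbol!r}, found {kind!r}")
            pos += 1
            return tokens[pos - 1]
        rhs = TABLE.get((symbol, kind))
        if rhs is None:
            raise SyntaxError(f"no rule for {symbol} at {kind!r}")
        return (symbol, [expand(s) for s in rhs])

    tree = expand(start)
    if tokens[pos][0] != "$":
        raise SyntaxError(f"trailing input at token {pos}")
    return tree
```

Supporting a different grammar would mean replacing @code{TABLE} and @code{NONTERMINALS}; the driver function stays the same.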
The @semantic{} database caches parse trees: it automatically saves them
into files named @file{semantic.cache} and, when appropriate, loads them
from there instead of re-parsing. This saves the time it takes to parse a
file, which can be several seconds or more.

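A sketch of the caching idea, assuming a JSON file stands in for @file{semantic.cache} and that a source file is unchanged when its modification time and size are unchanged. @code{cached_parse} and its arguments are hypothetical, and this does not reproduce the format of the real cache files.

```python
import json
import os


def cached_parse(path, parse, cache_file):
    """Return parse(text of PATH), reusing a result stored in CACHE_FILE.

    The cache is keyed by the source file's absolute path; an entry is
    reused only if the file's modification time and size still match.
    PARSE must return JSON-serializable data (tuples come back as lists).
    """
    try:
        with open(cache_file) as f:
            cache = json.load(f)
    except (OSError, ValueError):  # missing or corrupt cache: start fresh
        cache = {}
    st = os.stat(path)
    key = os.path.abspath(path)
    stamp = [st.st_mtime_ns, st.st_size]
    entry = cache.get(key)
    if entry is not None and entry["stamp"] == stamp:
        return entry["tags"]  # cache hit: skip re-parsing
    with open(path) as f:
        tags = parse(f.read())
    cache[key] = {"stamp": stamp, "tags": tags}
    with open(cache_file, "w") as f:
        json.dump(cache, f)
    return tags
```

Checking modification time and size is a cheap heuristic; a stricter cache could also store a hash of the file contents.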
Finally, @semantic{} provides an @acronym{API} for the Emacs Lisp
programmer to access the information in the parse tree.
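A sketch of what such an API can look like, assuming a hypothetical tree of @code{Tag} records whose optional children hold nested tags: @code{find_by_class} collects tags of one class, and @code{tag_at} returns the innermost tag covering a buffer position. These names are illustrative only and are not the functions of Semantic's Emacs Lisp API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A node of the (hypothetical) parse tree; CHILDREN holds nested tags."""
    name: str
    tag_class: str
    start: int
    end: int
    children: tuple = ()


def find_by_class(tags, tag_class):
    """Return every tag of TAG_CLASS in TAGS, including nested ones."""
    found = []
    for tag in tags:
        if tag.tag_class == tag_class:
            found.append(tag)
        found.extend(find_by_class(tag.children, tag_class))
    return found


def tag_at(tags, pos):
    """Return the innermost tag whose [start, end) range contains POS, or None."""
    for tag in tags:
        if tag.start <= pos < tag.end:
            return tag_at(tag.children, pos) or tag
    return None
```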

@c LocalWords: API LALR