1 \input texinfo @c -*-texinfo-*-
3 @setfilename bovine.info
4 @set TITLE Bovine parser development
5 @set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
6 @settitle @value{TITLE}
8 @c *************************************************************************
10 @c *************************************************************************
12 @c Merge all indexes into a single index for now.
13 @c We can always separate them later into two or more as needed.
20 @c @footnotestyle separate
26 This manual documents Bovine parser development in Semantic
28 Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 Eric M. Ludlam
29 Copyright @copyright{} 2001, 2002, 2003, 2004 David Ponce
30 Copyright @copyright{} 2002, 2003 Richard Y. Kim
33 Permission is granted to copy, distribute and/or modify this document
34 under the terms of the GNU Free Documentation License, Version 1.1 or
35 any later version published by the Free Software Foundation; with the
36 Invariant Sections being list their titles, with the Front-Cover Texts
37 being list, and with the Back-Cover Texts being list. A copy of the
38 license is included in the section entitled ``GNU Free Documentation
46 * Semantic bovine parser development: (bovine).
54 @c @setchapternewpage odd
55 @c @setchapternewpage off
58 This file documents parser development with the bovine parser generator
59 @emph{Infrastructure for parser based text analysis in Emacs}
61 Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
67 @author by @value{AUTHOR}
68 @vskip 0pt plus 1 fill
69 Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
71 @vskip 0pt plus 1 fill
77 @include semanticheader.texi
80 @c *************************************************************************
82 @c *************************************************************************
88 The @dfn{bovine} parser is the original @semantic{} parser, an
89 implementation of an @acronym{LL} parser. It is well suited to simple
90 languages and has many conveniences that make grammar writing easy,
91 though these conveniences make it less powerful than a Bison-like @acronym{LALR}
92 parser. For more information, @inforef{top, the Wisent Parser Manual,
95 Bovine @acronym{LL} grammars are stored in files with a @file{.by}
96 extension. When compiled, the contents are converted into a file of
97 the form @file{NAME-by.el}, which is in turn byte compiled.
98 @inforef{top, Grammar Framework Manual, grammar-fw}.
101 * Starting Rules:: The starting rules for the grammar.
102 * Bovine Grammar Rules:: Rules used to parse a language
103 * Optional Lambda Expression:: Actions to take when a rule is matched
104 * Bovine Examples:: Simple Samples
105 * GNU Free Documentation License::
110 @chapter Starting Rules
112 In Bison, one and only one nonterminal is designated as the ``start''
113 symbol. In @semantic{}, one or more nonterminals can be designated as
114 ``start'' symbols. They are listed after the @code{%start}
115 keyword, separated by spaces. @inforef{start Decl, ,grammar-fw}.
117 If no @code{%start} keyword is used in a grammar, then the very first
118 nonterminal is used. Internally, the first start nonterminal is targeted by the
119 reserved symbol @code{bovine-toplevel}, so it can be found by the
122 To find locally defined variables, the local context handler needs to
123 parse the body of functional code. The @code{scopestart} declaration
124 specifies the name of a nonterminal used as the goal to parse a local
125 context, @inforef{scopestart Decl, ,grammar-fw}. Internally the
126 scopestart nonterminal is targeted by the reserved symbol
127 @code{bovine-inner-scope}, so it can be found by the parser harness.
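To make the role of the start nonterminal more concrete, here is a minimal sketch in Python (not Emacs Lisp, and not the actual @semantic{} harness) of a loop that applies a start nonterminal repeatedly until the token stream is exhausted, which is how the @code{EXPANDFULL} entry describes iteration over the starting rule. Tokens are modeled as hypothetical @code{(class, text)} pairs, @code{declaration} is a toy stand-in for a real start nonterminal, and the skip-one-token recovery on failure is an assumption made only for illustration.

@example
# Hypothetical model, not the Semantic parser harness: tokens are
# (class, text) pairs and a start nonterminal is a function returning
# (tag, remaining_tokens), with tag None when nothing matched.
def parse_all(start_rule, tokens):
    tags = []
    while tokens:
        tag, rest = start_rule(tokens)
        if tag is None:
            rest = tokens[1:]  # assumed recovery: drop one token, retry
        else:
            tags.append(tag)
        tokens = rest
    return tags


def declaration(tokens):
    # Toy start nonterminal: symbol symbol ";" yields a variable tag.
    if (len(tokens) >= 3
            and tokens[0][0] == "symbol"
            and tokens[1][0] == "symbol"
            and tokens[2] == ("punctuation", ";")):
        return (tokens[1][1], "variable"), tokens[3:]
    return None, tokens


stream = [("symbol", "int"), ("symbol", "a"), ("punctuation", ";"),
          ("number", "1"),
          ("symbol", "char"), ("symbol", "b"), ("punctuation", ";")]
print(parse_all(declaration, stream))
# [('a', 'variable'), ('b', 'variable')]
@end example

The unmatched @code{number} token is skipped only because of the assumed recovery policy; the real harness may behave differently.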
129 @node Bovine Grammar Rules
130 @chapter Bovine Grammar Rules
132 The rules are what allow the compiler to create tags from a language
133 file. Once the setup is done in the prologue, you can start writing
134 rules. @inforef{Grammar Rules, ,grammar-fw}.
137 @var{result} : @var{components1} @var{optional-semantic-action1}
138 | @var{components2} @var{optional-semantic-action2}
142 @var{result} is a nonterminal, that is, a symbol synthesized in your grammar.
143 @var{components} is a list of elements that must be matched for @var{result}
144 to be produced. @var{optional-semantic-action} is an optional sequence
145 of simplified Emacs Lisp expressions for building the parse tree.
147 In Bison, each time an element of @var{components} is found, it is
148 @dfn{shifted} onto the parser stack (the stack of matched elements).
149 When all the elements of @var{components} have been matched, they are
150 @dfn{reduced} to @var{result}. @xref{(bison)Algorithm}.
152 A particular @var{result} written into your grammar becomes
153 the parser's goal. It is designated by a @code{%start} statement
154 (@pxref{Starting Rules}). The value returned by the associated
155 @var{optional-semantic-action} is the parser's result. It should be
156 a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
159 @var{components} is made up of symbols. A symbol such as @code{FOO}
160 means that a syntactic token of class @code{FOO} must be matched.
163 * How Lexical Tokens Match::
164 * Grammar-to-Lisp Details::
165 * Order of components in rules::
168 @node How Lexical Tokens Match
169 @section How Lexical Tokens Match
171 A lexical rule must be used to define how to match a lexical token.
179 Means that @code{FOO} is a reserved language keyword, matched as such
180 by looking it up in a keyword table, @inforef{keyword Decl,
181 ,grammar-fw}. This is because @code{"foo"} will be converted to
182 @code{FOO} in the lexical analysis stage. Thus the text @samp{foo}
183 never reaches the parser as an ordinary @code{symbol} token.
185 If we specify our token in this way:
188 %token <symbol> FOO "foo"
191 then @code{FOO} will match the string @code{"foo"} explicitly, but the
192 match happens in the parser rather than at the lexical level, so the text
193 @code{"foo"} can still be matched by other regular expressions.
195 In that case, @code{FOO} is a @code{symbol}-type token. To match, a
196 @code{symbol} must first be encountered, and then it must
197 @code{string-match "foo"}.
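The two-step check described above (token class first, then the match-value regexp) can be sketched as follows. This is a hypothetical Python model, not the bovinator's code: tokens are @code{(class, text)} pairs, and Python's @code{re.search} stands in for Emacs @code{string-match}. Both search anywhere in the string, so if the match is done with @code{string-match} as stated above, an unanchored @code{"foo"} also accepts @samp{foobar}. Emacs regexps spell anchors differently from Python's @code{\A} and @code{\Z}.

@example
import re


def component_matches(token, token_class, match_value=None):
    """Return True if TOKEN satisfies a component: class, then regexp."""
    cls, text = token
    if cls != token_class:
        return False
    if match_value is None:
        return True
    # Like Emacs string-match, re.search looks anywhere in the text.
    return re.search(match_value, text) is not None


print(component_matches(("symbol", "foo"), "symbol", "foo"))          # True
print(component_matches(("symbol", "foobar"), "symbol", "foo"))       # True
print(component_matches(("symbol", "foobar"), "symbol", r"\Afoo\Z"))  # False
print(component_matches(("punctuation", "."), "symbol", "foo"))       # False
@end example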
201 Be especially careful to remember that @code{"foo"}, and more
202 generally the %token's match-value string, is a regular expression!
205 Non-symbol tokens are also allowed. For example:
208 %token <punctuation> PERIOD "[.]"
210 filename : symbol PERIOD symbol
214 @code{PERIOD} is a @code{punctuation}-type token that will explicitly
215 match one period when used in the above rule.
219 @code{symbol}, @code{punctuation}, etc., are predefined lexical token
220 types, based on the @dfn{syntax class}-character associations
224 @node Grammar-to-Lisp Details
225 @section Grammar-to-Lisp Details
227 For the bovinator, lexical token matching patterns are @emph{inlined}.
228 When the grammar-to-lisp converter encounters a lexical token
229 declaration of the form:
232 %token <@var{type}> @var{token-name} @var{match-value}
235 It substitutes every occurrence of @var{token-name} in rules with its
239 @var{type} @var{match-value}
245 %token <symbol> MOOSE "moose"
251 Will generate this pseudo-equivalent rule:
254 find_a_moose: symbol "moose" ;; invalid syntax!
258 Thus, from the bovinator's point of view, the @var{components} part of a
259 rule is made up of symbols and strings. A string in the mix means
260 that the previous symbol must satisfy the additional constraint of
261 matching it as a regular expression, as described in @ref{How Lexical Tokens Match}.
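The inlining step can be pictured with a small Python sketch. It is only a model of the substitution described above, not the grammar-to-lisp converter itself; the table @code{TOKENS} and the function @code{inline_components} are hypothetical names, and the two entries mirror the @code{MOOSE} and @code{PERIOD} declarations used in this manual.

@example
# Hypothetical model: each %token declaration maps a token name to its
# (type, match-value) pair.
TOKENS = dict(MOOSE=("symbol", "moose"), PERIOD=("punctuation", "[.]"))


def inline_components(components):
    """Replace declared token names by their type and match-value."""
    out = []
    for comp in components:
        if comp in TOKENS:
            out.extend(TOKENS[comp])  # e.g. MOOSE -> symbol "moose"
        else:
            out.append(comp)          # plain token classes stay as is
    return out


print(inline_components(["MOOSE"]))
# ['symbol', 'moose']
print(inline_components(["symbol", "PERIOD", "symbol"]))
# ['symbol', 'punctuation', '[.]', 'symbol']
@end example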
265 For the bovinator, this task was mixed into the language definition to
266 simplify implementation, though Bison's technique is more efficient.
269 @node Order of components in rules
270 @section Order of components in rules
272 If a rule has multiple alternative @var{components}, their order is important. For example,
275 headerfile : symbol PERIOD symbol
280 would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
281 The bovine parser will first attempt to match the long form, and then
282 the short form. If they were in reverse order, then the long form
283 would never be tested.
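The effect of ordering can be modeled with a hypothetical Python function that, like the bovine parser as described above, tries alternatives in order and returns the first one that matches. Tokens are @code{(class, text)} pairs and each alternative is reduced to a list of token classes; real bovine rules also carry match-values and semantic actions, which this sketch omits.

@example
def match_first(alternatives, tokens):
    """Return (alternative, rest) for the first alternative that matches."""
    for alt in alternatives:
        classes = [cls for cls, _ in tokens[:len(alt)]]
        if classes == alt:
            return alt, tokens[len(alt):]
    return None, tokens


# Tokens for foo.h
tokens = [("symbol", "foo"), ("punctuation", "."), ("symbol", "h")]
long_first = [["symbol", "punctuation", "symbol"], ["symbol"]]
short_first = [["symbol"], ["symbol", "punctuation", "symbol"]]

print(match_first(long_first, tokens)[0])   # ['symbol', 'punctuation', 'symbol']
print(match_first(short_first, tokens)[0])  # ['symbol'], long form never tried
@end example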
285 @c @xref{Default syntactic tokens}.
287 @node Optional Lambda Expression
288 @chapter Optional Lambda Expressions
290 The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
291 a bovine lambda. This lambda has special shortcuts to simplify
292 reading the semantic action definition. An @acronym{OLE} like this:
298 results in a lambda return value consisting entirely of the string
299 or object found by matching the first (zeroth) element of the match.
300 An @acronym{OLE} like this:
306 executes @code{foo} on the first argument and then splices its return
307 value into the return list, whereas:
313 executes @code{foo} and places its return value in the return list.
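The difference between placing and splicing a value can be sketched in Python as follows. This is an illustrative model only: @code{build_return} is a hypothetical helper, @code{foo} is a stand-in that happens to return a list, and Python lists play the role of Lisp lists.

@example
def build_return(parts):
    """Build a return list from (mode, value) pairs."""
    result = []
    for mode, value in parts:
        if mode == "splice":      # like ,(foo $1)
            result.extend(value)
        else:                     # like (foo $1)
            result.append(value)
    return result


def foo(x):
    # Stand-in for a user function that returns a list.
    return [x, x.upper()]


match = ["a"]  # $1 is the first element of the match
print(build_return([("stick", foo(match[0]))]))   # [['a', 'A']]
print(build_return([("splice", foo(match[0]))]))  # ['a', 'A']
@end example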
315 Here are other things that can appear inline:
319 The first object matched.
322 The first object spliced into the list (assuming it is a list from a
326 The first object matched, placed in a list, i.e., @code{( $1 )}.
329 The symbol @code{foo} (exactly as displayed).
332 A function call to foo whose result is placed in the return list.
335 A function call to foo whose result is spliced into the return list.
338 A function call to foo whose result is placed, wrapped in a list, into the return list.
340 @item (EXPAND @var{$1} @var{nonterminal} @var{depth})
341 A list starting with @code{EXPAND} performs a recursive parse on the
342 token passed to it (represented by @samp{$1} above). The
343 @dfn{semantic list} is a common token to expand, as there are often
344 interesting things in the list. The @var{nonterminal} is a symbol in
345 your table which the bovinator will start with when parsing.
346 @var{nonterminal}'s definition is the same as any other nonterminal.
347 @var{depth} should be at least @samp{1} when descending into a
350 @item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
351 Is like @code{EXPAND}, except that the parser will iterate over
352 @var{nonterminal} until there are no more matches, the same way the
353 parser iterates over the starting rule (@pxref{Starting Rules}). This
354 lets you have much simpler rules in this specific case, and also lets
355 you have positional information in the returned tokens, and error
358 @item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
359 This is used for creating an association list. Each @var{symbol} is
360 included in the list if the associated @var{value} is non-@code{nil}.
361 While the items are all listed explicitly, the created structure is an
362 association list of the form:
365 ((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
368 @item (TAG @var{name} @var{class} [@var{attributes}])
369 This creates one tag in the current buffer.
373 Is a string that represents the tag in the language.
376 Is the kind of tag being created, such as @code{function}, or
377 @code{variable}, though any symbol will work.
380 Is an optional set of labeled values such as @w{@code{:constant-flag t :parent
384 @item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
385 @itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
386 @itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
387 @itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
388 @itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
389 @itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
390 Create a tag named @var{name} whose class is, respectively,
391 @code{variable}, @code{function}, @code{type}, @code{include},
392 @code{package}, or @code{code}.
393 @inforef{Creating Tags, , semantic-appdev}, for the Lisp
394 functions these translate into.
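The filtering performed by @code{ASSOC} can be modeled with a short Python function. It is a sketch under stated assumptions, not the code @code{ASSOC} expands into: Python @code{None} stands in for Lisp @code{nil} (an empty list, which is also @code{nil} in Lisp, is not treated specially here), and each @code{(symbol . value)} cons cell is represented as a tuple.

@example
def assoc(*args):
    """Model of ASSOC: keep (symbol, value) pairs whose value is not None."""
    if len(args) % 2:
        raise ValueError("ASSOC needs symbol/value pairs")
    pairs = zip(args[0::2], args[1::2])
    # None plays the role of nil; such pairs are left out.
    return [(sym, val) for sym, val in pairs if val is not None]


print(assoc("const", True, "parent", None, "type", "int"))
# [('const', True), ('type', 'int')]
@end example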
397 If the declaration @code{%quotemode backquote} is specified, then use
398 @code{,@@} to splice a list in, and @code{,} to evaluate the expression.
399 This lets you send @code{$1} as a symbol into a list instead of having
402 @node Bovine Examples
420 which, if it matched the string @samp{"A"}, would return
426 If this rule were used like this:
429 %token <punctuation> EQUAL "="
431 assign: any-symbol EQUAL any-symbol
436 it would match @samp{"A=B"}, and return
442 The letters @samp{A} and @samp{B} come back in lists because
443 @samp{any-symbol} is a nonterminal, not an actual lexical element.
445 To get a better result with nonterminals, use @asis{,} to splice lists
449 %token <punctuation> EQUAL "="
451 assign: any-symbol EQUAL any-symbol
462 @node GNU Free Documentation License
463 @appendix GNU Free Documentation License
478 @c Following comments are for the benefit of ispell.
480 @c LocalWords: bovinator inlined