1 \input texinfo @c -*-texinfo-*-
3 @setfilename wisent.info
4 @set TITLE Wisent Parser Development
5 @set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
6 @settitle @value{TITLE}
8 @c *************************************************************************
10 @c *************************************************************************
12 @c Merge all indexes into a single index for now.
13 @c We can always separate them later into two or more as needed.
20 @c @footnotestyle separate
26 This manual documents the Wisent parser generator.
28 Copyright @copyright{} 2001, 2002, 2003, 2004, 2007 David Ponce
30 Some texts are borrowed or adapted from the manual of Bison version
31 1.35. The text in section entitled ``Understanding the automaton'' is
32 adapted from the section ``Understanding Your Parser'' in the manual
33 of Bison version 1.49.
35 Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
36 1999, 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc.
39 Permission is granted to copy, distribute and/or modify this document
40 under the terms of the GNU Free Documentation License, Version 1.1 or
41 any later version published by the Free Software Foundation; with the
42 Invariant Sections being list their titles, with the Front-Cover Texts
43 being list, and with the Back-Cover Texts being list. A copy of the
44 license is included in the section entitled ``GNU Free Documentation
52 * Semantic Wisent parser development: (wisent).
60 @c @setchapternewpage odd
61 @c @setchapternewpage off
64 This file documents Application Development with Semantic.
65 @emph{Infrastructure for parser based text analysis in Emacs}
67 Copyright @copyright{} 2001, 2002, 2003, 2004 @value{AUTHOR}
73 @author by @value{AUTHOR}
74 @vskip 0pt plus 1 fill
75 Copyright @copyright{} 2001, 2002, 2003, 2004 @value{AUTHOR}
77 @vskip 0pt plus 1 fill
83 @include semanticheader.texi
87 @c *************************************************************************
89 @c *************************************************************************
95 Wisent (the European Bison ;-) is an Emacs Lisp implementation of the
96 GNU Compiler Compiler Bison.
98 This manual describes how to use Wisent to develop grammars for
99 programming languages, and how to use grammars to parse language
100 source in Emacs buffers.
102 It also describes how Wisent is used with the @semantic{} tool set
103 described in the @ref{Top, Semantic Manual, Semantic Manual, semantic}.
110 * GNU Free Documentation License::
114 @node Wisent Overview
115 @chapter Wisent Overview
117 @dfn{Wisent} (the European Bison) is an implementation in Emacs Lisp
118 of the GNU Compiler Compiler Bison. Its code is a port of the C code
119 of GNU Bison 1.28 & 1.31.
For more details on the basic concepts for understanding Wisent, it is
worthwhile to read the @ref{Top, Bison Manual, , bison}.
124 @uref{http://www.gnu.org/manual/bison/html_node/index.html}.
127 Wisent can generate compilers compatible with the @semantic{} tool set.
128 See the @ref{Top, Semantic Manual, , semantic}.
130 It benefits from these Bison features:
134 It uses a fast but not so space-efficient encoding for the parse
135 tables, described in Corbett's PhD thesis from Berkeley:
137 @cite{Static Semantics in Compiler Error Recovery}@*
138 June 1985, Report No. UCB/CSD 85/251.
For generating the lookahead sets, Wisent uses the well-known
technique that F. DeRemer and T. Pennello described in:

@cite{Efficient Computation of LALR(1) Look-Ahead Sets}@*
October 1982, ACM TOPLAS Vol 4 No 4.
150 Wisent resolves shift/reduce conflicts using operator precedence and
154 Parser error recovery is accomplished using rules which match the
155 special token @code{error}.
158 Nevertheless there are some fundamental differences between Bison and
163 Wisent is intended to be used in Emacs. It reads and produces Emacs
164 Lisp data structures. All the additional code used in grammars is
168 Contrary to Bison, Wisent does not generate a parser which combines
169 Emacs Lisp code and grammar constructs. They exist separately.
170 Wisent reads the grammar from a Lisp data structure and then generates
grammar constructs as tables. Afterward, the derived tables can be
included and byte-compiled in separate Emacs Lisp files, and be used
at a later time by Wisent's parser engine.
176 Wisent allows multiple start nonterminals and allows a call to the
177 parsing function to be made for a particular start nonterminal. For
178 example, this is particularly useful to parse a region of an Emacs
179 buffer. @semantic{} heavily depends on the availability of this feature.
183 @chapter Wisent Grammar
185 @cindex context-free grammar
187 In order for Wisent to parse a language, it must be described by a
188 @dfn{context-free grammar}. That is a grammar specified as rules that
189 can be applied regardless of context. For more information, see
190 @ref{Language and Grammar, , , bison}, in the Bison manual.
194 The formal grammar is formulated using @dfn{terminal} and
195 @dfn{nonterminal} items. Terminals can be Emacs Lisp symbols or
196 characters, and nonterminals are symbols only.
Terminals (also known as @dfn{tokens}) represent the lexical
elements of the language, such as numbers and strings.
202 For example @samp{PLUS} can represent the operator @samp{+}.
204 Nonterminal symbols are described by rules:
208 RESULT @equiv{} COMPONENTS@dots{}
212 @samp{RESULT} is a nonterminal that this rule describes and
213 @samp{COMPONENTS} are various terminals and nonterminals that are put
214 together by this rule.
216 For example, this rule:
220 exp @equiv{} exp PLUS exp
224 Says that two groupings of type @samp{exp}, with a @samp{PLUS} token
225 in between, can be combined into a larger grouping of type @samp{exp}.
230 * Compiling a grammar::
234 @node Grammar format, Example, Wisent Grammar, Wisent Grammar
235 @comment node-name, next, previous, up
236 @section Grammar format
238 @cindex grammar format
To be accepted by Wisent, a context-free grammar must respect a
particular format. That is, it must be represented as an Emacs Lisp list
of the form:
243 @code{(@var{terminals} @var{assocs} . @var{non-terminals})}
247 Is the list of terminal symbols used in the grammar.
249 @cindex associativity
251 Specify the associativity of @var{terminals}. It is @code{nil} when
252 there is no associativity defined, or an alist of
253 @w{@code{(@var{assoc-type} . @var{assoc-value})}} elements.
255 @var{assoc-type} must be one of the @code{default-prec},
256 @code{nonassoc}, @code{left} or @code{right} symbols. When
257 @var{assoc-type} is @code{default-prec}, @var{assoc-value} must be
258 @code{nil} or @code{t} (the default). Otherwise it is a list of
259 tokens which must have been previously declared in @var{terminals}.
261 For details, see @ref{Contextual Precedence, , , bison}, in the
265 Is the list of nonterminal definitions. Each definition has the form:
267 @code{(@var{nonterm} . @var{rules})}
269 Where @var{nonterm} is the nonterminal symbol defined and
270 @var{rules} the list of rules that describe this nonterminal. Each
273 @code{(@var{components} [@var{precedence}] [@var{action}])}
279 Is a list of various terminals and nonterminals that are put together
286 (exp ((exp ?+ exp)) ;; exp: exp '+' exp
291 Says that two groupings of type @samp{exp}, with a @samp{+} token in
292 between, can be combined into a larger grouping of type @samp{exp}.
294 @cindex grammar coding conventions
295 By convention, a nonterminal symbol should be in lower case, such as
296 @samp{exp}, @samp{stmt} or @samp{declaration}. Terminal symbols
297 should be upper case to distinguish them from nonterminals: for
298 example, @samp{INTEGER}, @samp{IDENTIFIER}, @samp{IF} or
299 @samp{RETURN}. A terminal symbol that represents a particular keyword
300 in the language is conventionally the same as that keyword converted
301 to upper case. The terminal symbol @code{error} is reserved for error
304 @cindex middle-rule actions
305 Scattered among the components can be @dfn{middle-rule} actions.
306 Usually only @var{action} is provided (@pxref{action}).
308 If @var{components} in a rule is @code{nil}, it means that the rule
309 can match the empty string. For example, here is how to define a
310 comma-separated sequence of zero or more @samp{exp} groupings:
314 (expseq (nil) ;; expseq: ;; empty
315 ((expseq1)) ;; | expseq1
318 (expseq1 ((exp)) ;; expseq1: exp
319 ((expseq1 ?, exp)) ;; | expseq1 ',' exp
324 @cindex precedence level
Assign the rule the precedence of the given terminal item, overriding
the precedence that would be deduced for it, that is, the precedence of
the last terminal in it. Notice that only terminals declared in
329 @var{assocs} have a precedence level. The altered rule precedence
330 then affects how conflicts involving that rule are resolved.
332 @var{precedence} is an optional vector of one terminal item.
334 Here is how @var{precedence} solves the problem of unary minus.
335 First, declare a precedence for a fictitious terminal symbol named
336 @code{UMINUS}. There are no tokens of this type, but the symbol
337 serves to stand for its precedence:
341 ((default-prec t) ;; This is the default
347 Now the precedence of @code{UMINUS} can be used in specific rules:
351 (exp @dots{} ;; exp: @dots{}
352 ((exp ?- exp)) ;; | exp '-' exp
354 ((?- exp) [UMINUS]) ;; | '-' exp %prec UMINUS
360 If you forget to append @code{[UMINUS]} to the rule for unary minus,
361 Wisent silently assumes that minus has its usual precedence. This
362 kind of problem can be tricky to debug, since one typically discovers
363 the mistake only by testing the code.
365 Using @code{(default-prec nil)} declaration makes it easier to
366 discover this kind of problem systematically. It causes rules that
367 lack a @var{precedence} modifier to have no precedence, even if the
368 last terminal symbol mentioned in their components has a declared
371 If @code{(default-prec nil)} is in effect, you must specify
372 @var{precedence} for all rules that participate in precedence conflict
373 resolution. Then you will see any shift/reduce conflict until you
374 tell Wisent how to resolve it, either by changing your grammar or by
375 adding an explicit precedence. This will probably add declarations to
376 the grammar, but it helps to protect against incorrect rule
379 The effect of @code{(default-prec nil)} can be reversed by giving
380 @code{(default-prec t)}, which is the default.
382 For more details, see @ref{Contextual Precedence, , , bison}, in the
It is important to understand that @var{assocs} declarations define
associativity and also assign a precedence level to terminals. All
387 terminals declared in the same @code{left}, @code{right} or
388 @code{nonassoc} association get the same precedence level. The
389 precedence level is increased at each new association.
On the other hand, @var{precedence} explicitly assigns the precedence
level of the given terminal to a rule.
394 @cindex semantic actions
395 @item @anchor{action}action
396 An action is an optional Emacs Lisp function call, like this:
400 The result of an action determines the semantic value of a rule.
402 From an implementation standpoint, the function call will be embedded
403 in a lambda expression, and several useful local variables will be
409 Where @var{n} is a positive integer. Like in Bison, the value of
410 @code{$@var{n}} is the semantic value of the @var{n}th element of
411 @var{components}, starting from 1. It can be of any Lisp data
414 @vindex $region@var{n}
416 Where @var{n} is a positive integer. For each @code{$@var{n}}
417 variable defined there is a corresponding @code{$region@var{n}}
variable. Its value is a pair @code{(@var{start-pos} .
@var{end-pos})} that represents the start and end positions (in the
lexical input stream) of the @code{$@var{n}} value. It can be
@code{nil} when the component positions are not available, as is the
case for an empty string component.
426 Its value is the leftmost and rightmost positions of input data
427 matched by all @var{components} in the rule. This is a pair
428 @code{(@var{leftmost-pos} . @var{rightmost-pos})}. It can be
@code{nil} when component positions are not available.
433 This variable is initialized with the nonterminal symbol
434 (@var{nonterm}) the rule belongs to. It could be useful to improve
435 error reporting or debugging. It is also used to automatically
436 provide incremental re-parse entry points for @semantic{} tags
437 (@pxref{Wisent Semantic}).
441 The value of @code{$action} is the symbolic name of the current
442 semantic action (@pxref{Debugging actions}).
When an action is not specified, a default one is supplied:
@code{(identity $1)}. This means that the default semantic value of a
rule is the value of its first component. The exception is a rule
matching the empty string, for which the default action is to return
@code{nil}.
453 @node Example, Compiling a grammar, Grammar format, Wisent Grammar
454 @comment node-name, next, previous, up
457 @cindex grammar example
458 Here is an example to parse simple infix arithmetic expressions. See
459 @ref{Infix Calc, , , bison}, in the Bison manual for details.
467 ;; Terminal associativity & precedence
478 (format "%s %s" $1 $2))
492 (string-to-number $1))
514 In the bison-like @dfn{WY} format (@pxref{Wisent Semantic}) the
515 grammar looks like this:
521 %nonassoc '=' ;; comparison
524 %left NEG ;; negation--unary minus
525 %right '^' ;; exponentiation
532 (format "%s %s" $1 $2)
546 (string-to-number $1)
569 @node Compiling a grammar, Conflicts, Example, Wisent Grammar
570 @comment node-name, next, previous, up
571 @section Compiling a grammar
After providing a context-free grammar in a suitable format, it must
be translated into a set of tables (an @dfn{automaton}) that will be
used to derive the parser. Like Bison, Wisent only translates grammars
that are @dfn{LALR(1)}.
579 @cindex LALR(1) grammar
580 @cindex look-ahead token
581 A grammar is @acronym{LALR(1)} if it is possible to tell how to parse
582 any portion of an input string with just a single token of look-ahead:
583 the @dfn{look-ahead token}. See @ref{Language and Grammar, , ,
584 bison}, in the Bison manual for more information.
586 @cindex grammar compilation
587 Grammar translation (compilation) is achieved by the function:
589 @cindex compiling a grammar
590 @vindex wisent-single-start-flag
591 @findex wisent-compile-grammar
592 @defun wisent-compile-grammar grammar &optional start-list
593 Compile @var{grammar} and return an @acronym{LALR(1)} automaton.
595 Optional argument @var{start-list} is a list of start symbols
596 (nonterminals). If @code{nil} the first nonterminal defined in the
597 grammar is the default start symbol. If @var{start-list} contains
598 only one element, it defines the start symbol. If @var{start-list}
599 contains more than one element, all are defined as potential start
600 symbols, unless @code{wisent-single-start-flag} is non-@code{nil}. In
601 that case the first element of @var{start-list} defines the start
602 symbol and others are ignored.
604 The @acronym{LALR(1)} automaton is a vector of the form:
606 @code{[@var{actions gotos starts functions}]}
610 A state/token matrix telling the parser what to do at every state
611 based on the current look-ahead token. That is shift, reduce, accept
612 or error. See also @ref{Wisent Parsing}.
615 A state/nonterminal matrix telling the parser the next state to go to
616 after reducing with each rule.
619 An alist which maps the allowed start symbols (nonterminals) to
620 lexical tokens that will be first shifted into the parser stack.
623 An obarray of semantic action symbols. A semantic action is actually
624 an Emacs Lisp function (lambda expression).
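As a minimal sketch of the call described above, the following compiles
a small grammar and checks the shape of the result. The names
@code{my-sum-grammar} and @code{my-sum-automaton} are made up for this
illustration, and the @code{require} line assumes the library name used
by the Wisent bundled with GNU Emacs (@file{semantic/wisent/comp}); a
standalone @semantic{} installation may name the library differently.

@example
(require 'semantic/wisent/comp)   ; assumed library name

;; Hypothetical grammar: sums of numbers, using symbol terminals only.
(defvar my-sum-grammar
  '((NUM PLUS)                          ; terminals
    ((left PLUS))                       ; assocs
    (exp ((NUM) (string-to-number $1))  ; exp: NUM
         ((exp PLUS exp) (+ $1 $3))))   ;    | exp PLUS exp
  "Illustrative grammar in the Wisent list format.")

;; No START-LIST is given, so the first nonterminal defined in the
;; grammar (exp) is the start symbol.
(defvar my-sum-automaton (wisent-compile-grammar my-sum-grammar)
  "Automaton compiled from `my-sum-grammar'.")

;; The automaton is a vector with one slot each for the actions,
;; gotos, starts and functions described above.
(length my-sum-automaton)
@end example

Keeping the compiled automaton in a variable means the compilation
cost is paid once, when the defining file is loaded, and not at each
parse.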
628 @node Conflicts, , Compiling a grammar, Wisent Grammar
629 @comment node-name, next, previous, up
632 Normally, a grammar should produce an automaton where at each state
633 the parser has only one action to do (@pxref{Wisent Parsing}).
635 @cindex ambiguous grammar
In certain cases, a grammar can produce an automaton where, at some
states, more than one action is possible. Such a grammar is
@dfn{ambiguous}, and generates @dfn{conflicts}.
640 @cindex deterministic automaton
The parser can't be driven by an automaton which isn't completely
@dfn{deterministic}, that is, one which contains conflicts. The
conflicts must therefore be resolved. Wisent resolves
conflicts like Bison does.
646 @cindex grammar conflicts
647 @cindex conflicts resolution
648 There are two sorts of conflicts:
651 @cindex shift/reduce conflicts
652 @item shift/reduce conflicts
653 When either a shift or a reduction would be valid at the same state.
655 Such conflicts are resolved by choosing to shift, unless otherwise
656 directed by operator precedence declarations.
657 See @ref{Shift/Reduce , , , bison}, in the Bison manual for more
660 @cindex reduce/reduce conflicts
661 @item reduce/reduce conflicts
When two or more rules apply to the same
663 sequence of input. This usually indicates a serious error in the
666 Such conflicts are resolved by choosing to use the rule that appears
667 first in the grammar, but it is very risky to rely on this. Every
668 reduce/reduce conflict must be studied and usually eliminated. See
669 @ref{Reduce/Reduce , , , bison}, in the Bison manual for more
674 * Grammar Debugging::
675 * Understanding the automaton::
678 @node Grammar Debugging
679 @subsection Grammar debugging
681 @cindex grammar debugging
682 @cindex grammar verbose description
To help with writing a new grammar, @code{wisent-compile-grammar} can
684 produce a verbose report containing a detailed description of the
685 grammar and parser (equivalent to what Bison reports with the
686 @option{--verbose} option).
688 To enable the verbose report you can set to non-@code{nil} the
691 @vindex wisent-verbose-flag
692 @deffn Option wisent-verbose-flag
693 non-@code{nil} means to report verbose information on generated parser.
696 Or interactively use the command:
698 @findex wisent-toggle-verbose-flag
699 @deffn Command wisent-toggle-verbose-flag
700 Toggle whether to report verbose information on generated parser.
703 The verbose report is printed in the temporary buffer
704 @code{*wisent-log*} when running interactively, or in file
705 @file{wisent.output} when running in batch mode. Different
706 reports are separated from each other by a line like this:
710 *** Wisent @var{source-file} - 2002-06-27 17:33
714 where @var{source-file} is the name of the Emacs Lisp file from which
715 the grammar was read. See @ref{Understanding the automaton}, for
716 details on the verbose report.
720 To help debugging the grammar compiler itself, you can set this
721 variable to print the content of some internal data structures:
723 @vindex wisent-debug-flag
724 @defvar wisent-debug-flag
725 non-@code{nil} means enable some debug stuff.
729 @node Understanding the automaton
730 @subsection Understanding the automaton
732 @cindex understanding the automaton
This section (taken from the manual of Bison 1.49) describes how to use
734 the verbose report printed by @code{wisent-compile-grammar} to
735 understand the generated automaton, to tune or fix a grammar.
737 We will use the following example:
741 (let ((wisent-verbose-flag t)) ;; Print a verbose report!
742 (wisent-compile-grammar
743 '((NUM STR) ; %token NUM STR
745 ((left ?+ ?-) ; %left '+' '-';
746 (left ?*)) ; %left '*'
749 ((exp ?+ exp)) ; exp '+' exp
750 ((exp ?- exp)) ; | exp '-' exp
751 ((exp ?* exp)) ; | exp '*' exp
752 ((exp ?/ exp)) ; | exp '/' exp
760 'nil) ; no %start declarations
765 When evaluating the above expression, grammar compilation first issues
766 the following two clear messages:
770 Grammar contains 1 useless nonterminals and 1 useless rules
771 Grammar contains 7 shift/reduce conflicts
775 The @samp{*wisent-log*} buffer details things!
777 The first section reports conflicts that were solved using precedence
778 and/or associativity:
782 Conflict in state 7 between rule 1 and token '+' resolved as reduce.
783 Conflict in state 7 between rule 1 and token '-' resolved as reduce.
784 Conflict in state 7 between rule 1 and token '*' resolved as shift.
785 Conflict in state 8 between rule 2 and token '+' resolved as reduce.
786 Conflict in state 8 between rule 2 and token '-' resolved as reduce.
787 Conflict in state 8 between rule 2 and token '*' resolved as shift.
788 Conflict in state 9 between rule 3 and token '+' resolved as reduce.
789 Conflict in state 9 between rule 3 and token '-' resolved as reduce.
790 Conflict in state 9 between rule 3 and token '*' resolved as reduce.
The next section reports useless tokens, nonterminals and rules (note
795 that useless tokens might be used by the scanner):
799 Useless nonterminals:
804 Terminals which are not used:
815 The next section lists states that still have conflicts:
819 State 7 contains 1 shift/reduce conflict.
820 State 8 contains 1 shift/reduce conflict.
821 State 9 contains 1 shift/reduce conflict.
822 State 10 contains 4 shift/reduce conflicts.
826 The next section reproduces the grammar used:
841 And reports the uses of the symbols:
845 Terminals, with rules where they appear
857 Nonterminals, with rules where they appear
860 on left: 1 2 3 4 5, on right: 1 2 3 4
The report then details the automaton itself, describing each state
with its set of @dfn{items}, also known as @dfn{pointed rules}. Each
item is a production rule together with a point (marked by @samp{.})
that marks the position of the input cursor.
873 NUM shift, and go to state 1
879 State 0 corresponds to being at the very beginning of the parsing, in
880 the initial rule, right before the start symbol (@samp{exp}). When
881 the parser returns to this state right after having reduced a rule
882 that produced an @samp{exp}, it jumps to state 2. If there is no such
883 transition on a nonterminal symbol, and the lookahead is a @samp{NUM},
884 then this token is shifted on the parse stack, and the control flow
885 jumps to state 1. Any other lookahead triggers a parse error.
893 exp -> NUM . (rule 5)
895 $default reduce using rule 5 (exp)
899 the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead
900 (@samp{$default}), the parser will reduce it. If it was coming from
901 state 0, then, after this reduction it will return to state 0, and
902 will jump to state 2 (@samp{exp: go to state 2}).
908 exp -> exp . '+' exp (rule 1)
909 exp -> exp . '-' exp (rule 2)
910 exp -> exp . '*' exp (rule 3)
911 exp -> exp . '/' exp (rule 4)
913 $EOI shift, and go to state 11
914 '+' shift, and go to state 3
915 '-' shift, and go to state 4
916 '*' shift, and go to state 5
917 '/' shift, and go to state 6
In state 2, the automaton can only shift a symbol. For instance,
because of the item @samp{exp -> exp . '+' exp}, if the lookahead is
@samp{+}, it will be shifted on the parse stack, and the automaton
control will jump to state 3, corresponding to the item
@samp{exp -> exp '+' . exp}:
931 exp -> exp '+' . exp (rule 1)
933 NUM shift, and go to state 1
939 Since there is no default action, any other token than those listed
940 above will trigger a parse error.
942 The interpretation of states 4 to 6 is straightforward:
948 exp -> exp '-' . exp (rule 2)
950 NUM shift, and go to state 1
958 exp -> exp '*' . exp (rule 3)
960 NUM shift, and go to state 1
968 exp -> exp '/' . exp (rule 4)
970 NUM shift, and go to state 1
As was announced at the beginning of the report, @samp{State 7 contains 1
977 shift/reduce conflict.}:
983 exp -> exp . '+' exp (rule 1)
984 exp -> exp '+' exp . (rule 1)
985 exp -> exp . '-' exp (rule 2)
986 exp -> exp . '*' exp (rule 3)
987 exp -> exp . '/' exp (rule 4)
989 '*' shift, and go to state 5
990 '/' shift, and go to state 6
992 '/' [reduce using rule 1 (exp)]
993 $default reduce using rule 1 (exp)
997 Indeed, there are two actions associated to the lookahead @samp{/}:
998 either shifting (and going to state 6), or reducing rule 1. The
999 conflict means that either the grammar is ambiguous, or the parser
1000 lacks information to make the right decision. Indeed the grammar is
1001 ambiguous, as, since we did not specify the precedence of @samp{/},
1002 the sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM
1003 / NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM +
1004 NUM) / NUM}, which corresponds to reducing rule 1.
Because only a single decision can be made in @acronym{LALR(1)} parsing,
Wisent arbitrarily chose to disable the reduction, see
1008 @ref{Conflicts}. Discarded actions are reported in between square
1011 Note that all the previous states had a single possible action: either
1012 shifting the next token and going to the corresponding state, or
1013 reducing a single rule. In the other cases, i.e., when shifting
1014 @emph{and} reducing is possible or when @emph{several} reductions are
1015 possible, the lookahead is required to select the action. State 7 is
1016 one such state: if the lookahead is @samp{*} or @samp{/} then the
1017 action is shifting, otherwise the action is reducing rule 1. In other
1018 words, the first two items, corresponding to rule 1, are not eligible
1019 when the lookahead is @samp{*}, since we specified that @samp{*} has
higher precedence than @samp{+}. More generally, some items are
1021 eligible only with some set of possible lookaheads.
1023 States 8 to 10 are similar:
1029 exp -> exp . '+' exp (rule 1)
1030 exp -> exp . '-' exp (rule 2)
1031 exp -> exp '-' exp . (rule 2)
1032 exp -> exp . '*' exp (rule 3)
1033 exp -> exp . '/' exp (rule 4)
1035 '*' shift, and go to state 5
1036 '/' shift, and go to state 6
1038 '/' [reduce using rule 2 (exp)]
1039 $default reduce using rule 2 (exp)
1045 exp -> exp . '+' exp (rule 1)
1046 exp -> exp . '-' exp (rule 2)
1047 exp -> exp . '*' exp (rule 3)
1048 exp -> exp '*' exp . (rule 3)
1049 exp -> exp . '/' exp (rule 4)
1051 '/' shift, and go to state 6
1053 '/' [reduce using rule 3 (exp)]
1054 $default reduce using rule 3 (exp)
1060 exp -> exp . '+' exp (rule 1)
1061 exp -> exp . '-' exp (rule 2)
1062 exp -> exp . '*' exp (rule 3)
1063 exp -> exp . '/' exp (rule 4)
1064 exp -> exp '/' exp . (rule 4)
1066 '+' shift, and go to state 3
1067 '-' shift, and go to state 4
1068 '*' shift, and go to state 5
1069 '/' shift, and go to state 6
1071 '+' [reduce using rule 4 (exp)]
1072 '-' [reduce using rule 4 (exp)]
1073 '*' [reduce using rule 4 (exp)]
1074 '/' [reduce using rule 4 (exp)]
1075 $default reduce using rule 4 (exp)
Observe that state 10 contains conflicts due to the lack of precedence
of @samp{/} with respect to @samp{+}, @samp{-}, and @samp{*}, but also
because the associativity of @samp{/} is not specified.
1083 Finally, the state 11 (plus 12) is named the @dfn{final state}, or the
1084 @dfn{accepting state}:
1090 $EOI shift, and go to state 12
1100 The end of input is shifted @samp{$EOI shift,} and the parser exits
1101 successfully (@samp{go to state 12}, that terminates).
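Returning to the conflicts reported in states 7 to 10: they all involve
@samp{/}, which has neither precedence nor associativity in the example
grammar. One possible adjustment, shown here only as a sketch, is to
declare @samp{/} in the same @code{left} association as @samp{*}. The
variable name @code{my-arith-automaton} is made up for this
illustration, the unused @samp{STR} token is left out, and the
@code{require} line assumes the library name used by the Wisent bundled
with GNU Emacs.

@example
(require 'semantic/wisent/comp)   ; assumed library name

;; Hypothetical variant of the example grammar: '/' now shares the
;; precedence level and left associativity of '*'.
(defvar my-arith-automaton
  (wisent-compile-grammar
   '((NUM)                        ; %token NUM
     ((left ?+ ?-)                ; %left '+' '-'
      (left ?* ?/))               ; %left '*' '/'
     (exp ((exp ?+ exp))          ; exp: exp '+' exp
          ((exp ?- exp))          ;    | exp '-' exp
          ((exp ?* exp))          ;    | exp '*' exp
          ((exp ?/ exp))          ;    | exp '/' exp
          ((NUM)))))              ;    | NUM
  "Automaton for the adjusted arithmetic grammar.")
@end example

With this declaration every shift/reduce choice between two operators
can be settled by precedence or associativity, so the verbose report
should no longer list unresolved conflicts.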
1103 @node Wisent Parsing
1104 @chapter Wisent Parsing
1106 @cindex bottom-up parser
1107 @cindex shift-reduce parser
Wisent's parser is what is called a @dfn{bottom-up} or
@dfn{shift-reduce} parser, which repeatedly:
1114 That is pushes the value of the last lexical token read (the
1115 look-ahead token) into a value stack, and reads a new one.
1119 That is replaces a nonterminal by its semantic value. The values of
1120 the components which form the right hand side of a rule are popped
1121 from the value stack and reduced by the semantic action of this rule.
1122 The result is pushed back on top of value stack.
1125 The parser will stop on:
1130 When all input has been successfully parsed. The semantic value of
1131 the start nonterminal is on top of the value stack.
1133 @cindex syntax error
1135 When a syntax error (an unexpected token in input) has been detected.
1136 At this point the parser issues an error message and either stops or
1137 calls a recovery routine to try to resume parsing.
1140 @cindex table-driven parser
1141 The above elementary actions are driven by the @acronym{LALR(1)}
1142 automaton built by @code{wisent-compile-grammar} from a context-free
Wisent's parser is entered by calling the function:
1147 @findex wisent-parse
1148 @defun wisent-parse automaton lexer &optional error start
1149 Parse input using the automaton specified in @var{automaton}.
1153 Is an @acronym{LALR(1)} automaton generated by
1154 @code{wisent-compile-grammar} (@pxref{Wisent Grammar}).
1157 Is a function with no argument called by the parser to obtain the next
1158 terminal (token) in input (@pxref{Writing a lexer}).
1161 Is an optional reporting function called when a parse error occurs.
1162 It receives a message string to report. It defaults to the function
1163 @code{wisent-message} (@pxref{Report errors}).
1166 Specify the start symbol (nonterminal) used by the parser as its goal.
1167 It defaults to the start symbol defined in the grammar
1168 (@pxref{Wisent Grammar}).
The following two normal hooks allow some useful processing to be done,
respectively, before parsing starts and after the parser has terminated.
1175 @vindex wisent-pre-parse-hook
1176 @defvar wisent-pre-parse-hook
Normal hook run just before entering the @acronym{LR} parser engine.
1180 @vindex wisent-post-parse-hook
1181 @defvar wisent-post-parse-hook
Normal hook run just after the @acronym{LR} parser engine has terminated.
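The call to @code{wisent-parse} can be sketched end to end as follows.
This is only an illustration under stated assumptions: every
@code{my-calc-} name is hypothetical, the lexer simply hands out tokens
from a list instead of reading a buffer, and the @code{require} lines
assume the library names used by the Wisent bundled with GNU Emacs.

@example
(require 'semantic/wisent/comp)    ; assumed library names
(require 'semantic/wisent/wisent)

;; Hypothetical automaton for sums of numbers.
(defvar my-calc-automaton
  (wisent-compile-grammar
   '((NUM PLUS)
     ((left PLUS))
     (exp ((NUM) (string-to-number $1))
          ((exp PLUS exp) (+ $1 $3)))))
  "Automaton used by `my-calc-parse'.")

(defvar my-calc-tokens nil
  "Lexical tokens not yet handed to the parser.")

(defun my-calc-lexer ()
  "Return the next pending token, or the end-of-input token."
  (or (pop my-calc-tokens)
      (list wisent-eoi-term)))

(defun my-calc-parse (tokens)
  "Parse the list of lexical TOKENS and return the semantic value."
  (setq my-calc-tokens (copy-sequence tokens))
  ;; ERROR and START are omitted: the default error reporting
  ;; function and the default start symbol are used.
  (wisent-parse my-calc-automaton #'my-calc-lexer))

;; The token stream for the input "1 + 2".
(my-calc-parse '((NUM "1") (PLUS "+") (NUM "2")))
@end example

The value returned by @code{wisent-parse} is the semantic value of the
start nonterminal, here the value computed by the actions of
@code{exp}.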
1190 * Debugging actions::
1193 @node Writing a lexer
1194 @section What the parser must receive
1196 It is important to understand that the parser does not parse
1197 characters, but lexical tokens, and does not know anything about
1198 characters in text streams!
1200 @cindex lexical analysis
1203 Reading input data to produce lexical tokens is performed by a lexer
1204 (also called a scanner) in a lexical analysis step, before the syntax
1205 analysis step performed by the parser. The parser automatically calls
1206 the lexer when it needs the next token to parse.
1208 @cindex lexical tokens
A Wisent lexer is an Emacs Lisp function with no argument. It must
return a valid lexical token of the form:
1212 @code{(@var{token-class value} [@var{start} . @var{end}])}
1216 Is a category of lexical token identifying a terminal as specified in
1217 the grammar (@pxref{Wisent Grammar}). It can be a symbol or a character
1221 Is the value of the lexical token. It can be of any valid Emacs Lisp
Are the optional beginning and end positions of @var{value} in the
When there are no more tokens to read, the lexer must return the token
@code{(list wisent-eoi-term)} in response to each request.
1233 @vindex wisent-eoi-term
1234 @defvar wisent-eoi-term
1235 Predefined constant, End-Of-Input terminal symbol.
1238 @code{wisent-lex} is an example of a lexer that reads lexical tokens
1239 produced by a @semantic{} lexer, and translates them into lexical tokens
1240 suitable to the Wisent parser. See also @ref{Wisent Lex}.
1242 To call the lexer in a semantic action use the function
1243 @code{wisent-lexer}. See also @ref{Actions goodies}.
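To make the token format concrete, here is a sketch of a hand-written
lexer that reads the current buffer from point. It is not how
@code{wisent-lex} works; the functions @code{my-make-token} and
@code{my-buffer-lexer}, and the token classes @code{NUM} and
@code{PLUS}, are made up for this illustration. The @code{require}
line assumes the library name used by the Wisent bundled with GNU
Emacs.

@example
(require 'semantic/wisent/wisent)  ; assumed library name

(defun my-make-token (class)
  "Build a token of CLASS from the last match and move past it."
  (goto-char (match-end 0))
  ;; Result has the form (CLASS VALUE START . END).
  (cons class
        (cons (match-string 0)
              (cons (match-beginning 0) (match-end 0)))))

(defun my-buffer-lexer ()
  "Return the next lexical token found after point."
  (skip-chars-forward " \t\n")
  (cond
   ;; Nothing left: keep answering with the end-of-input token.
   ((eobp) (list wisent-eoi-term))
   ((looking-at "[0-9]+") (my-make-token 'NUM))
   ((looking-at "\\+") (my-make-token 'PLUS))
   (t (error "Unexpected character at position %d" (point)))))

;; Usage: the first two tokens of the text "12 + 3" are
;; (NUM "12" 1 . 3) and (PLUS "+" 4 . 5).
(with-temp-buffer
  (insert "12 + 3")
  (goto-char (point-min))
  (list (my-buffer-lexer) (my-buffer-lexer)))
@end example

Because the lexer takes no argument, its input position has to live
somewhere else; this sketch uses point in the current buffer.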
1245 @node Actions goodies
1246 @section Variables and macros useful in grammar actions.
1248 @vindex wisent-input
1249 @defvar wisent-input
1250 The last token read.
1251 This variable only has meaning in the scope of @code{wisent-parse}.
1254 @findex wisent-lexer
1256 Obtain the next terminal in input.
1259 @findex wisent-region
1260 @defun wisent-region &rest positions
1261 Return the start/end positions of the region including
1262 @var{positions}. Each element of @var{positions} is a pair
1263 @w{@code{(@var{start-pos} . @var{end-pos})}} or @code{nil}. The
returned value is the pair @w{@code{(@var{min-start-pos} .
@var{max-end-pos})}} or @code{nil} if no @var{positions} are
available.
1270 @section The error reporting function
1272 @cindex error reporting
1273 When the parser encounters a syntax error it calls a user-defined
1274 function. It must be an Emacs Lisp function with one argument: a
1275 string containing the message to report.
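For instance, a user-defined error reporting function could record
messages instead of displaying them. The following is only a sketch:
@code{my-parse-errors} and @code{my-error-function} are hypothetical
names, and how the function is handed to the parser is not shown here:

@example
@group
(defvar my-parse-errors nil
  "Hypothetical list of reported messages, most recent first.")

(defun my-error-function (msg)
  "Record the error message MSG instead of displaying it."
  (push msg my-parse-errors))
@end group
@end example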
1277 By default the parser uses this function to report error messages:
1279 @findex wisent-message
1280 @defun wisent-message string &rest args
1281 Print a one-line message if @code{wisent-parse-verbose-flag} is set.
Pass @var{string} and @var{args} arguments to @code{message}.
@code{wisent-message} uses the following function to print lexical tokens:
1290 @defun wisent-token-to-string token
1291 Return a printed representation of lexical token @var{token}.
1294 The general printed form of a lexical token is:
1296 @w{@code{@var{token}(@var{value})@@@var{location}}}
1299 To control the verbosity of the parser you can set to non-@code{nil}
1302 @vindex wisent-parse-verbose-flag
1303 @deffn Option wisent-parse-verbose-flag
Non-@code{nil} means to issue more messages while parsing.
1307 Or interactively use the command:
1309 @findex wisent-parse-toggle-verbose-flag
1310 @deffn Command wisent-parse-toggle-verbose-flag
1311 Toggle whether to issue more messages while parsing.
1314 When the error reporting function is entered the variable
@code{wisent-input} contains the unexpected token as returned by the lexer.
1318 The error reporting function can be called from a semantic action too
1319 using the special macro @code{wisent-error}. When called from a
1320 semantic action entered by error recovery (@pxref{Error recovery}) the
1321 value of the variable @code{wisent-recovering} is non-@code{nil}.
1323 @node Error recovery
1324 @section Error recovery
1326 @cindex error recovery
The error recovery mechanism of the Wisent parser conforms to the
1328 one Bison uses. See @ref{Error Recovery, , , bison}, in the Bison
1332 To recover from a syntax error you must write rules to recognize the
1333 special token @code{error}. This is a terminal symbol that is
1334 automatically defined and reserved for error handling.
1336 When the parser encounters a syntax error, it pops the state stack
1337 until it finds a state that allows shifting the @code{error} token.
1338 After it has been shifted, if the old look-ahead token is not
1339 acceptable to be shifted next, the parser reads tokens and discards
1340 them until it finds a token which is acceptable.
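The token-discarding step can be pictured with the following toy
sketch. It is not Wisent's implementation: it works on a plain list of
tokens, and both the function name @code{my-discard-until} and the
acceptance predicate are assumptions made for the illustration:

@example
@group
(defun my-discard-until (acceptable-p tokens)
  "Drop leading TOKENS until ACCEPTABLE-P accepts one.
Return a pair (DISCARDED . REMAINING)."
  (let (discarded)
    (while (and tokens
                (not (funcall acceptable-p (car tokens))))
      ;; The token cannot be shifted: discard it and read the next.
      (push (pop tokens) discarded))
    (cons (nreverse discarded) tokens)))
@end group
@end example

With a predicate that accepts only a @code{SEMI} token, every token up
to the next @code{SEMI} ends up in the discarded part, and parsing
would resume from that @code{SEMI}.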
1342 @cindex error recovery strategy
1343 Strategies for error recovery depend on the choice of error rules in
1344 the grammar. A simple and useful strategy is simply to skip the rest
1345 of the current statement if an error is detected:
1349 (stmnt (( error ?; )) ;; on error, skip until ';' is read
1354 It is also useful to recover to the matching close-delimiter of an
1355 opening-delimiter that has already been parsed:
1359 (primary (( ?@{ expr ?@} ))
1366 @cindex error recovery actions
1367 Note that error recovery rules may have actions, just as any other
1368 rules can. Here are some predefined hooks, variables, functions or
1369 macros, useful in such actions:
1371 @vindex wisent-nerrs
1372 @defvar wisent-nerrs
1373 The number of parse errors encountered so far.
1376 @vindex wisent-recovering
1377 @defvar wisent-recovering
Non-@code{nil} means that the parser is recovering.
1379 This variable only has meaning in the scope of @code{wisent-parse}.
1382 @findex wisent-error
1383 @defun wisent-error msg
1384 Call the user supplied error reporting function with message
1385 @var{msg} (@pxref{Report errors}).
For an example of use, see @ref{wisent-skip-token}.
1390 @findex wisent-errok
Resume generating error messages immediately for subsequent syntax errors.
The parser suppresses error messages for syntax errors that happen
shortly after the first, until three consecutive input tokens have
1397 been successfully shifted.
Calling @code{wisent-errok} in an action makes error messages resume
1400 immediately. No error messages will be suppressed if you call it in
1401 an error rule's action.
For an example of use, see @ref{wisent-skip-token}.
1406 @findex wisent-clearin
1407 @defun wisent-clearin
1408 Discard the current lookahead token.
1409 This will cause a new lexical token to be read.
1411 In an error rule's action the previous lookahead token is reanalyzed
1412 immediately. @code{wisent-clearin} may be called to clear this token.
1414 For example, suppose that on a parse error, an error handling routine
1415 is called that advances the input stream to some point where parsing
1416 should once again commence. The next symbol returned by the lexical
1417 scanner is probably correct. The previous lookahead token ought to
1418 be discarded with @code{wisent-clearin}.
For an example of use, see @ref{wisent-skip-token}.
1423 @findex wisent-abort
1425 Abort parsing and save the lookahead token.
1428 @findex wisent-set-region
1429 @defun wisent-set-region start end
1430 Change the region of text matched by the current nonterminal.
1431 @var{start} and @var{end} are respectively the beginning and end
1432 positions of the region occupied by the group of components associated
to this nonterminal. If the @var{start} or @var{end} values are not
valid positions the region is set to @code{nil}.
For an example of use, see @ref{wisent-skip-token}.
1439 @vindex wisent-discarding-token-functions
1440 @defvar wisent-discarding-token-functions
1441 List of functions to be called when discarding a lexical token.
1442 These functions receive the lexical token discarded.
When the parser encounters unexpected tokens, it can discard them,
as directed by the error recovery rules. This happens either when the
parser reads tokens until one is found that can be shifted, or when a
semantic action calls the function @code{wisent-skip-token} or
1447 @code{wisent-skip-block}.
1448 For language specific hooks, make sure you define this as a local
1451 For example, in @semantic{}, this hook is set to the function
1452 @code{wisent-collect-unmatched-syntax} to collect unmatched lexical
1453 tokens (@pxref{Useful functions}).
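As a sketch of a language specific setup, the following code adds a
function to the hook in the current buffer only. The names
@code{my-discarded-tokens}, @code{my-note-discarded-token} and
@code{my-setup-discard-hook} are hypothetical; the feature name passed
to @code{require} is the one used by the copy of Wisent bundled with
Emacs:

@example
@group
(require 'semantic/wisent/wisent)

(defvar my-discarded-tokens nil
  "Hypothetical list of discarded tokens, most recent first.")

(defun my-note-discarded-token (token)
  "Remember the discarded lexical TOKEN."
  (push token my-discarded-tokens))

(defun my-setup-discard-hook ()
  "Install `my-note-discarded-token' in the current buffer only."
  ;; The last argument makes the hook buffer-local.
  (add-hook 'wisent-discarding-token-functions
            #'my-note-discarded-token nil t))
@end group
@end example

Calling @code{my-setup-discard-hook} from a major mode setup function
keeps the collector from affecting buffers of other languages.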
1456 @findex wisent-skip-token
1457 @defun wisent-skip-token
1458 @anchor{wisent-skip-token}
1459 Skip the lookahead token in order to resume parsing.
1461 Must be used in error recovery semantic actions.
1463 It typically looks like this:
1467 (wisent-message "%s: skip %s" $action
1468 (wisent-token-to-string wisent-input))
1470 'wisent-discarding-token-functions wisent-input)
1477 @findex wisent-skip-block
1478 @defun wisent-skip-block
1479 Safely skip a block in order to resume parsing.
1481 Must be used in error recovery semantic actions.
1483 A block is data between an open-delimiter (syntax class @code{(}) and
1484 a matching close-delimiter (syntax class @code{)}):
1488 (a parenthesized block)
1489 [a block between brackets]
1490 @{a block between braces@}
1494 The following example uses @code{wisent-skip-block} to safely skip a
1495 block delimited by @samp{LBRACE} (@code{@{}) and @samp{RBRACE}
1496 (@code{@}}) tokens, when a syntax error occurs in
1497 @samp{other-components}:
1501 (block ((LBRACE other-components RBRACE))
1504 (wisent-skip-block))
1510 @node Debugging actions
1511 @section Debugging semantic actions
1513 @cindex semantic action symbols
1514 Each semantic action is represented by a symbol interned in an
1515 @dfn{obarray} that is part of the @acronym{LALR(1)} automaton
1516 (@pxref{Compiling a grammar}). @code{symbol-function} on a semantic
action symbol returns the semantic action lambda expression.
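The following sketch shows the general technique of looking up a
function through a symbol interned in a separate obarray. It does not
use a real automaton: @code{my-actions} stands in for the automaton's
obarray, the stand-in function does not have the calling convention of
a generated semantic action, and @code{my-action-function} is a
hypothetical helper. The symbol name @code{input:1} follows the naming
form described in this section. @code{obarray-make} requires Emacs 25
or later:

@example
@group
(defvar my-actions (obarray-make)
  "Hypothetical obarray standing in for the automaton's one.")

;; Register a stand-in action under the symbol name `input:1'.
(fset (intern "input:1" my-actions)
      (lambda (a b) (format "%s %s" a b)))

(defun my-action-function (name)
  "Return the function of the action symbol NAME, or nil if unknown."
  (let ((sym (intern-soft name my-actions)))
    (and sym (symbol-function sym))))
@end group
@end example

Using @code{intern-soft} for the lookup avoids creating a new symbol
when the requested name does not exist in the obarray.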
1519 A semantic action symbol name has the form
1520 @code{@var{nonterminal}:@var{index}}, where @var{nonterminal} is the
1521 name of the nonterminal symbol the action belongs to, and @var{index}
1522 is an action sequence number within the scope of @var{nonterminal}.
1523 For example, this nonterminal definition:
1528 line [@code{input:0}]
1530 (format "%s %s" $1 $2) [@code{input:1}]
1535 Will produce two semantic actions, and associated symbols:
1539 A default action that returns @code{$1}.
1542 That returns @code{(format "%s %s" $1 $2)}.
1545 @cindex debugging semantic actions
1546 Debugging uses the Lisp debugger to investigate what is happening
1547 during execution of semantic actions.
1548 Three commands are available to debug semantic actions. They receive
1552 @item The automaton that contains the semantic action.
1554 @item The semantic action symbol.
1557 @findex wisent-debug-on-entry
1558 @deffn Command wisent-debug-on-entry automaton function
1559 Request @var{automaton}'s @var{function} to invoke debugger each time it is called.
1560 @var{function} must be a semantic action symbol that exists in @var{automaton}.
1563 @findex wisent-cancel-debug-on-entry
1564 @deffn Command wisent-cancel-debug-on-entry automaton function
1565 Undo effect of @code{wisent-debug-on-entry} on @var{automaton}'s @var{function}.
1566 @var{function} must be a semantic action symbol that exists in @var{automaton}.
1569 @findex wisent-debug-show-entry
1570 @deffn Command wisent-debug-show-entry automaton function
1571 Show the source of @var{automaton}'s semantic action @var{function}.
1572 @var{function} must be a semantic action symbol that exists in @var{automaton}.
1575 @node Wisent Semantic
1576 @chapter How to use Wisent with Semantic
This chapter presents how the Wisent parser can be used to produce
1580 @dfn{tags} for the @semantic{} tool set.
1582 @semantic{} tags form a hierarchy of Emacs Lisp data structures that
1583 describes a program in a way independent of programming languages.
1584 Tags map program declarations, like functions, methods, variables,
data types, classes, includes, grammar rules, etc.
1587 @cindex WY grammar format
1588 To use the Wisent parser with @semantic{} you have to define
1589 your grammar in @dfn{WY} form, a grammar format very close
1590 to the one used by Bison.
1592 Please @inforef{top, Semantic Grammar Framework Manual, grammar-fw}
1593 for more information on @semantic{} grammars.
1600 @node Grammar styles
1601 @section Grammar styles
1603 @cindex grammar styles
1604 @semantic{} parsing heavily depends on how you wrote the grammar.
There are two main styles for writing a Wisent grammar intended to be
1606 used with the @semantic{} tool set: the @dfn{Iterative style} and the
1607 @dfn{Bison style}. Each one has pros and cons, and in certain cases
1608 it can be worth a mix of the two styles!
1614 * Start nonterminals::
1615 * Useful functions::
1618 @node Iterative style, Bison style, Grammar styles, Grammar styles
1619 @subsection Iterative style
1621 @cindex grammar iterative style
1622 The @dfn{iterative style} is the preferred style to use with @semantic{}.
1623 It relies on an iterative parser back-end mechanism which parses start
nonterminals one at a time and automagically skips unexpected lexical tokens.
1627 Compared to rule-based iterative functions (@pxref{Bison style}),
1628 iterative parsers are better in that they can handle obscure errors
Each start nonterminal must produce a @dfn{raw tag} by calling a
1633 @code{TAG}-like grammar macro with appropriate parameters. See also
1634 @ref{Start nonterminals}.
1636 @cindex expanded tag
1637 Then, each parsing iteration automatically translates a raw tag into
1638 @dfn{expanded tags}, updating the raw tag structure with internal
1639 properties and buffer related data.
1641 After parsing completes, it results in a tree of expanded tags.
1643 The following example is a snippet of the iterative style Java grammar
1644 provided in the @semantic{} distribution in the file
1645 @file{wisent-java-tags.wy}.
1650 ;; Alternate entry points
1651 ;; - Needed by partial re-parse
1652 %start formal_parameter
1654 ;; - Needed by EXPANDFULL clauses
1655 %start formal_parameters
1658 formal_parameter_list
1660 (EXPANDFULL $1 formal_parameters)
1668 | formal_parameter COMMA
1669 | formal_parameter RPAREN
1673 : formal_parameter_modifier_opt type variable_declarator_id
1674 (VARIABLE-TAG $3 $2 nil :typemodifiers $1)
1680 It shows the use of the @code{EXPANDFULL} grammar macro to parse a
1681 @samp{PAREN_BLOCK} which contains a @samp{formal_parameter_list}.
@code{EXPANDFULL} tells the parser to recursively parse @samp{formal_parameters}
inside @samp{PAREN_BLOCK}. The parser iterates until it has digested all
1684 available input data inside the @samp{PAREN_BLOCK}, trying to match
1685 any of the @samp{formal_parameters} rules:
1692 @item @samp{formal_parameter COMMA}
1694 @item @samp{formal_parameter RPAREN}
1697 At each iteration it will return a @samp{formal_parameter} raw tag,
1698 or @code{nil} to skip unwanted (single @samp{LPAREN} or @samp{RPAREN}
1699 for example) or unexpected input data. Those raw tags will be
1700 automatically expanded by the iterative back-end parser.
1703 @subsection Bison style
1705 @cindex grammar bison style
1706 What we call the @dfn{Bison style} is the traditional style of Bison's
grammars. Compared to iterative style, it is not straightforward to
use grammars written in Bison style in @semantic{}, mainly because such
1709 grammars are designed to parse the whole input data in one pass, and
1710 don't use the iterative parser back-end mechanism (@pxref{Iterative
1711 style}). With Bison style the parser is called once to parse the
1712 grammar start nonterminal.
1714 The following example is a snippet of the Bison style Java grammar
1715 provided in the @semantic{} distribution in the file
1716 @file{wisent-java.wy}.
1720 %start formal_parameter
1723 formal_parameter_list
1724 : formal_parameter_list COMMA formal_parameter
1731 : formal_parameter_modifier_opt type variable_declarator_id
1733 (VARIABLE-TAG $3 $2 :typemodifiers $1)
1739 The first consequence is that syntax errors are not automatically
1740 handled by @semantic{}. Thus, it is necessary to explicitly handle
1741 them at the grammar level, providing error recovery rules to skip
1742 unexpected input data.
1744 The second consequence is that the iterative parser can't do automatic
1745 tag expansion, except for the start nonterminal value. It is
1746 necessary to explicitly expand tags from concerned semantic actions by
1747 calling the grammar macro @code{EXPANDTAG} with a raw tag as
1748 parameter. See also @ref{Start nonterminals}, for incremental
1749 re-parse considerations.
1752 @subsection Mixed style
1754 @cindex grammar mixed style
1759 %start prologue epilogue declaration nonterminal rule
1774 SYMBOL COLON rules SEMI
1775 (TAG $1 'nonterminal :children $3)
1780 (apply 'nconc (nreverse $1))
1793 name type comps prec action elt)
1796 (TAG name 'rule :type type :value comps :prec prec :expr action)
1802 This example shows how iterative and Bison styles can be combined in
1803 the same grammar to obtain a good compromise between grammar
1804 complexity and an efficient parsing strategy in an interactive
1807 @samp{nonterminal} is parsed using iterative style via the main
1808 @samp{grammar} rule. The semantic action uses the @code{TAG} macro to
1809 produce a raw tag, automagically expanded by @semantic{}.
But the @samp{rules} part is parsed in Bison style! Why?
1813 Rule delimiters are the colon (@code{:}), that follows the nonterminal
1814 name, and a final semicolon (@code{;}). Unfortunately these
delimiters are not of the @code{open-paren}/@code{close-paren} type, and
the Emacs syntactic analyzer can't easily isolate data between them to
1817 produce a @samp{RULES_PART} parenthesis-block-like lexical token.
1818 Consequently it is not possible to use @code{EXPANDFULL} to iterate in
1819 @samp{RULES_PART}, like this:
1824 SYMBOL COLON rules SEMI
1825 (TAG $1 'nonterminal :children $3)
1829 RULES_PART ;; @strong{Map a parenthesis-block-like lexical token}
1830 (EXPANDFULL $1 'rules)
1843 name type comps prec action elt)
1845 (TAG name 'rule :type type :value comps :prec prec :expr action)
1851 In such cases, when it is difficult for Emacs to obtain
1852 parenthesis-block-like lexical tokens, the best solution is to use the
1853 traditional Bison style with error recovery!
1855 In some extreme cases, it can also be convenient to extend the lexer,
1856 to deliver new lexical tokens, to simplify the grammar.
1858 @node Start nonterminals
1859 @subsection Start nonterminals
1861 @cindex start nonterminals
1862 @cindex @code{reparse-symbol} property
1863 When you write a grammar for @semantic{}, it is important to carefully
1864 indicate the start nonterminals. Each one defines an entry point in
1865 the grammar, and after parsing its semantic value is returned to the
1866 back-end iterative engine. Consequently:
@strong{The semantic value of a start nonterminal must be produced
by a @code{TAG}-like grammar macro}.
1871 Start nonterminals are declared by @code{%start} statements. When
1872 nothing is specified the first nonterminal that appears in the grammar
1873 is the start nonterminal.
1875 Generally, the following nonterminals must be declared as start
1879 @item The main grammar entry point
1884 @item nonterminals passed to @code{EXPAND}/@code{EXPANDFULL}
1886 These grammar macros recursively parse a part of input data, based on
1887 rules of the given nonterminal.
1889 For example, the following will parse @samp{PAREN_BLOCK} data using
1890 the @samp{formal_parameters} rules:
1894 formal_parameter_list
1896 (EXPANDFULL $1 formal_parameters)
1901 The semantic value of @samp{formal_parameters} becomes the value of
1902 the @code{EXPANDFULL} expression. It is a list of @semantic{} tags
1903 spliced in the tags tree.
1905 Because the automaton must know that @samp{formal_parameters} is a
1906 start symbol, you must declare it like this:
1910 %start formal_parameters
1916 @cindex incremental re-parse
1917 @cindex reparse-symbol
The @code{EXPANDFULL} macro has an important side effect,
1919 related to the incremental re-parse mechanism of @semantic{}: the
1920 nonterminal symbol parameter passed to @code{EXPANDFULL} also becomes
1921 the @code{reparse-symbol} property of the tag returned by the
1922 @code{EXPANDFULL} expression.
When the buffer data mapped by a tag is modified, @semantic{}
1925 schedules an incremental re-parse of that data, using the tag's
1926 @code{reparse-symbol} property as start nonterminal.
1928 @strong{The rules associated to such start symbols must be carefully
1929 reviewed to ensure that the incremental parser will work!}
Things are a little bit different when the grammar is written in Bison style.
1934 @strong{The @code{reparse-symbol} property is set to the nonterminal
1935 symbol the rule that explicitly uses @code{EXPANDTAG} belongs to.}
1944 name type comps prec action elt)
1947 (TAG name 'rule :type type :value comps :prec prec :expr action)
This sets the @code{reparse-symbol} property of the expanded tag to
@samp{rule}. An important consequence is that:
1956 @strong{Every nonterminal having any rule that calls @code{EXPANDTAG}
1957 in a semantic action, should be declared as a start symbol!}
1959 @node Useful functions
1960 @subsection Useful functions
1962 Here is a description of some predefined functions it might be useful
1963 to know when writing new code to use Wisent in @semantic{}:
1965 @findex wisent-collect-unmatched-syntax
1966 @defun wisent-collect-unmatched-syntax input
1967 Add @var{input} lexical token to the cache of unmatched tokens, in
1968 variable @code{semantic-unmatched-syntax-cache}.
1970 See implementation of the function @code{wisent-skip-token} in
1971 @ref{Error recovery}, for an example of use.
1975 @section The Wisent Lex lexer
1977 @findex semantic-lex
1978 The lexical analysis step of @semantic{} is performed by the general
1979 function @code{semantic-lex}. For more information, @inforef{Writing
1980 Lexers, ,semantic-langdev}.
1982 @code{semantic-lex} produces lexical tokens of the form:
1986 @code{(@var{token-class start} . @var{end})}
1992 Is a symbol that identifies a lexical token class, like @code{symbol},
1993 @code{string}, @code{number}, or @code{PAREN_BLOCK}.
1997 Are the start and end positions of mapped data in the input buffer.
The Wisent parser doesn't depend on the nature of the analyzed input
2001 stream (buffer, string, etc.), and requires that lexical tokens have a
2002 different form (@pxref{Writing a lexer}):
2006 @code{(@var{token-class value} [@var{start} . @var{end}])}
2010 @cindex lexical token mapping
@code{wisent-lex} is the default Wisent lexer used in @semantic{}.
2013 @vindex wisent-lex-istream
2016 Return the next available lexical token in Wisent's form.
2018 The variable @code{wisent-lex-istream} contains the list of lexical
2019 tokens produced by @code{semantic-lex}. Pop the next token available
and convert it to a form suitable for the Wisent parser.
2023 Mapping of lexical tokens as produced by @code{semantic-lex} into
2024 equivalent Wisent lexical tokens is straightforward:
2028 (@var{token-class start} . @var{end})
2029 @result{} (@var{token-class value start} . @var{end})
@var{value} is the input @code{buffer-substring} from @var{start} to
@var{end}.
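The mapping can be sketched as follows. This is an illustration, not
the code of @code{wisent-lex}: the function name
@code{my-semantic-token-to-wisent} is hypothetical, and the sketch
assumes that stripping text properties from the value with
@code{buffer-substring-no-properties} is acceptable:

@example
@group
(defun my-semantic-token-to-wisent (token)
  "Convert TOKEN from (CLASS START . END) to (CLASS VALUE START . END).
VALUE is the text between START and END in the current buffer."
  (let ((class (car token))
        (start (cadr token))
        (end (cddr token)))
    (cons class
          (cons (buffer-substring-no-properties start end)
                (cons start end)))))
@end group
@end example

For instance, in a buffer that starts with the text @samp{foo bar},
the token @code{(symbol 1 . 4)} is converted to
@code{(symbol "foo" 1 . 4)}.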
2036 @node GNU Free Documentation License
2037 @appendix GNU Free Documentation License
2052 @c Following comments are for the benefit of ispell.
2054 @c LocalWords: Wisent automagically wisent Wisent's LALR obarray