\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename bovine.info
@set TITLE  Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
This manual documents Bovine parser development in Semantic

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 Eric M. Ludlam
Copyright @copyright{} 2001, 2002, 2003, 2004 David Ponce
Copyright @copyright{} 2002, 2003 Richard Y. Kim

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being list their titles, with the Front-Cover Texts
being list, and with the Back-Cover Texts being list.  A copy of the
license is included in the section entitled ``GNU Free Documentation
License''.
@end quotation
@end copying

@ifinfo
@dircategory Emacs
@direntry
* Semantic bovine parser development: (bovine).
@end direntry
@end ifinfo

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@ifinfo
This file documents parser development with the bovine parser generator
@emph{Infrastructure for parser based text analysis in Emacs}

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@end ifinfo

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@vskip 0pt plus 1 fill
Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@c MACRO inclusion
@include semanticheader.texi


@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser.  It is good for simple
languages.  It has many conveniences that make grammar writing easy,
but these conveniences also make it less powerful than a Bison-like
@acronym{LALR} parser.  For more information, @inforef{top, the Wisent
Parser Manual, wisent}.

Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension.  When compiled, the contents are converted into a file of
the form @file{NAME-by.el}.  This file is, in turn, byte-compiled.
@inforef{top, Grammar Framework Manual, grammar-fw}.

@menu
* Starting Rules::              The starting rules for the grammar.
* Bovine Grammar Rules::        Rules used to parse a language
* Optional Lambda Expression::  Actions to take when a rule is matched
* Bovine Examples::             Simple Samples
* GNU Free Documentation License::
* Index::
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start''
symbol.  In @semantic{}, one or more nonterminals can be designated as
``start'' symbols.  They are declared following the @code{%start}
keyword, separated by spaces.  @inforef{start Decl, ,grammar-fw}.

If no @code{%start} keyword is used in a grammar, then the very first
nonterminal in the grammar is used.  Internally the first start
nonterminal is targeted by the reserved symbol @code{bovine-toplevel},
so it can be found by the parser harness.

To find locally defined variables, the local context handler needs to
parse the body of functional code.  The @code{%scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context, @inforef{scopestart Decl, ,grammar-fw}.  Internally the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.

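As a minimal sketch of these declarations, a grammar prologue might
look like the following.  The nonterminal names @code{declaration},
@code{statement}, and @code{codeblock} are hypothetical placeholders,
not names required by @semantic{}.

@example
;; Two start nonterminals, separated by spaces.
%start declaration statement

;; Goal used when parsing the body of functional code to find
;; locally defined variables.
%scopestart codeblock
@end example

In this sketch @code{declaration}, being the first start nonterminal,
would be the one targeted by @code{bovine-toplevel}, and
@code{codeblock} the one targeted by @code{bovine-inner-scope}.
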
@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language
file.  Once the setup is done in the prologue, you can start writing
rules.  @inforef{Grammar Rules, ,grammar-fw}.

@example
@var{result} : @var{components1} @var{optional-semantic-action1}
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is, a symbol synthesized in your grammar.
@var{components} is a list of elements that are to be matched if @var{result}
is to be made.  @var{optional-semantic-action} is an optional sequence
of simplified Emacs Lisp expressions for concocting the parse tree.

In Bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack (the stack of matched elements).
When all @var{components}' elements have been matched, it is
@dfn{reduced} to @var{result}.  @xref{(bison)Algorithm}.

A particular @var{result} written into your grammar becomes
the parser's goal.  It is designated by a @code{%start} statement
(@pxref{Starting Rules}).  The value returned by the associated
@var{optional-semantic-action} is the parser's result.  It should be
a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
semantic-appdev}.

@var{components} is made up of symbols.  A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.

@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

This means that @code{FOO} is a reserved language keyword, matched as
such by looking it up in a keyword table, @inforef{keyword Decl,
,grammar-fw}.  This is because @code{"foo"} will be converted to
@code{FOO} in the lexical analysis stage.  Thus the symbol @code{FOO}
won't be available any other way.

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token.  To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the @code{%token}'s match-value string, is a regular
expression!
@end table

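Because the match value is applied with @code{string-match}, it is not
anchored unless the pattern says so.  The following Emacs Lisp forms
illustrate the difference on plain strings; the anchored pattern is
only a suggestion for cases where an exact match is wanted.

@example
(string-match "foo" "foo")            @result{} 0
(string-match "foo" "foobar")         @result{} 0
(string-match "\\`foo\\'" "foobar")   @result{} nil
(string-match "\\`foo\\'" "foo")      @result{} 0
@end example

Assuming the token text is tested the same way, a declaration such as
this one would accept only the symbol @samp{foo}:

@example
%token <symbol> FOO "\\`foo\\'"
@end example
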
Non-symbol tokens are also allowed.  For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-lisp converter encounters a lexical token
declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

it replaces every occurrence of @var{token-name} in rules with its
expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

will generate this pseudo-equivalent rule:

@example
find_a_moose: symbol "moose"   ;; invalid syntax!
            ;
@end example

Thus, from the bovinator's point of view, the @var{components} part of
a rule is made up of symbols and strings.  A string in the mix means
that the previous symbol must have the additional constraint of
exactly matching it, as described in @ref{How Lexical Tokens Match}.

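To illustrate with a second token type, and assuming the same
substitution applies to every token occurrence in a rule, the
@code{PERIOD} token from @ref{How Lexical Tokens Match}:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

would be treated like this pseudo-rule, in which the string constrains
the @code{punctuation} symbol that precedes it:

@example
filename : symbol punctuation "[.]" symbol   ;; invalid syntax!
         ;
@end example
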
@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table

@node Order of components in rules
@section Order of components in rules

If a rule has multiple components, order is important.  For example:

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form.  If they were in reverse order, then the long form
would never be tested.

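For comparison, here is a sketch of the reverse ordering described
above.  It reuses the same @code{headerfile} rule and is shown only to
make the pitfall concrete.

@example
;; Short form first: this alternative already succeeds on the
;; leading symbol, so the long form is never tested.
headerfile : symbol
           | symbol PERIOD symbol
           ;
@end example

With this ordering, the input @samp{foo.h} would presumably be matched
only as far as @samp{foo}, leaving @samp{.h} unconsumed by the rule.
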
@c @xref{Default syntactic tokens}.

@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda.  This lambda has special short-cuts to simplify
reading the semantic action definition.  An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda whose return value consists entirely of the string
or object found by matching the first (zeroth) element of the match.
An @acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

executes @code{foo} on the first argument, and then splices its return
value into the return list, whereas:

@example
( (foo $1) )
@end example

executes @code{foo}, and its return value is placed in the return list
as a single element.

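As a worked sketch of the difference, suppose a hypothetical function
@code{foo} returns the list @code{("a" "b")} for the matched element.
Following the description above, the two forms would then be expected
to produce:

@example
( ,(foo $1) )   @result{} ( "a" "b" )      ; return value spliced in
( (foo $1) )    @result{} ( ("a" "b") )    ; return value kept as one element
@end example
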
Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a
non-terminal).

@item '$1
The first object matched, placed in a list, i.e., @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A function call to @code{foo} which is stuck into the return list.

@item ,(foo)
A function call to @code{foo} which is spliced into the return list.

@item '(foo)
A function call to @code{foo} which is stuck into the return list in a
list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above.)  The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list.  The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches.  (This is the same
way the parser iterates over the starting rule, @pxref{Starting
Rules}.)  This lets you have much simpler rules in this specific case,
and also lets you have positional information in the returned tokens,
and error skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list.  Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
Is a string that represents the tag in the language.

@item class
Is the kind of tag being created, such as @code{function}, or
@code{variable}, though any symbol will work.

@item attributes
Is an optional set of labeled values such as @w{@code{:constant-flag t :parent
"parenttype"}}.
@end table

@item  (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag with @var{name} of respectively the class
@code{variable}, @code{function}, @code{type}, @code{include},
@code{package}, and @code{code}.
See @inforef{Creating Tags, , semantic-appdev} for the Lisp
functions these translate into.
@end table

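The following sketch shows how some of these forms might appear in a
rule.  The @code{CONST} and @code{EQUAL} tokens and the
@code{constant-def} nonterminal are hypothetical, chosen only for
illustration.

@example
%keyword CONST "const"
%token <punctuation> EQUAL "="

;; Intended to match text such as:  const int limit = max
constant-def : CONST symbol symbol EQUAL symbol
               (TAG-VARIABLE $3 $2 $5 :constant-flag t)
             ;
@end example

Here the third matched element supplies the tag @var{name}, the second
the @var{type}, and the fifth the @var{default-value}.  Similarly,
going by the description of @code{ASSOC} above, a pair whose value is
@code{nil} would be left out of the result:

@example
(ASSOC const t parent nil)   @result{} ((const . t))
@end example
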
If the declaration @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.

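As a rough sketch of that mode, based only on the description above,
the two markers might be used in an action like this.  It assumes the
first matched element is a list returned by a nonterminal.

@example
%quotemode backquote

;; In an action:
( ,@@$1 )   ;; splice the list matched by the first element
( ,$1 )     ;; evaluate $1 and insert its value as one element
@end example
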
@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"}, and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @samp{,} to splice lists
in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example

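The two styles can also be mixed.  As a further sketch derived from the
results above, splicing only the third element:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 ,$3 )
      ;
@end example

would be expected to return, for the same input @samp{"A=B"}:

@example
( ("A") "B" )
@end example
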
@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi

@node Index
@unnumbered Index
@printindex cp

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c  LocalWords:  bovinator inlined