Download presentation
Presentation is loading. Please wait.
Published byDoreen Richard Modified over 9 years ago
1
1 Parsers and Grammar
2
2 Categories of Grammar Rules Declarations or definitions. AttributeDeclaration ::= [ final ] [ static ] [ access ] datatype [ = expression ] {, datatype [ = expression ] } ; access ::= ' public ' | ' protected ' | ' private ' Statements. assignment, if, for, while, do_while Expressions, such as the examples in these slides. Structures such as statement blocks, methods, and entire classes. StatementBlock ::= ' {' { Statement; } ' }'
3
3 Parsing Algorithms (1) Broadly divided into LL and LR. LL algorithms match input directly to left-side symbols, then choose a right-side production that matches the tokens. This is top-down parsing LR algorithms try to match tokens to the right-side productions, then replace groups of tokens with the left-side nonterminal. They continue until the entire input has been "reduced" to the start symbol LALR (look-ahead LR) are a special case of LR; they require a few restrictions to the LR case Reference: Sebesta, section 4.3 - 4.5.
4
4 Parsing Algorithms (2) Look ahead: algorithms must look at next token(s) to decide between alternate productions for current tokens LALR(1) means LALR with 1 token look-ahead LL(1) means LL with 1 token look-ahead LL algorithms are simpler and easier to visualize. LR algorithms are more powerful: can parse some grammars that LL cannot, such as left recursion. yacc, bison, and CUP generate LALR(1) parsers Recursive-descent is a useful LL algorithm that "every computer professional should know" [Louden].
5
5 Top-down (LL) Parsing Example For the input: z = (2*x + 5)*y - 7; tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ; Grammar rules (as before): assignment => ID = expression ; expression => expression + term | expression - term | term term => term * factor | term / factor | factor factor => ( expression ) | ID | NUMBER
6
6 Top-down Parsing Example (2) The top-down parser tries to match input to left sides. ID = ( NUMBER * ID + NUMBER )* ID - NUMBER ; assignment ID = expression ID = expression - term ; ID = term - term ; ID = term * factor - term ; ID = factor * factor - term ; ID = ( expression * factor - term ; ID = ( expression + term ) * factor - term ; ID = ( term + term ) * factor - term ; ID = ( term * factor + term )* factor - term ; ID = ( factor * ID + factor )* factor - term ; ID = ( NUMBER * ID + NUMBER )* factor - term ; ID = ( NUMBER * ID + NUMBER )* ID - factor ; ID = ( NUMBER * ID + NUMBER )* ID - ID ;
7
7 Top-down Parsing Example (3) Problem in example: we had to look ahead many tokens in order to know which production to use. This isn't necessary provided that we know the grammar is parsable using LL (top-down) methods. There are conditions on the grammar that we can test to verify this. (see: The Parsing Problem) Later we will study the recursive-descent algorithm which does top-down parsing with minimal look- ahead.
8
8 Bottoms-up (LR) Parsing Example (1) tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ; parser:ID... read (shift) first token factor...reduce factor =...shift FAIL: Can't match any rules (reduce) Backtrack and try again ID = ( NUMBER...shift ID = ( factor...reduce ID = ( term *...sh/reduce ID = ( term * ID...shift ID = ( term * factor...reduce ID = ( term...reduce ID = ( term +...shift ID = ( expression + NUMBER...reduce/sh ID = ( expression + factor...reduce ID = ( expression + term...reduce Action
9
9 Bottoms-up Parsing Example (2) tokens: ID = ( NUMBER * ID + NUMBER ) * ID -NUMBER; input:ID = ( expression...reduce ID = ( expression )...shift ID = factor... reduce ID = factor *... shift ID = term * ID...reduce/sh ID = term * factor...reduce ID = term... reduce ID = term -...shift ID = expression -... reduce ID = expression - NUMBER...shift ID = expression - factor...reduce ID = expression - term...reduce ID = expression ;shift assignmentreduce SUCCESS!! Start Symbol
10
10 Bottoms-up Parsing Example (3) LR parsing processes the input stream from the Left and tries to match the input to the Right side of a production. When something matches, it reduces the expression to a left side non-terminal symbol. Repeat the process until the entire input stream is matched. This could potentially be an O(n 3 ) task, but Knuth and others devised a table-based algorithm that is O(n).
11
11 The Parsing Problem
12
12 The Parsing Problem Top-down parsers must decide which production to use based on the current symbol, and perhaps "peeking" at the next symbol (or two...). Predictive parser: a parser that bases its actions on the next available token (called single symbol look- ahead). Two conditions are necessary: [see Louden, p. 108- 110]
13
13 The Parsing Problem (cont.) Condition 1: the ability to choose between multiple alternatives, such as: A 1 | 2 |... | n define First( ) = set of all tokens that can be the first token for any production cascade that produces symbol then a predictive parser can be used for rule A if: First( 1 ) First( 2 )... First( n ) is empty. Condition 2: the ability of the parser to detect presence of an optional element, such as A [ ]. Can the parser detect for certain when is present?
14
14 The Parsing Problem (cont.) Example: list expr [list]. How do we know that list isn't part of expr ? define Follow( ) = set of all tokens that can follow the non-terminal some production. Use a special symbol ($) to represent the end of input if can be the end of input. Example: Follow( factor ) = { +, -, *, /, ), $ } while Follow( term ) = { *, /, ), $ } then a predictive parser can detect the presence of optional symbol if First( ) Follow( ) is empty.
15
15 Review and Thought Questions
16
16 Lexics vs. Syntax vs. Semantics Division between lexical and syntactic structure is not fixed: number can be a token or defined by a grammar rule. Implementation can often decide: scanners are faster parsers are more flexible error checking of number format as regex is simpler Division between syntax and semantics is not fixed: we could define separate rules for IntegerNumber and FloatingPtNumber, IntegerTerm, FloatingPtTerm,... in order to specify which mixed-mode operations are allowed. or specify as part of semantics
17
17 Numbers: Scan or Parse? We can construct numbers from digits using the scanner or parser. Which is easier / better ? Scanner: Define numbers as tokens: number : [-]\d+ Parser: grammar rules define numbers (digits are tokens): number ' - ' unsignednumber | unsignednumber unsignednumber => unsignednumber digit | digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
18
18 Is Java 'Class' grammar context-free? A class may have static and instance attributes. An inner class or local class have same syntax as top- level class, but: may not contain static members (except static constants) inner class may access outer class using OuterClass.this local class cannot be "public" Does this means the syntax for a class depends on context?
19
19 Alternative operator notation Some languages use prefix notation: operator comes first expr + expr expr | * expr expr | NUMBER Examples: * + 2 3 4 means (2 + 3) * 4 + 2 * 3 4 means 2 + (3 * 4) Using prefix notation, we don't have to worry about precedence of different operators in BNF rules !
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.