1 Parsers and Grammar

2 Categories of Grammar Rules
• Declarations or definitions:
  AttributeDeclaration ::= [ final ] [ static ] [ access ] datatype identifier [ = expression ] { , identifier [ = expression ] } ;
  access ::= 'public' | 'protected' | 'private'
• Statements: assignment, if, for, while, do_while
• Expressions, such as the examples in these slides.
• Structures such as statement blocks, methods, and entire classes:
  StatementBlock ::= '{' { Statement ; } '}'
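
As a concrete illustration (not from the original slides), the Java field declarations below are the kind of source text the AttributeDeclaration rule describes; the class and field names are made up for the example.

// Java attribute (field) declarations of the shape described by AttributeDeclaration:
// optional modifiers, a datatype, then one or more identifiers with optional initializers.
public class Account {
    private static final int MAX_ACCOUNTS = 1000;  // modifiers, datatype, identifier = expression ;
    protected double balance = 0.0, interestRate;  // one declaration, two identifiers
    boolean active;                                // all optional parts omitted
}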

3 Parsing Algorithms (1)
• Broadly divided into LL and LR.
  - LL algorithms match input directly to left-side symbols, then choose a right-side production that matches the tokens. This is top-down parsing.
  - LR algorithms try to match tokens to the right-side productions, then replace groups of tokens with the left-side nonterminal. They continue until the entire input has been "reduced" to the start symbol.
  - LALR (look-ahead LR) parsers are a special case of LR; they require a few restrictions to the LR case.
• Reference: Sebesta, sections 4.3 - 4.5.

4 Parsing Algorithms (2)
• Look-ahead: algorithms must look at the next token(s) to decide between alternate productions for the current tokens.
  - LALR(1) means LALR with 1 token of look-ahead.
  - LL(1) means LL with 1 token of look-ahead.
• LL algorithms are simpler and easier to visualize.
• LR algorithms are more powerful: they can parse some grammars that LL cannot, such as grammars with left recursion.
• yacc, bison, and CUP generate LALR(1) parsers.
• Recursive descent is a useful LL algorithm that "every computer professional should know" [Louden].

5 Top-down (LL) Parsing Example
For the input:  z = (2*x + 5)*y - 7;
tokens:         ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

Grammar rules (as before):
assignment => ID = expression ;
expression => expression + term | expression - term | term
term       => term * factor | term / factor | factor
factor     => ( expression ) | ID | NUMBER
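
Before parsing begins, a scanner turns the source text into exactly this token stream. The sketch below is a minimal regex-based tokenizer for just these token classes; the class name and token spellings are illustrative assumptions, not part of the slides.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal tokenizer sketch: turns "z = (2*x + 5)*y - 7;" into the
// token stream ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;
public class Tokenizer {
    // Each alternative is one token class: number, identifier, or single-character operator.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(\\d+)|([A-Za-z_]\\w*)|([-+*/=();]))");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.lookingAt()) {
            if (m.group(1) != null) tokens.add("NUMBER");
            else if (m.group(2) != null) tokens.add("ID");
            else tokens.add(m.group(3));          // operators and punctuation stand for themselves
            m.region(m.end(), input.length());    // advance past the match
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("z = (2*x + 5)*y - 7;"));
        // [ID, =, (, NUMBER, *, ID, +, NUMBER, ), *, ID, -, NUMBER, ;]
    }
}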

6 Top-down Parsing Example (2)
The top-down parser tries to match input to left sides.
Target tokens:  ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

assignment
ID = expression ;
ID = expression - term ;
ID = term - term ;
ID = term * factor - term ;
ID = factor * factor - term ;
ID = ( expression ) * factor - term ;
ID = ( expression + term ) * factor - term ;
ID = ( term + term ) * factor - term ;
ID = ( term * factor + term ) * factor - term ;
ID = ( factor * ID + factor ) * factor - term ;
ID = ( NUMBER * ID + NUMBER ) * factor - term ;
ID = ( NUMBER * ID + NUMBER ) * ID - factor ;
ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

7 Top-down Parsing Example (3)
• Problem in the example: we had to look ahead many tokens in order to know which production to use.
• This isn't necessary, provided that we know the grammar is parsable using LL (top-down) methods.
• There are conditions on the grammar that we can test to verify this (see: The Parsing Problem).
• Later we will study the recursive-descent algorithm, which does top-down parsing with minimal look-ahead (a sketch follows below).
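
For orientation, here is a minimal recursive-descent sketch of the slide-5 grammar. The left-recursive expression and term rules are rewritten as loops, the standard transformation for LL parsing; class and method names are illustrative, not from the slides.

import java.util.List;

// Minimal recursive-descent sketch for:
//   assignment -> ID '=' expression ';'
//   expression -> term { ('+' | '-') term }     (left recursion rewritten as a loop)
//   term       -> factor { ('*' | '/') factor }
//   factor     -> '(' expression ')' | ID | NUMBER
public class RecursiveDescent {
    private final List<String> tokens;   // e.g. the output of the tokenizer sketch above
    private int pos = 0;

    public RecursiveDescent(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "$"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new RuntimeException("expected " + t + " but saw " + peek());
        pos++;
    }

    public void assignment() {
        expect("ID"); expect("="); expression(); expect(";");
    }

    private void expression() {
        term();
        while (peek().equals("+") || peek().equals("-")) { pos++; term(); }
    }

    private void term() {
        factor();
        while (peek().equals("*") || peek().equals("/")) { pos++; factor(); }
    }

    private void factor() {
        switch (peek()) {
            case "(":      pos++; expression(); expect(")"); break;
            case "ID":
            case "NUMBER": pos++; break;
            default: throw new RuntimeException("unexpected token " + peek());
        }
    }

    public static void main(String[] args) {
        new RecursiveDescent(List.of("ID","=","(","NUMBER","*","ID","+","NUMBER",")","*","ID","-","NUMBER",";"))
            .assignment();   // completes without an exception, so the token stream parses
        System.out.println("parsed OK");
    }
}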

8 Bottom-up (LR) Parsing Example (1)
tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

Parser stack                        Action
ID                                  read (shift) first token
factor                              reduce
factor =                            shift
FAIL: can't match any rules (reduce) -- backtrack and try again
ID = ( NUMBER                       shift
ID = ( factor                       reduce
ID = ( term *                       shift/reduce
ID = ( term * ID                    shift
ID = ( term * factor                reduce
ID = ( term                         reduce
ID = ( term +                       shift
ID = ( expression + NUMBER          reduce/shift
ID = ( expression + factor          reduce
ID = ( expression + term            reduce

9 Bottom-up Parsing Example (2)
tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ;

Parser stack                        Action
ID = ( expression                   reduce
ID = ( expression )                 shift
ID = factor                         reduce
ID = factor *                       shift
ID = term * ID                      reduce/shift
ID = term * factor                  reduce
ID = term                           reduce
ID = term -                         shift
ID = expression -                   reduce
ID = expression - NUMBER            shift
ID = expression - factor            reduce
ID = expression - term              reduce
ID = expression ;                   shift
assignment                          reduce -- SUCCESS!! (start symbol)
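
The mechanics of the trace above can be reproduced with a small hand-rolled shift-reduce loop. The sketch below is not a real LALR(1) parser: a few ad-hoc one-token look-ahead checks stand in for the generated ACTION/GOTO tables (which also lets it avoid the backtracking shown on the previous slide). The class name and the exact checks are illustrative assumptions.

import java.util.ArrayList;
import java.util.List;

// Hand-rolled shift-reduce sketch for the assignment grammar. A table-driven
// parser makes the same shift/reduce decisions, but reads them from tables
// produced by a generator such as yacc, bison, or CUP.
public class ShiftReduce {
    private final List<String> stack = new ArrayList<>();

    private boolean top(String... pattern) {            // does the stack end with this pattern?
        int n = stack.size() - pattern.length;
        if (n < 0) return false;
        for (int i = 0; i < pattern.length; i++)
            if (!stack.get(n + i).equals(pattern[i])) return false;
        return true;
    }

    private void reduce(int popCount, String lhs) {      // replace top symbols with a nonterminal
        for (int i = 0; i < popCount; i++) stack.remove(stack.size() - 1);
        stack.add(lhs);
        System.out.println(stack + "   reduce to " + lhs);
    }

    // Try every reduction that the look-ahead token allows; return true if one fired.
    private boolean tryReduce(String lookahead) {
        boolean mulDiv = lookahead.equals("*") || lookahead.equals("/");
        if (top("(", "expression", ")"))                              { reduce(3, "factor");     return true; }
        if (top("ID") && !lookahead.equals("="))                      { reduce(1, "factor");     return true; }
        if (top("NUMBER"))                                            { reduce(1, "factor");     return true; }
        if (top("term", "*", "factor") || top("term", "/", "factor")) { reduce(3, "term");       return true; }
        if (top("factor"))                                            { reduce(1, "term");       return true; }
        if (!mulDiv && (top("expression", "+", "term") || top("expression", "-", "term")))
                                                                      { reduce(3, "expression"); return true; }
        if (!mulDiv && top("term"))                                   { reduce(1, "expression"); return true; }
        if (top("ID", "=", "expression", ";"))                        { reduce(4, "assignment"); return true; }
        return false;
    }

    public void parse(List<String> tokens) {
        int i = 0;
        while (true) {
            String lookahead = i < tokens.size() ? tokens.get(i) : "$";
            if (tryReduce(lookahead)) continue;                       // reduce as long as possible
            if (i >= tokens.size()) break;                            // nothing to shift, nothing to reduce
            stack.add(tokens.get(i++));                               // otherwise shift the next token
            System.out.println(stack + "   shift");
        }
        System.out.println(stack.equals(List.of("assignment")) ? "SUCCESS" : "FAIL");
    }

    public static void main(String[] args) {
        new ShiftReduce().parse(List.of("ID","=","(","NUMBER","*","ID","+","NUMBER",
                                        ")","*","ID","-","NUMBER",";"));
    }
}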

10 Bottom-up Parsing Example (3)
• LR parsing processes the input stream from the Left and tries to match the input to the Right side of a production.
• When something matches, it reduces the matched symbols to the left-side non-terminal symbol.
• Repeat the process until the entire input stream is matched.
• This could potentially be an O(n³) task, but Knuth and others devised a table-based algorithm that is O(n).

11 The Parsing Problem

12 The Parsing Problem
• Top-down parsers must decide which production to use based on the current symbol, and perhaps "peeking" at the next symbol (or two...).
• Predictive parser: a parser that bases its actions on the next available token (called single-symbol look-ahead).
• Two conditions are necessary [see Louden, p. 108-110]:

13 The Parsing Problem (cont.)
Condition 1: the ability to choose between multiple alternatives, such as:
    A → α1 | α2 | ... | αn
Define First(α) = the set of all tokens that can be the first token for any production cascade that produces the symbol α.
Then a predictive parser can be used for rule A if the sets
    First(α1), First(α2), ..., First(αn) are pairwise disjoint (each pairwise intersection is empty).
Condition 2: the ability of the parser to detect the presence of an optional element, such as A → α [ β ].
Can the parser detect for certain when β is present?

14 The Parsing Problem (cont.)
Example: list → expr [ list ]. How do we know that list isn't part of expr?
Define Follow(α) = the set of all tokens that can follow the non-terminal α in some production. Use a special symbol ($) to represent the end of input if α can be the end of input.
Example: Follow( factor ) = { +, -, *, /, ), $ } while Follow( term ) = { *, /, ), $ }
Then a predictive parser can detect the presence of the optional symbol β in A → α [ β ] if First(β) ∩ Follow(A) is empty.
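
To make the First sets concrete, the sketch below computes First for the expression grammar by fixed-point iteration (assuming a grammar with no epsilon productions); Follow sets are computed by a similar iteration over the right-hand sides. The code and names are illustrative, not taken from Louden.

import java.util.*;

// Sketch: compute First sets by fixed-point iteration. First(X) for a terminal
// is {X}; for a non-terminal it is the union of First(first RHS symbol) over
// its productions (no epsilon rules, so only the first symbol matters).
public class FirstSets {
    public static void main(String[] args) {
        // Grammar from the parsing example (left recursion does not affect First).
        Map<String, List<List<String>>> rules = new LinkedHashMap<>();
        rules.put("expression", List.of(List.of("expression", "+", "term"),
                                        List.of("expression", "-", "term"),
                                        List.of("term")));
        rules.put("term",       List.of(List.of("term", "*", "factor"),
                                        List.of("term", "/", "factor"),
                                        List.of("factor")));
        rules.put("factor",     List.of(List.of("(", "expression", ")"),
                                        List.of("ID"), List.of("NUMBER")));

        Map<String, Set<String>> first = new LinkedHashMap<>();
        rules.keySet().forEach(nt -> first.put(nt, new HashSet<>()));

        boolean changed = true;
        while (changed) {                       // repeat until no First set grows
            changed = false;
            for (var entry : rules.entrySet()) {
                Set<String> f = first.get(entry.getKey());
                for (List<String> rhs : entry.getValue()) {
                    String x = rhs.get(0);
                    Set<String> add = rules.containsKey(x) ? new HashSet<>(first.get(x)) : Set.of(x);
                    if (f.addAll(add)) changed = true;
                }
            }
        }
        first.forEach((nt, f) -> System.out.println("First(" + nt + ") = " + f));
        // Every First set here is { (, ID, NUMBER }, so the alternatives of the
        // left-recursive rules overlap and Condition 1 fails; rewriting the grammar
        // (as in the recursive-descent sketch) is what makes it predictively parsable.
    }
}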

15 Review and Thought Questions

16 Lexics vs. Syntax vs. Semantics
• The division between lexical and syntactic structure is not fixed: number can be a token or can be defined by a grammar rule.
• Implementation concerns can often decide:
  - scanners are faster
  - parsers are more flexible
  - error checking of number format as a regex is simpler
• The division between syntax and semantics is not fixed: we could define separate rules for IntegerNumber and FloatingPtNumber, IntegerTerm, FloatingPtTerm, ... in order to specify which mixed-mode operations are allowed, or specify that as part of the semantics.

17 Numbers: Scan or Parse?
We can construct numbers from digits using the scanner or the parser. Which is easier / better?
• Scanner: define numbers as tokens:
  number : [-]\d+
• Parser: grammar rules define numbers (digits are tokens):
  number → '-' unsignednumber | unsignednumber
  unsignednumber → unsignednumber digit | digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
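
For the scanner option, a single regular expression recognizes the whole number as one token; a small illustrative snippet (names are made up) using java.util.regex:

import java.util.regex.Pattern;

// Scanner-side option: recognize a (possibly negative) integer as one NUMBER token.
public class NumberToken {
    private static final Pattern NUMBER = Pattern.compile("-?\\d+");

    public static void main(String[] args) {
        System.out.println(NUMBER.matcher("-42").matches());   // true  -> one NUMBER token
        System.out.println(NUMBER.matcher("4x2").matches());   // false -> not a number
    }
}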

18 Is the Java 'class' grammar context-free?
• A class may have static and instance attributes.
• An inner class or a local class has the same syntax as a top-level class, but:
  - it may not contain static members (except static constants)
  - an inner class may access the outer class using OuterClass.this
  - a local class cannot be "public"
• Does this mean the syntax for a class depends on context?

19 Alternative operator notation
• Some languages use prefix notation: the operator comes first.
  expr → + expr expr | * expr expr | NUMBER
• Examples:
  * + 2 3 4   means (2 + 3) * 4
  + 2 * 3 4   means 2 + (3 * 4)
• Using prefix notation, we don't have to worry about the precedence of different operators in the BNF rules!
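
Because the operator position alone determines the structure, a parser (or evaluator) for this prefix grammar needs no precedence handling at all. A minimal recursive sketch (class and method names are illustrative):

import java.util.Iterator;
import java.util.List;

// Recursive evaluator for the prefix grammar  expr -> + expr expr | * expr expr | NUMBER.
// No precedence rules are needed: each operator is followed by exactly two operands.
public class PrefixEval {
    private static int expr(Iterator<String> tokens) {
        String t = tokens.next();
        switch (t) {
            case "+": return expr(tokens) + expr(tokens);
            case "*": return expr(tokens) * expr(tokens);
            default:  return Integer.parseInt(t);      // NUMBER
        }
    }

    public static void main(String[] args) {
        System.out.println(expr(List.of("*", "+", "2", "3", "4").iterator()));  // (2 + 3) * 4 = 20
        System.out.println(expr(List.of("+", "2", "*", "3", "4").iterator()));  // 2 + (3 * 4) = 14
    }
}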

