Parsing #2 Leonidas Fegaras

Bottom-up Parsing

Rightmost derivations: the rules are applied from right to left.
Uses a stack to push symbols; the concatenation of the stack symbols with the rest of the input forms a valid bottom-up derivation (a sentential form).
  e.g., for the input x-2*y$, with stack E - num and remaining input * y $, the corresponding sentential form is E - num * id $.
Two operations:
  reduction: if a suffix of the stack matches the RHS of a rule (the suffix is called a handle), replace the handle with the LHS nonterminal of the rule
    e.g., the stack x * E + E is reduced by the rule E ::= E + E to the new stack x * E
  shifting: if no handle is found, push the current input token on the stack and read the next token
Also known as shift-reduce parsing.

Example

Input: x-2*y$  (the scanner returns id for x and y and num for 2, so the token stream is id - num * id $)

Grammar:
  0) S ::= E $
  1) E ::= E + T
  2) E ::= E - T
  3) E ::= T
  4) T ::= T * F
  5) T ::= T / F
  6) T ::= F
  7) F ::= num
  8) F ::= id

       Stack          Rest of input     Action
   1)                 id - num * id $   shift
   2)  id             - num * id $      reduce by rule 8
   3)  F              - num * id $      reduce by rule 6
   4)  T              - num * id $      reduce by rule 3
   5)  E              - num * id $      shift
   6)  E -            num * id $        shift
   7)  E - num        * id $            reduce by rule 7
   8)  E - F          * id $            reduce by rule 6
   9)  E - T          * id $            shift
  10)  E - T *        id $              shift
  11)  E - T * id     $                 reduce by rule 8
  12)  E - T * F      $                 reduce by rule 4
  13)  E - T          $                 reduce by rule 2
  14)  E              $                 shift
  15)  E $                              accept (reduce by rule 0, yielding S)

Machinery

Need to decide when to shift and when to reduce (and, if we reduce, by which rule): use a DFA to recognize handles.

Example:
  0) S ::= R $
  1) R ::= R b
  2) R ::= a

  state 2: accept (reduce by rule 0)
  state 3: reduce by rule 2
  state 4: reduce by rule 1

The DFA is represented by an ACTION and a GOTO table; the stack now contains state numbers.

         ACTION              GOTO
  state  a     b     $       S     R
    0    s3                        1
    1          s4    s2
    2    a     a     a
    3    r2    r2    r2
    4    r1    r1    r1

The Shift-Reduce Parser

push(0);
read_next_token();
for(;;)
{  s = top();                            /* current state is taken from top of stack */
   if (ACTION[s,current_token] == 'si')  /* shift and go to state i */
   {  push(i);
      read_next_token();
   }
   else if (ACTION[s,current_token] == 'ri')
   /* reduce by rule i: X ::= A1...An */
   {  perform pop() n times;
      s = top();                         /* restore state before reduction from top of stack */
      push(GOTO[s,X]);                   /* state after reduction */
   }
   else if (ACTION[s,current_token] == 'a')
      success!!
   else
      error();
}
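A minimal runnable sketch of the same driver loop in Python, hard-wired to the ACTION/GOTO tables of the R ::= R b | a grammar above; the dictionary encoding, the function name parse, and the end-of-input handling are my own choices rather than part of the slides.

# Tables for:  0) S ::= R $   1) R ::= R b   2) R ::= a
# ACTION[state][token] is 'si' (shift to state i), 'ri' (reduce by rule i) or 'a' (accept).
ACTION = {
    0: {'a': 's3'},
    1: {'b': 's4', '$': 's2'},
    2: {'a': 'a', 'b': 'a', '$': 'a'},
    3: {'a': 'r2', 'b': 'r2', '$': 'r2'},
    4: {'a': 'r1', 'b': 'r1', '$': 'r1'},
}
GOTO = {0: {'R': 1}}
RULES = [('S', 2), ('R', 2), ('R', 1)]       # rule i -> (LHS nonterminal, length of RHS)

def parse(tokens):
    tokens = list(tokens) + ['$']
    stack = [0]                              # the stack holds state numbers
    pos = 0
    while True:
        state = stack[-1]
        token = tokens[pos] if pos < len(tokens) else '$'
        act = ACTION.get(state, {}).get(token)
        if act is None:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        if act == 'a':                       # accept
            return True
        if act.startswith('s'):              # shift: push state i and consume the token
            stack.append(int(act[1:]))
            pos += 1
        else:                                # reduce by rule i:  X ::= A1 ... An
            lhs, n = RULES[int(act[1:])]
            del stack[len(stack) - n:]       # pop n states
            stack.append(GOTO[stack[-1]][lhs])   # push GOTO[top, X]

print(parse("abb"))                          # True -- the trace appears on the next slide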

Example: parsing abb$

         ACTION              GOTO
  state  a     b     $       S     R
    0    s3                        1
    1          s4    s2
    2    a     a     a
    3    r2    r2    r2
    4    r1    r1    r1

  Stack     Rest of input   Action
  0         abb$            s3
  0 3       bb$             r2  (pop, push GOTO[0,R] since R ::= a)
  0 1       bb$             s4
  0 1 4     b$              r1  (pop twice, push GOTO[0,R] since R ::= R b)
  0 1       b$              s4
  0 1 4     $               r1  (pop twice, push GOTO[0,R] since R ::= R b)
  0 1       $               s2
  0 1 2                     accept

Table Construction

Problem: given a CFG, construct the finite automaton (DFA) that recognizes handles.
The DFA states are itemsets (sets of items).
An item is a rule with a dot somewhere in its RHS.
  e.g., the possible items for the rule E ::= E + E are:
    E ::= . E + E
    E ::= E . + E
    E ::= E + . E
    E ::= E + E .
The dot indicates how far we have progressed using this rule to parse the input.
  e.g., the item E ::= E + . E indicates that we are using the rule E ::= E + E; we have parsed E, we have seen the token +, and we are ready to parse another E.
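To make the item notation concrete, a small Python sketch (my own encoding, not from the slides) that represents an item as a rule paired with a dot position and enumerates all items of one rule:

RULE = ('E', ['E', '+', 'E'])                 # (LHS, RHS) of the rule E ::= E + E

def items_of(rule):
    lhs, rhs = rule
    return [(rule, dot) for dot in range(len(rhs) + 1)]   # dot before/between/after symbols

def show(item):
    (lhs, rhs), dot = item
    return lhs + " ::= " + " ".join(rhs[:dot] + ["."] + rhs[dot:])

for item in items_of(RULE):
    print(show(item))
# E ::= . E + E
# E ::= E . + E
# E ::= E + . E
# E ::= E + E .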

Table Construction (cont.)

The items in an itemset indicate different possibilities that will have to be resolved later by reading more input tokens.
  e.g., the itemset
    T ::= ( E . )
    E ::= E . + T
  corresponds to a DFA state where we don't know yet whether we are looking at an ( E ) handle or an E + T handle:
    it will be ( E ) if the next token is )
    it will be E + T if the next token is +
When the dot is at the end of an item, we have found a handle.
  e.g., T ::= ( E ) . corresponds to a reduction by T ::= ( E )
reduce/reduce conflict: an itemset should never have more than one item with a dot at the end; otherwise we can't choose a handle.

Closure of an Itemset

The closure of an item
  X ::= a . t b, where t is a terminal, is the singleton set that contains the item X ::= a . t b only
  X ::= a . Y b, where Y is a nonterminal, is the set consisting of the item itself, plus all rules for Y with the dot at the beginning of the RHS, plus the closures of these new items
e.g., the closure of the item E ::= E + . T is the set:
  E ::= E + . T
  T ::= . T * F
  T ::= . T / F
  T ::= . F
  F ::= . num
  F ::= . id
The closure of an itemset is the union of the closures of all items in the itemset.
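A small runnable Python sketch of this closure computation on the expression grammar from the slide; the grammar encoding and the (rule index, dot position) item representation are my own choices.

RULES = [('E', ['E', '+', 'T']), ('T', ['T', '*', 'F']), ('T', ['T', '/', 'F']),
         ('T', ['F']), ('F', ['num']), ('F', ['id'])]
NONTERMINALS = {'E', 'T', 'F'}

def closure(items):                       # items: set of (rule index, dot position)
    result = set(items)
    changed = True
    while changed:
        changed = False
        for r, dot in list(result):
            lhs, rhs = RULES[r]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:     # dot before a nonterminal Y
                for i, (l, _) in enumerate(RULES):
                    if l == rhs[dot] and (i, 0) not in result:  # add Y ::= . c
                        result.add((i, 0))
                        changed = True
    return result

# closure of { E ::= E + . T }, i.e. item (rule 0, dot 2) -- reproduces the set on the slide
for r, dot in sorted(closure({(0, 2)})):
    lhs, rhs = RULES[r]
    print(lhs, "::=", " ".join(rhs[:dot] + ["."] + rhs[dot:]))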

Constructing the DFA

The initial state of the DFA (state 0) is the closure of the item S ::= . a, where S ::= a is the first rule of the grammar.
For each itemset, if there is an item X ::= a . s b in the itemset, where s is a grammar symbol, we have a transition labeled by s to an itemset that contains X ::= a s . b.
  If we have more than one item with a dot before the same symbol s, say X ::= a . s b and Y ::= c . s d, then the new itemset contains both X ::= a s . b and Y ::= c s . d.
  We need to take the closure of the new itemset.
  We need to check whether this itemset has appeared before, so that we don't create it again.
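A worklist sketch of this construction in Python, applied to the grammar of Example #1 below; this is my own rendering of the procedure, and the state numbers follow discovery order, so they may differ from the numbering used on the Machinery slide.

# LR(0) DFA construction for:  0) S ::= R $   1) R ::= R b   2) R ::= a
# Items are (rule index, dot position); states are itemsets (frozensets of items).
RULES = [('S', ['R', '$']), ('R', ['R', 'b']), ('R', ['a'])]
NONTERM = {'S', 'R'}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        r, dot = work.pop()
        rhs = RULES[r][1]
        if dot < len(rhs) and rhs[dot] in NONTERM:
            for i, (lhs, _) in enumerate(RULES):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)

def goto(itemset, symbol):
    # move the dot over `symbol` in every item that allows it, then take the closure
    moved = {(r, d + 1) for (r, d) in itemset
             if d < len(RULES[r][1]) and RULES[r][1][d] == symbol}
    return closure(moved)

states = [closure({(0, 0)})]          # state 0 = closure of { S ::= . R $ }
edges = {}
for itemset in states:                # `states` grows while we iterate: a worklist
    for sym in {RULES[r][1][d] for (r, d) in itemset if d < len(RULES[r][1])}:
        target = goto(itemset, sym)
        if target not in states:      # don't create the same itemset twice
            states.append(target)
        edges[(states.index(itemset), sym)] = states.index(target)

print(len(states), "states")          # 5 states, matching the DFA on the Machinery slide
print(edges)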

Example #1

  0) S ::= R $
  1) R ::= R b
  2) R ::= a

Example #2

  0) S' ::= S $
  1) S ::= B B
  2) B ::= a B
  3) B ::= c

Example #3

  S ::= E $
  E ::= ( L )
  E ::= ( )
  E ::= id
  L ::= L , E
  L ::= E

LR(0)

If an itemset has more than one reduction (more than one item with the dot at the end), it is a reduce/reduce conflict.
If an itemset has at least one shift (an outgoing transition to another state) and at least one reduction, it is a shift/reduce conflict.
A grammar is LR(0) if its DFA doesn't have any reduce/reduce or shift/reduce conflict.

Example:
  1) S ::= E $
  2) E ::= E + T
  3)       | T
  4) T ::= T * F
  5)       | F
  6) F ::= id
  7)       | ( E )

  state 0:               on T, go to the itemset:
    S ::= . E $            E ::= T .
    E ::= . E + T          T ::= T . * F
    E ::= . T
    T ::= . T * F
    T ::= . F
    F ::= . id
    F ::= . ( E )

  The itemset reached on T has both a reduction (E ::= T .) and a shift (on *), so this grammar is not LR(0).

SLR(1)

There is an easy fix for some of the shift/reduce and reduce/reduce conflicts: it requires looking one token ahead (the lookahead token).
Steps to resolve the conflicts of an itemset:
  for each shift item Y ::= b . c, find FIRST(c)
  for each reduction item X ::= a ., find FOLLOW(X)
  if each FOLLOW(X) does not overlap with any of the other sets, you have resolved the conflict!
e.g., for the itemset with E ::= T . and T ::= T . * F:
  FOLLOW(E) = { $, +, ) }
  FIRST(* F) = { * }
  no overlap!
This grammar is SLR(1); SLR(1) is more powerful than LR(0).
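A tiny sketch of the overlap test for the itemset above; FOLLOW(E) and FIRST(* F) are copied from the slide rather than computed from the grammar, which a real parser generator would do.

follow_E = {'$', '+', ')'}       # FOLLOW(E): tokens on which we reduce by E ::= T
first_shift = {'*'}              # FIRST(* F): tokens on which we shift for T ::= T . * F

overlap = follow_E & first_shift
if overlap:
    print("still a conflict on", overlap, "-> not SLR(1)")
else:
    print("resolved: reduce on", sorted(follow_E), "/ shift on", sorted(first_shift))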

LR(1)

The SLR(1) trick doesn't always work.

  S ::= E $
  E ::= L = R
      | R
  L ::= * R
      | id
  R ::= L

  state 0:               on L, go to the itemset:
    S ::= . E $            E ::= L . = R
    E ::= . L = R          R ::= L .
    E ::= . R
    L ::= . * R
    L ::= . id
    R ::= . L

  The itemset reached on L has a shift on = and a reduction by R ::= L, but = is in FOLLOW(R), so the SLR(1) test cannot resolve this shift/reduce conflict.

For a reduction item X ::= a . we need a smaller (more precise) set of lookahead tokens that tells us when to reduce:
  they are called expected lookahead tokens
  they must be a subset of or equal to FOLLOW(X) (hopefully a proper subset)
  they are context-sensitive => finer control

LR(1) Items

Now each item is associated with expected lookahead tokens.
  e.g., in L ::= * . R   =$   the tokens = and $ are the expected lookahead tokens.
  They are only useful in a reduction item: L ::= * R .   =$   indicates that we reduce when the lookahead token from the input is = or $.
LR(1) grammar: at each shift/reduce or reduce/reduce conflict, the expected lookahead tokens of the reductions must not overlap with the first tokens after the dot (ie, the FIRST(c) of each shift item Y ::= b . c).
Rules for constructing the LR(1) DFA:
  for a transition from A ::= a . s b on a symbol s, propagate the expected lookaheads unchanged
  when you add the item B ::= . c to form the closure of A ::= a . B b with expected lookahead tokens t, s, ..., the expected lookahead tokens of B ::= . c are FIRST(b t) U FIRST(b s) U ...
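A sketch of the LR(1) closure step with this FIRST(b t) propagation rule, written for the grammar of the next slide; the encoding is my own, the FIRST sets of the nonterminals are precomputed by hand, and the FIRST helper assumes no nullable symbols, which is enough for this grammar.

RULES = [('S', ['E', '$']), ('E', ['L', '=', 'R']), ('E', ['R']),
         ('L', ['*', 'R']), ('L', ['id']), ('R', ['L'])]
NONTERM = {'S', 'E', 'L', 'R'}
FIRST_NT = {'E': {'*', 'id'}, 'L': {'*', 'id'}, 'R': {'*', 'id'}}

def first(seq):                        # FIRST of a symbol sequence (no nullable symbols)
    s = seq[0]
    return FIRST_NT[s] if s in NONTERM else {s}

def closure(items):                    # items: set of (rule index, dot, lookahead token)
    items = set(items)
    work = list(items)
    while work:
        r, dot, la = work.pop()
        rhs = RULES[r][1]
        if dot < len(rhs) and rhs[dot] in NONTERM:
            for t in first(rhs[dot + 1:] + [la]):          # FIRST(b t)
                for i, (lhs, _) in enumerate(RULES):
                    if lhs == rhs[dot] and (i, 0, t) not in items:
                        items.add((i, 0, t))
                        work.append((i, 0, t))
    return items

# Closing the initial item (S ::= . E $, '?') reproduces state 0 of the next slide:
# lookahead $ on the E- and R-items, lookaheads = and $ on the L-items.
for r, dot, la in sorted(closure({(0, 0, '?')})):
    lhs, rhs = RULES[r]
    print(lhs, "::=", " ".join(rhs[:dot] + ['.'] + rhs[dot:]), "  lookahead:", la)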

Example

  S ::= E $
  E ::= L = R
      | R
  L ::= * R
      | id
  R ::= L

  state 0:                      on L, go to the itemset:
    S ::= . E $       ?           E ::= L . = R    $
    E ::= . L = R     $           R ::= L .        $
    E ::= . R         $
    L ::= . * R       =$
    L ::= . id        =$
    R ::= . L         $

  Now the reduction R ::= L . applies only on the lookahead $, which does not overlap with the shift on =, so the conflict is resolved.

LALR(1)

If the lookaheads s1 and s2 are different, then the items A ::= a .  s1 and A ::= a .  s2 are different.
  This results in a large number of states, since the number of combinations of expected lookahead symbols can be very large.
  We can combine the two states into one by creating an item A ::= a .  s3, where s3 is the union of s1 and s2.
LALR(1) is weaker than LR(1) but more powerful than SLR(1).
An LALR(1) DFA has the same number of states as the LR(0) DFA.
Easy construction of LALR(1) itemsets:
  start with the LR(0) items and propagate lookaheads as in LR(1)
  don't create a new itemset if the LR(0) items are the same; just union together the lookaheads of the corresponding items
  you may have to propagate lookaheads by looping through the same itemsets until you cannot add any more
Most parser generators are LALR(1), including CUP.
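A sketch of the merging step in Python: two LR(1) states whose LR(0) cores are identical are combined by unioning the lookahead sets item by item. The two states below are hypothetical examples (using the item L ::= * R . from the earlier grammar), not taken from a particular DFA on these slides.

def merge(state1, state2):
    # states are dicts mapping an LR(0) item to its set of expected lookahead tokens
    assert state1.keys() == state2.keys(), "only states with identical LR(0) cores merge"
    return {item: state1[item] | state2[item] for item in state1}

s1 = {'L ::= * R .': {'='}}    # reduce by L ::= * R when the lookahead is '='
s2 = {'L ::= * R .': {'$'}}    # same LR(0) core, different lookahead set
print(merge(s1, s2))           # {'L ::= * R .': {'=', '$'}} -- a single LALR(1) state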

Example

  S ::= E $
  E ::= E + E
      | E * E
      | ( E )
      | id
      | num

  state 0:
    S ::= . E $       ?
    E ::= . E + E     +*$
    E ::= . E * E     +*$
    E ::= . ( E )     +*$
    E ::= . id        +*$
    E ::= . num       +*$
  on E, go to:
    S ::= E . $       ?
    E ::= E . + E     +*$
    E ::= E . * E     +*$
  on *, go to:
    E ::= E * . E     +*$
    E ::= . E + E     +*$
    E ::= . E * E     +*$
    E ::= . ( E )     +*$
    E ::= . id        +*$
    E ::= . num       +*$
  on E, go to:
    E ::= E * E .     +*$
    E ::= E . + E     +*$
    E ::= E . * E     +*$

  The last itemset has shift/reduce conflicts (reduce by E ::= E * E versus shift on + or *); they are resolved with the precedence and associativity declarations discussed on the next slides.

Practical Considerations

How to avoid reduce/reduce and shift/reduce conflicts:
  left recursion is good, right recursion is bad
    left recursion uses less stack than right recursion
    left recursion produces left-associative trees, right recursion produces right-associative trees

      right recursion:          left recursion:
        L ::= id , L              L ::= L , id
            | id                      | id

Most shift/reduce conflicts are easy to remove by assigning precedence and associativity to operators:
  S ::= E $
  E ::= E + E
      | E * E
      | ( E )
      | id
      | num
  + and * are left-associative
  * has higher precedence than +

Practical Considerations (cont.)

How do precedence and associativity work?
  the precedence and associativity of a rule come from the last terminal on the RHS of the rule
    e.g., the rule E ::= E + E has the same precedence and associativity as +
  you can force the precedence of a rule in CUP: e.g., E ::= MINUS E %prec UMINUS
  in a state with a shift/reduce conflict, when you are reading a token t:
    if the precedence of t is lower than that of the reduction rule, you reduce
    if the precedence of t is equal to that of the reduction rule, you reduce if the rule has left associativity, otherwise you shift
Reduce/reduce conflicts are hopeless: the parser generator always reduces using the rule listed first; treat such a conflict as a fatal error.
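A small decision function mirroring the rules above: compare the precedence of the incoming token with the precedence of the reduction rule (i.e. of the last terminal on its RHS). The token names and numeric precedence levels are illustrative only, not CUP's internal representation.

PREC  = {'+': 1, '*': 2}              # '*' binds tighter than '+'
ASSOC = {'+': 'left', '*': 'left'}    # both operators are left-associative

def resolve(rule_last_terminal, lookahead):
    rule_prec, tok_prec = PREC[rule_last_terminal], PREC[lookahead]
    if tok_prec < rule_prec:
        return 'reduce'               # lower-precedence token: reduce
    if tok_prec == rule_prec:         # equal precedence: use associativity
        return 'reduce' if ASSOC[lookahead] == 'left' else 'shift'
    return 'shift'                    # higher-precedence token: shift

# After seeing E + E with '*' as the lookahead we shift; with another '+' we reduce.
print(resolve('+', '*'))   # shift
print(resolve('+', '+'))   # reduce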

Error Recovery

All the empty entries in the ACTION and GOTO tables correspond to syntax errors.
We can either:
  report the error and stop the parsing, or
  continue parsing to find more errors (error recovery)
Error recovery in CUP:

  S ::= L = E ;
      | { SL } ;
      | error ;
  SL ::= S ;
      | SL S ;

  itemset:                 on error:             on ; :
    S ::= . L = E ;          S ::= error . ;       S ::= error ; .
    S ::= . { SL } ;
    S ::= . error ;
    ...

In case of an error, the parser pops elements off the stack until it finds a state where it can proceed on the error token; then it discards tokens from the input until a restart is possible.

The Calculator Parser

terminal LP, RP, COMMA, SEMI, ASSIGN, IF, THEN, ELSE, AND, OR, NOT, QUIT,
         PLUS, TIMES, MINUS, DIV, EQ, LT, GT, LE, NE, GE, FALSE, TRUE, DEFINE;
terminal String ID;
terminal Integer INT;
terminal Float REALN;
terminal String STRINGT;

non terminal exp, string, name;
non terminal expl, names;
non terminal item, prog;

precedence nonassoc ELSE;
precedence right OR;
precedence right AND;
precedence nonassoc NOT;
precedence left EQ, LT, GT, LE, GE, NE;
precedence left PLUS, MINUS;
precedence left TIMES, DIV;

The Calculator Parser (cont.)

start with prog;

prog   ::= item SEMI
         | prog item SEMI
         ;
item   ::= exp
         | QUIT
         | ID ASSIGN exp
         | DEFINE ID LP names RP EQ exp
         ;
name   ::= ID
         ;
string ::= STRINGT
         ;
expl   ::= expl COMMA exp
         | exp
         ;
names  ::= names COMMA name
         | name
         ;

The Calculator Parser (cont.)

exp ::= INT
      | REALN
      | TRUE
      | FALSE
      | name
      | string
      | LP exp RP
      | IF exp THEN exp ELSE exp
      | exp EQ exp
      | exp LT exp
      | exp GT exp
      | exp LE exp
      | exp NE exp
      | exp GE exp
      | exp PLUS exp
      | exp MINUS exp
      | exp TIMES exp
      | exp DIV exp
      | exp OR exp
      | exp AND exp
      | NOT exp
      | name LP expl RP
      ;