Bottom-up parsing Goal of parser : build a derivation

Bottom-up parsing

Goal of parser: build a derivation.
- Top-down parser: builds a derivation by working from the start symbol towards the input.
  - Builds the parse tree from root to leaves
  - Builds a leftmost derivation
- Bottom-up parser: builds a derivation by working from the input back toward the start symbol.
  - Builds the parse tree from leaves to root
  - Builds a reverse rightmost derivation

Bottom-up parsing

The parser looks for a substring of the parse tree's frontier...
- ...that matches the rhs of a production, and
- ...whose reduction to the non-terminal on the lhs represents one step along the reverse of a rightmost derivation.

Such a substring is called a handle. Important: not all substrings that match a rhs are handles.

Bottom-up parsing techniques

Shift-reduce parsing
- Shift input symbols until a handle is found. Then reduce the substring to the non-terminal on the lhs of the corresponding production.

Operator-precedence parsing
- Based on shift-reduce parsing. Identifies handles based on precedence rules.

Example: shift-reduce parsing

Grammar:
1. S → E
2. E → E + E
3. E → E * E
4. E → num
5. E → id

Input to parse: id1 + num * id2

STACK              ACTION
$                  Shift
$ id1              Reduce (rule 5)
$ E                Shift
$ E +              Shift
$ E + num          Reduce (rule 4)
$ E + E            Shift
$ E + E *          Shift
$ E + E * id2      Reduce (rule 5)
$ E + E * E        Reduce (rule 3)
$ E + E            Reduce (rule 2)
$ E                Reduce (rule 1)
$ S                Accept

The handle in each Reduce step is the portion of the stack that matches the rule's rhs (id1, num, id2, E * E, E + E, E).

Shift-reduce parsing

A shift-reduce parser has 4 actions:
- Shift: the next input symbol is shifted onto the stack.
- Reduce: a handle is at the top of the stack; pop the handle and push the appropriate lhs.
- Accept: stop parsing and report success.
- Error: call an error reporting/recovery routine.
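The four actions can be seen at work in a minimal Python sketch (illustrative, not from the slides). The shift/reduce decision is normally made by a parsing table; here we simply replay the action sequence from the example above to show how each action manipulates the stack:

```python
# Grammar from the shift-reduce example: rule number -> (lhs, rhs).
RULES = {1: ("S", ["E"]), 2: ("E", ["E", "+", "E"]),
         3: ("E", ["E", "*", "E"]), 4: ("E", ["num"]), 5: ("E", ["id"])}

def run(tokens, script):
    stack, pos = ["$"], 0
    for action in script:
        if action == "shift":                  # push next input symbol
            stack.append(tokens[pos])
            pos += 1
        elif action == "accept":               # stop and report success
            return stack
        else:                                  # ("reduce", rule number)
            lhs, rhs = RULES[action[1]]
            assert stack[-len(rhs):] == rhs, "handle must be on top of stack"
            del stack[-len(rhs):]              # pop the handle...
            stack.append(lhs)                  # ...and push the lhs

tokens = ["id", "+", "num", "*", "id"]
script = ["shift", ("reduce", 5), "shift", "shift", ("reduce", 4),
          "shift", "shift", ("reduce", 5), ("reduce", 3),
          ("reduce", 2), ("reduce", 1), "accept"]
print(run(tokens, script))   # ['$', 'S']
```

The final stack holds only the end marker and the start symbol, mirroring the Accept row of the trace.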

Shift-reduce parsing

How can we know when we have found a handle?
- Analyze the grammar beforehand and build tables.
- Look ahead in the input.

LR(1) parsers recognize precisely those languages in which one symbol of look-ahead is enough to determine whether to reduce or shift.
- L: left-to-right parse of the input
- R: reverse rightmost derivation
- 1: one symbol of lookahead

How does it work?

- Read the input, one token at a time.
- Use a stack to keep track of the current state.
  - The state at the top of the stack summarizes the information below it: the stack contains information about what has been parsed so far.
- Use a parsing table to determine the action, based on the current state and the look-ahead symbol.

How do we build a parsing table?

LR parsing techniques

- SLR (not in the book)
  - Simple LR parsing
  - Easy to implement, not strong enough
  - Uses LR(0) items
- Canonical LR
  - Larger parser, but powerful
  - Uses LR(1) items
- LALR (not in the book)
  - Condensed version of canonical LR
  - May introduce conflicts

Class examples

Grammar 1:
E' → E
E → E + T
T → T * F
F → id

Grammar 2:
S' → S
S → L = R
S → R
L → * R
L → id
R → L

Finding handles

As a shift-reduce parser processes the input, it must keep track of all potential handles.

For example, consider the usual expression grammar and the input string x+y. Suppose the parser has processed x and reduced it to E. Then the current state can be represented by E → E • + E, where • means that an E has already been parsed and that + E is a potential suffix which, if found, will result in a successful parse. Our goal is to eventually reach the state E → E + E •, which represents an actual handle and should result in the reduction E → E + E.

LR parsing

Typically, LR parsing works by building an automaton where each state represents what has been parsed so far and what we hope to parse in the future. In other words, states contain productions with dots, as described earlier. Such productions are called items.

States containing handles (meaning the dot is all the way at the right end of the production) lead to actual reductions, depending on the lookahead.

SLR parsing

SLR parsers build automata where states contain items (a.k.a. LR(0) items) and reductions are decided based on FOLLOW-set information.

We will build an SLR table for the augmented grammar:
S' → S
S → L = R
S → R
L → * R
L → id
R → L

SLR parsing

Closure of a state: when parsing begins, we have not parsed any input at all and we hope to parse an S. This is represented by S' → • S. Note that in order to parse that S, we must either parse an L = R or an R. This is represented by S → • L = R and S → • R.

Closure of a state: if A → α • B β represents the current state and B → γ is a production, then add B → • γ to the state. Justification: α • B β means that we hope to see a B next. But parsing a B is equivalent to parsing a γ, so we can say that we hope to see a γ next.
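The closure rule above can be sketched directly in Python (an illustrative sketch for the grammar of these slides; an item is a triple (lhs, rhs, dot) with dot indexing into rhs):

```python
# Grammar: S' -> S, S -> L=R, S -> R, L -> *R, L -> id, R -> L
GRAMMAR = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
           ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        # Dot before a non-terminal B: add B -> . gamma for every B-production.
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                if lhs2 == rhs[dot] and (lhs2, rhs2, 0) not in items:
                    items.add((lhs2, rhs2, 0))
                    work.append((lhs2, rhs2, 0))
    return items

I0 = closure({("S'", ("S",), 0)})
print(len(I0))   # 6: all six productions end up in I0 with the dot at the left
```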

SLR parsing

Use the closure operation to define states containing LR(0) items. The first state will be:

S' → • S
S → • L = R
S → • R
L → • * R
L → • id
R → • L

From this state, if we parse, say, an id, then we go to the state
L → id •

If, after some steps, we parse input that reduces to an L, then we go to the state
S → L • = R
R → L •

SLR parsing

Continuing the same way, we define all LR(0) item states:

I0: S' → • S
    S → • L = R
    S → • R
    L → • * R
    L → • id
    R → • L
I1: S' → S •
I2: S → L • = R
    R → L •
I3: L → id •
I4: S → R •
I5: L → * • R
    R → • L
    L → • * R
    L → • id
I6: S → L = • R
    R → • L
    L → • * R
    L → • id
I7: R → L •
I8: L → * R •
I9: S → L = R •

Transitions: goto(I0,S)=I1, goto(I0,L)=I2, goto(I0,R)=I4, goto(I0,id)=I3, goto(I0,*)=I5; goto(I2,=)=I6; goto(I5,L)=I7, goto(I5,R)=I8, goto(I5,*)=I5, goto(I5,id)=I3; goto(I6,L)=I7, goto(I6,R)=I9, goto(I6,*)=I5, goto(I6,id)=I3.
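The whole collection can be generated mechanically: closure plus a goto function, iterated from the start state until no new states appear. A self-contained sketch (state numbering may differ from the slide):

```python
GRAMMAR = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
           ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
NT = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NT:
            for l2, r2 in GRAMMAR:
                if l2 == rhs[dot] and (l2, r2, 0) not in items:
                    items.add((l2, r2, 0))
                    work.append((l2, r2, 0))
    return frozenset(items)

def goto(state, x):
    """Advance the dot over x in every item where that is possible."""
    moved = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == x}
    return closure(moved) if moved else None

symbols = {x for _, rhs in GRAMMAR for x in rhs}
states = {closure({("S'", ("S",), 0)})}
work = list(states)
while work:                       # worklist: explore goto on every symbol
    st = work.pop()
    for x in symbols:
        nxt = goto(st, x)
        if nxt is not None and nxt not in states:
            states.add(nxt)
            work.append(nxt)

print(len(states))   # 10 states, matching I0..I9 above
```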

SLR parsing

The automaton and the FOLLOW sets tell us how to build the parsing table:

Shift actions
- If from state i you can go to state j when parsing a token t, then slot [i, t] of the table should contain the action "shift and go to state j", written sj.

Reduce actions
- If a state i contains a handle A → α •, then slot [i, t] of the table should contain the action "reduce using A → α" for all tokens t that are in FOLLOW(A). This is written r(A → α).
- The reasoning is that if the lookahead is a symbol that may follow A, then the reduction A → α should lead closer to a successful parse.

(continued on next slide)

SLR parsing

The automaton and the FOLLOW sets tell us how to build the parsing table:

Goto actions
- Transitions on non-terminals represent several steps together that have resulted in a reduction. For example, if we are in state 0 and parse a bit of input that ends up being reduced to an L, then we should go to state 2. Such actions are recorded in a separate part of the parsing table, called the GOTO part.

SLR parsing

Before we can build the parsing table, we need to compute the FOLLOW sets:

Grammar:        FOLLOW sets:
S' → S          FOLLOW(S') = {$}
S → L = R       FOLLOW(S)  = {$}
S → R           FOLLOW(L)  = {$, =}
L → * R         FOLLOW(R)  = {$, =}
L → id
R → L
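These FOLLOW sets can be computed by fixpoint iteration. A sketch for this particular grammar (it has no ε-productions, so FIRST of a symbol string is just FIRST of its first symbol, which keeps the code short):

```python
GRAMMAR = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
           ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
NT = {lhs for lhs, _ in GRAMMAR}

# FIRST sets: a terminal's FIRST is itself; non-terminals start empty.
FIRST = {}
for lhs, rhs in GRAMMAR:
    FIRST.setdefault(lhs, set())
    for x in rhs:
        FIRST.setdefault(x, set() if x in NT else {x})
changed = True
while changed:
    changed = False
    for lhs, rhs in GRAMMAR:      # no ε-productions: only the first rhs symbol matters
        if not FIRST[rhs[0]] <= FIRST[lhs]:
            FIRST[lhs] |= FIRST[rhs[0]]
            changed = True

# FOLLOW sets: seed the start symbol with the end marker $.
FOLLOW = {a: set() for a in NT}
FOLLOW["S'"] = {"$"}
changed = True
while changed:
    changed = False
    for lhs, rhs in GRAMMAR:
        for i, x in enumerate(rhs):
            if x in NT:
                # What can follow x here: FIRST of the next symbol,
                # or FOLLOW(lhs) if x ends the production.
                add = FIRST[rhs[i + 1]] if i + 1 < len(rhs) else FOLLOW[lhs]
                if not add <= FOLLOW[x]:
                    FOLLOW[x] |= add
                    changed = True

print(FOLLOW["L"], FOLLOW["R"])   # both {'=', '$'} (in some order)
```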

SLR parsing

state |              action               |    goto
      |  id |     =      |  * |     $     |  S |  L |  R
  0   |  s3 |            | s5 |           |  1 |  2 |  4
  1   |     |            |    |  accept   |    |    |
  2   |     | s6/r(R→L)  |    |  r(R→L)   |    |    |
  3   |     | r(L→id)    |    |  r(L→id)  |    |    |
  4   |     |            |    |  r(S→R)   |    |    |
  5   |  s3 |            | s5 |           |    |  7 |  8
  6   |  s3 |            | s5 |           |    |  7 |  9
  7   |     | r(R→L)     |    |  r(R→L)   |    |    |
  8   |     | r(L→*R)    |    |  r(L→*R)  |    |    |
  9   |     |            |    | r(S→L=R)  |    |    |

Note the shift/reduce conflict in state 2 when the lookahead is an =.
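With the table in hand, the driver loop from the "How does it work?" slide is short. A sketch that hard-codes the table above, with the state-2 conflict resolved in favor of the shift (which, as the next slides argue, is the only choice that can lead to a successful parse); ("s", j) means shift and go to state j, ("r", A, n) means reduce by a production A → rhs with |rhs| = n:

```python
ACTION = {
    (0, "id"): ("s", 3), (0, "*"): ("s", 5),
    (1, "$"): "accept",
    (2, "="): ("s", 6), (2, "$"): ("r", "R", 1),
    (3, "="): ("r", "L", 1), (3, "$"): ("r", "L", 1),
    (4, "$"): ("r", "S", 1),
    (5, "id"): ("s", 3), (5, "*"): ("s", 5),
    (6, "id"): ("s", 3), (6, "*"): ("s", 5),
    (7, "="): ("r", "R", 1), (7, "$"): ("r", "R", 1),
    (8, "="): ("r", "L", 2), (8, "$"): ("r", "L", 2),
    (9, "$"): ("r", "S", 3),
}
GOTO = {(0, "S"): 1, (0, "L"): 2, (0, "R"): 4,
        (5, "L"): 7, (5, "R"): 8, (6, "L"): 7, (6, "R"): 9}

def parse(tokens):
    stack, pos = [0], 0            # stack of states only
    tokens = tokens + ["$"]
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            return False           # blank table entry: error
        if act == "accept":
            return True
        if act[0] == "s":          # shift: push the new state
            stack.append(act[1])
            pos += 1
        else:                      # reduce: pop |rhs| states, then GOTO
            _, lhs, n = act
            del stack[len(stack) - n:]
            stack.append(GOTO[(stack[-1], lhs)])

print(parse(["id", "=", "id"]), parse(["id", "id"]))   # True False
```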

Conflicts in LR parsing

There are two types of conflicts in LR parsing:
- shift/reduce: on some particular lookahead it is possible to shift or to reduce. The if/else ambiguity would give rise to a shift/reduce conflict.
- reduce/reduce: a state contains more than one handle that may be reduced on the same lookahead.

Conflicts in SLR parsing

The parser we built has a shift/reduce conflict. Does that mean that the original grammar was ambiguous? Not necessarily. Let's examine the conflict: it occurs when we have parsed an L and are seeing an =. A reduce at that point would turn the L into an R. However, note that a reduction at that point would never actually lead to a successful parse. In practice, L should only be reduced to an R when the lookahead is EOF ($). An easy way to understand this is to consider that L represents l-values while R represents r-values.

Conflicts in SLR parsing

The conflict occurred because we made a decision about when to reduce based on which tokens may follow a non-terminal at any time. However, the fact that a token t may follow a non-terminal N in some derivation does not necessarily imply that t will follow N in some other derivation. SLR parsing does not make that distinction.

Conflicts in SLR parsing

SLR parsing is weak. Solution: instead of using general FOLLOW information, keep track of exactly which tokens may follow a non-terminal in each possible derivation, and perform reductions based on that knowledge. Save this information in the states. This gives rise to LR(1) items: items where we also save the possible lookaheads.

Canonical LR(1) parsing

In the beginning, all we know is that we have not read any input (S' → • S), we hope to parse an S, and after that we should expect to see a $ as lookahead. We write this as: S' → • S, $

Now consider a general item A → α • B β, x. It means that we have parsed an α, we hope to parse B β, and after those we should expect an x. Recall that if there is a production B → γ, we should add B → • γ to the state. What kind of lookahead should we expect to see after we have parsed γ? We should expect to see whatever starts β. If β is empty or can vanish, then we should expect to see an x after we have parsed γ (and reduced it to B).

Canonical LR(1) parsing

The closure function for LR(1) items is then defined as follows:

For each item A → α • B β, x in state I, each production B → γ in the grammar, and each terminal b in FIRST(βx), add B → • γ, b to I.

If a state contains a core item A → α • β with multiple possible lookaheads b1, b2, ..., we write A → α • β, b1/b2 as shorthand for A → α • β, b1 and A → α • β, b2.
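The LR(1) closure rule just stated can be sketched in Python (an item is now (lhs, rhs, dot, lookahead); the grammar has no ε-productions, so FIRST of a non-empty string is FIRST of its first symbol, and the FIRST sets are precomputed for brevity):

```python
GRAMMAR = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
           ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
NT = {lhs for lhs, _ in GRAMMAR}
FIRST = {"S'": {"*", "id"}, "S": {"*", "id"},
         "L": {"*", "id"}, "R": {"*", "id"}}

def first_after(beta, x):
    """FIRST(beta x): if beta is empty use {x}, else FIRST of beta's first symbol."""
    if not beta:
        return {x}
    s = beta[0]
    return FIRST[s] if s in NT else {s}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NT:
            # For A -> alpha . B beta, x: add B -> . gamma, b for b in FIRST(beta x).
            for l2, r2 in GRAMMAR:
                if l2 == rhs[dot]:
                    for b in first_after(rhs[dot + 1:], la):
                        item = (l2, r2, 0, b)
                        if item not in items:
                            items.add(item)
                            work.append(item)
    return items

I0 = closure({("S'", ("S",), 0, "$")})
print(len(I0))   # 8 item/lookahead pairs: the L-productions get both '=' and '$'
```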

Canonical LR(1) parsing S L=  R, $ R   L, $ L   *R, $ L   id, $ S' S, $ S   L=R, $ S   R, $ L   *R, =/$ L   id, =/$ R   L, $ S' S , $ SL=R, $ I0 id L Lid, $ I3' S  L =R, $ R  L , $ = I2 * L R L, $ I7' * L *R, $ R  L, $ L  id, $ L  *R, $ L *R, =/$ R  L, =/$ L  id, =/$ L  *R, =/$ L id id R I5 I5' I3' L *R , $ I3 L  id , =/$ R id I8' * L R * I4 S  R, =/$ L *R , =/$ I8 R L, =/$ I7

Canonical LR(1) parsing

The table is created in the same way as for SLR, except that we now use the possible lookahead tokens saved in each state instead of the FOLLOW sets.

Note that the conflict that had appeared in the SLR parser is now gone. However, the LR(1) parser has many more states, which is not very practical. It may be possible to merge states!

LALR(1) parsing

This is the result of an effort to reduce the number of states in an LR(1) parser. We notice that some states in our LR(1) automaton have the same core items and differ only in the possible lookahead information; furthermore, their transitions are similar. This holds for states I3 and I3', I5 and I5', I7 and I7', I8 and I8'. We shrink our parser by merging such states.

SLR: 10 states, LR(1): 14 states, LALR(1): 10 states.

LALR(1) parsing

After merging, the automaton has the states:

I0: S' → • S, $
    S → • L = R, $
    S → • R, $
    L → • * R, =/$
    L → • id, =/$
    R → • L, $
I1: S' → S •, $
I2: S → L • = R, $
    R → L •, $
I3: L → id •, =/$
I4: S → R •, $
I5: L → * • R, =/$
    R → • L, =/$
    L → • * R, =/$
    L → • id, =/$
I6: S → L = • R, $
    R → • L, $
    L → • * R, $
    L → • id, $
I7: R → L •, =/$
I8: L → * R •, =/$
I9: S → L = R •, $

Transitions: goto(I0,S)=I1, goto(I0,L)=I2, goto(I0,R)=I4, goto(I0,id)=I3, goto(I0,*)=I5; goto(I2,=)=I6; goto(I5,L)=I7, goto(I5,R)=I8, goto(I5,*)=I5, goto(I5,id)=I3; goto(I6,L)=I7, goto(I6,R)=I9, goto(I6,*)=I5, goto(I6,id)=I3.
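The merge step itself is simple to sketch: a state's core is its set of dotted productions with the lookaheads stripped, and states with equal cores are unioned. Below, I3 and I3' are the twin states from the LR(1) automaton, while I4 has a distinct core and is left alone:

```python
def core(state):
    """A state's core: its items without lookahead information."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, la in state)

def merge_by_core(states):
    merged = {}
    for st in states:
        # States with the same core are unioned, pooling their lookaheads.
        merged.setdefault(core(st), set()).update(st)
    return list(merged.values())

I3 = {("L", ("id",), 1, "="), ("L", ("id",), 1, "$")}   # L -> id . , =/$
I3p = {("L", ("id",), 1, "$")}                          # L -> id . , $
I4 = {("S", ("R",), 1, "$")}                            # S -> R . , $

lalr_states = merge_by_core([I3, I3p, I4])
print(len(lalr_states))   # 2: the three LR(1) states collapse into two LALR states
```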

Conflicts in LALR(1) parsing

Note that the conflict that had vanished when we created the LR(1) parser has not reappeared. Can LALR(1) parsers introduce conflicts that did not exist in the LR(1) parser? Unfortunately, yes. But only reduce/reduce conflicts.

Conflicts in LALR(1) parsing

LALR(1) parsers cannot introduce shift/reduce conflicts. Such conflicts are caused when a lookahead is the same as a token on which we can shift; they depend on the core of the item. But we only merge states that had the same core to begin with, so the only way for an LALR(1) parser to have a shift/reduce conflict is if one existed already in the LR(1) parser.

LALR(1) parsers can introduce reduce/reduce conflicts. Here's a situation when this might happen:

A → B •, x        A → B •, y
A → C •, y        A → C •, x

merge to give:

A → B •, x/y
A → C •, x/y
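The slide's scenario can be made concrete (the grammar fragments are hypothetical, chosen only to exhibit the pattern): each state alone has exactly one reduction per lookahead, but the merged state reduces two different productions on the same lookahead:

```python
# Two LR(1) states with the same core but swapped lookaheads.
s1 = {("A", ("B",), 1, "x"), ("A", ("C",), 1, "y")}   # A -> B., x   A -> C., y
s2 = {("A", ("B",), 1, "y"), ("A", ("C",), 1, "x")}   # A -> B., y   A -> C., x
merged = s1 | s2                                      # the LALR merge

reductions = {}
for lhs, rhs, dot, la in merged:
    if dot == len(rhs):                    # handle: dot at the right end
        reductions.setdefault(la, set()).add((lhs, rhs))

conflicts = sorted(la for la, rules in reductions.items() if len(rules) > 1)
print(conflicts)   # ['x', 'y']: both lookaheads now enable two reductions
```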