Chap. 3 BOTTOM-UP PARSING


LR parsers
The technique called LR(k) parsing can be used to parse a large class of context-free grammars. The "L" stands for left-to-right scanning of the input, the "R" for constructing a rightmost derivation in reverse, and the k for the number of input symbols of lookahead used in making parsing decisions. When (k) is omitted, k is assumed to be 1.

LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.
- The LR parsing method is the most general nonbacktracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods.
- An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
Constructing an LR parser by hand is a lot of work; fortunately, specialized tools called LR parser generators exist.

There are three techniques for constructing an LR parsing table for a grammar.
- The first method, called simple LR (SLR), is the easiest to implement but the least powerful of the three.
- The second method, called canonical LR, is the most powerful and the most expensive.
- The third method, called lookahead LR (LALR), is intermediate in power and cost between the other two.

[Figure: model of an LR parser. The LR parsing program reads the input a1 … ai … an $, maintains a stack holding sm Xm sm-1 Xm-1 … s0 (sm on top), consults the action and goto parts of the parsing table, and produces the output.]

An LR parser consists of an input, a stack, a driver program, and a parsing table that has two parts (action and goto). The driver program is the same for all LR parsers; only the parsing table changes from one parser to another. The parsing program reads characters from an input buffer one at a time. The program uses a stack to store a string of the form s0 X1 s1 X2 s2 X3 … Xm sm, where sm is on top. Each Xi is a grammar symbol and each si is a symbol called a state. Each state symbol summarizes the information contained in the stack below it, and the combination of the state symbol on top of the stack and the current input symbol is used to index the parsing table and determine the shift-reduce parsing decision.

The parsing table consists of two parts, a parsing action function action and a goto function goto. The program driving the LR parser determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai], the parsing action table entry for state sm and input ai, which can have one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.

The function goto takes a state and grammar symbol as arguments and produces a state. In fact, the goto function of a parsing table is a transition table of a deterministic finite automaton that recognizes viable prefixes of G. The initial state of this DFA is the state initially put on top of the LR parser stack.

A configuration of an LR parser is a pair whose first component is the stack contents and whose second component is the unexpended input:
(s0 X1 s1 X2 … Xm sm , ai ai+1 … an $)
The next move of the parser is determined by reading the current input symbol ai and the state on top of the stack sm, and then consulting the parsing action table entry action[sm, ai]. The configurations resulting after each of the four types of move are as follows:

1. If action[sm, ai] = shift s, the parser executes a shift move, entering the configuration (s0 X1 s1 X2 … Xm sm ai s , ai+1 … an $). Here the parser has shifted both the current input symbol ai and the next state s, which is given in action[sm, ai], onto the stack; ai+1 becomes the current input symbol.
2. If action[sm, ai] = reduce A → β, then the parser executes a reduce move, entering the configuration (s0 X1 s1 X2 … Xm-r sm-r A s , ai ai+1 … an $), where s = goto[sm-r, A] and r is the length of β, the right side of the production. Here the parser first popped 2r symbols off the stack (r state symbols and r grammar symbols), exposing state sm-r. The parser then pushed both A, the left side of the production, and s, the entry for goto[sm-r, A], onto the stack. The current input symbol is not changed in a reduce move. For LR parsers, Xm-r+1 … Xm, the sequence of grammar symbols popped off the stack, will always match β, the right side of the reducing production.
3. If action[sm, ai] = accept, parsing is completed.
4. If action[sm, ai] = error, the parser calls an error recovery routine.

LR parsing algorithm
Input: An input string w and an LR parsing table with functions action and goto for a grammar G.
Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the following program until an accept or error action is encountered.

set ip to point to the first symbol of w$;
repeat forever
    let s be the state on top of the stack and a the symbol pointed to by ip;
    if action[s, a] = shift s' then begin
        push a then s' on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce by A → β then begin
        pop 2*|β| symbols off the stack;
        let s' be the state now on top of the stack;
        push A then goto[s', A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then return /* success */
    else error()

Example
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id
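The driver loop can be tried out on this example grammar. The following is a minimal Python sketch, not the slides' own code: the names PRODUCTIONS, ACTION, GOTO and lr_parse are made up for illustration, and the table entries are an assumption in the sense that the slides do not print the table; they are the SLR entries that the construction described later yields for this grammar. As in the text, the stack alternates states and grammar symbols, so a reduction pops twice the length of the right side.

```python
# Productions numbered as in the example: number -> (left side, right side).
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}

# ACTION[state][terminal]: ("s", j) = shift j, ("r", n) = reduce by n, ("acc",).
# Missing entries mean "error". (Assumed SLR table for the grammar above.)
ACTION = {
    0: {"id": ("s", 5), "(": ("s", 4)},
    1: {"+": ("s", 6), "$": ("acc",)},
    2: {"+": ("r", 2), "*": ("s", 7), ")": ("r", 2), "$": ("r", 2)},
    3: {"+": ("r", 4), "*": ("r", 4), ")": ("r", 4), "$": ("r", 4)},
    4: {"id": ("s", 5), "(": ("s", 4)},
    5: {"+": ("r", 6), "*": ("r", 6), ")": ("r", 6), "$": ("r", 6)},
    6: {"id": ("s", 5), "(": ("s", 4)},
    7: {"id": ("s", 5), "(": ("s", 4)},
    8: {"+": ("s", 6), ")": ("s", 11)},
    9: {"+": ("r", 1), "*": ("s", 7), ")": ("r", 1), "$": ("r", 1)},
    10: {"+": ("r", 3), "*": ("r", 3), ")": ("r", 3), "$": ("r", 3)},
    11: {"+": ("r", 5), "*": ("r", 5), ")": ("r", 5), "$": ("r", 5)},
}

# GOTO[state][nonterminal] -> state.
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers used, in the order they are reduced."""
    stack = [0]                      # s0 is the initial state
    buf = list(tokens) + ["$"]       # w$
    ip = 0
    output = []
    while True:
        s, a = stack[-1], buf[ip]
        act = ACTION[s].get(a)
        if act is None:              # empty table entry: error
            raise SyntaxError(f"unexpected {a!r} in state {s}")
        if act[0] == "s":            # shift: push a, then the new state
            stack += [a, act[1]]
            ip += 1
        elif act[0] == "r":          # reduce by A -> beta
            lhs, rhs = PRODUCTIONS[act[1]]
            del stack[len(stack) - 2 * len(rhs):]     # pop 2*|beta| symbols
            stack += [lhs, GOTO[stack[-1]][lhs]]      # push A, goto[s', A]
            output.append(act[1])
        else:                        # accept
            return output


print(lr_parse(["id", "*", "id", "+", "id"]))  # [6, 4, 6, 3, 2, 6, 4, 1]
```

Read backwards, the output is a rightmost derivation of id * id + id, which is what "rightmost derivation in reverse" refers to.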

LR grammar
A grammar for which we can construct a parsing table is said to be an LR grammar. There is a significant difference between LL and LR grammars. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production, having seen all of what is derived from that right side with k input symbols of lookahead. This requirement is far less stringent than that for LL(k) grammars, where we must be able to recognize the use of a production seeing only the first k symbols of what its right side derives. Thus, LR grammars can describe more languages than LL grammars.

Constructing SLR Parsing Tables
An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of the right side. Thus, production A → XYZ yields the four items
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
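One possible way to represent items in code, shown here as a small Python sketch: an item is a triple (left side, right side, dot position). The names lr0_items and show are hypothetical helpers chosen for this illustration.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of lhs -> rhs, one per possible dot position."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item as text, with '.' marking the dot."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(show(item))
```

A production with n symbols on its right side yields n + 1 items, which is why A → XYZ gives four.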

The Closure operation
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production, then add the item B → .γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).
Intuitively, A → α.Bβ in closure(I) indicates that, at some point in the parsing process, we think we might next see a substring derivable from Bβ as input. If B → γ is a production, we also expect we might see a substring derivable from γ at this point. For this reason we also include B → .γ in closure(I).

Example
Consider the augmented expression grammar:
E' → E
E → E + T | T
T → T * F | F
F → (E) | id

If I is the set of one item {[E' → .E]}, then closure(I) contains the items:
E' → .E
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id

function closure(I)
begin
    J := I;
    repeat
        for each item A → α.Bβ in J and each production B → γ of G
                such that B → .γ is not in J do
            add B → .γ to J
    until no more items can be added to J;
    return J
end
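The closure function translates fairly directly into Python. The sketch below assumes items are (left side, right side, dot position) triples and that the augmented expression grammar is stored in a dictionary called GRAMMAR; both are representation choices made for this illustration, not something the slides prescribe. A worklist replaces the repeat-until loop but computes the same set.

```python
# Augmented expression grammar: nonterminal -> list of right sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Closure of a set of LR(0) items given as (lhs, rhs, dot) triples."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        # Only an item with a nonterminal B right after the dot adds anything.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)      # B -> .gamma
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


I0 = closure({("E'", ("E",), 0)})
print(len(I0))  # 7
```

Returning a frozenset makes the result hashable, which is convenient when sets of items are later compared or collected into a larger set.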

The Goto operation
goto(I, X), where I is a set of items and X is a grammar symbol, is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I. Intuitively, if I is the set of items that are valid for some viable prefix γ, then goto(I, X) is the set of items that are valid for the viable prefix γX.
Example
If I is the set of two items {[E' → E.], [E → E.+T]}, then goto(I, +) consists of
E → E+.T
T → .T*F
T → .F
F → .(E)
F → .id
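A minimal Python sketch of goto, under the same assumptions as before (items as (left side, right side, dot position) triples, the augmented expression grammar in a dictionary named GRAMMAR; the function names are illustrative). It reproduces the goto(I, +) example.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Closure of a set of LR(0) items given as (lhs, rhs, dot) triples."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
for lhs, rhs, dot in sorted(goto(I, "+")):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
```

If no item in the set has the symbol after its dot, the result is the empty set, which corresponds to an error entry in the table.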

The sets-of-items construction
procedure items(G');
begin
    C := {closure({[S' → .S]})};
    repeat
        for each set of items I in C and each grammar symbol X
                such that goto(I, X) is not empty and not in C do
            add goto(I, X) to C
    until no more sets of items can be added to C
end

Example
I0 : E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I1 : E' → E., E → E.+T
I2 : E → T., T → T.*F
I3 : T → F.
I4 : F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I5 : F → id.
I6 : E → E+.T, T → .T*F, T → .F, F → .(E), F → .id
I7 : T → T*.F, F → .(E), F → .id
I8 : F → (E.), E → E.+T
I9 : E → E+T., T → T.*F
I10 : T → T*F.
I11 : F → (E).
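The collection above can be recomputed mechanically. The following Python sketch assumes the same item representation as the earlier definitions ((left side, right side, dot position) triples) and introduces a hypothetical list SYMBOLS that fixes the order in which grammar symbols are tried. With that particular order the sets come out numbered as in the example, but the numbering is otherwise arbitrary.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
# Order in which grammar symbols are tried (an assumption of this sketch).
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]


def closure(items):
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def items():
    """Canonical collection of sets of LR(0) items, in discovery order."""
    states = [closure({("E'", ("E",), 0)})]
    for state in states:             # the list grows while we iterate over it
        for x in SYMBOLS:
            target = goto(state, x)
            if target and target not in states:
                states.append(target)
    return states


C = items()
print(len(C))  # 12
```

The loop ends when a full pass over the newest sets adds nothing, which mirrors the "until no more sets of items can be added" condition.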

SLR parsing table
Input. An augmented grammar G'.
Output. The SLR parsing table functions action and goto for G'.
Method.
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
   a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
   b) If [A → α.] is in Ii, then set action[i, a] to "reduce by A → α" for all a in FOLLOW(A); here A may not be S'.
   c) If [S' → S.] is in Ii, then set action[i, $] to "accept".
   If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all nonterminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing [S' → .S].

Example
Every SLR(1) grammar is unambiguous, but there are many unambiguous grammars that are not SLR(1). Consider the grammar with productions
S → L=R
S → R
L → *R
L → id
R → L
We can think of L and R as standing for l-value and r-value, respectively, and of the * operator as meaning "contents of".

I0 : S' → .S, S → .L=R, S → .R, L → .*R, L → .id, R → .L
I1 : S' → S.
I2 : S → L.=R, R → L.
I3 : S → R.
I4 : L → *.R, R → .L, L → .*R, L → .id
I5 : L → id.
I6 : S → L=.R, R → .L, L → .*R, L → .id
I7 : L → *R.
I8 : R → L.
I9 : S → L=R.

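Consider the set of items I2. The first item, S → L.=R, gives action[2, =] = shift 6. Since FOLLOW(L) = FOLLOW(R) = {$, =}, the second item, R → L., gives action[2, =] = reduce by R → L. So state 2 has a shift/reduce conflict on input =, and the grammar is not SLR(1).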
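The conflict can be checked mechanically. The sketch below is an illustration under stated assumptions, not the slides' own procedure: items are (left side, right side, dot position) triples, the helper names (first_sets, follow_sets, slr_actions) are made up, and the FIRST/FOLLOW computation is simplified for grammars without empty productions, which holds for this grammar. It computes FOLLOW, builds I2 as goto(I0, L), and lists the SLR actions proposed for each terminal.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}
START = "S'"


def first_sets():
    """FIRST for a grammar with no empty productions (assumed here)."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for rhs in prods:
                add = first[rhs[0]] if rhs[0] in GRAMMAR else {rhs[0]}
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first


def follow_sets():
    first = first_sets()
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    if i + 1 < len(rhs):          # something follows sym
                        nxt = rhs[i + 1]
                        add = first[nxt] if nxt in GRAMMAR else {nxt}
                    else:                         # sym ends the right side
                        add = follow[nt]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow


def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(l, r, d + 1) for l, r, d in items
                    if d < len(r) and r[d] == symbol})


def slr_actions(state, follow):
    """Map each terminal to the set of SLR actions proposed in this state."""
    actions = {}
    for lhs, rhs, dot in state:
        if dot < len(rhs) and rhs[dot] not in GRAMMAR:
            actions.setdefault(rhs[dot], set()).add("shift")
        elif dot == len(rhs) and lhs != START:
            for a in follow[lhs]:
                actions.setdefault(a, set()).add(f"reduce {lhs} -> {' '.join(rhs)}")
    return actions


FOLLOW = follow_sets()
I2 = goto(closure({(START, ("S",), 0)}), "L")
ACTIONS = slr_actions(I2, FOLLOW)
print(sorted(ACTIONS["="]))  # ['reduce R -> L', 'shift']
```

Any terminal mapped to more than one action marks a conflict; here only = is affected, while $ gets the single action reduce R → L.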