10/10/2002© 2002 Hal Perkins & UW CSED-1 CSE 582 – Compilers LR Parsing Hal Perkins Autumn 2002.

Slides:



Advertisements
Similar presentations
Parsing V: Bottom-up Parsing
Advertisements

A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Lesson 8 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Compiler Designs and Constructions
Compiler Principles Fall Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University.
Bottom-up Parsing A general style of bottom-up syntax analysis, known as shift-reduce parsing. Two types of bottom-up parsing: Operator-Precedence parsing.
Mooly Sagiv and Roman Manevich School of Computer Science
Predictive Parsing l Find derivation for an input string, l Build a abstract syntax tree (AST) –a representation of the parsed program l Build a symbol.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
1 Chapter 5: Bottom-Up Parsing (Shift-Reduce). 2 - attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working.
Pertemuan 12, 13, 14 Bottom-Up Parsing
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
CH4.1 CSE244 Sections 4.5,4.6 Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road, Box U-155 Storrs,
Chapter 4-2 Chang Chi-Chung Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation)  LR(0), SLR, Canonical LR = LR(1), LALR Other.
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
1 LR parsing techniques SLR (not in the book) –Simple LR parsing –Easy to implement, not strong enough –Uses LR(0) items Canonical LR –Larger parser but.
CH4.1 CSE244 Introduction to LR Parsing Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road, Box.
LR(1) Languages An Introduction Professor Yihjia Tsai Tamkang University.
1 CIS 461 Compiler Design & Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #12 Parsing 4.
1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.
Bottom-up parsing Goal of parser : build a derivation
Lexical and syntax analysis
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Syntax and Semantics Structure of programming languages.
BOTTOM-UP PARSING Sathu Hareesh Babu 11CS10039 G-8.
410/510 1 of 21 Week 2 – Lecture 1 Bottom Up (Shift reduce, LR parsing) SLR, LR(0) parsing SLR parsing table Compiler Construction.
1 Compiler Construction Syntax Analysis Top-down parsing.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
CH4.1 CSE244 Sections 4.5,4.6 Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road, Box U-155 Storrs,
CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 1 Chapter 4 Chapter 4 Bottom Up Parsing.
Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
Chapter 3-3 Chang Chi-Chung Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation)  LR(0), SLR, Canonical LR = LR(1), LALR 
Syntax and Semantics Structure of programming languages.
1 Bottom-Up Parsing  “Shift-Reduce” Parsing  Reduce a string to the start symbol of the grammar.  At every step a particular substring is matched (in.
Chapter 5: Bottom-Up Parsing (Shift-Reduce)
Prof. Necula CS 164 Lecture 8-91 Bottom-Up Parsing LR Parsing. Parser Generators. Lecture 6.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
Bernd Fischer RW713: Compiler and Software Language Engineering.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
Parsing V LR(1) Parsers. LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited right context (1 token) for handle recognition.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Lecture 5: LR Parsing CS 540 George Mason University.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
1 Chapter 6 Bottom-Up Parsing. 2 Bottom-up Parsing A bottom-up parsing corresponds to the construction of a parse tree for an input tokens beginning at.
Syntax and Semantics Structure of programming languages.
CSE P501 – Compilers LR Parsing Build Bottom-Up Parse Tree Handles
Parsing Bottom Up CMPS 450 J. Moloney CMPS 450.
Programming Languages Translator
Unit-3 Bottom-Up-Parsing.
UNIT - 3 SYNTAX ANALYSIS - II
Parsing IV Bottom-up Parsing
Fall Compiler Principles Lecture 4: Parsing part 3
Bottom-Up Syntax Analysis
Subject Name:COMPILER DESIGN Subject Code:10CS63
4d Bottom Up Parsing.
BOTTOM UP PARSING Lecture 16.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Bottom Up Parsing.
LL and Recursive-Descent Parsing Hal Perkins Autumn 2011
Bottom-Up Parsing “Shift-Reduce” Parsing
Kanat Bolazar February 16, 2010
LL and Recursive-Descent Parsing Hal Perkins Autumn 2009
LL and Recursive-Descent Parsing Hal Perkins Winter 2008
Parsing CSCI 432 Computer Science Theory
Presentation transcript:

10/10/2002© 2002 Hal Perkins & UW CSED-1 CSE 582 – Compilers LR Parsing Hal Perkins Autumn 2002

10/10/2002© 2002 Hal Perkins & UW CSED-2 Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts

10/10/2002© 2002 Hal Perkins & UW CSED-3 LR(1) Parsing We’ll look at LR(1) parsers Left to right scan, Rightmost derivation, 1 symbol lookahead Most practical programming languages have an LR(1) grammar LALR(1), SLR(1), etc. – subsets of LR(1)

10/10/2002© 2002 Hal Perkins & UW CSED-4 Bottom-Up Parsing Idea: Read the input left to right Whenever we’ve matched the right hand side of a production, reduce it to the appropriate non-terminal and add that non-terminal to the parse tree The upper edge of this partial parse tree is known as the frontier

10/10/2002© 2002 Hal Perkins & UW CSED-5 Example Grammar S ::= aABe A ::= Abc | b B ::= d Bottom-up Parse a b b c d e

10/10/2002© 2002 Hal Perkins & UW CSED-6 Details The bottom-up parser reconstructs a reverse rightmost derivation Given the rightmost derivation S =>  1 =>  2 =>…=>  n-2 =>  n-1 =>  n = w the parser will first discover  n-1 =>  n, then  n-2 =>  n-1, etc. Parsing terminates when  1 reduced to S (success), or No match can be found (syntax error)

10/10/2002© 2002 Hal Perkins & UW CSED-7 How Do We Automate This? Key: given what we’ve already seen and the next input symbol, decide what to do. Choices: Perform a reduction Look ahead further Can reduce A=>  if both of these hold: A=>  is a valid production A=>  is a step in this rightmost derivation This is known as a shift-reduce parser

10/10/2002© 2002 Hal Perkins & UW CSED-8 Sentential Forms If S =>* , the string  is called a sentential form of the of the grammar In the derivation S =>  1 =>  2 =>…=>  n-2 =>  n-1 =>  n = w each of the  i are sentential forms A sentential form in a rightmost derivation is called a right-sentential form (similarly for leftmost and left-sentential)

10/10/2002© 2002 Hal Perkins & UW CSED-9 Handles Informally, a substring of the tree frontier that matches the right side of a production Even if A::=  is a production,  is a handle only if it matches the frontier at a point where A::=  was used in the derivation  may appear in many other places in the frontier without being a handle for that production

10/10/2002© 2002 Hal Perkins & UW CSED-10 Handles (cont.) Formally, a handle of a right-sentential form  is a production A ::=  and a position in  where  may be replaced by A to produce the previous right- sentential form in the rightmost derivation of 

10/10/2002© 2002 Hal Perkins & UW CSED-11 Handle Examples In the derivation S => aABe => aAde => aAbcde => abbcde abbcde is a right sentential form whose handle is A::=b at position 2 aAbcde is a right sentential form whose handle is A::=Abc at position 4 Note: some books take the left of the match as the position (e.g., Dragon Book)

10/10/2002© 2002 Hal Perkins & UW CSED-12 Implementing Shift-Reduce Parsers Key Data structures A stack holding the frontier of the tree A string with the remaining input

10/10/2002© 2002 Hal Perkins & UW CSED-13 Shift-Reduce Parser Operations Reduce – if the top of the stack is the right side of a handle A::= , pop the right side  and push the left side A. Shift – push the next input symbol onto the stack Accept – announce success Error – syntax error discovered

10/10/2002© 2002 Hal Perkins & UW CSED-14 Shift-Reduce Example StackInputAction $abbcde$shift S ::= aABe A ::= Abc | b B ::= d

10/10/2002© 2002 Hal Perkins & UW CSED-15 How Do We Automate This? Def. Viable prefix – a prefix of a right- sentential form that can appear on the stack of the shift-reduce parser Equivalent: a prefix of a right-sentential form that does not continue past the rightmost handle of that sentential form Idea: Construct a DFA to recognize viable prefixes given the stack and remaining input Perform reductions when we recognize them

10/10/2002© 2002 Hal Perkins & UW CSED-16 DFA for prefixes of S ::= aABe A ::= Abc | b B ::= d start a A ::= b B ::= d bd Abc A ::= Abc B e S ::= aABe accept $

10/10/2002© 2002 Hal Perkins & UW CSED-17 Trace StackInput $abbcde$ S ::= aABe A ::= Abc | b B ::= d start a A ::= b B ::= d bd Abc A ::= Abc B e S ::= aABe accept $

10/10/2002© 2002 Hal Perkins & UW CSED-18 Observations Way too much backtracking We want the parser to run in time proportional to the length of the input Where the heck did this DFA come from anyway? From the underlying grammar We’ll defer construction details for now

10/10/2002© 2002 Hal Perkins & UW CSED-19 Avoiding DFA Rescanning Observation: after a reduction, the contents of the stack are the same as before except for the new non-terminal on top  Scanning the stack will take us through the same transitions as before until the last one  If we record state numbers on the stack, we can go directly to the appropriate state when we pop the right hand side of a production from the stack

10/10/2002© 2002 Hal Perkins & UW CSED-20 Stack Change the stack to contain pairs of states and symbols from the grammar $s 0 X 1 S 1 X 1 S 1 … X n S n State s 0 represents the accept state (Not always added – depends on particular presentation) Observation: in an actual parser, only the state numbers need to be pushed, since they implicitly contain the symbol information, but for explanations, it’s clearer to use both.

10/10/2002© 2002 Hal Perkins & UW CSED-21 Encoding the DFA in a Table A shift-reduce parser’s DFA can be encoded in two tables One row for each state action table encodes what to do given the current state and the next input symbol goto table encodes the transitions to take after a reduction

10/10/2002© 2002 Hal Perkins & UW CSED-22 Actions (1) Given the current state and input symbol, the main possible actions are si – shift the input symbol and state i onto the stack (i.e., shift and move to state i ) rj – reduce using grammar production j The production number tells us how many pairs to pop off the stack

10/10/2002© 2002 Hal Perkins & UW CSED-23 Actions (2) Other possible action table entries accept blank – no transition – syntax error A LR parser will detect an error as soon as possible on a left-to-right scan A real compiler needs to produce an error message, recover, and continue parsing when this happens

10/10/2002© 2002 Hal Perkins & UW CSED-24 Goto When a reduction is done, pairs are popped from the stack revealing a state uncovered_s on the top of the stack goto[uncovered_s, A] is the new state to push on the stack when reducing production A ::=  (after popping  )

10/10/2002© 2002 Hal Perkins & UW CSED-25 Reminder: DFA for S ::= aABe A ::= Abc | b B ::= d start a A ::= b B ::= d bd Abc A ::= Abc B e S ::= aABe accept $

10/10/2002© 2002 Hal Perkins & UW CSED-26 LR Parse Table for  S ::= aABe  A ::= Abc  A ::= b  B ::= d State actiongoto abcde$AB 1s2acc 2s4g3 3s6s5g8 4r3 5r4 6s7 7r2 8s9 9r1

10/10/2002© 2002 Hal Perkins & UW CSED-27 LR Parsing Algorithm (1) word = scanner.getToken(); while (true) { s = top of stack; if (action[s, word] = si ) { push word; push i (state); word = scanner.getToken(); } else if (action[s, word] = rj ) { pop 2 * length of right side of production j (2*|  |); uncovered_s = top of stack; push left side A of production j ; push state goto[uncovered_s, A]; } } else if (action[s, word] = accept ) { return; } else { // no entry in action table report syntax error; halt or attempt recovery; }

10/10/2002© 2002 Hal Perkins & UW CSED-28 Example Stack Input $ abbcde$ S actiongoto abcde$AB 1s2ac 2s4g3 3s6s5g8 4r3 5r4 6s7 7r2 8s9 9r1  S ::= aABe  A ::= Abc  A ::= b  B ::= d

10/10/2002© 2002 Hal Perkins & UW CSED-29 LR States Idea is that each state encodes The set of all possible productions that we could be looking at, given the current state of the parse, and Where we are in the right hand side of each of those productions

10/10/2002© 2002 Hal Perkins & UW CSED-30 Items An item is a production with a dot in the right hand side Example: Items for production A ::= XY A ::=.XY A ::= X.Y A ::= XY. Idea: The dot represents a position in the production

10/10/2002© 2002 Hal Perkins & UW CSED-31 DFA for S ::= aABe A ::= Abc | b B ::= d S ::=.aABe S ::= a.ABe A ::=.Abc A ::=.b A ::= b. accept $ a b S ::= aA.Be A ::= A.bc B ::=.d A B ::= d. d b A ::= Ab.c A ::= Abc. c B S ::= aAB.e e S ::= aABe

10/10/2002© 2002 Hal Perkins & UW CSED-32 Problems with Grammars Grammars can cause to problems when constructing a LR parser Shift-reduce conflicts Reduce-reduce conflicts

10/10/2002© 2002 Hal Perkins & UW CSED-33 Shift-Reduce Conflicts Situation: both a shift and a reduce are possible at a given point in the parse (equivalently: in a particular state of the DFA) Classic example: if-else statement S ::= ifthen S | ifthen S else S

10/10/2002© 2002 Hal Perkins & UW CSED-34 Parser States for State 3 has a shift- reduce conflict Can shift past else into state 4 (s4) Can reduce (r1) S ::= ifthen S (Note: other S ::=.ifthen items not included in states 2-4 to save space)  S ::= ifthen S  S ::= ifthen S else S S ::=.ifthen S S ::=.ifthen S else S ifthen 1 S ::= ifthen.S S ::= ifthen.S else S S 2 S ::= ifthen S. S ::= ifthen S.else S else 3 S ::= ifthen S else.S 4

10/10/2002© 2002 Hal Perkins & UW CSED-35 Solving Shift-Reduce Conflicts Fix the grammar Use a parse tool with a “longest match” rule – i.e., if there is a conflict, choose to shift instead of reduce Does exactly what we want for if-else case Moral: a few shift-reduce conflicts are fine, but be sure that they do what you want

10/10/2002© 2002 Hal Perkins & UW CSED-36 Reduce-Reduce Conflicts Situation: two different reductions are possible in a given state Contrived example S ::= A S ::= B A ::= x B ::= x

10/10/2002© 2002 Hal Perkins & UW CSED-37 Parser States for State 2 has a reduce-reduce conflict (r3, r4) S ::=.A S ::=.B A ::=.x B ::=.x x 1 A ::= x. B ::= x. 2  S ::= A  S ::= B  A ::= x  B ::= x

10/10/2002© 2002 Hal Perkins & UW CSED-38 Handling Reduce-Reduce Conflicts These normally indicate a serious problem with the grammar. Fixes Use a different kind of parser generator that takes lookahead information into account when constructing the states (LR(1) instead of SLR(1) for example) Most practical tools already use this information Fix the grammar

10/10/2002© 2002 Hal Perkins & UW CSED-39 Another Reduce-Reduce Conflict Suppose the grammar separates arithmetic and boolean expressions expr ::= aexp | bexp aexp ::= aexp * aident | aident bexp ::= bexp && bident | bident aident ::= id bident ::= id This will create a reduce-reduce conflict

10/10/2002© 2002 Hal Perkins & UW CSED-40 Covering Grammars A solution is to merge aident and bident into a single non-terminal (or use id in place of aident and bident everywhere they appear) This is a covering grammar Includes some programs that are not generated by the original grammar Use the type checker or other static semantic analysis to weed out illegal programs later

10/10/2002© 2002 Hal Perkins & UW CSED-41 Coming Attractions Constructing LR tables We’ll present a simple version (SLR(0)) in lecture, then talk about extending it to LR(1) LL parsers and recursive descent Continue reading ch. 3