Bottom-Up Parsing.

Slides:



Advertisements
Similar presentations
Parsing V: Bottom-up Parsing
Advertisements

Compiler Construction
A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Lesson 8 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Compiler Designs and Constructions
Bottom up Parsing Bottom up parsing trys to transform the input string into the start symbol. Moves through a sequence of sentential forms (sequence of.
Predictive Parsing l Find derivation for an input string, l Build a abstract syntax tree (AST) –a representation of the parsed program l Build a symbol.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Chapter 4-2 Chang Chi-Chung Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation)  LR(0), SLR, Canonical LR = LR(1), LALR Other.
1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.
– 1 – CSCE 531 Spring 2006 Lecture 8 Bottom Up Parsing Topics Overview Bottom-Up Parsing Handles Shift-reduce parsing Operator precedence parsing Readings:
Bottom-up parsing Goal of parser : build a derivation
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Syntax and Semantics Structure of programming languages.
BOTTOM-UP PARSING Sathu Hareesh Babu 11CS10039 G-8.
Joey Paquet, 2000, 2002, 2012, Lecture 6 Bottom-Up Parsing.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 1 Chapter 4 Chapter 4 Bottom Up Parsing.
Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
Syntax and Semantics Structure of programming languages.
1 Bottom-Up Parsing  “Shift-Reduce” Parsing  Reduce a string to the start symbol of the grammar.  At every step a particular substring is matched (in.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
1 Chapter 6 Bottom-Up Parsing. 2 Bottom-up Parsing A bottom-up parsing corresponds to the construction of a parse tree for an input tokens beginning at.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
COMPILER CONSTRUCTION
Syntax and Semantics Structure of programming languages.
Lec04-bottomupparser 4/13/2018 LR Parsing.
lec02-parserCFG May 8, 2018 Syntax Analyzer
Chapter 4 - Parsing CSCE 343.
Parsing Bottom Up CMPS 450 J. Moloney CMPS 450.
Programming Languages Translator
Bottom-up parsing Goal of parser : build a derivation
Compiler design Bottom-up parsing Concepts
Compiler Baojian Hua LR Parsing Compiler Baojian Hua
lec04-bottomupparser June 6, 2018 Bottom-Up Parsing
Unit-3 Bottom-Up-Parsing.
UNIT - 3 SYNTAX ANALYSIS - II
Revision ? E  TE  E   + TE  |  T  FT  T   * FT  | 
Parsing IV Bottom-up Parsing
Fall Compiler Principles Lecture 4: Parsing part 3
Bottom-Up Syntax Analysis
Syntax Analysis Part II
4 (c) parsing.
Shift Reduce Parsing Unit -3
Chapter-3 Scanning and Parsing
Subject Name:COMPILER DESIGN Subject Code:10CS63
Lexical and Syntax Analysis
4d Bottom Up Parsing.
BOTTOM UP PARSING Lecture 16.
Lecture 8 Bottom Up Parsing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Compiler Design 7. Top-Down Table-Driven Parsing
Bottom Up Parsing.
R.Rajkumar Asst.Professor CSE
Parsing IV Bottom-up Parsing
Compiler SLR Parser.
Bottom-Up Parsing “Shift-Reduce” Parsing
4d Bottom Up Parsing.
Kanat Bolazar February 16, 2010
BNF 9-Apr-19.
4d Bottom Up Parsing.
4d Bottom Up Parsing.
4d Bottom Up Parsing.
lec02-parserCFG May 27, 2019 Syntax Analyzer
Chap. 3 BOTTOM-UP PARSING
4d Bottom Up Parsing.
Compiler design Bottom-up parsing: Canonical LR and LALR
Presentation transcript:

Bottom-Up Parsing

lec04-bottomupparser June 6, 2018 Bottom-Up Parsing A bottom-up parser creates the parse tree of the given input starting from leaves towards the root. S  ...   (the right-most derivation of )  (the bottom-up parser finds the right-most derivation in the reverse order) Bottom-up parsing is also known as SR parsing because its two main actions are shift and reduce. At each shift action, the current symbol in the input string is pushed to a stack. At each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a production) will replaced by the non-terminal at the left side of that production. There are also two more actions: accept and error.

Shift-Reduce Parsing A shift-reduce parser tries to reduce the given input string into the starting symbol. a string  the starting symbol reduced to At each reduction step, a substring of the input matching to the right side of a production rule is replaced by the non-terminal at the left side of that production rule. Rightmost Derivation: S   Shift-Reduce Parser finds: S  ...   * rm rm rm

Shift-Reduce Parsing -- Example S  aABb input string: aaabb A  aA | a aaAbb B  bB | b aAbb  reduction aABb S S  aABb  aAbb  aaAbb  aaabb How do we know which substring to be replaced at each reduction step? rm rm rm rm

Handle Informally, a handle of a string is a substring that matches the right side of a production rule. But not every substring matches the right side of a production rule is handle Handle: It is a symbol or right side of any production rule, taken and replaced by left side to a production to the start symbol If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.

Handle Pruning A right-most derivation in reverse can be obtained by handle-pruning. S=0  1  2  ...  n-1  n=  input string Start from n, find a handle Ann in n, and replace n in by An to get n-1. Then find a handle An-1n-1 in n-1, and replace n-1 in by An-1 to get n-2. Repeat this, until we reach S. rm rm rm rm rm

A Shift-Reduce Parser E  E+T | T Right-Most Derivation of id+id*id T  T*F | F E  E+T  E+T*F  E+T*id  E+F*id F  (E) | id  E+id*id  T+id*id  F+id*id  id+id*id Right-Most Sentential Form Reducing Production id+id*id F  id F+id*id T  F T+id*id E  T E+id*id F  id E+F*id T  F E+T*id F  id E+T*F T  T*F E+T E  E+T E Handles are red and underlined in the right-sentential forms.

A Stack Implementation of A Shift-Reduce Parser There are four possible actions of a shift-parser action: Shift : The next input symbol is shifted onto the top of the stack. Reduce: Replace the handle on the top of the stack by the non-terminal. Accept: Successful completion of parsing. Error: Parser discovers a syntax error, and calls an error recovery routine. Initial stack just contains only the end-marker $. The end of the input string is marked by the end-marker $.

A Stack Implementation of A Shift-Reduce Parser E  E+T | T ; T  T*F | F ; F  (E) | id RMD id+id*id ; E  E+T  E+T*F  E+T*id  E+F*id  E+id*id  T+id*id  F+id*id  id+id*id Stack Input Action $ id+id*id$ shift $id +id*id$ reduce by F  id Parse Tree $F +id*id$ reduce by T  F $T +id*id$ reduce by E  T E 8 $E +id*id$ shift $E+ id*id$ shift E 3 + T 7 $E+id *id$ reduce by F  id $E+F *id$ reduce by T  F T 2 T 5 * F 6 $E+T *id$ shift $E+T* id$ shift F 1 F 4 id $E+T*id $ reduce by F  id $E+T*F $ reduce by T  T*F id id $E+T $ reduce by E  E+T $E $ accept

Shift-Reduce Parsers There are two main categories of shift-reduce parsers LR-Parsers covers wide range of grammars. SLR – simple LR parser LR – most general LR parser LALR – intermediate LR parser (look ahead LR parser) SLR, LR and LALR work same, only their parsing tables are different. Operator-Precedence Parser simple, but only a small class of grammars.

LR Parsers The most powerful shift-reduce parsing (yet efficient) is: LR parsing. left to right right-most scanning derivation LR parsing is attractive because: LR parsing is most general non-backtracking shift-reduce parsing, yet it is still efficient. An LR-parser can detect a syntactic error as soon as it is possible to do so a left-to-right scan of the input.

LR Parsing Algorithm input stack output a1 ... ai an $ Sm Xm Sm-1 Xm-1 Action Table terminals and $ s t four different a actions t e Goto Table non-terminal t each item is a a state number

LR – Parsing Algorithm Step 1 : The stack is initialized with [0]. The current state will always be the state that is at the top of the stack. Step 2: Given the current state and the current terminal on the input stream an action is looked up in the action table. There are four cases: a shift sn: the current terminal is removed from the input stream the state n is pushed onto the stack and becomes the current state a reduce rm: the number m is written to the output stream for every symbol in the right-hand side of rule m a state is removed from the stack an accept: the string is accepted no action: a syntax error is reported Step 3 : Step 2 is repeated until either the string is accepted or a syntax error is reported.

SLR parsing A problem with LL(1) parsing is that most grammars need extensive rewriting to get them into a form that allows unique choice of production. A class of bottom-up methods for parsing called LR parsers exist which accept a much larger class of grammars. The main advantage of LR parsing is that less rewriting is required to get a grammar in acceptable form, also there are languages for which there exist LR-acceptable grammars but no LL(1) grammars. We start our discussion with SLR for the following reasons: It is simpler. In practice, LALR(1) handles few grammars that are not also handled by SLR. When a grammar is in the SLR class, the parse-tables produced by SLR are identical to those produced by LALR(1). Understanding of SLR principles is sufficient to know how a grammar should be rewritten when a LALR(1) parser generator rejects it.

SLR parsing The letters SLR stand for Simple, Left and Right. Left input is read from left to right & Right  a RMD is built. LR parsers are table-driven bottom-up parsers and use two kinds of actions involving the input stream and a stack: shift: A symbol is read from the input and pushed on the stack. reduce: On the stack, a number of symbols that are identical to the right-hand side of a production are replaced by the left-hand side of that production. When all of the input is read, the stack will have a single element, which will be the start symbol of the grammar. Our aim is to make the choice of action depend only on the next input symbol and the symbol on top of the stack. To achieve this, we construct a DFA.

SLR parsing Conceptually, this DFA reads the contents of the stack, starting from the bottom. If the DFA is in an accepting state when it reaches the top of the stack, it will cause reduction by a production that is determined by the state and the next input symbol. If the DFA is not in an accepting state, it will cause a shift. Letting the DFA read the entire stack at every action is not very efficient, so, instead, we keep track of the DFA state every time we push an element on the stack, storing the state as part of the stack element.

SLR parsing We represent the DFA as a table, where we cross-index a DFA state with a symbol (terminal or nonterminal) and find one of the following actions: shift n: Read next input symbol, push state n on the stack. go n: Push state n on the stack. reduce p: Reduce with the production numbered p. accept: Parsing has completed successfully. error: A syntax error has been detected. Note that the current state is always found at the top of the stack. Shift and reduce actions are found when a state is cross-indexed with a terminal symbol. Go actions are found when a state is cross-indexed with a nonterminal.

Constructing SLR parse tables An SLR parse table has as its core a DFA. We first construct an NFA using a new techniques to convert this into a DFA. We study the procedure by considering the grammar below.

Constructing SLR parse tables The first step is to extend the grammar with a new starting production. Doing this the grammar above yields the grammar below.

Constructing SLR parse tables The next step is to make an NFA for each production. This is done by treating both terminals and non terminals as alphabet symbols. The accepting state of each NFA is labeled with the number of the corresponding production. NFAs for the productions of the grammar above

Constructing SLR parse tables The NFAs in the figure above make transitions both on terminals and non terminals. Transitions by terminal corresponds to shift actions and transitions on non terminals correspond to go actions. A go action happens after a reduction, whereby some elements of the stack (corresponding to the right-hand side of a production) are replaced by a nonterminal (corresponding to the left-hand side of that production).

Constructing SLR parse tables To achieve this we must, whenever a transition by a nonterminal is possible, also allow transitions on the symbols on the right-hand side of a production for that nonterminal so these eventually can be reduced to the nonterminal. We do this by adding epsilon-transitions to the NFAs. Whenever there is a transition from state s to state t on a nonterminal N, we add epsilon-transitions from s to the initial states of all the NFAs for productions with N on the left-hand side. The epsilon-transitions are shown in table below.

Constructing SLR parse tables Together with epsilon-transitions, the NFAs form a single, combined NFA. This NFA has the starting state A (the starting state of the NFA for the added start production) and an accepting state for each production in the grammar.

Conversion of NFA to DFA We must now convert this NFA into a DFA using the subset construction. ε-closure (A) = {A, C, E, I, J} ε-closure (B) = {B} ε-closure (C) = {C, I, J} ε-closure (D) = {D} ε-closure (E) = { }={E} ε-closure (F) = {C, E, F, I, J} ε-closure (G) = { } ={ G} ε-closure (H) = {H} ε-closure (I) = {I} ε-closure (J) = { } ={J} ε-closure (K) = {K, I, J} ε-closure (L) = {L}

NFA to DFA conversion T along with S & empty b c T R {A, C, E, I, J} {C, E, F, I, J} {K, I, J} ∮ {B} {D} 1 2 3 {G} 4 {L} 5 {H} 6 7

DFA

Table representation Instead of showing the resulting DFA graphically, we construct a table where transitions on terminals are shown as shift actions and transitions on non terminals as go actions. The table below shows the DFA constructed from the NFA made by adding epsilon-transitions

Constructing SLR parse tables SLR DFA for the grammar above

Operator-Precedence Parser Operator grammar small, but an important class of grammars we may have an efficient operator precedence parser (a shift-reduce parser) for an operator grammar. In an operator grammar, no production rule can have:  at the right side two adjacent non-terminals at the right side. Ex: EAB EEOE EE+E | Aa Eid E*E | Bb O+|*|/ E/E | id not operator grammar not operator grammar operator grammar

Operator – Precedence Parser (OPP) Bottom-up parsers for a class of CFG can be easily developed using operator grammars. Operator grammars have the property that no production right side is empty or has two adjacent non terminals. These parser rely on the following three precedence relations: Relation Meaning a < b a yields precedence to b a = b a has the same precedence as b a > b a takes precedence over b

OPP There are two common ways of determining precedence relation hold between a pair of terminals. 1. Based on associativity and precedence of operators 2. Using operator precedence relation. For Ex, * have higher precedence than +. We make + < * and * > +

Opr Val Input String Action $ $ id+id. id$ Shift < $ $id +id Opr Val Input String Action $ $ id+id*id$ Shift < $ $id +id*id$ Shift $+ $id id*id$ Shift $+ $id id *id$ Shift $+* $id id id$ Shift $+* $id id id $ Reduce > $+ $id id2 $ Reduce $ $id + id2 $ Accept

Disadvantages of Operator Precedence Parsing It is difficult to handle operators (like - unary minus)which have different precedence (the lexical analyzer should handle the unary minus). Small class of grammars. Difficult to decide which language is recognized by the grammar. Advantages: Simple and Easy to implement powerful enough for expressions in programming languages Can be constructed by hand after understanding the grammar. Simple to debug