Chap. 6, Bottom-Up Parsing J. H. Wang May 17, 2011.



Outline
– Overview
– Shift-Reduce Parsers
– LR(0) Table Construction
– Conflict Diagnosis
– Conflict Resolution and Table Construction

Overview
– Problems in top-down parsers
  – Left recursion
  – Common prefixes
  – (Fig vs. Fig. 5.16)
– Bottom-up parsers can handle the largest class of grammars that can be parsed deterministically


– A bottom-up parser begins with the parse tree’s leaves and moves toward its root
– A bottom-up parser traces a rightmost derivation in reverse
– A bottom-up parser uses a grammar rule to replace the rule’s RHS with its LHS
  – (Fig. 4.5 & Fig. 4.6)

– Bottom-up: from terminal symbols to the goal symbol
– Shift-reduce: the two most prevalent actions
  – Shift symbols onto the parse stack
  – Reduce a string of symbols on the stack to a nonterminal
– LR(k): scan the input from the left, producing a rightmost derivation in reverse, using k symbols of lookahead
– LR parsers are more general than LL parsers
  – Yacc: LR parser generator

Shift-Reduce Parsers
– LR parsers and rightmost derivations
  – LR parsers construct rightmost derivations in reverse
  – Fig. 6.2
– LR parsing as knitting
  – How the RHS of a production is found
  – Fig. 6.1

– In Fig. 6.1:
  – Right needle: unprocessed portion of the string
  – Left needle: parser’s stack (processed portion)
– Operations
  – Shift: transfers a symbol from the right needle to the left needle
  – Reduction: the symbols at the top of the parse stack (left needle) that match the RHS of a rule A → β are replaced by A
  – (Fig. 6.1)

LR Parsing Engine
– A simple driver for a shift-reduce parser
  – Fig. 6.3
  – Driven by a table (Sec )
  – Indexed by the parser’s current state and the next input symbol
    – Current state: the state on top of the parser stack
  – Shift and reduce actions are performed until
    – Accepted: input is reduced to the goal symbol
    – Error: no valid action is found
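The driver loop described above can be sketched in Python. This is a minimal illustration, not the pseudocode of Fig. 6.3: the grammar (Start → E $, E → E plus num | num), the hand-built `ACTION` and `GOTO` tables, and the function name `parse` are all assumptions made for the example, and the sketch uses a separate goto table instead of prepending the reduced nonterminal to the input.

```python
# Hypothetical grammar: 0: Start -> E $   1: E -> E plus num   2: E -> num
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule number -> (LHS, length of RHS)

# Hand-built LR(0) table for the grammar above (an assumption of this sketch).
ACTION = {
    0: {"num": ("shift", 2)},
    1: {"plus": ("shift", 3), "$": ("accept", None)},
    2: {t: ("reduce", 2) for t in ("num", "plus", "$")},
    3: {"num": ("shift", 4)},
    4: {t: ("reduce", 1) for t in ("num", "plus", "$")},
}
GOTO = {0: {"E": 1}}  # state exposed after a reduction -> next state


def parse(tokens):
    """Return the rule numbers applied, i.e. a rightmost derivation in reverse."""
    stack = [0]          # stack of parser states; the top is the current state
    pos = 0              # index of the next input symbol
    reductions = []
    while True:
        state = stack[-1]
        token = tokens[pos]
        action, arg = ACTION[state].get(token, ("error", None))
        if action == "shift":
            stack.append(arg)
            pos += 1
        elif action == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]                  # pop one state per RHS symbol
            stack.append(GOTO[stack[-1]][lhs])   # then move on the LHS
            reductions.append(arg)
        elif action == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
```

For the input `num plus num $`, the sketch reduces by rule 2 and then by rule 1, which is the rightmost derivation Start ⇒ E $ ⇒ E plus num $ ⇒ num plus num $ read backwards.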

(Figure: operations Push, Peek, Pop, Advance, Prepend, Error)

LR Parse Table
– Given a sentential form, the handle is defined as the sequence of symbols that will next be replaced by reduction
  – How to identify the handle
  – Which production to employ
  – (Fig. 6.4 & Fig. 6.5)

– In Fig. 6.5:
  – [s]: shift to state s
  – r: reduction by rule r
  – Blank: error actions
– A bottom-up parse of “a b b d c $”
  – Fig. 6.6 & Fig. 6.7
  – A rightmost derivation in reverse
  – Shift actions are implied by the inability to perform a useful reduction
    – Tokens are shifted until a handle appears

LR(k) Parsing
– Concept of LR parsing introduced by Knuth in 1965
– LR(k)
  – k: number of lookahead symbols used in constructing the parse table
  – LR(0) and LR(1): one symbol of lookahead at parse time
  – Number of columns in the parse table: n^k

– Properties of LR(k) parsers
  – Shifting symbols and examining lookahead until the end of the handle is found
  – Handle is reduced to a nonterminal
  – Determine whether to shift or reduce, based on the symbols already shifted (left context) and the next k lookahead symbols (right context)
– A grammar is LR(k) iff it is possible to construct an LR parse table such that k tokens of lookahead allow the parser to recognize exactly the strings in the grammar’s language
  – Deterministic: each cell in the LR parse table contains at most one entry

Formal Definition of LR(k) Grammars
– A grammar is LR(k) iff the following conditions imply αAy = γBx
  – S =>*rm αAw =>rm αβw
  – S =>*rm γBx =>rm αβy
  – First_k(w) = First_k(y)
– LR(k) parsers can always determine the correct reduction (A → β) given
  – The left context (αβ) up to the end of the handle
  – The next k symbols (First_k(w)) of the input

LR(0) Table Construction
– (Fig. 6.2)
  – E → plus E E
– LR(0) item
  – A grammar production with a bookmark that indicates the current progress through the production’s RHS
    – Fresh: E → . plus E E
    – Reducible: E → plus E E .
  – (Fig. 6.8)
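An LR(0) item can be modeled as a production plus a dot position. The sketch below is one possible representation; the class name `Item` and its method names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A production with a bookmark (dot) marking progress through the RHS."""
    lhs: str
    rhs: tuple
    dot: int = 0

    def is_fresh(self):
        # Nothing of the RHS has been seen yet.
        return self.dot == 0

    def is_reducible(self):
        # The whole RHS has been seen, so a reduction is possible.
        return self.dot == len(self.rhs)

    def next_symbol(self):
        # Symbol just after the dot, or None for a reducible item.
        return None if self.is_reducible() else self.rhs[self.dot]

    def advance(self):
        # Move the dot over one symbol, returning a new item.
        if self.is_reducible():
            raise ValueError("cannot advance a reducible item")
        return Item(self.lhs, self.rhs, self.dot + 1)

    def __str__(self):
        parts = list(self.rhs)
        parts.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(parts)}"
```

Starting from the fresh item for E → plus E E, three calls to `advance()` give the reducible item with the dot at the right end.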

– Parser state: a set of LR(0) items
– LR(0) construction algorithm
  – Fig. 6.9 & Fig
  – ComputeGoto
    – Closure of state s
    – Transitions from s
  – E.g.: Fig
– Kernel of state s
– A DFA called the CFSM (characteristic finite-state machine)
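The closure and goto steps, and the worklist that builds the CFSM from them, can be sketched as follows. The grammar (Start → E $, E → plus E E | num) is an assumption for the example, items are encoded as (rule index, dot position) pairs, and the function names `closure`, `goto`, and `build_cfsm` are illustrative rather than the names used in the figures. The end marker `$` is treated as an ordinary symbol that can be shifted.

```python
from collections import deque

# Assumed example grammar; rule 0 is the goal rule.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("plus", "E", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add a fresh item for every nonterminal that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol`; the advanced items form the kernel."""
    kernel = {(r, d + 1) for r, d in state
              if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(kernel) if kernel else None


def build_cfsm():
    """Return (states, edges), with edges mapping (state number, symbol) to a state number."""
    start = closure({(0, 0)})
    states, index, edges = [start], {start: 0}, {}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        symbols = {GRAMMAR[r][1][d] for r, d in s if d < len(GRAMMAR[r][1])}
        for x in sorted(symbols):
            t = goto(s, x)
            if t not in index:          # a state is identified by its item set
                index[t] = len(states)
                states.append(t)
                queue.append(t)
            edges[(index[s], x)] = index[t]
    return states, edges
```

For this grammar the sketch produces seven states; the state reached on `plus` loops back to itself on another `plus`, which is how the machine tracks nested prefix expressions.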

(Figure: LR(0) construction pseudocode; procedure names AddState, AdvanceDot, ProductionsFor, ExtractElement, ComputeGoto, Closure)

– The CFSM recognizes its grammar’s viable prefixes
  – Viable prefix: any prefix that does not extend beyond its handle
  – Accept state in the CFSM: a viable prefix that ends with a handle
    – Reduction (Fig. 6.12)

– For an LR(0) grammar, the following properties hold
  – Given a syntactically correct input string, the CFSM will block only in double-boxed states
  – There’s at most one item in any double-boxed state
  – If the input string is syntactically invalid, the parser will enter a state in which the offending symbol cannot be shifted
– To complete the parse table
  – (Fig & 6.14)
  – E.g.: (Fig. 6.15)
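Completing an LR(0) table from a CFSM can be sketched as below. This is an illustration under stated assumptions, not the algorithm of the figures: the grammar (Start → E $, E → E plus num | num), the hand-listed `STATES` and `EDGES`, and the names `complete_table` and `assert_entry` are all hypothetical. Shift and goto entries come from the CFSM edges, a reducible item fills every terminal column of its row (LR(0) uses no lookahead), and a cell that would receive two different actions is reported as a conflict.

```python
# Assumed grammar; rule 0 is the goal rule.
RULES = [("Start", ("E", "$")), ("E", ("E", "plus", "num")), ("E", ("num",))]
TERMINALS = ("num", "plus", "$")

# Hand-listed CFSM for the grammar above: state -> set of (rule, dot) items.
STATES = {
    0: {(0, 0), (1, 0), (2, 0)},
    1: {(0, 1), (1, 1)},
    2: {(2, 1)},
    3: {(1, 2)},
    4: {(1, 3)},
}
EDGES = {(0, "E"): 1, (0, "num"): 2, (1, "plus"): 3, (3, "num"): 4}


def complete_table(rules, states, edges, terminals):
    table = {s: {} for s in states}

    def assert_entry(state, symbol, action):
        # A cell may hold only one action; a second, different one is a conflict.
        old = table[state].get(symbol)
        if old is not None and old != action:
            raise ValueError(
                f"conflict in state {state} on {symbol}: {old} vs {action}")
        table[state][symbol] = action

    for (state, symbol), target in edges.items():
        kind = "shift" if symbol in terminals else "goto"
        assert_entry(state, symbol, (kind, target))
    for state, items in states.items():
        for rule, dot in sorted(items):
            rhs = rules[rule][1]
            if dot == len(rhs):
                for symbol in terminals:      # no lookahead: reduce in every column
                    assert_entry(state, symbol, ("reduce", rule))
            elif rhs[dot] == "$":
                assert_entry(state, "$", ("accept", None))
    return table
```

Cells left empty in a row are the error entries. Running the same function on a state that mixes a reducible item with a shiftable one raises the conflict, which is the situation the next slides diagnose.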

(Figure: table-completion pseudocode; procedure names CompleteTable, ComputeLookahead, TryRuleInState, AssertEntry, ReportConflict)

Conflict Diagnosis
– A parse table conflict arises when the table-construction method cannot decide between multiple alternatives for some table entry
  – Shift/reduce conflicts
  – Reduce/reduce conflicts
– Reasons for conflicts
  – Grammar is ambiguous
  – Grammar is not ambiguous, but the current table-building approach cannot resolve the conflict
    – Give it more lookahead
    – Use a more powerful method
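The two kinds of conflict can be recognized directly from the items of a state when no lookahead is used. The sketch below assumes items encoded as (rule index, dot position) pairs; the function name `classify_conflicts` and the example grammars are hypothetical.

```python
def classify_conflicts(items, rules, terminals):
    """Return the kinds of LR(0) conflict present in one state.

    items: set of (rule index, dot position); rules: list of (lhs, rhs tuple).
    """
    reducible = [r for r, d in items if d == len(rules[r][1])]
    shiftable = {rules[r][1][d] for r, d in items
                 if d < len(rules[r][1]) and rules[r][1][d] in terminals}
    conflicts = []
    if reducible and shiftable:
        conflicts.append("shift/reduce")    # both a shift and a reduction apply
    if len(reducible) > 1:
        conflicts.append("reduce/reduce")   # two different reductions apply
    return conflicts


# Ambiguous grammar assumed for illustration: E -> E plus E | num
AMBIGUOUS = [("Start", ("E", "$")), ("E", ("E", "plus", "E")), ("E", ("num",))]
# State reached after "E plus E": items E -> E plus E .  and  E -> E . plus E
INADEQUATE = {(1, 3), (1, 1)}
```

Here `classify_conflicts(INADEQUATE, AMBIGUOUS, {"plus", "num", "$"})` reports a shift/reduce conflict: with `plus` next in the input, the parser could reduce E plus E (left-associative grouping) or shift (right-associative grouping).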

Ambiguous Grammars

– Using state 5 in Fig. 6.16 as an example, the steps taken to understand conflicts
  – Determine a sequence of vocabulary symbols that causes the parser to move from the start state to the inadequate state
    – E plus E
  – We obtain a snapshot
    – E plus E . plus E
  – (Fig. 6.17)

– Top parse tree
  – Reduction
  – Left-associative grouping for addition
– Bottom parse tree
  – Shift
  – Right-associative grouping for addition
– We eliminate the ambiguity by creating a grammar that favors left-association
  – (Fig. 6.18)

Grammars that are not LR(k)

– Reduce/reduce conflict
  – Start =>rm Exprs $
    =>rm E a $
    =>rm E plus num a $
    =>*rm E plus … plus num a $
    =>rm num plus … plus num a $

Conflict Resolution and Table Construction
– Increasingly sophisticated lookahead techniques to resolve conflicts
  – SLR(k): simple
  – LALR(k)
  – LR(k): the most powerful

SLR(k) Table Construction
– SLR(k): Simple LR with k tokens of lookahead
  – A grammar that is not LR(0): Fig
  – Input string: num plus num times num $

– Replacing a terminal by a nonterminal whose role in the grammar is equivalent
  – (Fig. 6.21)
– LR(0) construction: (Fig. 6.22)
  – Shift/reduce conflict in state 6
    – Shift: (can continue as in Fig. 6.21)
    – Reduce: blocks in state 3
  – E times num $ is not a valid sentential form
  – E → E plus T is appropriate under some conditions

– For sentential forms
  – E plus T $
  – E plus T plus num $
– If the reduction can lead to a successful parse, then plus can appear next to E in some valid sentential form
  – plus ∈ Follow(E)
  – TryRuleInState(): (Fig. 6.23)
  – SLR(1) parse table: (Fig. 6.24)
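The SLR(1) idea of reducing by A → β only when the next token is in Follow(A) can be sketched as follows. The grammar is an assumption (Start → E $, E → E plus T | T, T → T times num | num), the Follow computation is simplified for grammars without λ-rules, and `first_sets`, `follow_sets`, and `slr_action` are hypothetical names.

```python
# Assumed expression grammar without lambda-rules.
GRAMMAR = [
    ("Start", ["E", "$"]),
    ("E", ["E", "plus", "T"]),
    ("E", ["T"]),
    ("T", ["T", "times", "num"]),
    ("T", ["num"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets; with no lambda-rules only the leading RHS symbol matters."""
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """Follow sets by fixed-point iteration over all RHS occurrences."""
    first = first_sets()
    follow = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]     # sym ends the RHS: inherit Follow(lhs)
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_action(reduce_lhs, shiftable, lookahead, follow):
    """Decide one table cell for a state with a reducible item and shift moves."""
    can_shift = lookahead in shiftable
    can_reduce = lookahead in follow[reduce_lhs]
    if can_shift and can_reduce:
        return "conflict"
    return "shift" if can_shift else "reduce" if can_reduce else "error"
```

In the state holding E → E plus T . and T → T . times num, the token `times` is not in Follow(E), so the cell for `times` is a shift, while `plus` and `$` select the reduction. The LR(0) conflict disappears.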


LALR(k) Table Construction
– Sometimes SLR(k) construction fails only because the Follow_k information is not rule-specific
  – (Fig. 6.25)
  – Grammar is not ambiguous
  – State 3 has a shift/reduce conflict
  – Follow_k(A) = {b$^(k-1), c$^(k-1), $^k}
    – Insufficient to resolve the conflict (Fig. 6.26)

– LALR(k): Lookahead LR with k tokens of lookahead
  – Same number of rows (states) as the LR(0) table
  – The most popular LR table-building method
    – Balance of power and efficiency
  – Redefine two methods
    – TryRuleInState: ItemFollow set (Fig. 6.27)
    – ComputeLookahead: lookahead propagation graph (Fig. 6.28)


LALR Propagation Graph
– Each LR(0) item occurs at most once in any state
  – The pair (s, A → α.β): a vertex in the graph
  – Edge between items i and j
    – Symbols that follow the reducible form of item i should be included in the corresponding set of symbols for item j

For item A-> .B , any symbol in First(  ) can follow each closure item B->.  Propagation edges –An edge is placed from an item A-> .B  in state s to item A->  B.  in state t –When  =>*λ, any symbol that can follow A can also follow B Example: –Building propagation graph (Fig. 6.29) –Evaluating propagation graph (Fig. 6.30)

– In general, multiple passes can be required for convergence
  – (Fig. 6.31)
  – (Fig. 6.32)
  – (Fig. 6.33)
– In practice, LALR(1) lookahead computations converge quickly, usually in one or two passes
– LALR(1) is a powerful parsing method
  – LALR(1) grammars are available for all popular programming languages

LR(k) Table Construction
– LR(k) parsing: not very practical because
  – LR(1) tables are typically much larger than the LR(0)-sized tables used by SLR(k) and LALR(k)
  – It’s rare that LR(1) can handle a grammar for which LALR(1) fails
    – Ex. (Fig. 6.35)
– When LALR(1) fails
  – Grammar is ambiguous: LR(k) cannot help
  – More lookahead needed: LR(k) can help, but LALR(k) might suffice
  – No amount of lookahead suffices: LR(k) cannot help

– E.g. item 14
  – ItemFollow(14) = {rb, rp}
  – ItemFollow(15) = {rb, rp}
  – Reduce/reduce conflict

– A state in LR(k) is uniquely identified not only by its kernel, but also by its lookahead
  – For LR(k), we extend an item’s notation from A → α.β to [A → α.β, w]
    – For LR(1), w is a symbol that can follow A after reduction
    – For LR(k), w is a k-length string that can follow A after reduction
  – The number of states in LR(k) is usually much larger
– We can also begin with LALR(1), and split states selectively
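The extended item notation can be made concrete with an LR(1) closure. This is a sketch under assumptions: the grammar (Start → E $, E → plus E E | num) has no λ-rules, its First sets are written out by hand, items are (rule index, dot position, lookahead) triples, and the names `first_of` and `closure1` are hypothetical.

```python
# Assumed grammar without lambda-rules; rule 0 is the goal rule.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("plus", "E", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {"Start", "E"}
FIRST = {"Start": {"plus", "num"}, "E": {"plus", "num"}}  # computed by hand


def first_of(symbols, lookahead):
    """First of the string `symbols` followed by `lookahead`.

    With no lambda-rules, a nonempty string is decided by its first symbol.
    """
    if not symbols:
        return {lookahead}
    head = symbols[0]
    return set(FIRST[head]) if head in NONTERMINALS else {head}


def closure1(items):
    """LR(1) closure: [A -> a . B g, w] adds [B -> . d, x] for x in First(g w)."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot, lookahead = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for x in first_of(rhs[dot + 1:], lookahead):
                for i, (lhs, _) in enumerate(GRAMMAR):
                    item = (i, 0, x)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)
```

The kernel item [E → plus . E E, $] closes to five items, because each fresh E item appears once per lookahead (`plus` and `num`), where the LR(0) closure of the same kernel has three. This per-lookahead duplication is what makes LR(1) item sets, and therefore states, more numerous.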

Thanks for Your Attention!