4. Bottom-up Parsing Chih-Hung Wang

Compilers

References
1. C. N. Fischer and R. J. LeBlanc. Crafting a Compiler with C. Pearson Education Inc., 2009.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.
3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (2nd Ed. 2006)

Creating a bottom-up parser automatically
Left-to-right parse, Rightmost derivation (LR).
Create a node when all its children are present.
Handle: the nodes representing the right-hand side of a production.
[Figure: partial parse tree over the input aap + ( noot + mies ), with nodes IDENT, term, rest_expr, rest_expression and expression.]

LR(0) Parsing
Theoretically important but too weak to be useful.
Running example: expression grammar
    input -> expression EOF
    expression -> expression ‘+’ term | term
    term -> IDENTIFIER | ‘(’ expression ‘)’
Short-hand notation
    Z -> E $
    E -> E ‘+’ T | T
    T -> i | ‘(’ E ‘)’

LR(0) Parsing
Keep track of progress inside potential handles when consuming input tokens.
LR items: N -> α • β
Initial set S0:
    Z -> • E $
    E -> • E ‘+’ T
    E -> • T
    T -> • i
    T -> • ‘(’ E ‘)’
Grammar:
    Z -> E $
    E -> E ‘+’ T
    E -> T
    T -> i
    T -> ‘(’ E ‘)’

Closure algorithm for LR(0)
The important part is the inference rule. It predicts new handle hypotheses from the hypothesis that we are looking for a certain non-terminal, and is therefore sometimes called the prediction rule. It corresponds to an ε-move, in that it allows the automaton to move to another state without consuming input.
Reduce item: an item with the dot at the end.
Shift item: any other item.
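The prediction rule can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: an item is encoded as a pair (rule index, dot position), and the names GRAMMAR, closure, is_reduce_item and show are hypothetical helpers.

```python
# Short-hand expression grammar from the slides; each rule is (lhs, rhs).
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Return the LR(0) closure of a set of items (rule index, dot position)."""
    result = set(items)
    worklist = list(items)
    while worklist:
        rule, dot = worklist.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # Prediction rule: the dot stands in front of a non-terminal N,
            # so every item N -> . gamma becomes a new hypothesis.
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def is_reduce_item(item):
    """A reduce item has the dot at the end; all other items are shift items."""
    rule, dot = item
    return dot == len(GRAMMAR[rule][1])


def show(item):
    """Render an item in dotted notation, using '.' for the dot."""
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


S0 = closure({(0, 0)})  # start from Z -> . E $
for item in sorted(S0):
    print(show(item))
```

Starting from the single item Z -> • E $, the closure adds one item per alternative of E and of T, which gives the five items of the initial set S0.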

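Transition Diagram
[Figure: transition diagram of the LR(0) automaton. The items shown include Z -> • E $, E -> • E ‘+’ T and E -> • T; the edges are labelled T, i, E, ‘+’ and $.]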

LR(0) parsing example (1)
Grammar (shown on every slide of this example): Z -> E $, E -> E ‘+’ T, E -> T, T -> i, T -> ‘(’ E ‘)’
stack: S0
input: i + i $
Shift the input token (i) onto the stack; compute the new state.

LR(0) parsing example (2)
stack: S0 i S1
input: + i $
Reduce the handle on top of the stack; compute the new state.
Q: What does state S1 look like? A: Write it down on the blackboard, including the transition. Do so for each new state in the remainder of the animation.

LR(0) parsing example (3)
stack: S0 T S2
input: + i $
Reduce the handle on top of the stack; compute the new state.

LR(0) parsing example (4)
stack: S0 E S3
input: + i $
Shift the input token onto the stack; compute the new state.

LR(0) parsing example (5)
stack: S0 E S3 + S4
input: i $
Shift the input token onto the stack; compute the new state.

LR(0) parsing example (6)
stack: S0 E S3 + S4 i S1
input: $
Reduce the handle on top of the stack; compute the new state.
Q: Is it allowed to re-use state S1? A: Yes.

LR(0) parsing example (7)
stack: S0 E S3 + S4 T S5
input: $
Reduce the handle on top of the stack; compute the new state.
Note that we cannot re-use state S2.

LR(0) parsing example (8)
stack: S0 E S3
input: $
Shift the input token onto the stack; compute the new state.

LR(0) parsing example (9)
stack: S0 E S3 $ S6
input: (empty)
Reduce the handle on top of the stack; compute the new state.

LR(0) parsing example (10)
stack: S0 Z
input: (empty)
Accept!
Parse tree: Z has children E and $; that E has children E, + and T; the inner E derives T, which derives i; the right-hand T derives i.

Precomputing the item set (1) Initial item set

Precomputing the item set (2) Next item set

Complete transition diagram

The LR push-down automaton
Two major moves and a minor move:
Shift move: remove the first token from the present input and push it onto the stack.
Reduce move (N -> α): the symbols of α are removed from the stack; N is then pushed onto the stack.
Termination: the input has been parsed successfully when it has been reduced to the start symbol.
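The three moves can be sketched as a small Python driver for the expression grammar. This is an illustration under assumptions not taken from the slides: states are computed on the fly as item sets instead of being looked up in precomputed ACTION/GOTO tables, only states (not symbols) are kept on the stack, and the names closure, goto and parse are hypothetical.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure of a set of items (rule index, dot position)."""
    result, worklist = set(items), list(items)
    while worklist:
        rule, dot = worklist.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def goto(state, symbol):
    """State reached from `state` over `symbol`, or None if there is none."""
    moved = {(rule, dot + 1) for rule, dot in state
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved) if moved else None


def parse(tokens):
    """Return the rules applied by the reduce moves, in order."""
    tokens = list(tokens)
    stack = [closure({(0, 0)})]      # state stack, starting with S0
    applied = []
    position = 0
    while True:
        state = stack[-1]
        complete = [rule for rule, dot in state if dot == len(GRAMMAR[rule][1])]
        if complete:
            if len(state) > 1:       # a reduce item next to another item
                raise ValueError("inadequate LR(0) state")
            lhs, rhs = GRAMMAR[complete[0]]
            applied.append(f"{lhs} -> {' '.join(rhs)}")
            if complete[0] == 0:     # termination: reduced to the start symbol
                return applied
            del stack[-len(rhs):]    # reduce move: remove the handle ...
            stack.append(goto(stack[-1], lhs))  # ... and push N
        else:
            token = tokens[position] if position < len(tokens) else None
            target = goto(state, token) if token is not None else None
            if target is None:
                raise SyntaxError(f"unexpected {token!r} at position {position}")
            stack.append(target)     # shift move
            position += 1


print(parse("i+i$"))
```

For the input i+i$ the reductions come out in the same order as in the animation: T -> i, E -> T, T -> i, E -> E + T, and finally Z -> E $.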

GOTO and ACTION tables

LR(0) parsing of the input i+i$

Another Example of LR(0) from Fischer (1)—closure0 G1 example

Another Example of LR(0) from Fischer (2)
G2 example: S' -> S $, S -> ID | λ

Another Example of LR(0) from Fischer (3)

Another Example of LR(0) from Fischer (4)

Algorithm of LR(0) Construction (1)-goto Function

Algorithm of LR(0) Construction (2)- CFSM

Algorithm of LR(0) Construction (3)- goto Table

LR comments
Bottom-up parsing, unlike top-down parsing, has no problem with left recursion. On the other hand, bottom-up parsing has a slight problem with right recursion.

LR(0) conflicts (1)
Shift-reduce conflict: exists in a state when table construction cannot use the next k tokens to decide whether to shift the next input token or call for a reduction.
Array indexing: T -> i [ E ]
    T -> i • [ E ]    (shift)
    T -> i •          (reduce)
ε-rule: RestExpr -> ε
    Expr -> Term • RestExpr    (shift)
    RestExpr -> •              (reduce)

LR(0) conflicts (2)
Reduce-reduce conflict: exists when table construction cannot use the next k tokens to distinguish between multiple reductions that can be applied in the inadequate state.
Assignment statement: Z -> V := E $
    V -> i •    (reduce)
    T -> i •    (reduce)
(Different reduce rules.)
A typical LR(0) table contains many conflicts.
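Both kinds of conflict can be detected by inspecting the items of a single state. The sketch below is an illustration, not the construction from the references: items are written as hypothetical triples (left-hand side, right-hand side, dot position), the function name classify is made up, and "shift item" follows the definition used earlier (any item whose dot is not at the end).

```python
def classify(state):
    """Return the kinds of LR(0) conflict present in a set of items."""
    reduce_items = [item for item in state if item[2] == len(item[1])]
    shift_items = [item for item in state if item[2] < len(item[1])]
    found = []
    if reduce_items and shift_items:
        found.append("shift-reduce")
    if len(reduce_items) > 1:
        found.append("reduce-reduce")
    return found


# Array indexing: T -> i . [ E ] next to T -> i .
array_state = {("T", ("i", "[", "E", "]"), 1), ("T", ("i",), 1)}
# Epsilon rule: Expr -> Term . RestExpr next to RestExpr -> .
epsilon_state = {("Expr", ("Term", "RestExpr"), 1), ("RestExpr", (), 0)}
# Assignment statement: V -> i . next to T -> i .
assign_state = {("V", ("i",), 1), ("T", ("i",), 1)}
# A state with a single reduce item is adequate.
adequate_state = {("E", ("T",), 1)}

for name, state in [("array", array_state), ("epsilon", epsilon_state),
                    ("assign", assign_state), ("adequate", adequate_state)]:
    print(name, classify(state))
```

A state for which the function returns an empty list is adequate for LR(0); any other state needs look-ahead to be resolved.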

Handling LR(0) conflicts
Use a one-token look-ahead.
Use a two-dimensional ACTION table.
Different constructions of the ACTION table:
    SLR(1) – Simple LR
    LR(1)
    LALR(1) – Look-Ahead LR

SLR(1) parsing
A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N.
    reduce N -> α • iff token ∈ FOLLOW(N)
FOLLOW(N):
    FOLLOW(Z) = { $ }
    FOLLOW(E) = { ‘+’, ‘)’, $ }
    FOLLOW(T) = { ‘+’, ‘)’, $ }
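The FOLLOW sets above can be reproduced with a small fixed-point computation. The sketch below is an illustration that assumes a grammar without ε-rules (true for the expression grammar, but not in general) and seeds FOLLOW(Z) with $; the function names compute_first, compute_follow and may_reduce are hypothetical.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def compute_first():
    """FIRST sets; assumes no rule has an empty right-hand side."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first, start="Z", end="$"):
    """FOLLOW sets, iterated until nothing changes."""
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add(end)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for pos, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if pos + 1 < len(rhs):
                    # Whatever can start the next symbol can follow this one.
                    nxt = rhs[pos + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # At the end of the rule: inherit FOLLOW of the lhs.
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


FOLLOW = compute_follow(compute_first())


def may_reduce(lhs, lookahead):
    """SLR(1) rule: reduce N -> alpha . only if the look-ahead is in FOLLOW(N)."""
    return lookahead in FOLLOW[lhs]


for name in sorted(FOLLOW):
    print(name, sorted(FOLLOW[name]))
```

With these sets, a state containing T -> i • would reduce on ‘+’, ‘)’ and $ but not on any other look-ahead.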

SLR(1) ACTION table shift

SLR(1) ACTION/GOTO table
Rules:
    1: Z -> E $
    2: E -> T
    3: E -> E ‘+’ T
    4: T -> i
    5: T -> ‘(’ E ‘)’
sn – shift to state n
rn – reduce rule n

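Example of resolving conflicts (1)
A new rule: T -> i ‘[’ E ‘]’
Rules:
    1: Z -> E $
    2: E -> T
    3: E -> E ‘+’ T
    4: T -> i
    5: T -> ‘(’ E ‘)’
    6: T -> i ‘[’ E ‘]’
ACTION/GOTO table; columns: look-ahead tokens i + ( ) [ ] $ and stack symbols E T. Entries per state, in column order:
    state 0: s5 s7 s1 s6
    state 1: s3 s2
    state 2: r1
    state 3: s4
    state 4: r3
    state 5: r4
    state 6: r2
    state 7: s8
    state 8: s9
    state 9: r5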

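Example of resolving conflicts (2)
Rules:
    1: Z -> E $
    2: E -> T
    3: E -> E ‘+’ T
    4: T -> i
    5: T -> ‘(’ E ‘)’
    6: T -> i ‘[’ E ‘]’
ACTION/GOTO table; columns: look-ahead tokens i + ( ) [ ] $ and stack symbols E T. Entries per state, in column order:
    state 0: s5 s7 s1 s6
    state 1: s3 s2
    state 2: r1
    state 3: s4
    state 4: r3
    state 5: r4 s10
    state 6: r2
    state 7: s8
    state 8: s9
    state 9: r5
State 5 contains the items T -> i • and T -> i • ‘[’ E ‘]’. Since ‘[’ is not in FOLLOW(T), the shift (s10) and the reduce (r4) never compete for the same look-ahead.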

Another Example of LR(0) Conflicts(1)

Another Example of LR(0) Conflicts(2)

Another Example of LR(0) Conflicts(3) num plus num times num $

Another Example of LR(0) Conflicts(4) Follow(E)= {plus, $}

Unfortunately …
SLR(1) leaves many shift-reduce conflicts unsolved.
Problem: the FOLLOW(N) set is the union of all look-aheads of all alternatives of N in all states.
Example:
    S -> A | x b
    A -> a A b | B
    B -> x
    Follow(S) = {$}
    Follow(A) = {b, $}
    Follow(B) = {b, $}

SLR(1) automaton

Another Example of SLR Problem Follow(A)={b, c, $}

Make the Grammar SLR(1) Follow(A1)={b, $}

Another Example 2 of SLR(1) (1)
G3: S -> E $, E -> E + T | T, T -> T * P | P, P -> ID | ( E )

Another Example 2 of SLR(1) (2)
SLR(1) Action Table for G3
G3: S -> E $, E -> E + T | T, T -> T * P | P, P -> ID | ( E )
Follow(E) = { $, +, ) }
Follow(T) = { $, +, ), * }

SLR(1) Conflicts Ex2
    Elem -> ( List , Elem )
    Elem -> Scalar
    List -> List , Elem
    List -> Elem
    Scalar -> ID
    Scalar -> ( Scalar )
Follow(Elem) = { “)”, “,”, … }

LR(1) parsing
The LR(1) technique does not rely on FOLLOW sets, but rather keeps the specific look-ahead with each item.
LR(1) item: N -> α • β {σ}
ε-closure for LR(1) item sets: if set S contains an item P -> α • N β {σ}, then for each production rule N -> γ, S must contain the item N -> • γ {τ}, where τ = FIRST(β {σ}).

Creating look-ahead sets
Extended definition of FIRST sets: if FIRST(β) does not contain ε, FIRST(β {σ}) is just equal to FIRST(β); if β can produce ε, FIRST(β {σ}) contains all the tokens in FIRST(β), excluding ε, plus the tokens in σ.
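The extended FIRST definition and the LR(1) closure rule can be sketched together in Python. This is an illustration under assumptions not in the slides: the grammar is the example S -> A | x b, A -> a A b | B, B -> x; an item is a triple (rule index, dot position, one look-ahead token), so a look-ahead set {σ} is represented by several items; the empty string stands for ε; all function names are hypothetical.

```python
EPS = ""  # marker for the empty string

GRAMMAR = [
    ("S", ("A",)),
    ("S", ("x", "b")),
    ("A", ("a", "A", "b")),
    ("A", ("B",)),
    ("B", ("x",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols, first):
    """FIRST of a symbol sequence; contains EPS if the sequence can vanish."""
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in NONTERMINALS else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def compute_first():
    """FIRST sets of all non-terminals, iterated to a fixed point."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def extended_first(beta, sigma, first):
    """FIRST(beta {sigma}): fall back on sigma when beta can produce EPS."""
    result = first_of(beta, first)
    if EPS in result:
        result = (result - {EPS}) | set(sigma)
    return result


def closure(items, first):
    """LR(1) closure of a set of items (rule index, dot, look-ahead token)."""
    result, worklist = set(items), list(items)
    while worklist:
        rule, dot, look = worklist.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            tau = extended_first(rhs[dot + 1:], {look}, first)
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs != rhs[dot]:
                    continue
                for token in tau:
                    item = (index, 0, token)
                    if item not in result:
                        result.add(item)
                        worklist.append(item)
    return frozenset(result)


FIRST = compute_first()
initial = closure({(0, 0, "$"), (1, 0, "$")}, FIRST)   # S -> . A {$}, S -> . x b {$}
after_a = closure({(2, 1, "$")}, FIRST)                # A -> a . A b {$}
```

In the initial state B -> • x carries the look-ahead $, whereas after an a it carries b. SLR(1) cannot make this distinction because it uses Follow(B) = {b, $} in every state.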

LR(1) automaton

Another Example of LR(1) Construction (1)

Another Example of LR(1) Construction (2)

Another Example of LR(1) Construction (3)

Another Example of LR(1) Construction (4)

Another Example of LR(1) Construction (5)

Another Example 2 LR(1) (1)
LR(1) for G3
G3: S -> E $, E -> E + T | T, T -> T * P | P, P -> ID | ( E )

Another Example 2 LR(1) (2)

Another Example 2 LR(1) (3) Action Table

LR(1) parsing comments
The LR(1) automaton is more discriminating than the SLR(1) automaton. In fact, it is so strong that any language that can be parsed from left to right with a one-token look-ahead in linear time can be parsed using the LR(1) method.
LR(1) tables are big.
Combine “equal” sets by merging look-ahead sets: LALR(1).

LALR(1) S3 and S10 are similar in that they are equal if one ignores the look-ahead sets, and so are S4 and S9, S6 and S11, and S8 and S12.
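Merging states that are equal apart from their look-ahead sets can be sketched as grouping LR(1) states by their core. The example below is an illustration only: the two states are item sets computed by hand for the grammar S -> A | x b, A -> a A b | B, B -> x (the states reached after a and after a a), they are not claimed to be the numbered states on the slide, and the names core and merge_by_core are hypothetical.

```python
from collections import defaultdict

# Rule numbering assumed for this sketch:
# 0: S -> A    1: S -> x b    2: A -> a A b    3: A -> B    4: B -> x
# An LR(1) item is (rule index, dot position, look-ahead token).
after_a = frozenset({(2, 1, "$"), (2, 0, "b"), (3, 0, "b"), (4, 0, "b")})
after_aa = frozenset({(2, 1, "b"), (2, 0, "b"), (3, 0, "b"), (4, 0, "b")})


def core(state):
    """The LR(0) items of a state, i.e. the state without its look-aheads."""
    return frozenset((rule, dot) for rule, dot, _ in state)


def merge_by_core(states):
    """Combine all states with the same core by uniting their look-aheads."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= set(state)
    return [frozenset(items) for items in merged.values()]


lalr_states = merge_by_core([after_a, after_aa])
print(len(lalr_states))
```

The two LR(1) states collapse into a single LALR(1) state in which A -> a • A b carries both $ and b. A complete construction would also have to redirect the transitions of the merged states, which this sketch leaves out.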

LALR(1) automaton

Another Example for LALR(1) (1)
G3: S -> E $, E -> E + T | T, T -> T * P | P, P -> ID | ( E )

Another Example for LALR(1) (2)
G3: S -> E $, E -> E + T | T, T -> T * P | P, P -> ID | ( E )

Practice Derive the LALR(1) ACTION/GOTO table for the grammar in Fig. 2.95

Making a grammar LR(1) – or not
Although the chances for a grammar to be LR(1) are much larger than those of being SLR(1) or LL(1), one often encounters a grammar that still is not LR(1). The reason is generally that the grammar is ambiguous.
For example:
    if_statement -> ‘if’ ‘(’ expression ‘)’ statement
                  | ‘if’ ‘(’ expression ‘)’ statement ‘else’ statement
    statement -> … | if_statement | …
The statement: if (x>0) if (y>0) p=0; else q=0;

Possible syntax trees (1)

Possible syntax trees (2)

Other Examples of Ambiguous Grammar (1)

Other Examples of Ambiguous Grammar (2)

Resolving shift-reduce conflicts (1)
The longest possible sequence of grammar symbols is taken for reduction. In a shift-reduce conflict, do shift.
Another example, input: i * i + i
    E -> E • ‘+’ E    (shift)
    E -> E ‘*’ E •    (reduce)
[Figure: the parse trees that result from the reduce and from the shift.]

Resolving shift-reduce conflicts (2)
The use of precedences between tokens.
Example: a shift-reduce conflict on t:
    P -> α • t β {…}        (shift item)
    Q -> γ u R • {… t …}    (reduce item)
where R is either empty or one non-terminal.
If the look-ahead is t, we perform one of the following three actions:
    If symbol u has a higher precedence than symbol t, we reduce.
    If t has a higher precedence than symbol u, we shift.
    If both have equal precedence, we also shift.
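The three-way decision can be written as a tiny Python function. This is a sketch under assumptions: the PRECEDENCE table and the function name resolve are hypothetical, with ‘*’ given a higher precedence than ‘+’ as in ordinary arithmetic.

```python
# Hypothetical precedence table: a larger number binds more tightly.
PRECEDENCE = {"+": 1, "*": 2}


def resolve(u, t):
    """Decide a shift-reduce conflict between the reduce item's token u
    and the look-ahead token t, following the three rules above."""
    if PRECEDENCE[u] > PRECEDENCE[t]:
        return "reduce"
    # t has a higher precedence, or both are equal: shift.
    return "shift"


# Input i * i + i: E '*' E is on the stack and the look-ahead is '+'.
print(resolve("*", "+"))
# Input i + i * i: E '+' E is on the stack and the look-ahead is '*'.
print(resolve("+", "*"))
```

Shifting on equal precedence groups i + i + i as i + (i + i). Parser generators such as yacc let the grammar writer choose the grouping with associativity declarations like %left and %right.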

Bottom-up parser: yacc/bison
The most widely used parser generator is yacc.
Yacc is an LALR(1) parser generator.
Bison is a yacc look-alike provided by GNU.

A very high-level view of text analysis techniques

Yacc code example (constructing parser tree)

Yacc code example (auxiliary code)