5. Bottom-Up Parsing Chih-Hung Wang

Compilers

References
1. C. N. Fischer, R. K. Cytron, and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.
3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986 (2nd ed. 2006).

Creating a bottom-up parser automatically
Left-to-right parse, rightmost derivation.
Create a node when all its children are present.
Handle: the nodes representing the right-hand side of a production.
[Figure: partially built parse tree for the input aap + ( noot + mies ), with nodes IDENT, term, expression, rest_expr, and rest_expression]

LR(0) Parsing
Theoretically important but too weak to be useful.
Running example: expression grammar
input → expression EOF
expression → expression ‘+’ term | term
term → IDENTIFIER | ‘(’ expression ‘)’
Short-hand notation:
Z → E $
E → E ‘+’ T | T
T → i | ‘(’ E ‘)’

LR(0) Parsing
Keep track of progress inside potential handles when consuming input tokens.
LR items: N → α • β
Initial set S0:
Z → • E $
E → • E ‘+’ T
E → • T
T → • i
T → • ‘(’ E ‘)’
Grammar:
Z → E $
E → E ‘+’ T
E → T
T → i
T → ‘(’ E ‘)’

ε-closure algorithm for LR(0)
The important part is the inference rule: it predicts new handle hypotheses from the hypothesis that we are looking for a certain non-terminal, and is sometimes called the prediction rule. It corresponds to an ε move, in that it allows the automaton to move to another state without consuming input.
Reduce item: an item with the dot at the end.
Shift item: all the others.
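The prediction rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item representation (lhs, rhs, dot) and the names GRAMMAR, closure and is_reduce_item are choices made for this sketch, not taken from the slides. The grammar is the short-hand expression grammar of the running example.

```python
# Short-hand expression grammar: non-terminal -> list of alternatives.
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items):
    """Return the LR(0) closure of a set of items.

    An item is a triple (lhs, rhs, dot); dot is the position of the dot
    inside rhs. This representation is an assumption of the sketch.
    """
    result = set(items)
    worklist = list(items)
    while worklist:
        _lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            # Prediction rule: the dot stands in front of a non-terminal N,
            # so add N -> . gamma for every alternative gamma of N.
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result


def is_reduce_item(item):
    """A reduce item has the dot at the end; all others are shift items."""
    _lhs, rhs, dot = item
    return dot == len(rhs)


# The initial item set S0 is the closure of the start item Z -> . E $
S0 = closure({("Z", ("E", "$"), 0)})
for lhs, rhs, dot in sorted(S0):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
```

Running it lists the five items of S0, all of them shift items.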

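Transition Diagram
[Figure: transition diagram of the LR(0) automaton, with transitions on T, i, E, ‘+’ and $; the initial state contains the items Z → • E $, E → • E ‘+’ T and E → • T]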

LR(0) parsing example (1)
Grammar: Z → E $; E → E ‘+’ T; E → T; T → i; T → ‘(’ E ‘)’
Stack: S0
Input: i + i $
Action: shift the input token (i) onto the stack; compute the new state.

LR(0) parsing example (2)
Stack: S0 i S1
Input: + i $
Action: reduce the handle on top of the stack; compute the new state.
Q: What does state S1 look like?
A: Write it down on the blackboard, including the transition. Do so for each new state in the remainder of the animation.

LR(0) parsing example (3)
Stack: S0 T S2
Input: + i $
Action: reduce the handle on top of the stack; compute the new state.

LR(0) parsing example (4)
Stack: S0 E S3
Input: + i $
Action: shift the input token onto the stack; compute the new state.

LR(0) parsing example (5)
Stack: S0 E S3 + S4
Input: i $
Action: shift the input token onto the stack; compute the new state.

LR(0) parsing example (6)
Stack: S0 E S3 + S4 i S1
Input: $
Action: reduce the handle on top of the stack; compute the new state.
Q: Is it allowed to re-use state S1?
A: Yes.

LR(0) parsing example (7)
Stack: S0 E S3 + S4 T S5
Input: $
Action: reduce the handle on top of the stack; compute the new state.
Note: state S2 cannot be re-used here.

LR(0) parsing example (8)
Stack: S0 E S3
Input: $
Action: shift the input token onto the stack; compute the new state.

LR(0) parsing example (9)
Stack: S0 E S3 $ S6
Input: (empty)
Action: reduce the handle on top of the stack; compute the new state.

LR(0) parsing example (10)
Stack: S0 Z
Input: (empty)
Action: accept!

Precomputing the item set (1) Initial item set

Precomputing the item set (2) Next item set

Complete transition diagram

The LR push-down automaton
Two major moves and a minor move.
Shift move: removes the first token from the present input and pushes it onto the stack.
Reduce move: for a reduce item N → α •, the symbols of α are removed from the stack; N is then pushed onto the stack.
Termination: the input has been parsed successfully when it has been reduced to the start symbol.
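The three moves can be sketched as a small Python driver. This is a minimal illustration, not the table-driven parser of the following slides: it computes each new state on the fly with closure and goto instead of looking it up in precomputed ACTION/GOTO tables, and the names GRAMMAR, closure, goto and parse are assumptions of this sketch. The grammar is the short-hand expression grammar of the running example.

```python
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items):
    """LR(0) closure; an item is a triple (lhs, rhs, dot)."""
    result, worklist = set(items), list(items)
    while worklist:
        _lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result


def goto(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)


def parse(tokens):
    """Return the reductions performed, in order, or raise SyntaxError."""
    tokens = list(tokens)
    stack = [frozenset(closure({("Z", ("E", "$"), 0)}))]  # stack of states
    reductions = []
    while True:
        state = stack[-1]
        reduce_items = [(l, r) for (l, r, d) in state if d == len(r)]
        if reduce_items:
            # Reduce move: pop one state per right-hand-side symbol.
            lhs, rhs = reduce_items[0]  # unique in a conflict-free LR(0) state
            del stack[len(stack) - len(rhs):]
            reductions.append((lhs, rhs))
            if lhs == "Z":
                return reductions  # Termination: reduced to the start symbol.
            new_state = goto(stack[-1], lhs)
        else:
            # Shift move: take the first token from the input.
            if not tokens:
                raise SyntaxError("unexpected end of input")
            new_state = goto(state, tokens.pop(0))
        if not new_state:
            raise SyntaxError("no transition from the current state")
        stack.append(frozenset(new_state))


for lhs, rhs in parse("i+i$"):
    print("reduce", lhs, "->", " ".join(rhs))
```

The stack here holds only states; the grammar symbols shown between the states in the example slides are implied by the transitions.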

GOTO and ACTION tables

LR(0) parsing of the input i+i$

Another Example of LR(0) from Fischer (1)

Another Example of LR(0) from Fischer (2)

Another Example of LR(0) from Fischer (3)

Algorithm of LR(0) Construction (1)

Algorithm of LR(0) Construction (2)

LR(0) Table

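LR comments
Bottom-up parsing, unlike top-down parsing, has no problem with left recursion. On the other hand, bottom-up parsing has a slight problem with right recursion: the whole right-recursive sequence must be shifted onto the stack before the first reduction can take place, so the stack grows with the length of the input.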

LR(0) conflicts (1)
Shift-reduce conflict: exists in a state when table construction cannot use the next k tokens to decide whether to shift the next input token or call for a reduction.
Array indexing: T → i [ E ]
T → i • [ E ] (shift)
T → i • (reduce)
ε-rule: RestExpr → ε
Expr → Term • RestExpr (shift)
RestExpr → • (reduce)

LR(0) conflicts (2)
Reduce-reduce conflict: exists when table construction cannot use the next k tokens to distinguish between multiple reductions that could be applied in the inadequate state.
Assignment statement: Z → V := E $
V → i • (reduce)
T → i • (reduce)
(Different reduce rules)
A typical LR(0) table contains many conflicts.
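Both kinds of conflict can be detected mechanically from an item set. The following Python sketch assumes items are triples (lhs, rhs, dot) and that the item set is already closed; the function name conflicts and the TERMINALS set are hypothetical, introduced only for this illustration.

```python
# Terminals of the two example grammars above (assumed for this sketch).
TERMINALS = {"i", "[", "]", "+", "(", ")", ":=", "$"}


def conflicts(items, terminals=TERMINALS):
    """Return the kinds of LR(0) conflict present in a closed item set."""
    # Reduce items have the dot at the end of the right-hand side.
    reduce_items = [it for it in items if it[2] == len(it[1])]
    # Shift items that consume a token have the dot in front of a terminal.
    shift_items = [
        it for it in items if it[2] < len(it[1]) and it[1][it[2]] in terminals
    ]
    found = []
    if reduce_items and shift_items:
        found.append("shift-reduce")
    if len(reduce_items) > 1:
        found.append("reduce-reduce")
    return found


# Array indexing: T -> i . [ E ]  (shift)  and  T -> i .  (reduce)
indexing_state = {("T", ("i", "[", "E", "]"), 1), ("T", ("i",), 1)}
# Assignment: V -> i .  and  T -> i .  (two different reductions)
assignment_state = {("V", ("i",), 1), ("T", ("i",), 1)}

print(conflicts(indexing_state))
print(conflicts(assignment_state))
```

A state for which the function returns an empty list is adequate: its action is determined without look-ahead.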

Handling LR(0) conflicts
Use a one-token look-ahead.
Use a two-dimensional ACTION table.
Different constructions of the ACTION table:
SLR(1) – Simple LR
LR(1)
LALR(1) – Look-Ahead LR

SLR(1) parsing
A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N.
reduce N → α • iff token ∈ FOLLOW(N)
FOLLOW(Z) = { $ }
FOLLOW(E) = { ‘+’, ‘)’, $ }
FOLLOW(T) = { ‘+’, ‘)’, $ }
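The FOLLOW sets listed above can be recomputed with the usual fixed-point iteration. The Python sketch below is an illustration under assumptions: the function names are hypothetical, the empty string "" stands for ε, and FOLLOW of the start symbol is seeded with $ by convention so that the result matches FOLLOW(Z) = { $ }.

```python
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("i",), ("(", "E", ")")],
}
EPS = ""  # stands for the empty string (epsilon)


def first_of_sequence(seq, first, grammar):
    """FIRST of a sequence of grammar symbols, given FIRST of the non-terminals."""
    out = set()
    for sym in seq:
        if sym not in grammar:  # terminal: it starts the sequence
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)  # every symbol of seq can produce epsilon
    return out


def first_sets(grammar):
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first, grammar)
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {n: set() for n in grammar}
    follow[start].add("$")  # convention: end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for alt in alternatives:
                for k, sym in enumerate(alt):
                    if sym not in grammar:
                        continue
                    rest = first_of_sequence(alt[k + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:  # sym can end the alternative
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, "Z")
for n in GRAMMAR:
    print("FOLLOW(%s) =" % n, sorted(FOLLOW[n]))
```

In an SLR(1) table, the reduce action for an item N → α • is then entered only in the columns of the tokens in FOLLOW(N).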

SLR(1) ACTION table shift

SLR(1) ACTION/GOTO table
1: Z → E $
2: E → T
3: E → E ‘+’ T
4: T → i
5: T → ‘(’ E ‘)’
sn – shift to state n
rn – reduce rule n

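Example of resolving conflicts (1)
A new rule: T → i [E]
1: Z → E $
2: E → T
3: E → E ‘+’ T
4: T → i
5: T → ‘(’ E ‘)’
6: T → i ‘[’ E ‘]’
Columns (stack symbol / look-ahead token): i + ( ) [ ] $ E T
Entries per state, as far as they survive in the transcript:
state 0: s5 s7 s1 s6
state 1: s3 s2
state 2: r1
state 3: s4
state 4: r3
state 5: r4
state 6: r2
state 7: s8
state 8: s9
state 9: r5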

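Example of resolving conflicts (2)
1: Z → E $
2: E → T
3: E → E ‘+’ T
4: T → i
5: T → ‘(’ E ‘)’
6: T → i ‘[’ E ‘]’
Columns (stack symbol / look-ahead token): i + ( ) [ ] $ E T
Entries per state, as far as they survive in the transcript:
state 0: s5 s7 s1 s6
state 1: s3 s2
state 2: r1
state 3: s4
state 4: r3
state 5: r4 s10
state 6: r2
state 7: s8
state 8: s9
state 9: r5
State 5 contains the items T → i • and T → i • [E]; its row now holds both r4 and s10.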

Another Example of LR(0) Conflicts(1)

Another Example of LR(0) Conflicts(2)

Another Example of LR(0) Conflicts(3) num plus num times num $

Another Example of LR(0) Conflicts(4) Follow(E)= {plus, $}

Unfortunately …
SLR(1) leaves many shift-reduce conflicts unsolved.
Problem: the FOLLOW(N) set is the union of all look-aheads of all alternatives of N in all states.
Example:
S → A | x b
A → a A b | B
B → x
FOLLOW(S) = { $ }
FOLLOW(A) = { b, $ }
FOLLOW(B) = { b, $ }

SLR(1) automaton

Another Example of SLR Problem Follow(A)={b, c, $}

Make the Grammar SLR(1) Follow(A1)={b, $}

LR(1) parsing
The LR(1) technique does not rely on FOLLOW sets, but rather keeps the specific look-ahead with each item.
LR(1) item: N → α • β {σ}
ε-closure for LR(1) item sets: if set S contains an item P → α • N β {σ}, then for each production rule N → γ, S must contain the item N → • γ {τ}, where τ = FIRST(β {σ}).

Creating look-ahead sets
Extended definition of FIRST sets: if FIRST(β) does not contain ε, FIRST(β{σ}) is just equal to FIRST(β); if β can produce ε, FIRST(β{σ}) contains all the tokens in FIRST(β), excluding ε, plus the tokens in σ.
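The extended FIRST definition and the LR(1) closure rule can be sketched together in Python. The names below are assumptions of this illustration, as is the representation: an item is (lhs, rhs, dot, token), so a look-ahead set {σ} is represented by several items that differ only in their token. The grammar is the SLR(1) problem example S → A | x b, A → a A b | B, B → x.

```python
GRAMMAR = {
    "S": [("A",), ("x", "b")],
    "A": [("a", "A", "b"), ("B",)],
    "B": [("x",)],
}
EPS = ""  # stands for epsilon


def first_of_sequence(seq, first, grammar):
    out = set()
    for sym in seq:
        if sym not in grammar:  # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out


def first_sets(grammar):
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first, grammar)
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)


def first_with_lookahead(beta, sigma):
    """FIRST(beta {sigma}): FIRST(beta), with epsilon replaced by sigma."""
    f = first_of_sequence(beta, FIRST, GRAMMAR)
    if EPS in f:
        return (f - {EPS}) | set(sigma)
    return f


def lr1_closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        _lhs, rhs, dot, token = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            # P -> alpha . N beta {sigma} predicts N -> . gamma {tau}
            tau = first_with_lookahead(rhs[dot + 1:], {token})
            for gamma in GRAMMAR[rhs[dot]]:
                for t in tau:
                    item = (rhs[dot], gamma, 0, t)
                    if item not in result:
                        result.add(item)
                        worklist.append(item)
    return result


start_state = lr1_closure({("S", ("A",), 0, "$"), ("S", ("x", "b"), 0, "$")})
after_a = lr1_closure({("A", ("a", "A", "b"), 1, "$")})
print(sorted(start_state))
print(sorted(after_a))
```

In the start state the item B → • x carries only the look-ahead $, while after shifting a it carries only b; SLR(1) would use FOLLOW(B) = { b, $ } in both places.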

LR(1) automaton

Another Example of LR(1) Construction (1)

Another Example of LR(1) Construction (2)

Another Example of LR(1) Construction (3)

Another Example of LR(1) Construction (4)

Another Example of LR(1) Construction (5)

LR(1) parsing comments
The LR(1) automaton is more discriminating than the SLR(1) automaton. In fact, it is so strong that any language that can be parsed from left to right with a one-token look-ahead in linear time can be parsed using the LR(1) method.
LR(1) tables are big.
Combine “equal” sets by merging look-ahead sets: LALR(1).

LALR(1) S3 and S10 are similar in that they are equal if one ignores the look-ahead sets, and so are S4 and S9, S6 and S11, and S8 and S12.
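Merging states that are equal apart from their look-ahead sets can be sketched as follows in Python. The item representation (lhs, rhs, dot, token) and the names core and merge_lalr are assumptions of this illustration, and the three example states are hypothetical; they are not the numbered states S3, S10, and so on of the slide.

```python
from collections import defaultdict


def core(state):
    """The core of an LR(1) state: its items with the look-aheads dropped."""
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _token) in state)


def merge_lalr(states):
    """Merge LR(1) states with equal cores by uniting their look-ahead sets."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= set(state)
    return [frozenset(items) for items in merged.values()]


# Hypothetical LR(1) states: the first two differ only in the look-ahead.
state_p = frozenset({("A", ("a", "A", "b"), 3, "$")})
state_q = frozenset({("A", ("a", "A", "b"), 3, "b")})
state_r = frozenset({("B", ("x",), 1, "b")})

lalr_states = merge_lalr([state_p, state_q, state_r])
for state in lalr_states:
    print(sorted(state))
```

A full LALR(1) construction would also redirect the transitions of the merged states; the sketch only shows the merging step itself.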

LALR(1) automaton

Practice Derive the LALR(1) ACTION/GOTO table for the grammar in Fig. 2.95

Making a grammar LR(1) – or not
Although the chances for a grammar to be LR(1) are much larger than those of its being SLR(1) or LL(1), one often encounters a grammar that still is not LR(1). The reason is generally that the grammar is ambiguous.
For example:
if_statement → ‘if’ ‘(’ expression ‘)’ statement | ‘if’ ‘(’ expression ‘)’ statement ‘else’ statement
statement → … | if_statement | …
The statement: if (x>0) if (y>0) p=0; else q=0;

Possible syntax trees (1)

Possible syntax trees (2)

Other Examples of Ambiguous Grammar (1)

Other Examples of Ambiguous Grammar (2)

Resolving shift-reduce conflicts (1)
The longest possible sequence of grammar symbols is taken for reduction: in a shift-reduce conflict, do shift.
Another example, for the input i * i + i:
E → E • ‘+’ E (shift)
E → E ‘*’ E • (reduce)

Resolving shift-reduce conflicts (2)
The use of precedences between tokens.
Example: a shift-reduce conflict on t:
P → α • t β {…} (shift item)
Q → γ u R • {…t…} (reduce item)
where R is either empty or one non-terminal. If the look-ahead is t, we perform one of the following three actions:
1. If symbol u has a higher precedence than symbol t, we reduce.
2. If t has a higher precedence than symbol u, we shift.
3. If both have equal precedence, we also shift.
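The three-way rule above amounts to a single comparison. The Python sketch below assumes precedences are given as integers in a table (higher binds tighter); the names PRECEDENCE and resolve are hypothetical and serve only this illustration.

```python
# Assumed precedence table for the example: '*' binds tighter than '+'.
PRECEDENCE = {"+": 1, "*": 2}


def resolve(u, t, precedence=PRECEDENCE):
    """Resolve a shift-reduce conflict on look-ahead t.

    u is the last terminal of the reduce item Q -> gamma u R . and
    t is the token of the shift item P -> alpha . t beta.
    """
    if precedence[u] > precedence[t]:
        return "reduce"
    # t has higher precedence, or both are equal: shift.
    return "shift"


# Input i * i + i: E '*' E is on the stack and the look-ahead is '+'.
print(resolve("*", "+"))
# Input i + i * i: E '+' E is on the stack and the look-ahead is '*'.
print(resolve("+", "*"))
```

With this table the first call chooses reduce, so i * i is grouped before the addition, and the second call chooses shift, so the multiplication is again grouped first.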

Bottom-up parser: yacc/bison
The most widely used parser generator is yacc.
Yacc is an LALR(1) parser generator.
Bison is a yacc look-alike provided by GNU.

A very high-level view of text analysis techniques

Yacc code example (constructing parser tree)

Yacc code example (auxiliary code)