Compilation 0368-3133 (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.

Slides:



Advertisements
Similar presentations
Compiler Construction
Advertisements

A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Lesson 8 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Compiler Designs and Constructions
Compiler construction in4020 – lecture 4 Koen Langendoen Delft University of Technology The Netherlands.
Compiler Principles Fall Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University.
 CS /11/12 Matthew Rodgers.  What are LL and LR parsers?  What grammars do they parse?  What is the difference between LL and LR?  Why do.
Bottom up Parsing Bottom up parsing trys to transform the input string into the start symbol. Moves through a sequence of sentential forms (sequence of.
Chapter 3 Syntax Analysis
Joey Paquet, 2000, 2002, 2008, Lecture 7 Bottom-Up Parsing II.
Mooly Sagiv and Roman Manevich School of Computer Science
Lecture 04 – Syntax analysis: top-down and bottom-up parsing
Theory of Compilation Erez Petrank Lecture 3: Syntax Analysis: Bottom-up Parsing 1.
Predictive Parsing l Find derivation for an input string, l Build a abstract syntax tree (AST) –a representation of the parsed program l Build a symbol.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Bottom-Up Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.
Bottom-up parsing Goal of parser : build a derivation
Syntax and Semantics Structure of programming languages.
LR Parsing Compiler Baojian Hua
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
Chapter 3-3 Chang Chi-Chung Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation)  LR(0), SLR, Canonical LR = LR(1), LALR 
CSI 3120, Syntactic analysis, page 1 Syntactic Analysis and Parsing Based on A. V. Aho, R. Sethi and J. D. Ullman Compilers: Principles, Techniques and.
Syntax and Semantics Structure of programming languages.
Compilation /15a Lecture 5 1 * Syntax Analysis: Bottom-Up parsing Noam Rinetzky.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University.
10/10/2002© 2002 Hal Perkins & UW CSED-1 CSE 582 – Compilers LR Parsing Hal Perkins Autumn 2002.
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
Bernd Fischer RW713: Compiler and Software Language Engineering.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Lecture 5: LR Parsing CS 540 George Mason University.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
1 Chapter 6 Bottom-Up Parsing. 2 Bottom-up Parsing A bottom-up parsing corresponds to the construction of a parse tree for an input tokens beginning at.
Syntax and Semantics Structure of programming languages.
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
Programming Languages Translator
Bottom-up parsing Goal of parser : build a derivation
Compiler design Bottom-up parsing Concepts
Bottom-Up Parsing.
Compiler Baojian Hua LR Parsing Compiler Baojian Hua
Unit-3 Bottom-Up-Parsing.
UNIT - 3 SYNTAX ANALYSIS - II
Table-driven parsing Parsing performed by a finite state machine.
Compiler design Bottom-up parsing: Canonical LR and LALR
Fall Compiler Principles Lecture 4: Parsing part 3
Bottom-Up Syntax Analysis
Syntax Analysis Part II
4 (c) parsing.
Fall Compiler Principles Lecture 4: Parsing part 3
Subject Name:COMPILER DESIGN Subject Code:10CS63
Lecture 4: Syntax Analysis: Botom-Up Parsing Noam Rinetzky
Compiler Design 7. Top-Down Table-Driven Parsing
Fall Compiler Principles Lecture 4: Parsing part 3
5. Bottom-Up Parsing Chih-Hung Wang
Kanat Bolazar February 16, 2010
Chap. 3 BOTTOM-UP PARSING
Compiler design Bottom-up parsing: Canonical LR and LALR
Presentation transcript:

Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav

What is a Compiler? “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” --Wikipedia 2

Conceptual Structure of a Compiler 3 Executable code exe Source text txt Semantic Representation Backend Compiler Frontend Lexical Analysis Syntax Analysis Parsing Semantic Analysis Intermediate Representation (IR) Code Generation wordssentences

From scanning to parsing 4 ((23 + 7) * x) )x*)7+23(( RPIdOPRPNumOPNumLP Lexical Analyzer program text token stream Parser Grammar: E ... | Id Id  ‘a’ |... | ‘z’ Op(*) Id(b) Num(23)Num(7) Op(+) Abstract Syntax Tree valid syntax error

Top-Down vs Bottom-Up Top-down (predict match/scan-complete ) 5 to be read… already read… A A ab Aabc aacbb$ A  aAb|c

Top-Down vs Bottom-Up Top-down (predict match/scan-complete ) 6  Bottom-up (shift reduce) to be read… already read… A ab A c ab A aacbb$ A  aAb|c

Model of an LR parser 7 LR Parser q5 + q2 T q0 Stack $id+ Output (AST/Error) state symbol ACTIONGOTO Input Terminals and Non-terminals Control Tables Top Remainder of text to be processed State controls decisions Initial stack contains q0

LR(0) parser tables 8 Statei+()$ETaction q0q5q7q1q6shift q1q3q2shift q2reduce: Z  E$ q3q5q7q4shift q4reduce: E  E+T q5reduce: T  i q6reduce: E  T q7q5q7q8q6shift q8q3q9shift q9reduce: T  E GOTO Table ACTION Table Empty cell = error move

LR(0) parser tables Shift action row –Tells which state to GOTO for current token –Blank entry indicates an error Reduce action row –Tells which rule to reduce with Independent of current token –GOTO entries are blank 9 Stateid+()$ETaction q0q5q7q1q6shift q5reduce: T  i

Stateid+()$ETaction q0q5q7q1q6shift q5reduce: T  i Shift Move Remove first token from input Push it on the stack Compute next state based on GOTO table Push new state on the stack If new state is error – report error 10 id+ $ input q0 stack +id$ input q0 stack shift idq5

Reduce Move (using N  α) Symbols in α and their following states are removed from stack New state computed based on GOTO table (using top of stack, before pushing N) N is pushed on the stack New state pushed on top of N 11 +id$ input q0 stack idq5 Reduce: T  id +id$ input q0 stack T Stateid+()$ETaction q0q5q7q1q6shift q5reduce: T  i

Reduce Move (using N  α) Symbols in α and their following states are removed from stack New state computed based on GOTO table (using top of stack, before pushing N) N is pushed on the stack New state pushed on top of N 12 +id$ input q0 stack idq5 Reduce: T  id +id$ input q0 stack Tq6 Stateid+()$ETaction q0q5q7q1q6shift q5reduce: T  i

GOTO/ACTION table 13 Statei+()$ET q0s5s7s1s6 q1s3s2 q2r1 q3s5s7s4 q4r3 q5r4 q6r2 q7s5s7s8s6 q8s3s9 q9r5 (1)Z  E $ (2)E  T (3)E  E + T (4)T  i (5)T  ( E ) Warning: numbers mean different things! rn = reduce using rule number n sm = shift to state m

Parsing id+id$ 14 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) StackInputAction 0id + id $s5 Initialize with state 0

Parsing id+id$ 15 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 StackInputAction 0id + id $s5 Initialize with state 0 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 16 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 17 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 pop id 5 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 18 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 push T 6 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 19 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 20 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 21 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 22 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E id 5$r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 23 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E id 5$r4 0 E T 4$r3 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 24 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E id 5$r4 0 E T 4$r3 0 E 1$s2 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Constructing an LR parsing table Construct a transition diagram (deterministic FSM) –States = sets of LR(0) items –Transitions = one-step derivation If there are conflicts – stop Fill table entries from diagram 25

Terminology: Reductions & Handles The opposite of derivation is called reduction –Let A  α be a production rule –Derivation: βAµ  βαµ –Reduction: βαµ  βAµ A handle is the reduced substring –α is the handles for βαµ 26

LR(0) Items The items of a grammar are obtained by placing a dot at every position in every production 27 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) 1: S   E$ 2: S  E  $ 3: S  E $  4: E   T 5: E  T  6: E   E + T 7: E  E  + T 8: E  E +  T 9: E  E + T  10: T   id 11: T  id  12: T   (E) 13: T  (  E) 14: T  (E  ) 15: T  (E)  GrammarLR(0) items

LR(0) Item - Intuition 28 N  α  β Already matched To be matched Input Hypothesis about αβ is the rule being reduced, and so far we’ve matched α and we expect to see β

Types of LR(0) items 29 N  α  β Shift Item N  αβ  Reduce Item

LR(0) automaton example 30 Z   E$ E   T E   E + T T   i T   (E) T  (  E) E   T E   E + T T   i T   (E) E  E + T  T  (E)  Z  E$  Z  E  $ E  E  + T E  E+  T T   i T   (E) T  i  T  (E  ) E  E  +T E  T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i ( reduce state shift state

Computing item sets Initial set –Z is in the start symbol –  -closure({ Z   α | Z  α is in the grammar } ) Next set from a set S and the next symbol X –step(S,X) = { N  αX  β | N  α  Xβ in the item set S} –nextSet(S,X) =  -closure(step(S,X)) 31

Operations for transition diagram construction Initial = {S’   S$} For an item set I Closure(I) = Closure(I) ∪ {X   µ is in grammar| N  α  Xβ in I} Goto(I, X) = { N  αX  β | N  α  Xβ in I} 32

Initial example Initial = {S   E $} 33 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

Closure example Initial = {S   E $} Closure({S   E $}) = { S   E $ E   T E   E + T T   id T   ( E ) } 34 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

Goto example Initial = {S   E $} Closure({S   E $}) = { S   E $ E   T E   E + T T   id T   ( E ) } Goto({S   E $, E   E + T, T   id}, E) = {S  E  $, E  E  + T} 35 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

Constructing the transition diagram Start with state 0 containing item Closure({S   E $}) Repeat until no new states are discovered –For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(goto(Ip, N)) 36

LR(0) automaton example 37 Z   E$ E   T E   E + T T   i T   (E) T  (  E) E   T E   E + T T   i T   (E) E  E + T  T  (E)  Z  E$  Z  E  $ E  E  + T E  E+  T T   i T   (E) T  i  T  (E  ) E  E  +T E  T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i ( reduce state shift state

Automaton construction example 38 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ q0q0 Initialize

39 S   E$ E   T E   E + T T   i T   (E) q0q0 apply Closure Automaton construction example (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Automaton construction example 40 S   E$ E   T E   E + T T   i T   (E) q0q0 E  T  q6q6 T T  (  E) E   T E   E + T T   i T   (E) ( T  i  q5q5 i S  E  $ E  E  + T q1q1 E (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

41 S   E$ E   T E   E + T T   i T   (E) T  (  E) E   T E   E + T T   i T   (E) E  E + T  T  (E)  S  E$  Z  E  $ E  E  + T E  E+  T T   i T   (E) T  i  T  (E  ) E  E  +T E  T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i ( terminal transition corresponds to shift action in parse table non-terminal transition corresponds to goto action in parse table a single reduce item corresponds to reduce action Automaton construction example (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Are we done? Can make a transition diagram for any grammar Can make a GOTO table for every grammar Cannot make a deterministic ACTION table for every grammar 42

LR(0) conflicts 43 Z  E $ E  T E  E + T T  i T  ( E ) T  i[E] Z   E$ E   T E   E + T T   i T   (E) T   i[E] T  i  T  i  [E] q0q0 q5q5 T ( i E Shift/reduce conflict … … …

LR(0) conflicts 44 Z  E $ E  T E  E + T T  i V  i T  ( E ) Z   E$ E   T E   E + T T   I V   i T   (E) T   i[E] T  i  V  i  q0q0 q5q5 T ( i E reduce/reduce conflict … … …

LR(0) conflicts Any grammar with an  -rule cannot be LR(0) Inherent shift/reduce conflict –A   – reduce item –P  α  Aβ – shift item –A   can always be predicted from P  α  Aβ 45

Conflicts Can construct a diagram for every grammar but some may introduce conflicts shift-reduce conflict: an item set contains at least one shift item and one reduce item reduce-reduce conflict: an item set contains two reduce items 46

LR variants LR(0) – what we’ve seen so far SLR(0) –Removes infeasible reduce actions via FOLLOW set reasoning LR(1) –LR(0) with one lookahead token in items LALR(0) –LR(1) with merging of states with same LR(0) component 47

LR (0) GOTO/ACTIONS tables 48 Statei+()$ETaction q0q5q7q1q6shift q1q3q2shift q2Z  E$ q3q5q7q4Shift q4E  E+T q5TiTi q6ETET q7q5q7q8q6shift q8q3q9shift q9TETE GOTO Table ACTION Table ACTION table determined only by state, ignores input GOTO table is indexed by state and a grammar symbol from the stack

SLR parsing A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N A reduce item N  α  is applicable only when the lookahead is in FOLLOW(N) –If b is not in FOLLOW(N) we just proved there is no derivation S  * βNb. –Thus, it is safe to remove the reduce item from the conflicted state Differs from LR(0) only on the ACTION table –Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take 49

SLR action table 50 Statei+()[]$ 0shift 1 accept 2 3shift 4E  E+T 5TiTiTiTishiftTiTi 6ETETETETETET 7 8 9T  (E) vs. stateaction q0shift q1shift q2 q3shift q4E  E+T q5TiTi q6ETET q7shift q8shift q9TETE SLR – use 1 token look-aheadLR(0) – no look-ahead … as before… T  i T  i[E] Lookahead token from the input

LR(1) grammars In SLR: a reduce item N  α  is applicable only when the lookahead is in FOLLOW(N) But FOLLOW(N) merges lookahead for all alternatives for N –Insensitive to the context of a given production LR(1) keeps lookahead with each LR item Idea: a more refined notion of follows computed per item 51

LR(1) items LR(1) item is a pair –LR(0) item –Lookahead token Meaning –We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example –The production L  id yields the following LR(1) items 52 [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L [L → ● id] [L → id ●] LR(0) items LR(1) items

LALR(1) LR(1) tables have huge number of entries Often don’t need such refined observation (and cost) Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts LALR(1) not as powerful as LR(1) in theory but works quite well in practice –Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice 53

Summary 54

Summary Bottom up derivation LR(k) can decide on a reduce after seeing the entire right side of the rule plus k look-ahead tokens. –We focused on LR(0) Using a table and a stack to derive LR Items and the automaton Creating the table from the automaton LR parsing with pushdown automata LR(0), SLR, LR(1) – different kinds of LR items, same basic algorithm 55

Broad kinds of parsers Parsers for arbitrary grammars –Earley’s method, CYK method –Usually, not used in practice (though might change) Top-Down parsers –Construct parse tree in a top-down matter –Find the leftmost derivation Bottom-Up parsers –Construct parse tree in a bottom-up manner –Find the rightmost derivation in a reverse order 56

LR is More Powerful than LL Any LL(k) language is also in LR(k), i.e., LL(k) ⊂ LR(k). –LR is more popular in automatic tools But less intuitive Also, the lookahead is counted differently in the two cases –In an LL(k) derivation the algorithm sees the left-hand side of the rule + k input tokens and then must select the derivation rule –In LR(k), the algorithm “sees” all right-hand side of the derivation rule + k input tokens and then reduces LR(0) sees the entire right-side, but no input token 57

Grammar Hierarchy 58 Non-ambiguous CFG LR(1) LALR(1) SLR(1) LL(1) LR(0)