Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University.

Slides:



Advertisements
Similar presentations
Compiler construction in4020 – lecture 4 Koen Langendoen Delft University of Technology The Netherlands.
Advertisements

Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
Compiler Principles Fall Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University.
Abstract Syntax Mooly Sagiv html:// 1.
Joey Paquet, 2000, 2002, 2008, Lecture 7 Bottom-Up Parsing II.
CSE 5317/4305 L4: Parsing #21 Parsing #2 Leonidas Fegaras.
Mooly Sagiv and Roman Manevich School of Computer Science
Lecture 04 – Syntax analysis: top-down and bottom-up parsing
Theory of Compilation Erez Petrank Lecture 3: Syntax Analysis: Bottom-up Parsing 1.
Cse321, Programming Languages and Compilers 1 6/12/2015 Lecture #10, Feb. 14, 2007 Modified sets of item construction Rules for building LR parse tables.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
1 Chapter 5: Bottom-Up Parsing (Shift-Reduce). 2 - attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working.
Bottom-Up Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
Compiler Construction Parsing Rina Zviel-Girshin and Ohad Shacham School of Computer Science Tel-Aviv University.
Lecture #8, Feb. 7, 2007 Shift-reduce parsing,
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom Up Parsing.
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial)
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.
Compiler Construction Parsing II Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.
Bottom-up parsing Goal of parser : build a derivation
Syntax and Semantics Structure of programming languages.
LR Parsing Compiler Baojian Hua
Automated Parser Generation (via CUP)CUP 1. High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser.
Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
Chapter 3-3 Chang Chi-Chung Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation)  LR(0), SLR, Canonical LR = LR(1), LALR 
Syntax and Semantics Structure of programming languages.
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial) 1.
Chapter 5: Bottom-Up Parsing (Shift-Reduce)
Prof. Necula CS 164 Lecture 8-91 Bottom-Up Parsing LR Parsing. Parser Generators. Lecture 6.
Compilation /15a Lecture 5 1 * Syntax Analysis: Bottom-Up parsing Noam Rinetzky.
Compiler Principles Fall Compiler Principles Lecture 6: Parsing part 5 Roman Manevich Ben-Gurion University.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Mooly Sagiv and Roman Manevich School of Computer Science
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
4. Bottom-up Parsing Chih-Hung Wang
Bernd Fischer RW713: Compiler and Software Language Engineering.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Lecture 5: LR Parsing CS 540 George Mason University.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
Bottom-up parsing. Bottom-up parsing builds a parse tree from the leaves (terminals) to the start symbol int E T * TE+ T (4) (2) (3) (5) (1) int*+ E 
1 Chapter 6 Bottom-Up Parsing. 2 Bottom-up Parsing A bottom-up parsing corresponds to the construction of a parse tree for an input tokens beginning at.
Conflicts in Simple LR parsers A SLR Parser does not use any lookahead The SLR parsing method fails if knowing the stack’s top state and next input token.
COMPILER CONSTRUCTION
Syntax and Semantics Structure of programming languages.
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
Announcements/Reading
Programming Languages Translator
Compiler Baojian Hua LR Parsing Compiler Baojian Hua
Table-driven parsing Parsing performed by a finite state machine.
Compiler design Bottom-up parsing: Canonical LR and LALR
Fall Compiler Principles Lecture 4: Parsing part 3
Bottom-Up Syntax Analysis
Syntax Analysis Part II
Fall Compiler Principles Lecture 4: Parsing part 3
Subject Name:COMPILER DESIGN Subject Code:10CS63
Parsing #2 Leonidas Fegaras.
Lecture 4: Syntax Analysis: Botom-Up Parsing Noam Rinetzky
Parsing #2 Leonidas Fegaras.
Fall Compiler Principles Lecture 4: Parsing part 3
5. Bottom-Up Parsing Chih-Hung Wang
Kanat Bolazar February 16, 2010
Compiler design Bottom-up parsing: Canonical LR and LALR
Presentation transcript:

Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University

Today Shift-reduce parsing – How does a shift-reduce parser work? – How do we construct a shift-reduce parse table? – LR(1), SLR(1), LALR(1) – Meaning of shift-reduce / reduce-reduce conflicts – Behavior on left-recursion/right-recursion Automatic parser generation (CUP) – Handling ambiguity 2

Model of an LR parser 3 LR Parsing program 0 T id 5 Stack $id+ + Output state symbol gotoaction Input

LR parser stack Sequence made of state, symbol pairs For instance a possible stack for the grammar S  E $ E  T E  E + T T  id T  ( E ) could be: 0 T id 5 4

Form of LR parsing table 5 state terminals non-terminals shift/reduce actions goto part snsn rkrk shift state nreduce by rule k gmgm goto state m acc accept error

LR parser table example 6 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) gotoactionSTATE TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9

Shift move 7 LR Parsing program q Stack $…a… Output gotoaction Input If action[q, a] = sn

Result of shift If action[q, a] = sn 8 LR Parsing program n a q Stack $…a… Output gotoaction Input

Reduce move If action[qn, a] = rk Production: (k) A  β If β = σ 1 … σ n Top of stack looks like q1 σ 1 … qn σ n goto[q, A] = qm 9 LR Parsing program qn … q … Stack $…a… Output gotoaction Input 2*|β|

Result of reduce move If action[qn, a] = rk Production: (k) A  β If β = σ 1 … σ n Top of stack looks like q1 σ 1 … qn σ n goto[q, A] = qm 10 LR Parsing program Stack Output gotoaction 2*|β| qm A q … $…a… Input

Accept move 11 LR Parsing program q Stack $a… Output gotoaction Input If action[q, a] = accept parsing completed

Error move 12 LR Parsing program q Stack $…a… Output gotoaction Input If action[q, a] = error parsing discovered a syntactic error

Parsing id+id$ 13 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) StackInputAction 0id + id $s5 Initialize with state 0

Parsing id+id$ 14 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) StackInputAction 0id + id $s5 Initialize with state 0

Parsing id+id$ 15 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 16 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) pop id 5

Parsing id+id$ 17 StackInputAction 0id + id $s5 0 id 5+ id $r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) push T 6

Parsing id+id$ 18 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 19 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 20 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 21 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E id 5$r4 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 22 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E id 5$r4 0 E T 4$r3 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Parsing id+id$ 23 StackInputAction 0id + id $s5 0 id 5+ id $r4 0 T 6+ id $r2 0 E 1+ id $s3 0 E 1 + 3id $s5 0 E id 5$r4 0 E T 4$r3 0 E 1$s2 gotoactionS TE$)(+id g6g1s7s50 accs31 2 g4s7s53 r3 4 r4 5 r2 6 g6g8s7s57 s9s38 r5 9 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E )

Constructing an LR parsing table 1.Construct a (determinized) transition diagram from LR items 2.If there are conflicts – stop 3.Fill table entries from diagram 24

Form of LR(0) items N  α  β Already matched To be matched Input Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β 25

Types of LR(0) items N  α  β Shift Item N  αβ  Reduce Item 26

LR(0) items enumeration example All items can be obtained by placing a dot at every position for every production: 27 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) 1: S   E$ 2: S  E  $ 3: S  E $  4: E   T 5: E  T  6: E   E + T 7: E  E  + T 8: E  E +  T 9: E  E + T  10: T   i 11: T  i  12: T   (E) 13: T  (  E) 14: T  (E  ) 15: T  (E)  Grammar LR(0) items

Operations for transition diagram construction Initial = {S’   S$} For an item set I Closure(I) = Closure(I) + {X   µ is in grammar| N  α  Xβ in I} Goto(I, X) = { N  αX  β | N  α  Xβ in I} 28

Initial example Initial = { S   E $ } 29 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

Closure example Initial = { S   E $ } Closure({ S   E $ }) = S   E $ E   T E   E + T T   id T   ( E ) 30 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

Goto example Initial = { S   E $ } Closure({ S   E $ }) = S   E $ E   T E   E + T T   id T   ( E ) Goto({S   E $, E   E + T, T   id}, E) = {S  E  $, E  E  + T} 31 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) Grammar

Constructing the transition diagram 1.Start with state 0 containing item Closure({ S   E $ }) 2.Repeat until no new states are discovered – For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(goto(Ip, N)) 32

Automaton construction example 33 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ E   T E   E + T T   i T   (E) T  (  E) E   T E   E + T T   i T   (E) E  E + T  T  (E)  S  E$  S  E  $ E  E  + T E  E+  T T   i T   (E) T  i  T  (E  ) E  E  +T E  T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i (

Automaton construction example 34 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ q0q0 Initialize

Automaton construction example 35 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ E   T E   E + T T   i T   (E) q0q0 apply Closure

Automaton construction example 36 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ E   T E   E + T T   i T   (E) q0q0 E  T  q6q6 T T  (  E) E   T E   E + T T   i T   (E) ( T  i  q5q5 i S  E  $ E  E  + T q1q1 E

Automaton construction example 37 (1) S  E $ (2) E  T (3) E  E + T (4) T  id (5) T  ( E ) S   E$ E   T E   E + T T   i T   (E) T  (  E) E   T E   E + T T   i T   (E) E  E + T  T  (E)  S  E$  Z  E  $ E  E  + T E  E+  T T   i T   (E) T  i  T  (E  ) E  E  +T E  T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i ( terminal transition corresponds to shift action in parse table non-terminal transition corresponds to goto action in parse table a single reduce item corresponds to reduce action

Conflicts Can construct a diagram for every grammar but some may introduce conflicts shift-reduce conflict: an item set contains at least one shift item and one reduce item reduce-reduce conflict: an item set contains two reduce items 38

LR(0) conflicts 39 S  E $ E  T E  E + T T  i T  ( E ) T  i[E] S   E$ E   T E   E + T T   i T   (E) T   i[E] T  i  T  i  [E] q0q0 q5q5 T ( i E Shift/reduce conflict … … …

LR(0) conflicts 40 S  E $ E  T E  E + T T  i V  i T  ( E ) S   E$ E   T E   E + T T   i T   (E) T   i[E] T  i  V  i  q0q0 q5q5 T ( i E reduce/reduce conflict … … …

LR(0) conflicts Any grammar with an  -rule cannot be LR(0) Inherent shift/reduce conflict – A   – reduce item – P  α  Aβ – shift item – A   can always be predicted from P  α  Aβ 41

LR variants LR(0) – what we’ve seen so far SLR(0) – Removes infeasible reduce actions via FOLLOW set reasoning LR(1) – LR(0) with one lookahead token in items LALR(0) – LR(1) with merging of states with same LR(0) component 42

SRL parsing A handle should not be reduced to a non- terminal N if the lookahead is a token that cannot follow N A reduce item N  α  is applicable only when the lookahead is in FOLLOW(N) – If b is not in FOLLOW(N) we just proved there is no derivation S =>* βNαb and thus it is safe to remove the reduce item from the conflicted state Differs from LR(0) only on the ACTION table – Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take 43

SLR action table Statei+()[]$ 0shift 1 accept 2 3shift 4E  E+T 5TiTiTiTishiftTiTi 6ETETETETETET 7 8 9T  (E) vs. stateaction q0shift q1shift q2 q3shift q4E  E+T q5TiTi q6ETET q7shift q8shift q9TETE SLR – use 1 token look-aheadLR(0) – no look-ahead … as before… T  i T  i[E] Lookahead token from the input 44

Going beyond SLR(0) Some common language constructs introduce conflicts even for SLR (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L 45

S’ →  S S →  L = R S →  R L →  * R L →  id R →  L S’ → S  S → L  = R R → L  S → R  L → *  R R →  L L →  * R L →  id L → id  S → L =  R R →  L L →  * R L →  id L → * R  R → L  S → L = R  S L R id * = R * R L * L q0 q4 q7 q1 q3 q9 q6 q8 q2 q5 46

shift/reduce conflict S → L  = R vs. R → L  FOLLOW(R) contains = – S ⇒ L = R ⇒ * R = R SLR cannot resolve conflict 47 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

Inputs requiring shift/reduce For the input id the rightmost derivation S’ => S => R => L => id requires reducing in q2 For the input id = id S’ => S => L = R => L = L => L = id => id = id requires shifting 48 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

LR(1) grammars In SLR: a reduce item N  α  is applicable only when the lookahead is in FOLLOW(N) But FOLLOW(N) merges lookahead for all alternatives for N – Insensitive to the context of a given production LR(1) keeps lookahead with each LR item Idea: a more refined notion of follows computed per item 49

LR(1) items LR(1) item is a pair – LR(0) item – Lookahead token Meaning – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example – The production L  id yields the following LR(1) items 50 [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L [L → ● id] [L → id ●] LR(0) items LR(1) items

 -closure for LR(1) For every [A → α ● Bβ, c] in S – for every production B→δ and every token b in the grammar such that b  FIRST(βc) – Add [B → ● δ, b] to S 51

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 52

Back to the conflict Is there a conflict now? 53 (S → L ∙ = R, $) (R → L ∙, $) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) = q6 q2

LALR(1) LR(1) tables have huge number of entries Often don’t need such refined observation (and cost) Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts LALR(1) not as powerful as LR(1) in theory but works quite well in practice – Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice 54

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 55

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, $) q10 R q13 id 56

Left/Right- recursion At home: create a simple grammar with left- recursion and one with right-recursion Construct corresponding LR(0) parser – Any conflicts? Run on simple input and observe behavior – Attempt to generalize observation for long inputs 57

Automated Parser Generation (via CUP)CUP 58

High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser AST TPL.cup TPL.lex sym.java Parser.java Lexer.java (Token.java) 59

Expression calculator expr  expr + expr | expr - expr | expr * expr | expr / expr | - expr | ( expr ) | number Goals of expression calculator parser: Is a valid expression? What is the meaning (value) of this expression? 60

Syntax analysis with CUP CUPjavac Parser spec.javaParser AST CUP – parser generator Generates an LALR(1) Parser Input: spec file Output: a syntax analyzer tokens 61

CUP spec file Package and import specifications User code components Symbol (terminal and non-terminal) lists – Terminals go to sym.java – Types of AST nodes Precedence declarations The grammar – Semantic actions to construct AST 62

Expression Calculator – 1 st Attempt terminal Integer NUMBER; terminal PLUS, MINUS, MULT, DIV; terminal LPAREN, RPAREN; non terminal Integer expr; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr | LPAREN expr RPAREN | NUMBER ; Symbol type explained later 63

Ambiguities a * b + c a bc + * a bc * + a + b + c a bc + + a bc

terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; non terminal Integer expr; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Expression Calculator – 2 nd Attempt Increasing precedence Contextual precedence 65

Parsing ambiguous grammars using precedence declarations Each terminal assigned with precedence – By default all terminals have lowest precedence – User can assign his own precedence – CUP assigns each production a precedence Precedence of last terminal in production or user-specified contextual precedence On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce In case of equal precedences left / right help resolve conflicts – left means reduce – right means shift More information on precedence declarations in CUP’s manualprecedence declarations 66

Resolving ambiguity a + b + c a bc + + a bc + + precedence left PLUS 67

Resolving ambiguity a * b + c a bc + * a bc * + precedence left PLUS precedence left MULT 68

Resolving ambiguity - a - b a b - - MINUS expr %prec UMINUS a - b - 69

Resolving ambiguity terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Rule has precedence of UMINUS UMINUS never returned by scanner (used only to define precedence) 70

More CUP directives precedence nonassoc NEQ – Non-associative operators: == != etc. – 1<2<3 identified as an error (semantic error?) start non-terminal – Specifies start non-terminal other than first non-terminal – Can change to test parts of grammar Getting internal representation – Command line options: -dump_grammar -dump_states -dump_tables -dump 71

import java_cup.runtime.*; % %cup %eofval{ return new Symbol(sym.EOF); %eofval} NUMBER=[0-9]+ % ”+” { return new Symbol(sym.PLUS); } ”-” { return new Symbol(sym.MINUS); } ”*” { return new Symbol(sym.MULT); } ”/” { return new Symbol(sym.DIV); } ”(” { return new Symbol(sym.LPAREN); } ”)” { return new Symbol(sym.RPAREN); } {NUMBER} { return new Symbol(sym.NUMBER, new Integer(yytext())); } \n { }. { } Parser gets terminals from the scanner Scanner integration Generated from token declarations in.cup file 72

Recap Package and import specifications and user code components Symbol (terminal and non-terminal) lists – Define building-blocks of the grammar Precedence declarations – May help resolve conflicts The grammar – May introduce conflicts that have to be resolved 73

Assigning meaning So far, only validation Add Java code implementing semantic actions expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; 74

Symbol labels used to name variables RESULT names the left-hand side symbol expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 MULT expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIV expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | MINUS expr:e1 {: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS | LPAREN expr:e1 RPAREN {: RESULT = e1; :} | NUMBER:n {: RESULT = n; :} ; Assigning meaning 75

Building an AST More useful representation of syntax tree – Less clutter – Actual level of detail depends on your design Basis for semantic analysis Later annotated with various information – Type information – Computed values 76

Parse tree vs. AST + expr 12+3 ()()

AST construction AST Nodes constructed during parsing – Stored in push-down stack Bottom-up parser – Grammar rules annotated with actions for AST construction – When node is constructed all children available (already constructed) – Node (RESULT) pushed on stack Top-down parser – More complicated 78

1 + (2) + (3) expr + (expr) + (3) + expr 12+3 expr + (3) expr ()() expr + (expr) expr expr + (2) + (3) int_const val = 1 plus e1e2 int_const val = 2 int_const val = 3 plus e1e2 expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :} | LPAREN expr:e RPAREN {: RESULT = e; :} | INT_CONST:i {: RESULT = new int_const(…, i); :} AST construction 79

terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI; terminal UMINUS; non terminal Integer expr; non terminal expr_list, expr_part; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr_list ::= expr_list expr_part | expr_part ; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI ; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Designing an AST 80

See you next time