Lecture 04 – Syntax analysis: top-down and bottom-up parsing

Slides:



Advertisements
Similar presentations
A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Advertisements

Compiler Designs and Constructions
Compiler construction in4020 – lecture 4 Koen Langendoen Delft University of Technology The Netherlands.
Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
Compiler Principles Fall Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University.
Joey Paquet, 2000, 2002, 2008, Lecture 7 Bottom-Up Parsing II.
Lecture 05 – Syntax analysis & Semantic Analysis Eran Yahav 1.
Mooly Sagiv and Roman Manevich School of Computer Science
Lecture 03 – Syntax analysis: top-down parsing Eran Yahav 1.
Theory of Compilation Erez Petrank Lecture 3: Syntax Analysis: Bottom-up Parsing 1.
Predictive Parsing l Find derivation for an input string, l Build a abstract syntax tree (AST) –a representation of the parsed program l Build a symbol.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Bottom-Up Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
Lecture #8, Feb. 7, 2007 Shift-reduce parsing,
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
1 LR parsing techniques SLR (not in the book) –Simple LR parsing –Easy to implement, not strong enough –Uses LR(0) items Canonical LR –Larger parser but.
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands.
Bottom-up parsing Goal of parser : build a derivation
Syntax and Semantics Structure of programming languages.
LR Parsing Compiler Baojian Hua
Chapter 9 Syntax Analysis Winter 2007 SEG2101 Chapter 9.
Compilation Lecture 4 Syntax Analysis Noam Rinetzky 1 Zijian Xu, Hong Chen, Song-Chun Zhu and Jiebo Luo, "A hierarchical compositional model.
Joey Paquet, Lecture 12 Review. Joey Paquet, Course Review Compiler architecture –Lexical analysis, syntactic analysis, semantic.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
Syntax and Semantics Structure of programming languages.
Compiler Principles Fall Compiler Principles Lecture 3: Parsing part 2 Roman Manevich Ben-Gurion University.
Prof. Necula CS 164 Lecture 8-91 Bottom-Up Parsing LR Parsing. Parser Generators. Lecture 6.
Compilation /15a Lecture 5 1 * Syntax Analysis: Bottom-Up parsing Noam Rinetzky.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University.
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
4. Bottom-up Parsing Chih-Hung Wang
Compilation Lecture 3: Syntax Analysis Top Down parsing Bottom Up parsing Noam Rinetzky 1.
Bernd Fischer RW713: Compiler and Software Language Engineering.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Lecture 02b – Syntax analysis: top-down parsing Eran Yahav 1.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
Lecture 5: LR Parsing CS 540 George Mason University.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
Bottom-up parsing. Bottom-up parsing builds a parse tree from the leaves (terminals) to the start symbol int E T * TE+ T (4) (2) (3) (5) (1) int*+ E 
1 Chapter 6 Bottom-Up Parsing. 2 Bottom-up Parsing A bottom-up parsing corresponds to the construction of a parse tree for an input tokens beginning at.
Conflicts in Simple LR parsers A SLR Parser does not use any lookahead The SLR parsing method fails if knowing the stack’s top state and next input token.
Compilation (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax and Semantics Structure of programming languages.
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
Programming Languages Translator
Compiler Baojian Hua LR Parsing Compiler Baojian Hua
Table-driven parsing Parsing performed by a finite state machine.
Fall Compiler Principles Lecture 4: Parsing part 3
Lecture 3: Syntax Analysis: Top-Down parsing Noam Rinetzky
Bottom-Up Syntax Analysis
Fall Compiler Principles Lecture 4: Parsing part 3
Subject Name:COMPILER DESIGN Subject Code:10CS63
Top-Down Parsing CS 671 January 29, 2008.
Parsing #2 Leonidas Fegaras.
Lecture 4: Syntax Analysis: Botom-Up Parsing Noam Rinetzky
Compiler Design 7. Top-Down Table-Driven Parsing
Parsing #2 Leonidas Fegaras.
Fall Compiler Principles Lecture 4: Parsing part 3
Kanat Bolazar February 16, 2010
Chap. 3 BOTTOM-UP PARSING
Parsing CSCI 432 Computer Science Theory
Presentation transcript:

Lecture 04 – Syntax analysis: top-down and bottom-up parsing Theory of Compilation Eran Yahav

You are here Source text Executable code Compiler txt exe Lexical Analysis Syntax Analysis Parsing Semantic Analysis Inter. Rep. (IR) Code Gen. Executable code exe

Last week: from tokens to AST <ID,”x”> <EQ> <ID,”b”> <MULT> <ID,”b”> <MINUS> <INT,4> <MULT> <ID,”a”> <MULT> <ID,”c”> expression Syntax Tree expression MINUS term expression MULT factor term term ID term MULT factor term MULT factor ‘c’ factor ID factor ID ID ‘b’ ID ‘a’ ‘b’ ‘4’ Lexical Analysis Syntax Analysis Sem. Analysis Inter. Rep. Code Gen.

Last week: context free grammars G = (V,T,P,S) V – non terminals T – terminals (tokens) P – derivation rules Each rule of the form V  (T  V)* S – initial symbol Example S  S;S S  id := E E  id | E + E | E * E | ( E )

Last week: parsing A context free language can be recognized by a non- deterministic pushdown automaton Parsing can be seen as a search problem Can you find a derivation from the start symbol to the input word? Easy (but very expensive) to solve with backtracking We want efficient parsers Linear in input size Deterministic pushdown automata We will sacrifice generality for efficiency

Recursively enumerable Chomsky Hierarchy Turing machine Recursively enumerable Linear-bounded non-deterministic Turing machine Context sensitive Context free Regular Non-deterministic pushdown automaton Finite-state automaton

Grammar Hierarchy Non-ambiguous CFG CLR(1) LALR(1) LL(1) SLR(1) LR(0)

LL(k) Parsers Manually constructed Generated Recursive Descent Uses a pushdown automaton Does not use recursion

LL(k) parsing with pushdown automata Pushdown automaton uses Prediction stack Input stream Transition table nonterminals x tokens -> production alternative Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t

LL(k) parsing with pushdown automata input nonterminals Input tokens output

LL(k) parsing with pushdown automata Two possible moves Prediction When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error Match When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t. If (t ≠ T) syntax error Parsing terminates when prediction stack is empty. If input is empty at that point, success. Otherwise, syntax error

Example transition table (1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor Which rule should be used Input tokens ( ) not true false and or xor $ E 2 3 1 LIT 4 5 OP 6 7 8 Nonterminals

Simple Example aacbb$ A  aAb | c Input suffix Stack content Move predict(A,a) = A  aAb aAb$ match(a,a) acbb$ Ab$ aAbb$ cbb$ Abb$ predict(A,c) = A  c match(c,c) bb$ match(b,b) b$ $ match($,$) – success a b c A A  aAb A  c Stack top on the left

Simple Example abcbb$ A  aAb | c Input suffix Stack content Move predict(A,a) = A  aAb aAb$ match(a,a) bcbb$ Ab$ predict(A,b) = ERROR a b c A A  aAb A  c

Error Handling and Recovery x = a * (p+q * ( -b * (r-s); Where should we report the error? The valid prefix property Recovery is tricky Heuristics for dropping tokens, skipping to semicolon, etc.

Error Handling in LL Parsers c$ S  a c | b S Input suffix Stack content Move c$ S$ predict(S,c) = ERROR Now what? Predict bS anyway “missing token b inserted in line XXX” a b c S S  a c S  bS

Error Handling in LL Parsers c$ S  a c | b S Input suffix Stack content Move bc$ S$ predict(b,c) = S  bS bS$ match(b,b) c$ Looks familiar? Result: infinite loop a b c S S  a c S  bS

Error Handling Requires more systematic treatment Enrichment Acceptable-set method Not part of course material

Summary so far Parsing Top-down parsing LL(K) parsers Top-down or bottom-up Top-down parsing Recursive descent LL(k) grammars LL(k) parsing with pushdown automata LL(K) parsers Cannot deal with left recursion Left-recursion removal might result with complicated grammar

Bottom-up Parsing LR(K) SLR LALR All follow the same pushdown-based algorithm Differ on type of “LR Items”

N  αβ LR Item Already matched To be matched Input Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β

LR Items N  αβ Shift Item N  αβ Reduce Item

Example Z  expr EOF expr  term | expr + term term  ID | ( expr ) E  T | E + T T  i | ( E ) (just shorthand of the grammar on the top)

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) i + i $ Z  E $ Why do we need these additional LR items? Where do they come from? What do they mean? E  T E  E + T T  i T  ( E )

-closure Given a set S of LR(0) items If P  αNβ is in S then for each rule N  in the grammar S must also contain N   Z  E $ E  T E  E + T T  i T  ( E ) { Z  E $, E  T, -closure({Z  E $}) = E  E + T, T  i , T  ( E ) }

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) i + i $ Z  E $ T  i Reduce item! E  T E  E + T T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) T + i $ i Z  E $ Reduce item! E  T E  T E  E + T T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) E + i $ T i Z  E $ Reduce item! E  T E  T E  E + T T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) E + i $ T i Z  E $ Z  E$ E  T E  E + T E  E+ T T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) E + i $ T i Z  E $ Z  E$ E  E+T E  T T  i E  E + T E  E+ T T  ( E ) T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) E + T $ i T i Z  E $ Z  E$ E  E+T E  T T  i E  E + T E  E+ T T  ( E ) T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) E + T $ T i i Reduce item! Z  E $ Z  E$ E  E+T E  E+T E  T T  i E  E + T E  E+ T T  ( E ) T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) E $ E + T T i i Z  E $ Z  E$ E  T E  E + T E  E+ T T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) E $ E + T T i Reduce item! i Z  E $ Z  E$ Z  E$ E  T E  E + T E  E+ T T  i T  ( E )

Example: Parsing with LR Items Z  E $ E  T | E + T T  i | ( E ) Z E $ E + T Reduce item! T i Z  E $ i Z  E$ Z  E$ E  T E  E + T E  E+ T T  i T  ( E )

Computing Item Sets Initial set Z is in the start symbol -closure({ Zα | Zα is in the grammar } ) Next set from a set S and the next symbol X step(S,X) = { NαXβ | NαXβ in the item set S} nextSet(S,X) = -closure(step(S,X))

LR(0) Automaton Example q6 E  T T q0 T q7 Z  E$ E  T E  E + T T  i T  (E) T  (E) E  T E  E + T T  i T  (E) ( q5 i T  i i E ( E ( i q1 q8 Z  E$ E  E+ T q3 E  E+T T  i T  (E) T  (E) E  E+T + + $ ) q2 q9 Z  E$ T  (E)  T q4 E  E + T

GOTO/ACTION Tables ACTION Table GOTO Table State i + ( ) $ E T action q0 q5 q7 q1 q6 shift q3 q2 ZE$ q4 Shift EE+T Ti ET q8 q9 TE

LR Pushdown Automaton Two moves: shift and reduce Shift move shift Remove first token from input Push it on the stack Compute next state based on GOTO table Push new state on the stack If new state is error – report error shift input i + i $ input + i $ stack q0 stack q0 i q5 State i + ( ) $ E T action q0 q5 q7 q1 q6 shift

LR Pushdown Automaton Reduce move Reduce T  i input + i $ input + i $ Using a rule N α Symbols in α and their following states are removed from stack New state computed based on GOTO table (using top of stack, before pushing N) N is pushed on the stack New state pushed on top of N Reduce T  i input + i $ input + i $ stack q0 i q5 stack q0 T q6 State i + ( ) $ E T action q0 q5 q7 q1 q6 shift

GOTO/ACTION Table State i + ( ) $ E T q0 s5 s7 s1 s6 q1 s3 s2 q2 r1 q3 Z  E $ E  T E  E + T T  i T ( E ) Warning: numbers mean different things! rn = reduce using rule number n sm = shift to state m

GOTO/ACTION Table Stack Input Action q0 i + i $ s5 q0 i q5 + i $ r4 top is on the right st i + ( ) $ E T q0 s5 s7 s1 s6 q1 s3 s2 q2 r1 q3 s4 q4 r3 q5 r4 q6 r2 q7 s8 q8 s9 q9 r5 Stack Input Action q0 i + i $ s5 q0 i q5 + i $ r4 q0 T q6 r2 q0 E q1 s3 q0 E q1 + q3 i $ q0 E q1 + q3 i q5 $ q0 E q1 + q3 T q4 r3 s2 q0 E q1 $ q2 r1 q0 Z Z  E $ E  T E  E + T T  i T ( E )

Are we done? Can make a transition diagram for any grammar Can make a GOTO table for every grammar Cannot make a deterministic ACTION table for every grammar

LR(0) Conflicts … T q0 Z  E$ E  T E  E + T T  i T  (E) ( … T  i[E] ( … q5 i T  i T  i[E] E Shift/reduce conflict … Z  E $ E  T E  E + T T  i T  ( E ) T  i[E]

LR(0) Conflicts … T q0 Z  E$ E  T E  E + T T  i T  (E) ( … T  i[E] ( … q5 i T  i V  i E reduce/reduce conflict … Z  E $ E  T E  E + T T  i V  i T  ( E )

LR(0) Conflicts Any grammar with an -rule cannot be LR(0) Inherent shift/reduce conflict A  - reduce item P αAβ – shift item A  can always be predicted from P αAβ

Back to the GOTO/ACTIONS tables GOTO Table State i + ( ) $ E T action q0 q5 q7 q1 q6 shift q3 q2 ZE$ q4 Shift EE+T Ti ET q8 q9 TE ACTION table determined only by transition diagram, ignores input

SRL Grammars A handle should not be reduced to a non- terminal N if the look-ahead is a token that cannot follow N A reduce item N  α is applicable only when the look-ahead is in FOLLOW(N) Differs from LR(0) only on the ACTION table

Look-ahead token from the input SLR ACTION Table State i + ( ) $ q0 shift q1 q2 ZE$ q3 q4 EE+T q5 Ti q6 ET q7 q8 q9 T(E) Look-ahead token from the input Remember: In contrast, GOTO table is indexed by state and a grammar symbol from the stack Z  E $ E  T E  E + T T  i T ( E )

SLR ACTION Table State i + ( ) [ ] $ q0 shift q1 q2 ZE$ q3 q4 EE+T vs. SLR – use 1 token look-ahead LR(0) – no look-ahead … as before… T  i T  i[E]

Are we done? (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id

L = R id * L → id  id id * L L R q3 R S → R  q0 S S’ →  S S →  L = R S →  R L →  * R L →  id R →  L S q1 S’ → S q9 S → L = R  q2 L S → L  = R R → L  = R q6 S → L =  R R →  L L →  * R L →  id q5 id * L → id  * id id q4 L → *  R R →  L L →  * R L →  id * L q8 L R → L  q7 L → * R  R

Shift/reduce conflict (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L q2 S → L  = R R → L  = q6 S → L =  R R →  L L →  * R L →  id S → L  = R vs. R → L  FOLLOW(R) contains = S ⇒ L = R ⇒ * R = R SLR cannot resolve the conflict either

LR(1) Grammars In SLR: a reduce item N  α is applicable only when the look-ahead is in FOLLOW(N) But FOLLOW(N) merges look-ahead for all alternatives for N LR(1) keeps look-ahead with each LR item Idea: a more refined notion of follows computed per item

LR(1) Item LR(1) item is a pair Meaning Example LR(0) item Look-ahead token Meaning We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token. Example The production L  id yields the following LR(1) items [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

-closure for LR(1) For every [A → α ● Bβ , c] in S for every production B→δ and every token b in the grammar such that b  FIRST(βc) Add [B → ● δ , b] to S

q3 q0 (S → R ∙ , $) (S’ → ∙ S , $) (S → ∙ L = R , $) (S → ∙ R , $) (L → ∙ * R , = ) (L → ∙ id , = ) (R → ∙ L , $ ) (L → ∙ id , $ ) (L → ∙ * R , $ ) q9 q11 R (S → L = R ∙ , $) (L → id ∙ , $) q1 S (S’ → S ∙ , $) R id id L q2 q12 (S → L ∙ = R , $) (R → L ∙ , $) (R → L ∙ , $) = id * * q6 (S → L = ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $) L q10 q4 id q5 (L → * ∙ R , =) (R → ∙ L , =) (L → ∙ * R , =) (L → ∙ id , =) (L → * ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $) (L → id ∙ , $) (L → id ∙ , =) (L → * ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $) * L q8 R q7 (R → L ∙ , =) (R → L ∙ , $) R q13 (L → * R ∙ , =) (L → * R ∙ , $) (L → * R ∙ , $)

Back to the conflict Is there a conflict now? q2 (S → L ∙ = R , $) (R → L ∙ , $) = q6 (S → L = ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $) Is there a conflict now?

LALR LR tables have large number of entries Often don’t need such refined observation (and cost) LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts LALR not as powerful as LR(1)

Summary Bottom up LR Items LR parsing with pushdown automata LR(0), SLR, LR(1) – different kinds of LR items, same basic algorithm

Next time Semantic analysis

State i + ( ) $ E T action q0 q5 q7 q1 q6 shift q3 q2 ZE$ q4 Shift EE+T Ti ET q8 q9 TE