LR Parsing Compiler Baojian Hua

Slides:



Advertisements
Similar presentations
A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Advertisements

Lesson 8 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
Joey Paquet, 2000, 2002, 2008, Lecture 7 Bottom-Up Parsing II.
CSE 5317/4305 L4: Parsing #21 Parsing #2 Leonidas Fegaras.
Mooly Sagiv and Roman Manevich School of Computer Science
Cse321, Programming Languages and Compilers 1 6/12/2015 Lecture #10, Feb. 14, 2007 Modified sets of item construction Rules for building LR parse tables.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
1 Chapter 5: Bottom-Up Parsing (Shift-Reduce). 2 - attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working.
1 Bottom Up Parsing. 2 Bottom-Up Parsing l Bottom-up parsing is more general than top-down parsing »And just as efficient »Builds on ideas in top-down.
Parsing Discrete Mathematics and Its Applications Baojian Hua
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
Bottom-Up Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter (modified)
ML-YACC David Walker COS 320. Outline Last Week –Introduction to Lexing, CFGs, and Parsing Today: –More parsing: automatic parser generation via ML-Yacc.
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
CS Summer 2005 Top-down and Bottom-up Parsing - a whirlwind tour June 20, 2005 Slide acknowledgment: Radu Rugina, CS 412.
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Lecture #8, Feb. 7, 2007 Shift-reduce parsing,
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom Up Parsing.
Chapter 4-2 Chang Chi-Chung Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation)  LR(0), SLR, Canonical LR = LR(1), LALR Other.
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
1 LR parsing techniques SLR (not in the book) –Simple LR parsing –Easy to implement, not strong enough –Uses LR(0) items Canonical LR –Larger parser but.
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.
Bottom-up parsing Goal of parser : build a derivation
LALR Parsing Canonical sets of LR(1) items
 an efficient Bottom-up parser for a large and useful class of context-free grammars.  the “ L ” stands for left-to-right scan of the input; the “ R.
Syntax and Semantics Structure of programming languages.
LEX and YACC work as a team
Lab 3: Using ML-Yacc Zhong Zhuang
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
 an efficient Bottom-up parser for a large and useful class of context-free grammars.  the “ L ” stands for left-to-right scan of the input; the “ R.
LANGUAGE TRANSLATORS: WEEK 17 scom.hud.ac.uk/scomtlm/cis2380/ See Appel’s book chapter 3 for support reading Last Week: Top-down, Table driven parsers.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
Chapter 3-3 Chang Chi-Chung Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation)  LR(0), SLR, Canonical LR = LR(1), LALR 
Syntax and Semantics Structure of programming languages.
Chapter 5: Bottom-Up Parsing (Shift-Reduce)
Prof. Necula CS 164 Lecture 8-91 Bottom-Up Parsing LR Parsing. Parser Generators. Lecture 6.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Top-Down Parsing.
4. Bottom-up Parsing Chih-Hung Wang
Bernd Fischer RW713: Compiler and Software Language Engineering.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Lecture 5: LR Parsing CS 540 George Mason University.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
Bottom-up parsing. Bottom-up parsing builds a parse tree from the leaves (terminals) to the start symbol int E T * TE+ T (4) (2) (3) (5) (1) int*+ E 
1 Chapter 6 Bottom-Up Parsing. 2 Bottom-up Parsing A bottom-up parsing corresponds to the construction of a parse tree for an input tokens beginning at.
Conflicts in Simple LR parsers A SLR Parser does not use any lookahead The SLR parsing method fails if knowing the stack’s top state and next input token.
Chapter 8. LR Syntactic Analysis Sung-Dong Kim, Dept. of Computer Engineering, Hansung University.
Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007.
Programming Languages Translator
Compiler Baojian Hua LR Parsing Compiler Baojian Hua
Chapter 4 Syntax Analysis.
LALR Parsing Canonical sets of LR(1) items
Bottom-Up Syntax Analysis
4 (c) parsing.
Parsing #2 Leonidas Fegaras.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
LALR Parsing Adapted from Notes by Profs Aiken and Necula (UCB) and
5. Bottom-Up Parsing Chih-Hung Wang
Syntax Analysis - 3 Chapter 4.
Kanat Bolazar February 16, 2010
Chap. 3 BOTTOM-UP PARSING
Presentation transcript:

LR Parsing Compiler Baojian Hua

Front End source code abstract syntax tree lexical analyzer parser tokens IR semantic analyzer

Parsing The parser translates the token sequence into abstract syntax trees Token sequence: returned from the lexer abstract syntax trees: compiler internal data structures Must take account of the program syntax

Conceptually token sequence abstract syntax tree parser language syntax

Predicative Parsing Grammars encode enough information on how to choose production rules, when input terminals are seen LL(1) pros: simple, easy to implement efficient Cons: grammar rewriting ugly

Today ’ s Topic Bottom-up Parsing a.k.a. shift-reduce parsing, LR parsing This is the predominant algorithm used by automatic YACC-like parser generators YACC, bison, CUP, C#yacc, etc.

Bottom-up Parsing 1S:= exp 2exp:= exp + term 3 exp:= term 4term:= term * factor 5term:= factor 6factor:= ID 7factor:= INT * 4 factor + 3 * 4 term + 3 * 4 exp + 3 * 4 exp + factor * 4 exp + term * 4 exp + term * factor exp + term exp S A reverse of right- most derivation!

Dot notation As a convenient notation, we will mark how much of the input we have consumed by using a symbol exp + 3 * 4 consumedremaining input

Bottom-up Parsing * 4 factor + 3 * 4 term + 3 * 4 exp + 3 * 4 exp + factor * 4 exp + term * 4 exp + term * factor exp + term exp S * 4 factor + 3 * 4 term + 3 * 4 exp + 3 * 4 exp + factor * 4 exp + term * 4 exp + term * factor exp + term exp S

Another View * * 4 3 * 4 * factor term exp exp + exp + 3 exp + factor exp + term exp + term * exp + term * 4 exp + term * factor exp + term exp S S := exp exp := exp + term exp := term term := term * factor term := factor factor := ID factor := INT What ’ s the data structure of the left?

Producing a rightmost derivation in reverse We do two things: shift a token (terminal) onto the stack, or reduce the top n symbols on the stack by a production When we reduce by a production A ::=   is on the top of the stack, pop  and push A Key problem: when to shift or reduce?

Yet Another View * * 4 2 factor term exp E T F 2

Yet Another View E T F * * 4 3 * 4 * 4 2 factor term exp exp + exp + 3 exp + factor exp + term + 3 F S E T 4 F* T

A shift-reduce parser Two components: Stack: holds the viable prefixes Input stream: holds remaining source Four actions: shift: push token from input stream onto stack reduce: right-end (  of A :=  ) is at top of stack, pop , push A accept: success error: syntax error discovered

Table-driven LR(k) parsers Parser Loop Lexer tokens Stack Action table & GOTO table AST Parser Generator Grammar

An LR parser Put S on stack in state s 0 Parser configuration is: (S, s 0, X 1, s 1, X 2, s 2, … X m, s m ; a i a i+1 … a n $) do forever: read a i. if (action[a i, s m ] is shift s then (S, s 0, X 1, s 1, X 2, s 2, … X m, s m, a i, s; a i+1 … a n $) if (action[a i, s m ] is reduce A:=  then (S, s 0, X 1, s 1, X 2, s 2, … X m-|  |, s m-|  |, A, s; a i a i+1 … a n $) where s = goto[s m-|  |, A] if (action[a i, s m ] is accept, DONE if (action[a i, s m ] is error, handle error

Generating LR parsers In order to generate an LR parser, we must create the action and GOTO tables Many different ways to do this We will start here by the simplest approach, called LR(0) Left-to-right parsing, Rightmost derivation, 0 lookahead

Item LR(0) items have the form: [production-with-dot] For example, X -> A B C has 4 forms of items [X := A B C ]

What items mean? [X :=    ] input is consistent with X :=    [X :=    ] input is consistent with X :=    and we have already recognized  [X :=    ] input is consistent with X :=    and we have already recognized   [X :=    ] input is consistent with X :=    and we can reduce to X

LR(0) Items 0: S’ -> S$ 1: S -> x S 2: S -> y S’ -> S $ S -> x S S -> y 1 S’ -> S $ 4 S S -> x S S -> y 2 x 3 y y x actionGOTO state\symbolxy$S 1s2s3g4 2s2s3g5 3r2 4accept 5r1 S -> x S 5 S

LR(0) Items 0: S’ -> S$ 1: S -> x S 2: S -> y S’ -> S $ S -> x S S -> y 1 S’ -> S $ 4 S S -> x S S -> y 2 x 3 y y x actionGOTO state\symbolxy$S 1s2s3g4 2s2s3g5 3r2 4accept 5r1 S -> x S 5 S x x y 1 1, 2 1, 2, 2 1, 2, 2, 3 1, 2, 2, 5 1, 2, 5 1, 4 accept x x y $ x y $ y $ $

Another Example 0: S’ -> S$ 1: S -> (L) 2: S -> x 3: L -> S 4: L -> L, S S’ -> S $ S -> (L) S -> x 1 S’ -> S $ 4 S S -> x 2 x S -> ( L) L -> S L -> L, S S -> (L) S -> x 3 ( L -> S 7 S ( x S -> (L ) L -> L, S S -> (L) L -> L, S S -> (L) S -> x L -> L, S S, ) x ( L

Another Example: LR(0) table actiongoto s\t()x,$SL 1s3s2g4 2r2 3s3 g7g5 4accept 5s6s8 6r1 7r3 8s3s2g9 9r4

LR(0) table construction Construct LR(0) Items Item I i becomes state i Parsing actions at state i are: [ A :=  a  ]  I i and goto(I i, a) = I j then action[i, a] = “ shift j ” [ A :=  ]  I i and A  S ’ then action[i, a] = “ reduce by A := ” [ S ’ := S ]  I i then action[i, $] = “ accept ”

LR(0) table construction, cont ’ d GOTO table for non-terminals: GOTO[i, A] = j if GOTO(I i, A) = I j Empty entries are “ error ” Table-driven LR-parsing algorithm: figure 4.36 on text

Problems with LR(0) For every item of the form: X ->  blindly reduce  to X, followed with a “ goto ” which may not miss any error, but may postpone the detection of some errors try “ x x y x ” on our first example Another problem with LR(0) is that some grammar may have conflicts

For the 1st kind of problem 0: S’ -> S$ 1: S -> x S 2: S -> y S’ -> S $ S -> x S S -> y 1 S’ -> S $ 4 S S -> x S S -> y 2 x 3 y y x actionGOTO state\symbolxy$S 1s2s3g4 2s2s3g5 3r2 4accept 5r1 S -> x S 5 S

For the 2nd kind of problem 0: S -> E$ 1: E -> T+E 2: E -> T 3: T -> x S -> E $ E -> T + E E -> T T -> x 1 5 x S -> E $ 2 E E -> T +E E -> T 3 T E -> T+ E E -> T T -> x 4 + E -> T+E 6 E T x A shift-reduce conflict (on state 3)!

LR(0) Parse Table actiongoto s\tx+$ET 1s5g2g3 2accept 3r2s4, r2r2 4s5g6g3 5r3 6r1 Similar reason for this problem: the “ reduce ” action should NOT be filled into (3, +).

SLR table construction Construct LR(0) Items Item I i becomes state i Parsing actions at state i are: [ A :=  a  ]  I i and goto(I i, a) = I j then action[i,a] = “ shift j ” [ A :=  ]  I i and A  S ’ then action[i,a] = “ reduce by A := ” only for all a  FOLLOW(A) [ S ’ := S ]  I i then action[i,$] = “ accept ” GOTO table as before

Follow set 0: S -> E$ 1: E -> T+E 2: E -> T 3: T -> x S -> E $ E -> T + E E -> T T -> x 1 5 x S -> E $ 2 E E -> T +E E -> T 3 T E -> T+ E E -> T T -> x 4 + E -> T+E 6 E T x Follow (E) = {$} Follow (T) = {+, $}

LR(0) Table with Follow actiongoto s\tx+$ET 1s5g2g3 2accept 3r2s4, r2r2 4s5g6g3 5r3 6r1

Problems with SLR For every item of the form: X ->  only reduce when the next token t\in Follow(X) sometimes, this resolves conflicts such as shift-reduce presented above However, there exist conflicts that can NOT be resolved by SLR

Problems with SLR S’ := S$ S :=L = R |R L := * R | id R :=L S’ := S $ S := L = R S := R L := *R L := id R := L 0 S’ := S $ 1 S := L = R R := L 2 S := R 3 L := * R R := L L := *R L := id 4 5 S := L = R R := L L := *R L := id 6 L := * R 7 R := L 8 S := L = R 9 R L id * * R L L = S

Problems with SLR Reduce on ALL terminals in FOLLOW set FOLLOW(R) = FOLLOW(L) Especially, we have ‘ = ‘ \in FOLLOW(R) Thus, there exists shift-reduce conflict in state 2 Why this happen and how to solve this? S :=L = R |R L := * R | id R :=L S := L = R R := L 2

LR(1) Items [X :=  , a ] means  is at top of stack Input string is derivable from  a In other words, when we reduce X := , a had better be the look ahead symbol. Or, put ‘ reduce by X := ’ in action[ s, a ] only

LR(1) table construction Construct LR(1) Items Item I i becomes state i Parsing actions at state i are: [ A :=  a ,b]  I i and goto(I i, a) = I j then action[i, a] = “ shift j ” [ A := ,b]  I i and A  S ’ then action[i, a] = “ reduce by A := ” for b [ S ’ := S,$]  I i then action[i, $] = “ accept ” GOTO table as before Initial state is from Item containing [S ’ := S,$]

LR(1) Items (part) S’ := S$ S :=L = R |R L := * R | id R :=L S’ := S,$ S := L = R,$ S := R,$ L := *R,=/$ L := id,=/$ R := L,$ 0 S’ := S,$ 1 S := L = R,$ R := L,$ 2 L

S :=L = R |R L := * R | id R :=L S’ := S,$ S := L = R,$ S := R,$ L := *R,=/$ L := id,=/$ R := L,$ 0 S’ := S,$ 1 S := L = R,$ R := L,$ 2 S := R,$ 3 L := * R,=/$ R := L,=/$ L := *R,=/$ L := id,=/$ 4 5 S := L = R,$ R := L,$ L := *R,$ L := id,$ 6 L := *R,=/$ 7 R := L,=/$ 8 S := L = R,$ 9 R L id * * R L More R := L,$ 10 L := id,$ 11 others

S :=L = R |R L := * R | id R :=L S’ := S,$ S := L = R,$ S := R,$ L := *R,=/$ L := id,=/$ R := L,$ 0 S’ := S,$ 1 S := L = R,$ R := L,$ 2 S := R,$ 3 L := * R,=/$ R := L,=/$ L := *R,=/$ L := id,=/$ 4 5 S := L = R,$ R := L,$ L := *R,$ L := id,$ 6 L := *R,=/$ 7 R := L,=/$ 8 S := L = R,$ 9 R L id * * R L Notice similar states? R := L,$ 10 L := id,$ 11 others

S :=L = R |R L := * R | id R :=L S’ := S,$ S := L = R,$ S := R,$ L := *R,=/$ L := id,=/$ R := L,$ 0 S’ := S,$ 1 S := L = R,$ R := L,$ 2 S := R,$ 3 L := * R,=/$ R := L,=/$ L := *R,=/$ L := id,=/$ 4 5 S := L = R,$ R := L,$ L := *R,$ L := id,$ 6 L := *R,=/$ 7 R := L,=/$ 8 S := L = R,$ 9 R L id * * R L Notice similar states? R := L,$ 10 L := id,$ 11 others

LALR Construction Merge items with common cores Change GOTO table to reflect merges Can introduce reduce/reduce conflicts Cannot introduce shift/reduce conflicts

S :=L = R |R L := * R | id R :=L S’ := S,$ S := L = R,$ S := R,$ L := *R,=/$ L := id,=/$ R := L,$ 0 S’ := S,$ 1 S := L = R,$ R := L,$ 2 S := R,$ 3 L := * R,=/$ R := L,=/$ L := *R,=/$ L := id,=/$ 4 5 S := L = R,$ R := L,$ L := *R,$ L := id,$ 6 L := *R,=/$ 7 R := L,=/$ 8 S := L = R,$ 9 R L id * * R L LALR R := L,$ 10 L := id,$ 11 others

Ambiguous Grammars No ambiguous grammars can be LR(k) hence can not be parsed bottom-up Nevertheless, some of the ambiguous grammar are well-understood, and can be parsed by LR(k) with some tricks precedence associativity dangling-else

Precedence E :=E*E |E+E | id S’ := E $ E := E * E E := E + E E := id S’ := E $ E := E * E E := E + E E := E * E E := E + E E := id E := E + E E := E * E E := E + E E := id E := E * E E := E + E E := E * E E := E + E s/r on both * and + E

Precedence E :=E*E |E+E | id S’ := E $ E := E * E E := E + E E := id S’ := E $ E := E * E E := E + E E := E * E E := E + E E := id E := E + E E := E * E E := E + E E := id E := E * E E := E + E E := E * E E := E + E reduce on + shift on * E reduce on + reduce on * What if we want both + and * right-associative?

Parser Implementation Implementation Options: Write a parser by hand, from scratch not as boring as writing a lexer recall the dragon compiler Use an automatic parser generator Very general & robust. sometimes not quite as efficient as hand-written parsers. good for rapid prototyping. Both are used extensively in production compilers

Yacc Tool semantic analyzer specification parser Yacc Creates a parser from a declarative specification involving a context-free grammar

Brief History YACC stands for Yet Another Compiler- Compiler It was first developed by Steve Johnson in 1975 for Unix There have been many later versions of YACC (e.g., GNU Bison), each offering minor improvements Ported to many languages YACC is now a standard tool, defined in IEEE Posix standard P1003.2

Yacc User code and Yacc decleartions: declare values available in the rule actions % Grammar rules: parser specified by CFG rules and associated semantic action that generate abstract syntax % User code: other code

ML-Yacc Definitions (preliminaries) Specify type of positions %pos int * int Specify terminal and nonterminal symbols %term IF | THEN | ELSE | PLUS | MINUS... %nonterm prog | exp | stm Specify end-of-parse token %eop EOF Specify start symbol (by default, non terminal in LHS of first rule) %start prog

Example % %term ASSIGN | ID | PLUS |NUM | SEMICOLON | TIMES %nonterm s | e %pos int %start p %eop EOF %left PLUS %left TIMES % p -> s SEMICOLON p () -> () s -> ID ASSIGN e () e -> e PLUS e () | e TIMES e () | ID () | NUM ()

Summary Bottom-up parsing reverse order of derivations LR grammars are more powerful use of stacks and parse tables yet more complex Bonus: tools do the hard work for you, read the online Yacc manual