LR Parsing
Compiler
Baojian Hua, bjhua@ustc.edu.cn



Front End
source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR

Parsing
- The parser translates the source program into abstract syntax trees
- Token sequence: returned from the lexer
- Abstract syntax tree:
  - checks the validity of programs
  - forms the compiler's internal data structure for programs
- Must take the program syntax into account

Conceptually
token sequence + language syntax -> parser -> abstract syntax tree

Predictive Parsing
- Grammars encode enough information on how to choose production rules when input terminals are seen: LL(1)
- Pros: simple, easy to implement, efficient
- Cons: grammar rewriting is ugly

Today's Topic
- Bottom-up parsing: shift-reduce parsing, LR parsing
- This is the predominant algorithm used by automatic YACC-like parser generators: YACC, bison, CUP, etc.

Bottom-up Parsing
A reverse of right-most derivation!
Grammar:
  1 S := exp
  2 exp := exp + term
  3 exp := term
  4 term := term * factor
  5 term := factor
  6 factor := ID
  7 factor := INT
Reductions for the input 2 + 3 * 4:
  2 + 3 * 4
  factor + 3 * 4
  term + 3 * 4
  exp + 3 * 4
  exp + factor * 4
  exp + term * 4
  exp + term * factor
  exp + term
  exp
  S

Dot notation
As a convenient notation, we will mark how much of the input we have consumed by using a • symbol:
  exp + 3 • * 4
Here "exp + 3" is the consumed part and "* 4" is the remaining input.

Bottom-up Parsing
Input: 2 + 3 * 4
Dot position at each reduction:
  2 • + 3 * 4
  factor • + 3 * 4
  term • + 3 * 4
  exp + 3 • * 4
  exp + factor • * 4
  exp + term * 4 •
  exp + term * factor •
  exp + term •
  exp •
  S •

Another View
What's the data structure of the left?
Left (symbols built so far): 2; factor; term; exp; exp + factor; exp + term; exp + term *; exp + term * 4; exp + term * factor; S
Right (remaining input): 2 + 3 * 4; • + 3 * 4; • 3 * 4; • * 4; • 4; •
Grammar: S := exp; exp := exp + term; exp := term; term := term * factor; term := factor; factor := ID; factor := INT

Producing a rightmost derivation in reverse
- We do two things:
  - shift a token (terminal) onto the stack, or
  - reduce the top n symbols on the stack by a production
- When we reduce by a production A := β, β is on the top of the stack: pop β and push A
- Key problem: when to shift and when to reduce?

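Yet Another View
[Figure: the stack contents (2, factor, term, exp) and the remaining input (2 + 3 * 4, then • + 3 * 4) shown beside the partial parse tree E over T over F over 2.]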

Yet Another View
[Figure: the stack contents (2, factor, term, exp, exp +, exp + factor, exp + term) and the remaining input (2 + 3 * 4, • + 3 * 4, • 3 * 4, • * 4) shown beside the complete parse tree:]
  S
    E
      E
        T
          F
            2
      +
      T
        T
          F
            3
        *
        F
          4

A shift-reduce parser
- Two components:
  - Stack: holds the viable prefixes
  - Input stream: holds the remaining source
- Four actions:
  - shift: push a token from the input stream onto the stack
  - reduce: the right-hand side β of A := β is at the top of the stack; pop β, push A
  - accept: success
  - error: syntax error discovered

Table-driven LR(k) parsers
[Figure: Lexer -> tokens -> parser loop (with its stack) -> AST. A parser generator turns the grammar into the action table and GOTO table that drive the parser loop.]

An LR parser
Put S on stack in state s0
Parser configuration is: (S, s0, X1, s1, X2, s2, … Xm, sm; ai ai+1 … an $)
do forever:
  read ai
  if action[ai, sm] is shift s, then the configuration becomes (S, s0, X1, s1, X2, s2, … Xm, sm, ai, s; ai+1 … an $)
  if action[ai, sm] is reduce A := β, then the configuration becomes (S, s0, X1, s1, X2, s2, … Xm-|β|, sm-|β|, A, s; ai ai+1 … an $), where s = goto[sm-|β|, A]
  if action[ai, sm] is accept, DONE
  if action[ai, sm] is error, handle error
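
The loop above can be sketched in Python. This is only an illustration under assumptions: it uses the list grammar S -> (L) | x, L -> S | L , S that appears later in these slides, with an action/GOTO table written out by hand from that grammar's LR(0) automaton (every reduce state reduces on all terminals). The names RULES, ACTION, GOTO and parse are hypothetical, and the stack holds only states rather than symbol/state pairs.

```python
# Grammar (numbered as on the slides):
#   0: S' -> S $   1: S -> ( L )   2: S -> x   3: L -> S   4: L -> L , S
RULES = {1: ("S", 3), 2: ("S", 1), 3: ("L", 1), 4: ("L", 3)}  # lhs, length of rhs
TERMINALS = ["(", ")", "x", ",", "$"]


def reduce_row(rule):
    """An LR(0) reduce state reduces on every terminal."""
    return {t: ("reduce", rule) for t in TERMINALS}


# Hand-written table (an assumption of this sketch), indexed by state then token.
ACTION = {
    1: {"(": ("shift", 3), "x": ("shift", 2)},
    2: reduce_row(2),
    3: {"(": ("shift", 3), "x": ("shift", 2)},
    4: {"$": ("accept", None)},
    5: {")": ("shift", 6), ",": ("shift", 8)},
    6: reduce_row(1),
    7: reduce_row(3),
    8: {"(": ("shift", 3), "x": ("shift", 2)},
    9: reduce_row(4),
}
GOTO = {(1, "S"): 4, (3, "S"): 7, (3, "L"): 5, (8, "S"): 9}


def parse(tokens):
    """Return the rule numbers used, i.e. a rightmost derivation in reverse."""
    tokens = list(tokens) + ["$"]
    stack = [1]  # stack of states; state 1 is the start state
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        kind, arg = ACTION[state].get(tok, ("error", None))
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, n = RULES[arg]
            del stack[len(stack) - n:]            # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])  # then follow the goto
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")


print(parse("(x,x)"))
```

For the input (x,x) the reductions come out in the order 2, 3, 2, 4, 1, which read backwards is a rightmost derivation of the input.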

Generating LR parsers
- In order to generate an LR parser, we must create the action and GOTO tables
- There are many different ways to do this
- We will start here with the simplest approach, called LR(0): Left-to-right parsing, Rightmost derivation, 0 lookahead

Item
LR(0) items have the form: [production-with-dot]
For example, X -> A B C has 4 forms of items:
  [X := • A B C]
  [X := A • B C]
  [X := A B • C]
  [X := A B C •]
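
As a small illustration of the item notation, the sketch below enumerates the items of one production. The representation (a triple of left-hand side, right-hand side and dot position) and the function names are assumptions made for this example; "." stands in for the dot.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of lhs -> rhs, as (lhs, rhs, dot_position) triples."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item in the bracketed notation, with '.' for the dot."""
    lhs, rhs, dot = item
    symbols = list(rhs[:dot]) + ["."] + list(rhs[dot:])
    return f"[{lhs} := {' '.join(symbols)}]"


for item in lr0_items("X", ["A", "B", "C"]):
    print(show(item))
```

A production with n symbols on its right-hand side always has n + 1 items, one per dot position.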

What items mean?
- [X := • α β γ]: input is consistent with X := α β γ
- [X := α • β γ]: input is consistent with X := α β γ and we have already recognized α
- [X := α β • γ]: input is consistent with X := α β γ and we have already recognized α β
- [X := α β γ •]: input is consistent with X := α β γ and we can reduce to X

LR(0) Items
[Figure: an early frame of the LR(0) automaton. It lists the grammar 0: S' -> S$, 1: S -> x S, 2: S -> y and the start items S' -> • S $, S -> • S x, S -> • y, overlaid on the states of the automaton shown on the next slide.]

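LR(0) Items
Grammar:
  0: S' -> S $
  1: S -> ( L )
  2: S -> x
  3: L -> S
  4: L -> L , S
States:
  1: S' -> • S $;  S -> • ( L );  S -> • x
  2: S -> x •
  3: S -> ( • L );  L -> • S;  L -> • L , S;  S -> • ( L );  S -> • x
  4: S' -> S • $
  5: S -> ( L • );  L -> L • , S
  6: S -> ( L ) •
  7: L -> S •
  8: L -> L , • S;  S -> • ( L );  S -> • x
  9: L -> L , S •
Transitions:
  from 1: x to 2, ( to 3, S to 4
  from 3: x to 2, ( to 3, S to 7, L to 5
  from 5: ) to 6, "," to 8
  from 8: x to 2, ( to 3, S to 9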

LR(0) table construction
- Construct the LR(0) items
- Item set Ii becomes state i
- Parsing actions at state i are:
  - if [A := α • a β] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j"
  - if [A := α •] ∈ Ii and A ≠ S', then action[i, a] = "reduce by A := α" for every terminal a
  - if [S' := S •] ∈ Ii, then action[i, $] = "accept"

LR(0) table construction, cont'd
- GOTO table for non-terminals: GOTO[i, A] = j if GOTO(Ii, A) = Ij
- Empty entries are "error"
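
The construction rules can be sketched as a small program. This is an illustrative sketch, not a generator: it hard-codes the list grammar from the LR(0) Items slide, represents an item as a (rule index, dot position) pair, and numbers states in discovery order from 0, so state numbers will not match the slides. The names closure, goto and build_tables are hypothetical.

```python
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("(", "L", ")")),
    ("S", ("x",)),
    ("L", ("S",)),
    ("L", ("L", ",", "S")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add [B := . gamma] for every item whose dot is before nonterminal B."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {(r, d + 1) for r, d in items
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


def build_tables():
    start = closure({(0, 0)})
    states, index = [start], {start: 0}
    action, goto_table = {}, {}
    terminals = {s for _, rhs in GRAMMAR for s in rhs} - NONTERMINALS
    i = 0
    while i < len(states):
        for rule, dot in states[i]:
            rhs = GRAMMAR[rule][1]
            if dot == len(rhs):  # complete item: reduce on every terminal
                for t in terminals:
                    action.setdefault((i, t), set()).add(("reduce", rule))
                continue
            sym = rhs[dot]
            if sym == "$":  # dot before the end marker: accept
                action.setdefault((i, "$"), set()).add(("accept", None))
                continue
            target = goto(states[i], sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            if sym in NONTERMINALS:
                goto_table[(i, sym)] = index[target]
            else:
                action.setdefault((i, sym), set()).add(("shift", index[target]))
        i += 1
    return states, action, goto_table


states, action, goto_table = build_tables()
conflicts = {k: v for k, v in action.items() if len(v) > 1}
print(len(states), "states,", len(conflicts), "conflicts")
```

Each action entry is kept as a set so that a conflict shows up as an entry with more than one element; for this grammar every entry has exactly one.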

LR(0) Table
         action                        goto
state    (     )     x     ,     $     S     L
1        s3          s2                g4
2        r2    r2    r2    r2    r2
3        s3          s2                g7    g5
4                                accept
5              s6          s8
6        r1    r1    r1    r1    r1
7        r3    r3    r3    r3    r3
8        s3          s2                g9
9        r4    r4    r4    r4    r4

Problems with LR(0)
- For every item of the form X -> α •, the parser blindly reduces to X, whatever the next token is, and then follows a "goto"
- This never misses an error, but it may postpone the detection of some errors

Problems with LR(0)
[Figure: the LR(0) automaton for the grammar 0: S' -> S$, 1: S -> (L), 2: S -> x, 3: L -> S, 4: L -> L, S, as on the "LR(0) Items" slide.]
Consider this input: x 5 $

Problems with LR(0)
[Table: the LR(0) parse table for the same grammar, as on the "LR(0) Table" slide.]

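Another Example
Grammar:
  0: S -> E $
  1: E -> T + E
  2: E -> T
  3: T -> x
States:
  1: S -> • E $;  E -> • T + E;  E -> • T;  T -> • x
  2: S -> E • $
  3: E -> T • + E;  E -> T •
  4: E -> T + • E;  E -> • T + E;  E -> • T;  T -> • x
  5: T -> x •
  6: E -> T + E •
Transitions:
  from 1: E to 2, T to 3, x to 5
  from 3: + to 4
  from 4: E to 6, T to 3, x to 5
A shift-reduce conflict! (State 3 can shift + or reduce by E -> T.)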

LR(0) Parse Table
         action                 goto
state    x     +        $       E     T
1        s5                     g2    g3
2                       accept
3        r2    s4, r2   r2
4        s5                     g6    g3
5        r3    r3       r3
6        r1    r1       r1

SLR table construction
- Construct the LR(0) items
- Item set Ii becomes state i
- Parsing actions at state i are:
  - if [A := α • a β] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j"
  - if [A := α •] ∈ Ii and A ≠ S', then action[i, a] = "reduce by A := α" for all a ∈ FOLLOW(A)
  - if [S' := S •] ∈ Ii, then action[i, $] = "accept"
- GOTO table for non-terminals: GOTO[i, A] = j if GOTO(Ii, A) = Ij
- Empty entries are "error"
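
The only change from LR(0) is the FOLLOW restriction on reduce entries. The sketch below computes FOLLOW sets for the grammar S -> E $, E -> T + E | T, T -> x from these slides and applies the rule to the state that holds [E := T • + E] and [E := T •] (state 3 on the slides). It assumes a grammar without empty productions and seeds FOLLOW of the start symbol with $; the function names are hypothetical.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("x",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets, assuming no production has an empty right-hand side."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(start="S"):
    """FOLLOW sets by fixed-point iteration (same no-epsilon assumption)."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]  # sym ends the rule: inherit FOLLOW(lhs)
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_actions_state3(follow):
    """State 3 holds [E := T . + E] and [E := T .]; rule 2 is E -> T."""
    actions = {"+": {("shift", 4)}}
    for a in follow["E"]:  # SLR: reduce only on FOLLOW(E)
        actions.setdefault(a, set()).add(("reduce", 2))
    return actions


follow = follow_sets()
print(follow)
print(slr_actions_state3(follow))
```

Because + is not in FOLLOW(E), the reduce entry no longer lands in the + column, and the shift-reduce conflict of the LR(0) table disappears.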

Reduce LR(0) Table
[Figure: the LR(0) automaton for the grammar 0: S' -> S$, 1: S -> (L), 2: S -> x, 3: L -> S, 4: L -> L, S, as on the "LR(0) Items" slide.]
Follow sets:
  S': {$}
  S: {$, ",", )}
  L: {",", )}

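Reduce LR(0) Table
Reduce entries are kept only in the columns of FOLLOW(S) = {$, ",", )} and FOLLOW(L) = {",", )}:
         action                        goto
state    (     )     x     ,     $     S     L
1        s3          s2                g4
2              r2          r2    r2
3        s3          s2                g7    g5
4                                accept
5              s6          s8
6              r1          r1    r1
7              r3          r3
8        s3          s2                g9
9              r4          r4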

Resolve Shift-reduce Conflict
[Figure: the LR(0) automaton for the grammar 0: S -> E$, 1: E -> T+E, 2: E -> T, 3: T -> x, as on the "Another Example" slide.]
Follow sets:
  S: {$}
  E: {$}
  T: {+, $}

Resolve Shift-reduce Conflict
With reduce entries restricted to the FOLLOW sets, state 3 no longer has a conflict:
         action              goto
state    x     +     $       E     T
1        s5                  g2    g3
2                    accept
3              s4    r2
4        s5                  g6    g3
5              r3    r3
6                    r1

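Problems with SLR
Grammar: S' := S$;  S := L = R | R;  L := * R | id;  R := L
States:
  start: S' := • S $;  S := • L = R;  S := • R;  L := • * R;  L := • id;  R := • L
  1: S' := S • $
  2: S := L • = R;  R := L •
  3: S := R •
  4: L := * • R;  R := • L;  L := • * R;  L := • id
  5: L := id •
  6: S := L = • R;  R := • L;  L := • * R;  L := • id
  7: L := * R •
  8: R := L •
  9: S := L = R •
Transitions:
  from start: S to 1, L to 2, R to 3, * to 4, id to 5
  from 2: = to 6
  from 4: R to 7, L to 8, * to 4, id to 5
  from 6: R to 9, L to 8, * to 4, id to 5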

Problems with SLR
- SLR reduces on ALL terminals in the FOLLOW set
- For S := L = R | R, L := * R | id, R := L: FOLLOW(R) = FOLLOW(L) = {=, $}
- But we should never reduce R := L on '=' in state 2, which holds [S := L • = R] and [R := L •]
- Thus, there should be no reduction on '=' in state 2
- Why does this happen, and how can we solve it?

LR(1) Items
- [X := α • β, a] means:
  - α is at the top of the stack
  - the remaining input begins with a string derivable from β a
- In other words, when we reduce by X := α β, a had better be the lookahead symbol
- Or: put "reduce by X := α β" in action[s, a] only

LR(1) table construction
- Construct the LR(1) items
- Item set Ii becomes state i
- Parsing actions at state i are:
  - if [A := α • a β, b] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j"
  - if [A := α •, b] ∈ Ii and A ≠ S', then action[i, b] = "reduce by A := α" (for the lookahead b only)
  - if [S' := S •, $] ∈ Ii, then action[i, $] = "accept"
- GOTO table for non-terminals: GOTO[i, A] = j if GOTO(Ii, A) = Ij
- Empty entries are "error"
- The initial state is the item set containing [S' := • S, $]
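
The step that differs from LR(0) is the closure, which must carry lookaheads. The sketch below is a minimal illustration for the grammar S := L = R | R, L := * R | id, R := L used on these slides. It assumes no empty productions, writes the augmented rule as S' -> S with lookahead $, and represents an item as (rule index, dot position, lookahead); the function names are hypothetical.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbol, seen=frozenset()):
    """FIRST(symbol), assuming no production has an empty right-hand side."""
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:  # already being expanded: contributes nothing new
        return set()
    result = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol:
            result |= first_of(rhs[0], seen | {symbol})
    return result


def closure(items):
    """LR(1) closure: new items inherit FIRST of what follows the nonterminal."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot, look = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        # Lookaheads for [A := alpha . B beta, a] are FIRST(beta a); without
        # empty productions that is FIRST(beta) if beta is non-empty, else {a}.
        lookaheads = first_of(rhs[dot + 1]) if dot + 1 < len(rhs) else {look}
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for b in lookaheads:
                    if (i, 0, b) not in result:
                        result.add((i, 0, b))
                        work.append((i, 0, b))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol`, keeping each item's lookahead."""
    return closure({(r, d + 1, a) for r, d, a in items
                    if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol})


start = closure({(0, 0, "$")})
print(sorted(start))
print(sorted(goto(start, "L")))
```

In the state reached on L, the item for R := L carries only the lookahead $, so the reduction is entered under $ and not under =, which is exactly the case where SLR ran into trouble.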

LR(1) Items (part)
Grammar: S' := S$;  S := L = R | R;  L := * R | id;  R := L
Start state:
  S' := • S, $
  S := • L = R, $
  S := • R, $
  L := • id, =/$
  R := • L, $
State 1 (from start on S): S' := S •, $
State 2 (from start on L): S := L • = R, $;  R := L •, $

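More
Grammar: S := L = R | R;  L := * R | id;  R := L
States:
  start: S' := • S, $;  S := • L = R, $;  S := • R, $;  L := • * R, =/$;  L := • id, =/$;  R := • L, $
  1: S' := S •, $
  2: S := L • = R, $;  R := L •, $
  3: S := R •, $
  4: L := * • R, =/$;  R := • L, =/$;  L := • * R, =/$;  L := • id, =/$
  5: L := id •, =/$
  6: S := L = • R, $;  R := • L, $;  L := • * R, $;  L := • id, $
  7: L := * R •, =/$
  8: R := L •, =/$
  9: S := L = R •, $
  10: R := L •, $
  11: L := id •, $
  others (not shown)
Transitions:
  from start: S to 1, L to 2, R to 3, * to 4, id to 5
  from 2: = to 6
  from 4: R to 7, L to 8, * to 4, id to 5
  from 6: R to 9, L to 10, id to 11, * to others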

Notice similar states?
[Figure: the same LR(1) automaton, with the states that differ only in their lookaheads placed side by side.]
  5: L := id •, =/$     and  11: L := id •, $
  8: R := L •, =/$      and  10: R := L •, $

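LALR
Grammar: S := CC;  C := cC | d
LR(1) states:
  start: S' := • S, $;  S := • CC, $;  C := • cC, c/d;  C := • d, c/d
  1: S' := S •, $
  2: S := C • C, $;  C := • cC, $;  C := • d, $
  3: C := c • C, c/d;  C := • cC, c/d;  C := • d, c/d
  4: C := d •, c/d
  5: S := CC •, $
  6: C := c • C, $;  C := • cC, $;  C := • d, $
  7: C := d •, $
  8: C := cC •, c/d
  9: C := cC •, $
Transitions:
  from start: S to 1, C to 2, c to 3, d to 4
  from 2: C to 5, c to 6, d to 7
  from 3: C to 8, c to 3, d to 4
  from 6: C to 9, c to 6, d to 7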

LALR
Grammar: S := CC;  C := cC | d
After merging the states with equal cores (3 with 6, 4 with 7, 8 with 9):
  start: S' := • S, $;  S := • CC, $;  C := • cC, c/d;  C := • d, c/d
  1: S' := S •, $
  2: S := C • C, $;  C := • cC, $;  C := • d, $
  36: C := c • C, c/d/$;  C := • cC, c/d/$;  C := • d, c/d/$
  47: C := d •, c/d/$
  5: S := CC •, $
  89: C := cC •, c/d/$
Transitions:
  from start: S to 1, C to 2, c to 36, d to 47
  from 2: C to 5, c to 36, d to 47
  from 36: C to 89, c to 36, d to 47

LALR Construction
- Merge the states (item sets) that have common cores
- Change the GOTO table to reflect the merges
- Merging can introduce reduce/reduce conflicts
- Merging cannot introduce shift/reduce conflicts
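
The merge step itself is short once LR(1) states are available. The sketch below assumes each state is a frozenset of (rule index, dot position, lookahead) items, with rules numbered 0: S' -> S, 1: S -> CC, 2: C -> cC, 3: C -> d; the names core and merge_by_core are hypothetical, and updating the GOTO table is left out.

```python
from collections import defaultdict


def core(state):
    """The core of an LR(1) state: its items with the lookaheads dropped."""
    return frozenset((rule, dot) for rule, dot, _ in state)


def merge_by_core(states):
    """Group LR(1) states with equal cores and union their items."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= set(state)
    return [frozenset(items) for items in merged.values()]


# States 4, 7 and 5 of the LR(1) automaton for S := CC, C := cC | d.
s4 = frozenset({(3, 1, "c"), (3, 1, "d")})  # C := d . , c/d
s7 = frozenset({(3, 1, "$")})               # C := d . , $
s5 = frozenset({(1, 2, "$")})               # S := CC . , $

for state in merge_by_core([s4, s7, s5]):
    print(sorted(state))
```

States 4 and 7 collapse into one state with lookaheads c, d and $, while state 5 has a different core and stays on its own.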

Ambiguous Grammars
- No ambiguous grammar is LR(k), so the tables built for one always contain conflicts
- Nevertheless, some ambiguous grammars are well understood and can be handled by LR parsers with some tricks:
  - precedence
  - associativity
  - dangling-else

Precedence
Grammar: E := E*E | E+E | id
Start state: S' := • E $;  E := • E * E;  E := • E + E
Shift/reduce conflicts on both * and +

Precedence
Grammar: E := E*E | E+E | id
States:
  S' := • E $;  E := • E * E;  E := • E + E
  E := E * • E;  E := • E * E;  E := • E + E;  E := • id
  E := E + • E;  E := • E * E;  E := • E + E;  E := • id
  E := E * E •;  E := E • * E;  E := E • + E   (reduce on +, reduce on *)
  E := E + E •;  E := E • * E;  E := E • + E   (reduce on +, shift on *)
What if we want both + and * right-associative?
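
The decisions listed above follow the convention that Yacc-style generators use for declared precedence: compare the precedence of the rule's operator with that of the lookahead token, and fall back on associativity when they are equal. A minimal sketch, assuming a hypothetical PRECEDENCE table and resolve function that are not part of any real tool:

```python
# Higher number binds tighter; both operators are left-associative here.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def resolve(rule_operator, lookahead, table=PRECEDENCE):
    """Choose between reducing a completed E op E and shifting the lookahead."""
    rule_prec, _ = table[rule_operator]
    tok_prec, assoc = table[lookahead]
    if rule_prec > tok_prec:
        return "reduce"
    if rule_prec < tok_prec:
        return "shift"
    # Equal precedence: left-associative reduces, right-associative shifts.
    return "reduce" if assoc == "left" else "shift"


print(resolve("*", "+"), resolve("*", "*"))  # state holding E := E * E .
print(resolve("+", "+"), resolve("+", "*"))  # state holding E := E + E .

# Answering the question on the slide: with both operators right-associative,
# the equal-precedence cases become shifts.
RIGHT = {"+": (1, "right"), "*": (2, "right")}
print(resolve("+", "+", RIGHT), resolve("*", "*", RIGHT))
```

Only the two equal-precedence entries change when associativity flips; the entries that compare + against * are decided by precedence alone.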

Parser Implementation
Implementation options:
- Write a parser from scratch
  - not as boring as writing a lexer, but not as simple as you might imagine
- Use an automatic parser generator
  - very general and robust; sometimes not quite as efficient as a hand-written parser
  - nevertheless, good for lazy compiler writers
- Both are used extensively in production compilers

Yacc Tool
[Figure: specification -> Yacc -> parser -> semantic analyzer.]
Yacc creates a parser from a declarative specification involving a context-free grammar.

Brief History
- YACC stands for Yet Another Compiler-Compiler
- It was first developed by Steve Johnson in 1975 for Unix
- There have been many later versions of YACC (e.g., GNU Bison), each offering minor improvements
- Ported to many languages
- YACC is now a standard tool, defined in IEEE POSIX standard P1003.2

ML-Yacc
A specification has three sections, separated by %%:
  User Declarations: declare values available in the rule actions
  %%
  ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts
  %%
  Rules: the parser is specified by CFG rules and associated semantic actions that generate abstract syntax

ML-Yacc Definitions (preliminaries)
- Specify the type of positions
    %pos int * int
- Specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | stm
- Specify the end-of-parse token
    %eop EOF
- Specify the start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog

Example
  %%
  %term ASSIGN | ID | PLUS | NUM | SEMICOLON | TIMES
  %nonterm s | e
  %pos int
  %start p
  %eop EOF
  %left PLUS
  %left TIMES
  p -> s SEMICOLON p ()
    -> ()
  s -> ID ASSIGN e ()
  e -> e PLUS e ()
    | e TIMES e ()
    | ID ()
    | NUM ()

Summary
- Bottom-up parsing
  - reverse order of derivations
  - use of stacks and parse tables
- LR grammars are more powerful, yet more complex
- Bonus: tools do the hard work for you; read the online ML-Yacc manual