Bottom-Up Syntax Analysis
Mooly Sagiv
html://www.math.tau.ac.il/~msagiv/courses/wcc02.html
Textbook: Modern Compiler Implementation in C, Chapter 3

Efficient Parsers
Pushdown automata
Deterministic
Report an error as soon as the input is not a prefix of a valid program
Not usable for all context-free grammars
[Figure: bison takes a context-free grammar and produces a parser, or reports “ambiguity errors”; the parser takes tokens and produces a parse tree]

Kinds of Parsers
Top-Down (Predictive Parsing) LL
  – Construct parse tree in a top-down manner
  – Find the leftmost derivation
  – For every non-terminal and token predict the next production
Bottom-Up LR
  – Construct parse tree in a bottom-up manner
  – Find the rightmost derivation in reverse order
  – For every potential right-hand side and token decide when a production is found

Bottom-Up Syntax Analysis
Input
  – A context-free grammar
  – A stream of tokens
Output
  – A syntax tree or error
Method
  – Construct parse tree in a bottom-up manner
  – Find the rightmost derivation in reverse order
  – For every potential right-hand side and token decide when a production is found
  – Report an error as soon as the input is not a prefix of a valid program

Plan
Pushdown automata
Bottom-up parsing (given a parser table)
Constructing the parser table
Interesting non-LR grammars

Pushdown Automaton
[Figure: pushdown automaton with a control unit, a parser-table, an input and a stack; remaining labels on the figure: $, $ u t w, V]

Bottom-Up Parser Actions
reduce A → β
  – Pop |β| symbols and |β| states from the stack
  – Apply the associated action
  – Push the state goto[top, A] onto the stack (top is the state uncovered by the pops)
shift X
  – Push X onto the stack
  – Advance the input
accept
  – Parsing is complete
error
  – Report an error

A Parser Table for S → a S b | ε

state  description   a    b             $             S
0      initial       s1   r S → ε       r S → ε       4
1      parsed a      s1   r S → ε       r S → ε       2
2      parsed S           s3
3      parsed a S b       r S → a S b   r S → a S b
4      final                            acc

state  a    b             $             S
0      s1   r S → ε       r S → ε       4
1      s1   r S → ε       r S → ε       2
2           s3
3           r S → a S b   r S → a S b
4                         acc

stack                       input   action
0($)                        aabb$   s1
0($) 1(a)                   abb$    s1
0($) 1(a) 1(a)              bb$     r S → ε
0($) 1(a) 1(a) 2(S)         bb$     s3
0($) 1(a) 1(a) 2(S) 3(b)    b$      r S → a S b
0($) 1(a) 2(S)              b$      s3
0($) 1(a) 2(S) 3(b)         $       r S → a S b
0($) 4(S)                   $       acc

state  a    b             $             S
0      s1   r S → ε       r S → ε       4
1      s1   r S → ε       r S → ε       2
2           s3
3           r S → a S b   r S → a S b
4                         acc

stack                  input   action
0($)                   abb$    s1
0($) 1(a)              bb$     r S → ε
0($) 1(a) 2(S)         bb$     s3
0($) 1(a) 2(S) 3(b)    b$      r S → a S b
0($) 4(S)              b$      err
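
The two traces above can be reproduced with a small table-driven driver. The following is a minimal sketch in Python, not the course's implementation: the ACTION and GOTO entries are copied from the table for S → a S b | ε, while the names `PRODUCTIONS`, `ACTION`, `GOTO` and `parse` are made up for this illustration. The stack holds states only, since each state already determines the symbol that led to it.

```python
# Sketch of a table-driven bottom-up parser for S -> a S b | epsilon.
# Table entries come from the slide; all identifiers are illustrative.
PRODUCTIONS = {
    "S -> a S b": ("S", 3),   # (left-hand side, length of right-hand side)
    "S -> eps": ("S", 0),
}
ACTION = {
    (0, "a"): ("shift", 1),
    (0, "b"): ("reduce", "S -> eps"), (0, "$"): ("reduce", "S -> eps"),
    (1, "a"): ("shift", 1),
    (1, "b"): ("reduce", "S -> eps"), (1, "$"): ("reduce", "S -> eps"),
    (2, "b"): ("shift", 3),
    (3, "b"): ("reduce", "S -> a S b"), (3, "$"): ("reduce", "S -> a S b"),
    (4, "$"): ("accept", None),
}
GOTO = {(0, "S"): 4, (1, "S"): 2}


def parse(text):
    """Return (accepted, trace); the end marker '$' is appended here."""
    tokens = list(text) + ["$"]
    stack = [0]                      # stack of states, initial state at the bottom
    pos = 0
    trace = []
    while True:
        state, tok = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, tok), ("error", None))
        label = kind if arg is None else f"{kind} {arg}"
        trace.append((tuple(stack), "".join(tokens[pos:]), label))
        if kind == "shift":
            stack.append(arg)        # push the new state, advance the input
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            if length:
                del stack[-length:]  # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:                        # accept or error
            return kind == "accept", trace


if __name__ == "__main__":
    for text in ("aabb", "abb"):
        accepted, trace = parse(text)
        print(text, "accepted" if accepted else "rejected")
        for stack, rest, action in trace:
            print("  ", stack, rest, action)
```

Running it on `aabb` and `abb` yields one trace row per row of the two slides, ending in accept and error respectively.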

A Parser Table for S → A B | A, A → a, B → a

state  description      a         $          A   B   S
0      initial          s1                   2       5
1      parsed a from A  r A → a   r A → a
2      parsed A         s3        r S → A        4
3      parsed a from B            r B → a
4      parsed AB                  r S → A B
5      final                      acc

stack             input   action
0($)              aa$     s1
0($) 1(a)         a$      r A → a
0($) 2(A)         a$      s3
0($) 2(A) 3(a)    $       r B → a
0($) 2(A) 4(B)    $       r S → A B
0($) 5(S)         $       acc

state  a         $          A   B   S
0      s1                   2       5
1      r A → a   r A → a
2      s3        r S → A        4
3                r B → a
4                r S → A B
5                acc

A Parser Table for E → E + E | id

state  description  id   +          $             E
0      initial      s1                            2
1      parsed id         r E → id   r E → id
2      parsed E          s3         acc
3      parsed E+    s1                            4
4      parsed E+E        s3         r E → E + E

The Challenge
How to construct a parser-table from a given grammar
LR(1) grammars
  – Left to right scanning
  – Rightmost derivations (reverse)
  – 1 token of look-ahead
Different solutions
  – Operator precedence
  – SLR(1): Simple LR(1)
  – CLR(1): Canonical LR(1)
  – LALR(1): Look Ahead LR(1)
    Yacc, Bison, CUP

Grammar Hierarchy
[Figure: containment diagram of the grammar classes Non-ambiguous CFG, CLR(1), LALR(1), SLR(1) and LL(1)]

Constructing an SLR parsing table
Add a production S’ → S$
Construct a finite automaton accepting “valid stack symbols”
  – The states of the automaton become the states of the parsing table
  – Determine shift operations
  – Determine goto operations
Construct reduce entries by analyzing the grammar

A Finite Automaton for S’ → S$, S → a S b | ε

state  description   a    b    $   S
0      initial       s1            4
1      parsed a      s1            2
2      parsed S           s3
3      parsed a S b
4      final

[Figure: automaton with states 0 to 4 and transitions 0 –a→ 1, 1 –a→ 1, 0 –S→ 4, 1 –S→ 2, 2 –b→ 3]

Constructing a Finite Automaton
NFA
  – For X → X_1 X_2 … X_n: items [X → X_1 X_2 … X_i . X_{i+1} … X_n]
  – “prefixes of rhs (handles)”
  – X_1 X_2 … X_i is at the top of the stack and we expect X_{i+1} … X_n
The initial state [S’ → .S$]
  – δ([X → X_1 … X_i . X_{i+1} … X_n], X_{i+1}) = [X → X_1 … X_i X_{i+1} . … X_n]
  – For every production X_{i+1} → α: δ([X → X_1 X_2 … X_i . X_{i+1} … X_n], ε) = [X_{i+1} → .α]
Convert into DFA

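NFA for S’ → S$, S → a S b | ε
[Figure: NFA whose states are the items below]
Transitions:
  [S’ → .S$] on S → [S’ → S.$]
  [S’ → .S$] on ε → [S → .aSb] and [S → .]
  [S → .aSb] on a → [S → a.Sb]
  [S → a.Sb] on S → [S → aS.b]
  [S → a.Sb] on ε → [S → .aSb] and [S → .]
  [S → aS.b] on b → [S → aSb.]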

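[Figure: the NFA for S’ → S$, S → a S b | ε converted into a DFA]
DFA states:
  {[S’ → .S$], [S → .aSb], [S → .]}
  {[S → a.Sb], [S → .aSb], [S → .]}
  {[S’ → S.$]}
  {[S → aS.b]}
  {[S → aSb.]}
DFA transitions (states named by their first item):
  [S’ → .S$] on a → [S → a.Sb]; on S → [S’ → S.$]
  [S → a.Sb] on a → [S → a.Sb]; on S → [S → aS.b]
  [S → aS.b] on b → [S → aSb.]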

state  items                             a    b    $     S
0      [S’ → .S$, S → .aSb, S → .]       s1              4
1      [S → a.Sb, S → .aSb, S → .]       s1              2
2      [S → aS.b]                             s3
3      [S → aSb.]
4      [S’ → S.$]                                  acc

[Figure: the DFA for S’ → S$, S → a S b | ε, with transitions 0 –a→ 1, 1 –a→ 1, 0 –S→ 4, 1 –S→ 2, 2 –b→ 3]
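
The item sets in this table can be computed directly with the usual closure and goto operations, which amounts to building the DFA without going through the NFA first. The sketch below is one way to do that in Python for the grammar S’ → S$, S → a S b | ε; the names `GRAMMAR`, `closure`, `goto` and `build_dfa` are invented for the illustration, items are `(production index, dot position)` pairs, and the end marker `$` is never shifted because the table accepts on it instead. States are numbered in discovery order, so the numbers need not match the slide.

```python
# Sketch: LR(0) item sets for S' -> S$, S -> a S b | epsilon.
# An item is (production index, dot position); all names are illustrative.
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("a", "S", "b")),
    ("S", ()),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add [B -> . gamma] for every non-terminal B that follows a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def build_dfa():
    """Return (states, edges); edges maps (state number, symbol) to a state number."""
    states = [closure({(0, 0)})]
    edges = {}
    i = 0
    while i < len(states):
        state = states[i]
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for symbol in sorted(symbols - {"$"}):   # '$' leads to accept, not a shift
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            edges[(i, symbol)] = states.index(target)
        i += 1
    return states, edges


if __name__ == "__main__":
    states, edges = build_dfa()
    for number, state in enumerate(states):
        print(number, sorted(state))
    print(edges)
```

Transitions on terminals become shift entries and transitions on non-terminals become goto entries, which is how the a, b and S columns of the table are filled.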

Filling reduce entries
For an item [A → α.] we need to know the tokens that can follow A in a derivation from S’
Follow(A) = {t | S’ ⇒* α A t β}
See the textbook for an algorithm for constructing Follow from a given grammar
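
As a rough illustration of such an algorithm, the sketch below computes Follow by fixed-point iteration over the productions, using nullable and First sets as helpers. It is a generic textbook-style formulation written for this example, not a transcription of the textbook's code, and the names `GRAMMAR`, `nullable_set`, `first_sets` and `follow_sets` are assumptions of the sketch. The grammar is the running example S’ → S$, S → a S b | ε.

```python
# Sketch: Follow sets by fixed-point iteration (illustrative names throughout).
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("a", "S", "b")),
    ("S", ()),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def nullable_set():
    """Non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def first_sets(nullable):
    """Tokens that can start a string derived from each non-terminal."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for x in rhs:
                add = first[x] if x in NONTERMINALS else {x}
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
                if x not in nullable:   # terminals are never nullable
                    break
    return first


def follow_sets():
    """Tokens that can appear immediately after each non-terminal."""
    nullable = nullable_set()
    first = first_sets(nullable)
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMINALS:
                    continue
                add = set()
                rest_nullable = True
                for y in rhs[i + 1:]:
                    add |= first[y] if y in NONTERMINALS else {y}
                    if y not in nullable:
                        rest_nullable = False
                        break
                if rest_nullable:       # whatever follows lhs also follows x
                    add |= follow[lhs]
                if not add <= follow[x]:
                    follow[x] |= add
                    changed = True
    return follow


if __name__ == "__main__":
    print(follow_sets()["S"])
```

For this grammar the result for S contains exactly b and $, so the reduce entries for S → ε and S → a S b go into the b and $ columns.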

Follow(S) = {b, $}

state  items                             a    b             $             S
0      [S’ → .S$, S → .aSb, S → .]       s1   r S → ε       r S → ε       4
1      [S → a.Sb, S → .aSb, S → .]       s1   r S → ε       r S → ε       2
2      [S → aS.b]                             s3
3      [S → aSb.]                             r S → a S b   r S → a S b
4      [S’ → S.$]                                           acc

[Figure: the DFA for S’ → S$, S → a S b | ε, with transitions 0 –a→ 1, 1 –a→ 1, 0 –S→ 4, 1 –S→ 2, 2 –b→ 3]

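Example: E → E + E | E * E | id
[Figure: DFA with seven states, numbered 1 to 7 on the slide]
Item sets:
  {[S’ → .E$], [E → .E+E], [E → .E*E], [E → .id]}
  {[E → id.]}
  {[S’ → E.$], [E → E.+E], [E → E.*E]}
  {[E → E+.E], [E → .E+E], [E → .E*E], [E → .id]}
  {[E → E*.E], [E → .E+E], [E → .E*E], [E → .id]}
  {[E → E+E.], [E → E.+E], [E → E.*E]}
  {[E → E*E.], [E → E.+E], [E → E.*E]}
Transitions (states named by their first item):
  [S’ → .E$] on id → [E → id.]; on E → [S’ → E.$]
  [S’ → E.$] on + → [E → E+.E]; on * → [E → E*.E]
  [E → E+.E] on id → [E → id.]; on E → [E → E+E.]
  [E → E*.E] on id → [E → id.]; on E → [E → E*E.]
  [E → E+E.] on + → [E → E+.E]; on * → [E → E*.E]
  [E → E*E.] on + → [E → E+.E]; on * → [E → E*.E]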

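Interesting Non SLR(1) Grammar
S’ → S$
S → L = R | R
L → *R | id
R → L
Partial DFA:
  {[S’ → .S$], [S → .L=R], [S → .R], [L → .*R], [L → .id], [R → .L]}
    on L → {[S → L.=R], [R → L.]}
  {[S → L.=R], [R → L.]}
    on = → {[S → L=.R], [R → .L], [L → .*R], [L → .id]}
Follow(R) = {$, =}
In the state {[S → L.=R], [R → L.]} the token = calls for a shift (from [S → L.=R]) and, because = is in Follow(R), also for the reduction R → L. This shift/reduce conflict is why the grammar is not SLR(1).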
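
The conflict can be checked mechanically by applying the SLR rule to the one state in question. The sketch below hard-codes that state and Follow(R) as given on the slide rather than computing them; `STATE`, `FOLLOW`, `TERMINALS` and `slr_actions` are names chosen for this illustration.

```python
# Sketch: SLR actions for the state {[S -> L.=R], [R -> L.]}.
# The state and Follow(R) are copied from the slide; names are illustrative.
TERMINALS = {"=", "*", "id", "$"}
STATE = [
    ("S", ("L", "=", "R"), 1),   # [S -> L . = R]
    ("R", ("L",), 1),            # [R -> L .]
]
FOLLOW = {"R": {"$", "="}}


def slr_actions(state, follow):
    """Map each token to the set of actions the SLR rule puts in its table cell."""
    actions = {}
    for lhs, rhs, dot in state:
        if dot < len(rhs):
            if rhs[dot] in TERMINALS:            # dot before a token: shift
                actions.setdefault(rhs[dot], set()).add("shift")
        else:                                    # dot at the end: reduce on Follow(lhs)
            for token in follow[lhs]:
                actions.setdefault(token, set()).add(f"reduce {lhs} -> {' '.join(rhs)}")
    return actions


conflicts = {token: acts
             for token, acts in slr_actions(STATE, FOLLOW).items()
             if len(acts) > 1}

if __name__ == "__main__":
    print(conflicts)
```

Only the `=` cell ends up with two actions; the `$` cell holds the reduction alone.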

LR(1) Parser
Item [A → α.β, t]
  – α is at the top of the stack and we are expecting βt
LR(1) State
  – Sets of items
LALR(1) State
  – Merge LR(1) states whose items differ only in their look-ahead
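
A small sketch of the LALR(1) merging step, assuming LR(1) states are represented as sets of `(item, look-ahead)` pairs: states are grouped by their core, meaning the items with the look-aheads stripped, and each group is unioned. The function name `merge_by_core` and the two example states are hypothetical and chosen only to show the mechanics; they are not taken from the slides.

```python
# Sketch: merging LR(1) states that share the same core (illustrative only).
from collections import defaultdict


def merge_by_core(states):
    """Group LR(1) states by core and union the (item, look-ahead) pairs."""
    merged = defaultdict(set)
    for state in states:
        core = frozenset(item for item, _ in state)
        merged[core] |= set(state)
    return list(merged.values())


# Two hypothetical LR(1) states with the same core but different look-aheads.
state_a = {("R -> L .", "="), ("R -> L .", "$")}
state_b = {("R -> L .", "$")}
state_c = {("S -> L = . R", "$")}

if __name__ == "__main__":
    for state in merge_by_core([state_a, state_b, state_c]):
        print(sorted(state))
```

Here `state_a` and `state_b` collapse into one state, while `state_c` keeps its own because its core differs.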

Interesting Non LR(1) Grammars
Ambiguous
  – Arithmetic expressions
  – Dangling-else
Common derived prefix
  – A → B_1 a b | B_2 a c
  – B_1 → ε
  – B_2 → ε
Optional non-terminals
  – St → OptLab Ass
  – OptLab → id : | ε
  – Ass → id := Exp

A motivating example
Create a desk calculator
Challenges
  – Non-trivial syntax
  – Recursive expressions (semantics)
Operator precedence

Solution (lexical analysis)
/* desk.l */
/* error() is assumed to be defined elsewhere by the user */
%%
[0-9]+      { yylval = atoi(yytext); return NUM; }
"+"         { return PLUS; }
"-"         { return MINUS; }
"/"         { return DIV; }
"*"         { return MUL; }
"("         { return LPAR; }
")"         { return RPAR; }
"//".*[\n]  ;  /* comment */
[\t\n ]     ;  /* whitespace */
.           { error("illegal symbol", yytext[0]); }

Solution (syntax analysis)
/* desk.y */
%{
#include <stdio.h>   /* printf */
%}
%token NUM
%left PLUS MINUS
%left MUL DIV
%token LPAR RPAR
%start P
%%
P : E             { printf("%d\n", $1); }
  ;
E : NUM           { $$ = $1; }
  | LPAR E RPAR   { $$ = $2; }
  | E PLUS E      { $$ = $1 + $3; }
  | E MINUS E     { $$ = $1 - $3; }
  | E MUL E       { $$ = $1 * $3; }
  | E DIV E       { $$ = $1 / $3; }
  ;
%%
#include "lex.yy.c"

flex desk.l
bison -y desk.y
cc y.tab.c -ll -ly
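
To see what the two `%left` lines buy, the sketch below evaluates the same expressions with an explicit operator stack: when an operator arrives, anything on the stack with higher or equal precedence is reduced first, otherwise the new operator is shifted. This is an operator-precedence (shunting-yard style) evaluator swapped in for illustration, not the LALR(1) parser that bison generates, and `PREC`, `tokenize`, `c_div` and `evaluate` are made-up names. It assumes well-formed input and does not handle `//` comments.

```python
# Sketch: precedence-driven shift/reduce evaluation of desk-calculator input.
# Not bison's algorithm; it only mirrors the effect of the %left declarations.
import re

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}   # later %left line = higher precedence


def tokenize(text):
    """Split into integers, operators and parentheses; '$' marks the end."""
    raw = re.findall(r"\d+|[-+*/()]", text)
    return [int(t) if t.isdigit() else t for t in raw] + ["$"]


def c_div(a, b):
    """Integer division truncating toward zero, as C's '/' does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def evaluate(text):
    values, ops = [], []                  # value stack and operator stack

    def reduce():
        op = ops.pop()
        right = values.pop()
        left = values.pop()
        if op == "+":
            values.append(left + right)
        elif op == "-":
            values.append(left - right)
        elif op == "*":
            values.append(left * right)
        else:
            values.append(c_div(left, right))

    for tok in tokenize(text):
        if isinstance(tok, int):
            values.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                reduce()
            ops.pop()                     # discard the matching "("
        else:
            # Left associativity: reduce on equal precedence as well.
            while ops and ops[-1] != "(" and (tok == "$" or PREC[ops[-1]] >= PREC[tok]):
                reduce()
            if tok != "$":
                ops.append(tok)
    return values[0]


if __name__ == "__main__":
    for text in ("1+2*3", "(1+2)*3", "8-3-2"):
        print(text, "=", evaluate(text))
```

The `>=` in the comparison is what makes `8-3-2` group as `(8-3)-2`; a right-associative operator would use `>` instead.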

Summary
LR is a powerful technique
Generates efficient parsers
Generation tools exist
  – Bison, yacc, CUP
But some grammars need to be tuned
  – Shift/Reduce conflicts
  – Reduce/Reduce conflicts
  – Efficiency of the generated parser
