Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University.

Similar presentations


Presentation on theme: "Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University."— Presentation transcript:

1 Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University

2 Today Review top-down parsing – Recursive descent LL(1) parsing Start bottom-up parsing – LR(k) – SLR(k) – LALR(k) 2

3 Top-down parsing Parser repeatedly faces the following problem – Given program P starting with terminal t – Non-terminal N with list of possible production rules: N  α 1 … N  α k Predict which rule can should be used to derive P 3

4 Recursive descent parsing Define a function for every nonterminal Every function work as follows – Find applicable production rule – Terminal function checks match with next input token – Nonterminal function calls (recursively) other functions If there are several applicable productions for a nonterminal, use lookahead 4

5 Boolean expressions example 5 not ( not true or false ) E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) not E E (EOPE) notLITorLIT truefalse production to apply known from next token E  LIT | (E OP E) | not E LIT  true | false OP  and | or | xor

6 Flavors of top-down parsers Manually constructed – Recursive descent (previous lecture, review now) Generated (this lecture) – Based on pushdown automata – Does not use recursion 6

7 Recursive descent parsing Define a function for every nonterminal Every function work as follows – Find applicable production rule – Terminal function checks match with next input token – Nonterminal function calls (recursively) other functions If there are several applicable productions for a nonterminal, use lookahead 7

8 Matching tokens Variable current holds the current input token 8 match(token t) { if (current == t) current = next_token() else error } E  LIT | (E OP E) | not E LIT  true | false OP  and | or | xor

9 Functions for nonterminals 9 E() { if (current  {TRUE, FALSE}) // E  LIT LIT(); else if (current == LPAREN) // E  ( E OP E ) match(LPAREN); E(); OP(); E(); match(RPAREN); else if (current == NOT)// E  not E match(NOT); E(); else error; } LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error; } E  LIT | (E OP E) | not E LIT  true | false OP  and | or | xor

10 Implementation via recursion E → LIT | ( E OP E ) | not E LIT → true | false OP → and | or | xor E() { if (current  {TRUE, FALSE})LIT(); else if (current == LPAREN)match(LPARENT); E(); OP(); E(); match(RPAREN); else if (current == NOT)match(NOT); E(); else error; } LIT() { if (current == TRUE)match(TRUE); else if (current == FALSE)match(FALSE); elseerror; } OP() { if (current == AND)match(AND); else if (current == OR)match(OR); else if (current == XOR)match(XOR); elseerror; } 10

11 How is prediction done? For simplicity, let’s assume no null production rules – See book for general case Find out the token that can appear first in a rule – FIRST sets 11 p. 189

12 FIRST sets For every production rule A  α – FIRST(α) = all terminals that α can start with – Every token that can appear as first in α under some derivation for α In our Boolean expressions example – FIRST( LIT ) = { true, false } – FIRST( ( E OP E ) ) = { ‘(‘ } – FIRST( not E ) = { not } No intersection between FIRST sets => can always pick a single rule If the FIRST sets intersect, may need longer lookahead – LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens – LL(1) is an important and useful class 12

13 Computing FIRST sets Assume no null productions A   1.Initially, for all nonterminals A, set FIRST(A) = { t | A  tω for some ω } 2.Repeat the following until no changes occur: for each nonterminal A for each production A  Bω set FIRST(A) = FIRST(A) ∪ FIRST(B) This is known as fixed-point computation 13

14 FIRST sets computation example 14 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant TERMEXPRSTMT

15 1. Initialization 15 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant TERMEXPRSTMT id constant zero? Not ++ -- if while

16 2. Iterate 1 16 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant TERMEXPRSTMT id constant zero? Not ++ -- if while zero? Not ++ --

17 2. Iterate 2 17 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant TERMEXPRSTMT id constant zero? Not ++ -- if while id constant zero? Not ++ --

18 2. Iterate 3 – fixed-point 18 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant TERMEXPRSTMT id constant zero? Not ++ -- if while id constant zero? Not ++ -- id constant

19 LL(k) grammars A grammar is in the class LL(K) when it can be derived via: – Top-down derivation – Scanning the input from left to right (L) – Producing the leftmost derivation (L) – With lookahead of k tokens (k) – For every two productions A  α and A  β we have FIRST(α) ∩ FIRST(β) = {} (and FIRST(A) ∩ FOLLOW(A) = {} for null productions) A language is said to be LL(k) when it has an LL(k) grammar What can we do if grammar is not LL(k)? 19

20 Pushdown automaton uses – Prediction stack – Input stream – Transition table nonterminals x tokens -> production alternative Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t LL(k) parsing via pushdown automata 20

21 LL(k) parsing via pushdown automata Two possible moves – Prediction When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error – Match When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t If (t ≠ T) syntax error Parsing terminates when prediction stack is empty – If input is empty at that point, success. Otherwise, syntax error 21

22 Model of non-recursive predictive parser 22 Predictive Parsing program Parsing Table X Y Z $ Stack $b+a Output

23 ()nottruefalseandorxor$ E2311 LIT45 OP678 (1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor Nonterminals Input tokens Example transition table 23 Which rule should be used

24 abc AA  aAbA  c A  aAb | c aacbb$ Input suffixStack contentMove aacbb$A$predict(A,a) = A  aAb aacbb$aAb$match(a,a) acbb$Ab$predict(A,a) = A  aAb acbb$aAbb$match(a,a) cbb$Abb$predict(A,c) = A  c cbb$ match(c,c) bb$ match(b,b) b$ match(b,b) $$match($,$) – success Running parser example 24

25 abc AA  aAbA  c A  aAb | c abcbb$ Input suffixStack contentMove abcbb$A$predict(A,a) = A  aAb abcbb$aAb$match(a,a) bcbb$Ab$predict(A,b) = ERROR Illegal input example 25

26 Using top-down parsing approach Compute parsing table – If table is conflict-free then we have an LL(k) parser If table contains conflicts investigate – If grammar is ambiguous try to disambiguate – Try using left-factoring/substitution/left-recursion elimination to remove conflicts 26

27 Marking “end-of-file” Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S‘ with a new production rule S’  S $ where $ is not part of the set of tokens To parse an input P with G’ we change it into P$ Simplifies top-down parsing with null productions and LR parsing 27

28 Bottom-up parsing No problem with left recursion Widely used in practice Shift-reduce parsing: LR(k), SLR, LALR – All follow the same pushdown-based algorithm – Read input left-to-right producing rightmost derivation – Differ on type of “LR Items” Parser generator CUP implements LALR 28

29 Some terminology The opposite of derivation is called reduction – Let A  α be a production rule – Let βAµ be a sentence – Replace left-hand side of rule in sentence: βAµ => βαµ A handle is a substring that is reduced during a series of steps in a rightmost derivation 29

30 Rightmost derivation example 1 + (2) + (3) E + (E) + (3) + E  E + (E) E  i E 12+3 E E + (3) E () () E + (E) E E E E + (2) + (3) Each non-leaf node represents a handle Rightmost derivation in reverse 30

31 LR item N  α  β Already matched To be matched Input Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β 31

32 LR items 32 N  α  β Shift Item N  αβ  Reduce Item

33 Example 33 Z  expr EOF expr  term | expr + term term  ID | ( expr ) Z  E $ E  T | E + T T  i | ( E ) (just shorthand of the grammar on the top)

34 Example: parsing with LR items 34 Z  E $ E  T | E + T T  i | ( E ) E   T E   E + T T   i T   ( E ) Z   E $ i+i$ Why do we need these additional LR items? Where do they come from? What do they mean?

35  -closure Given a set S of LR(0) items If P  α  Nβ is in S then for each rule N  in the grammar S must also contain N    35  -closure ({Z   E $}) = E   T, E   E + T, T   i, T   ( E ) } { Z   E $, Z  E $ E  T | E + T T  i | ( E )

36 36 Example: parsing with LR items i+i$ E   T E   E + T T   i T   ( E ) Z   E $ Z  E $ E  T | E + T T  i | ( E ) Items denote possible future handles Remember position from which we’re trying to reduce

37 37 Example: parsing with LR items T  i  Reduce item! i+i$ E   T E   E + T T   i T   ( E ) Z   E $ Z  E $ E  T | E + T T  i | ( E ) Match items with current token

38 38 Example: parsing with LR items i E  T  Reduce item! T+i$ Z  E $ E  T | E + T T  i | ( E ) E   T E   E + T T   i T   ( E ) Z   E $

39 39 Example: parsing with LR items T E  T  Reduce item! i E+i$ Z  E $ E  T | E + T T  i | ( E ) E   T E   E + T T   i T   ( E ) Z   E $

40 40 Example: parsing with LR items T i E+i$ Z  E $ E  T | E + T T  i | ( E ) E   T E   E + T T   i T   ( E ) Z   E $ E  E  + T Z  E  $

41 41 Example: parsing with LR items T i E+i$ Z  E $ E  T | E + T T  i | ( E ) E   T E   E + T T   i T   ( E ) Z   E $ E  E  + T Z  E  $ E  E+  T T   i T   ( E )

42 42 Example: parsing with LR items E  E  + T Z  E  $ E  E+  T T   i T   ( E ) E+T$ i Z  E $ E  T | E + T T  i | ( E ) E   T E   E + T T   i T   ( E ) Z   E $ T i

43 43 E   T E   E + T T   i T   ( E ) Z   E $ Z  E $ E  T | E + T T  i | ( E ) E+T Example: parsing with LR items T i E  E  + T Z  E  $ E  E+  T T   i T   ( E ) i E  E+T  $ Reduce item!

44 44 E   T E   E + T T   i T   ( E ) Z   E $ E$ Example: parsing with LR items E T i +T Z  E  $ E  E  + T i Z  E $ E  T | E + T T  i | ( E )

45 45 E   T E   E + T T   i T   ( E ) Z   E $ E$ Example: parsing with LR items E T i +T Z  E  $ E  E  + T Z  E$  Reduce item! i Z  E $ E  T | E + T T  i | ( E )

46 46 E   T E   E + T T   i T   ( E ) Z   E $ Z Example: parsing with LR items E T i +T Z  E  $ E  E  + T Z  E$  Reduce item! E$ i Z  E $ E  T | E + T T  i | ( E )

47 Computing item sets Initial set – Z is in the start symbol –  -closure({ Z   α | Z  α is in the grammar } ) Next set from a set S and the next symbol X – step(S,X) = { N  αX  β | N  α  Xβ in the item set S} – nextSet(S,X) =  -closure(step(S,X)) 47

48 LR(0) automaton example 48 Z   E$ E   T E   E + T T   i T   (E) T  (  E) E   T E   E + T T   i T   (E) E  E + T  T  (E)  Z  E$  Z  E  $ E  E  + T E  E+  T T   i T   (E) T  i  T  (E  ) E  E  +T E  T  q0q0 q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 q9q9 T ( i E + $ T ) + E i T ( i ( reduce state shift state

49 GOTO/ACTION tables 49 Statei+()$ETaction q0q5q7q1q6shift q1q3q2shift q2Z  E$ q3q5q7q4Shift q4E  E+T q5TiTi q6ETET q7q5q7q8q6shift q8q3q9shift q9TETE GOTO Table ACTION Table empty – error move

50 LR(0) parser tables Two types of rows: – Shift row – tells which state to GOTO for current token – Reduce row – tells which rule to reduce (independent of current token) GOTO entries are blank 50

51 LR parser data structures Input – remainder of text to be processed Stack – sequence of pairs N, qi – N – symbol (terminal or non-terminal) – qi – state at which decisions are made Initial stack contains q0 51 +i$ input q0 stack iq5

52 LR(0) pushdown automaton Two moves: shift and reduce Shift move – Remove first token from input – Push it on the stack – Compute next state based on GOTO table – Push new state on the stack – If new state is error – report error 52 i+i$ input q0 stack +i$ input q0 stack shift iq5 Statei+()$ETaction q0q5q7q1q6shift

53 LR(0) pushdown automaton Reduce move – Using a rule N  α – Symbols in α and their following states are removed from stack – New state computed based on GOTO table (using top of stack, before pushing N) – N is pushed on the stack – New state pushed on top of N 53 +i$ input q0 stack iq5 Reduce T  i +i$ input q0 stack Tq6 Statei+()$ETaction q0q5q7q1q6shift

54 GOTO/ACTION table 54 Statei+()$ET q0s5s7s1s6 q1s3s2 q2r1 q3s5s7s4 q4r3 q5r4 q6r2 q7s5s7s8s6 q8s3s9 q9r5 (1)Z  E $ (2)E  T (3)E  E + T (4)T  i (5)T  ( E ) Warning: numbers mean different things! rn = reduce using rule number n sm = shift to state m

55 GOTO/ACTION table 55 sti+()$ET q0s5s7s1s6 q1s3s2 q2r1 q3s5s7s4 q4r3 q5r4 q6r2 q7s5s7s8s6 q8s3s9 q9r5 (1)Z  E $ (2)E  T (3)E  E + T (4)T  i (5)T  ( E ) StackInputAction q0i + i $s5 q0 i q5+ i $r4 q0 T q6+ i $r2 q0 E q1+ i $s3 q0 E q1 + q3i $s5 q0 E q1 + q3 i q5$r4 q0 E q1 + q3 T q4$r3 q0 E q1$s2 q0 E q1 $ q2r1 q0 Z top is on the right

56 Are we done? Can make a transition diagram for any grammar Can make a GOTO table for every grammar Cannot make a deterministic ACTION table for every grammar 56

57 LR(0) conflicts 57 Z  E $ E  T E  E + T T  i T  ( E ) T  i[E] Z   E$ E   T E   E + T T   i T   (E) T   i[E] T  i  T  i  [E] q0q0 q5q5 T ( i E Shift/reduce conflict … … …

58 LR(0) conflicts 58 Z  E $ E  T E  E + T T  i V  i T  ( E ) Z   E$ E   T E   E + T T   i T   (E) T   i[E] T  i  V  i  q0q0 q5q5 T ( i E reduce/reduce conflict … … …

59 LR(0) conflicts Any grammar with an  -rule cannot be LR(0) Inherent shift/reduce conflict – A   – reduce item – P  α  Aβ – shift item – A   can always be predicted from P  α  Aβ 59

60 Back to the GOTO/ACTIONS tables 60 Statei+()$ETaction q0q5q7q1q6shift q1q3q2shift q2Z  E$ q3q5q7q4Shift q4E  E+T q5TiTi q6ETET q7q5q7q8q6shift q8q3q9shift q9TETE GOTO Table ACTION Table ACTION table determined only by transition diagram, ignores input

61 SRL grammars A handle should not be reduced to a non- terminal N if the look-ahead is a token that cannot follow N A reduce item N  α  is applicable only when the look-ahead is in FOLLOW(N) Differs from LR(0) only on the ACTION table 61

62 SLR ACTION table 62 (1)Z  E $ (2)E  T (3)E  E + T (4)T  i (5)T  ( E ) Statei+()$ q0shift q1shift q2Z  E$ q3shift q4E  E+T q5TiTiTiTiTiTi q6ETETETETETET q7shift q8shift q9T  (E) Lookahead token from the input Remember: In contrast, GOTO table is indexed by state and a grammar symbol from the stack

63 SLR ACTION table 63 Statei+()[]$ q0shift q1shift q2Z  E$ q3shift q4E  E+T q5TiTiTiTishiftTiTi q6ETETETETETET q7shift q8shift q9T  (E) vs. stateaction q0shift q1shift q2Z  E$ q3Shift q4E  E+T q5TiTi q6ETET q7shift q8shift q9TETE SLR – use 1 token look-aheadLR(0) – no look-ahead … as before… T  i T  i[E]

64 Are we done? 64 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

65 65 S’ →  S S →  L = R S →  R L →  * R L →  id R →  L S’ → S  S → L  = R R → L  S → R  L → *  R R →  L L →  * R L →  id L → id  S → L =  R R →  L L →  * R L →  id L → * R  R → L  S → L = R  S L R id * = R * R L * L q0 q4 q7 q1 q3 q9 q6 q8 q2 q5

66 shift/reduce conflict S → L  = R vs. R → L  FOLLOW(R) contains = – S ⇒ L = R ⇒ * R = R SLR cannot resolve the conflict either 66 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

67 LR(1) grammars In SLR: a reduce item N  α  is applicable only when the look-ahead is in FOLLOW(N) But FOLLOW(N) merges look-ahead for all alternatives for N LR(1) keeps look-ahead with each LR item Idea: a more refined notion of follows computed per item 67

68 LR(1) item LR(1) item is a pair – LR(0) item – Look-ahead token Meaning – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token. Example – The production L  id yields the following LR(1) items 68 [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

69  -closure for LR(1) For every [A → α ● Bβ, c] in S – for every production B→δ and every token b in the grammar such that b  FIRST(βc) – Add [B → ● δ, b] to S 69

70 70 (S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id

71 Back to the conflict Is there a conflict now? 71 (S → L ∙ = R, $) (R → L ∙, $) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) = q6 q2

72 LALR LR tables have large number of entries Often don’t need such refined observation (and cost) LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts LALR not as powerful as LR(1) 72

73 See you next time 73


Download ppt "Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University."

Similar presentations


Ads by Google