Published by Rudolph Palmer. Modified over 9 years ago.
1
Compiler Principles, Winter 2012-2013
Syntax Analysis (Parsing) – Part 2
Mayer Goldberg and Roman Manevich
Ben-Gurion University
2
Today
Review top-down parsing
  – Recursive descent
  – LL(1) parsing
Start bottom-up parsing
  – LR(k)
  – SLR(k)
  – LALR(k)
3
Top-down parsing
Parser repeatedly faces the following problem
  – Given program P starting with terminal t
  – Non-terminal N with list of possible production rules: N → α1 … N → αk
Predict which rule should be used to derive P
4
Recursive descent parsing
Define a function for every nonterminal
Every function works as follows
  – Find applicable production rule
  – Terminal function checks match with next input token
  – Nonterminal function calls (recursively) other functions
If there are several applicable productions for a nonterminal, use lookahead
5
Boolean expressions example
Grammar:
  E → LIT | ( E OP E ) | not E
  LIT → true | false
  OP → and | or | xor
Input: not ( not true or false )
Derivation (the production to apply is known from the next token):
  E => not E
    => not ( E OP E )
    => not ( not E OP E )
    => not ( not LIT OP E )
    => not ( not true OP E )
    => not ( not true or E )
    => not ( not true or LIT )
    => not ( not true or false )
[Figure: parse tree for not ( not true or false )]
6
Flavors of top-down parsers
Manually constructed
  – Recursive descent (previous lecture, review now)
Generated (this lecture)
  – Based on pushdown automata
  – Does not use recursion
8
Matching tokens
Variable current holds the current input token

  match(token t) {
    if (current == t)
      current = next_token();
    else
      error;
  }

Grammar:
  E → LIT | ( E OP E ) | not E
  LIT → true | false
  OP → and | or | xor
9
Functions for nonterminals
Grammar:
  E → LIT | ( E OP E ) | not E
  LIT → true | false
  OP → and | or | xor

  E() {
    if (current ∈ {TRUE, FALSE}) {       // E → LIT
      LIT();
    } else if (current == LPAREN) {      // E → ( E OP E )
      match(LPAREN); E(); OP(); E(); match(RPAREN);
    } else if (current == NOT) {         // E → not E
      match(NOT); E();
    } else {
      error;
    }
  }

  LIT() {
    if (current == TRUE) match(TRUE);
    else if (current == FALSE) match(FALSE);
    else error;
  }
10
Implementation via recursion
Grammar:
  E → LIT | ( E OP E ) | not E
  LIT → true | false
  OP → and | or | xor

  E() {
    if (current ∈ {TRUE, FALSE}) {
      LIT();
    } else if (current == LPAREN) {
      match(LPAREN); E(); OP(); E(); match(RPAREN);
    } else if (current == NOT) {
      match(NOT); E();
    } else {
      error;
    }
  }

  LIT() {
    if (current == TRUE) match(TRUE);
    else if (current == FALSE) match(FALSE);
    else error;
  }

  OP() {
    if (current == AND) match(AND);
    else if (current == OR) match(OR);
    else if (current == XOR) match(XOR);
    else error;
  }
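The pseudocode above can be turned into a small runnable sketch. The version below is one possible Python rendering, not the course's reference implementation: the `Parser` class, the whitespace-based tokenizer, the `ParseError` exception and the `accepts` helper are all assumptions made for illustration.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    def __init__(self, text):
        # Assumed tokenizer: pad parentheses and split on whitespace.
        self.tokens = text.replace("(", " ( ").replace(")", " ) ").split()
        self.tokens.append("$")  # end-of-input marker
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, token):
        # Consume the current token if it is the expected one.
        if self.current != token:
            raise ParseError(f"expected {token!r}, found {self.current!r}")
        self.pos += 1

    def E(self):
        if self.current in ("true", "false"):    # E -> LIT
            self.LIT()
        elif self.current == "(":                # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":              # E -> not E
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected {self.current!r}")

    def LIT(self):
        if self.current in ("true", "false"):
            self.match(self.current)
        else:
            raise ParseError(f"expected a literal, found {self.current!r}")

    def OP(self):
        if self.current in ("and", "or", "xor"):
            self.match(self.current)
        else:
            raise ParseError(f"expected an operator, found {self.current!r}")


def accepts(text):
    """Return True when the whole text derives from E."""
    parser = Parser(text)
    try:
        parser.E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For example, `accepts("not ( not true or false )")` follows the same sequence of predictions as the derivation in the Boolean expressions example, while `accepts("( true and )")` fails inside the second call to `E`.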
11
How is prediction done?
For simplicity, let's assume no null production rules
  – See book for general case (p. 189)
Find out the token that can appear first in a rule
  – FIRST sets
12
FIRST sets
For every production rule A → α
  – FIRST(α) = all terminals that α can start with
  – Every token that can appear as first in α under some derivation for α
In our Boolean expressions example
  – FIRST( LIT ) = { true, false }
  – FIRST( ( E OP E ) ) = { '(' }
  – FIRST( not E ) = { not }
No intersection between FIRST sets => can always pick a single rule
If the FIRST sets intersect, may need longer lookahead
  – LL(k) = class of grammars in which the production rule can be determined using a lookahead of k tokens
  – LL(1) is an important and useful class
13
Computing FIRST sets
Assume no null productions A → ε
1. Initially, for all nonterminals A, set FIRST(A) = { t | A → tω for some ω }
2. Repeat the following until no changes occur:
     for each nonterminal A
       for each production A → Bω
         set FIRST(A) = FIRST(A) ∪ FIRST(B)
This is known as fixed-point computation
14
FIRST sets computation example
Grammar:
  STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
  EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
  TERM → id | constant
FIRST sets to compute: FIRST(TERM), FIRST(EXPR), FIRST(STMT)
15
1. Initialization
(grammar as in the FIRST sets computation example)
  FIRST(TERM) = { id, constant }
  FIRST(EXPR) = { zero?, not, ++, -- }
  FIRST(STMT) = { if, while }
16
2. Iterate 1
(grammar as in the FIRST sets computation example)
  FIRST(TERM) = { id, constant }
  FIRST(EXPR) = { zero?, not, ++, -- }
  FIRST(STMT) = { if, while, zero?, not, ++, -- }    (added from FIRST(EXPR))
17
2. Iterate 2
(grammar as in the FIRST sets computation example)
  FIRST(TERM) = { id, constant }
  FIRST(EXPR) = { zero?, not, ++, --, id, constant }    (added from FIRST(TERM))
  FIRST(STMT) = { if, while, zero?, not, ++, -- }
18
2. Iterate 3 – fixed-point
(grammar as in the FIRST sets computation example)
  FIRST(TERM) = { id, constant }
  FIRST(EXPR) = { zero?, not, ++, --, id, constant }
  FIRST(STMT) = { if, while, zero?, not, ++, --, id, constant }    (added from FIRST(EXPR))
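The fixed-point computation can be sketched in a few lines of Python. This is a minimal illustration under the same assumption as the slides (no null productions); the dictionary encoding of the grammar and the name `first_sets` are choices made for this sketch. The order in which sets grow may differ from the iterations shown above, but the fixed point is the same.

```python
# Grammar from the example: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"],
             ["zero?", "TERM"],
             ["not", "EXPR"],
             ["++", "id"],
             ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}


def first_sets(grammar):
    """Compute FIRST for every nonterminal, assuming no null productions."""
    # Step 1: terminals that start some production directly.
    first = {
        a: {rhs[0] for rhs in prods if rhs[0] not in grammar}
        for a, prods in grammar.items()
    }
    # Step 2: propagate FIRST(B) into FIRST(A) for A -> B w until stable.
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                b = rhs[0]
                if b in grammar and not first[b] <= first[a]:
                    first[a] |= first[b]
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
```

After the call, `FIRST["STMT"]` contains the eight tokens listed in the fixed-point slide.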
19
LL(k) grammars
A grammar is in the class LL(k) when it can be parsed via:
  – Top-down derivation
  – Scanning the input from left to right (L)
  – Producing the leftmost derivation (L)
  – With lookahead of k tokens (k)
  – For every two productions A → α and A → β we have FIRST(α) ∩ FIRST(β) = {} (and FIRST(A) ∩ FOLLOW(A) = {} for null productions); this condition is stated for k = 1
A language is said to be LL(k) when it has an LL(k) grammar
What can we do if grammar is not LL(k)?
20
LL(k) parsing via pushdown automata
Pushdown automaton uses
  – Prediction stack
  – Input stream
  – Transition table
      nonterminals x tokens -> production alternative
      Entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when current input starts with t
21
LL(k) parsing via pushdown automata
Two possible moves
  – Prediction
      When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise syntax error
  – Match
      When top of prediction stack is a terminal T, it must be equal to next input token t. If (t == T), pop T and consume t. If (t ≠ T), syntax error
Parsing terminates when prediction stack is empty
  – If input is empty at that point, success. Otherwise, syntax error
22
Model of non-recursive predictive parser
[Figure: predictive parsing program connected to the input buffer (a + b $), the stack (X Y Z $), the parsing table, and the output]
23
Example transition table
(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
Rows are nonterminals, columns are input tokens; each entry says which rule should be used

        (    )    not   true  false  and   or    xor   $
  E     2         3     1     1
  LIT                   4     5
  OP                                 6     7     8
24
Running parser example
Grammar: A → aAb | c
Parsing table:
        a          b     c
  A     A → aAb          A → c
Input: aacbb$

  Input suffix   Stack content   Move
  aacbb$         A$              predict(A,a) = A → aAb
  aacbb$         aAb$            match(a,a)
  acbb$          Ab$             predict(A,a) = A → aAb
  acbb$          aAbb$           match(a,a)
  cbb$           Abb$            predict(A,c) = A → c
  cbb$           cbb$            match(c,c)
  bb$            bb$             match(b,b)
  b$             b$              match(b,b)
  $              $               match($,$) – success
25
Illegal input example
Grammar: A → aAb | c
Parsing table:
        a          b     c
  A     A → aAb          A → c
Input: abcbb$

  Input suffix   Stack content   Move
  abcbb$         A$              predict(A,a) = A → aAb
  abcbb$         aAb$            match(a,a)
  bcbb$          Ab$             predict(A,b) = ERROR
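The two moves (prediction and match) can be sketched as a short table-driven loop. The code below is an illustrative Python version for the grammar A → aAb | c only; the names `TABLE`, `ll1_parse` and `ParseError` are assumptions of this sketch, and tokens are taken to be single characters.

```python
class ParseError(Exception):
    """Raised on a syntax error."""


NONTERMINALS = {"A"}
# table[N, t] = right-hand side to predict for nonterminal N on token t.
TABLE = {
    ("A", "a"): ["a", "A", "b"],
    ("A", "c"): ["c"],
}


def ll1_parse(tokens, start="A"):
    """Run the prediction/match loop and return the list of moves."""
    stack = ["$", start]          # top of the stack is the end of the list
    inp = list(tokens) + ["$"]    # input with end-of-file marker
    pos = 0
    moves = []
    while stack:
        top = stack.pop()
        t = inp[pos]
        if top in NONTERMINALS:
            # Prediction move: replace N by table[N, t].
            rhs = TABLE.get((top, t))
            if rhs is None:
                raise ParseError(f"predict({top},{t}) = ERROR")
            moves.append(f"predict({top},{t}) = {top} -> {''.join(rhs)}")
            stack.extend(reversed(rhs))
        else:
            # Match move: terminal on the stack must equal the next token.
            if top != t:
                raise ParseError(f"match({top},{t}) failed")
            moves.append(f"match({top},{t})")
            pos += 1
    return moves
```

Calling `ll1_parse("aacbb")` produces nine moves ending in `match($,$)`, the same sequence as the running parser example, and `ll1_parse("abcbb")` raises `ParseError` at `predict(A,b)`.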
26
Using top-down parsing approach
Compute parsing table
  – If table is conflict-free then we have an LL(k) parser
If table contains conflicts, investigate
  – If grammar is ambiguous, try to disambiguate
  – Try using left-factoring/substitution/left-recursion elimination to remove conflicts
27
Marking "end-of-file"
Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G' with a new start non-terminal S' and a new production rule S' → S $, where $ is not part of the set of tokens
To parse an input P with G' we change it into P$
Simplifies top-down parsing with null productions and LR parsing
28
Bottom-up parsing
No problem with left recursion
Widely used in practice
Shift-reduce parsing: LR(k), SLR, LALR
  – All follow the same pushdown-based algorithm
  – Read input left-to-right, producing a rightmost derivation (in reverse)
  – Differ on type of "LR items"
Parser generator CUP implements LALR
29
Some terminology
The opposite of derivation is called reduction
  – Let A → α be a production rule
  – Let βAµ be a sentential form
  – A derivation step replaces the left-hand side of the rule by its right-hand side: βAµ => βαµ; a reduction goes the other way, from βαµ back to βAµ
A handle is a substring that is reduced during a series of steps in a rightmost derivation
30
Rightmost derivation example
Grammar: E → E + (E) | i
Input: 1 + (2) + (3)
Rightmost derivation in reverse:
  1 + (2) + (3)
  E + (2) + (3)
  E + (E) + (3)
  E + (3)
  E + (E)
  E
[Figure: parse tree for 1 + (2) + (3)]
Each non-leaf node represents a handle
31
LR item
N → α • β
  – α: already matched (part of the input seen so far)
  – β: to be matched
Hypothesis about αβ being a possible handle: so far we've matched α, expecting to see β
32
LR items
N → α • β    shift item
N → αβ •     reduce item
33
Example
  Z → expr EOF
  expr → term | expr + term
  term → ID | ( expr )
Shorthand for the grammar above:
  Z → E $
  E → T | E + T
  T → i | ( E )
34
Example: parsing with LR items
Grammar:
  Z → E $
  E → T | E + T
  T → i | ( E )
Input: i + i $
Initial item set:
  Z → • E $
  E → • T
  E → • E + T
  T → • i
  T → • ( E )
Why do we need these additional LR items? Where do they come from? What do they mean?
35
ε-closure
Given a set S of LR(0) items
If P → α • Nβ is in S, then for each rule N → γ in the grammar, S must also contain N → • γ
Grammar:
  Z → E $
  E → T | E + T
  T → i | ( E )
ε-closure({ Z → • E $ }) = { Z → • E $, E → • T, E → • E + T, T → • i, T → • ( E ) }
36
Example: parsing with LR items (input i + i $)
Grammar:
  Z → E $
  E → T | E + T
  T → i | ( E )
Items denote possible future handles; remember the position from which we're trying to reduce. Match items with the current token.
[Figure sequence: item sets and partial parse tree at each step; the steps are summarized below]

  Stack    Input      Current items                               Move
  (empty)  i + i $    Z → • E $, E → • T, E → • E + T,            shift i
                      T → • i, T → • ( E )
  i        + i $      T → i •  (reduce item)                      reduce T → i
  T        + i $      E → T •  (reduce item)                      reduce E → T
  E        + i $      Z → E • $, E → E • + T                      shift +
  E +      i $        E → E + • T, T → • i, T → • ( E )           shift i
  E + i    $          T → i •  (reduce item)                      reduce T → i
  E + T    $          E → E + T •  (reduce item)                  reduce E → E + T
  E        $          Z → E • $, E → E • + T                      shift $
  E $                 Z → E $ •  (reduce item)                    reduce Z → E $
  Z                                                               done
47
Computing item sets
Initial set
  – Z is the start symbol
  – ε-closure({ Z → • α | Z → α is in the grammar })
Next set from a set S and the next symbol X
  – step(S,X) = { N → αX • β | N → α • Xβ in the item set S }
  – nextSet(S,X) = ε-closure(step(S,X))
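The definitions of ε-closure, step and nextSet translate almost directly into code. The sketch below is one possible Python encoding, assuming an item is a tuple `(lhs, rhs, dot)` with `dot` the position of the dot in `rhs`; the names `closure`, `step`, `next_set` and `item_sets` are chosen for this illustration.

```python
# Grammar: Z -> E $, E -> T | E + T, T -> i | ( E )
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """epsilon-closure: add N -> . gamma whenever the dot stands before N."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def step(items, x):
    """Move the dot over x in every item that expects x next."""
    return {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == x}


def next_set(items, x):
    return closure(step(items, x))


def item_sets(start_symbol="Z"):
    """Return the initial item set and the set of all reachable item sets."""
    start = closure({(l, r, 0) for l, r in GRAMMAR if l == start_symbol})
    seen = {start}
    work = [start]
    while work:
        s = work.pop()
        for x in {r[d] for (_, r, d) in s if d < len(r)}:
            n = next_set(s, x)
            if n not in seen:
                seen.add(n)
                work.append(n)
    return start, seen


START, STATES = item_sets()
```

For this grammar the initial set has five items and ten item sets are reachable in total, one per state of the LR(0) automaton.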
48
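LR(0) automaton example
[Figure: LR(0) automaton; states and transitions listed below]
States:
  q0: Z → • E $, E → • T, E → • E + T, T → • i, T → • ( E )    (shift state)
  q1: Z → E • $, E → E • + T                                    (shift state)
  q2: Z → E $ •                                                 (reduce state)
  q3: E → E + • T, T → • i, T → • ( E )                         (shift state)
  q4: E → E + T •                                               (reduce state)
  q5: T → i •                                                   (reduce state)
  q6: E → T •                                                   (reduce state)
  q7: T → ( • E ), E → • T, E → • E + T, T → • i, T → • ( E )   (shift state)
  q8: T → ( E • ), E → E • + T                                  (shift state)
  q9: T → ( E ) •                                               (reduce state)
Transitions:
  q0: E → q1, T → q6, i → q5, ( → q7
  q1: + → q3, $ → q2
  q3: T → q4, i → q5, ( → q7
  q7: E → q8, T → q6, i → q5, ( → q7
  q8: + → q3, ) → q9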
49
GOTO/ACTION tables

          GOTO table                                 ACTION table
  State   i    +    (    )    $    E    T            action
  q0      q5        q7             q1   q6           shift
  q1           q3             q2                     shift
  q2                                                 Z → E $
  q3      q5        q7                  q4           shift
  q4                                                 E → E + T
  q5                                                 T → i
  q6                                                 E → T
  q7      q5        q7             q8   q6           shift
  q8           q3        q9                          shift
  q9                                                 T → ( E )

An empty GOTO entry is an error move
50
LR(0) parser tables
Two types of rows:
  – Shift row – tells which state to GOTO for current token
  – Reduce row – tells which rule to reduce (independent of current token)
      GOTO entries are blank
51
LR parser data structures
Input – remainder of text to be processed
Stack – sequence of pairs N, qi
  – N – symbol (terminal or non-terminal)
  – qi – state at which decisions are made
Initial stack contains q0
Example: stack = q0 i q5, input = + i $
52
LR(0) pushdown automaton
Two moves: shift and reduce
Shift move
  – Remove first token from input
  – Push it on the stack
  – Compute next state based on GOTO table
  – Push new state on the stack
  – If new state is error – report error
Example (row q0 of the table: i → q5, ( → q7, E → q1, T → q6, action shift):
  before: stack = q0,       input = i + i $
  shift
  after:  stack = q0 i q5,  input = + i $
53
LR(0) pushdown automaton
Reduce move
  – Using a rule N → α
  – Symbols in α and their following states are removed from stack
  – New state computed based on GOTO table (using top of stack, before pushing N)
  – N is pushed on the stack
  – New state pushed on top of N
Example (row q0 of the table: i → q5, ( → q7, E → q1, T → q6, action shift):
  before: stack = q0 i q5,  input = + i $
  reduce T → i
  after:  stack = q0 T q6,  input = + i $
54
GOTO/ACTION table
(1) Z → E $
(2) E → T
(3) E → E + T
(4) T → i
(5) T → ( E )

  State   i    +    (    )    $    E    T
  q0      s5        s7             s1   s6
  q1           s3             s2
  q2      r1
  q3      s5        s7                  s4
  q4      r3
  q5      r4
  q6      r2
  q7      s5        s7             s8   s6
  q8           s3        s9
  q9      r5

Warning: numbers mean different things!
  rn = reduce using rule number n (applies to the whole row)
  sm = shift to state m
55
GOTO/ACTION table: parsing i + i $
(1) Z → E $
(2) E → T
(3) E → E + T
(4) T → i
(5) T → ( E )
(table as on the previous GOTO/ACTION table slide)

  Stack (top is on the right)   Input      Action
  q0                            i + i $    s5
  q0 i q5                       + i $      r4
  q0 T q6                       + i $      r2
  q0 E q1                       + i $      s3
  q0 E q1 + q3                  i $        s5
  q0 E q1 + q3 i q5             $          r4
  q0 E q1 + q3 T q4             $          r3
  q0 E q1                       $          s2
  q0 E q1 $ q2                             r1
  q0 Z
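The shift and reduce moves can be replayed with a small driver. The following Python sketch hard-codes the table from the slides; the dictionary layout (a reduce row is a single `"rn"` string, a shift row maps symbols to `"sm"` entries) and the names `lr0_parse` and `ParseError` are assumptions of this illustration, and tokens are single characters.

```python
class ParseError(Exception):
    """Raised when the table has no move."""


# Rule number -> (left-hand side, length of right-hand side).
RULES = {1: ("Z", 2), 2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}

TABLE = {
    0: {"i": "s5", "(": "s7", "E": "s1", "T": "s6"},
    1: {"+": "s3", "$": "s2"},
    2: "r1",
    3: {"i": "s5", "(": "s7", "T": "s4"},
    4: "r3",
    5: "r4",
    6: "r2",
    7: {"i": "s5", "(": "s7", "E": "s8", "T": "s6"},
    8: {"+": "s3", ")": "s9"},
    9: "r5",
}


def lr0_parse(tokens):
    """Return the sequence of shift/reduce actions for the given tokens."""
    stack = [0]            # alternating states and symbols, top on the right
    inp = list(tokens)
    actions = []
    while True:
        row = TABLE[stack[-1]]
        if isinstance(row, str):
            # Reduce move: pop the rule's symbols together with their states.
            lhs, length = RULES[int(row[1:])]
            del stack[len(stack) - 2 * length:]
            actions.append(row)
            if lhs == "Z":
                return actions
            symbol = lhs
        else:
            # Shift move: take the next token from the input.
            if not inp:
                raise ParseError("unexpected end of input")
            symbol = inp.pop(0)
        # The next state is looked up from the state now on top of the stack.
        goto_row = TABLE[stack[-1]]
        entry = goto_row.get(symbol) if isinstance(goto_row, dict) else None
        if entry is None:
            raise ParseError(f"no move from state q{stack[-1]} on {symbol!r}")
        if not isinstance(row, str):
            actions.append(entry)
        stack += [symbol, int(entry[1:])]
```

`lr0_parse("i+i$")` yields the action column of the trace above: s5, r4, r2, s3, s5, r4, r3, s2, r1.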
56
Are we done?
Can make a transition diagram for any grammar
Can make a GOTO table for every grammar
Cannot make a deterministic ACTION table for every grammar
57
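LR(0) conflicts
Grammar:
  Z → E $
  E → T
  E → E + T
  T → i
  T → ( E )
  T → i [ E ]
q0: Z → • E $, E → • T, E → • E + T, T → • i, T → • ( E ), T → • i [ E ]
q5 (reached from q0 on i): T → i •, T → i • [ E ]
Shift/reduce conflict in q5
[Figure: fragment of the automaton showing q0, q5 and the outgoing edges T, (, i, E]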
58
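LR(0) conflicts
Grammar:
  Z → E $
  E → T
  E → E + T
  T → i
  V → i
  T → ( E )
q0: Z → • E $, E → • T, E → • E + T, T → • i, T → • ( E ), …
q5 (reached from q0 on i): T → i •, V → i •
Reduce/reduce conflict in q5
[Figure: fragment of the automaton showing q0, q5 and the outgoing edges T, (, i, E]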
59
LR(0) conflicts
Any grammar with an ε-rule cannot be LR(0)
Inherent shift/reduce conflict
  – A → ε • – reduce item
  – P → α • Aβ – shift item
  – A → ε • can always be predicted from P → α • Aβ
60
Back to the GOTO/ACTION tables
(same GOTO table and ACTION table as on the GOTO/ACTION tables slide)
ACTION table determined only by transition diagram, ignores input
61
SLR grammars
A handle should not be reduced to a nonterminal N if the look-ahead is a token that cannot follow N
A reduce item N → α • is applicable only when the look-ahead is in FOLLOW(N)
Differs from LR(0) only on the ACTION table
62
SLR ACTION table
(1) Z → E $
(2) E → T
(3) E → E + T
(4) T → i
(5) T → ( E )
Columns are the lookahead token from the input: i + ( ) $

  State   Action
  q0      shift
  q1      shift
  q2      Z → E $
  q3      shift
  q4      E → E + T
  q5      T → i    (on +, ), $)
  q6      E → T    (on +, ), $)
  q7      shift
  q8      shift
  q9      T → ( E )

Remember: in contrast, GOTO table is indexed by state and a grammar symbol from the stack
63
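SLR ACTION table
Grammar as before, with the additional rule T → i [ E ]; state q5 now contains T → i • and T → i • [ E ]

SLR – use 1 token look-ahead (columns: i + ( ) [ ] $)
  State   Action
  q0      shift
  q1      shift
  q2      Z → E $
  q3      shift
  q4      E → E + T
  q5      shift on [, otherwise T → i when the look-ahead is in FOLLOW(T)
  q6      E → T when the look-ahead is in FOLLOW(E)
  q7      shift
  q8      shift
  q9      T → ( E )

vs. LR(0) – no look-ahead
  State   Action
  q0      shift
  q1      shift
  q2      Z → E $
  q3      shift
  q4      E → E + T
  q5      T → i
  q6      E → T
  q7      shift
  q8      shift
  q9      T → ( E )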
64
Are we done?
(0) S' → S
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L
65
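[Figure: LR(0) automaton for the grammar above; states and transitions listed below]
States:
  q0: S' → • S, S → • L = R, S → • R, L → • * R, L → • id, R → • L
  q1: S' → S •
  q2: S → L • = R, R → L •
  q3: S → R •
  q4: L → * • R, R → • L, L → • * R, L → • id
  q5: L → id •
  q6: S → L = • R, R → • L, L → • * R, L → • id
  q7: L → * R •
  q8: R → L •
  q9: S → L = R •
Transitions:
  q0: S → q1, L → q2, R → q3, * → q4, id → q5
  q2: = → q6
  q4: R → q7, L → q8, * → q4, id → q5
  q6: R → q9, L → q8, * → q4, id → q5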
66
Shift/reduce conflict
(0) S' → S
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L
q2: S → L • = R, R → L •
q6 (reached from q2 on =): S → L = • R, R → • L, L → • * R, L → • id
In q2: S → L • = R (shift) vs. R → L • (reduce)
FOLLOW(R) contains =
  – S ⇒ L = R ⇒ * R = R
SLR cannot resolve the conflict either
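The claim that FOLLOW(R) contains = can be checked mechanically. The sketch below computes FIRST and FOLLOW by fixed-point iteration for this grammar; it assumes no null productions (true here), and the names `first_sets`, `follow_sets` and the list encoding of the grammar are choices made for this illustration.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST of every nonterminal, assuming no null productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(start="S'"):
    """FOLLOW of every nonterminal; $ follows the start symbol."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    # Something follows sym inside this right-hand side.
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # sym ends the rule: whatever follows lhs follows sym.
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FOLLOW = follow_sets()
```

Here `FOLLOW["R"]` comes out as `{"=", "$"}`, so in state q2 an SLR parser seeing = may both shift and reduce R → L.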
67
LR(1) grammars
In SLR: a reduce item N → α • is applicable only when the look-ahead is in FOLLOW(N)
But FOLLOW(N) merges look-ahead for all alternatives for N
LR(1) keeps look-ahead with each LR item
Idea: a more refined notion of follows computed per item
68
LR(1) item
LR(1) item is a pair
  – LR(0) item
  – Look-ahead token
Meaning
  – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token
Grammar:
  (0) S' → S
  (1) S → L = R
  (2) S → R
  (3) L → * R
  (4) L → id
  (5) R → L
Example
  – The production L → id yields the following LR(1) items
      [L → ● id, *]   [L → id ●, *]
      [L → ● id, =]   [L → id ●, =]
      [L → ● id, id]  [L → id ●, id]
      [L → ● id, $]   [L → id ●, $]
69
ε-closure for LR(1)
For every [A → α ● Bβ, c] in S
  – for every production B → δ and every token b in the grammar such that b ∈ FIRST(βc)
  – Add [B → ● δ, b] to S
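The LR(1) closure rule can be sketched as follows. The code assumes a grammar without null productions, so FIRST(βc) is determined by the first symbol of β, or is {c} when β is empty; the item encoding `(lhs, rhs, dot, lookahead)` and the names `first_of` and `closure1` are choices made for this illustration.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST of every nonterminal, assuming no null productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def first_of(beta, c):
    """FIRST(beta c) when no symbol can derive the empty string."""
    if not beta:
        return {c}
    x = beta[0]
    return set(FIRST[x]) if x in NONTERMINALS else {x}


def closure1(items):
    """LR(1) closure of a set of items (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot, c = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in first_of(rhs[dot + 1:], c):
                for lhs2, rhs2 in GRAMMAR:
                    item = (lhs2, rhs2, 0, b)
                    if lhs2 == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


Q0 = closure1({("S'", ("S",), 0, "$")})
```

For the start item [S' → ● S, $] this gives eight items: the L-items appear with look-ahead = (from S → ● L = R) and with look-ahead $ (from R → ● L).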
70
[Figure: LR(1) automaton for the grammar above; states and transitions listed below]
States:
  q0:  (S' → ∙ S, $), (S → ∙ L = R, $), (S → ∙ R, $), (L → ∙ * R, =), (L → ∙ id, =), (R → ∙ L, $), (L → ∙ id, $), (L → ∙ * R, $)
  q1:  (S' → S ∙, $)
  q2:  (S → L ∙ = R, $), (R → L ∙, $)
  q3:  (S → R ∙, $)
  q4:  (L → * ∙ R, =), (R → ∙ L, =), (L → ∙ * R, =), (L → ∙ id, =), (L → * ∙ R, $), (R → ∙ L, $), (L → ∙ * R, $), (L → ∙ id, $)
  q5:  (L → id ∙, $), (L → id ∙, =)
  q6:  (S → L = ∙ R, $), (R → ∙ L, $), (L → ∙ * R, $), (L → ∙ id, $)
  q7:  (L → * R ∙, =), (L → * R ∙, $)
  q8:  (R → L ∙, =), (R → L ∙, $)
  q9:  (S → L = R ∙, $)
  q10: (R → L ∙, $)
  q11: (L → * ∙ R, $), (R → ∙ L, $), (L → ∙ * R, $), (L → ∙ id, $)
  q12: (L → id ∙, $)
  q13: (L → * R ∙, $)
Transitions:
  q0:  S → q1, L → q2, R → q3, * → q4, id → q5
  q2:  = → q6
  q4:  R → q7, L → q8, * → q4, id → q5
  q6:  R → q9, L → q10, * → q11, id → q12
  q11: R → q13, L → q10, * → q11, id → q12
71
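Back to the conflict
q2: (S → L ∙ = R, $), (R → L ∙, $)
q6 (reached from q2 on =): (S → L = ∙ R, $), (R → ∙ L, $), (L → ∙ * R, $), (L → ∙ id, $)
Is there a conflict now?
In q2 the reduce item R → L ∙ carries only the look-ahead $, while the shift is on =, so the two moves no longer compete.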
72
LALR
LR tables have a large number of entries
Often don't need such refined observation (and cost)
LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts
LALR not as powerful as LR(1)
73
See you next time