Download presentation
Presentation is loading. Please wait.
1
Lecture 4: Syntax Analysis: Parsing Noam Rinetzky
Compilation Lecture 4: Syntax Analysis: Parsing Noam Rinetzky
3
The Real Anatomy of a Compiler
Source text txt Process text input Lexical Analysis Lexical Analysis Syntax Analysis Syntax Analysis Sem. Analysis characters tokens AST Annotated AST Intermediate code generation Intermediate code optimization Code generation IR IR Symbolic Instructions Executable code exe Target code optimization SI Machine code generation MI Write executable output
4
Broad kinds of parsers Parsers for arbitrary grammars Top-Down parsers
Earley’s method, CYK method Usually, not used in practice (though might change) Top-Down parsers Construct parse tree in a top-down matter Find the leftmost derivation Bottom-Up parsers Construct parse tree in a bottom-up manner Find the rightmost derivation in a reverse order
5
CFG terminology S S ; S S id := E E id E num E E + E
Symbols: Terminals (tokens): ; := ( ) id num print Non-terminals: S E L Start non-terminal: S Convention: the non-terminal appearing in the first derivation rule Grammar productions (rules) N μ
6
CFG terminology Derivation - a sequence of replacements of non-terminals using the derivation rules Language - the set of strings of terminals derivable from the start symbol Sentential form - the result of a partial derivation May contain non-terminals
7
Derivations Show that a sentence ω is in a grammar G
Start with the start symbol Repeatedly replace one of the non-terminals by a right-hand side of a production Stop when the sentence contains only terminals Given a sentence αNβ and rule Nµ αNβ => αµβ ω is in L(G) if S =>* ω
8
Predictive parsing Recursive descent LL(k) grammars
9
Predictive parsing Given a grammar G and a word w attempt to derive w using G Idea Apply production to leftmost nonterminal Pick production rule based on next input token General grammar More than one option for choosing the next production based on a token Restricted grammars (LL) Know exactly which single rule to apply May require some lookahead to decide
10
Boolean expressions example
E LIT | (E OP E) | not E LIT true | false OP and | or | xor not ( not true or false )
11
Boolean expressions example
E LIT | (E OP E) | not E LIT true | false OP and | or | xor not ( not true or false ) production to apply known from next token E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) not E ( OP ) LIT or true false
12
Implementation via recursion
if (current {TRUE, FALSE}) LIT(); else if (current == LPAREN) match(LPARENT); E(); OP(); E(); match(RPAREN); else if (current == NOT) match(NOT); E(); else error; } E → LIT | ( E OP E ) | not E LIT → true | false OP → and | or | xor LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error; } OP() { if (current == AND) match(AND); else if (current == OR) match(OR); else if (current == XOR) match(XOR); else error; }
13
FIRST sets FIRST(X) = { t | X * t β} ∪{ℇ | X * ℇ} Example:
FIRST(X) = all terminals that α can appear as first in some derivation for X + ℇ if can be derived from X Example: FIRST( LIT ) = { true, false } FIRST( ( E OP E ) ) = { ‘(‘ } FIRST( not E ) = { not } First(α) can be defined for any sequence of symbols
14
Computing FIRST sets FIRST (t) = { t } // “t” terminal ℇ∈ FIRST(X) if
X ℇ or X A1 .. Ak and ℇ∈ FIRST(Ai) i=1…k FIRST(α) ⊆ FIRST(X) if X A1 .. Ak α and ℇ∈ FIRST(Ai) i=1…k First(X) is computed for non-terminals
15
Follow sets Follow(X) = { t | S * α X t β} t – Terminal or $
16
FOLLOW sets: Constraints
FIRST(β) – {ℇ} ⊆ FOLLOW(X) For each A α X β FOLLOW(A) ⊆ FOLLOW(X) For each A α X β and ℇ ∈ FIRST(β)
17
Example: FOLLOW sets E TX X+ E | ℇ T (E) | int Y Y * T | ℇ
FIRST(β) – {ℇ} ⊆ FOLLOW(X) For each A α X β FOLLOW(A) ⊆ FOLLOW(X) For each A α X β and ℇ ∈ FIRST(β) Non. Term. E T X Y FOLLOW ), $ +, ), $ $, )
18
Prediction Table A α T[A,t] = α if t ∈FIRST(α)
T[A,t] = α if ℇ ∈ FIRST(α) and t ∈ FOLLOW(A) t can also be $ T is not well defined the grammar is not LL(1)
19
LL(k) grammars A grammar is in the class LL(K) when it can be derived via: Top-down derivation Scanning the input from left to right (L) Producing the leftmost derivation (L) With lookahead of k tokens (k) A language is said to be LL(k) when it has an LL(k) grammar
20
LL(1) grammars A grammar is in the class LL(1) iff
For every two productions A α and A β we have FIRST(α) ∩ FIRST(β) = {} // including If ∈ FIRST(α) then FIRST(β) ∩ FOLLOW(A) = {} If ∈ FIRST(β) then FIRST(α) ∩ FOLLOW(A) = {}
21
Problem: Non LL Grammars
22
Back to problem 1: common prefix
term ID | indexed_elem indexed_elem ID [ expr ] FIRST(term) = { ID } FIRST(indexed_elem) = { ID } FIRST/FIRST conflict
23
Solution: left factoring
Rewrite the grammar to be in LL(1) term ID | indexed_elem indexed_elem ID [ expr ] term ID after_ID After_ID [ expr ] | Intuition: just like factoring x*y + x*z into x*(y+z)
24
Left factoring – another example
S if E then S else S | if E then S | T S if E then S S’ | T S’ else S |
25
Problem: null production
S A a b A a | bool S() { return A() && match(token(‘a’)) && match(token(‘b’)); } bool A() { return match(token(‘a’)) || true; What happens for input “ab”? What happens if you flip order of alternatives and try “aab”?
26
Problem: null production
S A a b A a | FIRST(S) = { a } FOLLOW(S) = { $ } FIRST(A) = { a, } FOLLOW(A) = { a } FIRST/FOLLOW conflict
27
Back to problem 2: null production
S A a b A a | FIRST(S) = { a } FOLLOW(S) = { } FIRST(A) = { a , } FOLLOW(A) = { a } FIRST/FOLLOW conflict
28
Solution: substitution
S A a b A a | Substitute A in S S a a b | a b Left factoring S a after_A after_A a b | b
29
Back to problem 3: Left recursion
E E - term | term Left recursion cannot be handled with a bounded lookahead What can we do?
30
Left recursion removal
p. 130 N Nα | β N βN’ N’ αN’ | G1 G2 L(G1) = β, βα, βαα, βααα, … L(G2) = same Can be done algorithmically. Problem: grammar becomes mangled beyond recognition For our 3rd example: E E - term | term E term TE | term TE - term TE |
31
LL(k) Parsers Recursive Descent Wanted Manual construction
Uses recursion Wanted A parser that can be generated automatically Does not use recursion
32
LL(k) parsing via pushdown automata
Pushdown automaton uses Prediction stack Input stream Transition table nonterminals x tokens -> production alternative Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t
33
LL(k) parsing via pushdown automata
Two possible moves Prediction When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error Match When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t. If (t ≠ T) syntax error Parsing terminates when prediction stack is empty If input is empty at that point, success. Otherwise, syntax error
34
Example transition table
(1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor Which rule should be used Input tokens ( ) not true false and or xor $ E 2 3 1 LIT 4 5 OP 6 7 8 Nonterminals
35
Model of non-recursive predictive parser
$ b + a Predictive Parsing program Stack X Y Z $ Output Parsing Table
36
Running parser example
aacbb$ A aAb | c Input suffix Stack content Move aacbb$ A$ predict(A,a) = A aAb aAb$ match(a,a) acbb$ Ab$ aAbb$ cbb$ Abb$ predict(A,c) = A c match(c,c) bb$ match(b,b) b$ $ match($,$) – success a b c A A aAb A c
37
Erorrs
38
Handling Syntax Errors
Report and locate the error Diagnose the error Correct the error Recover from the error in order to discover more errors without reporting too many “strange” errors
39
Error Diagnosis Line number The current token The expected tokens
may be far from the actual error The current token The expected tokens Parser configuration
40
Error Recovery Becomes less important in interactive environments
Example heuristics: Search for a semi-column and ignore the statement Try to “replace” tokens for common errors Refrain from reporting 3 subsequent errors Globally optimal solutions For every input w, find a valid program w’ with a “minimal-distance” from w
41
Illegal input example abcbb$ A aAb | c Input suffix Stack content
Move abcbb$ A$ predict(A,a) = A aAb aAb$ match(a,a) bcbb$ Ab$ predict(A,b) = ERROR a b c A A aAb A c
42
Error handling in LL parsers
c$ S a c | b S Input suffix Stack content Move c$ S$ predict(S,c) = ERROR Now what? Predict b S anyway “missing token b inserted in line XXX” a b c S S a c S b S
43
Error handling in LL parsers
c$ S a c | b S Input suffix Stack content Move bc$ S$ predict(b,c) = S bS bS$ match(b,b) c$ Looks familiar? Result: infinite loop a b c S S a c S b S
44
Error handling and recovery
x = a * (p+q * ( -b * (r-s); Where should we report the error? The valid prefix property
45
The Valid Prefix Property
For every prefix tokens t1, t2, …, ti that the parser identifies as legal: there exists tokens ti+1, ti+2, …, tn such that t1, t2, …, tn is a syntactically valid program If every token is considered as single character: For every prefix word u that the parser identifies as legal there exists w such that u.w is a valid program
46
Recovery is tricky Heuristics for dropping tokens, skipping to semicolon, etc.
47
Building the Parse Tree
48
Adding semantic actions
Can add an action to perform on each production rule Can build the parse tree Every function returns an object of type Node Every Node maintains a list of children Function calls can add new children
49
Building the parse tree
Node E() { result = new Node(); result.name = “E”; if (current {TRUE, FALSE}) // E LIT result.addChild(LIT()); else if (current == LPAREN) // E ( E OP E ) result.addChild(match(LPAREN)); result.addChild(E()); result.addChild(OP()); result.addChild(match(RPAREN)); else if (current == NOT) // E not E result.addChild(match(NOT)); else error; return result; }
50
Parser for Fully Parenthesized Expers
static int Parse_Expression(Expression **expr_p) { Expression *expr = *expr_p = new_expression() ; /* try to parse a digit */ if (Token.class == DIGIT) { expr->type=‘D’; expr->value=Token.repr –’0’; get_next_token(); return 1; } /* try parse parenthesized expression */ if (Token.class == ‘(‘) { expr->type=‘P’; get_next_token(); if (!Parse_Expression(&expr->left)) Error(“missing expression”); if (!Parse_Operator(&expr->oper)) Error(“missing operator”); if (Token.class != ‘)’) Error(“missing )”); return 1; } return 0; }
51
Bottom Up parsing
52
Bottom-Up Parsing Goal: Build a parse tree How:
Report error if input is not a legal program How: Read input left-to-right Construct a subtree for the first left-most tree node whose childern have been constructed
53
(Non standard precedence)
Bottom-up parsing E E * T E T T T + F T F F id F num F ( E ) E E T T T F F F 1 + 2 * 3 (Non standard precedence)
54
Bottom-up parsing: LR(k) Grammars
A grammar is in the class LR(K) when it can be derived via: Bottom-up derivation Scanning the input from left to right (L) Producing the rightmost derivation (R) In reverse oreder With lookahead of k tokens (k) A language is said to be LR(k) if it has an LR(k) grammar The simplest case is LR(0), which we will discuss
55
Terminology: Reductions & Handles
The opposite of derivation is called reduction Let A α be a production rule Derivation: βAµ βαµ Reduction: βαµ βAµ A handle is the reduced substring α is the handles for βαµ
56
Use Shift & Reduce In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.
57
How does the parser know what to do?
token stream ) x * 7 + 23 ( RP Id OP Num LP Input Action Table Parser Stack Output Goto table Op(*) Id(b) Num(23) Num(7) Op(+)
58
How does the parser know what to do?
A state will keep the info gathered on handle(s) A state in the “control” of the PDA Also (part of) the stack alpha bet A table will tell it “what to do” based on current state and next token The transition function of the PDA A stack will records the “nesting level” Stack contains a sequence of prefixes of handles Set of LR(0) items
59
Important Bottom-Up LR-Parsers
LR(0) – simplest, explains basic ideas SLR(1) – simple, exaplins lookahead LR(1) – complictated, very powerful, expensive LALR(1) – complicated, powerful enough, used by automatic tools
60
LR(0) vs SLR(1) vs LR(1) vs LALR(1)
All use shift / reduce Main difference: how to identify a handle Technically: Using different sets of states More expsensive more states more specific choice of which reduction rule to use But the usage of the states is the same in all parsers Reduction is the same in all techniques Once the handle is determined
61
LR(0) Parsing
62
N αβ LR item Already matched To be matched Input
Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β
63
Example: LR(0) Items All items can be obtained by placing a dot at every position for every production: LR(0) items 1: S E$ 2: S E $ 3: S E $ 4: E T 5: E T 6: E E + T 7: E E + T 8: E E + T 9: E E + T 10: T i 11: T i 12: T (E) 13: T ( E) 14: T (E ) 15: T (E) Grammar (1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E )
64
Example: LR(0) Items All items can be obtained by placing a dot at every position for every production: Before = reduced matched prefix After = may be reduced May be matched by suffix LR(0) items 1: S E$ 2: S E $ 3: S E $ 4: E T 5: E T 6: E E + T 7: E E + T 8: E E + T 9: E E + T 10: T id 11: T id 12: T (E) 13: T ( E) 14: T (E ) 15: T (E) Grammar (1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E )
65
LR(0) items N αβ N αβ States are sets of items Shift Item
Reduce Item States are sets of items
66
LR(0) Items E → E * B | E + B | B B → 0 | 1
A derivation rule with a location marker (●) is called LR(0) item
67
PDA States E → E * B | E + B | B B → 0 | 1
A PDA state is a set of LR(0) items. E.g., q13 = { E → E ● * B , E → E ● + B, B → 1 ●} Intuitively, if we matched 1, Then the state will remember the 3 possible alternatives rules and where we are in each of them (1) E → E ● * B (2) E → E ● + B (3) B → 1 ●
68
LR(0) Shift/Reduce Items
N αβ Shift Item N αβ Reduce Item
69
Intuition Read input tokens left-to-right and remember them in the stack When a right hand side of a rule is found, remove it from the stack and replace it with the non-terminal it derives Remembering token is called shift Each shift moves to a state that remembers what we’ve seen so far Replacing RHS with LHS is called reduce Each reduce goes to a state that determines the context of the derivation
70
Model of an LR parser Input Stack LR Parser Output $ id + T 2 + 7 id 5
state T 2 + 7 id 5 Output symbol Terminals and Non-terminals Goto Table Action
71
LR parser stack Sequence made of state, symbol pairs
For instance a possible stack for the grammar S E $ E T E E + T T id T ( E ) could be: 0 T id 5 Stack grows this way
72
Form of LR parsing table
state terminals non-terminals Shift/Reduce actions Goto part 1 . . . acc gm rk sn error shift state n reduce by rule k goto state m accept
73
LR parser table example
goto action STATE T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9
74
Shift move Input Stack action[q, a] = sn LR Parsing program $ … a q
. . . goto action action[q, a] = sn
75
Result of shift Input Stack action[q, a] = sn LR Parsing program $ … a
. . . goto action action[q, a] = sn
76
Reduce move Input Stack 2*n action[qn, a] = rk
$ … a Stack LR Parsing program qn σn … q1 σ1 q 2*n goto action action[qn, a] = rk Production: (k) A σ1… σn Top of stack looks like q1σ1…qnσn for some q1… qn goto[q, A] = qm
77
Result of reduce move Input Stack action[qn, a] = rk
$ … a Stack LR Parsing program qm A q … goto action action[qn, a] = rk Production: (k) A σ1… σn Top of stack looks like q1σ1…qnσn for some q1… qn goto[q, A] = qm
78
Accept move If action[q, a] = accept parsing completed Input Stack
$ a … Stack LR Parsing program q . . . goto action If action[q, a] = accept parsing completed
79
Error move If action[q, a] = error (usually empty)
Input $ … a Stack LR Parsing program q . . . goto action If action[q, a] = error (usually empty) parsing discovered a syntactic error
80
Example Z E $ E T | E + T T i | ( E )
81
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) i + i $ Z E $ Why do we need these additional LR items? Where do they come from? What do they mean? E T E E + T T i T ( E )
82
-closure Given a set S of LR(0) items If P αNβ is in state S
then for each rule N in the grammar state S must also contain N -closure({Z E $}) = { Z E $, E T, Z E $ E T | E + T T i | ( E ) E E + T, T i , T ( E ) }
83
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) i + i $ Remember position from which we’re trying to reduce Items denote possible future handles Z E $ E T E E + T T i T ( E )
84
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) i + i $ Match items with current token Z E $ T i Reduce item! E T E E + T T i T ( E )
85
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) T + i $ i Z E $ Reduce item! E T E T E E + T T i T ( E )
86
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) E + i $ T i Z E $ Reduce item! E T E T E E + T T i T ( E )
87
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) E + i $ T i Z E $ Z E$ E T E E + T E E+ T T i T ( E )
88
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) E + i $ T i Z E $ Z E$ E E+T E T T i E E + T E E+ T T ( E ) T i T ( E )
89
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) E + T $ i T i Z E $ Z E$ E E+T E T T i E E + T E E+ T T ( E ) T i T ( E )
90
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) E + T $ T i i Reduce item! Z E $ Z E$ E E+T E E+T E T T i E E + T E E+ T T ( E ) T i T ( E )
91
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) E $ E + T T i i Z E $ Z E$ E T E E + T E E+ T T i T ( E )
92
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) E $ E + T T i Reduce item! i Z E $ Z E$ Z E$ E T E E + T E E+ T T i T ( E )
93
Example: parsing with LR items
Z E $ E T | E + T T i | ( E ) Z E $ E + T Reduce item! T i Z E $ i Z E$ Z E$ E T E E + T E E+ T T i T ( E )
94
GOTO/ACTION tables ACTION Table GOTO Table empty – error move State i
+ ( ) $ E T action q0 q5 q7 q1 q6 shift q3 q2 ZE$ q4 Shift EE+T Ti ET q8 q9 TE
95
LR(0) parser tables Two types of rows:
Shift row – tells which state to GOTO for current token Reduce row – tells which rule to reduce (independent of current token) GOTO entries are blank
96
LR parser data structures
Input – remainder of text to be processed Stack – sequence of pairs N, qi N – symbol (terminal or non-terminal) qi – state at which decisions are made Initial stack contains q0 Input suffix + i $ stack q0 i q5 Stack grows this way
97
LR(0) pushdown automaton
Two moves: shift and reduce Shift move Remove first token from input Push it on the stack Compute next state based on GOTO table Push new state on the stack If new state is error – report error shift input i + i $ input + i $ stack q0 stack q0 i q5 Stack grows this way State i + ( ) $ E T action q0 q5 q7 q1 q6 shift
98
LR(0) pushdown automaton
Reduce move Using a rule N α Symbols in α and their following states are removed from stack New state computed based on GOTO table (using top of stack, before pushing N) N is pushed on the stack New state pushed on top of N Reduce T i input + i $ input + i $ stack q0 i q5 stack q0 T q6 Stack grows this way State i + ( ) $ E T action q0 q5 q7 q1 q6 shift
99
GOTO/ACTION table Warning: numbers mean different things!
State i + ( ) $ E T q0 s5 s7 s1 s6 q1 s3 s2 q2 r1 q3 s4 q4 r3 q5 r4 q6 r2 q7 s8 q8 s9 q9 r5 Z E $ E T E E + T T i T ( E ) Warning: numbers mean different things! rn = reduce using rule number n sm = shift to state m
100
Parsing id+id$ Initialize with state 0 Stack Input Action id + id $ s5
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 Initialize with state 0 rn = reduce using rule number n sm = shift to state m
101
Parsing id+id$ Initialize with state 0 Stack Input Action id + id $ s5
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 Initialize with state 0 rn = reduce using rule number n sm = shift to state m
102
Parsing id+id$ Stack Input Action id + id $ s5 0 id 5 + id $ r4
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 rn = reduce using rule number n sm = shift to state m
103
Parsing id+id$ pop id 5 Stack Input Action id + id $ s5 0 id 5 + id $
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 pop id 5 rn = reduce using rule number n sm = shift to state m
104
Parsing id+id$ push T 6 Stack Input Action id + id $ s5 0 id 5 + id $
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 push T 6 rn = reduce using rule number n sm = shift to state m
105
Parsing id+id$ Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6 r2 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 rn = reduce using rule number n sm = shift to state m
106
Parsing id+id$ Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6 r2 0 E 1 s3 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 rn = reduce using rule number n sm = shift to state m
107
Parsing id+id$ Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6 r2 0 E 1 s3 0 E 1 + 3 id $ goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 rn = reduce using rule number n sm = shift to state m
108
Parsing id+id$ Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6 r2 0 E 1 s3 0 E 1 + 3 id $ 0 E id 5 $ goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 rn = reduce using rule number n sm = shift to state m
109
Parsing id+id$ Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6 r2 0 E 1 s3 0 E 1 + 3 id $ 0 E id 5 $ 0 E T 4 r3 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 rn = reduce using rule number n sm = shift to state m
110
Parsing id+id$ Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Parsing id+id$ Stack grows this way Stack Input Action id + id $ s5 0 id 5 + id $ r4 0 T 6 r2 0 E 1 s3 0 E 1 + 3 id $ 0 E id 5 $ 0 E T 4 r3 s2 goto action S T E $ ) ( + id g6 g1 s7 s5 acc s3 1 2 g4 3 r3 4 r4 5 r2 6 g8 7 s9 8 r5 9 rn = reduce using rule number n sm = shift to state m
111
LR(0) automaton example
reduce state shift state q6 E T T q0 T q7 Z E$ E T E E + T T i T (E) T (E) E T E E + T T i T (E) read input “(“ ( q5 i T i i Managed to reduce E E ( E ( i q1 q8 Z E$ E E+ T q3 E E+T T i T (E) T (E) E E+T + + $ ) q2 q9 Z E$ T (E) T q4 E E + T
112
States and LR(0) Items E → E * B | E + B | B B → 0 | 1
The state will “remember” the potential derivation rules given the part that was already identified For example, if we have already identified E then the state will remember the two alternatives: (1) E → E * B, (2) E → E + B Actually, we will also remember where we are in each of them: (1) E → E ● * B, (2) E → E ● + B A derivation rule with a location marker is called LR(0) item. The state is actually a set of LR(0) items. E.g., q13 = { E → E ● * B , E → E ● + B}
113
GOTO/ACTION tables GOTO Table empty = error move ACTION Table State i
+ ( ) $ E T action q0 q5 q7 q1 q6 shift q3 q2 Z E$ q4 Shift E E+T T i E T q8 q9 T E
114
LR(0) parser tables Actions determined by topmost state
Two types of rows: Shift row – tells which state to GOTO for current token Reduce row – tells which rule to reduce (independent of current token) GOTO entries are blank
115
GOTO/ACTION tables GOTO Table empty = error move ACTION Table State i
+ ( ) $ E T action q0 q5 q7 q1 q6 shift q3 q2 Z E$ q4 Shift E E+T T i E T q8 q9 T E
116
GOTO/ACTION table Warning: numbers mean different things!
State i + ( ) $ E T q0 s5 s7 s1 s6 q1 s3 s2 q2 r1 q3 s4 q4 r3 q5 r4 q6 r2 q7 s8 q8 s9 q9 r5 action shift Z E$ Shift E E+T T i E T T E Z E $ E T E E + T T i T ( E ) Warning: numbers mean different things! rn = reduce using rule number n sm = shift to state m
117
LR(0) Parsing of i + i Warning: numbers mean different things!
State i + ( ) $ E T q0 s5 s7 s1 s6 q1 s3 s2 q2 r1 q3 s4 q4 r3 q5 r4 q6 r2 q7 s8 q8 s9 q9 r5 Stack q0 q0 i q5 q0 T q6 q0 E q1 q0 E q1 + q3 q0 E q1 + q3 i q5 q0 E q1 + q3 T q4 q0 E q1 $ q2 q0 $ input i + i $ + i $ i $ $ Action shift reduce 4 reduce 2 reduce 3 reduce 1 accept Z E $ E T E E + T T i T ( E ) Warning: numbers mean different things! rn = reduce using rule number n sm = shift to state m
118
Constructing an LR parsing table
Construct a (determinized) transition diagram from LR items If there are conflicts – stop Fill table entries from diagram
119
N αβ LR item Already matched To be matched Input
Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β
120
Types of LR(0) items N αβ Shift Item N αβ Reduce Item
121
LR(0) automaton example
reduce state shift state q6 E T T q0 T q7 Z E$ E T E E + T T i T (E) T (E) E T E E + T T i T (E) ( q5 i T i i E ( E ( i q1 q8 Z E$ E E+ T q3 E E+T T i T (E) T (E) E E+T + + $ ) q2 q9 Z E$ T (E) T q4 E E + T
122
Computing item sets Initial set
Z is in the start symbol -closure({ Z α | Z α is in the grammar } ) Next set from a set S and the next symbol X step(S,X) = { N αXβ | N αXβ in the item set S} nextSet(S,X) = -closure(step(S,X))
123
Operations for transition diagram construction
Initial = {S’ S$} For an item set I Closure(I) = Closure(I) ∪ {X µ is in grammar| N αXβ in I} Goto(I, X) = { N αXβ | N αXβ in I}
124
Initial example Initial = {S E $} Grammar (1) S E $ (2) E T
(4) T id (5) T ( E ) Initial = {S E $}
125
Closure example Initial = {S E $}
Grammar (1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Initial = {S E $} Closure({S E $}) = { S E $ E T E E + T T id T ( E ) }
126
Goto example Initial = {S E $}
Grammar (1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) Initial = {S E $} Closure({S E $}) = { S E $ E T E E + T T id T ( E ) } Goto({S E $ , E E + T, T id}, E) = {S E $, E E + T}
127
Constructing the transition diagram
Start with state 0 containing item Closure({S E $}) Repeat until no new states are discovered For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(goto(Ip, N))
128
LR(0) automaton example
reduce state shift state q6 E T T q0 T q7 Z E$ E T E E + T T i T (E) T (E) E T E E + T T i T (E) ( q5 i T i i E ( E ( i q1 q8 Z E$ E E+ T q3 E E+T T i T (E) T (E) E E+T + + $ ) q2 q9 Z E$ T (E) T q4 E E + T
129
Automaton construction example
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) q0 S E$ Initialize
130
Automaton construction example
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) q0 S E$ E T E E + T T i T (E) apply Closure
131
Automaton construction example
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) E T q6 T q0 S E$ E T E E + T T i T (E) T (E) E T E E + T T i T (E) ( T i q5 i S E$ E E+ T q1 E
132
Automaton construction example
(1) S E $ (2) E T (3) E E + T (4) T id (5) T ( E ) q6 E T q0 T q7 T S E$ E T E E + T T i T (E) T (E) E T E E + T T i T (E) non-terminal transition corresponds to goto action in parse table ( q5 i T i i E ( E ( i q1 q8 q3 Z E$ E E+ T terminal transition corresponds to shift action in parse table E E+T T i T (E) T (E) E E+T + + $ ) q2 q9 S E$ T (E) T q4 E E + T a single reduce item corresponds to reduce action
133
Are we done? Can make a transition diagram for any grammar
Can make a GOTO table for every grammar Cannot make a deterministic ACTION table for every grammar
134
LR(0) conflicts … T q0 Z E$ E T E E + T T i T (E) ( …
T i[E] ( … q5 i T i T i[E] E Shift/reduce conflict … Z E $ E T E E + T T i T ( E ) T i[E]
135
LR(0) conflicts … T q0 Z E$ E T E E + T T i T (E) ( …
T i[E] ( … q5 i T i V i E reduce/reduce conflict … Z E $ E T E E + T T i V i T ( E )
136
LR(0) conflicts Any grammar with an -rule cannot be LR(0)
Inherent shift/reduce conflict A – reduce item P αAβ – shift item A can always be predicted from P αAβ
137
Conflicts Can construct a diagram for every grammar but some may introduce conflicts shift-reduce conflict: an item set contains at least one shift item and one reduce item reduce-reduce conflict: an item set contains two reduce items
138
LR variants LR(0) – what we’ve seen so far SLR(0) LR(1) LALR(0)
Removes infeasible reduce actions via FOLLOW set reasoning LR(1) LR(0) with one lookahead token in items LALR(0) LR(1) with merging of states with same LR(0) component
139
LR (0) GOTO/ACTIONS tables
GOTO table is indexed by state and a grammar symbol from the stack ACTION Table GOTO Table State i + ( ) $ E T action q0 q5 q7 q1 q6 shift q3 q2 Z E$ q4 Shift E E+T T i E T q8 q9 T E ACTION table determined only by state, ignores input
140
SLR parsing A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N A reduce item N α is applicable only when the lookahead is in FOLLOW(N) If b is not in FOLLOW(N) we proved there is no derivation S * βNb. Thus, it is safe to remove the reduce item from the conflicted state Differs from LR(0) only on the ACTION table Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take
141
Lookahead token from the input
SLR action table Lookahead token from the input State i + ( ) [ ] $ shift 1 accept 2 3 4 E E+T 5 T i 6 E T 7 8 9 T (E) state action q0 shift q1 q2 q3 q4 E E+T q5 T i q6 E T q7 q8 q9 T E vs. SLR – use 1 token look-ahead LR(0) – no look-ahead … as before… T i T i[E]
142
LR(1) grammars In SLR: a reduce item N α is applicable only when the lookahead is in FOLLOW(N) But FOLLOW(N) merges lookahead for all alternatives for N Insensitive to the context of a given production LR(1) keeps lookahead with each LR item Idea: a more refined notion of follows computed per item
143
LR(1) items LR(1) item is a pair Meaning Example LR(0) item
Lookahead token Meaning We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example The production L id yields the following LR(1) items LR(1) items [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L LR(0) items [L → ● id] [L → id ●]
144
LR(1) items LR(1) item is a pair Meaning Example
Lookahead token Meaning We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example The production L id yields the following LR(1) items Reduce only if the the expected lookhead matches the input [L → id ●, =] will be used only if the next input token is =
145
LALR(1) LR(1) tables have huge number of entries
Often don’t need such refined observation (and cost) Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts LALR(1) not as powerful as LR(1) in theory but works quite well in practice Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice
146
Summary
147
LR is More Powerful than LL
Any LL(k) language is also in LR(k), i.e., LL(k) ⊂ LR(k). LR is more popular in automatic tools But less intuitive Also, the lookahead is counted differently in the two cases In an LL(k) derivation the algorithm sees the left-hand side of the rule + k input tokens and then must select the derivation rule In LR(k), the algorithm “sees” all right-hand side of the derivation rule + k input tokens and then reduces LR(0) sees the entire right-side, but no input token
148
Using tools to parse + create AST
terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; nonterminal Integer expr; precedence left PLUS, MINUS; precedence left DIV, MULT; Precedence left UMINUS; %% expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 MULT expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIV expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | MINUS expr:e1 %prec UMINUS {: RESULT = new Integer(0 - e1.intValue(); :} | LPAREN expr:e1 RPAREN {: RESULT = e1; :} | NUMBER:n {: RESULT = n; :}
149
Grammar Hierarchy Non-ambiguous CFG LR(1) LALR(1) LL(1) SLR(1) LR(0)
150
Earley Parsing Jay Earley, PhD
151
Earley Parsing Invented by Jay Earley [PhD. 1968]
Handles arbitrary context free grammars Can handle ambiguous grammars Complexity O(N3) when N = |input| Uses dynamic programming Compactly encodes ambiguity
152
Dynamic programming Break a problem P into subproblems P1…Pk
Solve P by combining solutions for P1…Pk Memoize (store) solutions to subproblems instead of re-computation Bellman-Ford shortest path algorithm Sol(x,y,i) = minimum of Sol(x,y,i-1) Sol(t,y,i-1) + weight(x,t) for edges (x,t)
153
Earley Parsing Dynamic programming implementation of a recursive descent parser S[N+1] Sequence of sets of “Earley states” N = |INPUT| Earley state (item) s is a sentential form + aux info S[i] All parse tree that can be produced (by a RDP) after reading the first i tokens S[i+1] built using S[0] … S[i]
154
Earley Parsing Parse arbitrary grammars in O(|input|3)
O(|input|2) for unambigous grammer Linear for most LR(k) langaues Dynamic programming implementation of a recursive descent parser S[N+1] Sequence of sets of “Earley states” N = |INPUT| Earley states is a sentential form + aux info S[i] All parse tree that can be produced (by an RDP) after reading the first i tokens S[i+1] built using S[0] … S[i]
155
Earley States s = < constituent, back >
constituent (dotted rule) for Aαβ A•αβ predicated constituents Aα•β in-progress constituents Aαβ• completed constituents back previous Early state in derivation
156
Earley States s = < constituent, back >
constituent (dotted rule) for Aαβ A•αβ predicated constituents Aα•β in-progress constituents Aαβ• completed constituents back previous Early state in derivation
157
Earley Parser Input = x[1…N] S[0] = <E’ •E, 0>; S[1] = … S[N] = {} for i = N do until S[i] does not change do foreach s ∈ S[i] if s = <A…•a…, b> and a=x[i+1] then // scan S[i+1] = S[i+1] ∪ {<A…a•…, b> } if s = <A … •X …, b> and Xα then // predict S[i] = S[i] ∪ {<X•α, i > } if s = < A … • , b> and <X…•A…, k> ∈ S[b] then // complete S[i] = S[i] ∪{<X…A•…, k> }
158
Example if s = <A…•a…, b> and a=x[i+1] then // scan
S[i+1] = S[i+1] ∪ {<A…a•…, b> } if s = <A … •X …, b> and Xα then // predict S[i] = S[i] ∪ {<X•α, i > } if s = < A … • , b> and <X…•A…, k> ∈ S[b] then // complete S[i] = S[i] ∪{<X…A•…, k> }
159
Earley Parsing Jay Earley, PhD
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.