LR(k) grammars
  CSC 3130: Automata theory and formal languages
  Andrej Bogdanov
  The Chinese University of Hong Kong, Fall 2008
  http://www.cse.cuhk.edu.hk/~andrejb/csc3130
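LR(0) example from last time
  Grammar: A → aAb | ab
  DFA states (sets of valid items):
    1: A → •aAb, A → •ab
    2: A → a•Ab, A → a•b, A → •aAb, A → •ab
    3: A → ab•
    4: A → aA•b
    5: A → aAb•
  Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5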
LR(0) parsing example revisited
  Grammar: A → aAb | ab; input: aabb
  Stack      Input   Action
  1          aabb    S
  1a2        abb     S
  1a2a2      bb      S
  1a2a2b3    b       R (A → ab)
  1a2A4      b       S
  1a2A4b5    ε       R (A → aAb)
  1A         ε
  S = shift, R = reduce. The stack alternates DFA states (1 to 5) with grammar symbols.
  [Figure: the DFA with states 1 to 5 and the partial parse tree of aabb]
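The stack trace of this example can be reproduced with a small table-driven sketch. The names GOTO, REDUCE and lr0_parse are chosen for this illustration and are not part of the lecture; the two tables are transcribed by hand from the five DFA states of the example, and the parse is assumed to succeed when the stack is state 1 followed by A with no input left.

```python
# DFA transitions of the LR(0) example for A -> aAb | ab (states 1..5).
GOTO = {
    (1, "a"): 2,
    (2, "a"): 2,
    (2, "b"): 3,
    (2, "A"): 4,
    (4, "b"): 5,
}
# Reduce states: state -> (left-hand side, length of right-hand side).
REDUCE = {3: ("A", 2), 5: ("A", 3)}


def lr0_parse(word):
    """Return the list of (stack, remaining input, action) steps, or None on error."""
    stack = [1]  # alternates states and symbols, e.g. [1, "a", 2]
    pos = 0
    trace = []
    while True:
        shown = "".join(str(x) for x in stack)
        rest = word[pos:]
        top = stack[-1]
        if top == "A":
            # The start symbol sits directly on state 1: done if no input is left.
            if stack == [1, "A"] and pos == len(word):
                trace.append((shown, rest, "accept"))
                return trace
            return None
        if top in REDUCE:
            lhs, n = REDUCE[top]
            trace.append((shown, rest, "reduce"))
            del stack[-2 * n:]  # pop n (symbol, state) pairs
            stack.append(lhs)
            nxt = GOTO.get((stack[-2], lhs))
            if nxt is not None:
                stack.append(nxt)
            continue
        if pos < len(word) and (top, word[pos]) in GOTO:
            trace.append((shown, rest, "shift"))
            stack += [word[pos], GOTO[(top, word[pos])]]
            pos += 1
            continue
        return None  # no shift or reduce applies


for step in lr0_parse("aabb"):
    print(step)
```

Because the grammar is LR(0), every state is either a shift state or a reduce state, so the loop never has to choose between actions.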
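Meaning of LR(0) items
  The item A → α•Xβ marks the focus (•) in the parse tree below A; what lies to the right of the focus is the undiscovered part.
  ε-NFA transitions from A → α•Xβ go to:
    X → •γ: shift focus to the subtree rooted at X (if X is a nonterminal)
    A → αX•β: move past the subtree rooted at X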
Outline of LR(0) parsing algorithm
  The algorithm can perform two actions:
    shift (S): no complete item is valid
    reduce (R): there is one valid item, and it is complete
  What if:
    some valid items are complete and some are not: S/R conflict
    more than one valid complete item: R/R conflict
Definition of LR(0) grammar
  A grammar is LR(0) if S/R and R/R conflicts never occur.
  LR means the input is parsed left to right and the parse produces a rightmost derivation (in reverse).
  LR(0) grammars are unambiguous and have a fast parsing algorithm.
  Unfortunately, they are not “expressive” enough to describe programming languages.
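Hierarchy of context-free grammars
  [Figure: nested classes of grammars, from the outermost to the innermost]
    context-free grammars: parse using the CYK algorithm (slow)
    LR(∞) grammars
    …
    LR(1) grammars
    LR(0) grammars: parse using the LR(0) algorithm
  java, perl, python, … (programming languages marked inside the hierarchy)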
A grammar that is not LR(0)
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
  Input: a
  Possibilities: shift (3), reduce (4), reduce (5), shift (6)
  Valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
  S/R, R/R conflicts!
  [Figure: the candidate partial parse trees after reading a]
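The valid items and the conflicts on this slide can be checked mechanically. The following is a minimal sketch, not the lecture's own code: items are represented as hypothetical (left side, right side, dot position) triples, closure adds the items reachable by ε-moves, and conflicts applies the S/R and R/R definitions from the outline of the LR(0) algorithm.

```python
GRAMMAR = {
    "S": ["A", "Bc"],
    "A": ["aA", "a"],
    "B": ["a", "ab"],
}


def closure(items):
    """Add C -> •δ whenever some item has the dot right before nonterminal C."""
    items = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], prod, 0)
                if new not in items:
                    items.add(new)
                    todo.append(new)
    return items


def goto(items, symbol):
    """Move the dot past `symbol` in every item that allows it, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)


def conflicts(items):
    """Classify a set of valid items using the definitions from the slides."""
    complete = [it for it in items if it[2] == len(it[1])]
    incomplete = [it for it in items if it[2] < len(it[1])]
    found = set()
    if complete and incomplete:
        found.add("S/R")
    if len(complete) > 1:
        found.add("R/R")
    return found


start = closure({("S", prod, 0) for prod in GRAMMAR["S"]})
after_a = goto(start, "a")
for lhs, rhs, dot in sorted(after_a):
    print(f"{lhs} -> {rhs[:dot]}•{rhs[dot:]}")
print(sorted(conflicts(after_a)))
```

After reading a, the sketch yields the six items listed above and reports both kinds of conflict, which is why this grammar is not LR(0).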
Lookahead
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
  Idea: peek at the next input symbol before choosing an action.
  Input read: a; next symbol: a
    valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
    action: shift
  Input read: aa; next symbol: a
    valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
    action: shift
  Input read: aaa; no further input
    valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
    action: reduce
  [Figure: at each step, the shape the parse tree must have given the symbol seen by peeking]
LR(0) items vs. LR(1) items
  Grammar: A → aAb | ab
  LR(0) item: A → a•Ab
  LR(1) item: [A → a•Ab, b]
  [Figure: the partial parse trees corresponding to the two items]
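LR(1) items
  LR(1) items are of the form [A → α•β, x] or [A → α•β, ε], representing this state in the parsing:
  [Figure: parse tree for A with α to the left of the focus and β to its right; x is the input symbol that follows A, and ε means that no symbol follows]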
Outline of LR(1) parsing algorithm
  Step 1: Build an ε-NFA that describes valid item updates
  Step 2: Convert the ε-NFA to a DFA
    As in LR(0), the DFA will have shift and reduce states
  Step 3: Run the DFA on the input, using a stack to remember the sequence of states
    Use lookahead to eliminate wrong reduce items
Recall ε-NFA transitions for LR(0)
  States of the ε-NFA will be items (plus a start state q0)
  For every item S → •α we have a transition
    q0 --ε--> S → •α
  For every item A → α•Xβ we have a transition
    A → α•Xβ --X--> A → αX•β
  For every item A → α•Cβ and production C → δ we have a transition
    A → α•Cβ --ε--> C → •δ
ε-NFA transitions for LR(1)
  For every item [S → •α, ε] we have a transition
    q0 --ε--> [S → •α, ε]
  For every item [A → α•Xβ, x] we have a transition
    [A → α•Xβ, x] --X--> [A → αX•β, x]
  For every item [A → α•Cβ, x] and production C → δ we have a transition
    [A → α•Cβ, x] --ε--> [C → •δ, y] for every y in FIRST(βx)
FIRST sets
  FIRST(α) is the set of terminals that can occur as the leftmost symbol of some string derived from α
  Example grammar:
    S → A (1) | cB (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)
  FIRST(a) = {a}
  FIRST(A) = {a}
  FIRST(S) = {a, c}
  FIRST(bAc) = {b}
  FIRST(BA) = {a}
  FIRST(ε) = ∅
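The FIRST sets of the example can be computed by iterating to a fixed point. This is a sketch under stated assumptions: the function names nullable_set, first_sets and first_of_string are made up for this illustration, uppercase letters are nonterminals, and the nullable check is included only so that the code would also work for grammars with ε-productions (the example grammar has none).

```python
GRAMMAR = {
    "S": ["A", "cB"],
    "A": ["aA", "a"],
    "B": ["a", "ab"],
}


def nullable_set(grammar):
    """Nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            if lhs not in nullable and any(all(s in nullable for s in p) for p in prods):
                nullable.add(lhs)
                changed = True
    return nullable


def first_sets(grammar):
    """FIRST of every nonterminal, grown until nothing changes."""
    nullable = nullable_set(grammar)
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for prod in prods:
                for sym in prod:
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[lhs]:
                        first[lhs] |= new
                        changed = True
                    if sym not in nullable:
                        break  # later symbols cannot come first
    return first, nullable


def first_of_string(alpha, grammar):
    """FIRST(alpha) for a string of terminals and nonterminals."""
    first, nullable = first_sets(grammar)
    result = set()
    for sym in alpha:
        result |= first[sym] if sym in grammar else {sym}
        if sym not in nullable:
            break
    return result


for alpha in ["a", "A", "S", "bAc", "BA", ""]:
    print(repr(alpha), sorted(first_of_string(alpha, GRAMMAR)))
```

The empty string yields the empty set here, matching the convention FIRST(ε) = ∅ used on the slide.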
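Explaining the transitions
  [A → α•Xβ, x] --X--> [A → αX•β, x]
    [Figure: parse tree for A followed by x, before and after the focus moves past X]
  [A → α•Cβ, x] --ε--> [C → •δ, y], y ∈ FIRST(βx)
    [Figure: parse tree for A followed by x, with the subtree C → δ below the focus; y is the symbol that follows C]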
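Example
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
  Part of the ε-NFA:
    q0 --ε--> [S → •A, ε]
    q0 --ε--> [S → •Bc, ε]
    [S → •A, ε] --A--> [S → A•, ε]
    [S → •A, ε] --ε--> [A → •aA, ε]
    [S → •A, ε] --ε--> [A → •a, ε]
    [S → •Bc, ε] --B--> [S → B•c, ε]
    [S → •Bc, ε] --ε--> [B → •a, c]
    [S → •Bc, ε] --ε--> [B → •ab, c]
    . . .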
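The transition rules can be turned into a function that lists the moves out of a single LR(1) item. This is an illustrative sketch with several assumptions: items are hypothetical (left side, right side, dot position, lookahead) tuples, the marker END = "$" stands in for the ε lookahead of the slides, and FIRST(βx) is computed from the first symbol only, which is valid because the example grammar has no ε-productions.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = "$"   # assumed end-of-input marker, playing the role of the ε lookahead
EPS = "ε"   # label of ε-transitions


def first_sets():
    # No ε-productions: only the first symbol of each right-hand side matters.
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for prod in prods:
                new = first[prod[0]] if prod[0] in GRAMMAR else {prod[0]}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


FIRST = first_sets()


def first_of(beta, x):
    """FIRST(βx) for a string β and a single lookahead symbol x."""
    if not beta:
        return {x}
    return FIRST[beta[0]] if beta[0] in GRAMMAR else {beta[0]}


def start_moves():
    """Transitions out of q0: one ε-move per production of S."""
    return [(EPS, ("S", prod, 0, END)) for prod in GRAMMAR["S"]]


def moves(item):
    """Yield (label, target item) for every transition out of one item."""
    lhs, rhs, dot, x = item
    if dot == len(rhs):
        return  # complete item: no outgoing transitions
    sym = rhs[dot]
    # Move past sym; the symbol expected after A does not change.
    yield sym, (lhs, rhs, dot + 1, x)
    if sym in GRAMMAR:
        # Descend into sym with every lookahead y in FIRST(βx).
        for y in sorted(first_of(rhs[dot + 1:], x)):
            for prod in GRAMMAR[sym]:
                yield EPS, (sym, prod, 0, y)


for label, target in moves(("S", "Bc", 0, END)):
    print(label, target)
```

For the item [S → •Bc, ε] the sketch produces the B-move to [S → B•c, ε] and ε-moves to the two B items with lookahead c, as in the example.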
Convert NFA to DFA
  Each DFA state is a subset of LR(1) items, e.g.
    [A → a•A, ε], [A → a•, ε], [B → a•, c], [B → a•b, c], [A → •aA, ε], [A → •a, ε]
  States can contain S/R, R/R conflicts
  But for an LR(1) grammar, lookahead can always resolve such conflicts
Example
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
  Input: abc
  Stack    Input  Action  Valid items
  (empty)  abc    S       [S → •A, ε], [S → •Bc, ε], [A → •aA, ε], [A → •a, ε], [B → •a, c], [B → •ab, c]
  a        bc     S       [A → a•A, ε], [A → a•, ε], [B → a•, c], [B → a•b, c], [A → •aA, ε], [A → •a, ε] (look ahead!)
  ab       c      R       [B → ab•, c]
  B        c      S       [S → B•c, ε]
  Bc       ε      R       [S → Bc•, ε]
  S        ε
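The example run can be simulated by computing the sets of valid LR(1) items on the fly and letting the next input symbol choose the action. This is a sketch rather than the lecture's algorithm as stated: it keeps a stack of item sets instead of precomputed DFA states, uses a hypothetical END = "$" marker in place of the ε lookahead, relies on the grammar having no ε-productions, and treats a stack holding only S at the end of the input as success.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = "$"  # assumed end-of-input marker, playing the role of the ε lookahead


def first_sets():
    # No ε-productions: only the first symbol of each right-hand side matters.
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for prod in prods:
                new = first[prod[0]] if prod[0] in GRAMMAR else {prod[0]}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


FIRST = first_sets()


def first_of(beta, x):
    """FIRST(βx) for a string β and a single lookahead symbol x."""
    if not beta:
        return {x}
    return FIRST[beta[0]] if beta[0] in GRAMMAR else {beta[0]}


def closure(items):
    """Follow ε-transitions: add [C -> •δ, y] for every y in FIRST(βx)."""
    items, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for y in first_of(rhs[dot + 1:], x):
                for prod in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], prod, 0, y)
                    if new not in items:
                        items.add(new)
                        todo.append(new)
    return items


def goto(items, sym):
    return closure({(l, r, d + 1, x) for (l, r, d, x) in items
                    if d < len(r) and r[d] == sym})


def action(items, look):
    """Choose shift or reduce using one lookahead symbol."""
    reduces = [(l, r) for (l, r, d, x) in items if d == len(r) and x == look]
    shifts = any(d < len(r) and r[d] == look for (l, r, d, x) in items)
    if len(reduces) + shifts > 1:
        raise ValueError("conflict not resolved by one lookahead symbol")
    if reduces:
        return "reduce", reduces[0]
    return ("shift", None) if shifts else ("error", None)


def parse(word):
    """Return the list of (stack, remaining input, action) steps, or None."""
    states = [closure({("S", p, 0, END) for p in GRAMMAR["S"]})]
    symbols, pos, trace = [], 0, []
    while True:
        look = word[pos] if pos < len(word) else END
        if symbols == ["S"] and look == END:
            trace.append(("S", "", "accept"))
            return trace
        kind, prod = action(states[-1], look)
        trace.append(("".join(symbols), word[pos:], kind))
        if kind == "shift":
            symbols.append(look)
            states.append(goto(states[-1], look))
            pos += 1
        elif kind == "reduce":
            lhs, rhs = prod
            del symbols[len(symbols) - len(rhs):]
            del states[len(states) - len(rhs):]
            symbols.append(lhs)
            states.append(goto(states[-1], lhs))
        else:
            return None


for step in parse("abc"):
    print(step)
```

On the input a alone, the same item set that caused the LR(0) conflicts now leads to a single action, because only [A → a•, ε] matches the end-of-input lookahead.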
LR(k) grammars
  A context-free grammar is LR(1) if all S/R and R/R conflicts can be resolved with one lookahead symbol
  More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols
    Items have the form [A → α•β, x1...xk]
  LR(1) grammars describe the syntax of most programming languages