LR(k) grammars The Chinese University of Hong Kong Fall 2008

Presentation transcript:

CSC 3130: Automata theory and formal languages
LR(k) grammars
Andrej Bogdanov, The Chinese University of Hong Kong, Fall 2008
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

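LR(0) example from last time

Grammar: A → aAb | ab

DFA states (sets of valid items):
  1: A → •aAb, A → •ab
  2: A → a•Ab, A → a•b, A → •aAb, A → •ab
  3: A → ab•
  4: A → aA•b
  5: A → aAb•

Transitions: 1 -a-> 2, 2 -a-> 2, 2 -b-> 3, 2 -A-> 4, 4 -b-> 5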

LR(0) parsing example revisited

Grammar: A → aAb | ab
Derivation: A ⇒ aAb ⇒ aabb
DFA: states 1 to 5 as in the LR(0) example from last time.

  Stack      Input   Action
  1          aabb    S
  1a2        abb     S
  1a2a2      bb      S
  1a2a2b3    b       R
  1a2A4      b       S
  1a2A4b5    ε       R
  1A         ε
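
The stack trace on this slide can be reproduced with a small driver. This is only a sketch under stated assumptions: the DFA for A → aAb | ab is hard-coded, and the names GOTO, REDUCE and lr0_parse are hypothetical, not part of the lecture.

```python
# Transitions of the LR(0) DFA for A -> aAb | ab, as (state, symbol) -> state.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# Reduce states: state -> (left-hand side, length of right-hand side).
REDUCE = {3: ("A", 2), 5: ("A", 3)}


def lr0_parse(word):
    """Return a list of (stack, remaining input, action) steps, or None on error."""
    stack = [1]  # alternating states and symbols, starting in state 1
    pos = 0
    trace = []
    while True:
        state = stack[-1]
        shown = "".join(map(str, stack))
        rest = word[pos:]
        if state in REDUCE:
            lhs, n = REDUCE[state]
            trace.append((shown, rest, "R"))
            del stack[-2 * n:]  # pop n symbols together with their n states
            if stack == [1] and pos == len(word):
                trace.append(("1" + lhs, "", "accept"))
                return trace
            nxt = GOTO.get((stack[-1], lhs))
            if nxt is None:
                return None
            stack += [lhs, nxt]
        else:
            if pos == len(word):
                return None  # input exhausted in a shift state
            nxt = GOTO.get((state, word[pos]))
            if nxt is None:
                return None
            trace.append((shown, rest, "S"))
            stack += [word[pos], nxt]
            pos += 1


for step in lr0_parse("aabb"):
    print(step)
```

Because the grammar is LR(0), every state is either a shift state or a reduce state, so the driver never needs to look at the input to choose between the two.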

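Meaning of LR(0) items

The item A → α•Xβ describes a subtree rooted at A with children α X β. The focus (•) sits just before X, and the part to the right of the focus is still undiscovered.

ε-NFA transitions from A → α•Xβ lead to:
  X → •γ      shift focus to the subtree rooted at X (if X is a nonterminal)
  A → αX•β    move past the subtree rooted at X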

Outline of LR(0) parsing algorithm

The algorithm can perform two actions:
  no complete item is valid: shift (S)
  there is one valid item, and it is complete: reduce (R)

What if:
  some valid items are complete and some are not: S/R conflict
  there is more than one valid complete item: R/R conflict
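
The four cases can be written as a small classifier over a set of valid items. This is a sketch following the slide's simplified rules; the item encoding (lhs, rhs, dot) and the function name classify are assumptions made for illustration.

```python
def classify(items):
    """Classify a set of valid LR(0) items.

    Each item is a triple (lhs, rhs, dot): rhs is a string of grammar
    symbols and dot is the position of the focus inside rhs.
    """
    items = set(items)
    complete = {it for it in items if it[2] == len(it[1])}
    incomplete = items - complete
    if not complete:
        return {"shift"}
    if len(complete) == 1 and not incomplete:
        return {"reduce"}
    problems = set()
    if incomplete:
        problems.add("S/R conflict")  # some items complete, some not
    if len(complete) > 1:
        problems.add("R/R conflict")  # more than one complete item
    return problems


# State 2 of the DFA for A -> aAb | ab: no complete item, so shift.
print(classify({("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)}))
# State 3 of the same DFA: a single complete item, so reduce.
print(classify({("A", "ab", 2)}))
```

A state can exhibit both kinds of conflict at once, which is why the function returns a set of labels rather than a single one.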

Definition of LR(0) grammar

A grammar is LR(0) if S/R and R/R conflicts never occur.
LR means that parsing happens left to right and produces a rightmost derivation.
LR(0) grammars are unambiguous and have a fast parsing algorithm.
Unfortunately, they are not "expressive" enough to describe programming languages.

Hierarchy of context-free grammars

[Figure: nested classes of grammars, from largest to smallest]
  context-free grammars: parse using CYK algorithm (slow)
  LR(∞) grammars
  …
  LR(1) grammars: java, perl, python, …
  LR(0) grammars: parse using LR(0) algorithm

A grammar that is not LR(0)

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

Input read so far: a
Possibilities: shift (3), reduce (4), reduce (5), shift (6)
Valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
S/R, R/R conflicts!

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

Input read so far: a
Valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
To choose among them, peek inside the rest of the input!

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

Input read so far: a. Peek inside: the next symbol is a.
Valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
The parse tree must look like this: S at the root, with child A expanding to a A …
Action: shift

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

Input read so far: aa. Peek inside: the next symbol is a.
Valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
The parse tree must look like this: S at the root, with child A expanding to a A, and the inner A expanding to a …
Action: shift

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

Input read so far: aaa. Nothing is left to peek at.
Valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
The parse tree must look like this: S at the root, with a chain of A's whose last A expands to the final a.
Action: reduce

LR(0) items vs. LR(1) items

Grammar: A → aAb | ab

LR(0) item: A → a•Ab
LR(1) item: [A → a•Ab, b]

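LR(1) items

LR(1) items are of the form [A → α•β, x] or [A → α•β, ε].
They represent this state in the parsing: the focus (•) is inside a subtree rooted at A, between α and β, and the symbol that follows the subtree is x (or nothing, in the case of ε).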

Outline of LR(1) parsing algorithm

Step 1: Build an ε-NFA that describes valid item updates.
Step 2: Convert the ε-NFA to a DFA. As in LR(0), the DFA will have shift and reduce states.
Step 3: Run the DFA on the input, using a stack to remember the sequence of states. Use lookahead to eliminate wrong reduce items.

Recall ε-NFA transitions for LR(0)

States of the ε-NFA will be items (plus a start state q0).
For every item S → •α we have a transition q0 -ε-> S → •α
For every item A → α•Xβ we have a transition A → α•Xβ -X-> A → αX•β
For every item A → α•Cβ and production C → δ we have a transition A → α•Cβ -ε-> C → •δ

ε-NFA transitions for LR(1)

For every item [S → •α, ε] we have a transition q0 -ε-> [S → •α, ε]
For every item [A → α•Xβ, x] we have a transition [A → α•Xβ, x] -X-> [A → αX•β, x]
For every item [A → α•Cβ, x] and production C → δ we have a transition [A → α•Cβ, x] -ε-> [C → •δ, y] for every y in FIRST(βx)

FIRST sets

FIRST(α) is the set of terminals that can occur as the leftmost symbol of some string derived from α.

Example grammar:
S → A (1) | cB (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

FIRST(a) = {a}
FIRST(A) = {a}
FIRST(S) = {a, c}
FIRST(bAc) = {b}
FIRST(BA) = {a}
FIRST(ε) = ∅
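
The FIRST sets of the example can be checked with a fixpoint computation. This is a minimal sketch, assuming the convention that the keys of the grammar dictionary are the nonterminals and every other character is a terminal; the names GRAMMAR, first_sets and first_of are hypothetical.

```python
GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}


def first_sets(grammar):
    """Return (FIRST of each nonterminal, set of nullable nonterminals)."""
    first = {nt: set() for nt in grammar}
    nullable = set()
    changed = True
    while changed:  # repeat until nothing new is learned
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = (len(first[nt]), nt in nullable)
                for sym in rhs:
                    if sym in grammar:
                        first[nt] |= first[sym]
                        if sym not in nullable:
                            break
                    else:
                        first[nt].add(sym)
                        break
                else:
                    nullable.add(nt)  # every symbol of rhs can vanish
                if before != (len(first[nt]), nt in nullable):
                    changed = True
    return first, nullable


def first_of(string, grammar):
    """FIRST of an arbitrary string of grammar symbols."""
    first, nullable = first_sets(grammar)
    out = set()
    for sym in string:
        if sym in grammar:
            out |= first[sym]
            if sym not in nullable:
                break
        else:
            out.add(sym)
            break
    return out


for alpha in ["a", "A", "S", "bAc", "BA", ""]:
    print(repr(alpha), sorted(first_of(alpha, GRAMMAR)))
```

The nullable bookkeeping is not needed for this grammar, which has no ε-productions; it is included so that the sketch stays correct if such productions are added.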

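Explaining the transitions

[A → α•Xβ, x] -X-> [A → αX•β, x]
  The focus moves past X; the symbol x that follows the subtree rooted at A is unchanged.

[A → α•Cβ, x] -ε-> [C → •δ, y], where y ∈ FIRST(βx)
  The focus moves into the subtree rooted at C; the symbol y that follows this subtree is the first symbol derived from βx.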

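Example

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

Part of the ε-NFA:
  q0 -ε-> [S → •A, ε]
  q0 -ε-> [S → •Bc, ε]
  [S → •A, ε] -A-> [S → A•, ε]
  [S → •A, ε] -ε-> [A → •aA, ε]
  [S → •A, ε] -ε-> [A → •a, ε]
  [S → •Bc, ε] -B-> [S → B•c, ε]
  [S → •Bc, ε] -ε-> [B → •a, c]
  [S → •Bc, ε] -ε-> [B → •ab, c]
  . . .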
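
The transitions leaving an LR(1) item can be generated mechanically. The sketch below makes several assumptions that are not in the slides: items are tuples (lhs, rhs, dot, lookahead), the end of input is written "$" in place of the ε lookahead, and FIRST is computed by a simplified helper that is only valid for grammars without ε-productions, such as this one.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = "$"  # assumed end marker, standing in for the ε lookahead
EPS = "ε"  # label used for ε-transitions


def first(sym, seen=None):
    """FIRST of one symbol; assumes GRAMMAR has no ε-productions."""
    if sym not in GRAMMAR:
        return {sym}  # a terminal, or the end marker
    seen = set() if seen is None else seen
    if sym in seen:
        return set()  # already being expanded
    seen.add(sym)
    return set().union(*(first(rhs[0], seen) for rhs in GRAMMAR[sym]))


def transitions(item):
    """Return the (label, target item) pairs leaving an LR(1) item."""
    lhs, rhs, dot, look = item
    if dot == len(rhs):
        return []  # complete item: no outgoing transitions
    x = rhs[dot]
    out = [(x, (lhs, rhs, dot + 1, look))]  # move past the symbol x
    if x in GRAMMAR:  # x is a nonterminal C: add ε-moves
        beta_x = rhs[dot + 1:] + look  # the string βx
        for y in sorted(first(beta_x[0])):
            for delta in GRAMMAR[x]:
                out.append((EPS, (x, delta, 0, y)))
    return out


for start in [("S", "A", 0, END), ("S", "Bc", 0, END)]:
    for label, target in transitions(start):
        print(start, "-" + label + "->", target)
```

Treating the end of input as an ordinary symbol "$" means FIRST(βx) is never empty, so an item with an empty β simply passes its own lookahead down.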

Convert NFA to DFA

Each DFA state is a subset of LR(1) items, e.g.
  [A → a•A, ε], [A → a•, ε], [B → a•, c], [B → a•b, c], [A → •aA, ε], [A → •a, ε]

States can contain S/R and R/R conflicts.
But for an LR(1) grammar, one symbol of lookahead always resolves such conflicts.

Example

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

  Stack   Input   Action   Valid items
  ε       abc     S        [S → •A, ε], [S → •Bc, ε], [A → •aA, ε], [A → •a, ε], [B → •a, c], [B → •ab, c]
  a       bc      S        [A → a•A, ε], [A → a•, ε], [B → a•, c], [B → a•b, c], [A → •aA, ε], [A → •a, ε]
  ab      c       R        [B → ab•, c]
  B       c       S        [S → B•c, ε]
  Bc      ε       R        [S → Bc•, ε]
  S       ε

Look ahead! With a on the stack, the valid items contain conflicts; the next symbol b selects the shift through [B → a•b, c].
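
The whole run can be simulated by building DFA states on the fly and letting the next input symbol pick the action. This is a sketch under assumptions not found in the slides: "$" marks the end of input in place of the ε lookahead, states are kept as sets of items instead of numbered DFA states, and the helper first only works for grammars without ε-productions.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = "$"  # assumed end marker, standing in for the ε lookahead


def first(sym, seen=None):
    """FIRST of one symbol; assumes GRAMMAR has no ε-productions."""
    if sym not in GRAMMAR:
        return {sym}
    seen = set() if seen is None else seen
    if sym in seen:
        return set()
    seen.add(sym)
    return set().union(*(first(rhs[0], seen) for rhs in GRAMMAR[sym]))


def closure(items):
    """Add every item reachable through ε-transitions."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, look = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for y in first((rhs[dot + 1:] + look)[0]):
                for delta in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], delta, 0, y)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, sym):
    """DFA transition: move the focus past sym in every item that allows it."""
    moved = {(l, r, d + 1, x) for l, r, d, x in state if d < len(r) and r[d] == sym}
    return closure(moved)


def parse(word):
    """Return the trace of (stack symbols, remaining input, action), or None."""
    states = [closure({("S", rhs, 0, END) for rhs in GRAMMAR["S"]})]
    symbols, pos, trace = "", 0, []
    while True:
        nxt = word[pos] if pos < len(word) else END
        reduces = [(l, r) for l, r, d, x in states[-1] if d == len(r) and x == nxt]
        shift = any(d < len(r) and r[d] == nxt for l, r, d, x in states[-1])
        if len(reduces) + shift != 1:
            return None  # error, or a conflict that lookahead cannot resolve
        if shift:
            trace.append((symbols, word[pos:], "S"))
            states.append(goto(states[-1], nxt))
            symbols += nxt
            pos += 1
        else:
            lhs, rhs = reduces[0]
            trace.append((symbols, word[pos:], "R"))
            del states[-len(rhs):]
            symbols = symbols[:-len(rhs)] + lhs
            if lhs == "S" and len(states) == 1 and nxt == END:
                trace.append((symbols, "", "accept"))
                return trace
            states.append(goto(states[-1], lhs))


for step in parse("abc"):
    print(step)
```

A complete item only triggers a reduce when its lookahead equals the next input symbol, which is what removes the conflicts that the same item sets would cause in LR(0).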

LR(k) grammars

A context-free grammar is LR(1) if all S/R and R/R conflicts can be resolved with one symbol of lookahead.
More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols.
Items have the form [A → α•β, x1...xk].
LR(1) grammars describe the syntax of most programming languages.