LR(k) grammars The Chinese University of Hong Kong Fall 2008


CSC 3130: Automata theory and formal languages
LR(k) grammars
Andrej Bogdanov, The Chinese University of Hong Kong, Fall 2008
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

LR(0) example from last time
Grammar: A → aAb | ab
DFA states:
1: A → •aAb, A → •ab
2: A → a•Ab, A → a•b, A → •aAb, A → •ab
3: A → ab•
4: A → aA•b
5: A → aAb•
Transitions: 1 on a to 2; 2 on a to 2; 2 on b to 3; 2 on A to 4; 4 on b to 5.

LR(0) parsing example revisited
Grammar: A → aAb | ab
DFA states: 1: A → •aAb, A → •ab; 2: A → a•Ab, A → a•b, A → •aAb, A → •ab; 3: A → ab•; 4: A → aA•b; 5: A → aAb•.
States 1, 2, and 4 are shift (S) states; states 3 and 5 are reduce (R) states.

Stack      Input   Action
1          aabb    S
1a2        abb     S
1a2a2      bb      S
1a2a2b3    b       R
1a2A4      b       S
1a2A4b5    ε       R
1A         ε

Derivation: A ⇒ aAb ⇒ aabb
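The stack trace above can be reproduced with a few lines of Python. This is a minimal sketch, not the course's implementation: the names `GOTO`, `REDUCE`, and `lr0_parse` are made up for illustration, and the tables are simply the five-state DFA for A → aAb | ab written out by hand.

```python
# Hand-written transition table of the five-state DFA for A -> aAb | ab.
GOTO = {
    (1, "a"): 2,
    (2, "a"): 2,
    (2, "b"): 3,
    (2, "A"): 4,
    (4, "b"): 5,
}
# Reduce states: state -> (left-hand side, length of right-hand side).
REDUCE = {3: ("A", 2), 5: ("A", 3)}


def lr0_parse(word):
    """Return (accepted, trace); trace lists (stack, remaining input) pairs."""
    stack = [1]  # alternating states and symbols, e.g. 1 a 2 a 2
    pos = 0
    trace = []
    while True:
        trace.append(("".join(str(x) for x in stack), word[pos:]))
        state = stack[-1]
        if state in REDUCE:
            lhs, length = REDUCE[state]
            del stack[-2 * length:]  # pop the right-hand side and its states
            stack.append(lhs)
            target = GOTO.get((stack[-2], lhs))
            if target is None:
                # Back at state 1 holding A: accept iff the input is used up.
                trace.append(("".join(str(x) for x in stack), word[pos:]))
                return pos == len(word) and stack == [1, "A"], trace
            stack.append(target)
        else:
            if pos == len(word) or (state, word[pos]) not in GOTO:
                return False, trace  # nothing to shift: reject
            symbol = word[pos]
            stack += [symbol, GOTO[(state, symbol)]]
            pos += 1


accepted, trace = lr0_parse("aabb")
for stack_text, rest in trace:
    print(stack_text.ljust(10), rest or "ε")
```

No lookahead is consulted anywhere: the top state alone decides between shifting and reducing, which is exactly what makes this grammar LR(0).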

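Meaning of LR(0) items
An item A → α•Xβ records the parser's focus inside the parse tree of A: α has already been read, the dot marks the focus, and Xβ is the undiscovered part.
εNFA transitions out of A → α•Xβ:
to X → •γ: shift focus to the subtree rooted at X (if X is a nonterminal);
to A → αX•β: move past the subtree rooted at X.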

Outline of LR(0) parsing algorithm
The algorithm can perform two actions:
shift (S), when no complete item is valid;
reduce (R), when there is one valid item and it is complete.
What if:
some valid items are complete and some are not? This is an S/R conflict.
more than one complete item is valid? This is an R/R conflict.
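The case analysis on this slide can be written as a small function over the set of valid items. This is an illustrative sketch only: the `(lhs, rhs, dot)` tuple encoding of an item and the name `classify` are assumptions, not notation from the lecture.

```python
def classify(items):
    """Classify a set of valid LR(0) items.

    Each item is a hypothetical (lhs, rhs, dot) tuple; the item is complete
    when the dot sits at the end of rhs. Returns a set of labels.
    """
    complete = [item for item in items if item[2] == len(item[1])]
    incomplete = [item for item in items if item[2] < len(item[1])]
    conflicts = set()
    if complete and incomplete:
        conflicts.add("S/R")  # some valid items complete, some not
    if len(complete) > 1:
        conflicts.add("R/R")  # more than one valid complete item
    if conflicts:
        return conflicts
    return {"reduce"} if complete else {"shift"}


# State 3 of the DFA for A -> aAb | ab holds only the complete item A -> ab.
print(classify({("A", "ab", 2)}))
# State 2 holds only incomplete items.
print(classify({("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)}))
```

A grammar is LR(0) exactly when no state of its DFA yields a conflict label under this test.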

Definition of LR(0) grammar
A grammar is LR(0) if S/R and R/R conflicts never occur. LR means that parsing happens left to right and produces a rightmost derivation. LR(0) grammars are unambiguous and have a fast parsing algorithm. Unfortunately, they are not “expressive” enough to describe programming languages.

Hierarchy of context-free grammars
Context-free grammars (parse using the CYK algorithm, slow) ⊃ LR(∞) grammars ⊃ … ⊃ LR(1) grammars ⊃ LR(0) grammars (parse using the LR(0) algorithm).
Java, Perl, Python, … lie inside this hierarchy, above LR(0).


A grammar that is not LR(0)
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
Input: a
Possibilities: shift (3), reduce (4), reduce (5), shift (6)
Valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
S/R, R/R conflicts!
[Figure: the candidate parse trees rooted at S that are consistent with the input read so far.]

Lookahead
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
Input: a
Valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
Peek inside!
[Figure: the candidate parse trees rooted at S that are consistent with the input read so far.]

Lookahead
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
Input read so far: a. Peek inside: the next symbol is a.
Valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
Action: shift
[Figure: the parse tree must look like this: a partial tree rooted at S, forced by the lookahead.]

Lookahead
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
Input read so far: a a. Peek inside: the next symbol is a.
Valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
Action: shift
[Figure: the parse tree must look like this: a partial tree rooted at S, forced by the lookahead.]

Lookahead
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
Input: a a a (end of input)
Valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
Action: reduce
[Figure: the parse tree must look like this: a partial tree rooted at S, forced by the lookahead.]

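LR(0) items vs. LR(1) items
Grammar: A → aAb | ab
LR(0) item: A → a•Ab
LR(1) item: [A → a•Ab, b]
The LR(1) item additionally records the symbol that follows this A in the input, here b.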

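LR(1) items
LR(1) items are of the form [A → α•β, x] or [A → α•β, ε]. They represent this state in the parsing: α has been read, β is still expected, and the subtree rooted at A is followed by the symbol x (or, for ε, by the end of the input).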

Outline of LR(1) parsing algorithm
Step 1: Build an εNFA that describes valid item updates.
Step 2: Convert the εNFA to a DFA. As in LR(0), the DFA will have shift and reduce states.
Step 3: Run the DFA on the input, using a stack to remember the sequence of states. Use lookahead to eliminate wrong reduce items.

Recall εNFA transitions for LR(0)
States of the εNFA will be items (plus a start state q0).
For every item S → •α we have an ε-transition from q0 to S → •α.
For every item A → α•Xβ we have a transition on X from A → α•Xβ to A → αX•β.
For every item A → α•Cβ and production C → δ we have an ε-transition from A → α•Cβ to C → •δ.
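These transition rules are enough to build the LR(0) DFA mechanically. The sketch below is one possible way to do it, under stated assumptions: it folds the ε-transitions into a `closure` step and the symbol transitions into a `goto` step instead of materializing the εNFA first (which yields the same DFA as the subset construction), it encodes items as hypothetical `(lhs, rhs, dot)` tuples, and it assumes single-character grammar symbols.

```python
# Grammar from the running example; symbols are single characters and
# the keys of GRAMMAR are the variables (an encoding chosen for this sketch).
GRAMMAR = {"A": ["aAb", "ab"]}


def closure(items):
    """Follow epsilon-transitions A -> alpha . C beta  ==>  C -> . delta."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for production in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], production, 0)
                if new_item not in items:
                    items.add(new_item)
                    work.append(new_item)
    return frozenset(items)


def goto(state, symbol):
    """Follow transitions A -> alpha . X beta  ==X==>  A -> alpha X . beta."""
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def build_dfa(start):
    """Return (start_state, states, edges) of the LR(0) DFA."""
    start_state = closure({(start, rhs, 0) for rhs in GRAMMAR[start]})
    states, edges, work = {start_state}, {}, [start_state]
    while work:
        state = work.pop()
        for symbol in {r[d] for (_, r, d) in state if d < len(r)}:
            target = goto(state, symbol)
            edges[(state, symbol)] = target
            if target not in states:
                states.add(target)
                work.append(target)
    return start_state, states, edges


start_state, states, edges = build_dfa("A")
print(len(states), "states,", len(edges), "transitions")
```

For A → aAb | ab this produces the five states and five transitions of the example DFA from the earlier slides.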

εNFA transitions for LR(1)
For every item [S → •α, ε] we have an ε-transition from q0 to [S → •α, ε].
For every item [A → α•Xβ, x] we have a transition on X from [A → α•Xβ, x] to [A → αX•β, x].
For every item [A → α•Cβ, x] and production C → δ we have, for every y in FIRST(βx), an ε-transition from [A → α•Cβ, x] to [C → •δ, y].

FIRST sets
FIRST(α) is the set of terminals that occur on the left in some derivation starting from α.
Example grammar: S → A (1) | cB (2), A → aA (3) | a (4), B → a (5) | ab (6)
FIRST(a) = {a}
FIRST(A) = {a}
FIRST(S) = {a, c}
FIRST(bAc) = {b}
FIRST(BA) = {a}
FIRST(ε) = ∅
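The FIRST sets in this example can be checked with a short fixed-point computation. This is a sketch under a loud assumption: the grammar has no ε-productions (true for the example), so FIRST of a string is just FIRST of its first symbol. The names `first_sets` and `first_of` are illustrative, and symbols are single characters.

```python
# Example grammar from the slide; keys are variables, symbols are characters.
GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}


def first_sets(grammar):
    """FIRST of every variable, assuming the grammar has no epsilon-productions."""
    first = {variable: set() for variable in grammar}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for variable, productions in grammar.items():
            for rhs in productions:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[variable]:
                    first[variable] |= new
                    changed = True
    return first


def first_of(string, first):
    """FIRST of a string of symbols; empty for the empty string, as on the slide."""
    if not string:
        return set()
    head = string[0]
    return set(first[head]) if head in first else {head}


FIRST = first_sets(GRAMMAR)
for text in ["a", "A", "S", "bAc", "BA", ""]:
    print(f"FIRST({text or 'ε'}) =", sorted(first_of(text, FIRST)))
```

A grammar with ε-productions would need an extra nullable-symbol pass, which this sketch deliberately leaves out.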

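Explaining the transitions
Transition on X from [A → α•Xβ, x] to [A → αX•β, x]: the focus moves past X, and the symbol x following A stays the same.
ε-transition from [A → α•Cβ, x] to [C → •δ, y], where y ∈ FIRST(βx): the focus moves into the subtree rooted at C, and the symbol y following C is the first symbol of what comes after C, namely βx.
[Figure: parse trees illustrating both transitions.]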

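Example
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
Part of the εNFA:
ε-transitions from q0 to [S → •A, ε] and to [S → •Bc, ε]
transition on A from [S → •A, ε] to [S → A•, ε]
ε-transitions from [S → •A, ε] to [A → •aA, ε] and to [A → •a, ε]
transition on B from [S → •Bc, ε] to [S → B•c, ε]
ε-transitions from [S → •Bc, ε] to [B → •a, c] and to [B → •ab, c]
. . .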
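The ε-transitions of this example can be reproduced by an LR(1) closure routine. The following is a sketch, not the lecture's construction verbatim. Assumptions: items are hypothetical `(lhs, rhs, dot, lookahead)` tuples, symbols are single characters, the grammar has no ε-productions, and the end-of-input lookahead (written ε on the slides) is represented by an explicit marker `"$"` so that FIRST(βx) is never empty.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = "$"  # stands in for the slides' epsilon lookahead (assumption of this sketch)


def first_of_symbol(symbol, seen=()):
    """FIRST of one symbol, assuming no epsilon-productions."""
    if symbol not in GRAMMAR:
        return {symbol}  # a terminal starts with itself
    if symbol in seen:
        return set()  # already being expanded on this path
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first_of_symbol(rhs[0], seen + (symbol,))
    return result


def first_of(beta, lookahead):
    """FIRST(beta x): the lookahead x only matters when beta is empty."""
    return first_of_symbol(beta[0]) if beta else {lookahead}


def closure(items):
    """Add [C -> . delta, y] for every [A -> alpha . C beta, x], y in FIRST(beta x)."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, x = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for y in first_of(rhs[dot + 1:], x):
                for production in GRAMMAR[rhs[dot]]:
                    new_item = (rhs[dot], production, 0, y)
                    if new_item not in items:
                        items.add(new_item)
                        work.append(new_item)
    return items


start_items = {("S", rhs, 0, END) for rhs in GRAMMAR["S"]}
for lhs, rhs, dot, x in sorted(closure(start_items)):
    print(f"[{lhs} -> {rhs[:dot]}•{rhs[dot:]}, {x}]")
```

The B-items receive lookahead c because B is followed by c in S → Bc, while the A-items inherit the end marker from [S → •A, ε].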

Convert NFA to DFA
Each DFA state is a subset of LR(1) items, e.g.
[A → a•A, ε], [A → a•, ε], [B → a•, c], [B → a•b, c], [A → •aA, ε], [A → •a, ε]
States can contain S/R and R/R conflicts when the lookahead part of the items is ignored. For an LR(1) grammar, the lookahead always resolves such conflicts.
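How the lookahead picks an action inside such a state can be sketched as follows. The `(lhs, rhs, dot, lookahead)` item encoding, the `"$"` end marker in place of ε, and the function name `action` are assumptions made for this illustration.

```python
END = "$"  # stands in for the slides' epsilon lookahead (assumption of this sketch)

# The example DFA state from the slide, reached after reading a single a.
STATE = {
    ("A", "aA", 1, END), ("A", "a", 1, END),
    ("B", "a", 1, "c"), ("B", "ab", 1, "c"),
    ("A", "aA", 0, END), ("A", "a", 0, END),
}


def action(items, next_symbol):
    """Choose shift or reduce from the valid LR(1) items and one lookahead symbol."""
    # Incomplete items whose symbol after the dot equals the next input symbol.
    shifts = {i for i in items if i[2] < len(i[1]) and i[1][i[2]] == next_symbol}
    # Complete items whose recorded lookahead equals the next input symbol.
    reduces = {i for i in items if i[2] == len(i[1]) and i[3] == next_symbol}
    if (shifts and reduces) or len(reduces) > 1:
        raise ValueError("conflict not resolved by one lookahead symbol")
    if reduces:
        lhs, rhs, _, _ = next(iter(reduces))
        return ("reduce", lhs, rhs)
    if shifts:
        return ("shift", next_symbol)
    return ("error",)


for symbol in ["a", "b", "c", END]:
    print(symbol, action(STATE, symbol))
```

Each of the four possible next symbols selects a different, unambiguous action in this state, even though the same state has both S/R and R/R conflicts as a set of LR(0) items.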

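Example
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
Each step lists the stack, the remaining input, the valid items, and the action (S = shift, R = reduce).
Stack ε, input abc, valid items [S → •A, ε], [S → •Bc, ε], [A → •aA, ε], [A → •a, ε], [B → •a, c], [B → •ab, c]: S
Stack a, input bc, valid items [A → a•A, ε], [A → a•, ε], [B → a•, c], [B → a•b, c], [A → •aA, ε], [A → •a, ε]: look ahead! The next symbol is b, so S
Stack ab, input c, valid items [B → ab•, c]: R
Stack B, input c, valid items [S → B•c, ε]: S
Stack Bc, input ε, valid items [S → Bc•, ε]: R
Stack S, input ε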

LR(k) grammars
A context-free grammar is LR(1) if all S/R and R/R conflicts can be resolved with one symbol of lookahead. More generally, in an LR(k) grammar all conflicts can be resolved with k lookahead symbols. Items then have the form [A → α•β, x1...xk]. LR(1) grammars describe the syntax of most programming languages.