CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong LR(0) grammars.

Presentation transcript:

CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong LR(0) grammars Fall 2011

Parsing computer programs
First the javac compiler does a lexical analysis:
    if (n == 0) { return x; }
becomes
    if (ID == INT_LIT) { return ID; }
ID = identifier (name of variable, procedure, class, ...)
INT_LIT = integer literal (value)
The alphabet of the Java CFG consists of symbols like:
    Σ = { if, return, (, ), {, }, ;, ==, ID, INT_LIT, ... }
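The lexical analysis step can be illustrated with a minimal sketch. This is not how javac works internally; the token patterns, the `KEYWORDS` set and the function name `lex` are assumptions made for this example, and only the handful of symbols shown on the slide are covered.

```python
import re

# Hypothetical token patterns: ID and INT_LIT follow the slide,
# SYM covers the literal symbols of the alphabet shown there.
TOKEN_SPEC = [
    ("INT_LIT", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("SYM", r"==|[(){};]"),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"if", "return"}
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(source):
    """Turn program text into a list of symbols from the grammar's alphabet."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue                      # whitespace is not a symbol
        if kind == "SYM" or (kind == "ID" and text in KEYWORDS):
            tokens.append(text)           # keywords and punctuation stand for themselves
        else:
            tokens.append(kind)           # identifiers and literals become ID / INT_LIT
    return tokens


print(" ".join(lex("if (n == 0) { return x; }")))
```

The parser then works on this token sequence rather than on individual characters.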

Parsing computer programs
    if (n == 0) { return x; }
    if (ID == INT_LIT) { return ID; }
[Figure: the parse tree of a Java statement. Its nodes include Statement, ParExpression, Expression, ExpressionRest, Infixop, Primary, Identifier, Literal, Block, BlockStatements and BlockStatement; its leaves are the tokens if ( ID == INT_LIT ) { return ID ; }.]

CFG of the Java programming language
Identifier:
    ID
QualifiedIdentifier:
    Identifier { . Identifier }
Literal:
    IntegerLiteral
    FloatingPointLiteral
    CharacterLiteral
    StringLiteral
    BooleanLiteral
    NullLiteral
Expression:
    Expression1 [ AssignmentOperator Expression1 ]
AssignmentOperator:
    =
    +=
    -=
    *=
    /=
    &=
    |=
    …
from /second_edition/html/syntax.doc.html#52996

Parsing Java programs
    class Point2d {
        /* The X and Y coordinates of the point--instance variables */
        private double x;
        private double y;
        private boolean debug;   // A trick to help with debugging

        public Point2d (double px, double py) {   // Constructor
            x = px;
            y = py;
            debug = false;   // turn off debugging
        }

        public Point2d () {   // Default constructor
            this (0.0, 0.0);   // Invokes 2 parameter Point2d constructor
        }
        // Note that a this() invocation must be the BEGINNING of
        // statement body of constructor

        public Point2d (Point2d pt) {   // Another constructor
            x = pt.getX();
            y = pt.getY();
        }
        …
Simple Java program: about 500 symbols

Parsing algorithms
How long would it take to parse this program?
    try all parse trees: about [number missing] years
    CYK algorithm: about 1 week!
Can we parse faster?
No! CYK is the fastest known general-purpose parsing algorithm for CFGs
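For reference, here is a minimal sketch of the CYK algorithm named on this slide. It assumes the grammar is already in Chomsky normal form; the rule encoding (`unit_rules`, `pair_rules`) and the variable names X, Y, T are choices made for this illustration, and the example grammar generates { aⁿbⁿ : n ≥ 1 }. The three nested loops over substring length, start position and split point are what make the running time cubic in the input length.

```python
def cyk(word, unit_rules, pair_rules, start):
    """Decide whether `word` is generated by a grammar in Chomsky normal form.

    unit_rules: list of (variable, terminal)           for rules V -> a
    pair_rules: list of (variable, (left, right))      for rules V -> B C
    """
    n = len(word)
    if n == 0:
        return False  # this sketch ignores the special rule S -> ε
    # table[i][l] holds the variables that derive the substring word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][1] = {v for v, t in unit_rules if t == symbol}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for v, (b, c) in pair_rules:
                    if b in left and c in right:
                        table[i][length].add(v)
    return start in table[0][n]


# CNF grammar for { a^n b^n : n >= 1 }:  S -> X T | X Y,  T -> S Y,  X -> a,  Y -> b
UNIT_RULES = [("X", "a"), ("Y", "b")]
PAIR_RULES = [("S", ("X", "T")), ("S", ("X", "Y")), ("T", ("S", "Y"))]

print(cyk("aabb", UNIT_RULES, PAIR_RULES, "S"))
print(cyk("abab", UNIT_RULES, PAIR_RULES, "S"))
```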

Another way of thinking
Scientist: Find an algorithm that can parse any CFG
Engineer: Design your CFG so it can be parsed very quickly

Parsing left to right
S → Tc (1)
T → TA (2) | A (3)
A → aTb (4) | ab (5)
input: abaabbc
Try to match to the left of •
[Figure: the parse tree of abaabbc, built bottom-up as • moves through the input from left to right.]

Items
An item is a production augmented with a •
The item is complete if the • is the last symbol
S → Tc (1)
T → TA (2)
T → A (3)
A → aTb (4)
A → ab (5)
Items of this grammar include S → •Tc, T → T•A, T → A•, A → a•Tb and A → ab•; of these, T → A• and A → ab• are complete.
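One possible way to represent items in code, given as a sketch: an item is a triple (head, body, dot position). The names `GRAMMAR`, `all_items`, `is_complete` and `show` are assumptions of this example, as is the convention that each symbol is a single character.

```python
# Productions of the slide's grammar as (head, body) pairs.
GRAMMAR = [("S", "Tc"), ("T", "TA"), ("T", "A"), ("A", "aTb"), ("A", "ab")]


def all_items(grammar):
    """Every production with the • placed at every possible position."""
    return [(head, body, dot)
            for head, body in grammar
            for dot in range(len(body) + 1)]


def is_complete(item):
    """An item is complete when the • comes after the last symbol."""
    _head, body, dot = item
    return dot == len(body)


def show(item):
    head, body, dot = item
    return f"{head} → {body[:dot]}•{body[dot:]}"


for item in all_items(GRAMMAR):
    print(show(item), "(complete)" if is_complete(item) else "")
```

A production with k symbols on the right-hand side yields k + 1 items, exactly one of which is complete.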

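Meaning of items
S → Tc (1)
T → TA (2) | A (3)
A → aTb (4) | ab (5)
Items represent possibilities at various stages of the parsing process
[Figure: the input abaabbc with • partway through it, next to the items that are still possible at that point, such as A → a•Tb and A → a•b once an a has been read.]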

Meaning of items
S → Tc (1)
T → TA (2) | A (3)
A → aTb (4) | ab (5)
When a complete item occurs, a part of the parse tree is discovered
[Figure: once the complete item A → ab• occurs, the node A is added above the matched ab in the partial parse tree of abaabbc.]

LR(0) parsing
Move from left to right
Keep track of all possible valid items
Prune the invalid items
When a complete item occurs, build part of parse tree
A → aAb | ab    input: aabb
valid items registry as • moves through the input:
    •aabb    A → •aAb, A → •ab
    a•abb    A → a•Ab, A → a•b, A → •aAb, A → •ab
    aa•bb    A → a•Ab, A → a•b, A → •aAb, A → •ab
    aab•b    A → ab•     (complete: build A above ab)
    aA•b     A → aA•b
    aAb•     A → aAb•    (complete: build A above aAb)

two kinds of actions
shift: no complete item in registry
    a•abb    valid items registry: A → a•Ab, A → a•b, A → •aAb, A → •ab
reduce: exactly one complete item
    aab•b    valid items registry: A → ab•    (ab is replaced by A)

LR(0) implementation: first take
A → aAb | ab    input: aabb
stack   action   valid items
ε       S        A → •aAb, A → •ab
a       S        A → a•Ab, A → a•b, A → •aAb, A → •ab
aa      S        A → a•Ab, A → a•b, A → •aAb, A → •ab
aab     R        A → ab•
aA      S        A → aA•b
aAb     R        A → aAb•
A

valid item update rules
notation: a, b : terminals; A, B, C : variables; α, β, γ : mixed strings
initial valid items: S → •α
shift updates:
    A → α•aβ    read a: becomes A → αa•β
    A → α•aβ    read b: disappears
    A → α•Bβ    read x: disappears
reduce updates:
    A → α•Bβ    reduce B: becomes A → αB•β
    A → α•Bβ    reduce C: disappears
common updates:
    A → α•Bβ is valid: every B → •γ becomes valid

valid item update rules
notation: a, b : terminals; A, B, C : variables; α, β, γ : mixed strings; X : terminal or variable
initial valid items: S → •α
shift updates: A → α•aβ becomes A → αa•β on read a
reduce updates: A → α•Bβ becomes A → αB•β on reduce B
common updates: if A → α•Bβ is valid, so is every B → •γ
in short: A → α•Xβ becomes A → αX•β on X
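The update rules can be sketched as two set operations on items. This is an illustration under stated assumptions, not the slides' own code: items are (head, body, dot) triples, uppercase letters are variables, and the names `closure`, `advance` and `initial_items` are chosen here. `closure` implements the common update, and `advance` covers both the shift update (X is a terminal) and the reduce update (X is a variable).

```python
GRAMMAR = [("A", "aAb"), ("A", "ab")]  # productions as (head, body)
START = "A"


def closure(items):
    """Common update: if A → α•Bβ is valid, every B → •γ is valid too."""
    result = set(items)
    todo = list(items)
    while todo:
        _head, body, dot = todo.pop()
        if dot < len(body) and body[dot].isupper():
            for h, b in GRAMMAR:
                new_item = (h, b, 0)
                if h == body[dot] and new_item not in result:
                    result.add(new_item)
                    todo.append(new_item)
    return frozenset(result)


def initial_items():
    """Initial valid items: S → •α for every production of the start variable."""
    return closure({(h, b, 0) for h, b in GRAMMAR if h == START})


def advance(items, symbol):
    """Move • over `symbol`; items that expect something else disappear."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)


after_a = advance(initial_items(), "a")
print(sorted(after_a))
print(sorted(advance(after_a, "b")))
```

Reading a second a from `after_a` gives the same set again, which is why the registry does not change between the first and second a of aabb.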

LR(0) parsing: NFA representation
notation: a, b : terminals; A, B, C : variables; α, β, γ : mixed strings; X : terminal or variable
For every item S → •α: an ε-transition from q0 to S → •α
For every item A → α•Xβ: an X-transition from A → α•Xβ to A → αX•β
For every pair of items A → α•Cβ, C → •γ: an ε-transition from A → α•Cβ to C → •γ
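The three construction rules translate almost line by line into code. The sketch below assumes items are (head, body, dot) triples, represents ε by the empty string, and returns the NFA as a list of labelled edges; the names `build_nfa` and `EPS` are made up for this example.

```python
EPS = ""  # label used for ε-transitions in this sketch


def build_nfa(grammar, start):
    """Return the NFA edges (source, label, target); states are 'q0' and items."""
    edges = []
    for head, body in grammar:
        # Rule 1: q0 --ε--> S → •α for every production of the start variable
        if head == start:
            edges.append(("q0", EPS, (head, body, 0)))
        for dot in range(len(body)):
            item = (head, body, dot)
            symbol = body[dot]
            # Rule 2: A → α•Xβ --X--> A → αX•β
            edges.append((item, symbol, (head, body, dot + 1)))
            # Rule 3: A → α•Cβ --ε--> C → •γ for every production of C
            for head2, body2 in grammar:
                if head2 == symbol:
                    edges.append((item, EPS, (head2, body2, 0)))
    return edges


for source, label, target in build_nfa([("A", "aAb"), ("A", "ab")], "A"):
    print(source, "--", label or "ε", "->", target)
```

For A → aAb | ab this produces nine edges over eight states (q0 and the seven items), with alphabet {a, b, A}.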

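NFA example
A → aAb | ab
NFA alphabet is Σ = {a, b, A}
start state is q0
other states are items
transitions:
    q0 -ε→ A → •aAb        q0 -ε→ A → •ab
    A → •aAb -a→ A → a•Ab -A→ A → aA•b -b→ A → aAb•
    A → •ab -a→ A → a•b -b→ A → ab•
    A → a•Ab -ε→ A → •aAb        A → a•Ab -ε→ A → •ab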

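NFA to DFA conversion
A → aAb | ab
DFA states are sets of items:
    {A → •aAb, A → •ab}                        (start state)
    {A → a•Ab, A → a•b, A → •aAb, A → •ab}
    {A → aA•b}
    {A → aAb•}
    {A → ab•}
transitions:
    {A → •aAb, A → •ab} -a→ {A → a•Ab, A → a•b, A → •aAb, A → •ab}
    {A → a•Ab, A → a•b, A → •aAb, A → •ab} -a→ itself
    {A → a•Ab, A → a•b, A → •aAb, A → •ab} -A→ {A → aA•b}
    {A → a•Ab, A → a•b, A → •aAb, A → •ab} -b→ {A → ab•}
    {A → aA•b} -b→ {A → aAb•}
    every other transition (for example b, A out of the start state, or a, A out of {A → aA•b}) leads to q_die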

Shift states and reduce states
{A → •aAb, A → •ab}, {A → a•Ab, A → a•b, A → •aAb, A → •ab} and {A → aA•b} are shift states
{A → aAb•} and {A → ab•} are reduce states

LR(0) parsing: second take
A → aAb | ab    input: aabb
DFA states: 1 = {A → •aAb, A → •ab}, 2 = {A → a•Ab, A → a•b, A → •aAb, A → •ab}, 3 = {A → aA•b}, 4 = {A → aAb•}, 5 = {A → ab•}
stack   state   action
ε       1       S
a       2       S
aa      2       S
aab     5       R
aA      ?
How do we know what to reduce to?

remember state in stack!
A → aAb | ab    input: aabb
stack      action
1          S
1a2        S
1a2a2      S
1a2a2b5    R    (backtrack two steps, then take the A-transition out of state 2)
1a2A3      S
1a2A3b4    R    (backtrack three steps)
A

PDA for LR(0) parsing
A → aAb | ab    input: aabb
The stack only needs to hold the states:
stack   action
1       S
12      S
122     S
1225    R
123     S
1234    R
A

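PDA for LR(0) parsing
[Figure: the DFA for A → aAb | ab used as the control of a PDA, with an extra transition ε, ε/$ at the start and A, $/ε at the end.]
In reduce state {A → ab•}:
    pop the states pushed for b and a
    take transition A out of the state now on top
    push the state reached
In reduce state {A → aAb•}:
    pop the states pushed for b, A and a
    take transition A out of the state now on top
    push the state reached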
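The state-stack procedure can be sketched as a short table-driven loop. The tables below are transcribed by hand from the DFA for A → aAb | ab with the state numbers 1 to 5 used in the traces; the names `GOTO`, `REDUCE` and `parse`, and the choice to return the list of reductions (or None on rejection), are assumptions of this example.

```python
START = "A"

# DFA transitions on terminals and variables; missing entries lead to the dead state.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 5, (2, "A"): 3, (3, "b"): 4}

# Reduce states and the production whose complete item they contain.
REDUCE = {4: ("A", "aAb"), 5: ("A", "ab")}


def parse(word):
    """Return the list of reductions for `word`, or None if it is rejected."""
    stack = [1]          # stack of DFA states, start state at the bottom
    reductions = []
    pos = 0
    while True:
        state = stack[-1]
        if state in REDUCE:
            head, body = REDUCE[state]
            # pop one state per symbol of the body (bodies here are non-empty)
            del stack[-len(body):]
            reductions.append(f"{head} → {body}")
            if stack == [1] and head == START:
                # back at the bottom: accept only if all input was read
                return reductions if pos == len(word) else None
            next_state = GOTO.get((stack[-1], head))
        else:
            if pos == len(word):
                return None                      # shift state but no input left
            next_state = GOTO.get((state, word[pos]))
            pos += 1
        if next_state is None:
            return None                          # dead state
        stack.append(next_state)


print(parse("aabb"))
print(parse("aab"))
```

On aabb the stack goes through 1, 12, 122, 1225, 123, 1234, matching the trace on the previous slide.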

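Example 1
L = {w#w^R : w ∈ {a, b}*}
A → aAa | bAb | #
[Figure: the NFA for this grammar. q0 has ε-transitions to A → •aAa, A → •bAb and A → •#; each item A → α•Xβ has an X-transition to A → αX•β; the items A → a•Aa and A → b•Ab have ε-transitions to A → •aAa, A → •bAb and A → •#.]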

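Example 1
A → aAa | bAb | #    input: ba#ab
DFA states:
    1: A → •aAa, A → •bAb, A → •#
    2: A → #•
    3: A → a•Aa, A → •aAa, A → •bAb, A → •#
    4: A → b•Ab, A → •aAa, A → •bAb, A → •#
    5: A → aA•a
    6: A → bA•b
    7: A → aAa•
    8: A → bAb•
transitions: from each of 1, 3, 4: a to 3, b to 4, # to 2; 3 -A→ 5; 4 -A→ 6; 5 -a→ 7; 6 -b→ 8
stack   action
1       S
14      S
143     S
1432    R
1435    S
14357   R
146     S
1468    R
A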

LR(0) grammars and deterministic PDAs
The PDA for LR(0) parsing is deterministic
Some CFLs require non-deterministic PDAs, e.g. L = {ww^R : w ∈ {a, b}*}
What goes wrong when we do LR(0) parsing on L?

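Example 2
L = {ww^R : w ∈ {a, b}*}
A → aAa | bAb | ε
[Figure: the NFA for this grammar. q0 has ε-transitions to A → •aAa, A → •bAb and A → •; each item A → α•Xβ has an X-transition to A → αX•β; the items A → a•Aa and A → b•Ab have ε-transitions to A → •aAa, A → •bAb and A → •.]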

Example 2
L = {ww^R : w ∈ {a, b}*}
A → aAa | bAb | ε    input: abba
[Figure: the DFA for this grammar. The start state {A → •aAa, A → •bAb, A → •} and the states reached on a and on b each contain the complete item A → • together with items that are not complete.]
shift-reduce conflict

Parsing computer programs
    if (n == 0) { return x; }
    else { return x + 1; }
[Figure: the parse tree of if (n == 0) { return x; } from before, with nodes Statement, ParExpression, Expression, ExpressionRest, Infixop, Primary, Identifier, Literal, Block, BlockStatements and BlockStatement; the else branch is not yet part of the tree.]

Parsing computer programs
    if (n == 0) { return x; }
    else { return x + 1; }
[Figure: two candidate parse trees for the statement: Statement → if ParExpression Statement, and Statement → if ParExpression Statement else Statement.]
LR(0) parsers cannot tell apart if ... then from if ... then ... else

When you can't LR(0) parse
LR(0) parser can perform two actions:
    no complete item is valid: shift (S)
    there is one valid item, and it is complete: reduce (R)
What if:
    some valid items complete, some not: S / R conflict
    more than one valid complete item: R / R conflict
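The four cases can be checked mechanically for any set of valid items. The sketch below assumes items are (head, body, dot) triples; the function name `classify` and the returned strings are made up for this example. When a set contains several complete items and also incomplete ones, this sketch reports the R/R conflict first, which is an arbitrary choice.

```python
def classify(items):
    """Decide which LR(0) action, or conflict, a set of valid items calls for."""
    items = list(items)
    complete = [it for it in items if it[2] == len(it[1])]  # • after last symbol
    if not complete:
        return "shift"
    if len(complete) > 1:
        return "R/R conflict"
    if len(items) > 1:
        return "S/R conflict"   # one complete item mixed with incomplete ones
    return "reduce"


# Start state for A → aAb | ab: no complete item
print(classify({("A", "aAb", 0), ("A", "ab", 0)}))
# State {A → ab•}: the only valid item is complete
print(classify({("A", "ab", 2)}))
# Start state for A → aAa | bAb | ε: the item A → • is complete, the others are not
print(classify({("A", "aAa", 0), ("A", "bAb", 0), ("A", "", 0)}))
```

A grammar can be parsed by the LR(0) algorithm only if every state of its DFA is classified as shift or reduce.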

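Hierarchy of context-free grammars
context-free grammars ⊃ LR(∞) grammars ⊃ … ⊃ LR(1) grammars ⊃ LR(0) grammars
LR(0) grammars: parse using LR(0) algorithm
[Figure: nested regions for the grammar classes above, with java, perl, python, … marked inside the hierarchy.]
to be continued…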