CYK Algorithm for Parsing General Context-Free Grammars.

Slides:



Advertisements
Similar presentations
Simplifications of Context-Free Grammars
Advertisements

Grammar vs Recursive Descent Parser
CPSC Compiler Tutorial 4 Midterm Review. Deterministic Finite Automata (DFA) Q: finite set of states Σ: finite set of “letters” (input alphabet)
Closure Properties of CFL's
CYK Parser Von Carla und Cornelia Kempa. Overview Top-downBottom-up Non-directional methods Unger ParserCYK Parser.
Exercise 1: Balanced Parentheses Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence)
Simplifying CFGs There are several ways in which context-free grammars can be simplified. One natural way is to eliminate useless symbols those that cannot.
About Grammars CS 130 Theory of Computation HMU Textbook: Sec 7.1, 6.3, 5.4.
Fall 2004COMP 3351 Simplifications of Context-Free Grammars.
Prof. Busch - LSU1 Simplifications of Context-Free Grammars.
How to find and remove unproductive rules in a grammar Roger L. Costello May 1, 2014 New! How to find and remove unreachable rules in a grammar.
FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY
ICE1341 Programming Languages Spring 2005 Lecture #5 Lecture #5 In-Young Ko iko.AT. icu.ac.kr iko.AT. icu.ac.kr Information and Communications University.
Courtesy Costas Buch - RPI1 Simplifications of Context-Free Grammars.
Costas Buch - RPI1 Simplifications of Context-Free Grammars.
CS Master – Introduction to the Theory of Computation Jan Maluszynski - HT Lecture 4 Context-free grammars Jan Maluszynski, IDA, 2007
1 Normal Forms for Context-free Grammars. 2 Chomsky Normal Form All productions have form: variable and terminal.
Parsing III (Eliminating left recursion, recursive descent parsing)
Costas Busch - RPI1 Context-Free Languages. Costas Busch - RPI2 Regular Languages.
1 Simplifications of Context-Free Grammars. 2 A Substitution Rule substitute B equivalent grammar.
1 Normal Forms for Context-free Grammars. 2 Chomsky Normal Form All productions have form: variable and terminal.
1 Simplifications of Context-Free Grammars. 2 A Substitution Rule Substitute Equivalent grammar.
1 Compilers. 2 Compiler Program v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; } Add v,v,0 cmp v,5 jmplt ELSE THEN: add x, 12,v.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Normal forms for Context-Free Grammars
January 15, 2014CS21 Lecture 61 CS21 Decidability and Tractability Lecture 6 January 16, 2015.
Cs466(Prasad)L8Norm1 Normal Forms Chomsky Normal Form Griebach Normal Form.
Fall 2004COMP 3351 Context-Free Languages. Fall 2004COMP 3352 Regular Languages.
Prof. Busch - LSU1 Context-Free Languages. Prof. Busch - LSU2 Regular Languages Context-Free Languages.
Compilation 2007 Context-Free Languages Parsers and Scanners Michael I. Schwartzbach BRICS, University of Aarhus.
INHERENT LIMITATIONS OF COMPUTER PROGRAMS CSci 4011.
Efficiency in Parsing Arbitrary Grammars. Parsing using CYK Algorithm 1) Transform any grammar to Chomsky Form, in this order, to ensure: 1.terminals.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 7 Mälardalen University 2010.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 5 Mälardalen University 2005.
The CYK Parsing Method Chiyo Hotani Tanya Petrova CL2 Parsing Course 28 November, 2007.
نظریه زبان ها و ماشین ها فصل دوم Context-Free Languages دانشگاه صنعتی شریف بهار 88.
Pushdown Automata (PDA) Intro
Context-Free Grammars Normal Forms Chapter 11. Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid.
The Pumping Lemma for Context Free Grammars. Chomsky Normal Form Chomsky Normal Form (CNF) is a simple and useful form of a CFG Every rule of a CNF grammar.
The CYK Algorithm Presented by Aalapee Patel Tyler Ondracek CS6800 Spring 2014.
CYK Algorithm for Parsing General Context-Free Grammars
Chapter 6 Simplification of Context-free Grammars and Normal Forms These class notes are based on material from our textbook, An Introduction to Formal.
Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.
Compiler Construction 2011 CYK Algorithm for Parsing General Context-Free Grammars
Closure Properties Lemma: Let A 1 and A 2 be two CF languages, then the union A 1  A 2 is context free as well. Proof: Assume that the two grammars are.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Bc. Jozef Lang (xlangj01) Bc. Zoltán Zemko (xzemko01) Increasing power of LL(k) parsers.
11 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 7 School of Innovation, Design and Engineering Mälardalen University 2012.
Computing if a token can follow first(B 1... B p ) = {a  | B 1...B p ...  aw } follow(X) = {a  | S ... ...Xa... } There exists a derivation from.
Context-Free Languages. Regular Languages Context-Free Languages.
Parsing using CYK Algorithm Transform grammar into Chomsky Form: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start.
Costas Busch - LSU1 Parsing. Costas Busch - LSU2 Compiler Program File v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; } Add v,v,5.
Structure and Ambiguity Removing Ambiguity Chomsky Normal Form Pushdown Automata Intro (who is he foolin', thinking that there will be time to get to this?)
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 2 Context-Free Languages Some slides are in courtesy.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.
Grammar Set of variables Set of terminal symbols Start variable Set of Production rules.
Exercises on Chomsky Normal Form and CYK parsing
Lecture # 10 Grammar Problems. Problems with grammar Ambiguity Left Recursion Left Factoring Removal of Useless Symbols These can create problems for.
1 Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Context-Free Languages
G. Pullaiah College of Engineering and Technology
Ambiguity Parsing algorithms
Simplifications of Context-Free Grammars
Simplifications of Context-Free Grammars
Parsing Costas Busch - LSU.
Kanat Bolazar February 16, 2010
Normal forms and parsing
Answer Questions about Exam2 problems
Presentation transcript:

CYK Algorithm for Parsing General Context-Free Grammars

Why Parse General Grammars Can be difficult or impossible to make grammar unambiguous –thus LL(k) and LR(k) methods cannot work, for such ambiguous grammars Some inputs are more complex than simple programming languages –mathematical formulas: x = y /\ z? (x=y) /\ z x = (y /\ z) –natural language: I saw the man with the telescope. –future programming languages

Ambiguity I saw the man with the telescope. 1) 2)

CYK Parsing Algorithm C: John CockeJohn Cocke and Jacob T. Schwartz (1970). Programming languages and their compilers: Preliminary notes. Technical report, Courant Institute of Mathematical Sciences, New York University.Courant Institute of Mathematical SciencesNew York University Y: Daniel H. Younger (1967). Recognition and parsing of context-free languages in time n 3. Information and Control 10(2): 189–208. K: T. KasamiT. Kasami (1965). An efficient recognition and syntax-analysis algorithm for context-free languages. Scientific report AFCRL , Air Force Cambridge Research Lab, Bedford, MA.Bedford, MA

Two Steps in the Algorithm 1)Transform grammar to normal form called Chomsky Normal Form (Noam Chomsky, mathematical linguist) 2)Parse input using transformed grammar dynamic programming algorithm “a method for solving complex problems by breaking them down into simpler steps. It is applicable to problems exhibiting the properties of overlapping subproblems” (>WP)

Balanced Parentheses Grammar Original grammar G S  “” | ( S ) | S S Modified grammar in Chomsky Normal Form: S  “” | S’ S’  N ( N S) | N ( N ) | S’ S’ N S)  S’ N ) N (  ( N )  ) Terminals: ( ) Nonterminals: S S’ N S) N ) N (

Idea How We Obtained the Grammar S  ( S ) S’  N ( N S) | N ( N ) N (  ( N S)  S’ N ) N )  ) Chomsky Normal Form transformation can be done fully mechanically

Dynamic Programming to Parse Input Assume Chomsky Normal Form, 3 types of rules: S  “” | S’ (only for the start non-terminal) N j  t(names for terminals) N i  N j N k (just 2 non-terminals on RHS) Decomposing long input: find all ways to parse substrings of length 1,2,3,… ((()())())(()) NiNi NjNj NkNk

Parsing an Input S’  N ( N S) | N ( N ) | S’ S’ N S)  S’ N ) N (  ( N )  ) N(N( N(N( N)N) N(N( N)N) N(N( N)N) N)N) ambiguity (()()())

Parsing an Input S’  N ( N S) | N ( N ) | S’ S’ N S)  S’ N ) N (  ( N )  ) N(N( N(N( N)N) N(N( N)N) N(N( N)N) N)N) ambiguity (()()())

(()()()) Algorithm Idea S’  S’ S’ 1 N(N( N(N( N)N) N(N( N)N) N(N( N)N) N)N) w pq – substring from p to q d pq – all non-terminals that could expand to w pq Initially d pp has N w(p,p) key step of the algorithm: if X  Y Z is a rule, Y is in d p r, and Z is in d (r+1)q then put X into d pq (p r < q), in increasing value of (q-p) w pq – substring from p to q d pq – all non-terminals that could expand to w pq Initially d pp has N w(p,p) key step of the algorithm: if X  Y Z is a rule, Y is in d p r, and Z is in d (r+1)q then put X into d pq (p r < q), in increasing value of (q-p)

Algorithm INPUT: grammar G in Chomsky normal form word w to parse using G OUTPUT: true iff (w in L(G))true N = |w| varvar d : Array[N][N] for p = 1 to N { d(p)(p) = {X | G contains X->w(p)} for q in {p N} d(p)(q) = {} }for for k = 2 to N // substring length for p = 0 to N-k // initial position for j = 1 to k-1 // length of first halffor val r = p+j-1; val q = p+k-1; for (X::=Y Z) in G if Y in d(p)(r) and Z in d(r+1)(q) d(p)(q) = d(p)(q) union {X} return S in d(0)(N-1)forif return (()()()) What is the running time as a function of grammar size and the size of input? O( ) What is the running time as a function of grammar size and the size of input? O( )

Parsing another Input S’  N ( N S) | N ( N ) | S’ S’ N S)  S’ N ) N (  ( N )  ) ()()()() N(N( N)N) N(N( N)N) N(N( N)N) N(N( N)N)

Number of Parse Trees Let w denote word ()()() –it has two parse trees Give a lower bound on number of parse trees of the word w n (n is positive integer) w 5 is the word ()()() ()()() ()()() ()()() ()()() CYK represents all parse trees compactly –can re-run algorithm to extract first parse tree, or enumerate parse trees one by one

Transforming to Chomsky Form Steps: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start nullable symbols) 4.remove single non-terminal productions X::=Y 5.transform productions of arity more than two 6.make terminals occur alone on right-hand side

1) Unproductive non-terminals What is funny about this grammar: stmt ::= identifier := identifier | while (expr) stmt | if (expr) stmt else stmt expr ::= term + term | term – term term ::= factor * factor factor ::= ( expr ) There is no derivation of a sequence of tokens from expr Why? In every step will have at least one expr, term, or factor If it cannot derive sequence of tokens we call it unproductive How to compute them?

1) Unproductive non-terminals Productive symbols are obtained using these two rules (what remains is unproductive) –Terminals are productive –If X::= s 1 s 2 … s n is rule and each s i is productive then X is productive stmt ::= identifier := identifier | while (expr) stmt | if (expr) stmt else stmt expr ::= term + term | term – term term ::= factor * factor factor ::= ( expr ) program ::= stmt | stmt program Delete unproductive symbols. Will the meaning of top-level symbol (program) change?

2) Unreachable non-terminals What is funny about this grammar with starting terminal ‘program’ program ::= stmt | stmt program stmt ::= assignment | whileStmt assignment ::= expr = expr ifStmt ::= if (expr) stmt else stmt whileStmt ::= while (expr) stmt expr ::= identifier No way to reach symbol ‘ifStmt’ from ‘program’ What is the general algorithm?

2) Unreachable non-terminals Reachable terminals are obtained using the following rules (the rest are unreachable) –starting non-terminal is reachable (program) –If X::= s 1 s 2 … s n is rule and X is reachable then each non-terminal among s 1 s 2 … s n is reachable Delete unreachable symbols. Will the meaning of top-level symbol (program) change?

3) Removing Empty Strings Ensure only top-level symbol can be nullable program ::= stmtSeq stmtSeq ::= stmt | stmt ; stmtSeq stmt ::= “” | assignment | whileStmt | blockStmt blockStmt ::= { stmtSeq } assignment ::= expr = expr whileStmt ::= while (expr) stmt expr ::= identifier How to do it in this example?

3) Removing Empty Strings - Result program ::= “” | stmtSeq stmtSeq ::= stmt| stmt ; stmtSeq | | ; stmtSeq | stmt ; | ; stmt ::= assignment | whileStmt | blockStmt blockStmt ::= { stmtSeq } | { } assignment ::= expr = expr whileStmt ::= while (expr) stmt whileStmt ::= while (expr) expr ::= identifier

3) Removing Empty Strings - Algorithm Compute the set of nullable non-terminals Add extra rules –If X::= s 1 s 2 … s n is rule then add new rules of form X::= r 1 r 2 … r n where r i is either s i or, if Remove all empty right-hand sides If starting symbol S was nullable, then introduce a new start symbol S’ instead, and add rule S’ ::= S | “” s i is nullable then r i can also be the empty string (so it disappears)

3) Removing Empty Strings Since stmtSeq is nullable, the rule blockStmt ::= { stmtSeq } gives blockStmt ::= { stmtSeq } | { } Since stmtSeq and stmt are nullable, the rule stmtSeq ::= stmt | stmt ; stmtSeq gives stmtSeq ::= stmt | stmt ; stmtSeq | ; stmtSeq | stmt ; | ;

4) Eliminating single productions Single production is of the form X ::=Y where X,Y are non-terminals program ::= stmtSeq stmtSeq ::= stmt | stmt ; stmtSeq stmt ::= assignment | whileStmt assignment ::= expr = expr whileStmt ::= while (expr) stmt

4) Eliminate single productions - Result Generalizes removal of epsilon transitions from non-deterministic automata program ::= expr = expr | while (expr) stmt | stmt ; stmtSeq stmtSeq ::= expr = expr | while (expr) stmt | stmt ; stmtSeq stmt ::= expr = expr | while (expr) stmt assignment ::= expr = expr whileStmt ::= while (expr) stmt

4) “Single Production Terminator” If there is single production X ::=Y put an edge (X,Y) into graph If there is a path from X to Z in the graph, and there is rule Z ::= s 1 s 2 … s n then add rule program ::= expr = expr | while (expr) stmt | stmt ; stmtSeq stmtSeq ::= expr = expr | while (expr) stmt | stmt ; stmtSeq stmt ::= expr = expr | while (expr) stmt X ::= s 1 s 2 … s n At the end, remove all single productions.

5) No more than 2 symbols on RHS stmt ::= while (expr) stmt becomes stmt ::= while stmt 1 stmt 1 ::= ( stmt 2 stmt 2 ::= expr stmt 3 stmt 3 ::= ) stmt

6) A non-terminal for each terminal stmt ::= while (expr) stmt becomes stmt ::= N while stmt 1 stmt 1 ::= N ( stmt 2 stmt 2 ::= expr stmt 3 stmt 3 ::= N ) stmt N while ::= while N ( ::= ( N ) ::= )

Parsing using CYK Algorithm Transform grammar into Chomsky Form: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start nullable symbols) 4.remove single non-terminal productions X::=Y 5.transform productions of arity more than two 6.make terminals occur alone on right-hand side Have only rules X ::= Y Z, X ::= t, and possibly S ::= “” Apply CYK dynamic programming algorithm