The Cocke-Kasami-Younger Algorithm


The Cocke-Kasami-Younger Algorithm
An example of bottom-up parsing, for a CFG in Chomsky normal form (CNF).
G : S → AB | BB
    A → CC | AB | a
    B → BB | CA | b
    C → BA | AA | b
There are 2 possibilities for the first production: S → AB and S → BB.
Possible splits for the string aabb: a | abb, aa | bb, aab | b.
[Figure: for each of the two first productions, the three splits of aabb drawn as partial derivation trees.]

The CKYounger Algorithm
Provides an efficient way of generating substring divisions and checking whether each substring can be legally derived.
A nonterminal is placed in cell (i,j) if it can derive the i consecutive symbols of the string starting at the jth position.
If cell (i,j) contains the nonterminal A1, cell (i',i+j) contains the nonterminal A2, and there is a production A → A1 A2, then cell (i+i',j) will contain the nonterminal A.
Thus if cell (4,1) contains S, the string is in L(G).
Pyramid of cells for the string aabb:
          4,1
        3,1 3,2
      2,1 2,2 2,3
    1,1 1,2 1,3 1,4
     a   a   b   b
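The cell indexing above can be made concrete with a small sketch. The helper names cell_substring and cell_splits are illustrative assumptions, not part of the slides; positions are 1-based, as in the pyramid.

```python
def cell_substring(w, i, j):
    """Substring covered by cell (i, j): i symbols starting at position j (1-based)."""
    return w[j - 1 : j - 1 + i]


def cell_splits(i, j):
    """Pairs of lower cells (m, j) and (i - m, j + m) that combine into cell (i, j)."""
    return [((m, j), (i - m, j + m)) for m in range(1, i)]


# The top cell (4,1) covers the whole string aabb and has three splits,
# matching a | abb, aa | bb and aab | b.
w = "aabb"
for left, right in cell_splits(4, 1):
    print(left, cell_substring(w, *left), "+", right, cell_substring(w, *right))
```

Each split pairs a cell in a lower row with a second lower cell that starts exactly where the first one ends, which is the (i,j), (i',i+j) pattern stated in the slide.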

The CKYounger Algorithm
Provides an efficient way of generating substring divisions and checking whether each substring can be legally derived.
G : S → AB | BB, A → CC | AB | a, B → BB | CA | b, C → BA | AA | b
A nonterminal is placed in cell (i,j) if it can derive the i consecutive symbols of the string starting at the jth position.
[Figure: the pyramid of cells (4,1); (3,1), (3,2); (2,1), (2,2), (2,3); (1,1), (1,2), (1,3), (1,4) above the input string.]

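The Cocke-Kasami-Younger Algorithm
Relation between derivation tree and pyramid
[Figure: for the first production S → AB, the three splits a | abb, aa | bb, aab | b, each shown as a derivation tree next to the matching cells of the pyramid.]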

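[Figure: for the first production S → BB, the three splits a | abb, aa | bb, aab | b, each shown as a derivation tree next to the matching cells of the pyramid.]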

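The Cocke-Kasami-Younger Algorithm (5/3/2019)
Builds up the pyramid in a bottom-up fashion.
G : S → AB | BB, A → CC | AB | a, B → BB | CA | b, C → BA | AA | b
Step 1, fill the cells at row 1:
A is in cells (1,1) and (1,2) because of A → a.
B and C are in cells (1,3) and (1,4) because of B → b and C → b.
Row 1: A | A | B,C | B,C above the string a a b b.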
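Step 1 can be sketched in a few lines. The names TERMINAL_RULES and fill_row_1 are assumptions made for illustration; the rules themselves are the terminal productions of G.

```python
# Terminal productions of G, written as (nonterminal, terminal) pairs.
TERMINAL_RULES = [("A", "a"), ("B", "b"), ("C", "b")]


def fill_row_1(w):
    """Return row 1 of the pyramid: one set of nonterminals per input symbol."""
    return [{lhs for lhs, terminal in TERMINAL_RULES if terminal == x} for x in w]


row1 = fill_row_1("aabb")
for position, cell in enumerate(row1, start=1):
    print((1, position), sorted(cell))
```

For aabb this gives A in the first two cells and B, C in the last two, as on the slide.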

The Cocke-Kasami-Younger Algorithm
Builds up the pyramid in a bottom-up fashion.
G : S → AB | BB, A → CC | AB | a, B → BB | CA | b, C → BA | AA | b
Step 2, fill the cells at row 2:
C is in cell (2,1) because of C → AA, A is in cell (1,1) and A is in cell (1,2).
A is in cell (2,2) because of A → AB, A is in cell (1,2) and B is in cell (1,3).
S is in cell (2,3) because of S → BB, B is in cell (1,3) and B is in cell (1,4).
B is in cell (2,3) because of B → BB, B is in cell (1,3) and B is in cell (1,4).
By the same reasoning, S is also in cell (2,2) because of S → AB, and A is also in cell (2,3) because of A → CC with C in cells (1,3) and (1,4).

The Cocke-Kasami-Younger Algorithm
Builds up the pyramid in a bottom-up fashion.
G : S → AB | BB, A → CC | AB | a, B → BB | CA | b, C → BA | AA | b
Step 3, fill the cells at row 3:
? is in cell (3,1) because of ? → XY, where X is in cell (1,1) and Y is in cell (2,2), or X is in cell (2,1) and Y is in cell (1,3).
C is in cell (3,1) because of C → AA, A is in cell (1,1) and A is in cell (2,2).
A is in cell (3,1) because of A → CC, C is in cell (2,1) and C is in cell (1,3).
Row 3: A,C in cell (3,1) and S,A,C in cell (3,2).

The Cocke-Kasami-Younger Algorithm
Builds up the pyramid in a bottom-up fashion.
G : S → AB | BB, A → CC | AB | a, B → BB | CA | b, C → BA | AA | b
Step 4, fill the cell at row 4:
Cell (4,1) contains A,B,C,S. Since S is at the top, aabb ∈ L(G).
General rule, step i: ? is in cell (i,j) because of ? → XY, where X is in cell (m,j) and Y is in cell (i-m,j+m), with 1 ≤ m ≤ i-1.
[Figure: a derivation tree for aabb with root S and inner nodes A, B, C, C, A, A above the leaves a, a, b, b.]
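The general rule translates almost directly into code. The following is a minimal sketch, assuming the grammar is stored as plain tuples; the names TERMINAL_RULES, BINARY_RULES, cky_table and accepts are illustrative and do not come from the slides.

```python
# Grammar G from the slides, split into terminal rules A -> a
# and binary rules A -> X Y (written as (A, X, Y)).
TERMINAL_RULES = {("A", "a"), ("B", "b"), ("C", "b")}
BINARY_RULES = {
    ("S", "A", "B"), ("S", "B", "B"),
    ("A", "C", "C"), ("A", "A", "B"),
    ("B", "B", "B"), ("B", "C", "A"),
    ("C", "B", "A"), ("C", "A", "A"),
}


def cky_table(w):
    """Fill the pyramid bottom-up; cell[(i, j)] covers i symbols from position j."""
    n = len(w)
    cell = {}
    # Row 1: nonterminals that derive a single symbol.
    for j in range(1, n + 1):
        cell[(1, j)] = {a for a, t in TERMINAL_RULES if t == w[j - 1]}
    # Rows 2..n: combine cell (m, j) with cell (i - m, j + m), 1 <= m <= i - 1.
    for i in range(2, n + 1):
        for j in range(1, n - i + 2):
            found = set()
            for m in range(1, i):
                left, right = cell[(m, j)], cell[(i - m, j + m)]
                for a, x, y in BINARY_RULES:
                    if x in left and y in right:
                        found.add(a)
            cell[(i, j)] = found
    return cell


def accepts(w, start="S"):
    """Membership test: is the start symbol in the top cell (n, 1)?"""
    if not w:
        return False  # this sketch assumes G has no epsilon production
    return start in cky_table(w)[(len(w), 1)]


table = cky_table("aabb")
print(sorted(table[(4, 1)]))  # ['A', 'B', 'C', 'S']
print(accepts("aabb"))        # True
```

Storing the cells in a dictionary keyed by (i,j) keeps the code close to the slide notation; an array-based table would work equally well.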

The CKY algorithm is correct
Theorem: Given a grammar (T, N, P, S) in Chomsky normal form and w = x_1 ... x_n ∈ T*, then A ∈ N is in cell (i,j) of the CKY pyramid if and only if A ⇒* x_j ... x_{j+i-1}.
Proof by induction on the row number.
Base step, i = 1: in row 1 we get the nonterminals from which the length 1 substrings of the string to parse can be derived. This is only possible by using productions of type A → a. Thus A is in cell (1,j), 1 ≤ j ≤ n, if and only if A → x_j ∈ P, which holds if and only if A ⇒* x_j.
Induction hypothesis: the theorem applies for all rows < i, i.e. for all substrings of length < i.

Induction step
We first prove ⇐ (if A derives the substring, then A is in the cell). Assume a derivation of a substring of length i, i > 1: A ⇒ BC ⇒* x_j ... x_{j+i-1}. Then for some m, 1 ≤ m ≤ i-1, there must hold that B ⇒* x_j ... x_{j+m-1} and C ⇒* x_{j+m} ... x_{j+i-1}. Thus by the induction hypothesis B is in cell (m,j) and C is in cell (i-m,j+m). Since there is a production A → BC, A is in cell (i,j).
We now prove ⇒ (if A is in the cell, then A derives the substring). Assume A is in cell (i,j) with i > 1. By the construction of the pyramid there must be a production of the form A → BC with B,C ∈ N, and some m, 1 ≤ m ≤ i-1, such that B is in cell (m,j) and C is in cell (i-m,j+m). Both cells have a lower row number, so the induction hypothesis applies: B ⇒* x_j ... x_{j+m-1} and C ⇒* x_{j+m} ... x_{j+i-1}. Therefore we can write A ⇒ BC ⇒* x_j ... x_{j+i-1} and conclude A ⇒* x_j ... x_{j+i-1}.

The complexity of the CKY algorithm
What is the time complexity of deciding w ∈ L(G)?
Let G = (T, N, P, S) be a CFG in Chomsky normal form, with k = #N. Then using the CKY algorithm, w ∈ L(G) can be decided in time proportional to n^3, where n = |w|.
Proof: First notice that the number of entries in a cell is at most k, since each nonterminal can only occur once in a cell, and that the maximum number of productions A → BC is k^3.
I. Complexity for row 1 cells
For each A ∈ N, we have to check whether it can be placed in cell (1,i), i.e. whether A derives (in 1 step) the terminal at position i. There are k nonterminals, thus the cost per cell is k × 1. There are n row 1 cells, thus the total cost for row 1 is kn.

II. Complexity for a cell in a row > 1
The content of a cell is the result of at most n-1 pairings of lower cells. For each pairing, at most k nonterminals are paired with at most k other nonterminals, and each pair is checked against at most k^3 productions. Thus for each cell: cost ≤ k × k × k^3 × 1 × (n-1) = k^5 × (n-1).
There are (n-1) + (n-2) + ... + 1 = n(n-1)/2 cells in rows 2 to n, thus the total cost for these rows is bounded above by n(n-1)/2 × k^5 × (n-1).
To conclude: the total cost is bounded above by kn + n(n-1)/2 × k^5 × (n-1). (See slide 119.)
Since k is independent of n, the conclusion is O(n^3).
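The counting argument can be checked numerically. This is a small sketch with hypothetical helper names (pairings, slide_bound): it counts the cell pairings actually examined in rows 2 to n and compares them with the coarser estimate used in the proof, which charges n-1 pairings to every cell.

```python
def pairings(n):
    """Exact number of (cell, split point m) combinations in rows 2..n.

    Row i has n - i + 1 cells, and each of them has i - 1 split points.
    """
    return sum((n - i + 1) * (i - 1) for i in range(2, n + 1))


def slide_bound(n, k):
    """Upper bound from the proof: k*n + n(n-1)/2 * k^5 * (n-1)."""
    return k * n + (n * (n - 1) // 2) * k**5 * (n - 1)


# For the example grammar G, k = 4 nonterminals (S, A, B, C).
for n in (4, 8, 16, 32):
    print(n, pairings(n), n * (n - 1) // 2 * (n - 1), slide_bound(n, 4))
```

The exact pairing count is n(n-1)(n+1)/6, so both it and the bound from the proof grow as n^3; the bound is simply looser by a constant factor.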

Some remarks
- Not really of practical use, since O(n^3) is too slow.
- The grammar must be converted to CNF.
- It only tests membership; this is not the complexity for building the derivation tree.
- See course on compilers for faster algorithms.
- Semantics!
To think about: CKY and unambiguous grammars.