Chomsky Normal Form CYK Algorithm

Normal Forms
There are some special forms into which we can bring a grammar so that it is easier to work with: Chomsky Normal Form and Greibach Normal Form.

Chomsky Normal Form
A context-free grammar is in Chomsky Normal Form (CNF) if every rule has one of the forms
A ⟶ BC
A ⟶ a
S ⟶ ε
where A, B, C are variables (B and C are not the start variable), a is a terminal, and S is the start variable: the rule S ⟶ ε is allowed only for the start variable.
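
As a concrete reading of the definition, the sketch below checks the three allowed rule shapes in Python. It assumes a representation that is not part of the slides: a dict mapping each variable to a list of right-hand sides, each a tuple of symbols, with the empty tuple () standing for ε; any symbol that is not a key counts as a terminal. The name is_cnf is hypothetical.

```python
# Assumed representation: each variable maps to a list of right-hand sides,
# each a tuple of symbols; () stands for ε. Symbols that are not keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def is_cnf(grammar: Grammar, start: str) -> bool:
    """Return True if every rule has the form A -> BC, A -> a or S -> ε."""
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 0:
                # Only the start variable may produce ε.
                if head != start:
                    return False
            elif len(body) == 1:
                # A single symbol on the right must be a terminal.
                if body[0] in grammar:
                    return False
            elif len(body) == 2:
                # Both symbols must be variables, and neither may be the start variable.
                if any(sym not in grammar or sym == start for sym in body):
                    return False
            else:
                # Right-hand sides longer than two symbols are not allowed.
                return False
    return True
```

Checking the example grammar S ⟶ CSC | B, C ⟶ 00 | ε, B ⟶ 01B | 1 returns False, since S ⟶ CSC has three symbols and C ⟶ ε is an ε-rule for a non-start variable.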

Chomsky Normal Form
There are 5 steps to follow in order to transform a grammar into CNF:
1. Add a new start variable S0 and the rule S0 ⟶ S.
2. Eliminate the ε-rules.
3. Eliminate the unary productions A ⟶ B.
4. Add a rule Vt ⟶ t for every terminal t and replace t with the variable Vt in every rule whose right-hand side has more than one symbol.
5. Transform the remaining rules to the form A ⟶ BC (A, B, C variables).

1. Add a new start variable
We have to make sure that the start variable does not occur on the right-hand side of any rule. Thus, we add a new start variable S0 and the rule S0 ⟶ S, where S is the old start variable.

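2. Eliminate ε-rules
We have to eliminate all productions of the form A ⟶ ε, where A is any non-start variable. To do so, we remove the rule A ⟶ ε and, for every rule with A on its right-hand side, add a copy of that rule with the occurrence of A deleted (one copy for each combination of occurrences when A appears several times), keeping the original rule as well. If this creates a new ε-rule B ⟶ ε for a non-start variable B, we repeat the process for it, unless B ⟶ ε was already removed earlier.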
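
The ε-rule step can be sketched in Python as follows, assuming a dict that maps each variable to a list of right-hand-side tuples, with () for ε. Instead of removing ε-rules one at a time, this common variant first computes the nullable variables (those that can derive ε) and then adds every version of each rule with some nullable occurrences deleted, which covers the "repeat for new ε-rules" case in one pass. The name eliminate_epsilon is hypothetical.

```python
from itertools import product


def eliminate_epsilon(grammar, start):
    """Remove the ε-rules A -> ε for every non-start variable A."""
    # Nullable variables derive ε; find them as a fixed point.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable occurrence may be kept or dropped; other symbols are kept.
            choices = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
            for picked in product(*choices):
                variant = tuple(sym for part in picked for sym in part)
                if variant == () and head != start:
                    continue  # ε-rules survive only for the start variable
                if variant not in new_bodies:
                    new_bodies.append(variant)
        result[head] = new_bodies
    return result
```

On the example grammar after step 1, only C is nullable, and the rules for S become CSC | CS | SC | S | B.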

3. Eliminate unary productions
A unary production is a production of the form A ⟶ B, where both A and B are variables. The only rules involving just variables should be of the form V1 ⟶ V2V3, so unary productions have to be eliminated. To do so, we remove A ⟶ B and, for every rule B ⟶ u, add the rule A ⟶ u, unless it is a unary rule that was already removed. A rule of the form A ⟶ A is simply deleted.
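
A minimal Python sketch of this step, assuming the same representation (variable → list of right-hand-side tuples). Rather than substituting one rule at a time, it computes for each variable A all variables reachable from A through unary rules and gives A every non-unary right-hand side of those variables; the outcome is the same as repeating the substitution until no unary rule is left. The name eliminate_unit is hypothetical.

```python
def eliminate_unit(grammar):
    """Replace unary rules A -> B by the non-unary rules of every B reachable from A."""

    def is_unit(body):
        return len(body) == 1 and body[0] in grammar

    result = {}
    for head in grammar:
        # Variables reachable from head using only unary rules (head included).
        reach, stack = {head}, [head]
        while stack:
            var = stack.pop()
            for body in grammar[var]:
                if is_unit(body) and body[0] not in reach:
                    reach.add(body[0])
                    stack.append(body[0])
        new_bodies = []
        for var in grammar:  # grammar order keeps the output deterministic
            if var in reach:
                for body in grammar[var]:
                    if not is_unit(body) and body not in new_bodies:
                        new_bodies.append(body)
        result[head] = new_bodies
    return result
```

Applied to the grammar obtained after step 2 of the example, it gives both S0 and S the right-hand sides CSC | CS | SC | 01B | 1, the same rules as in the example up to order.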

4. Add Vt ⟶ t and replace t with Vt
Terminals may only appear in rules of the form A ⟶ t, so they must disappear from every rule whose right-hand side has more than one symbol. To do so, we add a new variable Vt and the rule Vt ⟶ t for every terminal t, and we replace every appearance of t with Vt, except in rules of the form A ⟶ t.
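
This step can be sketched in Python as below, assuming the dict-of-tuples representation used for the other steps. The default naming scheme "T_" + t for the new variable Vt is a hypothetical choice, assumed not to clash with existing variable names; the name parameter lets you choose the names yourself. The function name replace_terminals is also hypothetical.

```python
def replace_terminals(grammar, name=lambda t: "T_" + t):
    """Give every terminal t a variable V_t -> t and use it in rules of length >= 2."""
    result = {head: [] for head in grammar}
    term_var = {}  # terminal -> the variable that produces it
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) < 2:
                result[head].append(body)  # A -> t (and S -> ε) stay as they are
                continue
            new_body = []
            for sym in body:
                if sym not in grammar:  # sym is a terminal
                    sym = term_var.setdefault(sym, name(sym))
                new_body.append(sym)
            result[head].append(tuple(new_body))
    for terminal, var in term_var.items():
        result[var] = [(terminal,)]
    return result
```

Passing {"0": "Z", "1": "A"}.get as name reproduces the example's choice of Z ⟶ 0 and A ⟶ 1.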

5. Transform rules to A ⟶ BC
All rules involving only variables should be of the form A ⟶ BC, so we have to take care of every rule with more than 2 variables on the right-hand side. For a rule V ⟶ A1A2A3…An, we shorten the right-hand side step by step: we replace the first two variables with a new variable, add a rule for that new variable, and repeat until only two variables remain. This creates n-2 new variables.

5. Transform rules to A ⟶ BC
Starting from V ⟶ A1 A2 A3 A4 A5 A6 … An, each step replaces the first two symbols with a new variable:
V ⟶ B1 A3 A4 A5 A6 … An, with B1 ⟶ A1 A2
V ⟶ B2 A4 A5 A6 … An, with B2 ⟶ B1 A3
V ⟶ B3 A5 A6 … An, with B3 ⟶ B2 A4
V ⟶ B4 A6 … An, with B4 ⟶ B3 A5
…
The final result is:
V ⟶ B(n-2) An
B(n-2) ⟶ B(n-3) A(n-1)
…
B4 ⟶ B3 A5
B3 ⟶ B2 A4
B2 ⟶ B1 A3
B1 ⟶ A1 A2
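
The chain construction can be sketched in Python as follows, again assuming a dict mapping each variable to a list of right-hand-side tuples. The fresh names B1, B2, … (built from the prefix parameter) are hypothetical and assumed not to clash with existing variables. Like the example, where D ⟶ CS serves both S0 and S, the sketch reuses one new variable for every occurrence of the same leading pair; this is safe because each new variable has exactly one rule. The name binarize is hypothetical.

```python
def binarize(grammar, prefix="B"):
    """Split every right-hand side longer than 2 into a chain of binary rules."""
    result = {head: [] for head in grammar}
    pair_var = {}  # one new variable per distinct leading pair
    for head, bodies in grammar.items():
        for body in bodies:
            body = tuple(body)
            while len(body) > 2:
                pair = body[:2]
                if pair not in pair_var:
                    # Hypothetical fresh name, assumed unused in the grammar.
                    pair_var[pair] = f"{prefix}{len(pair_var) + 1}"
                    result[pair_var[pair]] = [pair]
                # Replace the first two symbols by the variable that produces them.
                body = (pair_var[pair],) + body[2:]
            result[head].append(body)
    return result
```

For V ⟶ A1 A2 A3 A4 A5 the sketch returns V ⟶ B3 A5 with B1 ⟶ A1 A2, B2 ⟶ B1 A3 and B3 ⟶ B2 A4, that is n-2 = 3 new variables.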

Example
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1

Example: 1. Add a new start variable
S0 ⟶ S
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1

Example: 2. Eliminate ε-rules
The only ε-rule is C ⟶ ε. We remove it and, since C appears twice in S ⟶ CSC, add every way of deleting occurrences of C: CS, SC and S.
S0 ⟶ S
S ⟶ CSC | B | CS | SC | S
C ⟶ 00
B ⟶ 01B | 1

Example: 3. Eliminate unary productions
The unary productions are S ⟶ S, S ⟶ B and S0 ⟶ S. We first delete S ⟶ S, which adds nothing:
S0 ⟶ S
S ⟶ CSC | B | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Next we replace S ⟶ B with the right-hand sides of the rules for B:
S0 ⟶ S
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Finally we replace S0 ⟶ S with the right-hand sides of the rules for S:
S0 ⟶ CSC | 01B | 1 | CS | SC
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1

Example: 4. Create Vt for every terminal t
We add Z ⟶ 0 and replace 0 with Z in every rule with more than one symbol on the right:
S0 ⟶ CSC | Z1B | 1 | CS | SC
S ⟶ CSC | Z1B | 1 | CS | SC
C ⟶ ZZ
B ⟶ Z1B | 1
Z ⟶ 0
Then we add A ⟶ 1 and do the same for 1; the rules S0 ⟶ 1, S ⟶ 1 and B ⟶ 1 stay as they are:
S0 ⟶ CSC | ZAB | 1 | CS | SC
S ⟶ CSC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z ⟶ 0
A ⟶ 1

Example: 5. Take care of long rules
The right-hand sides CSC and ZAB have three variables. We add D ⟶ CS and replace CSC with DC:
S0 ⟶ DC | ZAB | 1 | CS | SC
S ⟶ DC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z ⟶ 0
A ⟶ 1
D ⟶ CS
Then we add E ⟶ ZA and replace ZAB with EB. The grammar is now in CNF:
S0 ⟶ DC | EB | 1 | CS | SC
S ⟶ DC | EB | 1 | CS | SC
C ⟶ ZZ
B ⟶ EB | 1
Z ⟶ 0
A ⟶ 1
D ⟶ CS
E ⟶ ZA

CYK Introduction
Problem: given a context-free grammar and a string s, can we decide whether s is generated by the grammar? If the grammar is in an arbitrary form, a direct search over derivations can be very inefficient. If the grammar is in Chomsky Normal Form, there is an elegant algorithm for this test, the CYK (Cocke-Younger-Kasami) algorithm, which for a fixed grammar runs in time O(n³) in the length n of the string.

The CYK algorithm
Suppose that we are given a grammar in Chomsky Normal Form:
S → AB
A → BB | 0
B → AA | 1
We would like to see whether 10110 is generated by this grammar.

Substrings of length 1
Since the only way to produce terminals is through rules of the form A → a, we just replace every terminal with the variables that produce it:
1  0  1  1  0
B  A  B  B  A
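
In code, filling this first row is a single lookup per terminal. A minimal Python sketch, assuming the grammar is stored as a dict from each variable to a list of right-hand-side tuples (so A → 0 is the tuple ("0",)); length_one_row is a hypothetical name.

```python
def length_one_row(grammar, word):
    """For each terminal of word, the set of variables A with a rule A -> terminal."""
    return [
        {head for head, bodies in grammar.items() if (ch,) in bodies}
        for ch in word
    ]
```

For the grammar above and the word 10110 it returns [{B}, {A}, {B}, {B}, {A}], the first row of the table.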

Substrings of length 2
Now we want to see how every substring of length 2 can be generated. This is equivalent to finding ways to produce every length-2 substring in which the terminals are replaced with the variables that produce them. Since every rule producing two symbols has the form A → BC, it suffices to find, for every two consecutive cells, the variables that produce that pair. Only AB (produced by S) and BB (produced by A) occur:
1  0  1  1  0
B  A  B  B  A
-  S  A  -
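
The pairing step can be sketched in Python as below, assuming the same dict-of-tuples grammar representation; a cell is a set of variables, and an empty set plays the role of "-". Both function names are hypothetical.

```python
def producers(grammar, left, right):
    """Variables V with a rule V -> XY where X is in left and Y is in right."""
    return {
        head
        for head, bodies in grammar.items()
        for body in bodies
        if len(body) == 2 and body[0] in left and body[1] in right
    }


def length_two_row(grammar, row1):
    """Combine each pair of neighbouring cells of the length-1 row."""
    return [producers(grammar, row1[i], row1[i + 1]) for i in range(len(row1) - 1)]
```

Applied to the first row [{B}, {A}, {B}, {B}, {A}], length_two_row gives [∅, {S}, {A}, ∅], that is the row - S A -.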

Substrings of length 3
To produce the substring 101 (the first three symbols of 10110), we can either take 1 with 01 or 10 with 1. The first gives BS, which no variable produces; the second gives no pair at all, since 10 cannot be produced. So this cell stays empty.
To produce the substring 011, we can either take 0 with 11 or 01 with 1. The first gives AA, which is produced by B; the second gives SB, which no variable produces.
To produce the substring 110, we can either take 1 with 10 or 11 with 0. The first gives no pair, since 10 cannot be produced by a variable; the second gives AA, which is produced by B.
1  0  1  1  0
B  A  B  B  A
-  S  A  -
-  B  B

Substrings of length 4
To produce the substring 1011, we can take 1 with 011, 10 with 11, or 101 with 1. The first gives BB, which is produced by A; the other two give no pair, since neither 10 nor 101 can be produced.
To produce the substring 0110, we can take 0 with 110, 01 with 10, or 011 with 0. The first gives AB, which is produced by S; the second gives no pair, since 10 cannot be produced; the third gives BA, which no variable produces.
1  0  1  1  0
B  A  B  B  A
-  S  A  -
-  B  B
A  S

Combine previous solutions
To produce the whole string 10110, we can take 1 with 0110, 10 with 110, 101 with 10, or 1011 with 0. The first gives BS, which no variable produces; the second and third give no pair, since 10 and 101 cannot be produced; the last gives AA, which is produced by B.
1  0  1  1  0
B  A  B  B  A
-  S  A  -
-  B  B
A  S
B

Answer
If the last row contains the start variable S, the string is generated, and we can find a derivation by tracing backwards how S was produced. In our example, 10110 cannot be generated, since the last row contains only B and not S.
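
Putting the rows together gives the whole membership test. The Python sketch below assumes the same dict-of-tuples grammar representation; table[l][i] holds the variables that derive the substring of length l starting at index i (0-based), so table[l] corresponds to the l-th row of the table on the slides. The function names are hypothetical. The three nested loops over length, start position and split point are what give the O(n³) behaviour for a fixed grammar.

```python
def cyk_table(grammar, word):
    """table[l][i] holds the variables that derive word[i:i + l] (rows l = 1..n)."""
    n = len(word)
    table = [[] for _ in range(n + 1)]  # row 0 is unused
    table[1] = [{h for h, bodies in grammar.items() if (ch,) in bodies} for ch in word]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            # Split word[i:i + length] into a prefix of length k and the rest.
            for k in range(1, length):
                left, right = table[k][i], table[length - k][i + k]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            cell.add(head)
            table[length].append(cell)
    return table


def cyk_accepts(grammar, start, word):
    """The word is generated iff the start variable is in the cell of the last row."""
    if not word:
        return () in grammar.get(start, [])  # only S -> ε can produce the empty string
    return start in cyk_table(grammar, word)[len(word)][0]
```

With the example grammar stored as {"S": [("A", "B")], "A": [("B", "B"), ("0",)], "B": [("A", "A"), ("1",)]}, cyk_accepts returns False for 10110 and True for 10111.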

Mechanical way
Now that we have shown why this method works, let's give a simple mechanical way to compute the table.

Mechanical way
Suppose that we are about to fill in a cell, for example the cell for 0110 (row 4, second position). We pair the cells in the same column above it, read from the top row downwards, with the cells on the diagonal to its right, read from the cell just above and to the right of it up to the top row. Here this gives the pairs AB, S- and BA. Only AB is produced by a variable, namely S, so the cell gets S:
1  0  1  1  0
B  A  B  B  A
-  S  A  -
-  B  B
A  S
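
The pairing rule can be written down with explicit indices. In this Python sketch a cell is addressed as (row, position), both 1-based, where the row is the substring length and the position is its starting index in the string; this addressing and the name cell_pairs are assumptions for illustration.

```python
def cell_pairs(length, i):
    """Pairs of cells combined when filling row `length`, position `i` (both 1-based).

    The first cell of each pair is in the same column (row k, position i);
    the second is in row length - k, k positions to the right.
    """
    return [((k, i), (length - k, i + k)) for k in range(1, length)]
```

For the cell of 0110, cell_pairs(4, 2) gives ((1, 2), (3, 3)), ((2, 2), (2, 4)) and ((3, 2), (1, 5)), which hold (A, B), (S, -) and (B, A) in the table for 10110. A cell in row l is therefore built from l - 1 pairs.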

A string that is produced
Run the CYK algorithm for the string 10111.
Length 1: replace every terminal with the variable that produces it: B A B B B.
Length 2: BA gives nothing, AB gives S, and each BB gives A: - S A A.
Length 3: 101 gives nothing; for 011, 0 with 11 gives AA, produced by B; for 111, 1 with 11 gives BA (nothing) and 11 with 1 gives AB, produced by S: - B S.
Length 4: for 1011, 1 with 011 gives BB, produced by A; for 0111, 011 with 1 gives BB, produced by A (its other splits give AS and SA, which no variable produces): A A.
Length 5: for 10111, 1011 with 1 gives AB, produced by S, while the other splits give nothing: S.
1  0  1  1  1
B  A  B  B  B
-  S  A  A
-  B  S
A  A
S
The last row contains S, so 10111 is generated by the grammar.

A string that is produced
Tracing back how S was produced in the table for 10111 gives the derivation:
S → AB → BBB → BAAB → BABBB → 10111
Here S → AB splits 10111 into 1011 and 1; A → BB splits 1011 into 1 and 011; B → AA splits 011 into 0 and 11; A → BB splits 11 into 1 and 1. The last arrow abbreviates the steps that replace every A with 0 and every B with 1.
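
Tracing back can also be automated. The Python sketch below, assuming the dict-of-tuples grammar representation, first fills the table (table[l][i] = variables deriving the substring of length l starting at 0-based index i) and then rebuilds one parse tree by trying the split points from left to right. Trees are nested tuples (variable, left, right) or (variable, terminal). This is an illustrative sketch with hypothetical names; when a string has several parse trees it returns only the first one it finds.

```python
def cyk_table(grammar, word):
    """table[l][i]: variables deriving word[i:i + l]; rows l = 1..n, row 0 unused."""
    n = len(word)
    table = [[] for _ in range(n + 1)]
    table[1] = [{h for h, bodies in grammar.items() if (ch,) in bodies} for ch in word]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            table[length].append({
                head
                for k in range(1, length)
                for head, bodies in grammar.items()
                for body in bodies
                if len(body) == 2
                and body[0] in table[k][i]
                and body[1] in table[length - k][i + k]
            })
    return table


def parse_tree(grammar, table, word, var, length=None, i=0):
    """Rebuild one parse tree for word[i:i + length] rooted at var by following the table."""
    if length is None:
        length = len(word)
    if length == 1:
        if (word[i],) not in grammar[var]:
            raise ValueError(f"{var} does not produce {word[i]!r}")
        return (var, word[i])
    for k in range(1, length):  # try every split point, shortest left part first
        for body in grammar[var]:
            if (len(body) == 2 and body[0] in table[k][i]
                    and body[1] in table[length - k][i + k]):
                left = parse_tree(grammar, table, word, body[0], k, i)
                right = parse_tree(grammar, table, word, body[1], length - k, i + k)
                return (var, left, right)
    raise ValueError(f"{var} does not derive {word[i:i + length]!r}")
```

For 10111 the tree is S(A(B(1), B(A(0), A(B(1), B(1)))), B(1)); expanding its nodes one at a time gives S → AB → BBB → BAAB → BABBB → 10111.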