Chomsky Normal Form and the CYK Algorithm
Normal Forms
A context-free grammar can be brought into special forms that are easier to work with. Two common ones are Chomsky Normal Form and Greibach Normal Form.
Chomsky Normal Form
A context-free grammar is in Chomsky Normal Form (CNF) if every rule has one of the forms
A ⟶ BC
A ⟶ a
S ⟶ ε
where A, B, C are variables, B and C are not the start variable, a is a terminal, and the ε-rule is allowed only for the start variable S.
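To make the definition concrete, here is a small Python checker. It is a sketch under an assumed representation that the slides do not use: a grammar is a dict from each variable to a list of right-hand sides, each right-hand side is a tuple of symbols, and the empty tuple stands for ε. The function name is_cnf is illustrative. As a usage example it checks the grammar that the worked example in this section ends with.

```python
# Assumed representation (not from the slides): a dict mapping each variable
# to a list of right-hand sides; each right-hand side is a tuple of symbols
# and the empty tuple () stands for ε. Keys of the dict are the variables;
# every other symbol is a terminal.

def is_cnf(grammar, start):
    """Check that every rule is A -> BC, A -> a, or start -> ε."""
    variables = set(grammar)
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 0:
                if head != start:  # ε only for the start variable
                    return False
            elif len(body) == 1:
                if body[0] in variables:  # a single symbol must be a terminal
                    return False
            elif len(body) == 2:
                # two variables, neither of them the start variable
                if any(s not in variables or s == start for s in body):
                    return False
            else:
                return False  # right-hand side is too long
    return True


# The grammar obtained at the end of the worked example (start variable S0).
final = {
    "S0": [("D", "C"), ("E", "B"), ("1",), ("C", "S"), ("S", "C")],
    "S": [("D", "C"), ("E", "B"), ("1",), ("C", "S"), ("S", "C")],
    "C": [("Z", "Z")],
    "B": [("E", "B"), ("1",)],
    "Z": [("0",)],
    "A": [("1",)],
    "D": [("C", "S")],
    "E": [("Z", "A")],
}
print(is_cnf(final, "S0"))  # True
```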
Chomsky Normal Form
A grammar is transformed into CNF in five steps:
1. Add a new start variable S0 and the rule S0 ⟶ S.
2. Eliminate the ε-rules.
3. Eliminate the unary productions A ⟶ B.
4. Add a rule Vt ⟶ t for every terminal t and replace t with the new variable Vt.
5. Transform the remaining rules into the form A ⟶ BC (A, B, C variables).
1. Add a new start variable
The start variable must not occur on the right-hand side of any rule. Therefore we add a new start variable S0 and the rule S0 ⟶ S, where S is the old start variable.
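Step 1 is simple enough to show in a few lines of Python. This is a minimal sketch assuming a grammar stored as a dict from each variable to a list of right-hand-side tuples, with () for ε; the function add_new_start and its new_start parameter are illustrative names, not from the slides.

```python
def add_new_start(grammar, start, new_start="S0"):
    """Step 1: return a copy of the grammar with new_start -> start added.

    new_start must be a fresh name; "S0" follows the slides' naming.
    """
    if new_start in grammar:
        raise ValueError(f"{new_start!r} is already a variable")
    result = {new_start: [(start,)]}  # the new start rule comes first
    for head, bodies in grammar.items():
        result[head] = list(bodies)  # copy, so the input is not modified
    return result


# The example grammar of this section; () stands for ε.
example = {
    "S": [("C", "S", "C"), ("B",)],
    "C": [("0", "0"), ()],
    "B": [("0", "1", "B"), ("1",)],
}
step1 = add_new_start(example, "S")
print(step1["S0"])  # [('S',)]
```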
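2. Eliminate ε-rules
We have to eliminate every rule of the form A ⟶ ε where A is not the start variable. To do so, we remove A ⟶ ε and, for every occurrence of A on the right-hand side of a rule, add a copy of that rule with the occurrence deleted; the original rule is kept. A rule with several occurrences of A gets one copy for every subset of them: for example, R ⟶ uAvAw gains R ⟶ uvAw, R ⟶ uAvw and R ⟶ uvw. If this creates a new rule R ⟶ ε (from R ⟶ A), it is handled in the same way, unless R ⟶ ε was already removed earlier or R is the start variable.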
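The following Python sketch implements ε-elimination, assuming a grammar stored as a dict from each variable to a list of right-hand-side tuples, with () for ε. It uses a common variant of the procedure above: it first computes all nullable variables (those that can derive ε) and then generates every keep/delete combination at once, instead of removing one ε-rule at a time. On the example grammar of this section it yields the same set of rules as the slides.

```python
from itertools import product


def eliminate_epsilon(grammar, start):
    """Step 2 (sketch): remove ε-rules of non-start variables.

    Instead of handling one ε-rule at a time, this version first computes
    every nullable variable and then, for each rule, keeps or deletes each
    nullable occurrence independently.
    """
    nullable = set()
    changed = True
    while changed:  # fixpoint: nullable if some right-hand side is all nullable
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # each nullable symbol may be kept (s,) or deleted ()
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body == () and head != start:
                    continue  # drop ε-rules, except for the start variable
                if new_body not in new_bodies:
                    new_bodies.append(new_body)
        result[head] = new_bodies
    return result


step1 = {
    "S0": [("S",)],
    "S": [("C", "S", "C"), ("B",)],
    "C": [("0", "0"), ()],
    "B": [("0", "1", "B"), ("1",)],
}
step2 = eliminate_epsilon(step1, "S0")
print(step2["S"])  # [('C', 'S', 'C'), ('C', 'S'), ('S', 'C'), ('S',), ('B',)]
print(step2["C"])  # [('0', '0')]
```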
3. Eliminate unary productions
A unary production is a rule of the form A ⟶ B where both A and B are variables. In CNF the only rules whose right-hand side consists of variables have the form A ⟶ BC, so unary productions must go. To eliminate A ⟶ B, we remove it and, for every rule B ⟶ u, add the rule A ⟶ u (unless A ⟶ u is a unary rule that was already removed). A rule A ⟶ A is simply deleted. We repeat this until no unary productions are left.
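A Python sketch of this step, assuming a grammar stored as a dict from each variable to a list of right-hand-side tuples. Rather than substituting one rule at a time, it uses the equivalent "unit closure" formulation: for every variable A it collects the variables reachable from A through unary rules and gives A all their non-unary rules. For the example grammar of this section this produces the same rules as the slides, up to order.

```python
def eliminate_unit(grammar):
    """Step 3 (sketch): remove unary productions A -> B via unit closures."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = {}
    for head in grammar:
        reach, stack = {head}, [head]
        while stack:  # depth-first search along unary rules
            v = stack.pop()
            for body in grammar[v]:
                if is_unit(body) and body[0] not in reach:
                    reach.add(body[0])
                    stack.append(body[0])
        new_bodies = []
        for v in grammar:  # dict order keeps the output deterministic
            if v in reach:
                for body in grammar[v]:
                    if not is_unit(body) and body not in new_bodies:
                        new_bodies.append(body)
        result[head] = new_bodies
    return result


step2 = {
    "S0": [("S",)],
    "S": [("C", "S", "C"), ("B",), ("C", "S"), ("S", "C"), ("S",)],
    "C": [("0", "0")],
    "B": [("0", "1", "B"), ("1",)],
}
step3 = eliminate_unit(step2)
print(step3["S0"])
# [('C', 'S', 'C'), ('C', 'S'), ('S', 'C'), ('0', '1', 'B'), ('1',)]
```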
4. Add Vt ⟶ t and replace t with Vt
In CNF a terminal may appear only in rules of the form A ⟶ t, so terminals must disappear from every right-hand side with two or more symbols. To do so, we add a new variable Vt and the rule Vt ⟶ t for every terminal t, and we replace every occurrence of t with Vt, except in rules of the form A ⟶ t.
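A minimal Python sketch of step 4, assuming a grammar stored as a dict from each variable to a list of right-hand-side tuples. The names parameter, which maps each terminal t to its new variable Vt, is a hypothetical choice left to the caller; the usage below picks Z for 0 and A for 1, as the worked example does.

```python
def replace_terminals(grammar, names):
    """Step 4 (sketch): replace terminals inside long rules by new variables.

    `names` (hypothetical parameter) maps each terminal t to its variable Vt.
    """
    variables = set(grammar)
    clash = set(names.values()) & variables
    if clash:
        raise ValueError(f"names already used as variables: {sorted(clash)}")
    result, used = {}, []
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if len(body) >= 2:  # rules A -> t are left alone
                new_body = []
                for s in body:
                    if s in variables:
                        new_body.append(s)
                    else:
                        new_body.append(names[s])
                        if s not in used:
                            used.append(s)
                body = tuple(new_body)
            new_bodies.append(body)
        result[head] = new_bodies
    for t in used:  # add Vt -> t for every terminal that was replaced
        result[names[t]] = [(t,)]
    return result


step3 = {
    "S0": [("C", "S", "C"), ("0", "1", "B"), ("1",), ("C", "S"), ("S", "C")],
    "S": [("C", "S", "C"), ("0", "1", "B"), ("1",), ("C", "S"), ("S", "C")],
    "C": [("0", "0")],
    "B": [("0", "1", "B"), ("1",)],
}
step4 = replace_terminals(step3, {"0": "Z", "1": "A"})
print(step4["B"], step4["Z"], step4["A"])
# [('Z', 'A', 'B'), ('1',)] [('0',)] [('1',)]
```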
5. Transform rules to A ⟶ BC
All rules whose right-hand side consists only of variables must have the form A ⟶ BC, so we have to take care of every rule with more than two variables on the right. For a rule V ⟶ A1A2A3…An, we shorten the right-hand side by repeatedly replacing its first two symbols with a new variable. This creates n-2 new variables, B1, …, Bn-2.
5. Transform rules to A ⟶ BC
Starting from V ⟶ A1A2A3A4A5A6…An, each step introduces one new variable:
V ⟶ B1A3A4A5A6…An, with B1 ⟶ A1A2
V ⟶ B2A4A5A6…An, with B2 ⟶ B1A3
V ⟶ B3A5A6…An, with B3 ⟶ B2A4
V ⟶ B4A6…An, with B4 ⟶ B3A5
…
At the end the original rule has been replaced by:
V ⟶ Bn-2An
Bn-2 ⟶ Bn-3An-1
…
B4 ⟶ B3A5
B3 ⟶ B2A4
B2 ⟶ B1A3
B1 ⟶ A1A2
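The splitting scheme can be sketched in Python as follows, assuming a grammar stored as a dict from each variable to a list of right-hand-side tuples. The fresh_names argument is a hypothetical caller-supplied sequence of unused variable names. One design choice goes slightly beyond the slides' description: a pair of symbols that was already replaced elsewhere reuses its variable, which is safe because each new variable has exactly one rule, and it is what the worked example does when it uses D and E for several rules.

```python
def binarize(grammar, fresh_names):
    """Step 5 (sketch): split right-hand sides longer than two symbols.

    For V -> A1 A2 ... An it repeatedly replaces the first two symbols with a
    new variable: B1 -> A1A2, B2 -> B1A3, and so on.
    """
    names = iter(fresh_names)
    made = {}  # pair of symbols -> new variable standing for it
    result = {head: [] for head in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            while len(body) > 2:
                pair = body[:2]
                if pair not in made:
                    made[pair] = next(names)
                    result[made[pair]] = [pair]
                body = (made[pair],) + body[2:]
            result[head].append(body)
    return result


step4 = {
    "S0": [("C", "S", "C"), ("Z", "A", "B"), ("1",), ("C", "S"), ("S", "C")],
    "S": [("C", "S", "C"), ("Z", "A", "B"), ("1",), ("C", "S"), ("S", "C")],
    "C": [("Z", "Z")],
    "B": [("Z", "A", "B"), ("1",)],
    "Z": [("0",)],
    "A": [("1",)],
}
step5 = binarize(step4, ["D", "E"])
print(step5["S0"])  # [('D', 'C'), ('E', 'B'), ('1',), ('C', 'S'), ('S', 'C')]
print(step5["D"], step5["E"])  # [('C', 'S')] [('Z', 'A')]
```

For a single rule V ⟶ A1A2A3A4A5 the same function creates exactly n-2 = 3 new variables, ending with V ⟶ B3A5.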
Example
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1
Example: 1. Add a new start variable
S0 ⟶ S
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1
Example: 2. Eliminate ε-rules
The only ε-rule is C ⟶ ε. We remove it, and since C occurs twice in S ⟶ CSC, we add the rules obtained by deleting either or both occurrences: S ⟶ CS | SC | S.
S0 ⟶ S
S ⟶ CSC | B | CS | SC | S
C ⟶ 00
B ⟶ 01B | 1
Example: 3. Eliminate unary productions
S0 ⟶ S
S ⟶ CSC | B | CS | SC | S
C ⟶ 00
B ⟶ 01B | 1
The rule S ⟶ S is trivial, so we delete it. Replacing B in S ⟶ B with the right-hand sides of the rules for B gives S ⟶ 01B | 1. Finally, replacing S in S0 ⟶ S with the right-hand sides of the rules for S gives:
S0 ⟶ CSC | 01B | 1 | CS | SC
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Example: 4. Create Vt for every terminal t
We add Z ⟶ 0 and replace 0 with Z, then add A ⟶ 1 and replace 1 with A, except where 1 forms a right-hand side on its own:
S0 ⟶ CSC | ZAB | 1 | CS | SC
S ⟶ CSC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z ⟶ 0
A ⟶ 1
Example: 5. Take care of long rules
We add D ⟶ CS and replace CSC with DC, then add E ⟶ ZA and replace ZAB with EB. The grammar is now in Chomsky Normal Form:
S0 ⟶ DC | EB | 1 | CS | SC
S ⟶ DC | EB | 1 | CS | SC
C ⟶ ZZ
B ⟶ EB | 1
Z ⟶ 0
A ⟶ 1
D ⟶ CS
E ⟶ ZA
CYK Introduction
Problem: given a context-free grammar and a string s, can we decide whether s is generated by the grammar? For a grammar in arbitrary form, trying out derivations directly can be very inefficient. If the grammar is in Chomsky Normal Form, however, there is an elegant dynamic-programming algorithm for this test, the CYK (Cocke-Younger-Kasami) algorithm, which runs in O(n³·|G|) time for a string of length n and a grammar of size |G|.
The CYK algorithm
Suppose that we are given the following grammar in Chomsky Normal Form:
S → AB
A → BB | 0
B → AA | 1
We would like to check whether 10110 is generated by this grammar.
Substrings of length 1
Since the only way to produce terminals is through rules A → a, we replace every terminal with the variables that produce it:
1 0 1 1 0
B A B B A
Substrings of length 2
Now we want to see how every substring of length 2 can be generated. This amounts to producing each pair of neighbouring variables from the row above. Since every rule with variables on the right has the form A → BC, it suffices to replace every two consecutive variables with the variables that produce them: BA is produced by no variable (-), AB by S, BB by A, and the final BA again by none.
1 0 1 1 0
B A B B A
- S A -
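The "replace every two consecutive variables" step can be sketched in Python with a reverse lookup from a pair (B, C) to all variables A with a rule A → BC. This assumes a grammar stored as a dict from each variable to a list of right-hand-side tuples and table cells stored as sets; the function names producers and length2_row are illustrative.

```python
def producers(grammar):
    """Map each pair (B, C) to the set of variables A with a rule A -> BC."""
    index = {}
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 2:
                index.setdefault(body, set()).add(head)
    return index


def length2_row(grammar, row1):
    """Cells for all length-2 substrings, from the cells of single symbols."""
    index = producers(grammar)
    row2 = []
    for left, right in zip(row1, row1[1:]):  # neighbouring cells
        cell = set()
        for b in left:
            for c in right:
                cell |= index.get((b, c), set())
        row2.append(cell)
    return row2


G = {"S": [("A", "B")], "A": [("B", "B"), ("0",)], "B": [("A", "A"), ("1",)]}
row1 = [{"B"}, {"A"}, {"B"}, {"B"}, {"A"}]  # cells for 1 0 1 1 0
print(length2_row(G, row1))  # [set(), {'S'}, {'A'}, set()]
```

Building the reverse lookup once avoids scanning every rule for every pair of variables.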
Substrings of length 3
To produce the substring 101 (the first three symbols of 10110), we can either take 1 with 01 or 10 with 1. The first split gives BS, which cannot be produced by any variable; the second gives no pair, since 10 cannot be produced. So 101 cannot be produced.
1 0 1 1 0
B A B B A
- S A -
-
Substrings of length 3
To produce the substring 011 (in 10110), we can either take 0 with 11 or 01 with 1. The first split gives AA, which is produced by B; the second gives SB, which cannot be produced by any variable.
1 0 1 1 0
B A B B A
- S A -
- B
Substrings of length 3
To produce the substring 110 (in 10110), we can either take 1 with 10 or 11 with 0. The first split gives no pair, since 10 cannot be produced by a variable; the second gives AA, which is produced by B.
1 0 1 1 0
B A B B A
- S A -
- B B
Substrings of length 4
To produce the substring 1011 (in 10110), we can take 1 with 011, 10 with 11, or 101 with 1. The first split gives BB, which is produced by A; the other two give no pair, since neither 10 nor 101 can be produced.
1 0 1 1 0
B A B B A
- S A -
- B B
A
Substrings of length 4
To produce the substring 0110 (in 10110), we can take 0 with 110, 01 with 10, or 011 with 0. The first split gives AB, which is produced by S; the second gives no pair, since 10 cannot be produced; the third gives BA, which cannot be produced by any variable.
1 0 1 1 0
B A B B A
- S A -
- B B
A S
Combine previous solutions
To produce the whole string 10110, we can take 1 with 0110, 10 with 110, 101 with 10, or 1011 with 0. The first split gives BS, which cannot be produced; the second and third give no pair; the last gives AA, which is produced by B.
1 0 1 1 0
B A B B A
- S A -
- B B
A S
B
Answer
If the last row contains the start variable S, the string is generated, and we can find a derivation by following backwards the way S was produced. In our example the last row contains only B, so 10110 cannot be generated by the grammar.
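The whole procedure fits in a short Python sketch. It assumes a CNF grammar stored as a dict from each variable to a list of right-hand-side tuples (with () for ε), a word given as a string of one-character terminals, and 0-based positions internally; the names cyk_table, cyk_accepts and show are illustrative. table[m][i] holds the variables that derive the substring of length m starting at position i, filled from short substrings to long ones exactly as in the slides.

```python
def cyk_table(grammar, word):
    """table[m][i] is the set of variables that derive word[i:i+m]."""
    n = len(word)
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    for i, a in enumerate(word):  # length 1: rules A -> a
        for head, bodies in grammar.items():
            if (a,) in bodies:
                table[1][i].add(head)
    for m in range(2, n + 1):  # longer substrings, shortest first
        for i in range(n - m + 1):
            for k in range(1, m):  # first part word[i:i+k], rest word[i+k:i+m]
                left, right = table[k][i], table[m - k][i + k]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[m][i].add(head)
    return table


def cyk_accepts(grammar, start, word):
    if word == "":
        return () in grammar.get(start, [])  # only start -> ε derives ε
    return start in cyk_table(grammar, word)[len(word)][0]


def show(table, word):
    """Print the table in the slides' layout, with - for an empty cell."""
    n = len(word)
    print(" ".join(word))
    for m in range(1, n + 1):
        print(" ".join("".join(sorted(c)) or "-" for c in table[m][: n - m + 1]))


G = {"S": [("A", "B")], "A": [("B", "B"), ("0",)], "B": [("A", "A"), ("1",)]}
show(cyk_table(G, "10110"), "10110")
# 1 0 1 1 0
# B A B B A
# - S A -
# - B B
# A S
# B
print(cyk_accepts(G, "S", "10110"), cyk_accepts(G, "S", "10111"))  # False True
```

The three nested loops over m, i and k give the cubic dependence on the length of the string.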
Mechanical way
Now that we have seen why the method works, here is an easy, mechanical way to compute the table.
Mechanical way
Suppose that we are about to fill in the cell for the substring of length m that starts at position i. We combine the cells of the same column, going down from the top (lengths 1, 2, …, m-1 at position i), with the cells on the diagonal that starts just above and to the right of the target cell and continues up and to the right (length m-1 at position i+1, length m-2 at position i+2, …, length 1 at position i+m-1). For example, for the cell of 0110 (length 4, position 2) the pairs are A with B, S with -, and B with A. Only AB is produced by a variable, namely S, so finally:
1 0 1 1 0
B A B B A
- S A -
- B B
A S
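The pairing rule is just index arithmetic, as this small Python sketch shows. It uses 1-based positions to match the table, writes a cell as (length, position), and the function name cell_pairs is illustrative.

```python
def cell_pairs(m, i):
    """Pairs of cells combined to fill the cell of the substring of length m
    starting at position i (1-based). A cell is written (length, position)."""
    return [((k, i), (m - k, i + k)) for k in range(1, m)]


for left, right in cell_pairs(4, 2):  # the cell for 0110 in 10110
    print(left, "with", right)
# (1, 2) with (3, 3)
# (2, 2) with (2, 4)
# (3, 2) with (1, 5)
```

In the table for 10110 these cells hold A with B, S with -, and B with A, which are the three pairs considered for 0110.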
A string that is produced
Run the CYK algorithm for the string 10111:
1 0 1 1 1
B A B B B
- S A A
- B S
A A
S
Row 2: 10 gives BA (no variable), 01 gives AB (S), and each 11 gives BB (A). Row 3: 101 cannot be produced; 011 comes from 0 with 11 (AA, produced by B); 111 comes from 11 with 1 (AB, produced by S). Row 4: 1011 comes from 1 with 011 (BB, produced by A); 0111 comes from 011 with 1 (BB, produced by A). Row 5: 10111 comes from 1011 with 1 (AB, produced by S). The last row contains S, so 10111 is generated by the grammar.
A string that is produced
Following backwards the way S was produced gives the derivation:
S → AB → BBB → BAAB → BABBB → 10111
Here S → AB splits 10111 into 1011 and 1; A → BB splits 1011 into 1 and 011; B → AA splits 011 into 0 and 11; A → BB splits 11 into 1 and 1; finally every variable is replaced with the terminal it produces.
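Following S backwards can be automated by storing, next to each variable in a cell, how it was produced. The Python sketch below is one way to do this, under stated assumptions: a CNF grammar stored as a dict from each variable to a list of right-hand-side tuples, one-character variable and terminal names (so sentential forms can be printed by concatenation), and the first split found being kept when several exist. To mirror the slides, it always expands the leftmost variable that still covers two or more symbols and replaces all single-symbol variables with terminals in one last step. The name cyk_derivation is illustrative.

```python
def cyk_derivation(grammar, start, word):
    """Return a derivation of word as a string, or None if it is not generated."""
    n = len(word)
    if n == 0:
        return None  # the empty string is not handled in this sketch
    # back[(m, i)] maps each variable deriving word[i:i+m] to how it does so:
    # None for a rule A -> a, or (k, B, C) for A -> BC split after k symbols.
    back = {}
    for i, a in enumerate(word):
        back[(1, i)] = {h: None for h, bodies in grammar.items() if (a,) in bodies}
    for m in range(2, n + 1):
        for i in range(n - m + 1):
            cell = {}
            for k in range(1, m):
                left, right = back[(k, i)], back[(m - k, i + k)]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2 and head not in cell
                                and body[0] in left and body[1] in right):
                            cell[head] = (k, body[0], body[1])  # keep first split
            back[(m, i)] = cell
    if start not in back[(n, 0)]:
        return None
    # Sentential form as a list of (variable, length, position) items.
    form = [(start, n, 0)]
    forms = [form]
    while True:
        j = next((idx for idx, (_, length, _) in enumerate(form) if length >= 2), None)
        if j is None:
            break
        var, m, i = form[j]
        k, b, c = back[(m, i)][var]
        form = form[:j] + [(b, k, i), (c, m - k, i + k)] + form[j + 1:]
        forms.append(form)
    steps = ["".join(v for v, _, _ in f) for f in forms]
    return " → ".join(steps + [word])  # last step: every A -> a at once


G = {"S": [("A", "B")], "A": [("B", "B"), ("0",)], "B": [("A", "A"), ("1",)]}
print(cyk_derivation(G, "S", "10111"))  # S → AB → BBB → BAAB → BABBB → 10111
print(cyk_derivation(G, "S", "10110"))  # None
```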