Exercises on Chomsky Normal Form and CYK parsing


Exercise 1
Convert the following grammar to CNF:
S -> A ( S ) B | ""
A -> S | S B | x | ""
B -> S B | y
This exercise is available in the "grammar tutoring system" at http://laraserver3.epfl.ch:9000 : select exercise type "CNF Conversion", then choose the problem "Exercise 1 of lecturecise 12".
Make sure you create a new start symbol S1 and add the production S1 -> S | "" to your grammar before the CNF conversion, since the start symbol S is nullable and also appears on the right-hand side of productions.
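To see why S counts as nullable here, the following small Scala sketch (my own encoding, not part of the exercise or the tutoring system) computes the set of nullable nonterminals of this grammar by the usual fixpoint iteration:

```scala
object Nullable {
  type Grammar = Map[String, List[List[String]]] // nonterminal -> alternatives

  // "" is encoded as an empty right-hand side (Nil)
  val g: Grammar = Map(
    "S" -> List(List("A", "(", "S", ")", "B"), Nil),
    "A" -> List(List("S"), List("S", "B"), List("x"), Nil),
    "B" -> List(List("S", "B"), List("y")))

  // least fixpoint: N is nullable if some alternative consists only of nullable symbols
  def nullable(g: Grammar): Set[String] = {
    var nulls = Set.empty[String]
    var changed = true
    while (changed) {
      val next = g.collect {
        case (n, alts) if alts.exists(_.forall(nulls.contains)) => n
      }.toSet
      changed = next != nulls
      nulls = next
    }
    nulls
  }

  def main(args: Array[String]): Unit =
    println(nullable(g)) // Set(S, A) -- so the fresh start symbol S1 -> S | "" is indeed needed
}
```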

Exercise 2
Which of the following properties of a grammar are preserved by the "Epsilon Production Elimination" algorithm?
(a) Ambiguity: if a grammar is ambiguous, does it remain ambiguous after removing epsilon productions? (No; counter-example?)
(b) LL(1)? (No; counter-example?)
(c) No left recursion? We define left recursion as the existence of productions of the form N -> N α, or of a chain N -> N1 α1, N1 -> N2 α2, ..., Nn -> N αn. (No; counter-example?)
(d) Unambiguity? (Yes; proof?)

Exercise 2(a) - Solution
Ambiguity: if a grammar is ambiguous, does it remain ambiguous after removing epsilon productions?
No, it need not. E.g., consider the following ambiguous grammar:
S -> a A | a
A -> b | ""
The grammar has two parse trees for the string a.
After removing epsilon productions we get:
S -> a A | a
A -> b
Now there is exactly one parse tree for every word generated by the grammar (the words a and ab).

Exercise 2(b) - Solution
LL(1): if a grammar is LL(1), does it remain LL(1) after removing epsilon productions?
No, it need not. E.g., consider the following LL(1) grammar:
S -> a S b | ""
After removing epsilon productions we get:
S' -> S | ""
S -> a S b | a b
Note: we have created a new start symbol S' because S was nullable and appeared on the right-hand side.
The resulting grammar is not LL(1) because First(a S b) and First(a b) intersect.
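To make the conflict concrete, here is a small Scala sketch (my own encoding, not from the slides) that computes the FIRST sets of the two alternatives of S in the transformed grammar. It is a simplified computation that only works because no nullable nonterminal appears on a right-hand side of this particular grammar:

```scala
object FirstSets {
  type Grammar = Map[String, List[List[String]]]

  // after epsilon elimination:  S' -> S | "",  S -> a S b | a b
  val g: Grammar = Map(
    "S'" -> List(List("S"), Nil),
    "S"  -> List(List("a", "S", "b"), List("a", "b")))

  def isTerminal(x: String): Boolean = !g.contains(x)

  // FIRST of a single symbol, computed by naive recursion (fine here: no left recursion)
  def first(sym: String): Set[String] =
    if (isTerminal(sym)) Set(sym)
    else g(sym).flatMap(firstSeq).toSet

  // FIRST of a sequence; we only look at the first symbol, which is sound here because
  // no nullable nonterminal occurs on a right-hand side of this grammar
  def firstSeq(rhs: List[String]): Set[String] = rhs match {
    case Nil    => Set.empty
    case x :: _ => first(x)
  }

  def main(args: Array[String]): Unit = {
    val alts = g("S")
    alts.foreach(a => println(a.mkString(" ") + " : FIRST = " + firstSeq(a)))
    println("conflict: " + (firstSeq(alts(0)) intersect firstSeq(alts(1)))) // Set(a) -> not LL(1)
  }
}
```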

Exercise 2(c) - Solution
No left recursion: if a grammar has no left recursion, does it remain without left recursion after removing epsilon productions?
No, it need not. E.g., consider the following non-left-recursive grammar:
S -> B S a | a
B -> "" | b
After removing epsilon productions we get:
S -> B S a | S a | a
B -> b
The production S -> S a is a left-recursive production.
Note: this also means that we have to eliminate epsilon productions before removing left recursion using the approach described in lecturecise 10. A sketch of the elimination applied to this grammar is shown below.
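A rough Scala sketch (my own encoding, not the lecture's code) of epsilon-production elimination, applied to the grammar above: each right-hand side is expanded by every way of dropping nullable occurrences, and the empty productions are then discarded. (It does not add a fresh start symbol, which would be needed when the start symbol itself is nullable, as in Exercises 1 and 2(b).)

```scala
object EpsilonElim {
  type Grammar = Map[String, Set[List[String]]] // nonterminal -> right-hand sides

  val g: Grammar = Map(
    "S" -> Set(List("B", "S", "a"), List("a")),
    "B" -> Set(Nil, List("b")))                 // Nil encodes the "" production

  def nullable(g: Grammar): Set[String] = {
    var ns = Set.empty[String]; var changed = true
    while (changed) {
      val next = g.collect { case (n, alts) if alts.exists(_.forall(ns)) => n }.toSet
      changed = next != ns; ns = next
    }
    ns
  }

  // every way of keeping or dropping each nullable occurrence in a right-hand side
  def variants(rhs: List[String], ns: Set[String]): Set[List[String]] = rhs match {
    case Nil => Set(Nil)
    case x :: rest =>
      val tails = variants(rest, ns)
      val keep  = tails.map(x :: _)
      if (ns(x)) keep ++ tails else keep
  }

  def eliminate(g: Grammar): Grammar = {
    val ns = nullable(g)
    g.map { case (n, alts) =>
      n -> alts.flatMap(variants(_, ns)).filter(_.nonEmpty) // drop the epsilon productions
    }
  }

  def main(args: Array[String]): Unit =
    eliminate(g).foreach { case (n, alts) =>
      println(n + " -> " + alts.map(_.mkString(" ")).mkString(" | "))
    }
  // prints (modulo ordering):  S -> B S a | S a | a   and   B -> b
}
```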

Exercise 2(d) - Solution
Unambiguity: if a grammar is unambiguous, does it remain unambiguous after removing epsilon productions?
Yes. Proof:
Let G' be obtained from G by eliminating epsilon productions.
We prove the contrapositive: if there are two leftmost derivations for a word w in G', then there are two leftmost derivations for w in G.
Let D1 and D2 be two leftmost derivations for the word w in G'.
If all the productions used in D1 and D2 are also present in G, then D1 and D2 are also valid in G, which implies the claim.
Therefore, suppose D1 or D2 uses productions Ai -> γj that do not belong to G. For each Ai -> γj (except when Ai is the start symbol), there exists Ai -> αj in G such that γj is obtained from αj by removing nullable non-terminals from some positions (call these positions Xj).
Hence, we can create new derivations D1' and D2' (valid in G) from D1 and D2 by replacing every use of a rule Ai -> γj by Ai -> αj and deriving empty strings (using productions of G) from the non-terminals at the positions Xj.

Exercise 2(d) - Solution [Cont.]
If Ai was the start symbol of G' and it does not belong to G, then we remove the step Ai ⇒ S at the start of D1 or D2 and replace it with the start symbol S of G.
Will the resulting derivations D1' and D2' be distinct? Consider the point where D1 and D2 diverge. Let
D1: S ⇒* x B α ⇒ x β1 α ⇒* w
D2: S ⇒* x B α ⇒ x β2 α ⇒* w
If neither B -> β1 nor B -> β2 is of the form Ai -> γj, then D1' and D2' also diverge at the same point, since we preserve B -> β1 and B -> β2 in D1' and D2', respectively.
Case (i): say
D1: S ⇒* x Ai α ⇒ x γj α ⇒* w
D2: S ⇒* x Ai α ⇒ x β α ⇒* w, where Ai -> β is a rule of G as well.
γj is replaced by αj in D1', and some non-terminals in αj derive epsilon. However, no non-terminal in β derives epsilon in D2, because D2 is a derivation of G', which has no epsilon productions except possibly for the start symbol; and by construction the start symbol of G' never appears on a right-hand side if it is nullable. Hence Ai -> β is preserved in D2' and no non-terminal of β derives epsilon in D2'. Hence D1' differs from D2', irrespective of whether αj = β or not.

Exercise 2(d) - Solution [Cont.]
Case (ii): say
D1: S ⇒* x Ai α ⇒ x γj α ⇒* w
D2: S ⇒* x Ai α ⇒ x γk α ⇒* w
γj is replaced by αj in D1' and γk by αk in D2'. Since γj ≠ γk, either αj and αk are different, or non-terminals at different positions of αj and αk are reduced to empty strings in D1' and D2', respectively. Hence D1' differs from D2'.

Exercise 3
Which of the following properties of a grammar are preserved by the "Unit Production Elimination" algorithm?
(a) Ambiguity? (No; counter-example?)
(b) Left recursion? What about the rules B -> A | a, A -> B? (No)
(c) LL(1)? (Yes; proof?)
(d) Unambiguity? (Yes; proof?)

Exercise 3(a) - Solution
Ambiguity: if a grammar is ambiguous, does it remain ambiguous after removing unit productions?
No, it need not. E.g., consider the following ambiguous grammar:
S -> A | a
A -> a
There are two parse trees for a: S -> A -> a and S -> a.
After removing the unit production we get S -> a, and there is exactly one parse tree for a.

Exercise 3(b) - Solution
Left recursion: if a grammar has left recursion, will it still have left recursion after removing unit productions?
No, it need not. E.g., consider the following left-recursive grammar:
B -> A | a
A -> B
After removing the unit productions (using the graph-based algorithm described in lecturecise 11) we get:
B -> a
A -> a
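Below is a hedged Scala sketch of that graph-based idea (my own encoding, not the lecture's code): build the graph of unit productions, compute which nonterminals each nonterminal can reach by unit steps, and give it all of their non-unit productions.

```scala
object UnitElim {
  type Grammar = Map[String, Set[List[String]]]

  // the example from Exercise 3(b):  B -> A | a,  A -> B
  val g: Grammar = Map(
    "B" -> Set(List("A"), List("a")),
    "A" -> Set(List("B")))

  def eliminate(g: Grammar): Grammar = {
    def isUnit(rhs: List[String]) = rhs match { case List(n) => g.contains(n); case _ => false }

    // edges of the unit graph: an edge A -> B for every unit production A -> B
    val edges: Map[String, Set[String]] =
      g.map { case (n, alts) => n -> alts.collect { case List(m) if g.contains(m) => m } }

    // nonterminals reachable from n by unit steps (reflexive-transitive closure)
    def reachable(n: String): Set[String] = {
      var seen = Set(n); var frontier = Set(n)
      while (frontier.nonEmpty) { frontier = frontier.flatMap(edges).diff(seen); seen ++= frontier }
      seen
    }

    // each nonterminal receives every non-unit production of every nonterminal it can reach
    g.map { case (n, _) => n -> reachable(n).flatMap(m => g(m).filterNot(isUnit)) }
  }

  def main(args: Array[String]): Unit =
    println(eliminate(g)) // Map(B -> Set(List(a)), A -> Set(List(a))): the left recursion is gone
}
```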

Exercise 3(c) - Solution
LL(1): if a grammar is LL(1), does it remain LL(1) after removing unit productions?
Yes. The following is a sketch of the proof (not a complete, rigorous proof).
Let G' be obtained from G by eliminating unit productions. We need to show that all three properties of an LL(1) grammar hold in G' if they hold in G.
Assume that only one unit production A -> B (where B -> β1 | ... | βn) is removed. If there are many unit productions, we remove them one after the other, so it suffices to show that the properties are preserved by this single step. A -> B is replaced by A -> β1 | ... | βn.
Every alternative of A in G' has a disjoint first set: if some alternative of A had a first set intersecting first(βi) for some i, then in G it would have intersected first(B), contradicting the fact that the input grammar is LL(1).
A has at most one nullable alternative in G': if in G the nullable alternative of A was B, then there is exactly one βi that is nullable in G (and hence in G'), because G is LL(1); otherwise, no βi is nullable in G (and hence none is nullable in G').

Exercise 3(c) - Solution [Cont.]
By the same argument, every non-terminal that is nullable in G' is also nullable in G.
We now show that the follow set of every non-terminal in G' can only become smaller than its corresponding follow set in G.
Adding the production A -> β1 | ... | βn can only affect the follow set of a non-terminal N occurring in some βi with βi = α N γ, where γ is empty or nullable. In this case follow(A) ⊆ follow(N) in G'. However, in G we already have follow(A) ⊆ follow(B) ⊆ follow(N), because A -> B and B -> βi belong to G, and γ nullable in G' implies γ nullable in G. Hence follow(N) cannot increase for any such N.
Removing the production A -> B can only reduce the size of follow(B) in G' compared to its size in G.
Any reduction of the follow sets of the non-terminals cannot violate the LL(1) property. Hence G' is also LL(1).

Exercise 3(d) - Solution
Unambiguity: if a grammar is unambiguous, does it remain unambiguous after removing unit productions?
Yes. Sketch of the proof:
Let G' be obtained from G by eliminating unit productions.
We prove the contrapositive: if there are two leftmost derivations for a word w in G', then there are two leftmost derivations for w in G.
As in the previous proof, assume that we remove only one unit production A -> B, where B -> β1 | ... | βn; A -> B is replaced by A -> β1 | ... | βn.
Let D1 and D2 be two leftmost derivations for the word w in G'.
If all the productions used in D1 and D2 are present in G, then D1 and D2 are also valid in G, which implies the claim.
Therefore, suppose D1 or D2 uses productions A -> βi that do not belong to G. For each A -> βi there exists a two-step derivation A ⇒ B ⇒ βi in G. Hence, we can create new derivations D1' and D2' (valid in G) from D1 and D2 by replacing every use of the rule A -> βi by the derivation A ⇒ B ⇒ βi.
Are D1' and D2' distinct?

Exercise 3(d) - Solution [Cont.]
Consider the point where D1 and D2 diverge. Let
D1: S ⇒* x C α ⇒ x γ1 α ⇒* w
D2: S ⇒* x C α ⇒ x γ2 α ⇒* w
If neither C -> γ1 nor C -> γ2 is of the form A -> βi, then D1' and D2' also diverge at the same point, since such productions are not changed.
Case (i): say
D1: S ⇒* x A α ⇒ x βi α ⇒* w
D2: S ⇒* x A α ⇒ x γ2 α ⇒* w
In this case D1' is S ⇒* x A α ⇒ x B α ⇒ x βi α ⇒* w, while D2' contains S ⇒* x A α ⇒ x γ2 α. Moreover, γ2 is not B, because A -> B does not belong to G'. Hence D1' differs from D2'.
Case (ii): say
D1: S ⇒* x A α ⇒ x βi α ⇒* w
D2: S ⇒* x A α ⇒ x βj α ⇒* w
In this case D1': S ⇒* x A α ⇒ x B α ⇒ x βi α ⇒* w and D2': S ⇒* x A α ⇒ x B α ⇒ x βj α ⇒* w. Hence D1' differs from D2'.

Exercise 4
Given a grammar G in CNF, how many steps does it take to derive a string of length n?

Exercise 4 - Solution
Intuition: consider a derivation of a string of length n obtained as follows:
1. Derive a string of exactly n nonterminals from the start symbol, then
2. Expand each nonterminal into a single terminal.
To obtain n nonterminals from the start symbol, we need to apply productions of the form S -> AB, as that is the only way to generate nonterminals. How many times do we have to apply such productions? Each application increases the number of nonterminals by 1, since it replaces one nonterminal with two. Since we start with one nonterminal, we need to apply such productions n-1 times.
We then need n more steps to convert the nonterminals to terminals.
Therefore, the total number of steps is 2n - 1.
Let's try to prove this bound formally.

Exercise 4 - Solution [Cont.]
Denote by NS(N, w) the number of steps required to derive a string w from a non-terminal N.
If |w| = 1, we need exactly one step: NS(N, w) = 1.
If |w| > 1, the derivation first applies a production N -> N1 N2, say N ⇒ N1 N2 ⇒* x N2 ⇒* x y = w, where |x| = q and |y| = |w| - q.
Hence NS(N, w) = NS(N1, x) + NS(N2, y) + 1.
We need a closed form for NS(N, w) that depends only on |w|; say NS(N, w) = f(|w|). We have
f(|w|) = 1                        if |w| = 1
f(|w|) = f(q) + f(|w| - q) + 1    if |w| > 1, for some 1 ≤ q < |w|
and f(|w|) = 2|w| - 1 is the solution of this recurrence.
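As a quick check (my addition, not on the original slide): the closed form satisfies the recurrence for every split q, since (2q - 1) + (2(|w| - q) - 1) + 1 = 2|w| - 1. The small Scala snippet below verifies both facts for the first few values of n:

```scala
object DerivationLength {
  def closedForm(n: Int): Int = 2 * n - 1

  // the recurrence itself, evaluated for one particular split (q = 1); any split gives the same value
  def f(n: Int): Int = if (n == 1) 1 else f(1) + f(n - 1) + 1

  def main(args: Array[String]): Unit = {
    assert((1 to 20).forall(n => f(n) == closedForm(n)))
    // the closed form is split-independent: f(q) + f(n - q) + 1 = f(n) for every 1 <= q < n
    assert((2 to 20).forall(n =>
      (1 until n).forall(q => closedForm(q) + closedForm(n - q) + 1 == closedForm(n))))
    println("f(n) = 2n - 1 holds for n = 1..20")
  }
}
```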

Exercise 5
Assume a grammar in CNF has n non-terminals. Show that if the grammar can generate a word with a derivation of at least 2^n steps, then the recognized language must be infinite (see the PDF file uploaded on the lara wiki along with the slides).

Exercise 6
Show the CYK parsing table for the string "aabbab" for the grammar:
S -> AB | BA | SS | AC | BD
A -> a
B -> b
C -> SB
D -> SA
What should be done to construct a parse tree for the string?

Exercise 6 - Solution
The CYK parse table for the input a a b b a b (positions 0 1 2 3 4 5). Entry (i,j) contains the nonterminals that derive the substring from position i to position j; "-" marks an empty entry:

         j=0  j=1  j=2  j=3  j=4  j=5
  i=0     A    -    -    S    D    S
  i=1          A    S    C    S    C
  i=2               B    -    -    -
  i=3                    B    S    C
  i=4                         A    S
  i=5                              B

Since S appears in entry (0,5), the string aabbab is accepted.
For generating parse trees, modify the parse table as follows: make every entry (i,j), i < j, of the table a set of triples (N, s, p), where N accepts the substring from index i to j via a production of the form p: N -> N1 N2, N1 accepts the substring from index i to i+s-1, and N2 accepts the substring from index i+s to j.
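For reference, here is a small Scala sketch (my own encoding, not part of the course material) that fills exactly this table bottom-up:

```scala
object CYK {
  val word = "aabbab"

  // S -> AB | BA | SS | AC | BD,  A -> a,  B -> b,  C -> SB,  D -> SA
  val binary: List[(String, (String, String))] = List(
    "S" -> ("A", "B"), "S" -> ("B", "A"), "S" -> ("S", "S"),
    "S" -> ("A", "C"), "S" -> ("B", "D"),
    "C" -> ("S", "B"), "D" -> ("S", "A"))
  val terminal: Map[Char, Set[String]] = Map('a' -> Set("A"), 'b' -> Set("B"))

  // t(i)(j) holds the nonterminals deriving the substring w(i..j), inclusive, 0-indexed
  def parseTable(w: String): Array[Array[Set[String]]] = {
    val n = w.length
    val t = Array.fill(n, n)(Set.empty[String])
    for (i <- 0 until n) t(i)(i) = terminal.getOrElse(w(i), Set.empty)
    for (len <- 2 to n; i <- 0 to n - len) {
      val j = i + len - 1
      // try every split point: the left part is w(i..k), the right part is w(k+1..j)
      for (k <- i until j; (lhs, (x, y)) <- binary if t(i)(k)(x) && t(k + 1)(j)(y))
        t(i)(j) += lhs
    }
    t
  }

  def main(args: Array[String]): Unit = {
    val t = parseTable(word)
    println("accepted: " + t(0)(word.length - 1).contains("S")) // true
    for (i <- word.indices; j <- i until word.length if t(i)(j).nonEmpty)
      println(s"($i,$j): " + t(i)(j).mkString(","))
  }
}
```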

Exercise 6 - Solution [Cont.]
S -> AB (p1) | BA (p2) | SS (p3) | AC (p4) | BD (p5)
A -> a (p6)
B -> b (p7)
C -> SB (p8)
D -> SA (p9)

The non-empty entries (i,j), i < j, of the table of triples are:
(1,2): (S,1,p1)    (3,4): (S,1,p2)    (4,5): (S,1,p1)
(1,3): (C,2,p8)    (3,5): (C,2,p8)
(0,3): (S,1,p4)    (1,4): (S,2,p3)
(0,4): (D,4,p9)    (1,5): (C,4,p8)
(0,5): (S,1,p4), (S,4,p3)

See the next slide for an algorithm that generates one parse tree from a table of this form.

Exercise 6 - Solution [Cont.]
Algorithm for generating one parse tree, starting from a nonterminal N, for the substring (i,j):

ParseTree(N, i, j):
  if i = j:
    if N is in parseTable(i,j), return Leaf(N, w(i,j))
    else report a parse error
  otherwise:
    pick an entry (N, s, p) from parseTable(i,j)
    if no such entry exists, report that the substring cannot be parsed and return
    let p be N -> N1 N2
    leftChild  = ParseTree(N1, i, i+s-1)
    rightChild = ParseTree(N2, i+s, j)
    return Node(N, leftChild, rightChild)
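The sketch below (again my own Scala encoding, not the lecture's code) implements this procedure on top of a table of (N, s, production) triples built during CYK; productions are represented directly as pairs of right-hand-side nonterminals rather than by the names p1..p9:

```scala
object CYKTree {
  sealed trait Tree
  case class Leaf(n: String, c: Char) extends Tree
  case class Node(n: String, left: Tree, right: Tree) extends Tree

  val word = "aabbab"
  val binary = List(
    "S" -> ("A", "B"), "S" -> ("B", "A"), "S" -> ("S", "S"),
    "S" -> ("A", "C"), "S" -> ("B", "D"),
    "C" -> ("S", "B"), "D" -> ("S", "A"))
  val terminal = Map('a' -> Set("A"), 'b' -> Set("B"))

  // entry (N, s, (X, Y)): N -> X Y, X derives w(i..i+s-1), Y derives w(i+s..j)
  type Entry = (String, Int, (String, String))
  val n = word.length
  val nts   = Array.fill(n, n)(Set.empty[String]) // plain CYK table, used for the diagonal
  val table = Array.fill(n, n)(Set.empty[Entry])  // back-pointer table for entries with i < j

  for (i <- 0 until n) nts(i)(i) = terminal(word(i))
  for (len <- 2 to n; i <- 0 to n - len) {
    val j = i + len - 1
    for (k <- i until j; (lhs, (x, y)) <- binary if nts(i)(k)(x) && nts(k + 1)(j)(y)) {
      nts(i)(j) += lhs
      val entry: Entry = (lhs, k - i + 1, (x, y))
      table(i)(j) += entry
    }
  }

  // the ParseTree procedure from the slide: pick one matching entry and recurse on both halves
  def parseTree(nt: String, i: Int, j: Int): Option[Tree] =
    if (i == j)
      if (nts(i)(i)(nt)) Some(Leaf(nt, word(i))) else None
    else
      table(i)(j).collectFirst { case (`nt`, s, (x, y)) => (s, x, y) }.flatMap {
        case (s, x, y) =>
          for (l <- parseTree(x, i, i + s - 1); r <- parseTree(y, i + s, j))
            yield Node(nt, l, r)
      }

  def main(args: Array[String]): Unit =
    println(parseTree("S", 0, n - 1)) // one parse tree for "aabbab", or None on a parse error
}
```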