104 Closure Properties of Regular Languages

Regular languages are closed under many set operations. Let L1 and L2 be regular languages.
(1) L1 ∪ L2 (the union) is regular.
(2) L1L2 (the concatenation) is regular.
(3) L1* (the Kleene star) and L1+ (the Kleene plus) are regular.
(4) L1^R (the reversed language) is regular.
(5) L1^c (the complement of L1) is regular.
(6) L1 ∩ L2 (the intersection) is regular.
(7) L1 - L2 (the set subtraction) is regular.
105 Proof of the Closure Properties

We may use regular grammars, finite automata (FA), or regular expressions, whichever makes each proof simplest. Let r1 and r2 be regular expressions that denote the languages L1 and L2, respectively.

(1) Clearly, r1 + r2 is a regular expression that denotes the union of L1 and L2. Since every regular expression denotes a regular language, L1 ∪ L2 is regular.
We can also prove this property constructively as follows. Let G1 = (V_N1, V_T1, P1, S1) and G2 = (V_N2, V_T2, P2, S2) be regular grammars that generate L1 and L2, respectively. Without loss of generality, assume that V_N1 and V_N2 are disjoint, i.e., V_N1 ∩ V_N2 = ∅. Otherwise, we can always rename nonterminals so that the grammars satisfy this property. Construct a regular grammar G with a new start symbol S, the production rules S → S1 | S2, and all the rules in P1 and P2. Clearly, L(G) = L1 ∪ L2.

(2) Clearly, r1r2 is a regular expression that denotes the language L1L2, which means L1L2 is regular.

(3) Clearly, (r1)* is a regular expression for L1*. Since L1+ = L1L1*, the regular expression r1(r1)* denotes L1+, so L1+ is regular.
106 Proof of the Closure Properties (cont’ed)

(4) Suppose that an FSA M1 accepts L1. We modify M1 in two steps.
Step 1. Add a new accepting state, reached by an ε-transition from every old accepting state.
Step 2. Let the new accepting state be the start state, reverse the direction of all the edges, and let the old start state be the only accepting state.
Clearly, the resulting automaton recognizes the reversed language of L1.

[Figure: transition graph of M1 over {a, b}, shown before the modification, after adding the new accepting state, and after reversing the edges.]
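The two modification steps can be sketched in Python. This is only an illustrative sketch under assumptions that are not in the slides: the automaton is stored as a dictionary from (state, symbol) to a set of next states, the empty string stands for ε, and the names reverse_nfa, accepts, and q_new are hypothetical.

```python
def reverse_nfa(delta, start, accepting, new_state="q_new"):
    """Reverse an automaton following the two steps of part (4).

    delta maps (state, symbol) to a set of next states; symbol "" is epsilon.
    new_state is an assumed fresh name that must not clash with old states.
    """
    # Step 1: new accepting state, reached by epsilon from every old accepting state.
    edges = [(p, a, q) for (p, a), targets in delta.items() for q in targets]
    edges += [(f, "", new_state) for f in accepting]
    # Step 2: reverse every edge; new state is the start, old start the only accepting state.
    reversed_delta = {}
    for p, a, q in edges:
        reversed_delta.setdefault((q, a), set()).add(p)
    return reversed_delta, new_state, {start}


def accepts(delta, start, accepting, word):
    """Simulate a nondeterministic automaton with epsilon-transitions."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for q in delta.get((p, ""), ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    current = closure({start})
    for a in word:
        nxt = set()
        for p in current:
            nxt |= delta.get((p, a), set())
        current = closure(nxt)
    return bool(current & accepting)


# Assumed example automaton: accepts a followed by any number of b's.
delta = {(0, "a"): {1}, (1, "b"): {1}}
rev_delta, rev_start, rev_accepting = reverse_nfa(delta, 0, {1})
print(accepts(delta, 0, {1}, "abb"))                           # original accepts abb
print(accepts(rev_delta, rev_start, rev_accepting, "bba"))     # reversed accepts bba
```

The reversed automaton is in general nondeterministic even when M1 is deterministic, which is why the simulation tracks a set of states.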
107 Proof of the Closure Properties (cont’ed)

Part (4) can also be proved with a regular grammar G1 for L1, using the other form of regular grammars, in which the production rules are restricted to the form A → Bx or A → x, where A and B are arbitrary nonterminal symbols and x is a string of terminal symbols or ε. (Recall that we chose to restrict to A → xB or A → x.) If we reverse the right side of each production rule of G1, then the resulting grammar G is in this other form and generates L1^R.

(5) As for part (4), we modify the transition graph of M1, a deterministic automaton that recognizes L1, as follows. Add the dead state if it is not shown in the transition graph. (Recall that we usually do not show the dead state for convenience.) Then change accepting states to non-accepting states and non-accepting states to accepting states.

(6) Since L1 ∩ L2 = (L1^c ∪ L2^c)^c, where X^c denotes the complement of X, and regular languages are closed under union and complementation (properties (1) and (5) above), L1 ∩ L2 is regular.

(7) Since L1 - L2 = L1 ∩ L2^c, it is regular by properties (5) and (6) above.
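The construction for part (5) can be sketched in Python. The sketch assumes a deterministic automaton stored as a dictionary from (state, symbol) to a single next state; the names complete, complement, run, and dead are hypothetical and chosen only for illustration.

```python
def complete(states, alphabet, delta, dead="dead"):
    """Add an explicit dead state so that every (state, symbol) has a transition."""
    states, delta = set(states), dict(delta)
    missing = [(q, a) for q in states for a in alphabet if (q, a) not in delta]
    if missing:
        states.add(dead)
        for a in alphabet:
            delta[(dead, a)] = dead      # the dead state loops on every symbol
        for key in missing:
            delta[key] = dead            # every unshown transition goes to the dead state
    return states, delta


def complement(states, alphabet, delta, start, accepting):
    """Swap accepting and non-accepting states of the completed automaton."""
    states, delta = complete(states, alphabet, delta)
    return states, delta, start, states - set(accepting)


def run(delta, start, accepting, word):
    """Run a complete deterministic automaton on word."""
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accepting


# Assumed example: a followed by any number of b's, dead state not shown.
states, alphabet = {0, 1}, {"a", "b"}
delta = {(0, "a"): 1, (1, "b"): 1}
c_states, c_delta, c_start, c_accepting = complement(states, alphabet, delta, 0, {1})
print(run(c_delta, c_start, c_accepting, "aba"))   # aba is not in the language, so accepted here
```

Skipping the dead state would make the swap incorrect, because strings that fall off the original graph would still be rejected after the swap.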
108 Properties of Context-free Languages

Let L1 and L2 be CFL’s.
(1) L1 ∪ L2 (the union) is a CFL.
(2) L1L2 (the concatenation) is a CFL.
(3) L1* (the Kleene star) and L1+ (the Kleene plus) are CFL’s.
(4) L1^R (the reversed language) is a CFL.
(5) L1 ∩ L2 (the intersection) is not necessarily a CFL.
(6) L1^c (the complement of L1) is not necessarily a CFL.
109 Proof of the Context-free Language Properties

Let G1 = (V_N1, V_T1, P1, S1) and G2 = (V_N2, V_T2, P2, S2) be CF grammars that generate L1 and L2, respectively. Without loss of generality, assume that V_N1 and V_N2 are disjoint, i.e., V_N1 ∩ V_N2 = ∅. (Otherwise, we can rename nonterminals.) In each construction below, S is a new start symbol.

(1) Construct a CFG G by merging the rules of grammars G1 and G2 and adding the new rules S → S1 | S2. (This is the same technique as for regular languages.)
(2) Construct a CFG G by merging the rules of G1 and G2 and adding the new rule S → S1S2.
(3) For L1*, add the rules S → S1S | ε to grammar G1. For L1+, add the rules S → S1S | S1.
(4) Construct a CFG from G1 by changing each rule A → α to A → α^R, i.e., reverse the right side of each production rule.
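The four constructions can be written down directly in Python. This is a minimal sketch under an assumed representation that is not in the slides: a grammar is a dictionary from each nonterminal to a list of right sides, each right side is a tuple of symbols, and the empty tuple stands for ε. The function names and the default new start symbol "S" are hypothetical.

```python
def union(g1, s1, g2, s2, s="S"):
    """(1) Merge the rules and add S -> S1 | S2."""
    assert not (set(g1) & set(g2)) and s not in g1 and s not in g2
    return {s: [(s1,), (s2,)], **g1, **g2}


def concat(g1, s1, g2, s2, s="S"):
    """(2) Merge the rules and add S -> S1 S2."""
    assert not (set(g1) & set(g2)) and s not in g1 and s not in g2
    return {s: [(s1, s2)], **g1, **g2}


def star(g1, s1, s="S"):
    """(3) Add S -> S1 S | epsilon."""
    assert s not in g1
    return {s: [(s1, s), ()], **g1}


def plus(g1, s1, s="S"):
    """(3) Add S -> S1 S | S1."""
    assert s not in g1
    return {s: [(s1, s), (s1,)], **g1}


def reverse(g1):
    """(4) Reverse the right side of every production rule."""
    return {a: [tuple(reversed(rhs)) for rhs in rhss] for a, rhss in g1.items()}


# Assumed example grammars: A generates a^n b^n, B generates one or more c's.
g1 = {"A": [("a", "A", "b"), ()]}
g2 = {"B": [("c", "B"), ("c",)]}
print(union(g1, "A", g2, "B"))
print(reverse(g1))   # generates b^n a^n
```

The assertions inside the functions enforce the disjointness assumption made at the top of the slide.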
110 Proof of the Context-free Language Properties (cont’ed)

(5) We know that L1 = {a^i b^i c^j | i, j ≥ 0} and L2 = {a^k b^n c^n | k, n ≥ 0} are CFL’s. But L1 ∩ L2 = {a^i b^i c^i | i ≥ 0} is not a CFL.

(6) Suppose that CFL’s are closed under complementation. Since CFL’s are closed under union (property (1)) and L1 ∩ L2 = (L1^c ∪ L2^c)^c, where X^c denotes the complement of X, CFL’s would then be closed under intersection. This contradicts property (5).
111 Minimizing the Number of ε-Production Rules

Theorem. Given an arbitrary CFG G, we can construct a CFG G´ such that L(G) = L(G´) and, if ε is not in L(G), then G´ does not have any ε-production rule. If ε ∈ L(G), then S → ε is the only ε-production rule of G´.

Proof (an algorithm). Let G = (V_T, V_N, P, S), and let A, B ∈ V_N. We construct a CFG G´ = (V_T, V_N, P´, S) from G by the following steps.

(1) Find the set W of all nonterminals of G which derive ε, as follows:
    W_0 = {A | A ∈ V_N and A → ε is in P};
    Do
        W_{i+1} = W_i ∪ {A | A ∈ V_N and A → α is in P, for some α ∈ (W_i)+};
    until (W_{i+1} = W_i);
    W = W_i;   // W contains all nonterminal symbols from which ε can be derived.
(2) Delete all ε-productions from P. Call this new set of productions P1.
(3) Modify P1 to P´ as follows: if a production A → α is in P1, then put into P´ the rule A → α and every rule A → α´, where α´ (≠ ε) is obtained from α by deleting one or more nonterminals in the set W constructed by step (1).
(4) If S is in W, then add S → ε to P´.
112 Minimizing the Number of ε-Production Rules (example)

Convert the following CFG G to another CFG G´ such that L(G) = L(G´) and G´ has the smallest possible number of ε-production rules.

G:  S → ADC | EFg
    A → aA | ε
    D → FGH | b
    C → c | ε
    E → a
    F → f | ε
    G → Gg | H
    H → h | ε

Computing W:
    W_0 = {A, C, F, H}
    W_1 = W_0 ∪ {G} = {A, C, F, G, H}
    W_2 = W_1 ∪ {D} = {A, C, D, F, G, H}
    W_3 = W_2 ∪ {S} = {A, C, D, F, G, H, S}
    W_4 = W_3 ∪ {} = {A, C, D, F, G, H, S}

P1: S → ADC | EFg
    A → aA
    D → FGH | b
    C → c
    E → a
    F → f
    G → Gg | H
    H → h

P´: S → ADC | AD | AC | DC | A | D | C | ε | EFg | Eg
    A → aA | a
    D → FGH | FG | FH | GH | F | G | H | b
    C → c
    E → a
    F → f
    G → Gg | g | H
    H → h
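The algorithm of the previous slide can be checked against this example with a short Python sketch. The representation is an assumption made for illustration: each right side is a string of single-character symbols, uppercase letters are nonterminals, and the empty string stands for ε. The names nullable and eliminate_epsilon are hypothetical.

```python
from itertools import product


def nullable(rules):
    """Step (1): the set W of nonterminals that derive epsilon."""
    w = {a for a, rhss in rules.items() if "" in rhss}
    changed = True
    while changed:
        changed = False
        for a, rhss in rules.items():
            # A joins W if some nonempty right side consists only of symbols in W.
            if a not in w and any(rhs and all(x in w for x in rhs) for rhs in rhss):
                w.add(a)
                changed = True
    return w


def eliminate_epsilon(rules, start):
    """Steps (2) to (4): build P' from P."""
    w = nullable(rules)
    new_rules = {}
    for a, rhss in rules.items():
        out = []
        for rhs in rhss:
            if rhs == "":
                continue                       # step (2): drop epsilon-productions
            # Step (3): each occurrence of a symbol in W is either kept or deleted.
            choices = [(x, "") if x in w else (x,) for x in rhs]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in out:
                    out.append(candidate)
        new_rules[a] = out
    if start in w:
        new_rules[start].append("")            # step (4): S -> epsilon
    return new_rules


G = {
    "S": ["ADC", "EFg"], "A": ["aA", ""], "D": ["FGH", "b"], "C": ["c", ""],
    "E": ["a"], "F": ["f", ""], "G": ["Gg", "H"], "H": ["h", ""],
}
P_new = eliminate_epsilon(G, "S")
print(sorted(nullable(G)))
for lhs, rhss in P_new.items():
    print(lhs, "->", " | ".join(rhs or "ε" for rhs in rhss))
```

The rules are produced in a different order than on the slide, but as sets they agree with P´ above.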
113 Eliminating Useless Symbols from a CFG

Lemma 1. Given a CFG G = (V_T, V_N, P, S), we can construct an equivalent CFG G´ = (V_T, V´_N, P´, S) such that every nonterminal symbol A in V´_N derives a string x ∈ (V_T)*.

Proof. Let OLDV and NEWV be sets of nonterminals, and let A be an arbitrary nonterminal. We construct V´_N and P´ as follows.

    OLDV = ∅;
    NEWV = {A | A → w is in P for some w ∈ (V_T)*};
    while (OLDV ≠ NEWV) do {
        OLDV = NEWV;
        NEWV = OLDV ∪ {A | A → α is in P for some α in (V_T ∪ OLDV)*};
    }
    V´_N = NEWV;
    P´ = {A → α | A → α is in P and α ∈ (V´_N ∪ V_T)*};
114 Eliminating Useless Symbols from a CFG (cont’ed)

Lemma 2. Given a CFG G = (V_T, V_N, P, S), we can construct an equivalent CFG G´ = (V´_T, V´_N, P´, S) such that, for each symbol X ∈ V´_T ∪ V´_N, the start symbol derives αXβ for some α, β ∈ (V´_T ∪ V´_N)*, i.e., S can derive a sentential form (a string of terminals and nonterminals) which contains the symbol X.

Proof. The following algorithm computes V´_T, V´_N and P´.
(1) Let V´_T and V´_N be the empty sets.
(2) Put S into V´_N.
(3) If A ∈ V_N is put into V´_N and A → α1 | α2 | ... | αn, then all nonterminals in αi, 1 ≤ i ≤ n, are put into V´_N and all terminals in αi are put into V´_T.
(4) Repeat (3) until there is no symbol to be added to V´_N.
(5) Let P´ contain all the productions in P except for the ones which have a symbol not in V´_T ∪ V´_N.
115 Eliminating Useless Symbols from a CFG (cont’ed)

Theorem. Given an arbitrary CFG G = (V_T, V_N, P, S), we can construct an equivalent CFG G´ = (V´_T, V´_N, P´, S) such that
(1) for each A ∈ V´_N, A ⇒* x for some x ∈ (V´_T)* (i.e., A derives a terminal string or ε), and
(2) for each X ∈ V´_T ∪ V´_N, S ⇒* αXβ for some α, β ∈ (V´_N ∪ V´_T)* (i.e., the start symbol can derive a sentential form which contains X).

Proof. Apply Lemma 1 first and then Lemma 2.
116 Eliminating Useless Symbols from a CFG (example)

Example. Eliminate useless symbols from the following CFG G.

G:  S → AD | EFg
    A → aGD
    D → FGd
    C → cCEc
    E → Ee
    F → Ff | ε
    G → Gg | g
    H → hH | h

Step 1: Apply Lemma 1 to find the set of nonterminals V´_N such that every nonterminal symbol in V´_N derives a string x ∈ (V_T)*.

    OLDV = {}; NEWV = {F, G, H}
    OLDV = NEWV; NEWV = OLDV ∪ {D} = {D, F, G, H};
    OLDV = NEWV; NEWV = OLDV ∪ {A} = {A, D, F, G, H};
    OLDV = NEWV; NEWV = OLDV ∪ {S} = {A, D, F, G, H, S};
    OLDV = NEWV; NEWV = OLDV ∪ {} = {A, D, F, G, H, S};
    V´_N = NEWV = {A, D, F, G, H, S}

Find the set of rules P´.

P´: S → AD
    A → aGD
    D → FGd
    F → Ff | ε
    G → Gg | g
    H → hH | h
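Step 1 can be reproduced with a short Python sketch of the Lemma 1 algorithm. The representation is assumed for illustration only: right sides are strings of single-character symbols, uppercase letters are nonterminals, and the empty string stands for ε. The names generating and keep_generating are hypothetical.

```python
def generating(rules):
    """OLDV/NEWV iteration of Lemma 1: nonterminals that derive a terminal string."""
    def derivable(rhs, known):
        # Every symbol is a terminal or a nonterminal already known to be generating.
        return all(x in known or not x.isupper() for x in rhs)

    old = set()
    new = {a for a, rhss in rules.items() if any(derivable(r, set()) for r in rhss)}
    while old != new:
        old = set(new)
        new = old | {a for a, rhss in rules.items() if any(derivable(r, old) for r in rhss)}
    return new


def keep_generating(rules):
    """P': the rules whose symbols are all terminals or generating nonterminals."""
    v = generating(rules)
    return {
        a: [r for r in rhss if all(x in v or not x.isupper() for x in r)]
        for a, rhss in rules.items() if a in v
    }


G = {
    "S": ["AD", "EFg"], "A": ["aGD"], "D": ["FGd"], "C": ["cCEc"],
    "E": ["Ee"], "F": ["Ff", ""], "G": ["Gg", "g"], "H": ["hH", "h"],
}
print(sorted(generating(G)))
print(keep_generating(G))
```

The rule S → EFg disappears because E never enters the set, which matches P´ above.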
117 P´: S → AD
    A → aGD
    D → FGd
    F → Ff | ε
    G → Gg | g
    H → hH | h

Step 2: Find the set of symbols V´ = V´_T ∪ V´_N such that each symbol in V´ can be derived starting from S.

    1. V´_T = V´_N = {};   // initialize with empty set
    2. V´_N = V´_N ∪ {S} = {S}                      V´_T = V´_T ∪ {} = {}
    3. V´_N = V´_N ∪ {A, D} = {S, A, D}             V´_T = V´_T ∪ {} = {}
    4. V´_N = V´_N ∪ {G, F} = {S, A, D, G, F}       V´_T = V´_T ∪ {a, d} = {a, d}
    5. V´_N = V´_N ∪ {} = {S, A, D, F, G}           V´_T = V´_T ∪ {g, f} = {a, d, g, f}
    6. V´_N = V´_N ∪ {} = {S, A, D, F, G}           V´_T = V´_T ∪ {} = {a, d, g, f}

Cleaned set of rules:

P´: S → AD
    A → aGD
    D → FGd
    F → Ff | ε
    G → Gg | g
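Step 2 can be reproduced with a Python sketch of the Lemma 2 algorithm, applied to the rule set P´ obtained in Step 1. As an assumption for illustration, right sides are strings of single-character symbols with uppercase nonterminals and the empty string for ε; the names reachable and keep_reachable are hypothetical.

```python
def reachable(rules, start):
    """Lemma 2: symbols that occur in some sentential form derivable from start."""
    v_n, v_t = {start}, set()
    work = [start]
    while work:
        a = work.pop()
        for rhs in rules.get(a, []):
            for x in rhs:
                if x.isupper():
                    if x not in v_n:
                        v_n.add(x)
                        work.append(x)     # its rules still have to be scanned
                else:
                    v_t.add(x)
    return v_n, v_t


def keep_reachable(rules, start):
    """Keep the rules of reachable nonterminals.

    Filtering by the left side is enough here: if A is reachable, then every
    symbol on the right side of a rule of A is reachable as well.
    """
    v_n, _ = reachable(rules, start)
    return {a: rhss for a, rhss in rules.items() if a in v_n}


P1 = {
    "S": ["AD"], "A": ["aGD"], "D": ["FGd"],
    "F": ["Ff", ""], "G": ["Gg", "g"], "H": ["hH", "h"],
}
v_n, v_t = reachable(P1, "S")
print(sorted(v_n), sorted(v_t))
print(keep_reachable(P1, "S"))
```

The only rules removed are those of H, which generates terminal strings but can never be reached from S.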
118 Eliminating Useless Symbols from a CFG (cont’ed)

Remark: Notice that applying Lemma 2 first and then Lemma 1 may fail to eliminate all useless productions.

Example. Consider a grammar with rules P = {S → AB | a, A → a}.
By applying Lemma 1 first, we have P = {S → a, A → a}; then applying Lemma 2, we have P = {S → a}.
However, if we apply Lemma 2 first, we have P = {S → AB | a, A → a}. Then applying Lemma 1, we have P = {S → a, A → a}, which still has a useless production.
119 Ambiguous Context-free Grammars

There are two kinds of ambiguity in a language.
Lexical ambiguity (or semantic ambiguity): A symbol or an expression has more than one meaning (e.g., story, saw).
Syntactic ambiguity (or structural ambiguity): An expression can be parsed in two different ways.
In this course we will only study syntactic ambiguity of context-free grammars.

Example 1 (in natural language). “A man entered a room with a picture” can be interpreted in two different ways.
[Figure: two parse trees for the sentence.]

For a given context-free grammar G and a string x, the parse tree shows how x is derived with the rules of G (see an example on the next slide). In programming languages, different parse trees give different object code. A CFG is ambiguous if its language has a string for which there is more than one parse tree.
120 Ambiguous Context-free Grammars (cont’ed)

Example 2 (in formal language). The following context-free grammar is ambiguous, because the string p ∨ q ∧ r has the two parse trees shown in Figures (a) and (b) below.

G:  S → S ∨ S | S ∧ S | ¬S | A
    A → p | q | r

[Figure (a): parse tree that groups the string as p ∨ (q ∧ r).]
[Figure (b): parse tree that groups the string as (p ∨ q) ∧ r.]
121 Some Techniques for Designing Unambiguous CFG

(1) Use parentheses so that different derivation trees generate different strings. Notice that this technique changes the language by introducing new terminal symbols, the parentheses.

Example:
Ambiguous G1:    S → S ∨ S | S ∧ S | ¬S | A          A → p | q | r
Unambiguous G2:  S → (S ∨ S) | (S ∧ S) | (¬S) | A    A → p | q | r

[Figure (a): parse tree of G2 for (p ∨ (q ∧ r)).]
[Figure (b): parse tree of G2 for ((p ∨ q) ∧ r).]
122 Some Techniques for Designing Unambiguous CFG

(2) Modify the production rules that cause the ambiguity.

Examples:
(a) Grammar G3 below is clearly ambiguous: for the string bcb, it can generate either the left b first and then the right b, or vice versa. Grammar G4 does not have this possibility because it generates the left b's first, if any.

Ambiguous G3:    S → bS | Sb | c
Unambiguous G4:  S → bS | A    A → Ab | c

[Figure (a): the two parse trees of G3 for bcb, showing its ambiguity.]
[Figure (b): the parse tree of G4 for bcb.]
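The difference between G3 and G4 on the string bcb can be checked by counting parse trees. The following Python sketch is an illustration, not part of the slides: it assumes right sides written as strings of single-character symbols with uppercase nonterminals, and it only works for grammars without ε-productions and without cycles of unit rules. The name count_trees is hypothetical.

```python
from functools import lru_cache


def count_trees(rules, start, word):
    """Count the parse trees of word, assuming no epsilon-rules and no unit cycles."""

    @lru_cache(maxsize=None)
    def sym(x, w):
        # Number of trees with root x and yield w.
        if not x.isupper():
            return 1 if w == x else 0
        return sum(seq(rhs, w) for rhs in rules[x])

    @lru_cache(maxsize=None)
    def seq(rhs, w):
        # Number of ways the symbols of rhs can derive w, in order.
        if len(rhs) == 1:
            return sym(rhs, w)
        # The first symbol takes a nonempty prefix and leaves at least one
        # character for each remaining symbol.
        return sum(
            sym(rhs[0], w[:i]) * seq(rhs[1:], w[i:])
            for i in range(1, len(w) - len(rhs) + 2)
        )

    return sym(start, word)


G3 = {"S": ["bS", "Sb", "c"]}
G4 = {"S": ["bS", "A"], "A": ["Ab", "c"]}
print(count_trees(G3, "S", "bcb"))   # more than one tree: G3 is ambiguous
print(count_trees(G4, "S", "bcb"))
```

A count above one for a single string is enough to show that a grammar is ambiguous; a count of one for one string says nothing about other strings.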
123 Some Techniques for Designing Unambiguous CFG (cont’ed)

(b) The following grammar G5 is ambiguous, since it can generate ε in two ways. We eliminate this possibility by applying the technique of reducing ε-production rules. Grammar G6 is the result.

G5:  S → B | D      B → bBc | ε     D → dDe | ε
G6:  S → B | D | ε  B → bBc | bc    D → dDe | de

(c) Grammar G1 can be modified in two different ways to make it unambiguous. Notice that for G7 we used the same technique as in Example (a) above.

G7:  S → A ∨ S | A ∧ S | ¬S | A
     A → p | q | r

G8:  S → D ∨ S | D
     D → C ∧ D | C
     C → ¬C | A
     A → p | q | r

For G8 we set up a precedence rule such that ∨, if any, is derived (by S) first, then ∧ (by D), and then ¬ (by C), in that order from the top of the parse tree. The later an operator is derived, the higher its precedence over the others.
124 Known facts about ambiguous context-free grammars

There is no algorithm that can tell whether an arbitrary CFG is ambiguous or not.

There are so-called inherently ambiguous context-free languages, for which every CFG is ambiguous. Here is an example:
    {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}.

There is no algorithm that can convert an arbitrary ambiguous CFG, whose language is not inherently ambiguous, to an unambiguous one.
125 Normal Forms of Context-free Grammars

When we investigate context-free grammars and their languages, it is sometimes convenient to make the right side of each production rule meet a certain form. Such a form is called a normal form. There are two normal forms for context-free grammars: Chomsky Normal Form (CNF) and Greibach Normal Form (GNF).

Let G = (V_N, V_T, P, S) be a context-free grammar. Grammar G is in CNF if all the production rules of the grammar are of the form A → BC or A → a, where A, B, C ∈ V_N and a ∈ V_T.

A context-free grammar is in GNF if every production rule of the grammar is of the form A → aα, where A ∈ V_N, a ∈ V_T, and α ∈ (V_N)*. Notice that α is a string of nonterminal symbols or the null string.

We can show that every context-free grammar whose language does not contain ε can be converted to CNF and GNF. (Recall that we can eliminate all ε-production rules from a given context-free grammar if its language does not contain ε.) The following example shows how to convert a context-free grammar to CNF. We can easily generalize the idea. Converting a context-free grammar to GNF is quite involved (see the text, Chapter 6). We shall not study the proof.
126 Converting a Context-free Grammar to CNF (example)

Suppose that a context-free grammar has a production rule A → aBCDbE, which is not in CNF. We introduce new nonterminal symbols and production rules in CNF such that A can derive the right side string aBCDbE, as follows.

    A → A1B1    A1 → a    // and we let B1 derive BCDbE as follows;
    B1 → BC1              // and we let C1 derive CDbE as follows;
    C1 → CD1              // and we let D1 derive DbE as follows;
    D1 → DE1              // and we let E1 derive bE as follows;
    E1 → F1E    F1 → b

Example. Convert the following context-free grammar to CNF.

    S → AaBCb    A → abb    B → aC    C → aCb | ac

Answer:
    S → AA1    A1 → A2A3    A2 → a    A3 → BA4    A4 → CA5    A5 → b
    A → B1B2   B1 → a       B2 → B3B4   B3 → b    B4 → b
    B → C1C    C1 → a
    C → D1D2 | E1E2    D1 → a    D2 → CD3    D3 → b    E1 → a    E2 → c
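The splitting idea used for A → aBCDbE generalizes to any rule with two or more symbols on the right side. The following Python sketch shows one way to do it; it is an illustration under assumptions, not the text's algorithm. It assumes the grammar has no ε-productions and no unit rules, takes the right side as a list of symbols, and generates fresh nonterminal names X1, X2, ... instead of the names A1, B1, ... used above. The function name cnf_rules is hypothetical.

```python
from itertools import count


def cnf_rules(lhs, rhs, is_terminal, fresh):
    """Split lhs -> rhs into rules of the form A -> B C or A -> a.

    rhs is a list of symbols; a right side of length 1 is assumed to be a
    single terminal already. fresh() returns an unused nonterminal name.
    """
    if len(rhs) == 1:
        return [(lhs, tuple(rhs))]
    out = []

    def as_nonterminal(symbol):
        # A terminal a inside a long rule is replaced by a fresh T with T -> a.
        if is_terminal(symbol):
            t = fresh()
            out.append((t, (symbol,)))
            return t
        return symbol

    head = lhs
    while len(rhs) > 2:
        first = as_nonterminal(rhs[0])
        rest = fresh()                   # rest derives the remaining symbols
        out.append((head, (first, rest)))
        head, rhs = rest, rhs[1:]
    out.append((head, (as_nonterminal(rhs[0]), as_nonterminal(rhs[1]))))
    return out


names = (f"X{i}" for i in count(1))
rules = cnf_rules("A", list("aBCDbE"), str.islower, lambda: next(names))
for left, right in rules:
    print(left, "->", " ".join(right))
```

Up to the renaming of the new nonterminals, the printed rules have the same shape as the seven rules derived by hand for A → aBCDbE.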