Module 28 Context Free Grammars Definition of a grammar G Deriving strings and defining L(G) Context-Free Language definition
Context-Free Grammars Definition
Definition: A context-free grammar G = (V, Σ, S, P). V: finite set of variables (nonterminals). Σ: finite set of characters (terminals). S: start variable, an element of V; its role is similar to that of q0 for an FSA or NFA. P: finite set of grammar rules or production rules. Syntax of a production: variable --> string of variables and terminals.
English Context-Free Grammar: ECFG = (V, Σ, S, P). V = {<sentence>, <noun phrase>, <verb phrase>, ... }; people sometimes use < > to delimit variables. In this course, we generally will use capital letters to denote variables. Σ = {a, b, c, ..., z, ;, ,, ., ...}. S = <sentence>. P = { <sentence> --> <noun phrase> <verb phrase> <pct>, <noun phrase> --> <article> <adj> <noun>, ...}
{a^i b^i | i > 0}: CFG ABG = (V, Σ, S, P). V = {S}. Σ = {a, b}. S = S. P = {S --> aSb, S --> ab}, or S --> aSb | ab; the second format saves some space.
Context-Free Grammars Deriving strings, defining L(G), and defining context-free languages
Defining -->, ==> notation. First: --> notation. This is used to define the productions of a grammar: S --> aSb | ab. Second: ==>_G notation. This is used to denote the application of a production rule from a grammar G: S ==>_ABG aSb ==>_ABG aaSbb ==>_ABG aaabbb. We say that string S derives string aSb (in one step). We say that string aSb derives string aaSbb (in one step). We say that string aaSbb derives string aaabbb (in one step). We often omit the grammar subscript when the intended grammar is unambiguous.
Defining ==> continued. Third: ==>^k_G notation. This is used to denote k applications of production rules from a grammar G. S ==>^2_ABG aaSbb: we say that string S derives string aaSbb in two steps. aSb ==>^2_ABG aaabbb: we say that string aSb derives string aaabbb in two steps. We often omit the grammar subscript when the intended grammar is unambiguous.
Defining ==> continued. Fourth: ==>^*_G notation. This is used to denote 0 or more applications of production rules from a grammar G. S ==>^*_ABG S: we say that string S derives string S in 0 or more steps. S ==>^*_ABG aaSbb: we say that string S derives string aaSbb in 0 or more steps. aSb ==>^*_ABG aaSbb: we say that string aSb derives string aaSbb in 0 or more steps. aSb ==>^*_ABG aaabbb: we say that string aSb derives string aaabbb in 0 or more steps. We often omit the grammar subscript when the intended grammar is unambiguous.
Defining derivations. Derivation of a string x: the complete step-by-step derivation of a string x from the start variable S. Key fact: each step in a derivation makes only one application of a production rule from G. Example: derivation of string aaabbb using ABG: S ==>_ABG aSb ==>_ABG aaSbb ==>_ABG aaabbb. Example 2: AG = (V, Σ, S, P) where P = S --> SS | a. Deriving string aaa: S ==> SS ==> Sa ==> SSa ==> aSa ==> aaa
Defining L(G). Generating strings: if S ==>^*_G x, then grammar G generates string x. Note that G generates strings which contain terminals and nonterminals: aSb contains nonterminals and terminals, S contains only nonterminals, aaabbb contains only terminals. L(G): the set of strings over Σ generated by grammar G. Note we only consider terminal strings generated by G. {a^i b^i | i > 0} = L(ABG). {a^i | i > 0} = L(AG).
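The definition of L(G) can be explored by brute force. The following is a minimal sketch, not part of the original slides: the function name `generate` and the dictionary encoding of productions are assumptions, and the length-based pruning is only valid for grammars without λ-productions (true of ABG and AG).

```python
from collections import deque

def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    productions maps each variable (one character) to its right-hand sides.
    Assumption: no lambda-productions, so a sentential form never shrinks
    and any form longer than max_len can be discarded.
    """
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Expanding only the leftmost variable still reaches every terminal string.
        pos = next((i for i, ch in enumerate(form) if ch in productions), None)
        if pos is None:
            found.add(form)  # no variables left: a terminal string in L(G)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

ABG = {"S": ["aSb", "ab"]}
AG = {"S": ["SS", "a"]}
print(sorted(generate(ABG, "S", 6), key=len))
print(sorted(generate(AG, "S", 3), key=len))
```

Only strings made entirely of terminals are collected, matching the note that L(G) contains terminal strings only.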
Context-Free Languages. A language L is a context-free language (CFL) iff there exists a CFG G such that L(G) = L. Results so far: {a^i | i > 0} is a CFL; one CFG G such that L(G) = this language is AG; note this language is also regular. {a^i b^i | i > 0} is a CFL; one CFG G such that L(G) = this language is ABG; note this language is NOT regular.
Example: Let BAL = the set of strings over {(,)} in which the parentheses are balanced. Prove that BAL is a CFL. To prove this, you need to come up with a CFG BALG such that L(BALG) = BAL. BALG = (V, Σ, S, P), V = {S}, Σ = {(, )}, S = S, P = ? Give derivations of ((( ))) and ( )(( )) with your grammar.
Module 29 Parse/Derivation Trees Ambiguous Grammars Leftmost derivations, rightmost derivations Ambiguous Grammars Examples Arithmetic expressions If-then-else Statements Inherently ambiguous CFL’s
Context-Free Grammars Parse Trees Leftmost/rightmost derivations Ambiguous grammars
Parse Tree Parse/derivation trees are structured derivations The structure graphically illustrates semantic information about the string Formalization of concept we encountered in regular languages unit Note, what we saw before were not exactly parse trees as we define them now, but they were close
Parse Tree Example: Parse tree for string ( )(( )) and grammar BALG. BALG = (V, Σ, S, P), V = {S}, Σ = {(, )}, S = S, P = S --> SS | (S) | λ. One derivation of ( )(( )): S ==> SS ==> (S)S ==> ( )S ==> ( )(S) ==> ( )((S)) ==> ( )(( )). [Figure: parse tree for ( )(( )), with internal nodes S and leaves (, ), and λ]
Comments about Example. Syntax: draw a unique arrow from each variable to each character that is a direct child of that variable; a line instead of an arrow is ok. The derived string can be read in a left-to-right traversal of the leaves. Semantics: the tree graphically illustrates the nesting structure of the string of parentheses. [Figure: the same parse tree]
Leftmost/Rightmost Derivations. There is more than one derivation of the string ( )(( )): S ==> SS ==> (S)S ==> ( )S ==> ( )(S) ==> ( )((S)) ==> ( )(( )); S ==> SS ==> (S)S ==> (S)(S) ==> ( )(S) ==> ( )((S)) ==> ( )(( )); S ==> SS ==> S(S) ==> S((S)) ==> S(( )) ==> (S)(( )) ==> ( )(( )). Leftmost derivation: the leftmost variable is always expanded. Which one of the above is leftmost? Rightmost derivation: the rightmost variable is always expanded. Which one of the above is rightmost? [Figure: parse tree for ( )(( ))]
Comments. Fix a string and a grammar. Any derivation corresponds to a unique parse tree. Any parse tree can correspond to many different derivations. Example: the one parse tree corresponds to all three derivations. Unique mappings: for any parse tree, there is a unique leftmost derivation and a unique rightmost derivation that it corresponds to. S ==> SS ==> (S)S ==> ( )S ==> ( )(S) ==> ( )((S)) ==> ( )(( )); S ==> SS ==> (S)S ==> (S)(S) ==> ( )(S) ==> ( )((S)) ==> ( )(( )); S ==> SS ==> S(S) ==> S((S)) ==> S(( )) ==> (S)(( )) ==> ( )(( ))
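The unique mapping from a parse tree to its leftmost and rightmost derivations can be made concrete. This is a sketch under assumed conventions that are not in the slides: a tree node is a tuple `(variable, children)`, a terminal is a plain string, and a λ-production is a node with an empty child list.

```python
def render(frontier):
    """Write the current sentential form as a string."""
    return "".join(n if isinstance(n, str) else n[0] for n in frontier)

def derivation(tree, leftmost=True):
    """Return the sentential forms of the leftmost (or rightmost) derivation."""
    frontier = [tree]
    steps = [render(frontier)]
    while True:
        variables = [i for i, n in enumerate(frontier) if isinstance(n, tuple)]
        if not variables:
            return steps
        i = variables[0] if leftmost else variables[-1]
        # Replace the chosen variable by its children: one production per step.
        frontier[i:i + 1] = frontier[i][1]
        steps.append(render(frontier))

# Parse tree for ()(()) under S --> SS | (S) | lambda.
LAM = ("S", [])
TREE = ("S", [("S", ["(", LAM, ")"]),
              ("S", ["(", ("S", ["(", LAM, ")"]), ")"])])

print(" ==> ".join(derivation(TREE, leftmost=True)))
print(" ==> ".join(derivation(TREE, leftmost=False)))
```

Both calls walk the same tree; only the choice of which variable to expand next differs.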
Example * S ==> SS ==> SSS ==> (S)SS ==> ( )SS ==> ( )S ==> ( ) The above is a leftmost derivation of the string ( ) from the grammar BALG Draw the corresponding parse tree Draw the corresponding rightmost derivation S ==> (S) ==> (SS) ==> (S(S)) ==> (S( )) ==> (( )) The above is a rightmost derivation of the string (( )) from the grammar BALG Draw the corresponding leftmost derivation
Ambiguous Grammars Examples: Arithmetic Expressions If-then-else statements Inherently ambiguous grammars
Ambiguous Grammars A grammar G is ambiguous if there exists a string x in L(G) with two or more distinct parse trees (2 or more distinct leftmost/rightmost derivations) Example Grammar AG is ambiguous String aaa in L(AG) has 2 rightmost derivations S ==> SS ==> SSS ==> SSa ==> Saa ==> aaa S ==> SS ==> Sa ==> SSa ==> Saa ==> aaa
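Ambiguity of AG can be checked by counting parse trees. The sketch below is specific to the productions S --> SS | a; the function name `count_trees` is an assumption made for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(w):
    """Count the distinct parse trees of w under S --> SS | a."""
    if not w or set(w) != {"a"}:
        return 0
    total = 1 if w == "a" else 0          # production S --> a
    for k in range(1, len(w)):            # production S --> SS, split after k characters
        total += count_trees(w[:k]) * count_trees(w[k:])
    return total

for w in ["a", "aa", "aaa", "aaaa"]:
    print(w, count_trees(w))
```

Any string with a count above 1 witnesses that the grammar is ambiguous; aaa is the shortest such string for AG.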
2 Simple Examples Grammar BALG is ambiguous String ( ) in L(BALG) has >1 leftmost derivation S ==> (S) ==> ( ) S ==> (S) ==> (SS) ==>(S) ==>( ) Give another leftmost derivation of ( ) from BALG Grammar ABG is NOT ambiguous Consider any string x in {aibi | i > 0} There is a unique parse tree for x
Legal Arithmetic Expressions. Develop a grammar MATHG = (V, Σ, S, P) for the language of legal arithmetic expressions. Σ = {0, 1, +, *, -, /, (, )}. Strings in the language include 10, 10*11111+100, 10*(11111+100). Strings not in the language include 10+, 11++101, )(.
Grammar MATHG1: V = {E, N}, Σ = {0, 1, +, *, -, /, (, )}, S = E. P: E --> N | E+E | E*E | E/E | E-E | (E); N --> N0 | N1 | 0 | 1
MATHG1 is ambiguous E --> N | E+E | E*E | E/E | E-E | (E) N --> N0 | N1 | 0 | 1 Come up with two distinct leftmost derivations of the string 11+0*11 E ==> E+E ==> N+E ==> N1+E ==> 11+E ==> 11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==> 11+0*N1 ==> 11+0*11 E ==> E*E ==> E+E*E ==> N+E*E ==> N1+E*E ==> 11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==> 11+0*N1 ==>11+0*11 Draw the corresponding parse trees
Corresponding Parse Trees. First derivation: E ==> E+E ==> N+E ==> N1+E ==> 11+E ==> 11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==> 11+0*N1 ==> 11+0*11. Second derivation: E ==> E*E ==> E+E*E ==> N+E*E ==> N1+E*E ==> 11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==> 11+0*N1 ==> 11+0*11. [Figure: two parse trees for 11+0*11; the first has + at the root with the E*E subtree on the right, the second has * at the root with the E+E subtree on the left]
Parse Tree Meanings. Note how the parse trees capture the semantic meaning of string 11+0*11. More specifically, what number does the first parse tree represent? What number does the second parse tree represent? [Figure: the two parse trees from the previous slide]
Implications Two interpretations of string 11+0*11 11+(0*11) = 11 (11+0)*11 = 1001 What if a line in a program is MSU_Tuition = 11+0*11; What is MSU_Tuition? Depends on how the expression 11+0*11 is parsed. This is not good. Ambiguity in grammars is undesirable, particularly if the grammar is used to develop a compiler for a programming language like C++. In this case, there is an unambiguous grammar for the language of arithmetic expressions
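The two interpretations can be computed directly from the two tree shapes. This is an illustrative sketch: the tuple encoding of trees and the names `PLUS_AT_ROOT` and `TIMES_AT_ROOT` are assumptions, and numerals are read as binary because the alphabet contains only the digits 0 and 1.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def value(tree):
    """Evaluate a parse tree; leaves are binary numerals given as strings."""
    if isinstance(tree, str):
        return int(tree, 2)
    op, left, right = tree
    return OPS[op](value(left), value(right))

PLUS_AT_ROOT = ("+", "11", ("*", "0", "11"))    # 11+(0*11)
TIMES_AT_ROOT = ("*", ("+", "11", "0"), "11")   # (11+0)*11

print(format(value(PLUS_AT_ROOT), "b"))   # 11
print(format(value(TIMES_AT_ROOT), "b"))  # 1001
```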
If-Then-Else Statements. A grammar ITEG = (V, Σ, S, P) for the language of legal If-Then-Else statements. V = {S, BOOL}. Σ = {adv<80, adv>50, grade=3.5, grade=3.0, if, then, else}. S = S. P: S --> if BOOL then S else S | if BOOL then S | grade=3.5 | grade=3.0; BOOL --> adv<80 | adv>50
ITEG is ambiguous S --> if BOOL then S |grade=3.5 | grade=3.0 | if BOOL then S else S BOOL --> adv<80 | adv>50 Come up with two distinct leftmost derivations of the string if adv<80 then if adv>50 then grade=3.5 else grade=3.0 S ==>if BOOL then S else S ==> if adv<80 then S else S ==> if adv<80 then if BOOL then S else S ==> if adv<80 then if adv>50 then S else S ==> if adv<80 then if adv>50 then grade=3.5 else S ==> if adv<80 then if adv>50 then grade=3.5 else grade=3.0 S ==>if BOOL then S ==> if adv<80 then S ==> if adv<80 then if BOOL then S else S ==> if adv<80 then if adv>50 then S else S ==> if adv<80 then if adv>50 then grade=3.5 else S ==> if adv<80 then if adv>50 then grade=3.5 else grade=3.0 Draw the corresponding parse trees
Corresponding Parse Trees. First derivation: S ==> if BOOL then S else S ==> if adv<80 then S else S ==> if adv<80 then if BOOL then S else S ==> if adv<80 then if adv>50 then S else S ==> if adv<80 then if adv>50 then grade=3.5 else S ==> if adv<80 then if adv>50 then grade=3.5 else grade=3.0. Second derivation: S ==> if BOOL then S ==> if adv<80 then S ==> if adv<80 then if BOOL then S else S ==> if adv<80 then if adv>50 then S else S ==> if adv<80 then if adv>50 then grade=3.5 else S ==> if adv<80 then if adv>50 then grade=3.5 else grade=3.0. [Figure: two parse trees; in the first, the root uses S --> if BOOL then S else S, so the else belongs to the outer if; in the second, the root uses S --> if BOOL then S, so the else belongs to the inner if]
Parse Tree Meanings. [Figure: the two parse trees from the previous slide] If you receive a 90 on advanced points, what is your grade? By parse tree 1: By parse tree 2:
Implications Two interpretations of string if adv<80 then if adv>50 then grade=3.5 else grade=3.0 Issue is which if-then does the last ELSE attach to? This phenomenon is known as the “dangling else” Answer: Typically, else binds to NEAREST if-then In this case, there is an unambiguous grammar for handling if-then’s as well as if-then-else’s
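The effect of the dangling else can be seen by executing both parse trees. This sketch uses an assumed tuple encoding, `("if", condition, then_branch, else_branch)` with `None` for a missing else; the names `ELSE_ON_OUTER_IF` and `ELSE_ON_INNER_IF` are illustrative and not from the slides.

```python
CONDITIONS = {
    "adv<80": lambda adv: adv < 80,
    "adv>50": lambda adv: adv > 50,
}

def run(stmt, adv):
    """Return the grade assigned by stmt, or None if no assignment executes."""
    if stmt[0] == "assign":
        return stmt[1]
    _, cond, then_branch, else_branch = stmt
    if CONDITIONS[cond](adv):
        return run(then_branch, adv)
    return run(else_branch, adv) if else_branch is not None else None

# else attached to the outer if (root production: if BOOL then S else S)
ELSE_ON_OUTER_IF = ("if", "adv<80",
                    ("if", "adv>50", ("assign", 3.5), None),
                    ("assign", 3.0))
# else attached to the nearest (inner) if (root production: if BOOL then S)
ELSE_ON_INNER_IF = ("if", "adv<80",
                    ("if", "adv>50", ("assign", 3.5), ("assign", 3.0)),
                    None)

for adv in (40, 60, 90):
    print(adv, run(ELSE_ON_OUTER_IF, adv), run(ELSE_ON_INNER_IF, adv))
```

The two trees agree when both conditions hold and disagree otherwise, which is why a language must fix one attachment rule.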
Inherently ambiguous CFL’s. A CFL L is inherently ambiguous iff for all CFG’s G such that L(G) = L, G is ambiguous. Examples so far: none of the CFL’s we’ve seen so far are inherently ambiguous. While some of the CFG’s we’ve seen are ambiguous, there do exist unambiguous CFG’s for those CFL’s. Later result: there exist inherently ambiguous CFL’s. Example: {a^i b^j c^k | i=j or j=k or i=j=k}. Note i=j=k is unnecessary, but I added it here for clarity.
Summary Parse trees illustrate “semantic” information about strings Ambiguous grammars are undesirable This means there are multiple parse trees for some string These strings can be interpreted in multiple ways There are some heuristics people use for taking an ambiguous grammar and making it unambiguous, but this is not the focus of this course There are some inherently ambiguous CFL’s Thus, the above heuristics do not always work
Module 30 EQUAL language Designing a CFG Proving the CFG is correct
EQUAL language Designing a CFG
EQUAL EQUAL is the set of strings over {a,b} with an equal number of a’s and b’s Strings in EQUAL include aabbab bbbaaa abba Strings in {a,b}* not in EQUAL include aaa bbb aab ababa
Designing a CFG for EQUAL Think recursively Base Case What is the shortest possible string in EQUAL? Production Rule:
Recursive Case Recursive Case Now consider a longer string x in EQUAL Since x has length > 0, x must have a first character This must be a or b Two possibilities for what x looks like x = ay What must be true about relative number of a’s and b’s in y? x = bz What must be true about relative number of a’s and b’s in z?
Case 1: x=ay x = ay where y has one extra b What must y look like? Some examples b babba aabbbab aaabbbb Is there a general pattern that applies to all of the above examples? More specifically, show how we can decompose all of the above strings y into 3 pieces, two of which belong to EQUAL. Some of these pieces might be the empty string l
Decomposing y. y has one extra b. Possible examples: b, babba, aabbbab, aaabbbb. Decomposition: y = ubv where u and v both have an equal number of a’s and b’s. Decompose the 4 strings above into u, b, v: b = λ·b·λ, babba = λ·b·abba, aabbbab = aabb·b·ab, aaabbbb = aaabbb·b·λ
Implication. Case 1: x = ay, where y has one extra b. Case 1 refined: x = aubv, where u, v belong to EQUAL. Production rule for this case?
Case 2: x = bz, where z has one extra a. Case 2 refined: x = buav, where u, v belong to EQUAL. Production rule for this case?
Final Grammar: EG = (V, Σ, S, P), V = {S}, Σ = {a,b}, S = S, P:
EQUAL language Proving CFG is correct
Is our grammar correct? How do we prove our grammar is correct? Informal Test some strings Review logic behind program (CFG) design Formal First, show every string derived by EG belongs to EQUAL That is, show L(EG) is a subset of EQUAL Second, show every string in EQUAL can be derived by EG That is, show EQUAL is a subset of L(EG) Both proofs will be inductive proofs Inductive proofs and recursive algorithms go well together
L(EG) subset of EQUAL. Let x be an arbitrary string in L(EG). What does this mean? S ==>^*_EG x, which follows from the definition of x in L(EG). We will prove the following: If S ==>^1_EG x, then x is in EQUAL. If S ==>^2_EG x, then x is in EQUAL. If S ==>^3_EG x, then x is in EQUAL. If S ==>^4_EG x, then x is in EQUAL. ...
Base Case. Statement to be proven: For all n >= 1, if S ==>^n_EG x, then x is in EQUAL. Prove this by induction on n. Base Case: n = 1. What is the set of strings {x | S ==>^1_EG x}? What do we need to prove about this set of strings?
Inductive Case. Inductive Hypothesis: For 1 <= j <= n, if S ==>^j_EG x, then x is in EQUAL. Note, this is a “strong” induction hypothesis. A traditional inductive hypothesis would take the form: for some n >= 1, if S ==>^n_EG x, then x is in EQUAL. The difference is we assume the basic hypothesis for all integers between 1 and n, not just n. Statement to be proven in the inductive case: If S ==>^(n+1)_EG x, then x is in EQUAL.
“Regular” induction vs Strong induction Infinite Set of Facts Fact 1 Fact 2 Fact 3 Fact 4 Fact 5 Fact 6 … Base Case Prove fact 1 Regular inductive case For n >= 1, Fact n --> Fact n+1 Strong inductive case Fact 1 to Fact n --> Fact n+1
Visualization of Induction Regular Induction Strong Induction Fact 1 Fact 1 Fact 2 Fact 2 Fact 3 Fact 3 Fact 4 Fact 4 Fact 5 Fact 5 Fact 6 Fact 6 Fact 7 Fact 7 Fact 8 Fact 8 Fact 9 Fact 9 … …
Proving Inductive Case. If S ==>^(n+1)_EG x, then x is in EQUAL. Let x be an arbitrary string such that S ==>^(n+1)_EG x. Examining EG, what are the three possible first derivation steps? Case 1: S ==> ____ ==>^n_EG x. Case 2: S ==> ____ ==>^n_EG x. Case 3: S ==> ____ ==>^n_EG x. One of the cases is impossible. Which one and why?
Case 2: S ==> ____ ==>^n_EG x. This means x has the form aubv. What can we conclude about u (don’t apply IH)? What can we conclude about v (don’t apply IH)? Apply the inductive hypothesis: u and v belong to EQUAL. Why do we need the strong inductive hypothesis? Conclude x belongs to EQUAL: x = aubv where u and v belong to EQUAL. Clearly the number of a’s in x equals the number of b’s in x.
Case 3: S ==> ____ ==>^n_EG x. This means x has the form buav. What can we conclude about u (no IH)? What can we conclude about v (no IH)? Apply the inductive hypothesis: u and v belong to EQUAL. Why do we need the strong inductive hypothesis? Conclude x belongs to EQUAL: x = buav where u and v belong to EQUAL. Clearly the number of a’s in x equals the number of b’s in x.
L(EG) subset of EQUAL Wrapping up inductive case Conclusion In all possible derivations of x, we have shown that x belongs to EQUAL Thus, we have proven the inductive case Conclusion By the principle of mathematical induction, we have shown that L(EG) is a subset of EQUAL
EQUAL subset of L(EG). Let x be an arbitrary string in EQUAL. What does this mean? We will prove the following: If |x| = 0 and x is in EQUAL, then x is in L(EG). If |x| = 1 and x is in EQUAL, then x is in L(EG). If |x| = 2 and x is in EQUAL, then x is in L(EG). If |x| = 3 and x is in EQUAL, then x is in L(EG). ...
EQUAL subset of L(EG). Statement to be proven: For all n >= 0, if |x| = n and x is in EQUAL, then x is in L(EG). Prove this by induction on n. Base Case: n = 0. What is the only string x such that |x| = 0 and x is in EQUAL? Prove this string belongs to L(EG).
Inductive Case Inductive Hypothesis: For 0 <= j <= n, if |x| =j and x is in EQUAL, then x is in L(EG) Again, this is a “strong” induction hypothesis Statement to be Proven in Inductive Case: For n >= 0, if |x| = n+1 and x is in EQUAL, then x is in L(EG)
Proving Inductive Case. If |x| = n+1 and x is in EQUAL, then x is in L(EG). Let x be an arbitrary string such that |x| = n+1 and x is in EQUAL. Examining Σ, what are the two possibilities for the first character in x? Case 1: first character in x is ____. Case 2: first character in x is ____. In each case, what can we say about the remainder of x? Case 1: the remainder of x ____. Case 2: the remainder of x ____.
Case 1: x = ay. What can we say about y in this case? This means x has the form aubv where u is in EQUAL and has length <= n, and v is in EQUAL and has length <= n. Proving this statement true: consider all the prefixes of string y: length 0: λ; length 1: y1; length 2: y1y2; …; length n: y1y2 … yn = y.
Case 1: x = ay. Consider all the prefixes of string y: length 0: λ; length 1: y1; length 2: y1y2; …; length n: y1y2 … yn = y. The first prefix λ has the same number of a’s as b’s. The last prefix y has one extra b. The difference between the number of b’s and the number of a’s in the length i prefix differs by exactly one from that of the length i-1 prefix. Thus, there must be a first prefix t of y where t has one extra b. Furthermore, the last character of t must be b; otherwise, t would not be the FIRST prefix of y with one extra b. Break t into u and b, and let the remainder of y be v. Since t has one extra b and ends in b, u is in EQUAL; since y and t each have one extra b, v is in EQUAL. The statement follows.
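The prefix argument translates directly into a procedure. The sketch below assumes y is a string over {a, b} with exactly one more b than a; the function name `split_extra_b` is illustrative.

```python
def split_extra_b(y):
    """Split y (one more b than a) as u + "b" + v with u and v in EQUAL.

    Scans prefixes from shortest to longest and stops at the first prefix
    that has one extra b; that prefix necessarily ends in b.
    """
    balance = 0  # (number of b's) - (number of a's) in the current prefix
    for i, ch in enumerate(y):
        balance += 1 if ch == "b" else -1
        if balance == 1:
            return y[:i], y[i + 1:]
    raise ValueError("y does not have an extra b")

for y in ["b", "babba", "aabbbab", "aaabbbb"]:
    print(y, split_extra_b(y))
```

The balance starts at 0 and changes by one per character, so it cannot reach the final value 1 without passing through 1 for a first time.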
Case 1: x = aubv. u is in EQUAL and has length <= n; v is in EQUAL and has length <= n. Apply the induction hypothesis. What can we conclude from applying the IH? Why did we need a strong inductive hypothesis? Conclude x is in L(EG) by constructing a derivation: S ==> aSbS ==>^*_EG aubS ==>^*_EG aubv
Case 2: x = buav. u is in EQUAL and has length <= n; v is in EQUAL and has length <= n. Apply the induction hypothesis. What can we conclude about u and v? Conclude x is in L(EG) by constructing a derivation: S ==> bSaS ==>^*_EG buaS ==>^*_EG buav. Justify each of the steps in this derivation.
EQUAL subset of L(EG) Wrapping up inductive case Conclusion For all possible first characters of x, we have shown that x belongs to L(EG) Thus, we have proven the inductive case Conclusion By the principle of mathematical induction, we have shown that EQUAL is a subset of L(EG)
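The informal "test some strings" check can be automated. The sketch below assumes EG has the productions S --> aSbS | bSaS | λ; the first two appear in the derivations above, while the λ-production is an assumption taken from the base case. It compares derivability against the defining property of EQUAL for every string up to length 8.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derivable(x):
    """True iff x is derivable from S under S --> aSbS | bSaS | lambda (assumed)."""
    if x == "":
        return True                      # S --> lambda
    other = "b" if x[0] == "a" else "a"
    # Try every split x = first + u + other + v with u and v derivable from S.
    return any(
        x[i] == other and derivable(x[1:i]) and derivable(x[i + 1:])
        for i in range(1, len(x))
    )

def in_equal(x):
    return x.count("a") == x.count("b")

mismatches = [
    "".join(chars)
    for n in range(9)
    for chars in product("ab", repeat=n)
    if derivable("".join(chars)) != in_equal("".join(chars))
]
print(mismatches)  # an empty list means the two sets agree up to length 8
```

A finite check like this supports the grammar design but does not replace the two inductive proofs.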
Module 31 Closure Properties for CFL’s CFL’s versus regular languages Kleene Closure construction examples proof of correctness Others covered less thoroughly in lecture union, concatenation CFL’s versus regular languages regular languages subset of CFL
Closure Properties for CFL’s Kleene Closure
CFL closed under Kleene Closure. Let L be an arbitrary CFL. Let G1 be a CFG s.t. L(G1) = L; G1 exists by definition of L in CFL. Construct CFG G2 from CFG G1. Argue L(G2) = L*. There exists CFG G2 s.t. L(G2) = L*. L* is a CFL.
Visualization. [Figure: the proof steps above drawn as a diagram, with L and L* in the class CFL and G1 and G2 in the set of CFG’s]
Algorithm Specification. Input: CFG G1. Output: CFG G2 such that L(G2) = (L(G1))*. [Figure: algorithm A taking CFG G1 as input and producing CFG G2]
Construction. Input: CFG G1 = (V1, Σ, S1, P1). Output: CFG G2 = (V2, Σ, S2, P2), where V2 = V1 union {T} (T is a new symbol not in V1 or Σ), S2 = T, P2 = P1 union ??
Closure Properties for CFL’s Kleene Closure Examples
Example 1. Construction: V2 = V1 union {T}, where T is a new symbol not in V1 or Σ; S2 = T; P2 = P1 union {T --> ST | λ}. Input grammar: V = {S}, Σ = {a,b}, S = S, P: S --> aa | ab | ba | bb. Output grammar: V = {S, T}, Σ = {a,b}, Start symbol is ____, P: ____
Example 2. Construction: V2 = V1 union {T}, where T is a new symbol not in V1 or Σ; S2 = T; P2 = P1 union {T --> ST | λ}. Input grammar: V = {S, T}, Σ = {a,b}, Start symbol is T, P: T --> ST | λ, S --> aa | ab | ba | bb. Output grammar: V = {S, T, U}, Σ = {a,b}, Start symbol is ____, P: ____
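The construction is mechanical enough to write as a function. This is a sketch with an assumed encoding: a grammar is a tuple `(variables, terminals, start, productions)`, each right-hand side is a tuple of symbols, and the empty tuple stands for λ. It is applied here to ABG from Module 28 rather than to the example grammars above.

```python
def kleene_star(grammar, new_start="T"):
    """Build G2 with L(G2) = (L(G1))* by adding new_start --> start new_start | lambda."""
    variables, terminals, start, productions = grammar
    if new_start in variables or new_start in terminals:
        raise ValueError("new start symbol must not already be in V1 or the alphabet")
    new_productions = {v: list(rhss) for v, rhss in productions.items()}
    new_productions[new_start] = [(start, new_start), ()]
    return (variables | {new_start}, set(terminals), new_start, new_productions)

ABG = ({"S"}, {"a", "b"}, "S", {"S": [("a", "S", "b"), ("a", "b")]})
variables2, terminals2, start2, productions2 = kleene_star(ABG)
print(start2, productions2[start2])
```

The freshness check mirrors the requirement that the new symbol is not in V1 or Σ; passing a different `new_start` handles inputs that already use T.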
Closure Properties for CFL’s Kleene Closure Proof of Correctness
Is our construction correct? How do we prove our construction is correct? Informal Test some strings Review logic behind construction Formal First, show every string derived by G2 belongs to (L(G1))* That is, show L(G2) is a subset of (L(G1))* Second, show every string in (L(G1))* can be derived by G2 That is, show (L(G1))* is a subset of L(G2) Both proofs will be inductive proofs Inductive proofs and recursive algorithms go well together
L(G2) is a subset of (L(G1))*. We want to prove the following: If x is in L(G2), then x is in (L(G1))*. This is equivalent to the following: If T ==>^*_G2 x, then x is in (L(G1))*. The two statements are equivalent because x in L(G2) means that T ==>^*_G2 x. We break the second statement down as follows: If T ==>^1_G2 x, then x is in (L(G1))*. If T ==>^2_G2 x, then x is in (L(G1))*. If T ==>^3_G2 x, then x is in (L(G1))*. ...
L(G2) is a subset of (L(G1))*. Statement to be proven: For all n >= 1, if T ==>^n_G2 x, then x is in (L(G1))*. Prove this by induction on n. Base Case: n = 1. Examining G2, what is the only string x such that T ==>^1_G2 x? Prove this string is in (L(G1))*.
Inductive Case. Inductive Hypothesis: For 1 <= j <= n, if T ==>^j_G2 x, then x is in (L(G1))*. Note, this is a “strong” induction hypothesis. Statement to be proven in the inductive case: For n above, if T ==>^(n+1)_G2 x, then x is in (L(G1))*. Proving this statement: Let x be an arbitrary string such that T ==>^(n+1)_G2 x. Examining G2, what are the two possible first derivation steps? Case 1: T ==>_G2 ____ ==>^n_G2 x. Case 2: T ==>_G2 ____ ==>^n_G2 x.
Case Analysis. Case 1: T ==>_G2 ____ ==>^n_G2 x is not possible. Why not? Case 2: T ==>_G2 ____ ==>^n_G2 x. This means x has the form uv. What can we say about u (no IH)? What can we say about v (no IH)? Applying the inductive hypothesis, what can we conclude?
Concluding Case 2: T ==>_G2 ____ ==>^n_G2 x. Concluding string u belongs to L(G1): this follows from S ==>^*_G2 u and the fact that our construction ensures that all strings derived from S in G2 are also in L(G1). How do we conclude that x belongs to (L(G1))*? Wrapping up the inductive case: in all possible derivations of x, we have shown that x belongs to (L(G1))*. Thus, we have proven the inductive case. Conclusion: by the principle of mathematical induction, we have shown that L(G2) is a subset of (L(G1))*.
(L(G1))* is a subset of L(G2). We want to prove the following: If x is in (L(G1))*, then x is in L(G2). This is equivalent to the following: If x is in (L(G1))*, then T ==>^*_G2 x. The two statements are equivalent because x in L(G2) means that T ==>^*_G2 x. We break the second statement down as follows: If x is in (L(G1))^0, then T ==>^*_G2 x. If x is in (L(G1))^1, then T ==>^*_G2 x. If x is in (L(G1))^2, then T ==>^*_G2 x. ...
(L(G1))* is a subset of L(G2). Statement to be proven: For all n >= 0, if x is in (L(G1))^n, then x is in L(G2). Prove this by induction on n. Base Case: n = 0. What is the only string x in (L(G1))^0? Show this string belongs to L(G2).
Inductive Case. Inductive Hypothesis: For some n >= 0, if x is in (L(G1))^n, then T ==>^*_G2 x. Note, this is a “normal” induction hypothesis. Statement to be proven in the inductive case: if x is in (L(G1))^(n+1), then T ==>^*_G2 x. Proving this statement: Let x be an arbitrary string in (L(G1))^(n+1). This means x = uv where u is in L(G1). What can we say about v?
Deriving x. x = uv where u is a string in L(G1) and v is a string in ____. Justify all the steps in the following derivation: T ==>_G2 ST ==>^*_G2 Sv ==>^*_G2 uv = x. First step: Second step: Third step: Thus T ==>^*_G2 x. The inductive case follows. The result is proven by the principle of mathematical induction.
Construction for Set Union. Input: CFG G1 = (V1, Σ, S1, P1), CFG G2 = (V2, Σ, S2, P2). Output: CFG G3 = (V3, Σ, S3, P3). V3 = V1 union V2 union {T}; rename variables to ensure no names are shared between V1 and V2; T is a new symbol not in V1 or V2 or Σ. S3 = T. P3 = P1 union P2 union {T --> ____ }
Construction for Set Concatenation. Input: CFG G1 = (V1, Σ, S1, P1), CFG G2 = (V2, Σ, S2, P2). Output: CFG G3 = (V3, Σ, S3, P3). V3 = V1 union V2 union {T}; rename variables to ensure no names are shared between V1 and V2; T is a new symbol not in V1 or V2 or Σ. S3 = T. P3 = P1 union P2 union {T --> ____ }
CFL’s and regular languages
CFL Closure Properties CFL’s are closed under Kleene closure Just proven, proof also in book CFL’s are closed under set union Proof in book CFL’s are closed under set concatenation What can we conclude from these 3 results? It follows that regular languages are a subset of CFL’s
Regular languages subset of CFL. Recursive definition of regular languages. Base Case: {}, {λ}, {a}, {b} are regular languages over {a,b}; corresponding CFG productions: P={}, P={S --> λ}, P={S --> a}, P={S --> b}. Inductive Case: If L1 and L2 are regular languages, then L1*, L1L2, L1 union L2 are regular languages. Use previous constructions to see that these resulting languages are also context-free.
Other CFL Closure Properties We will show that CFL’s are NOT closed under many other set operations Examples include set complement set intersection set difference
Language class hierarchy. [Figure: nested language classes within all languages over alphabet Σ: RE, REC, CFL, REG; other labels in the figure: Equal, ?, H, H]
Module 32 Pushdown Automata (PDA’s) definition Example We define configurations and computations of PDA’s We define L(M) for PDA’s
Definition and Motivating Example Pushdown Automata Definition and Motivating Example
Pushdown Automata (PDA) In this presentation we introduce the PDA model of computation (programming language). The key addition to a PDA (from an NFA-/\) is the addition of external memory in the form of an infinite capacity stack The word “pushdown” comes from the stacks of trays in cafeterias where you have to pushdown on the stack to add a tray to it.
NFA for {a^m b^n | m,n >= 0}. [Figure: NFA with states I, B, C and transitions labeled a, b, and /\] Consider the language {a^n b^n | n >= 0}. This NFA can recognize strings which have the correct form, a’s followed by b’s. However, the NFA cannot remember the relative number of a’s and b’s seen at any point in time. What strings end up in each state of the above NFA? I: B: C:
PDA for {a^n b^n | n >= 0}. [Figure: the NFA above, with states I, B, C] Imagine we now have memory in the form of a stack which we can use to help remember how many a’s we have seen by pushing onto and popping from the stack. When we see an a in state I, we do the following two actions: 1) We push an a on the stack. 2) We stay in state I. When we see a b in state B, we do the following two actions: 1) We pop an a from the stack. 2) We stay in state B. From state B, we allow a /\-transition to state C only if the stack is empty. Finally, when we begin, the stack should be empty. [Figure: PDA with states I, B, C; transition labels: a;push a, b;pop, /\, and /\;only if stack is empty; stack initialized to empty]
Formal PDA definition. PDA M = (Q, Σ, Γ, q0, Z, A, δ). Modified elements: Γ is the stack alphabet. Z is a special character that is initially on the stack; it is often used to represent an empty stack. δ is modified as follows: pop to read the top character on the stack; stack update action: what to push back on the stack. If we push /\, then the net result of the action is a pop.
Example PDA. Q = {I, B, C}, Σ = {a,b}, Γ = {Z, a}, q0 = I, Z is the initial stack character, A = {C}. δ:
State  Input  TopSt  NS  Stack update
I      a      a      I   push aa
I      a      Z      I   push aZ
I      /\     a      B   push a
I      /\     Z      B   push Z
B      b      a      B   push /\
B      /\     Z      C   push Z
[Figure: state diagram of the example PDA with edge labels a;a;aa and a;Z;aZ on I, /\;a;a and /\;Z;Z from I to B, b;a;/\ on B, and /\;Z;Z from B to C. Initialize stack to only contain Z.]
Computing with PDA’s * Configurations change compared with NFA-/\’s Configuration components: current state remaining input to be processed stack contents Computations are essentially the same as with NFA-/\’s given the modified configurations Determining which transitions of a PDA can be applied to a given configuration is more complicated though
Computation Graph of PDA. Computation graph for the example PDA on the input string aabb. Configurations in the graph: (I,aabb,Z) (I,abb,aZ) (B,aabb,Z) (B,abb,aZ) (I,bb,aaZ) (C,aabb,Z) (B,bb,aaZ) (B,b,aZ) (B,/\,Z) (C,/\,Z)
Definition of |-. Input string aabb. (I, aabb, Z) |- (I,abb,aZ); (I, aabb, Z) |- (B, aabb, Z); (I, aabb, Z) |-^2 (C, aabb, Z); (I, aabb, Z) |-^3 (B, bb, aaZ); (I, aabb, Z) |-^* (B, abb, aZ); (I, aabb, Z) |-^* (B, /\, Z); (I, aabb, Z) |-^* (C, /\, Z). [Figure: the computation graph for aabb from the previous slide]
Acceptance and Rejection. Input string aabb. M accepts string x if one of the configurations reached is an accepting configuration: (q0, x, Z) |-^* (f, /\, α), f in A, α in Γ*. Stack contents can be anything. M rejects string x if all configurations reached are either not halting configurations or are rejecting configurations. In the computation graph for aabb, for example: (C,aabb,Z) is not an accepting configuration since the input is not completely processed; (B,/\,Z) is not an accepting configuration since the state is not accepting; (C,/\,Z) is an accepting configuration.
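The computation graph and the acceptance condition can be reproduced by a small simulator. This is a sketch under assumed conventions: the stack is a string with its top at the left, the empty string stands for /\, and the names `DELTA`, `successors`, `reachable`, and `accepts` are illustrative. The search terminates for this PDA because its /\-transitions never grow the stack.

```python
from collections import deque

# (state, input symbol or "" for a Lambda move, top of stack) -> [(next state, string pushed)]
DELTA = {
    ("I", "a", "a"): [("I", "aa")],
    ("I", "a", "Z"): [("I", "aZ")],
    ("I", "", "a"): [("B", "a")],
    ("I", "", "Z"): [("B", "Z")],
    ("B", "b", "a"): [("B", "")],
    ("B", "", "Z"): [("C", "Z")],
}
ACCEPTING = {"C"}

def successors(config):
    """Yield every configuration reachable from config in one step."""
    state, rest, stack = config
    if not stack:
        return
    top, below = stack[0], stack[1:]
    for nxt, push in DELTA.get((state, "", top), []):
        yield (nxt, rest, push + below)
    if rest:
        for nxt, push in DELTA.get((state, rest[0], top), []):
            yield (nxt, rest[1:], push + below)

def reachable(x, start="I", bottom="Z"):
    """Return the set of configurations in the computation graph for input x."""
    first = (start, x, bottom)
    seen, queue = {first}, deque([first])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def accepts(x):
    """Accept iff some reachable configuration has an accepting state and no input left."""
    return any(state in ACCEPTING and rest == "" for state, rest, _ in reachable(x))

print(len(reachable("aabb")), accepts("aabb"), accepts("aab"))
```

For the input aabb the reachable set contains the ten configurations listed in the computation graph.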
Defining L(M) and LPDA L(M) (or Y(M)) N(M) LPDA The set of strings ? Language L is in language class LPDA iff ? M accepts string x if one of the configurations reached is an accepting configuration (q0, x, Z) |-* (f, /\, a),f in A, a in G* Stack contents can be anything M rejects string x if all configurations reached are either not halting configurations or are rejecting configurations
Deterministic PDA’s. A PDA is deterministic if its transition function satisfies both of the following properties: (1) For all q in Q, a in Σ union {/\}, and X in Γ, the set δ(q,a,X) has at most one element. (2) For all q in Q and X in Γ, if δ(q, /\, X) ≠ { }, then δ(q,a,X) = { } for all a in Σ. A computation graph is now just a path again. Our default assumption is that PDA’s are nondeterministic.
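Both properties can be checked mechanically on a transition table. In this sketch the dictionary encoding, the use of the empty string for /\, and the names `is_deterministic` and `ANBN` are assumptions made for illustration; the table is the example PDA from Module 32.

```python
def is_deterministic(delta, states, sigma, gamma):
    """Check both determinism properties for a PDA transition function.

    delta maps (state, input symbol or "" for Lambda, top of stack)
    to a list of (next state, string pushed) pairs.
    """
    for q in states:
        for top in gamma:
            lam_moves = delta.get((q, "", top), [])
            if len(lam_moves) > 1:
                return False                      # property 1 fails for a Lambda move
            for a in sigma:
                moves = delta.get((q, a, top), [])
                if len(moves) > 1:
                    return False                  # property 1 fails for an input move
                if lam_moves and moves:
                    return False                  # property 2 fails
    return True

ANBN = {
    ("I", "a", "a"): [("I", "aa")],
    ("I", "a", "Z"): [("I", "aZ")],
    ("I", "", "a"): [("B", "a")],
    ("I", "", "Z"): [("B", "Z")],
    ("B", "b", "a"): [("B", "")],
    ("B", "", "Z"): [("C", "Z")],
}
print(is_deterministic(ANBN, {"I", "B", "C"}, {"a", "b"}, {"Z", "a"}))  # False
```

The example PDA fails the second property: in state I with Z on top it can either read an a or take a /\ move.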
Two forms of nondeterminism
Trans  Current  Input  Top of  Next   Stack
#      State    Char.  Stack   State  Update
--------------------------------------------
1      q0       a      Z       q0     aZ
2      q0       a      Z       q0     aa
3      q0       /\     Z       q0     aZ
4      q0       a      Z       q0     aa
LPDA and DCFL A language L is in language class LPDA if and only if there exists a PDA M such that L(M) = L A language L is in language class DCFL (Deterministic Context-Free Languages) if and only if there exists a deterministic PDA M such that L(M) = L To be proven LPDA = CFL CFL is a proper superset of DCFL
PDA Comments Note, we can use the stack for much more than just a counter See examples in chapter 7 for some details
Module 33 Pushdown Automata (PDA’s) Another example
Palindromes
Let PAL be the set of palindromes over {a,b}
Let PAL1 be the following related language: {w c w^R | w consists only of a’s and b’s}
  We add c to the input alphabet as a special “marker” character
  Strings in PAL1: aca, bcb, abcba, aabcbaa, c
  Strings not in PAL1: aaca, aaccaa, abccba, abcb, abba
Let PAL2 be the set of even length palindromes: {w w^R | w consists only of a’s and b’s}
PAL1
Let’s first construct a PDA for PAL1
Basic ideas
  Have one state remember the first “half” of the string
  Have one state “match” the second half of the string to the first half
  Transition between these two states when the first c is encountered
PDA for PAL1
M = (Q, Σ, Γ, q0, Z, A, δ)
  Q = {q0, qm, qf}
  Σ = {a, b, c}
  Γ = {Z, a, b}
  q0 = q0
  Z = Z
  A = {qf}
Transition Function
Trans  Current  Input  Top of  Next   Stack
#      State    Char.  Stack   State  Update
-------------------------------------------------------
1      q0       a      Z       q0     aZ
2      q0       a      a       q0     aa
3      q0       a      b       q0     ab
4      q0       b      Z       q0     bZ
5      q0       b      a       q0     ba
6      q0       b      b       q0     bb
7      q0       c      Z       qm     Z
8      q0       c      a       qm     a
9      q0       c      b       qm     b
10     qm       a      a       qm     λ
11     qm       b      b       qm     λ
12     qm       λ      Z       qf     Z
First three transitions push a on top of the stack
Second three transitions push b on the stack
Third three transitions switch state q0 to qm
  No change to stack
Transitions 10 and 11 “match” characters from first and last half of input string
Notation comment
We might represent transition 1 of the PAL1 transition table in two other ways
  δ(q0,a,Z) = (q0, aZ)
  (q0, a, Z, q0, aZ)
Question
  Is this PDA deterministic?
Computation Graph 1 (PAL1 PDA, input abcba)
(q0, abcba, Z)
(q0, bcba, aZ)
(q0, cba, baZ)
(qm, ba, baZ)
(qm, a, aZ)
(qm, λ, Z)
(qf, λ, Z)
The last configuration is accepting, so abcba is accepted
Computation Graph 2 (PAL1 PDA, input abcab)
(q0, abcab, Z)
(q0, bcab, aZ)
(q0, cab, baZ)
(qm, ab, baZ)
No transition applies in state qm with input a and stack top b, so abcab is rejected
Computation Graph 3 (PAL1 PDA, input acab)
(q0, acab, Z)
(q0, cab, aZ)
(qm, ab, aZ)
(qm, b, Z)
(qf, b, Z)
State qf is accepting, but the input is not completely processed, so acab is rejected
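The three computation graphs for the PAL1 PDA can be reproduced by stepping through the transition table. This is a small Python sketch under stated assumptions: the names `DELTA` and `run` are hypothetical, the stack is a string whose leftmost character is the top, and the empty string stands for λ. It relies on the PAL1 PDA having at most one applicable move per configuration.

```python
LAMBDA = ""  # assumption: the empty string stands for lambda

# (state, input char, stack top) -> (next state, stack update)
DELTA = {}
for x in "ab":
    for top in "Zab":
        DELTA[("q0", x, top)] = ("q0", x + top)   # transitions 1-6: push x
for top in "Zab":
    DELTA[("q0", "c", top)] = ("qm", top)         # transitions 7-9: switch to matching
DELTA[("qm", "a", "a")] = ("qm", LAMBDA)          # transition 10: pop a matched a
DELTA[("qm", "b", "b")] = ("qm", LAMBDA)          # transition 11: pop a matched b
DELTA[("qm", LAMBDA, "Z")] = ("qf", "Z")          # transition 12: move to the accepting state


def run(word):
    """Follow the single computation path; return (accepted, list of configurations)."""
    state, rest, stack = "q0", word, "Z"
    trace = [(state, rest, stack)]
    while stack:
        top = stack[0]  # leftmost character is the stack top
        if rest and (state, rest[0], top) in DELTA:
            state, push = DELTA[(state, rest[0], top)]
            rest = rest[1:]
        elif (state, LAMBDA, top) in DELTA:
            state, push = DELTA[(state, LAMBDA, top)]
        else:
            break  # no move applies
        stack = push + stack[1:]
        trace.append((state, rest, stack))
    return state == "qf" and rest == "", trace


print(run("abcba")[0])  # True
print(run("abcab")[0])  # False
print(run("acab")[0])   # False
```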
PAL2
Let’s now construct a PDA for PAL2
What is harder this time?
  When do we switch from putting strings on the stack to matching?
Example
  After seeing aab, should we switch to match mode or stay in stack mode?
Solution
  Do both using nondeterminism
PDA for PAL2
M = (Q, Σ, Γ, q0, Z, A, δ)
  Q = {q0, qm, qf}
  Σ = {a, b}
  Γ = {Z, a, b}
  q0 = q0
  Z = Z
  A = {qf}
Transition Relation
Trans  Current  Input  Top of  Next   Stack
#      State    Char.  Stack   State  Update
-------------------------------------------------------
1      q0       a      Z       q0     aZ
2      q0       a      a       q0     aa
3      q0       a      b       q0     ab
4      q0       b      Z       q0     bZ
5      q0       b      a       q0     ba
6      q0       b      b       q0     bb
7      q0       λ      Z       qm     Z
8      q0       λ      a       qm     a
9      q0       λ      b       qm     b
10     qm       a      a       qm     λ
11     qm       b      b       qm     λ
12     qm       λ      Z       qf     Z
First three transitions push a on top of the stack
Second three transitions push b on the stack
Third three transitions switch state q0 to qm
Is the PDA deterministic or nondeterministic?
Computation Graph 1 (PAL2 PDA, input abba)
Each indented configuration follows in one step from the configuration it is nested under
(q0, abba, Z)
  (qm, abba, Z)
    (qf, abba, Z)
  (q0, bba, aZ)
    (qm, bba, aZ)
    (q0, ba, baZ)
      (qm, ba, baZ)
        (qm, a, aZ)
          (qm, λ, Z)
            (qf, λ, Z)
      (q0, a, bbaZ)
        (qm, a, bbaZ)
        (q0, λ, abbaZ)
          (qm, λ, abbaZ)
Computation Graph 2 (PAL2 PDA, input aba)
Each indented configuration follows in one step from the configuration it is nested under
(q0, aba, Z)
  (qm, aba, Z)
    (qf, aba, Z)
  (q0, ba, aZ)
    (qm, ba, aZ)
    (q0, a, baZ)
      (qm, a, baZ)
      (q0, λ, abaZ)
        (qm, λ, abaZ)
No configuration is accepting: (qf, aba, Z) has not processed its input, so aba is rejected
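The computation graphs for the PAL2 PDA can be explored by a breadth-first search over configurations. The sketch below is an illustration in Python, not the slides' own method: the names `add` and `accepts` are hypothetical, the stack is a string with the top at the left, and the empty string stands for λ. The search terminates for this PDA because its λ-moves never grow the stack; that is an assumption that does not hold for every PDA.

```python
from collections import deque

LAMBDA = ""  # assumption: the empty string stands for lambda

DELTA = {}


def add(state, char, top, nxt, push):
    """Record one transition; several moves may share the same key."""
    DELTA.setdefault((state, char, top), []).append((nxt, push))


for top in "Zab":
    add("q0", "a", top, "q0", "a" + top)  # transitions 1-3: push a
    add("q0", "b", top, "q0", "b" + top)  # transitions 4-6: push b
    add("q0", LAMBDA, top, "qm", top)     # transitions 7-9: guess the midpoint
add("qm", "a", "a", "qm", LAMBDA)         # transition 10
add("qm", "b", "b", "qm", LAMBDA)         # transition 11
add("qm", LAMBDA, "Z", "qf", "Z")         # transition 12


def accepts(word):
    """Return True if some reachable configuration is (qf, empty input, any stack)."""
    seen = set()
    frontier = deque([("q0", word, "Z")])
    while frontier:
        config = frontier.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, rest, stack = config
        if state == "qf" and rest == "":
            return True
        top, below = stack[0], stack[1:]
        for nxt, push in DELTA.get((state, LAMBDA, top), []):
            frontier.append((nxt, rest, push + below))
        if rest:
            for nxt, push in DELTA.get((state, rest[0], top), []):
                frontier.append((nxt, rest[1:], push + below))
    return False


print(accepts("abba"))  # True
print(accepts("aba"))   # False
```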
PAL Challenge
Construct a PDA for PAL
  First step: construct a PDA for odd length palindromes
  Then: combine PDA’s for odd length and even length palindromes
Module 34 CFG --> PDA construction Shows that for any CFL L, there exists a PDA M such that L(M) = L The reverse is true as well, but we do not prove that here
CFL subset LPDA
Let L be an arbitrary CFL
Let G be the CFG such that L(G) = L
  G exists by definition of L is CF
Construct a PDA M such that L(M) = L
  M is constructed from CFG G
Argue L(M) = L
There exists a PDA M such that L(M) = L
L is in LPDA
  By definition of L in LPDA
Visualization
[Figure: L shown in CFL and in LPDA, with G among the CFG’s and M among the PDA’s]
(Proof outline as on the previous slide, CFL subset LPDA)
Algorithm Specification
Input: CFG G
Output: PDA M such that L(M) = L(G)
[Figure: algorithm A takes CFG G as input and produces PDA M]
Construction Idea The basic idea is to have a 2-phase PDA Phase 1: Derive all strings in L(G) on the stack nondeterministically Do not process any input while we are deriving the string on the stack Phase 2: Match the input string against the derived string on the stack This is a deterministic process Move to an accepting state only when the stack is empty
Illustration
Input Grammar G
  V = {S}
  Σ = {a,b}
  S = S
  P: S --> aSb | λ
What is L(G)?
1. Derive all strings in L(G) on the stack
2. Match the derived string against input
Illustration of how the PDA might work, though not completely accurate.
  (q0, aabb, Z)
    /* put S on stack */
  (q1, aabb, SZ)
    /* derive aabb on stack */
  (q1, aabb, aSbZ)
  (q1, aabb, aaSbbZ)
  (q1, aabb, aabbZ)
    /* match stack vs input */
  (q2, aabb, aabbZ)
  (q2, abb, abbZ)
  (q2, bb, bbZ)
  (q2, b, bZ)
  (q2, λ, Z)
  (q3, λ, Z)
Difficulty
1. Derive all strings in L(G) on the stack
2. Match the derived string against input
  (q0, aabb, Z)
    /* put S on stack */
  (q1, aabb, SZ)
    /* derive aabb on stack */
  (q1, aabb, aSbZ)
  (q1, aabb, aaSbbZ)
  (q1, aabb, aabbZ)
    /* match stack vs input */
  (q2, aabb, aabbZ)
  (q2, abb, abbZ)
  (q2, bb, bbZ)
  (q2, b, bZ)
  (q2, λ, Z)
  (q3, λ, Z)
What is illegal with the computation graph above?
Construction
Input Grammar: G = (V, Σ, S, P)
Output PDA: M = (Q, Σ, Γ, q0, Z, F, δ)
  Q = {q0, q1, q2}
  Σ = Σ
  Γ = V ∪ Σ ∪ {Z}
  Z = Z
  q0 = q0
  F = {q2}
  δ:
    δ(q0, λ, Z) = (q1, SZ)
    δ(q1, λ, Z) = (q2, Z)
    For all productions A --> α
      δ(q1, λ, A) = (q1, α)
    For all a in Σ
      δ(q1, a, a) = (q1, λ)
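The construction above is mechanical enough to write down directly. The following Python sketch is one possible encoding, with assumptions stated plainly: the function name `cfg_to_pda` is hypothetical, every grammar symbol is a single character, the symbol Z is assumed not to occur in the grammar, and the empty string stands for λ.

```python
LAMBDA = ""  # assumption: the empty string stands for lambda


def cfg_to_pda(productions, terminals, start="S"):
    """Return the transitions of the constructed PDA as 5-tuples
    (state, input char, stack top, next state, stack update).

    productions: list of (variable, right-hand side) pairs, one character per symbol.
    terminals:   string of terminal characters.
    """
    delta = [
        ("q0", LAMBDA, "Z", "q1", start + "Z"),  # put the start variable on the stack
        ("q1", LAMBDA, "Z", "q2", "Z"),          # only Z is left: move to the accepting state
    ]
    # One lambda-transition per production A --> alpha: replace A by alpha on the stack.
    delta += [("q1", LAMBDA, var, "q1", rhs) for var, rhs in productions]
    # One matching transition per terminal: pop it when it equals the next input character.
    delta += [("q1", a, a, "q1", LAMBDA) for a in terminals]
    return delta


# Palindrome grammar: S --> aSa | bSb | a | b | lambda
palg = [("S", "aSa"), ("S", "bSb"), ("S", "a"), ("S", "b"), ("S", LAMBDA)]
for row in cfg_to_pda(palg, "ab"):
    print(row)
```

The number of transitions is always 2 + |P| + |Σ|, which gives 9 for the palindrome grammar.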
Examples
Palindromes
PALG:
  V = {S}
  Σ = {a,b}
  S = S
  P: S --> aSa | bSb | a | b | λ
Output PDA M = (Q, Σ, Γ, q0, Z, F, δ)
  Q = {q0, q1, q2}
  Γ = {a,b,S,Z}
  q0 = q0
  Z = Z
  F = {q2}
  δ:
    δ(q0, λ, Z) = (q1, SZ)
    δ(q1, λ, Z) = (q2, Z)
    Production transitions
      δ(q1, λ, S) = (q1, aSa)
      δ(q1, λ, S) = (q1, bSb)
      δ(q1, λ, S) = (q1, a)
      δ(q1, λ, S) = (q1, b)
      δ(q1, λ, S) = (q1, λ)
    Matching transitions
      δ(q1, a, a) = (q1, λ)
      δ(q1, b, b) = (q1, λ)
Palindrome Transition Table
Transition  Current  Input   Top of  Next   Stack
Number      State    Symbol  Stack   State  Update
---------------------------------------------------------------------------------
1           q0       λ       Z       q1     SZ
2           q1       λ       Z       q2     Z
3           q1       λ       S       q1     aSa
4           q1       λ       S       q1     bSb
5           q1       λ       S       q1     a
6           q1       λ       S       q1     b
7           q1       λ       S       q1     λ
8           q1       a       a       q1     λ
9           q1       b       b       q1     λ
Partial Computation Graph (input aba)
(q0, aba, Z)
(q1, aba, SZ)
(q1, aba, aSaZ)   (other branches not shown)
(q1, ba, SaZ)
(q1, ba, baZ)     (other branches not shown)
(q1, a, aZ)
(q1, λ, Z)
(q2, λ, Z)
On your own, draw computation trees for strings that are not in the language and see that they are not accepted.
{a^n b^n | n >= 0}
Grammar G:
  V = {S}
  Σ = {a,b}
  S = S
  P: S --> aSb | λ
Output PDA M = (Q, Σ, Γ, q0, Z, F, δ)
  Q = {q0, q1, q2}
  Γ = {a,b,S,Z}
  q0 = q0
  Z = Z
  F = {q2}
  δ:
    δ(q0, λ, Z) = (q1, SZ)
    δ(q1, λ, Z) = (q2, Z)
    Production transitions
    Matching transitions
{a^n b^n | n >= 0} Transition Table
Transition  Current  Input   Top of  Next   Stack
Number      State    Symbol  Stack   State  Update
---------------------------------------------------------------------------------
1           q0       λ       Z
2           q1       λ       Z
3           q1       λ       S
4           q1       λ       S
5           q1       a       a
6           q1       b       b
Partial Computation Graph (input aabb)
(q0, aabb, Z)
(q1, aabb, SZ)
(q1, aabb, aSbZ)   (other branch not shown)
(q1, abb, SbZ)
(q1, abb, aSbbZ)   (other branch not shown)
(q1, bb, SbbZ)
(q1, bb, bbZ)      (other branch not shown)
(q1, b, bZ)
(q1, λ, Z)
(q2, λ, Z)
{a^i b^j | i = j or i = 2j}
Grammar G:
  V = {S,T,U}
  Σ = {a,b}
  S = S
  P: S --> T | U
     T --> aTb | λ
     U --> aaUb | λ
Output PDA M = (Q, Σ, Γ, q0, Z, F, δ)
  Q = {q0, q1, q2}
  Γ = {a,b,S,T,U,Z}
  q0 = q0
  Z = Z
  F = {q2}
  δ:
    δ(q0, λ, Z) = (q1, SZ)
    δ(q1, λ, Z) = (q2, Z)
    Production transitions
      δ(q1, λ, S) = (q1, T)
      δ(q1, λ, S) = (q1, U)
      δ(q1, λ, T) = (q1, aTb)
      δ(q1, λ, T) = (q1, λ)
      δ(q1, λ, U) = (q1, aaUb)
      δ(q1, λ, U) = (q1, λ)
    Matching transitions
      δ(q1, a, a) = (q1, λ)
      δ(q1, b, b) = (q1, λ)
{a^i b^j | i = j or i = 2j} Transition Table
Transition  Current  Input   Top of  Next   Stack
Number      State    Symbol  Stack   State  Update
---------------------------------------------------------------------------------
1           q0       λ       Z       q1     SZ
2           q1       λ       Z       q2     Z
3           q1       λ       S       q1     T
4           q1       λ       S       q1     U
5           q1       λ       T       q1     aTb
6           q1       λ       T       q1     λ
7           q1       λ       U       q1     aaUb
8           q1       λ       U       q1     λ
9           q1       a       a       q1     λ
10          q1       b       b       q1     λ
Partial Computation Graph (input aab)
(q0, aab, Z)
(q1, aab, SZ)
(q1, aab, UZ)      (other branch not shown)
(q1, aab, aaUbZ)   (other branch not shown)
(q1, ab, aUbZ)
(q1, b, UbZ)
(q1, b, bZ)        (other branch not shown)
(q1, λ, Z)
(q2, λ, Z)
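The partial computation graphs above show one accepting branch each. To explore all branches, the constructed PDA can be searched breadth-first. The Python sketch below is an illustration under assumptions, not the slides' method: `cfg_to_delta` and `accepts` are hypothetical names, symbols are single characters, and the empty string stands for λ. Because λ-moves can grow the stack, the search prunes any configuration whose stack holds more terminals than there is input left to match; this bound is enough for the example grammars here but is not claimed to make the search terminate for every CFG.

```python
from collections import deque

LAMBDA = ""  # assumption: the empty string stands for lambda


def cfg_to_delta(productions, terminals, start="S"):
    """Build the constructed PDA as a dict: (state, char, top) -> list of (next state, push)."""
    delta = {}

    def add(state, char, top, nxt, push):
        delta.setdefault((state, char, top), []).append((nxt, push))

    add("q0", LAMBDA, "Z", "q1", start + "Z")
    add("q1", LAMBDA, "Z", "q2", "Z")
    for var, rhs in productions:
        add("q1", LAMBDA, var, "q1", rhs)   # production transitions
    for a in terminals:
        add("q1", a, a, "q1", LAMBDA)       # matching transitions
    return delta


def accepts(delta, word, terminals):
    seen = set()
    frontier = deque([("q0", word, "Z")])
    while frontier:
        config = frontier.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, rest, stack = config
        if state == "q2" and rest == "":
            return True
        # Prune: every terminal on the stack still has to be matched against input.
        if sum(ch in terminals for ch in stack) > len(rest):
            continue
        top, below = stack[0], stack[1:]
        for nxt, push in delta.get((state, LAMBDA, top), []):
            frontier.append((nxt, rest, push + below))
        if rest:
            for nxt, push in delta.get((state, rest[0], top), []):
                frontier.append((nxt, rest[1:], push + below))
    return False


# Grammar for {a^i b^j | i = j or i = 2j}
grammar = [("S", "T"), ("S", "U"), ("T", "aTb"), ("T", LAMBDA),
           ("U", "aaUb"), ("U", LAMBDA)]
delta = cfg_to_delta(grammar, "ab")
print(accepts(delta, "aab", "ab"))   # True
print(accepts(delta, "aaab", "ab"))  # False
```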
Things you should be able to do
You should be able to execute this algorithm
  Given any CFG, construct an equivalent PDA
You should understand the idea behind this algorithm
  Derive string on stack and then match it against input
You should understand how this construction can help you design PDA’s
You should understand that it can be used in answer-preserving input transformations between decision problems about CFL’s
Module 35 Attempt to prove that CFL’s are closed under intersection Review previous constructions Translate previous constructions to current setting Prove modified result
High Level Overview
CFL closed under set intersection (attempted proof)
Let L1 and L2 be arbitrary CFL’s
Let M1 and M2 be PDA’s s.t. L(M1) = L1, L(M2) = L2
  M1 and M2 exist by definition of L1 and L2 are CFL’s and the fact that for every CFG, there is an equivalent PDA
Construct PDA M3 from PDA’s M1 and M2
Argue L(M3) = L1 intersect L2
There exists a PDA M3 s.t. L(M3) = L1 intersect L2
L1 intersect L2 is a CFL
Visualization
[Figure: L1, L2, and L1 intersect L2 among the CFL’s; M1, M2, and M3 among the PDA’s]
(Proof outline as on the previous slide)
Algorithm Specification
Input: Two PDA’s M1 and M2
Output: PDA M3 such that L(M3) = L(M1) intersection L(M2)
[Figure: algorithm A takes PDA M1 and PDA M2 as input and produces PDA M3]
Review Previous Results
Underlying Idea Previous Results recursive languages are closed under set intersection r.e. languages are closed under set intersection regular languages are closed under set intersection What is the idea underlying the constructions used to prove these previous results?
Implementation with FSA’s * Given the basic idea underlying these constructions, how was this idea implemented when dealing with FSA’s? That is, restate the construction used to prove that the regular languages are closed under set intersection. Specify the output FSA in terms of the input FSA’s
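One common way to realize "run both machines at once" for FSA's is the product construction, sketched here in Python as a possible answer rather than the course's official one. The dict-based FSA encoding and the names `intersect_fsa` and `accepts` are assumptions made for illustration; both input machines are assumed to be deterministic with total transition functions over the same alphabet.

```python
from itertools import product


def intersect_fsa(m1, m2):
    """Product FSA: states are pairs, and both components step on every input character."""
    states = set(product(m1["states"], m2["states"]))
    delta = {((p, q), a): (m1["delta"][(p, a)], m2["delta"][(q, a)])
             for (p, q) in states for a in m1["alphabet"]}
    return {
        "states": states,
        "alphabet": m1["alphabet"],
        "delta": delta,
        "start": (m1["start"], m2["start"]),
        # Accept only when both component machines accept.
        "accepting": {(p, q) for (p, q) in states
                      if p in m1["accepting"] and q in m2["accepting"]},
    }


def accepts(m, word):
    state = m["start"]
    for ch in word:
        state = m["delta"][(state, ch)]
    return state in m["accepting"]


# Example machines (illustrative): even length strings, and strings ending in a.
even = {"states": {"e", "o"}, "alphabet": "ab", "start": "e", "accepting": {"e"},
        "delta": {("e", "a"): "o", ("e", "b"): "o", ("o", "a"): "e", ("o", "b"): "e"}}
ends_a = {"states": {"n", "y"}, "alphabet": "ab", "start": "n", "accepting": {"y"},
          "delta": {("n", "a"): "y", ("n", "b"): "n", ("y", "a"): "y", ("y", "b"): "n"}}
both = intersect_fsa(even, ends_a)
print(accepts(both, "ba"))  # True
print(accepts(both, "a"))   # False
```

The pair state works because an FSA's whole memory is its state. A PDA also carries a stack, which is the point the following slides ask about.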
Applying previous approach to PDA’s
Applying approach to PDA’s * Given the basic idea underlying these constructions, try and implement this idea in a construction working with PDA’s rather than FSA’s. That is, give a construction specifying how the output PDA is built out of the input PDA’s
Problem Describe what goes wrong when applying this idea to PDA’s instead of FSA’s. Does this prove that CFL’s are NOT closed under set intersection?
Modified Result * What happens if the inputs are 1 FSA and 1 PDA? What modified result does the resulting construction prove?
Module 36
Non context-free languages
Pumping lemma for CFL’s
  Examples and Intuition
  Pumping lemma for CFL’s
    Pumping condition
    No proof of pumping lemma
    Applying pumping lemma to prove that some languages are not CFL’s
Examples and Intuition
Examples What are some examples of nonregular languages? Can we build on any of these languages to create a non context-free language?
Intuition Try and prove that these languages are CFL’s and identify the stumbling blocks Why can’t we construct a CFG to generate this language? Why can’t we construct a PDA to accept this language? Compare to similar CFL languages to try and identify differences.
Pumping Lemma for CFL’s
Comparison to regular language pumping lemma/condition
How do CFL’s differ from regular languages? *
In regular languages, a single substring “pumps”
  Consider the language of even length strings over {a,b}
  We can identify a single substring which can be pumped
In CFL’s, multiple substrings can “pump”
  Consider the language {a^n b^n | n > 0}
  No single substring can be pumped and allow us to stay in the language
  However, there do exist pairs of substrings which can be pumped, resulting in strings which stay in the language
This results in a modified pumping condition
Modified Pumping Condition
A language L satisfies the regular language pumping condition if:
  there exists an integer n > 0 such that
  for all strings x in L of length at least n
  there exist strings u, v, w such that
    x = uvw and
    |uv| <= n and
    |v| >= 1 and
    for all k >= 0, u v^k w is in L
A language L satisfies the CFL pumping condition if:
  there exists an integer n > 0 such that
  for all strings x in L of length at least n
  there exist strings u, v, w, y, z such that
    x = uvwyz and
    |vwy| <= n and
    |vy| >= 1 and
    for all k >= 0, u v^k w y^k z is in L
Pumping Lemma
All CFL’s satisfy the CFL pumping condition
[Figure: within all languages over {a,b}, the CFL’s lie inside the “Pumping Languages”]
Pumping Implications
We can use the pumping lemma to prove a language L is not a CFL
  Show L does not satisfy the CFL pumping condition
We cannot use the pumping lemma to prove a language is context-free
  Showing L satisfies the pumping condition does not guarantee that L is context-free
Pumping Lemma What does it mean?
Pumping Condition
A language L satisfies the CFL pumping condition if:
  there exists an integer n > 0 such that
  for all strings x in L of length at least n
  there exist strings u, v, w, y, z such that
    x = uvwyz and
    |vwy| <= n and
    |vy| >= 1 and
    for all k >= 0, u v^k w y^k z is in L
v and y can be pumped
1) x in L
2) x = uvwyz
3) For all k >= 0, u v^k w y^k z is in L
Let x = abcdefg be in L
Then there exist 2 substrings v and y in x such that v and y can be repeated (pumped) in place any number of times and the resulting string is still in L
  u v^k w y^k z is in L for all k >= 0
For example, v = cd and y = f (so u = ab, w = e, z = g)
  u v^0 w y^0 z = uwz = abeg is in L
  u v^1 w y^1 z = uvwyz = abcdefg is in L
  u v^2 w y^2 z = uvvwyyz = abcdcdeffg is in L
  u v^3 w y^3 z = uvvvwyyyz = abcdcdcdefffg is in L
  …
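The pumped strings in this example can be generated directly. A minimal Python sketch follows; the helper name `pump` is an assumption made for illustration.

```python
def pump(u, v, w, y, z, k):
    """Return u v^k w y^k z, with v and y each repeated k times in place."""
    return u + v * k + w + y * k + z


# x = abcdefg split as u = ab, v = cd, w = e, y = f, z = g
for k in range(4):
    print(k, pump("ab", "cd", "e", "f", "g", k))
# 0 abeg
# 1 abcdefg
# 2 abcdcdeffg
# 3 abcdcdcdefffg
```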
What the other parts mean
A language L satisfies the CFL pumping condition if:
  there exists an integer n > 0 such that
    Since we skip this proof, we will not see what n really means
  for all strings x in L of length at least n
    x must be in L and have sufficient length
  there exist strings u, v, w, y, z such that
    x = uvwyz and
    |vwy| <= n and
      v and y are contained within n characters of x
      Note: these are NOT necessarily the first n characters of x
    |vy| >= 1 and
      v and y cannot both be λ
      One of them might be λ, but not both
    for all k >= 0, u v^k w y^k z is in L
Example Let L be the set of palindromes over {a,b} Let x = aabaa Let n = 3 What are the possibilities for v and y ignoring the pumping constraint? Which ones satisfy the pumping lemma?
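The question on this slide can be explored by brute force. The Python sketch below enumerates every split x = uvwyz of aabaa with |vwy| <= 3 and |vy| >= 1, then keeps the splits whose pumped strings stay palindromes. The names `decompositions`, `is_palindrome`, and `pumps` are assumptions for illustration, and the check only tries k = 0 through `max_k`, so it is evidence for a candidate split, not a proof for all k.

```python
def decompositions(x, n):
    """Yield every (u, v, w, y, z) with x = uvwyz, |vwy| <= n and |vy| >= 1."""
    length = len(x)
    for i in range(length + 1):                  # u = x[:i]
        for j in range(i, length + 1):           # v = x[i:j]
            for p in range(j, length + 1):       # w = x[j:p]
                for q in range(p, length + 1):   # y = x[p:q], z = x[q:]
                    if q - i <= n and (j - i) + (q - p) >= 1:
                        yield x[:i], x[i:j], x[j:p], x[p:q], x[q:]


def is_palindrome(s):
    return s == s[::-1]


def pumps(parts, in_language, max_k=4):
    """Bounded check: u v^k w y^k z is in the language for k = 0..max_k."""
    u, v, w, y, z = parts
    return all(in_language(u + v * k + w + y * k + z) for k in range(max_k + 1))


pumpable = [p for p in decompositions("aabaa", 3) if pumps(p, is_palindrome)]
for parts in pumpable:
    print(parts)
```

For instance, the split u = a, v = a, w = b, y = a, z = a survives the check, while u = λ, v = a, w = y = λ, z = abaa fails already at k = 0, since abaa is not a palindrome.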
Pumping Lemma Applying it to prove a specific language L is not context-free
How we use the Pumping Lemma
We choose a specific language L
  For example, {a^j b^j c^j | j > 0}
We show that L does not satisfy the pumping condition
We conclude that L is not context-free
Showing L “does not pump”
A language L satisfies the CFL pumping condition if:
  there exists an integer n > 0 such that
  for all strings x in L of length at least n
  there exist strings u, v, w, y, z such that
    x = uvwyz and
    |vwy| <= n and
    |vy| >= 1 and
    for all k >= 0, u v^k w y^k z is in L
A language L does not satisfy the CFL pumping condition if:
  for all integers n of sufficient size
  there exists a string x in L of length at least n such that
  for all strings u, v, w, y, z such that
    x = uvwyz and
    |vwy| <= n and
    |vy| >= 1
  there exists a k >= 0 such that u v^k w y^k z is not in L
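Written with quantifiers, the condition and its negation line up term by term: each "there exists" becomes "for all" and vice versa. The block below is a restatement in LaTeX for reference, not a new result. It states the negation for every n > 0; checking only all sufficiently large n, as on the slide, is enough because a decomposition that works for some n also works for any larger n.

```latex
% CFL pumping condition
\[
\exists n > 0 \;\; \forall x \in L,\ |x| \ge n \;\; \exists u, v, w, y, z :\;
x = uvwyz \;\wedge\; |vwy| \le n \;\wedge\; |vy| \ge 1 \;\wedge\;
\forall k \ge 0 :\; u v^k w y^k z \in L
\]

% Negation: L does not satisfy the CFL pumping condition
\[
\forall n > 0 \;\; \exists x \in L,\ |x| \ge n \;\; \forall u, v, w, y, z
\text{ with } x = uvwyz,\ |vwy| \le n,\ |vy| \ge 1 :\;
\exists k \ge 0 :\; u v^k w y^k z \notin L
\]
```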
Example Proof
Proof that L = {a^i b^i c^i | i > 0} does not satisfy the CFL pumping condition
  Let n be the integer from the pumping lemma
  Choose x = a^n b^n c^n
  Consider all strings u, v, w, y, z s.t. x = uvwyz and |vwy| <= n and |vy| >= 1
  Argue that u v^k w y^k z is not in L for some k >= 0
    Argument must apply to all possible u, v, w, y, z
Continued on next slide
Example Proof Continued
Identify possible cases for vwy
  What is impossible for vwy?
    vwy cannot contain both a’s and c’s, since |vwy| <= n and the n b’s of x lie between the a’s and the c’s
  Case 1: vwy contains no a’s
  Case 2: vwy contains no c’s
Must argue u v^k w y^k z is not in L for both cases described above
  Can use different values of k
Continued on next slide
Example Proof Continued
Case 1: vwy contains no a’s
  vy contains at least 1 b or c
    follows from vwy contains no a’s and |vy| >= 1
  uwz is not in L (this is k = 0)
    uwz has n a’s
      follows from fact vwy contains no a’s and x originally had n a’s
    uwz has fewer than n b’s or fewer than n c’s
      follows from vy contains at least 1 b or c and x originally only had n b’s and n c’s
Continued next slide
Example Proof Continued
Case 2: vwy contains no c’s
  vy contains at least 1 a or b
    follows from vwy contains no c’s and |vy| >= 1
  u v^2 w y^2 z is not in L (this is k = 2)
    u v^2 w y^2 z has n c’s
      follows from fact vwy contains no c’s and x originally had n c’s
    u v^2 w y^2 z has more than n a’s or more than n b’s
      follows from vy contains at least 1 a or b and x originally has n a’s and n b’s
Continued next slide
Example Proof Completed
For all possible u, v, w, y, z, we have shown there exists a k >= 0 such that u v^k w y^k z is not in L
  Note, we used a different value of k for each case (though we didn’t have to)
Therefore L does not satisfy the CFL pumping condition
Therefore L is not a CFL
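The case analysis can be sanity-checked by brute force for small n. The Python sketch below enumerates every legal split of x = a^n b^n c^n and confirms that k = 0 or k = 2 leaves the language each time. It is a finite check that supports the proof and does not replace it; the names `in_L`, `decompositions`, and `every_split_fails` are assumptions for illustration.

```python
def in_L(s):
    """Membership in L = {a^i b^i c^i | i > 0}."""
    i = len(s) // 3
    return i > 0 and s == "a" * i + "b" * i + "c" * i


def decompositions(x, n):
    """Yield every (u, v, w, y, z) with x = uvwyz, |vwy| <= n and |vy| >= 1."""
    length = len(x)
    for i in range(length + 1):
        for j in range(i, length + 1):
            for p in range(j, length + 1):
                for q in range(p, length + 1):
                    if q - i <= n and (j - i) + (q - p) >= 1:
                        yield x[:i], x[i:j], x[j:p], x[p:q], x[q:]


def every_split_fails(n):
    """True if each split of a^n b^n c^n leaves L for k = 0 or for k = 2."""
    x = "a" * n + "b" * n + "c" * n
    return all(
        not in_L(u + w + z) or not in_L(u + v * 2 + w + y * 2 + z)
        for u, v, w, y, z in decompositions(x, n)
    )


print(all(every_split_fails(n) for n in range(1, 5)))  # True
```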
Other example languages
TWOCOPIES = {ww | w is in {a,b}*}
  abbabb is in TWOCOPIES but abaabb is not
EQUAL3 = the set of strings over {a, b, c} such that the number of a’s equals the number of b’s equals the number of c’s
{a^i b^j c^k | i < j < k}
Pumping Lemma Two rules of thumb
Two Rules of Thumb
Try to use blocks of at least n characters in x
  For TWOCOPIES, choose x = a^n b^n a^n b^n rather than a^n b a^n b
  Guarantees v and y cannot be in more than 2 blocks of x
Try k = 0 or k = 2
  k = 0: this reduces the number of occurrences of v and y
  k = 2: this increases the number of occurrences of v and y
Summary
We use the Pumping Lemma to prove a language is not a CFL
  Note, does not work for all non-CFL languages
  Can be strengthened to Ogden’s Lemma (in book)
Choosing a good string x is the first key step
Choosing a good k is the second key step
Typically have several cases for v, w, y
Module 37 Showing CFL’s not closed under set intersection and set complement
Nonclosure Properties for CFL’s
CFL’s not closed under set intersection How can we prove that CFL’s are not closed under set intersection?
Counterexample What is a possible L1 intersect L2? What non-CFL languages do we know? What could L1 and L2 be? L1 = L2 = How can we prove that L1 and L2 are context-free?
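One standard choice, offered here as a possible answer rather than the only one, is L1 = {a^i b^i c^j | i, j > 0} and L2 = {a^i b^j c^j | i, j > 0}. Each is context-free (match one pair of blocks and let the third block be arbitrary), and their intersection is {a^i b^i c^i | i > 0}, which the pumping lemma shows is not a CFL. The Python sketch below checks the intersection claim on all short strings; the function names are assumptions for illustration.

```python
import re
from itertools import product


def blocks(s):
    """Return (#a, #b, #c) if s has the form a+ b+ c+, else None."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return tuple(len(g) for g in m.groups()) if m else None


def in_L1(s):   # a^i b^i c^j
    b = blocks(s)
    return b is not None and b[0] == b[1]


def in_L2(s):   # a^i b^j c^j
    b = blocks(s)
    return b is not None and b[1] == b[2]


def in_L3(s):   # a^i b^i c^i
    b = blocks(s)
    return b is not None and b[0] == b[1] == b[2]


def agree_up_to(max_len):
    """Check that L1 intersect L2 equals L3 on every string of length <= max_len."""
    words = ("".join(t) for n in range(max_len + 1) for t in product("abc", repeat=n))
    return all((in_L1(w) and in_L2(w)) == in_L3(w) for w in words)


print(agree_up_to(6))  # True
```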
CFL’s not closed under complement How can we prove that CFL’s are not closed under complement? Another way Use fact that any language class which is closed under union and complement must also be closed under intersection
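The fact used on this slide is De Morgan's law, written out below in LaTeX for reference. CFL's are closed under union (a standard result that is not shown in this section), so if they were also closed under complement, the right-hand side would be a CFL for any CFL's L1 and L2, contradicting nonclosure under intersection.

```latex
\[
L_1 \cap L_2 \;=\; \overline{\,\overline{L_1} \,\cup\, \overline{L_2}\,}
\]
```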
Language class hierarchy
[Figure: nested language classes REG, CFL, REC, RE inside all languages over alphabet Σ; example languages marked in the figure: Equal, Equal-3, H, H]