Nonregular languages & the pumping lemma CS 350 — Fall 2018 gilray.org/classes/fall2018/cs350/
Is the language L regular?
Can we model it using a finite automaton (NFA/DFA)?
$L = \{ \mathtt{a}^n \mathtt{b}^n \mid n \in \mathbb{N} \}$
Essentially the same as the language of matching parentheses!
$L' = \{ \mathtt{(}^n \mathtt{)}^n \mid n \in \mathbb{N} \}$
Consider the input aaaaaaaaaaaaaaabbbbbbbbbbbbbbb (n a's followed by n b's).
After reading a prefix of m a's, the DFA's state must encode m, an unbounded number. Likewise, after reading all n a's, the DFA needs to record that n b's must be encountered to accept.
The number of states in the DFA modeling L would thus need to be countably infinite, but a DFA has only finitely many states. (This reasoning is informal; the forthcoming pumping lemma will help make it rigorous.)
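To make the "unbounded number" concrete, here is a minimal Python sketch of a recognizer for this language. It is not a finite automaton: it relies on an integer counter that can grow without bound, which is exactly the resource a DFA lacks. The function name `accepts_anbn` is made up for this illustration.

```python
def accepts_anbn(s: str) -> bool:
    """Recognize { a^n b^n | n in N } using an unbounded integer counter."""
    count = 0
    i = 0
    # Phase 1: count the leading a's.
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    # Phase 2: cancel one counted 'a' for each 'b' that follows.
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    # Accept only if the whole input was consumed and the counts match.
    return i == len(s) and count == 0


print(accepts_anbn("aaabbb"))  # True
print(accepts_anbn("aab"))     # False
```

The value of `count` after the first loop plays the role of the DFA state after reading the prefix of a's: it must distinguish every possible m.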
The pumping lemma (for regular languages):
(a necessary, but not sufficient, condition for a language to be regular)
If L is a regular language, then there exists some nonzero constant length, $p \in \mathbb{N} \wedge p > 0$, such that any string w in the language of length at least p, $w \in L \wedge |w| \geq p$, can be subdivided into three substrings with a non-empty middle substring, $w = xyz \wedge y \neq \varepsilon$, such that $p \geq |xy|$ and all strings made instead from zero or more repetitions of the middle substring, y, must also be in the language: $\{ x y^n z \mid n \in \mathbb{N} \} \subseteq L$.
All finite languages are regular!
Language $\{s_1, \ldots, s_n\}$ can be encoded as the regular expression: $s_1 \mid s_2 \mid s_3 \mid \ldots \mid s_n$
For example, the language of strings a…z with 1 character removed: bcd…xyz | ac…yz | abde…yz | … | ab…wxz | a…vwxy
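The alternation construction can be sketched in Python using the standard `re` module. The helper name `finite_language_regex` is hypothetical, and the sketch assumes a non-empty list of strings (an empty list would produce a pattern matching the empty string rather than the empty language).

```python
import re
import string


def finite_language_regex(strings):
    """Build the star-free expression s1|s2|...|sn for a finite language.

    Assumes `strings` is non-empty.
    """
    return "|".join(re.escape(s) for s in strings)


alphabet = string.ascii_lowercase
# The 26 strings obtained from a...z by removing exactly one character.
language = [alphabet[:i] + alphabet[i + 1:] for i in range(len(alphabet))]
pattern = re.compile(finite_language_regex(language))

print(pattern.fullmatch(alphabet[1:]) is not None)  # bcd...xyz is in the language
print(pattern.fullmatch(alphabet) is not None)      # the full alphabet is not
```

Note that the generated pattern contains no `*`: alternation and concatenation alone are enough for any finite language.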
[Figure: an NFA with ε-transitions accepting a finite language.]
Note: for a finite language, we do not require Kleene star to define a regular expression and we do not need loops to define a finite automaton! (In fact, star and loops are our only way of defining a language of infinitely many strings using an RE/DFA/NFA!)
Thus the language defined by an RE/NFA/DFA may contain infinitely many strings, but if so, all sufficiently long strings must exhibit a pattern/regularity in how the language generates an infinite number of strings: a pattern that may be characterized by looping/repetition.
The pumping lemma (for regular languages):
If L is a regular language, then there exists a pumping length $p \in \mathbb{N} \wedge p > 0$, such that for all strings w where $w \in L \wedge |w| \geq p$, there exist strings $x, y, z \in \Sigma^*$ where $w = xyz \wedge y \neq \varepsilon \wedge p \geq |xy|$ such that, for all $n \in \mathbb{N}$, $x y^n z \in L$.
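As a small illustration of what "pumping" means, the sketch below builds $x y^n z$ for several n and checks membership. The language (ab)* is an assumed example that does not appear in the slides, and `pump` is a hypothetical helper name.

```python
import re


def pump(x: str, y: str, z: str, n: int) -> str:
    """Return x y^n z: the string with the middle part repeated n times."""
    return x + y * n + z


# Assumed example language (not from the slides): (ab)*, which is regular.
in_language = re.compile(r"(ab)*").fullmatch

w = "abab"
x, y, z = "", "ab", "ab"  # w = xyz with y nonempty and |xy| = 2
assert x + y + z == w

for n in range(6):
    pumped = pump(x, y, z, n)
    print(n, repr(pumped), in_language(pumped) is not None)
```

Every pumped string stays in the language here, as the lemma promises for a regular language. Since |xy| = 2, this particular split is allowed for any pumping length p ≥ 2.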
The pumping lemma in terms of a DFA
We fix p to be the number of states in the DFA.
[Figure: a path labeled w from the start state $q_0$ to an accepting state $q_F$.]
Some path must accept a string w of length at least p (or language L is finite).
The pumping lemma in terms of a DFA
To accept the string w, the DFA must traverse a loop at least once.
[Figure: a path through the states $q_0, q_1, q_2, \ldots, q_{p-2}, q_{p-1}$ with edges labeled $w_0, w_1, \ldots, w_{p-2}$, and a final edge labeled $w_{p-1}$ returning to an earlier state.]
This is due to the pigeonhole principle. Avoiding a loop as long as possible, we could cross p-1 edges to p distinct states; the pth letter in w ($w_{p-1}$), however, must still return to some previously seen state.
The pumping lemma in terms of a DFA
There exist strings $x, y, z \in \Sigma^*$ where $w = xyz \wedge y \neq \varepsilon \wedge p \geq |xy| \wedge |xyz| \geq p$ such that, for all $n \in \mathbb{N}$, $x y^n z \in L$.
[Figure: the path for w = xyz, where x leads from $q_0$ to $q_y$, y loops from $q_y$ back to $q_y$, and z leads from $q_y$ to $q_F$.]
We know this must be true because, by the pigeonhole principle, there must be some state (it need not be $q_F$), call it $q_y$, that is visited more than once! Taking the first such repeat, which happens within the first p letters of w, gives $|xy| \leq p$.
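The pigeonhole argument is constructive, and a short Python sketch can carry it out: run the DFA on w, stop at the first state seen twice, and cut w there. The dictionary encoding of the transition function and the even-number-of-a's DFA are assumptions made for this example, not something taken from the slides.

```python
def pumpable_split(delta, start, w):
    """Split w into (x, y, z) at the first repeated state on the DFA's path.

    `delta` maps (state, symbol) -> state and is assumed to be total.
    Returns None if no state repeats, which can only happen when
    len(w) is smaller than the number of states.
    """
    seen = {start: 0}  # state -> number of letters read when first visited
    state = start
    for i, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in seen:  # pigeonhole: this state is q_y, visited again
            j = seen[state]
            return w[:j], w[j:i], w[i:]
        seen[state] = i
    return None


def accepts(delta, start, finals, w):
    """Run the DFA on w and report whether it ends in an accepting state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals


# Hypothetical 2-state DFA over {a, b}: accepts strings with an even number of a's.
delta = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
x, y, z = pumpable_split(delta, "even", "abab")
print((x, y, z))  # ('a', 'b', 'ab')
print(all(accepts(delta, "even", {"even"}, x + y * n + z) for n in range(5)))  # True
```

Here p = 2 (the number of states), the loop is found after two letters, and |xy| = 2 ≤ p as the lemma requires.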
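The pumping lemma as a proposition
If L is a regular language, then
$\exists p \in \mathbb{N}.\, p > 0 \quad \forall w \in L.\, |w| \geq p \quad \exists x, y, z \in \Sigma^*.\, w = xyz \wedge |y| > 0 \wedge |xy| \leq p \quad \forall n \in \mathbb{N}.\, x y^n z \in L$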
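Contrapositive of the pumping lemma!
If any one of the following equivalent statements holds, then L is not a regular language! Each line pushes the negation one quantifier further inward:
$\neg \exists p \in \mathbb{N}.\, p > 0 \quad \forall w \in L.\, |w| \geq p \quad \exists x, y, z \in \Sigma^*.\, w = xyz \wedge |y| > 0 \wedge |xy| \leq p \quad \forall n \in \mathbb{N}.\, x y^n z \in L$
$\forall p \in \mathbb{N}.\, p > 0 \quad \neg \forall w \in L.\, |w| \geq p \quad \exists x, y, z \in \Sigma^*.\, w = xyz \wedge |y| > 0 \wedge |xy| \leq p \quad \forall n \in \mathbb{N}.\, x y^n z \in L$
$\forall p \in \mathbb{N}.\, p > 0 \quad \exists w \in L.\, |w| \geq p \quad \neg \exists x, y, z \in \Sigma^*.\, w = xyz \wedge |y| > 0 \wedge |xy| \leq p \quad \forall n \in \mathbb{N}.\, x y^n z \in L$
$\forall p \in \mathbb{N}.\, p > 0 \quad \exists w \in L.\, |w| \geq p \quad \forall x, y, z \in \Sigma^*.\, w = xyz \wedge |y| > 0 \wedge |xy| \leq p \quad \neg \forall n \in \mathbb{N}.\, x y^n z \in L$
$\forall p \in \mathbb{N}.\, p > 0 \quad \exists w \in L.\, |w| \geq p \quad \forall x, y, z \in \Sigma^*.\, w = xyz \wedge |y| > 0 \wedge |xy| \leq p \quad \exists n \in \mathbb{N}.\, x y^n z \notin L$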
The pumping lemma as an adversarial game
$\forall p \in \mathbb{N}.\, p > 0$: “For all” means that we must remain general and assume an adversary may pick any value for p.
$\exists w \in L.\, |w| \geq p$: “There exists” here means we may select a particular string w to suit our proof strategy.
$\forall x, y, z \in \Sigma^*.\, w = xyz \wedge |y| > 0 \wedge |xy| \leq p$
$\exists n \in \mathbb{N}.\, x y^n z \notin L$
Then L is not a regular language!
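The "for all x, y, z" line ranges over a finite set of moves once p and w are fixed, so it can be enumerated. The sketch below lists every legal split; the generator name `splits` and the sample values p = 4 and w = aaaabbbb are chosen only for illustration.

```python
def splits(w: str, p: int):
    """Yield every (x, y, z) with w = xyz, |y| > 0 and |xy| <= p."""
    limit = min(p, len(w))
    for i in range(limit):                 # i = |x|
        for j in range(i + 1, limit + 1):  # j = |xy|, so |y| = j - i > 0
            yield w[:i], w[i:j], w[j:]


p = 4
w = "a" * p + "b" * p
moves = list(splits(w, p))
print(len(moves))  # 10, which is p * (p + 1) // 2
print(moves[0])    # ('', 'a', 'aaabbbb')
```

A non-regularity proof has to defeat every one of these moves, which is why the choice of w matters: a well-chosen w makes all of them fail for the same reason.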
Try an example: use the pumping lemma to show that the language L is not regular.
$L = \{ \mathtt{a}^n \mathtt{b}^n \mid n \in \mathbb{N} \}$
Recall the lemma: if L is a regular language, then there exists a pumping length $p \in \mathbb{N} \wedge p > 0$, such that for all strings w where $w \in L \wedge |w| \geq p$, there exist strings $x, y, z \in \Sigma^*$ where $w = xyz \wedge y \neq \varepsilon \wedge p \geq |xy|$ such that, for all $n \in \mathbb{N}$, $x y^n z \in L$.
First, we assume L is regular and let p be its pumping length. Since p is not ours to choose, the argument must work for any p > 0.
Second, to show the lemma is not true of all sufficiently long strings w in L, it suffices to show there exists a sufficiently long string w in L for which the lemma does not hold. (Recall De Morgan’s laws.)
We choose the string $\mathtt{a}^p \mathtt{b}^p$ as a sufficiently long string w, where the pumping lemma should hold (if L were regular), but does not.
Thus, for the pumping lemma to hold, we must be able to decompose $\mathtt{a}^p \mathtt{b}^p$ into substrings xyz such that |xy| is not greater than p, y is nonempty, and y can be repeated any number of times.
[Figure: the string aaaaaaaaaaaaaaabbbbbbbbbbbbbbb with x and y marked inside its first p characters, where $p \geq |xy|$ and $|y| > 0$.]
Because $|xy| \leq p$, both x and y lie within the leading a's, so $y = \mathtt{a}^m$ for some m > 0 and $x y^n z = \mathtt{a}^{p + (n-1)m} \mathtt{b}^p$. Therefore we cannot remove y, or “pump” it more than once, without upsetting the balance of a's and b's! Therefore, L is not regular.
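The argument can be sanity-checked by brute force for small p. The sketch below tries every legal split of a^p b^p and confirms that removing y (n = 0) or doubling it (n = 2) always leaves the language. It is a bounded check under assumed helper names (`in_L`, `every_split_fails`), not a replacement for the proof.

```python
def in_L(s: str) -> bool:
    """Membership test for { a^n b^n | n in N }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def every_split_fails(p: int) -> bool:
    """Check that no split of w = a^p b^p with |y| > 0 and |xy| <= p survives pumping."""
    w = "a" * p + "b" * p
    for i in range(p):                 # i = |x|
        for j in range(i + 1, p + 1):  # j = |xy| <= p
            x, y, z = w[:i], w[i:j], w[j:]
            # A split survives this check only if both xz (n = 0)
            # and xyyz (n = 2) are still in L.
            if in_L(x + z) and in_L(x + y * 2 + z):
                return False
    return True


print(all(every_split_fails(p) for p in range(1, 30)))  # True
```

Testing n = 0 and n = 2 is enough here because either one already unbalances the a's and b's.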
Try an example: use the pumping lemma to show that the language, L, of all palindromes over alphabet {a,b} is not regular.
$L = \{ s_1 \ldots s_k \mid k \in \mathbb{N} \wedge s_i \in \{\mathtt{a}, \mathtt{b}\} \wedge s_j = s_{k+1-j} \}$
We assume L is regular, so it has some pumping length p, a positive integer. Select string w to be $\mathtt{a}^p \mathtt{b} \mathtt{a}^p$. If the pumping lemma held, it would be the case that $w = \mathtt{a}^p \mathtt{b} \mathtt{a}^p = xyz$ where |xy| is no larger than p, |y| is nonzero, and xyyz is also in language L. Since |xy| is no larger than p, string y must be $\mathtt{a}^m$ for some positive integer m, so the string xyyz would be $\mathtt{a}^{p+|y|} \mathtt{b} \mathtt{a}^p$, which cannot be a palindrome as the string’s only b is off-center. Therefore, L is not regular.
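The "off-center" claim can be spelled out by counting positions. In this sketch, positions are numbered from 1, m stands for |y|, and the symbol $\ell$ for the length of the pumped string is introduced only for this illustration.

```latex
\begin{align*}
xyyz &= \mathtt{a}^{p+m}\,\mathtt{b}\,\mathtt{a}^{p}, \qquad m = |y| > 0,\\
\ell &= |xyyz| = (p + m) + 1 + p = 2p + m + 1,\\
\text{position of } \mathtt{b} &= p + m + 1,\\
\text{mirror position of } \mathtt{b} &= \ell + 1 - (p + m + 1) = p + 1.
\end{align*}
```

A palindrome satisfies $s_j = s_{\ell+1-j}$, so position p + 1 would also have to hold a b. But p + 1 ≤ p + m, so that position holds an a, and the string contains only one b. Hence xyyz is not in L.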