Instructor: Aaron Roth

aaroth@cis.upenn.edu
CIS 262: Automata, Computability, and Complexity, Spring 2019
Lecture: February 18, 2019

Regular Expressions: Definition
1. ε is a regular expression. Only the empty string matches this reg-ex: L(ε) = { ε }.
2. ∅ is a regular expression. No string matches this reg-ex: L(∅) = { }.
3. For each symbol s in Σ, s is a regular expression. The only string matching reg-ex s is the string s itself: L(s) = { s }.
4. If r is a regular expression, so is ( r ). Parentheses are used only for parsing: L( ( r ) ) = L(r).

5. If r and r’ are regular expressions, then so is r.r’. A string w matches r.r’ if it can be split into two parts w = u.v such that u matches r and v matches r’. That is, L(r.r’) = L(r) . L(r’).
6. If r and r’ are regular expressions, then so is r ∪ r’. A string matches r ∪ r’ if it matches either r or r’: L(r ∪ r’) = L(r) ∪ L(r’).
7. If r is a regular expression, then so is r*. A string w matches r* if w can be split into multiple (0 or more) parts such that each part matches r: L(r*) = L(r)*.
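To make the seven rules concrete, here is a minimal Python sketch (not from the lecture) that encodes a regular expression as a small syntax tree and decides whether a string matches it by following rules 1 to 7 literally. The class names Eps, Empty, Sym, Cat, Alt, Star and the function matches are hypothetical; rule 4 needs no node of its own because parentheses only affect parsing.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Union


# Hypothetical syntax-tree nodes, one per rule of the definition.
@dataclass(frozen=True)
class Eps:      # rule 1: ε, matches only the empty string
    pass


@dataclass(frozen=True)
class Empty:    # rule 2: ∅, matches no string
    pass


@dataclass(frozen=True)
class Sym:      # rule 3: a single symbol s
    s: str


@dataclass(frozen=True)
class Cat:      # rule 5: r.r'
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:      # rule 6: r ∪ r'
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:     # rule 7: r*
    inner: "Regex"


Regex = Union[Eps, Empty, Sym, Cat, Alt, Star]


@lru_cache(maxsize=None)
def matches(r: Regex, w: str) -> bool:
    """Decide whether w matches r, following the rules of the definition."""
    if isinstance(r, Eps):
        return w == ""
    if isinstance(r, Empty):
        return False
    if isinstance(r, Sym):
        return w == r.s
    if isinstance(r, Alt):
        return matches(r.left, w) or matches(r.right, w)
    if isinstance(r, Cat):
        # try every split w = u.v
        return any(matches(r.left, w[:k]) and matches(r.right, w[k:])
                   for k in range(len(w) + 1))
    if isinstance(r, Star):
        # zero parts, or a nonempty first part matching r followed by a match of r*
        # (empty parts can always be dropped, so this loses nothing and terminates)
        return w == "" or any(matches(r.inner, w[:k]) and matches(r, w[k:])
                              for k in range(1, len(w) + 1))
    raise TypeError(f"not a regular expression: {r!r}")


# (a b)*(a ∪ ε), the example translated later in the lecture
example = Cat(Star(Cat(Sym("a"), Sym("b"))), Alt(Sym("a"), Eps()))
print([w for w in ["", "a", "ab", "aba", "b", "abb"] if matches(example, w)])
```

Trying every split point is exponential in the worst case; the memoization through lru_cache keeps small examples fast, but this is a reading aid for the definition, not a practical matcher.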

From Regular Expressions to NFAs
Goal: Given a regular expression r, construct an ε-NFA M(r) that accepts the language L(r).
Construction by induction on the structure of r:
1. r = ε
2. r = ∅
3. r = a, for a symbol a in Σ
4. r = ( r’ ): M(r) is the same as M(r’)
[Figure: base-case ε-NFAs for cases 1 to 3]

5. r = r1.r2: Build M(r1) and M(r2), then build M(r) from them using the concatenation construction.
[Figure: M(r1) followed by M(r2), joined by ε-transitions]

6. r = r1 ∪ r2: Build M(r1) and M(r2), then build M(r) from them by adding a new initial state.
[Figure: a new initial state with ε-transitions into M(r1) and M(r2)]

7. r = r’*: Build M(r’), then apply the Kleene-* construction.
[Figure: M(r’) with added ε-transitions]

Example Translation: (a b)*(a ∪ ε)
[Figure: ε-NFAs built bottom-up for (a b), (a ∪ ε), (a b)*, and finally (a b)*(a ∪ ε)]
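The inductive construction can be sketched in Python as follows. The slide pictures for the concatenation, union, and Kleene-* gadgets are not reproduced here, so the exact wiring is an assumption: this sketch uses one common variant (ε-edges from the final states of M(r1) to the start of M(r2) for concatenation; a new initial state for union; a new accepting initial state plus ε-edges back to the old start for Kleene-*). Regular expressions are encoded as nested tuples such as ("cat", r1, r2), fresh state ids come from itertools.count, and the names NFA, build, accepts, and num_states are hypothetical.

```python
from itertools import count


class NFA:
    """ε-NFA with one initial state; label None on an edge means ε."""
    def __init__(self, start, finals, edges):
        self.start = start
        self.finals = set(finals)
        self.edges = edges            # list of (source, label, target)


def build(r, fresh):
    """Return an ε-NFA for r by induction on its structure; fresh yields new state ids."""
    kind = r[0]
    if kind == "eps":                 # one state, initial and final
        s = next(fresh)
        return NFA(s, {s}, [])
    if kind == "empty":               # one state, not final
        s = next(fresh)
        return NFA(s, set(), [])
    if kind == "sym":                 # two states joined by the symbol
        s, t = next(fresh), next(fresh)
        return NFA(s, {t}, [(s, r[1], t)])
    if kind == "cat":                 # no new states
        m1, m2 = build(r[1], fresh), build(r[2], fresh)
        glue = [(f, None, m2.start) for f in m1.finals]
        return NFA(m1.start, m2.finals, m1.edges + m2.edges + glue)
    if kind == "alt":                 # one new initial state
        m1, m2 = build(r[1], fresh), build(r[2], fresh)
        s = next(fresh)
        split = [(s, None, m1.start), (s, None, m2.start)]
        return NFA(s, m1.finals | m2.finals, m1.edges + m2.edges + split)
    if kind == "star":                # one new initial state, which is also final
        m = build(r[1], fresh)
        s = next(fresh)
        loops = [(f, None, m.start) for f in m.finals] + [(s, None, m.start)]
        return NFA(s, m.finals | {s}, m.edges + loops)
    raise ValueError(f"unknown operator {kind!r}")


def closure(nfa, states):
    """All states reachable from `states` by ε-edges alone."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in nfa.edges:
            if src == p and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, w):
    current = closure(nfa, {nfa.start})
    for c in w:
        moved = {dst for src, label, dst in nfa.edges if src in current and label == c}
        current = closure(nfa, moved)
    return bool(current & nfa.finals)


def num_states(nfa):
    ends = {src for src, _, _ in nfa.edges} | {dst for _, _, dst in nfa.edges}
    return len({nfa.start} | nfa.finals | ends)


# (a b)*(a ∪ ε)
r = ("cat", ("star", ("cat", ("sym", "a"), ("sym", "b"))),
            ("alt", ("sym", "a"), ("eps",)))
m = build(r, count())
print(num_states(m), [w for w in ["", "a", "ab", "aba", "b", "aa"] if accepts(m, w)])
```

For this example the automaton has 9 states: 2 per symbol occurrence, 0 for each concatenation, 1 for the union, 1 for the star, and 1 for ε.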

Regular Expression Compilation: Summary
Regular expression r → ε-NFA M(r) → DFA M’(r). Hence, for every regular expression r, L(r) is a regular language.
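The second arrow, ε-NFA to DFA, is the subset construction. Below is a minimal Python sketch, assuming an ε-NFA given as a dictionary delta mapping (state, symbol) to a set of states and a dictionary eps mapping a state to its ε-successors. This encoding and the function names are hypothetical. Each DFA state is the ε-closure of a set of NFA states, and only subsets reachable from the start are generated.

```python
from collections import deque


def eps_closure(eps, states):
    """States reachable from `states` using ε-moves only, as a frozenset."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in eps.get(p, ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return frozenset(seen)


def subset_construction(alphabet, delta, eps, start, finals):
    """Each DFA state is a set of NFA states; only reachable subsets are built."""
    d_start = eps_closure(eps, {start})
    d_delta, seen, queue = {}, {d_start}, deque([d_start])
    while queue:
        S = queue.popleft()
        for c in alphabet:
            moved = {q for p in S for q in delta.get((p, c), ())}
            T = eps_closure(eps, moved)
            d_delta[(S, c)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    d_finals = {S for S in seen if S & finals}   # accepting if it contains an NFA final state
    return seen, d_delta, d_start, d_finals


def run_dfa(d_delta, d_start, d_finals, w):
    S = d_start
    for c in w:
        S = d_delta[(S, c)]
    return S in d_finals


# NFA for strings over {a, b} ending in "ab": state 0 loops, then guesses the final "ab"
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
states, d_delta, d_start, d_finals = subset_construction("ab", delta, {}, 0, {2})
print(len(states), run_dfa(d_delta, d_start, d_finals, "aab"))
```

Here only 3 of the 8 possible subsets are reachable; the next slide discusses why the number can nonetheless grow exponentially in general.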

Regular Expression Compilation: Complexity
How many states does the ε-NFA M(r) have in terms of the size of r? Observe: each base-case NFA has one or two states, and handling each operator adds zero or one new state. So the number of states in M(r) is about the same as the size of r: O(|r|). Determinization using the subset construction causes an exponential blow-up: the DFA M’(r) can have up to about 2^k states, where k is the size of r. In the worst case this is the best we can hope for: regular expressions can describe the desired language much more succinctly than DFAs.
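The exponential gap can be observed on a standard family that is not from the slides: strings over {a, b} whose k-th symbol from the end is a, described by (a ∪ b)* a followed by k-1 copies of (a ∪ b). This language has an NFA with k + 1 states, and the sketch below (hypothetical helper names; subset construction restricted to reachable subsets, with no ε-moves needed) finds 2^k reachable DFA states. The distinguishability technique later in this lecture can be used to show that no DFA with fewer states exists.

```python
def kth_from_end_nfa(k):
    """NFA for (a ∪ b)* a (a ∪ b)^(k-1): state 0 loops, then k more steps to state k."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, k):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta, 0, {k}


def reachable_subsets(delta, start, alphabet="ab"):
    """Subset construction without ε-moves, returning only the reachable subsets."""
    first = frozenset({start})
    seen, stack = {first}, [first]
    while stack:
        S = stack.pop()
        for c in alphabet:
            T = frozenset(q for p in S for q in delta.get((p, c), ()))
            if T not in seen:
                seen.add(T)
                stack.append(T)
    return seen


for k in range(1, 9):
    delta, start, _finals = kth_from_end_nfa(k)
    print(f"k={k}: NFA states={k + 1}, DFA states={len(reachable_subsets(delta, start))}")
```

Every reachable subset contains state 0 together with the positions i for which the i-th most recent symbol was a, and every one of the 2^k such patterns is reachable.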

Do we have enough operators?
Our definition of regular expressions uses the operations of union, concatenation, and Kleene-*. Do we have enough operators, or do we need to include more as core operations? Can every regular language be captured by a regular expression? That is, given a DFA M, can we construct a regular expression r such that L(r) = L(M)? The answer is YES! The interest in this construction is mainly theoretical: it tells us that union, concatenation, and Kleene-* capture regularity!

From DFAs to Regular Expressions
Goal: Given a DFA M, construct an equivalent regular expression r. Our construction builds the desired expression by “dynamic programming”, a useful algorithmic technique (see CIS 320 for more!). This construction actually works even if M is an NFA or an ε-NFA. Note: the textbook has a different presentation.

If M has n states, without loss of generality assume the states are 1, 2, …, n. For i = 1…n, j = 1…n, and k = 0…n, consider the languages
L[i,j,k] = { w | starting in state i, while processing w, M ends up in state j while visiting only states indexed <= k along the way }
Here “along the way” refers to the intermediate states: the start state i and the end state j themselves may have any index.
[Figure: a run of M on w = w1 w2 … wm from state i to state j, with every intermediate state indexed <= k]

L[i,j,0] = { w | w takes M from state i to state j without any intermediate states }, that is, in a single transition (or, when i = j, in no transition at all, via the empty string ε)
L[i,j,n] = { w | δ*(i, w) = j }
If 1 is the initial state and, say, 2 and 5 are all the final states, then L(M) = L[1,2,n] ∪ L[1,5,n].
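The definition of L[i,j,k] can be checked directly by simulation. A minimal sketch, assuming a DFA whose states are the integers 1..n and whose transition function is a dictionary delta[(state, symbol)]; in_L is a hypothetical name. The states reached after each proper nonempty prefix of w are the intermediate states, while the start i and the end j are not restricted; for k = n the check reduces to δ*(i, w) = j. The example DFA is the three-state DFA from this lecture's example construction, with its transitions read off the r[i,j,0] matrix (an assumption of this sketch).

```python
def in_L(delta, i, j, k, w):
    """Is w in L[i,j,k]?  Every intermediate state must have index <= k."""
    state = i
    for t, c in enumerate(w):
        state = delta[(state, c)]
        is_intermediate = t < len(w) - 1       # not the state after the last symbol
        if is_intermediate and state > k:
            return False
    return state == j


# Example DFA (transitions reconstructed from the lecture's example; an assumption)
DELTA = {(1, "a"): 2, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 3,
         (3, "a"): 1, (3, "b"): 2}

print(in_L(DELTA, 1, 3, 1, "ab"), in_L(DELTA, 1, 3, 2, "ab"))   # False True
```

The string ab goes 1 → 2 → 3, so its only intermediate state is 2: it is in L[1,3,2] but not in L[1,3,1].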

Goal: for each i, j, k, construct a regular expression r[i,j,k] that captures exactly L[i,j,k]. First construct the reg-exps r[i,j,0] for all i, j, then the expressions r[i,j,1] for all i, j, then all r[i,j,2], and so on, finally giving us the regular expressions r[i,j,n] for all i, j. If 1 is the initial state and, say, 2 and 5 are all the final states, then the reg-exp for L(M) is r[1,2,n] ∪ r[1,5,n].

DFAs to Regular Expressions: Initialization
L[i,j,0] = { w | starting in state i, while processing w, M ends up in state j while visiting only states indexed <= 0 along the way }. Since the states are numbered from 1, no intermediate state is allowed: w is in L[i,j,0] if it takes M directly from state i to state j. What is r[i,j,0]?

Example Construction: the entries of the matrix give r[i,j,0].
[Figure: DFA with states 1, 2, 3]

         j = 1   j = 2   j = 3
i = 1    ε       a ∪ b   ∅
i = 2    ∅       ε ∪ a   b
i = 3    a       b       ε

From DFAs to Regular Expressions: Base Case
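w is in L[i,j,0] if it takes M directly from state i to state j (or, when i = j, if w is empty).
$$r[i,i,0] = \varepsilon \cup \bigcup_{\{s \,\mid\, \delta(i,s) = i\}} s \qquad\qquad r[i,j,0] = \bigcup_{\{s \,\mid\, \delta(i,s) = j\}} s \quad \text{for } i \neq j$$
Note: by convention, the union over an empty set gives ∅.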
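The base case translates directly into code. A minimal Python sketch, assuming the DFA is given as a dictionary delta[(state, symbol)] over states 1..n; base_case is a hypothetical name, and the output is written in the lecture's notation, with ε added on the diagonal and ∅ for an empty union. The example DFA is the three-state DFA from the example construction, with transitions read off its r[i,j,0] matrix (an assumption of this sketch).

```python
def base_case(n, alphabet, delta):
    """Return r[i,j,0] for all states i, j as strings in the lecture's notation."""
    r0 = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            terms = ["ε"] if i == j else []            # ε only on the diagonal
            terms += [s for s in alphabet if delta.get((i, s)) == j]
            r0[(i, j)] = " ∪ ".join(terms) if terms else "∅"  # empty union is ∅
    return r0


DELTA = {(1, "a"): 2, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 3,
         (3, "a"): 1, (3, "b"): 2}
r0 = base_case(3, "ab", DELTA)
for i in range(1, 4):
    print(i, [r0[(i, j)] for j in range(1, 4)])
```

Because delta.get returns None for a missing transition, the same function also works for a DFA given with a partial transition table.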

DFAs to Regular Expressions: Iterative Case
Having constructed the r[i,j,k] expressions, can we construct r[i,j,k+1]?
A string w is in L[i,j,k+1] if
1. it takes M from state i to state j, and
2. its intermediate states have index <= k+1.
For 2 to hold, either all intermediate states have index <= k, which means w is in L[i,j,k], or state k+1 is visited one or more times as an intermediate state.

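Consider strings w that take M from state i to state j such that state k+1 is visited one or more times along the way (all remaining intermediate states are <= k). Cut w at every visit to k+1: the piece from i to the first visit of k+1 is in L[i,k+1,k], each piece between two consecutive visits of k+1 is in L[k+1,k+1,k] (there can be any number of these, hence a Kleene-*), and the piece from the last visit of k+1 to j is in L[k+1,j,k].
[Figure: a run i → k+1 → k+1 → … → k+1 → j, where every state between two marked states is indexed <= k]
Strings w of this pattern are characterized by L[i,k+1,k] . L[k+1,k+1,k]* . L[k+1,j,k].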

Hence, a string w is in L[i,j,k+1] if
1. it is in L[i,j,k], or
2. it is in L[i,k+1,k] . L[k+1,k+1,k]* . L[k+1,j,k].
Therefore, L[i,j,k+1] is captured by the regular expression:
r[i,j,k+1] = r[i,j,k] ∪ r[i,k+1,k] . r[k+1,k+1,k]* . r[k+1,j,k]
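Putting the base case and the recurrence together gives the whole dynamic program. The sketch below is one way to write it in Python, under these assumptions: states are 1..n; expressions are built as Python `re` pattern strings so they can be tested with re.fullmatch; ∅ is represented by None and ε by the empty pattern; union, concat, star, and dfa_to_regex are hypothetical names. The only simplifications applied are those for ∅ and ε (∅ ∪ x = x, ∅.x = ∅, ∅* = ε* = ε), so the resulting patterns are correct but far from minimal. The example DFA is the one from the example construction, with transitions read off its r[i,j,0] matrix (an assumption).

```python
import re
from itertools import product


def union(x, y):
    if x is None:                  # ∅ ∪ y = y
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(x, y):
    if x is None or y is None:     # concatenation with ∅ is ∅
        return None
    return f"(?:{x})(?:{y})"


def star(x):
    if x is None or x == "":       # ∅* = ε* = ε
        return ""
    return f"(?:{x})*"


def dfa_to_regex(n, alphabet, delta):
    """Return r[(i, j, k)] for states i, j in 1..n and k in 0..n."""
    r = {}
    for i in range(1, n + 1):      # base case k = 0
        for j in range(1, n + 1):
            e = "" if i == j else None
            for s in alphabet:
                if delta[(i, s)] == j:
                    e = union(e, re.escape(s))
            r[(i, j, 0)] = e
    for k in range(n):             # r[i,j,k+1] from the level-k expressions
        m = k + 1
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_m = concat(r[(i, m, k)], concat(star(r[(m, m, k)]), r[(m, j, k)]))
                r[(i, j, m)] = union(r[(i, j, k)], through_m)
    return r


# Example DFA (transitions reconstructed from the lecture's example; an assumption)
DELTA = {(1, "a"): 2, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 3,
         (3, "a"): 1, (3, "b"): 2}
R = dfa_to_regex(3, "ab", DELTA)

# Compare r[1,3,2] with the hand-simplified (a ∪ b)a*b on all strings up to length 6.
words = ["".join(t) for n in range(7) for t in product("ab", repeat=n)]
print(all(bool(re.fullmatch(R[(1, 3, 2)], w)) == bool(re.fullmatch("(a|b)a*b", w))
          for w in words))
```

Each level k+1 reads only level-k entries, which is exactly the dynamic-programming order described above. Each expression at level k+1 embeds several level-k expressions, so the pattern size can grow exponentially in n.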

Example Construction: from the expressions r[i,j,0], compute r[1,2,1] and r[3,2,1].
r[1,2,1] = r[1,2,0] ∪ r[1,1,0].r[1,1,0]*.r[1,2,0] = (a ∪ b) ∪ ε.ε*.(a ∪ b) = a ∪ b (simplified)
r[3,2,1] = r[3,2,0] ∪ r[3,1,0].r[1,1,0]*.r[1,2,0] = b ∪ a.ε*.(a ∪ b) = b ∪ a(a ∪ b) (simplified)

Example Construction: from the expressions r[i,j,1] (for instance, r[3,2,1] = b ∪ a(a ∪ b)), compute r[1,3,2].
r[1,3,2] = r[1,3,1] ∪ r[1,2,1].r[2,2,1]*.r[2,3,1] = ∅ ∪ (a ∪ b)(ε ∪ a)*b = (a ∪ b)a*b (simplified)
Continuing in this way, r[1,1,3] captures the language of the DFA.

Regular Languages
A language L is regular if any one of the following equivalent conditions holds:
1. there is a DFA M such that L(M) = L
2. there is an NFA M such that L(M) = L
3. there is an ε-NFA M such that L(M) = L
4. there is a regular expression r such that L(r) = L
The fact that all these concepts coincide tells us that the notion of regularity is robust, fundamental, and worth studying!

Proving Non-regularity
L = { w | count(w,a) = count(w,b) }. Does there exist a DFA M that accepts L? How do we establish that L is non-regular?

Recap: Lower Bounds on State Complexity
Definition: Strings u and v are distinguishable with respect to a language L if there exists a string w such that exactly one of u.w and v.w is in L.
If strings u and v are distinguishable with respect to L, then a DFA for L cannot end up in the same state after reading u and after reading v.
If there is a set S of k strings such that every pair of strings in S is distinguishable, then a DFA for L must have at least k states.
If there is an infinite set S of pairwise distinguishable strings, what can we conclude?

L = { w | count(w,a) = count(w,b) }
Consider S = { ε, a, aa, aaa, … } = { a^k | k >= 0 }.
S contains infinitely many strings.
S contains pairwise distinguishable strings: consider two strings a^i and a^j with i != j, and the suffix b^i.
a^i.b^i has an equal number of a’s and b’s, so it is in L.
a^j.b^i has unequal numbers of a’s and b’s, so it is not in L.
Conclusion: no finite number of states suffices to accept L. For every number k, a DFA for L must have at least k states. No DFA can accept L, that is, L is not regular!
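The argument can be spot-checked mechanically. This short Python sketch (the names in_equal_count and distinguishes are hypothetical) verifies, for all i != j below a bound, that the suffix b^i from the proof puts exactly one of a^i.b^i and a^j.b^i into L. A finite check like this is only a sanity check; the proof above is what covers every i and j.

```python
def in_equal_count(w):
    """Membership in L = { w | count(w,a) = count(w,b) }."""
    return w.count("a") == w.count("b")


def distinguishes(u, v, w):
    """True if exactly one of u.w and v.w is in L."""
    return in_equal_count(u + w) != in_equal_count(v + w)


N = 40
print(all(distinguishes("a" * i, "a" * j, "b" * i)
          for i in range(N) for j in range(N) if i != j))
```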

Regularity and Distinguishability
Theorem: If there exists an infinite set S of pairwise distinguishable (w.r.t. L) strings, then L is not regular.
Proof: Suppose S is an infinite set of pairwise distinguishable strings. To prove: L is not regular. Assume to the contrary that L is regular. By definition, there exists a DFA M that accepts L. We know that if there are k pairwise distinguishable strings, then k is a lower bound on the number of states of any DFA for L. Hence, every finite subset of S has at most as many strings as M has states, so S cannot be infinite: contradiction!
The converse of the theorem also holds! (We won’t prove it.)

To prove that a language L is not regular, identify a set S of strings such that
1. S is infinite, and
2. for every pair of distinct strings u and v in S, u and v are distinguishable w.r.t. L (that is, find a string w such that exactly one of u.w and v.w is in L).
The textbook contains an alternative method for showing that a language L is not regular, called the Pumping Lemma method (Section 1.4). You can use this method in your answers as long as you use it correctly.

Example Language: L = { w.w | w in {a,b}* }
L = { ε, aa, bb, abab, aaaa, baba, bbbb, … }
A string belongs to L if it can be split into two identical halves.
Is L regular? As a machine scans the input from left to right, is the amount of information that needs to be stored bounded a priori, independent of the current input?

L = { w.w | w in {a,b}* }
Consider S = { ε, a, aa, aaa, … } = { a^k | k >= 0 }.
S contains infinitely many strings.
S contains pairwise distinguishable strings: consider two strings a^i and a^j with i != j.
a^i.a^i can be split into two identical halves, so it is in L.
a^j.a^i cannot be split into two identical halves, so it is not in L.
Conclusion: L is not regular.
Is the proof correct?

Bug in the Proof
Consider two strings a^i and a^j with i != j.
a^i.a^i can be split into two identical halves, so it is in L.
a^j.a^i cannot be split into two identical halves, so it is not in L.
This claim is false!! It would have to hold for all values of i and j with i != j, but consider the case i = 4 and j = 2: the string a^2.a^4 is a^6 = a^3.a^3, which is in L!

Correct Proof of Non-regularity
L = { w.w | w in {a,b}* }
Consider S = { ε, a, aa, aaa, … } = { a^k | k >= 0 }.
S contains infinitely many strings.
S contains pairwise distinguishable strings: consider two strings a^i and a^j with i != j, and the suffix b a^i b.
a^i.b a^i b can be split into two identical halves (a^i b and a^i b), so it is in L.
Consider a^j.b a^i b. If its length i+j+2 is odd, it cannot be split into two halves at all. Otherwise, since i != j, if we split this string into two parts of equal length, the first part cannot end with b (its first b is at position j+1, which is the end of the first half only if i = j), but the second part does, so the two parts cannot be identical, and the string is not in L.
Conclusion: L is not regular.
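Both the bug and the fix can be checked on concrete strings. In the Python sketch below (hypothetical names in_ww and distinguishes), the suffix a^i from the flawed proof fails for i = 4, j = 2, while the suffix b a^i b from the corrected proof separates every pair i != j in the tested range. Again, this is a finite sanity check, not a substitute for the proof.

```python
def in_ww(s):
    """Membership in L = { w.w | w in {a,b}* }: even length, two identical halves."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]


def distinguishes(u, v, w):
    """True if exactly one of u.w and v.w is in L."""
    return in_ww(u + w) != in_ww(v + w)


# Flawed suffix a^i with i = 4, j = 2: a^8 and a^6 are both in L.
print(distinguishes("a" * 4, "a" * 2, "a" * 4))          # False
# Corrected suffix b a^i b, for all pairs i != j below 30.
print(all(distinguishes("a" * i, "a" * j, "b" + "a" * i + "b")
          for i in range(30) for j in range(30) if i != j))  # True
```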

Modified Example: L = { w.w | w in {a}* }. Is L regular?
L = { ε, aa, aaaa, aaaaaa, … } = { w | w contains only a’s and has even length }
Regular! (It is described by the regular expression (aa)*.)
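The claim can be spot-checked by comparing the definition of L with the regular expression (aa)*, written for Python's re module. This is a sketch with a hypothetical helper name, in_L; the check covers all strings over {a, b} up to length 10, so strings containing b are rejected by both sides.

```python
import re
from itertools import product


def in_L(s):
    """Membership in L = { w.w | w in {a}* }."""
    half = len(s) // 2
    return set(s) <= {"a"} and len(s) % 2 == 0 and s[:half] == s[half:]


words = ["".join(t) for n in range(11) for t in product("ab", repeat=n)]
print(all(in_L(s) == bool(re.fullmatch("(?:aa)*", s)) for s in words))
```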

Another Example
Σ = { a }, L = { w | length of w is a perfect square }
L = { ε, a, a^4, a^9, a^16, … }. Is L regular?

Proof of Non-regularity
Σ = { a }, L = { w | length of w is a perfect square }
Consider S = { ε, a, aa, aaa, … } = { a^k | k >= 0 }.
S contains infinitely many strings.
To show that S contains pairwise distinguishable strings, consider two strings a^i and a^j with i < j.
Goal: find a value p (that depends on i and j) such that i+p is a perfect square but j+p is guaranteed not to be a perfect square.
If we succeed, then a^i.a^p is in L, but a^j.a^p is not in L. Hence, strings a^i and a^j are distinguishable w.r.t. L, and L is not regular.
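The slide leaves the choice of p open. One choice that works (an addition here, not taken from the slides) is p = j^2 - i. Then i + p = j^2 is a perfect square, while j + p = j^2 + (j - i) with 0 < j - i <= j < 2j + 1, so j^2 < j + p < (j + 1)^2 and j + p is not a perfect square. The Python sketch below checks this choice for all 0 <= i < j < 300. The helpers is_square and choose_p are hypothetical, and math.isqrt requires Python 3.8 or later.

```python
from math import isqrt


def is_square(m):
    return isqrt(m) ** 2 == m


def choose_p(i, j):
    """One possible choice of p for 0 <= i < j (an assumption, not from the slides)."""
    return j * j - i


print(all(is_square(i + choose_p(i, j)) and not is_square(j + choose_p(i, j))
          for j in range(1, 300) for i in range(j)))
```

For example, i = 2 and j = 5 give p = 23: a^2.a^23 = a^25 is in L, but a^5.a^23 = a^28 is not.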
