FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY
1 0 0 q1q1 q2q2 q3q3 q4q4 ε
Q is the set of states Σ is the alphabet : Q Σ ε → 2 Q is the transition function Q 0 Q is the set of start states F Q is the set of accept states A non-deterministic finite automaton (NFA) is a 5-tuple N = (Q, Σ, , Q 0, F) 2 Q is the set of subsets of Q and Σ ε = Σ {ε}
1 0 0 q1q1 q2q2 q3q3 q4q4 N = (Q, Σ, , Q 0, F) Q = {q 1, q 2, q 3, q 4 } Σ = {0,1} Q 0 = {q 1, q 2 } F = {q 4 } Q ε 0 1ε q1q1 q2q2 q3q3 q4q4
1 0 0 q1q1 q2q2 q3q3 q4q4 N = (Q, Σ, , Q 0, F) Q = {q 1, q 2, q 3, q 4 } Σ = {0,1} Q 0 = {q 1, q 2 } F = {q 4 } Q 0 1ε q1q1 {q 3 } q2q2 {q 4 } q3q3 {q2}{q2} q4q4 ε
1 0 0 q1q1 q2q2 q3q3 q4q4 N = (Q, Σ, , Q 0, F) Q = {q 1, q 2, q 3, q 4 } Σ = {0,1} Q 0 = {q 1, q 2 } F = {q 4 } Q ε, 0 0 1ε q1q1 {q 3 } q2q2 {q 4 } q3q3 {q 2,q 4 } {q2}{q2} q4q4 00 L(N)? 01 L(N)?
Let w Σ* and suppose w can be written as w 1... w n where w i Σ ε (ε is viewed as representing the empty string ) Then N accepts w if there are r 0, r 1,..., r n Q such that 1.r 0 Q 0 2.r i+1 (r i, w i+1 ) for i = 0,..., n-1, and 3.r n F A language L is recognized by an NFA N if L = L (N). L(N) = the language of machine N = set of all strings machine N accepts
1 0 0 q1q1 q2q2 q3q3 q4q4 N = (Q, Σ, , Q 0, F) Q = {q 1, q 2, q 3, q 4 } Σ = {0,1} Q 0 = {q 1, q 2 } F = {q 4 } Q ε, 0 0 1ε q1q1 {q 3 } q2q2 {q 4 } q3q3 {q 2,q 4 } {q2}{q2} q4q4 00 L(N)? 01 L(N)?
Q = 2 Q : Q Σ → Q (R, ) = ε( (r, ) ) rRrR q 0 = ε(Q 0 ) F = { R Q | f R for some f F } FROM NFA TO DFA Input: N = (Q, Σ, , Q 0, F) Output: M = (Q, Σ, , q 0, F) * For R Q, the ε-closure of R, ε(R) = {q that can be reached from some r R by traveling along zero or more ε arrows} *
a a, b a b ε Given: NFA N = ( {1,2,3}, {a.b}, , {1}, {1} ) Construct: equivalent DFA M ε({1}) = {1,3} N
a a, b a b ε Given: NFA N = ( {1,2,3}, {a,b}, , {1}, {1} ) Construct: equivalent DFA M ε({1}) = {1,3} N N = ( Q, Σ, , Q 0, F ) = (Q, Σ, , q 0, F) ab {1} {2} {2,3} {3} {1,3} {1,2} {2,3} {1,3} {2} {2,3} {1,2,3} {3} {1,2,3} {2,3} q 0 =
a a, b a b ε Given: NFA N = ( {1,2,3}, {a,b}, , {1}, {1} ) Construct: equivalent DFA M ε({1}) = {1,3} N N = ( Q, Σ, , Q 0, F ) = (Q, Σ, , q 0, F) ab {1} {2} {2,3} {3} {1,3} {1,2} {2,3} {1,3} {2} {2,3} {1,2,3} {3} {1,2,3} {2,3} q 0 =
REGULAR LANGUAGES CLOSED UNDER STAR Let L be a regular language and M be a DFA for L We construct an NFA N that recognizes L* 0 0, ε ε ε
Formally: Input: M = (Q, Σ, , q 1, F) Output: N = (Q, Σ, , {q 0 }, F) Q = Q {q 0 } F = F {q 0 } (q,a) = { (q,a)} {q 1 } if q Q and a ≠ ε if q F and a = ε if q = q 0 and a = ε if q = q 0 and a ≠ ε else DFA NFA
Show: L (N) = L* 1.L (N) L* 2. L (N) L*
1. L (N) L* Assume w = w 1 …w k is in L*, where w 1,…,w k L We show N accepts w by induction on k Base Cases: k = 0 k = 1 Inductive Step: Assume N accepts all strings v = v 1 …v k L*, v i L and let u = u 1 …u k u k+1 L*, v j L Since N accepts u 1 …u k (by induction) and M accepts u k+1, N must accept u (w = ε) (w L)
Assume w is accepted by N, we show w L* If w = ε, then w L* If w ≠ ε accept ε ε L* 2. L (N) L* By induction
REGULAR LANGUAGES ARE CLOSED UNDER REGULAR OPERATIONS Union: A B = { w | w A or w B } Intersection: A B = { w | w A and w B } Negation: A = { w Σ* | w A } Reverse: A R = { w 1 …w k | w k …w 1 A } Concatenation: A B = { vw | v A and w B } Star: A* = { w 1 …w k | k ≥ 0 and each w i A }
REGULAR EXPRESSIONS THURSDAY Jan 21
SOME LANGUAGES ARE NOT REGULAR B = {0 n 1 n | n ≥ 0} is NOT regular!
WHICH OF THESE ARE REGULAR C = { w | w has equal number of 1s and 0s} D = { w | w has equal number of occurrences of 01 and 10} NOT REGULAR REGULAR!!!
THE PUMPING LEMMA Let L be a regular language with |L| = Then there exists a positive integer P such that 1. |y| > 0 2. |xy| ≤ P 3. xy i z L for any i ≥ 0 if w L and |w| ≥ P then w = xyz, where:
Let P be the number of states in M Assume w L is such that |w| ≥ P q0q0 qiqi qjqj q |w| … There must be j > i such that q i = q j Let M be a DFA that recognizes L 1. |y| > 0 2. |xy| ≤ P 3. xy i z L for any i ≥ 0 We show w = xyz x
USING THE PUMPING LEMMA Use the pumping lemma to prove that B = {0 n 1 n | n ≥ 0} is not regular Hint: Assume B is regular Let B = L(M), for DFA M, and let P be larger than the number of states in M Try pumping s = 0 P 1 P
USING THE PUMPING LEMMA Use the pumping lemma to prove that B = {0 n 1 n | n ≥ 0} is not regular Hint: Assume B is regular, and try pumping s = 0 P 1 P If B is regular, s can be split into s = xyz, where for any i ≥ 0, xy i z is also in B If y is all 0s: xyyz has more 0s than 1s If y is all 1s: xyyz has more 1s than 0s If y has both 1s and 0s: xyyz will have some 1s before some 0s
Use the pumping lemma to prove that C = { w | w has an equal number of 0s and 1s} is not regular Hint: Try pumping s = 0 P 1 P If C is regular, s can be split into s = xyz, where for any i ≥ 0, xy i z is also in C and |xy| ≤ P
WHAT DOES D LOOK LIKE? 1 0 ε 0(0 1)*0 1(0 1)*1 D = { w | w has equal number of occurrences of 01 and 10} = { w | w = 1, w = 0, w = ε or w starts with a 0 and ends with a 0 or w starts with a 1 and ends with a 1 }
REGULAR EXPRESSIONS is a regexp representing { } ε is a regexp representing {ε} is a regexp representing If R 1 and R 2 are regular expressions representing L 1 and L 2 then: (R 1 R 2 ) represents L 1 L 2 (R 1 R 2 ) represents L 1 L 2 (R 1 )* represents L 1 *
PRECEDENCE Tightest Star (“*”) Concatenation (“.”, “”) LoosestUnion (“ ”, “+”, “|”)
R2R2 R1*R1*( EXAMPLE R 1 *R 2 R 3 = ()) R3R3
{ w | w has exactly a single 1 } 0*10*
What language does * represent? {ε}{ε}
{ w | w has length ≥ 3 and its 3rd symbol is 0 } (0 1)(0 1)0(0 1)* 000(0 1)* 010(0 1)* 100(0 1)* 110(0 1)*
{ w | every odd position of w is a 1 } (1(0 1))*(1 ε) 1((0 1)1)*(0 1 ε) ε Also
EQUIVALENCE L can be represented by a regexp L is regular
L can be represented by a regexp L is regular EQUIVALENCE Given regular expression R, we show there exists NFA N such that R represents L(N) Induction on the length of R:
Given regular expression R, we show there exists NFA N such that R represents L(N) Induction on the length of R: L can be represented by a regexp L is regular Base Cases (R has length 1): R = R = ε R =
Inductive Step: Assume R has length k > 1, and that every regular expression of length < k represents a regular language Three possibilities for R: R = R 1 R 2 R = R 1 R 2 R = (R 1 )* (Union Theorem!) (Concatenation) (Star) Therefore: L can be represented by a regexp L is regular
L can be represented by a regexp L is a regular language Have Shown
Give an NFA that accepts the language represented by (1(0 1))* 1ε1,0 ε
L can be represented by a regexp L is a regular language Proof idea: Transform an NFA for L into a regular expression by removing states and re-labeling arrows with regular expressions
NFA ε ε ε ε ε Add unique and distinct start and accept states While machine has more than 2 states: Pick an internal state, rip it out and re-label the arrows with regexps, to account for the missing state *0
NFA ε ε ε ε ε While machine has more than 2 states: In general: R(q 1,q 2 ) R(q 2,q 2 ) R(q 2,q 3 ) R(q 1,q 2 )R(q 2,q 2 )*R(q 2,q 3 ) R(q 1,q 3 ) q1q1 q2q2 q3q3 G R(q 1,q 3 )
q1q1 b a ε q2q2 a,b ε a*b (a*b)(a b)* q0q0 q3q3 R(q 0,q 3 ) = (a*b)(a b)* represents L(N)
Formally: Run CONVERT(G): (Outputs a regexp) If #states = 2 return the expression on the arrow going from q start to q accept Add q start and q accept to create G If #states > 2
Formally:Add q start and q accept to create G If #states > 2 select q rip Q different from q start and q accept define Q = Q – {q rip } define R as: R(q i,q j ) = R(q i,q rip )R(q rip,q rip )*R(q rip,q j ) R(q i,q j ) return CONVERT(G) Run CONVERT(G): (Outputs a regexp) } Defines: G (GNFA) (R = the regexps for edges in G)
Claim: CONVERT(G) is equivalent to G Proof by induction on k (number of states in G) Base Case: k = 2 Inductive Step: Assume claim is true for k-1 state GNFAs We first note that G and G are equivalent But, by the induction hypothesis, G is equivalent to CONVERT(G) Thus: CONVERT(G) equivalent to CONVERT(G) QED
q3q3 q2q2 b a b q1q1 b a a ε ε ε bb
q2q2 b a b q1q1 a a ε ε ε a ba b
bb (a ba)b*a a ba q2q2 b q1q1 a ε ε bb b b (a ba)b* (bb (a ba)b*a)* (b (a ba)b*)
Convert the NFA to a regular expression q3q3 q2q2 b b q1q1 a a, b ε ε ε b (a b)b*b bb*b (a b)b*b(bb*b)* (a b)b*b(bb*b)*a
((a b)b*b(bb*b)*a)* ((a b)b*b(bb*b)*a)*(a b)b*b(bb*b)*
ε
DFA NFA Regular Language Regular Expression DEFINITION
FLAC Read Chapter 2.1 of the book for next time