Regular Expression (EXTRA)


1 Regular Expression (EXTRA)
Chap 1 Regular Expression (EXTRA)

2 Equivalence Relation
Def: Assume R is a relation on a set A, that is, R ⊆ A×A. We write aRb, meaning (a,b)∈R, to indicate that a is related to b via the relation R.
R is reflexive if for every a∈A, aRa.
R is symmetric if for every a and b in A, if aRb, then bRa.
R is transitive if for every a, b, and c in A, if aRb and bRc, then aRc.
R is an equivalence relation on A if R is reflexive, symmetric, and transitive.

3 Equivalence Class
Suppose R is any equivalence relation on A. For any element a of A, we denote by [a]_R, or sometimes simply by [a], the equivalence class containing a, that is, [a]_R = {x∈A | xRa}.

4 Thm: For any partition C of a set A, the relation R on A, defined by xRy ⇔ x and y belong to the same element of C, is an equivalence relation on A. Conversely, if R is any equivalence relation on A, the set of equivalence classes is a partition of A, and two elements of A are equivalent ⇔ they are in the same equivalence class.
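The correspondence between equivalence relations and partitions can be checked on a small finite set. The sketch below is only an illustration: the set A and the relation R (congruence modulo 3) are assumptions chosen for the example, not part of the slides.

```python
from itertools import product

def is_equivalence(A, R):
    """Check that R (a set of pairs over A) is reflexive, symmetric, transitive."""
    reflexive = all((a, a) in R for a in A)
    symmetric = all((b, a) in R for (a, b) in R)
    transitive = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
    return reflexive and symmetric and transitive

def equivalence_classes(A, R):
    """Return the set of classes [a] = {x in A | x R a}, each as a frozenset."""
    return {frozenset(x for x in A if (x, a) in R) for a in A}

# Illustrative relation (an assumption): congruence modulo 3 on {0, ..., 8}.
A = set(range(9))
R = {(a, b) for a, b in product(A, A) if (a - b) % 3 == 0}

classes = equivalence_classes(A, R)
# The classes are pairwise disjoint and their union is A, i.e. a partition of A.
covers_A = set().union(*classes) == A
```

Here `classes` contains three blocks, one per remainder modulo 3.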

5 Distinguishing One String from Another
Def: Let L be a language in Σ*. Two strings x and y in Σ* are distinguishable with respect to L if there is a string z∈Σ* (which may depend on x and y) so that exactly one of the strings xz and yz is in L. The string z is said to distinguish x and y with respect to L.

6 We say that x and y are indistinguishable with respect to L if there is no such string z.
In other words, for every z, xz and yz have the same status: either both are in L or both are not in L.
Eg: L={x∈{0,1}* | x ends with 10}. The strings 01011 and 100 are distinguishable with respect to L because for z=0, 01011z∈L and 100z∉L. The strings 0 and 100 are indistinguishable with respect to L.
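A bounded brute-force search makes the definition concrete. This is a sketch, not a decision procedure: the function names and the bound `max_len` are assumptions for illustration, and a result of `None` only means no distinguishing z was found up to that length. For this particular L, any suffix of length 2 or more decides membership on its own, so a small bound is enough.

```python
from itertools import product

def in_L(s):
    """Membership in L = {x in {0,1}* | x ends with 10}."""
    return s.endswith("10")

def distinguishing_suffix(x, y, member, max_len=4, alphabet="01"):
    """Return a shortest z with |z| <= max_len such that exactly one of
    xz, yz satisfies `member`, or None if no such z is found."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            z = "".join(letters)
            if member(x + z) != member(y + z):
                return z
    return None

z1 = distinguishing_suffix("01011", "100", in_L)  # a suffix separating the pair
z2 = distinguishing_suffix("0", "100", in_L)      # no suffix found within the bound
```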

7 Lemma: Suppose that L⊆Σ* and M=(Q, Σ, q0, F, δ) is any FA recognizing L. If x and y are two strings in Σ* for which δ*(q0,x)=δ*(q0,y), then x and y are indistinguishable with respect to L.

8 Proof: Let z be any string in Σ*, and consider the two strings xz and yz. We have δ*(q0,xz)=δ*(δ*(q0,x),z) and δ*(q0,yz)=δ*(δ*(q0,y),z), and therefore, by our assumption, δ*(q0,xz)=δ*(q0,yz). Since M is assumed to recognize L, these two strings are either both in L or both not in L. Therefore, x and y are indistinguishable with respect to L. ■
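The lemma can be observed on a small FA for L={x∈{0,1}* | x ends with 10}. The transition table below is an assumed reconstruction (state names A, B, C and their meanings are illustrative), not taken from the slides.

```python
# Assumed 3-state FA for "ends with 10":
# A = no useful suffix yet, B = last symbol is 1, C = string ends with 10.
DELTA = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "C", ("B", "1"): "B",
    ("C", "0"): "A", ("C", "1"): "B",
}
START, ACCEPTING = "A", {"C"}

def delta_star(q, x):
    """Extended transition function: run the FA from state q on string x."""
    for a in x:
        q = DELTA[(q, a)]
    return q

def accepts(x):
    """True if the FA ends in an accepting state after reading x."""
    return delta_star(START, x) in ACCEPTING

x, y = "0", "100"
# Both strings lead to the same state ...
same_state = delta_star(START, x) == delta_star(START, y)
# ... so for every suffix z tried here, xz and yz have the same status.
suffixes = ["", "0", "1", "10", "010", "1110"]
same_status = all(accepts(x + z) == accepts(y + z) for z in suffixes)
```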

9 Thm: Suppose that L⊆Σ* and, for some positive integer n, there are n strings in Σ*, any two of which are distinguishable with respect to L. Then there can be no FA recognizing L with fewer than n states.

10 Proof: Suppose that x1, x2, …, xn are n strings, any two of which are distinguishable with respect to L. If M=(Q, Σ, q0, F, δ) is any FA with fewer than n states, then by the pigeonhole principle, the states δ*(q0,x1), δ*(q0,x2), …, δ*(q0,xn) cannot all be distinct, and so for some i≠j, δ*(q0,xi)=δ*(q0,xj). By the lemma on slide 7, xi and xj are then indistinguishable with respect to the language recognized by M. Since xi and xj are distinguishable with respect to L, M cannot recognize L. ■
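To use the theorem as a lower bound, one needs a set of pairwise distinguishable strings. The sketch below collects such a set greedily for L={x | x ends with 10}. The helper names and the length bound are assumptions, and the bounded search can only confirm distinguishability, never prove indistinguishability in general.

```python
from itertools import product

def in_L(s):
    """Membership in L = {x in {0,1}* | x ends with 10}."""
    return s.endswith("10")

def strings_up_to(max_len, alphabet="01"):
    """Yield all strings over the alphabet of length <= max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def distinguishable(x, y, member, max_len=3):
    """True if some z with |z| <= max_len puts exactly one of xz, yz in L."""
    return any(member(x + z) != member(y + z) for z in strings_up_to(max_len))

def pairwise_distinguishable_set(member, max_len=3):
    """Greedily keep each string that is distinguishable from all kept so far."""
    reps = []
    for x in strings_up_to(max_len):
        if all(distinguishable(x, r, member, max_len) for r in reps):
            reps.append(x)
    return reps

reps = pairwise_distinguishable_set(in_L)
# len(reps) is a lower bound on the number of states of any FA recognizing L.
```

For this language the search yields three strings, so no FA with fewer than three states can recognize it.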

11 A Criterion for Regularity
Def: Let L be any language in Σ*. The relation I_L on Σ* (the indistinguishability relation) is defined as follows: for any two strings x,y∈Σ*, x I_L y ⇔ x and y are indistinguishable with respect to L. In other words, x I_L y if, for any z∈Σ*, either xz and yz are both in L, or xz and yz are both in L′ (the complement of L).

12 Lemma: For any language L, I_L is an equivalence relation on Σ*.
Proof: It is obvious that I_L is reflexive and symmetric. Suppose x I_L y and y I_L w. We must show that x I_L w. Let z be any string in Σ*. If xz∈L, then yz∈L, since x I_L y, and therefore wz∈L, since y I_L w. Similarly, if xz∉L, then wz∉L. Therefore x I_L w. ■

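13 Assume L is a regular language, and the FA M=(Q, Σ, q0, F, δ) recognizes L. If q∈Q, we let
Lq={x∈Σ* | δ*(q0,x)=q}. By the lemma on slide 7, each Lq is contained in a single equivalence class of I_L. This suggests using the equivalence classes themselves as states: if we start with an equivalence class q containing a string x, then δ(q,a) should be the equivalence class containing xa, that is, δ([x],a)=[xa].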

14 Lemma: I_L is right invariant with respect to concatenation.
In other words, for any x,y∈Σ* and any a∈Σ, if x I_L y, then xa I_L ya. Equivalently, if [x]=[y], then [xa]=[ya].
Proof: Suppose that x I_L y and a∈Σ. For any z′∈Σ*, xz′ and yz′ are either both in L or both not in L. Given any z∈Σ*, taking z′=az shows that xaz and yaz are either both in L or both not in L. Hence xa I_L ya. ■

15 Thm: Let L⊆Σ*, and let QL be the set of equivalence classes of the relation I_L on Σ*. If QL is a finite set, then ML=(QL, Σ, q0, FL, δ) is a finite automaton accepting L, where q0=[Λ], FL={q∈QL | q∩L≠∅}, and δ: QL×Σ → QL is defined by the formula δ([x],a)=[xa]. Furthermore, ML has the fewest states of any FA accepting L.

16 Proof:
(1) By the previous lemma, δ([x],a)=[xa] is a meaningful definition, since it does not depend on the choice of x within its class. To verify that ML recognizes L, we need to prove δ*([x],y)=[xy] for any x, y. We use induction on y. The claim is clear for y=Λ. Suppose it is true for some y. Then δ*([x],ya)=δ(δ*([x],y),a)=δ([xy],a)=[xya]. Thus δ*(q0,x)=δ*([Λ],x)=[x]. Since FL={q∈QL | q∩L≠∅}, ML accepts x iff [x]∩L≠∅. It is clear that if x∈L, then [x]∩L≠∅. On the other hand, if [x] contains a string y of L, then x must be in L; otherwise Λ would distinguish x and y. Therefore, ML accepts L.
(2) If there are n equivalence classes of I_L, then there are n pairwise distinguishable strings (one from each class). By the theorem on slide 9, any FA accepting L has at least n states. Since ML has exactly n states, it has the fewest possible. ■
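The construction of ML can be sketched for L={x | x ends with 10}, using one representative string per equivalence class as the state name. This is a minimal sketch under stated assumptions: indistinguishability is approximated by a bounded search (`max_len`), which happens to be exact for this L because suffixes of length 2 or more decide membership by themselves. All function names are illustrative.

```python
from itertools import product

def in_L(s):
    """Membership in L = {x in {0,1}* | x ends with 10}."""
    return s.endswith("10")

def strings_up_to(max_len, alphabet="01"):
    """Yield all strings over the alphabet of length <= max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def distinguishable(x, y, member, max_len=3):
    """Bounded test: some z with |z| <= max_len separates x and y."""
    return any(member(x + z) != member(y + z) for z in strings_up_to(max_len))

def build_ML(member, alphabet="01", max_len=3):
    """Build states (class representatives), delta([x],a)=[xa], and accepting set."""
    reps = [""]            # the initial state is the class of the empty string
    delta = {}
    queue = [""]
    while queue:
        x = queue.pop(0)
        for a in alphabet:
            # Look for an existing class containing xa.
            target = next(
                (r for r in reps if not distinguishable(x + a, r, member, max_len)),
                None,
            )
            if target is None:   # xa starts a new class
                target = x + a
                reps.append(target)
                queue.append(target)
            delta[(x, a)] = target
    # A class meets L exactly when its representative is in L.
    accepting = {r for r in reps if member(r)}
    return reps, delta, accepting

def run(delta, accepting, x, start=""):
    """Simulate the constructed automaton on x."""
    q = start
    for a in x:
        q = delta[(q, a)]
    return q in accepting

reps, delta, accepting = build_ML(in_L)
```

The loop terminates only when the number of classes found is finite, which mirrors the hypothesis that QL is a finite set.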

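17 Corollary: (Myhill-Nerode theorem)
L is a regular language ⇔ the set of equivalence classes of I_L is finite.
Proof: Follows from the theorems on slides 9 and 15. (⇒) If an FA with n states recognizes L, then by slide 9 there cannot be more than n pairwise distinguishable strings, so I_L has at most n classes. (⇐) If the set of classes is finite, ML from slide 15 is an FA accepting L. ■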

18 Eg: Let L={0^n 1^n | n≥0}. The intuitive reason L is not regular:
we must remember how many 0's we have seen. S={0^n | n≥0} is an infinite set. For 0^i, 0^j∈S with i≠j, 0^i 1^i∈L but 0^j 1^i∉L. ⇒ 1^i distinguishes 0^i and 0^j ⇒ the relation I_L has infinitely many distinct equivalence classes, and so L is not regular.
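The pairwise distinguishability claim for L={0^n 1^n | n≥0} can be spot-checked for small exponents. This is only a finite check with an assumed bound N; the argument in the slide covers all i≠j.

```python
def in_L(s):
    """Membership in L = {0^n 1^n | n >= 0}."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * n + "1" * n

def distinguishes(z, x, y):
    """True if exactly one of xz, yz is in L."""
    return in_L(x + z) != in_L(y + z)

N = 8  # illustrative bound on the exponents checked
all_pairs_separated = all(
    distinguishes("1" * i, "0" * i, "0" * j)
    for i in range(N)
    for j in range(N)
    if i != j
)
```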

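19 Eg: S={0^n | n≥0} is an infinite set, and L={ww | w∈{0,1}*} is a language.
For n≠m, take n≥1 (possible because n and m differ) and let z=1^n 0^n 1^n. ⇒ 0^n z∈L, 0^m z∉L ⇒ z distinguishes 0^n and 0^m. Hence I_L has infinitely many equivalence classes, and L is not regular.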
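A finite spot-check of the suffix z=1^n 0^n 1^n for L={ww}. The check below takes n≥1: for n=0 the suffix is empty, and the empty string cannot separate Λ from 00, since both are in L. The bound N and the helper names are assumptions for illustration.

```python
def in_L(s):
    """Membership in L = {ww | w in {0,1}*}."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]

def z_for(n):
    """The suffix z = 1^n 0^n 1^n."""
    return "1" * n + "0" * n + "1" * n

N = 8  # illustrative bound on the exponents checked
separated = all(
    in_L("0" * n + z_for(n)) and not in_L("0" * m + z_for(n))
    for n in range(1, N)
    for m in range(N)
    if m != n
)
# With n = 0 the suffix is empty and fails for m = 2: both strings are in L.
n0_fails = in_L("" + z_for(0)) and in_L("00" + z_for(0))
```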

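20 Minimal Finite Automata
Eg: Two FAs for {0,1}*{10} and the corresponding partitions of {0,1}*.
[Figure: a 7-state FA (states 1 to 7) and a 3-state FA (states A, B, C), both over {0,1}; the transition diagrams are not recoverable from the transcript.]
Partition for the 7-state FA: L1={Λ}, L2={0}, L3={1}, L4=Σ*{00}, L5=Σ*{01}, L6=L=Σ*{10}, L7=Σ*{11}.
Partition for the 3-state FA: LA, LB, LC=L.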

21 Identifying those pairs (p,q) of states for which Lp and Lq are subsets of different equivalence classes; we denote this by p≢q.
Lemma: For p,q∈Q, p≢q ⇔ there exists z∈Σ* so that exactly one of the two states δ*(p,z) and δ*(q,z) is in F.

22 Algorithm (for identifying the pairs (p,q) with p≢q)
1. List all (unordered) pairs of states (p,q) for which p≠q.
2. Make a sequence of passes through the pairs.
3. On the first pass, mark each pair of which exactly one element is in F.
4. On each subsequent pass, mark any pair (r,s) if there is an a∈Σ for which δ(r,a)=p, δ(s,a)=q, and (p,q) is already marked.
5. After a pass in which no new pairs are marked, stop. The marked pairs (p,q) are precisely those for which p≢q. ■
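The marking procedure can be sketched in a few lines. The 7-state transition table below is an assumption, reconstructed from the language labels on slide 20 (state 1 for Λ, 2 for 0, 3 for 1, 4 to 7 for strings ending in 00, 01, 10, 11, with state 6 accepting); it is not copied from the original figure. One small deviation from the steps as stated: marks made earlier in a pass are already visible later in the same pass, which changes the number of passes but not the final set of marked pairs.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, accepting):
    """Return the set of unordered pairs {p, q} that get marked."""
    pairs = [frozenset(p) for p in combinations(states, 2)]
    # First pass: exactly one state of the pair is accepting.
    marked = {p for p in pairs if len(p & accepting) == 1}
    changed = True
    while changed:                      # stop after a pass with no new marks
        changed = False
        for pair in pairs:
            if pair in marked:
                continue
            r, s = tuple(pair)
            for a in alphabet:
                succ = frozenset((delta[(r, a)], delta[(s, a)]))
                if succ in marked:      # a one-element succ is never marked
                    marked.add(pair)
                    changed = True
                    break
    return marked

# Assumed 7-state FA for {0,1}*{10}, reconstructed from the labels L1..L7.
STATES = [1, 2, 3, 4, 5, 6, 7]
DELTA = {
    (1, "0"): 2, (1, "1"): 3,
    (2, "0"): 4, (2, "1"): 5,
    (3, "0"): 6, (3, "1"): 7,
    (4, "0"): 4, (4, "1"): 5,
    (5, "0"): 6, (5, "1"): 7,
    (6, "0"): 4, (6, "1"): 5,
    (7, "0"): 6, (7, "1"): 7,
}
ACCEPTING = {6}

marked = distinguishable_pairs(STATES, "01", DELTA, ACCEPTING)
unmarked = {frozenset(p) for p in combinations(STATES, 2)} - marked
```

The unmarked pairs are the ones whose states can be merged. For this table they group the states into {1,2,4}, {3,5,7}, and {6}, giving a 3-state FA.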

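23 Eg: (cf slide 19)
[Figure: table of state pairs for states 1 to 7, showing the pass on which each pair is marked, and the resulting merged states p1, p2, p3; the individual table entries are not recoverable from the transcript.]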

