1 The MaxSuffix-Matching Algorithm
On maximal suffixes and constant-space versions of KMP algorithm. LATIN 2002: Theoretical Informatics: 5th Latin American Symposium, Cancun, Mexico, April 3-6, 2002. Proceedings. Rytter, W.
Advisor: Prof. R. C. T. Lee
Reporter: L. Y. Huang

2 Maximal Suffix
A maximal suffix of a string is the suffix that is lexicographically largest among all suffixes of the string. The maximal suffix of a string w is denoted by MaxSuf(w).
Ex: Consider the string w = abaaba.
The set of its suffixes: {a, ba, aba, aaba, baaba, abaaba}
Its suffixes in sorted order: {a, aaba, aba, abaaba, ba, baaba}
Thus MaxSuf(w) = baaba.
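The definition can be checked directly with a brute-force sketch in Python. The name `max_suffix` is a hypothetical helper, not something from the slides, and it compares all suffixes explicitly, so it is meant only for small examples.

```python
def max_suffix(w):
    """Return the lexicographically largest suffix of the non-empty string w."""
    # Python compares strings lexicographically, so max() over all suffixes suffices.
    return max(w[k:] for k in range(len(w)))


if __name__ == "__main__":
    print(sorted("abaaba"[k:] for k in range(6)))
    print(max_suffix("abaaba"))
```

Sorting the suffixes reproduces the ordered list on the slide, and the last element of that list is the maximal suffix.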

3 Self-Maximal String
A string w is said to be self-maximal if MaxSuf(w) = w.
Ex: Consider the strings w = abaaba and x = baaba.
– MaxSuf(w) = baaba.
– MaxSuf(x) = baaba.
Hence x is a self-maximal string but w is not.
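A minimal sketch of the self-maximality test, assuming plain Python strings. The function name `is_self_maximal` is an illustrative choice. A string is self-maximal exactly when no proper suffix is lexicographically larger than the whole string.

```python
def is_self_maximal(w):
    """True if w equals its own maximal suffix, i.e. no proper suffix exceeds w."""
    return all(w[k:] <= w for k in range(1, len(w)))


if __name__ == "__main__":
    for s in ("abaaba", "baaba"):
        print(s, is_self_maximal(s))
```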

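4 Important Properties of Self-Maximal Strings
By definition, we have the following observation about self-maximal strings: for a self-maximal string P, suppose the prefix P[1], P[2], …, P[i] of P is equal to a substring P[k], P[k+1], …, P[k+i-1] of P, with k + i ≤ |P|. Then P[i+1] ≥ P[k+i].
(Figure: P contains the prefix u followed by the character x, and a later occurrence of u followed by the character y. Then x ≥ y, so x ≠ y implies x > y.)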

5 Example: TCATBTCATA is a self-maximal string. But TBATATBATB is not a self-maximal string, because the B that follows the later occurrence of TBAT is lexicographically larger than the A that follows the prefix TBAT.
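The property can be turned into a small search for a witness that a string is not self-maximal. This is a sketch under the assumption of 0-based Python indexing. `first_violation` is a hypothetical name; it looks for a later occurrence of a prefix that is followed by a larger character than the prefix itself.

```python
def first_violation(p):
    """Return (k, i) such that p[:i] == p[k:k+i] and p[i] < p[k+i], or None.

    Indices are 0-based. A result of None means p is self-maximal.
    """
    n = len(p)
    for k in range(1, n):
        i = 0
        # Extend the match between the prefix and the substring starting at k.
        while k + i < n and p[i] == p[k + i]:
            i += 1
        # A larger character after the later occurrence breaks self-maximality.
        if k + i < n and p[i] < p[k + i]:
            return k, i
    return None


if __name__ == "__main__":
    print(first_violation("TCATBTCATA"))
    print(first_violation("TBATATBATB"))
```

For TBATATBATB the witness is the occurrence of TBAT that starts at 0-based index 5: it is followed by B, while the prefix TBAT is followed by A.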

6 The Period of a String
A period of a string w is an integer p, 0 < p ≤ |w|, such that w[i] = w[i + p] for all i with 1 ≤ i ≤ |w| - p.
Ex:
– bbabbabbabba → period = 3 and period = 6
– abcdefg → period = word length = 7
– abcdeab → period = 5
We define period(w) as the smallest period of w. If w = bbabbabbabba, period(w) is 3.
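A brute-force sketch of this definition, assuming a non-empty Python string. The names `periods` and `period` are illustrative. The code uses 0-based indices, so the condition w[i] = w[i + p] is checked for i from 0 to |w| - p - 1.

```python
def periods(w):
    """All periods p of w with 1 <= p <= len(w)."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[i] == w[i + p] for i in range(n - p))]


def period(w):
    """The smallest period of the non-empty string w."""
    return periods(w)[0]


if __name__ == "__main__":
    print(periods("bbabbabbabba"))
    print(period("abcdefg"), period("abcdeab"))
```

Under this definition every multiple of 3 up to the word length is also a period of bbabbabbabba, and the word length itself is always a period.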

7 Given a string P, we are actually interested in the period of every prefix.

  i              1 2 3 4 5 6 7 8 9
  P              a b c a a b c a b
  period         1 2 3 3 4 4 4 4 7
  prefix         0 0 0 1 1 2 3 4 2
  i - prefix(i)  1 2 3 3 4 4 4 4 7

Note that period(i) = i - prefix(i), where prefix is the prefix function of the MP-algorithm; this is the number of positions by which we can move the pattern. (The index starts from 1 in this case.)
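The table can be reproduced with the usual prefix (failure) function. The following is a sketch of the standard textbook computation, not code from the slides; the function names are assumptions.

```python
def prefix_function(p):
    """prefix[i - 1] is the length of the longest proper suffix of p[:i]
    that is also a prefix of p[:i] (the MP failure function)."""
    prefix = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        # Follow the stored back-pointers until the next character fits.
        while k > 0 and p[i] != p[k]:
            k = prefix[k - 1]
        if p[i] == p[k]:
            k += 1
        prefix[i] = k
    return prefix


def prefix_periods(p):
    """Period of every prefix, computed as i - prefix(i) with 1-based i."""
    return [i + 1 - b for i, b in enumerate(prefix_function(p))]


if __name__ == "__main__":
    print(prefix_function("abcaabcab"))
    print(prefix_periods("abcaabcab"))
```

The `prefix` list is the table of back-pointers that the constant-space approach tries to avoid storing.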

8 Why are we interested in the period function?
If the period function carries the same information as the prefix function of the MP-algorithm, why are we interested in it? To calculate the prefix function, we must use pointers which point back to characters far behind the current position. In the following, we introduce a naïve period function which needs no such pointers.

9 Naive-Period Function
Function Naive-Period computes the period of a string if the string is self-maximal. For a general string, Naive-Period does not work. This is why our algorithm only works for self-maximal strings.

10 Algorithm of Naive-Period Function
  Function Naive-Period(j); { computes the period of self-maximal pat }
    period(1) := 1;
    for i := 2 to j do
      if pat[i] ≠ pat[i - period(i - 1)] then period(i) := i
      else period(i) := period(i - 1);
    return period;
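A Python sketch of Naive-Period, assuming that the test in the loop compares pat[i] with pat[i - period(i - 1)]. The helper `true_periods` is a hypothetical brute-force reference added only for comparison.

```python
def naive_periods(pat):
    """Naive-Period: period of every prefix, keeping only one running value."""
    result = []
    period = 1
    for i in range(1, len(pat) + 1):          # i is the 1-based prefix length
        # pat[i - 1] is pat[i] in the 1-based notation of the slides.
        if i > 1 and pat[i - 1] != pat[i - 1 - period]:
            period = i
        result.append(period)
    return result


def true_periods(pat):
    """Brute-force smallest period of every prefix, for comparison."""
    return [min(p for p in range(1, i + 1) if pat[p:i] == pat[:i - p])
            for i in range(1, len(pat) + 1)]


if __name__ == "__main__":
    for s in ("bbabbabbab", "abcaabcab"):
        print(s, naive_periods(s), true_periods(s))
```

For the self-maximal string bbabbabbab the two lists agree. For abcaabcab, which is not self-maximal, they already differ at the prefix of length 5, where the naive value is 5 and the true period is 4.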

11 An Example of Naive-Period Function
Consider the string w = bbabbabbab.
– w is a self-maximal string and period(w) = 3.
(Slides 11 to 19 fill in the following table one column at a time.)

  i                  1  2  3  4  5  6  7  8  9  10
  w                  b  b  a  b  b  a  b  b  a  b
  i - period(i - 1)  0  1  2  1  2  3  4  5  6  7
  period(i)          1  1  3  3  3  3  3  3  3  3

20 Why can Naive-Period work for self-maximal strings?
Given any pattern P, let k be the length of the longest proper suffix of P[1, i-1] that is equal to a prefix P[1, k] of P[1, i-1]. Let k’ be the length of the longest proper suffix of P[1, i] that is equal to a prefix P[1, k’] of P[1, i]. For any i, we consider the following possibilities:
(Figure: P[1, i-1] with its prefix and suffix of length k marked, and P[1, i] with its prefix and suffix of length k’ marked.)

21 The possibilities are:
1. k ≠ 0 and P[k + 1] = P[i] : Period(i) = Period(i - 1)
2. k ≠ 0, P[k + 1] ≠ P[i] and k’ ≠ 0 : Period(i) = i – k’
3. k ≠ 0, P[k + 1] ≠ P[i] and k’ = 0 : Period(i) = i
4. k = 0 and k’ ≠ 0 : Period(i) = i – k’
5. k = 0 and k’ = 0 : Period(i) = i

22 1. k ≠ 0 and P[k + 1] = P[i] : Period(i) = Period(i - 1)

  i       1 2 3 4 5 6 7 8
  P       a b c a a b c a
  period  1 2 3 3 4 4 4 4

For i = 8, the substring “abc” of length 3 (k = 3) is the longest suffix of P(1, 7) which is equal to a prefix of P(1, 7), and P(8) = P(4), so period(8) = period(7) = 4.

23 2. k ≠ 0, P[k + 1] ≠ P[i] and k’ ≠ 0 : Period(i) = i – k’

  i       1 2 3 4 5 6 7 8 9
  P       a b c a a b c a b
  period  1 2 3 3 4 4 4 4 7

For i = 9, the substring “abca” of length 4 (k = 4) is the longest suffix of P(1, 8) which is equal to a prefix of P(1, 8), and P(9) ≠ P(5). There is a suffix of P(1, 9) which is equal to a prefix of P(1, 9), namely P(1, 2) = ab of length 2 (k’ = 2), so period(9) = i - |P(1, 2)| = 9 - 2 = 7.

24 3. k ≠ 0, P[k + 1] ≠ P[i] and k’ = 0 : Period(i) = i

  i       1 2 3 4 5 6 7 8 9
  P       a b c c a b c c b
  period  1 2 3 4 4 4 4 4 9

For i = 9, the substring “abcc” of length 4 (k = 4) is the longest suffix of P(1, 8) which is equal to a prefix of P(1, 8), and P(9) ≠ P(5). There is no suffix of P(1, 9) which is equal to a prefix of P(1, 9) (k’ = 0), so period(9) = i = 9.

25 4. k = 0 and k’ ≠ 0 : Period(i) = i – k’

  i       1 2 3 4 5 6 7 8 9
  P       a b c c b b c c a
  period  1 2 3 4 5 6 7 8 8

For i = 9, there is no suffix of P(1, 8) which is equal to a prefix of P(1, 8) (k = 0). The substring “a” of length 1 (k’ = 1) is a suffix of P(1, 9) which is equal to a prefix of P(1, 9), P(1, 1) = a, so period(9) = i - |P(1, 1)| = 9 - 1 = 8.

26 5. k = 0 and k’ = 0 : Period(i) = i

  i       1 2 3 4 5 6 7 8 9
  P       a b c c b b c c b
  period  1 2 3 4 5 6 7 8 9

For i = 9, there is no suffix of P(1, 8) which is equal to a prefix of P(1, 8) (k = 0). There is no suffix of P(1, 9) which is equal to a prefix of P(1, 9) (k’ = 0), so period(9) = i = 9.
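The five possibilities can be sketched as a small classifier. This is illustrative code under the definitions of k and k’ given above. The names `border` and `classify` are assumptions, and the border length is found by brute force rather than by the MP prefix function.

```python
def border(s):
    """Length of the longest proper suffix of s that is also a prefix of s."""
    for length in range(len(s) - 1, 0, -1):
        if s[:length] == s[-length:]:
            return length
    return 0


def classify(p, i):
    """Return (case number 1..5, Period(i)) for the 1-based prefix length i >= 2."""
    k = border(p[:i - 1])        # k  for P[1, i-1]
    k2 = border(p[:i])           # k' for P[1, i]
    if k != 0 and p[k] == p[i - 1]:      # P[k + 1] = P[i] in 1-based notation
        case = 1
    elif k != 0:
        case = 2 if k2 != 0 else 3
    else:
        case = 4 if k2 != 0 else 5
    # In every case the period equals i - k'; in case 1 this equals Period(i - 1).
    return case, i - k2


if __name__ == "__main__":
    examples = [("abcaabca", 8), ("abcaabcab", 9), ("abccabccb", 9),
                ("abccbbcca", 9), ("abccbbccb", 9)]
    for p, i in examples:
        print(p, i, classify(p, i))
```

Run on the five example patterns from the slides, it reports cases 1 to 5 in order, with the periods shown in the tables.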

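27 Assume that condition 2 or condition 4 holds. Then k’ ≠ 0, so there must be a proper suffix of P[1, i] which is equal to a prefix, and Period(i) = i – k’. To justify Naive-Period for self-maximal strings, we show that condition 2 (k ≠ 0, P[k + 1] ≠ P[i] and k’ ≠ 0) cannot occur, and that under condition 4 (k = 0 and k’ ≠ 0) we still have Period(i) = Period(i - 1). Why?

28 2. k ≠ 0, P[k + 1] ≠ P[i] and k’ ≠ 0
Suppose that P is self-maximal. Let u = P[1, k] be the longest proper suffix of P[1, i - 1] which is equal to a prefix, and let j = k + 1, x = P[j] and y = P[i]. The prefix u is followed by x and the later occurrence of u is followed by y. Since P[i] = y ≠ P[j] = x holds, x > y. Since k’ ≠ 0, there is a string vy which is the longest proper suffix of P(1, i) equal to a prefix of P(1, i).
(Figure: P with the prefix u followed by x at position j, and a second occurrence of u ending at position i - 1 followed by y at position i. Below it, the same P with the prefix vy and the suffix vy marked.)

29 We always have k’ ≤ k + 1, and k’ = k + 1 would force P[k + 1] = P[i], so here k’ ≤ k and v is a suffix of u. Since k ≠ 0, v therefore also ends the prefix occurrence of u, where it is followed by x. So P has the prefix vy and a later occurrence of v followed by x. Since P is a self-maximal string, from the prefix vy we may conclude that y > x. Contradiction! Hence k ≠ 0, P[k + 1] ≠ P[i] and k’ ≠ 0 cannot hold for self-maximal strings.

30 4. k = 0 and k’ ≠ 0
Since k’ ≤ k + 1 = 1, we have k’ = 1 and Period(i) = i - 1. Since k = 0, Period(i - 1) = i - 1 as well, so Period(i) = Period(i - 1). (For example, P = bab is self-maximal, and for i = 3 we have k = 0, k’ = 1 and Period(3) = Period(2) = 2.)
Thus we have the following: for self-maximal strings, Period(i) = Period(i - 1) or Period(i) = i. That is, the naïve period function works for self-maximal strings.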
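The conclusion can be sanity-checked by exhaustive search over short strings. The following is a sketch, not a proof. All function names are hypothetical, and the alphabet and length bound are arbitrary choices.

```python
from itertools import product


def is_self_maximal(w):
    """True if no proper suffix of w is lexicographically larger than w."""
    return all(w[k:] <= w for k in range(1, len(w)))


def naive_periods(pat):
    """Naive-Period: compare pat[i] with pat[i - period] and keep or reset."""
    result, period = [], 1
    for i in range(1, len(pat) + 1):
        if i > 1 and pat[i - 1] != pat[i - 1 - period]:
            period = i
        result.append(period)
    return result


def true_periods(pat):
    """Brute-force smallest period of every prefix."""
    return [min(p for p in range(1, i + 1) if pat[p:i] == pat[:i - p])
            for i in range(1, len(pat) + 1)]


def check(alphabet="abc", max_len=7):
    """Compare both functions on every self-maximal string up to max_len."""
    checked = 0
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if is_self_maximal(w):
                assert naive_periods(w) == true_periods(w), w
                checked += 1
    return checked


if __name__ == "__main__":
    print(check(), "self-maximal strings checked")
```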

31 What is the advantage of the naïve-period function?
It runs in linear time and stores only the current period, so we never need pointers back to characters far behind, as we do when calculating the prefix function in the MP-algorithm.

32 For a string which is not self-maximal, we use the following algorithm, called the MaxSuffix-Matching Algorithm.

33 MaxSuffix-Matching Algorithm
First, we decompose the pattern string P into u · v, where v = MaxSuf(P) and u is the remaining prefix of P. Note that v occurs only once in P, and this is a very important property.
Property 1: No non-empty suffix of u is equal to a prefix of v; otherwise the suffix of P that begins with that suffix of u would be lexicographically larger than v.
Example: P = dababdadad, MaxSuf(P) = dadad, P = u · v = dabab · dadad
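A minimal sketch of the decomposition, assuming a non-empty pattern. `decompose` is a hypothetical name, and the brute-force comparison of all suffixes stands in for whatever constant-space method the original paper uses to find the maximal suffix.

```python
def decompose(p):
    """Split p into (u, v) with v = MaxSuf(p) and p == u + v."""
    # Index of the lexicographically largest suffix (suffixes are all distinct).
    start = max(range(len(p)), key=lambda k: p[k:])
    return p[:start], p[start:]


if __name__ == "__main__":
    u, v = decompose("dababdadad")
    print(u, v)
    # v occurs only once in P: its first occurrence is the suffix itself.
    print("dababdadad".find(v) == len(u))
    # Property 1: no non-empty suffix of u is a prefix of v.
    print(all(not v.startswith(u[k:]) for k in range(len(u))))
```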

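34 MaxSuffix-Matching Algorithm
If v is found in T, we next check, by naive testing, whether the part u of P occurs immediately to the left of v. Let i be the location of the current occurrence of v in T and let prev be the location of the previous occurrence of v. Because v occurs only once in P, u needs to be tested only when i - prev > |u|.
(Figure: Text with two occurrences of v, the earlier one at prev and the current one at i.)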

35 MaxSuffix-Matching Algorithm
  Algorithm MaxSuffix-Matching
  i := 0; j := 0; period := 1; prev := 0;
  while i ≤ n - |v| do begin
    while j < |v| and v[j + 1] = T[i + j + 1] do begin
      j := j + 1;
      { Naive-Period Function }
      if j > period and v[j] ≠ v[j - period] then period := j
    end;
    { MATCH OF v }
    if j = |v| then begin
      { test u by using any algorithm }
      if i - prev > |u| and u = T[i - |u| + 1 … i] then report match at i - |u|;
      prev := i
    end;
    i := i + period;
    if j ≥ 2 · period then j := j - period
    else begin j := 0; period := 1 end
  end;
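The pseudocode can be sketched in Python as follows, under some assumptions. Indices are 0-based, the maximal suffix is found by brute force (so this sketch is not constant-space overall), and u is tested with a plain slice comparison. One deliberate deviation is labelled in the code: `prev` starts at -1 rather than 0, so that an occurrence of the pattern at the very beginning of the text is not skipped. That choice is an assumption of this sketch, not something stated on the slides.

```python
def maxsuffix_matching(text, pattern):
    """Return the 0-based start positions of all occurrences of pattern in text."""
    # Decompose pattern = u + v with v the maximal suffix (brute force here).
    start = max(range(len(pattern)), key=lambda k: pattern[k:])
    u, v = pattern[:start], pattern[start:]
    n, m = len(text), len(v)
    matches = []
    i = j = 0
    period = 1
    prev = -1   # assumption of this sketch: "no previous occurrence of v yet"
    while i <= n - m:
        while j < m and v[j] == text[i + j]:
            j += 1
            # Naive-Period on the matched prefix v[:j] (v is self-maximal).
            if j > period and v[j - 1] != v[j - 1 - period]:
                period = j
        if j == m:                       # match of v at shift i
            # u is tested only if the previous v is more than |u| to the left.
            if i - prev > len(u) and text[i - len(u):i] == u:
                matches.append(i - len(u))
            prev = i
        i += period                      # shift by the period of v[:j]
        if j >= 2 * period:
            j -= period                  # keep the overlap, period is unchanged
        else:
            j, period = 0, 1             # otherwise restart the comparison
    return matches


if __name__ == "__main__":
    print(maxsuffix_matching("adadaddadabababadada", "abababadada"))
```

On the example text from the slides, the pattern abababadada is reported once, at 0-based position 9 (position 10 when counting from 1).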

36 Example
Text = adadaddadabababadada
P = u·v = abababa · dada
Case 1
– If i < |u|, then there is no occurrence of u·v at the beginning, because fewer than |u| characters precede this occurrence of v.
(Figure: Text with its positions numbered from 1 and the first occurrence of v = dada marked, starting at position 2.)

37 Example
Text = adadaddadabababadada
P = u·v = abababa · dada
Case 2
– If i – prev ≤ |u|, then there is no occurrence of u·v at position i - |u|. This is because the maximal suffix v of P starts at only one position in P.
(Figure: Text with the first two occurrences of v = dada marked, starting at positions 2 and 7; i = 7, |u| = 7, prev = 2.)

38 Example
Text = adadaddadabababadada
P = u·v = abababa · dada
So, in this example, we only need to check whether u occurs to the left of the third occurrence of v.
(Figure: Text with the first, second and third occurrences of v = dada marked.)
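The two filtering cases of this example can be replayed with a short sketch. It uses 1-based start positions for the occurrences of v and takes the text exactly as written in the slide header. The helper name `occurrences` and the variable names are assumptions.

```python
def occurrences(text, v):
    """1-based start positions of all (possibly overlapping) occurrences of v."""
    found, start = [], text.find(v)
    while start != -1:
        found.append(start + 1)
        start = text.find(v, start + 1)
    return found


text, u, v = "adadaddadabababadada", "abababa", "dada"
positions = occurrences(text, v)

to_test = []      # occurrences of v for which u actually has to be compared
prev = None
for i in positions:
    enough_room = i - 1 >= len(u)                      # case 1: room for u
    far_enough = prev is None or i - prev > len(u)     # case 2: previous v not too close
    if enough_room and far_enough:
        to_test.append(i)
    prev = i

# Compare u only at the surviving occurrences; report 1-based starts of u·v.
matches = [i - len(u) for i in to_test if text[i - 1 - len(u):i - 1] == u]

if __name__ == "__main__":
    print(positions, to_test, matches)
```

Only the last occurrence of v survives both filters, so u is compared against the text exactly once.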

39 Time Complexity and Space Complexity
Hence, the MaxSuffix-Matching Algorithm finds all occurrences of a pattern in linear time using O(1) space (the variables i, j, period and prev).

42 ~Thank You~
