1 The wide window string matching algorithm Longtao He, Binxing Fang, Jie Sui Theoretical Computer Science Volume: 332, Issue: 1-3, February 28, 2005,

1 1 The wide window string matching algorithm Longtao He, Binxing Fang, Jie Sui Theoretical Computer Science Volume: 332, Issue: 1-3, February 28, 2005, pp. 391-404 Professor R.C.T Lee Speaker K.W. Liu Department of Computer Science National Chi Nan University

2 2 String Matching Problem
Given: a text string T = t_1 t_2 t_3 … t_n and a pattern string P = p_1 p_2 p_3 … p_m, where |P| ≤ |T|.
Output: all occurrences of the pattern string within the text string.
Example 1: T = ababababaabbaaabbababab, P = aabbaaab
Example 2: T = ababababaabbaabbabababa, P = aabbaabb

3 3 Traditional Method
Example: T = ababababaabbaabbabababa, P = aabbaabb
[Figure: the pattern aabbaaab is drawn under the text ababababaabbaaabbababab at several successive positions.]
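The traditional method can be sketched in a few lines of Python. This is only an illustration of sliding the pattern one position at a time; the function name `naive_search` and the 0-based indexing are assumptions, not part of the slides.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    hits = []
    # Try every possible start position and compare the whole pattern there.
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            hits.append(start)
    return hits


print(naive_search("ababababaabbaabbabababa", "aabbaabb"))  # [8]
```

Every position of T is examined, which is what the wide-window method tries to avoid.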

4 4 In this talk, we shall provide three ideas: 1.The wide-window method 2.The convolution method 3.The bit pattern (modified convolution) method.

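5 5 Basic Idea of the Wide Window Approach
▫ Open a window of size 2|P|-1 on T.
▫ Divide it into two parts: the first one, denoted T1, has size |P|-1; the second one, denoted T2, has size |P|.
Since |T1| < |P|, an occurrence of P that starts in T1 cannot end inside T1, so some suffix of P must be a prefix of T2 if such an occurrence exists.
[Figure: a window of length 2|P|-1 on T, split into T1 (length |P|-1) followed by T2 (length |P|).]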

6 6
▫ Find all prefixes of T2 which are also suffixes of P.
▫ Let r denote the length of the longest such prefix.
▫ We can be sure that one part of T2 can be ignored, as marked in the figure.
[Figure: the window split into T1 and T2; the prefix of T2 of length r is marked, and a region is labelled "Can be ignored".]

7 7
▫ For every prefix of T2 which is a suffix of P, say of length r, we should check whether the suffix of T1 of length |P|-r is also a prefix of P.
[Figure: the window split into T1 and T2, with the prefix of T2 of length r marked.]

8 8 Definition:
n-suffix: Given a string S, the n-suffix of S is the suffix of S whose length is n, where 0 ≤ n ≤ |S|.
Example: S = abcde
0-suffix of S = ε
1-suffix of S = e
2-suffix of S = de
3-suffix of S = cde
n-prefix: Given a string S, the n-prefix of S is the prefix of S whose length is n, where 0 ≤ n ≤ |S|.
Example: S = abcde
0-prefix of S = ε
1-prefix of S = a
2-prefix of S = ab
3-prefix of S = abc
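The two definitions translate directly into string slicing. The helper names `n_prefix` and `n_suffix` are chosen here for illustration only.

```python
def n_prefix(s: str, n: int) -> str:
    """Return the prefix of s whose length is n (0 <= n <= len(s))."""
    if not 0 <= n <= len(s):
        raise ValueError("n must satisfy 0 <= n <= |S|")
    return s[:n]


def n_suffix(s: str, n: int) -> str:
    """Return the suffix of s whose length is n (0 <= n <= len(s))."""
    if not 0 <= n <= len(s):
        raise ValueError("n must satisfy 0 <= n <= |S|")
    # s[len(s) - n:] is used instead of s[-n:], because s[-0:] would be all of s.
    return s[len(s) - n:]


print(n_suffix("abcde", 3), n_prefix("abcde", 3))  # cde abc
```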

9 9 An Example of the Wide Window Approach
Given: T = aababcbdcea, P = abcbd
Let us produce a wide window whose length is |P| - 1 + |P| = 2|P| - 1.
In this case, |P| = 5 and 2|P| - 1 = 9.
The window covers the first 9 characters of T: T1 = aaba (length |P|-1) and T2 = bcbdc (length |P|).
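Cutting a window into T1 and T2 can be sketched as follows. The sketch assumes, as the algorithm slide states later, that the k-th window is centred at text position km (1-based), so that T2 starts there; the function name `split_window` is hypothetical.

```python
def split_window(text: str, pattern: str, k: int) -> tuple[str, str]:
    """Return (T1, T2) for the k-th wide window, k = 1, 2, ..."""
    m = len(pattern)
    center = k * m - 1                     # 0-based index of text position k*m
    t1 = text[center - (m - 1):center]     # the |P|-1 characters before the center
    t2 = text[center:center + m]           # up to |P| characters from the center on
    return t1, t2


print(split_window("aababcbdcea", "abcbd", 1))  # ('aaba', 'bcbdc')
```

Near the end of the text T2 may be shorter than |P|; the slice simply returns what is left.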

10 10 We first find all prefixes of T2 which are equal to some suffix of P. In this case, we obtain bcbd, whose length is 4.
|P| - 4 = 5 - 4 = 1
If the 1-suffix of T1 is the 1-prefix of P, we have found a matching.
1-suffix of T1 = a
1-prefix of P = a
∴ 1-suffix of T1 = 1-prefix of P. Thus we conclude that a matching is found.

11 11 Another Example
Given: T = ababa, P = aba
Let us produce a wide window whose length is |P| - 1 + |P| = 2|P| - 1.
In this case, |P| = 3 and 2|P| - 1 = 5.
The window covers all of T: T1 = ab (length |P|-1) and T2 = aba (length |P|).

12 12 We first find all prefixes of T2 which are equal to some suffixes of P. In our case, we obtain aba and a, whose lengths are 3 and 1.
|P| - 3 = 3 - 3 = 0 (the whole of P is equal to T2, so one matching is found)
|P| - 1 = 3 - 1 = 2
If the 2-suffix of T1 is the 2-prefix of P, we have found a matching.
2-suffix of T1 = ab
2-prefix of P = ab
∴ 2-suffix of T1 = 2-prefix of P. Thus we conclude that two matchings are found.
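The check performed in the two examples can be written with plain string comparisons, before any convolution or bit trick is introduced. This is a sketch under the assumption that T1 and T2 are already cut out of the window; the function name `window_matches` is hypothetical.

```python
def window_matches(t1: str, t2: str, p: str) -> list[int]:
    """Return every r such that the r-prefix of T2 equals the r-suffix of P
    and the (|P|-r)-suffix of T1 equals the (|P|-r)-prefix of P."""
    m = len(p)
    found = []
    for r in range(1, min(m, len(t2)) + 1):
        k = m - r                          # characters of P that must lie in T1
        if k > len(t1):
            continue
        if t2[:r] == p[m - r:] and t1[len(t1) - k:] == p[:k]:
            found.append(r)
    return found


print(window_matches("aaba", "bcbdc", "abcbd"))  # [4]
print(window_matches("ab", "aba", "aba"))        # [1, 3]
```

Each r in the result corresponds to one matching that starts |P|-r characters before the beginning of T2.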

13 13 Question: How can we find the suffixes of a string S1 which are also prefixes of a string S2? Answer: We use the convolution method.

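14 14 Convolution Method
T = aabc, P = ab (reversed: ba)

        a a b c
  ×         b a
        1 1 0 0
  +   0 0 1 0
  =   0 1 2 0 0

Each digit of the result counts the matching characters for one alignment of P against T. The five alignments give the comparison results 0, 10, 11, 00 and 0.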
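The numbers produced by the convolution can be reproduced by counting, for every alignment of P against T (including partial overlaps at both ends), how many aligned characters agree. The sketch below does this by direct comparison rather than by multiplying bit rows; the function name `match_counts` is an assumption.

```python
def match_counts(text: str, pattern: str) -> list[int]:
    """For each alignment of pattern against text, count the matching characters.

    Alignment `shift` places pattern[j] under text[shift + j]; shifts run from
    -(m-1) (only the last pattern character overlaps) to n-1 (only the first).
    """
    n, m = len(text), len(pattern)
    counts = []
    for shift in range(-(m - 1), n):
        c = 0
        for j, ch in enumerate(pattern):
            i = shift + j
            if 0 <= i < n and text[i] == ch:
                c += 1
        counts.append(c)
    return counts


print(match_counts("aabc", "ab"))  # [0, 1, 2, 0, 0]
```

A count equal to |P| (here 2) marks a full occurrence of P in T.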

15 15 We may use the convolution method to find all prefixes of T2 which are equal to some suffixes of P.
T2 = bcbdc (reversed: cdbcb), P = abcbd

              a b c b d
  ×           c d b c b
              0 1 0 1 0
            0 0 1 0 0
          0 1 0 1 0
        0 0 0 0 1
  +   0 0 1 0 0
  =   0 0 1 1 0 4 0 1 0

The 4 in the sum shows that a 4-suffix of P is equal to a prefix of T2.
The rest of the array is the unused region: it is not needed to find the matching.
If any zero appears in the column, we cannot get a matching.

16 16 The same array, read alignment by alignment. Sliding P = abcbd against T2 = bcbdc gives the comparison results 0, 10, 000, 1111 and 00000 for overlaps of length 1 to 5. Only the overlap of length 4 (1111) consists entirely of ones. The remaining part of the array may be ignored: no further sliding to the left is needed.

17 17 We may also use the logic operator (AND, &) to find all prefixes of T2 which are equal to some suffixes of P.
T2 = bcbdc, P = abcbd
Leaving out the unused region of the array, the rows that remain are ANDed column by column:

      0 1 0 1 0
      0 1 0 0
      0 1 0
      0 1
  &   0
  =   0 1 0 0 0

The single 1 in the result shows that a 4-suffix of P is equal to a prefix of T2.

18 18 We may use the convolution method to find all suffixes of T1 which are equal to some prefixes of P.
T1 = aaba, P = abcbd (reversed: dbcba)

            d b c b a
  ×           a a b a
            0 0 0 0 1
          0 1 0 1 0
        0 0 0 0 1
  +   0 0 0 0 1
  =   0 0 0 1 1 2 0 1

The rightmost 1 in the sum shows that a 1-prefix of P is equal to a suffix of T1.
The rest of the array is the unused region: it is not needed to find the matching.
If any zero appears in the column, we cannot get a matching.

19 19 The same array, read alignment by alignment. Sliding P = abcbd against T1 = aaba gives the comparison results 1, 00, 110 and 1000 for overlaps of length 1 to 4. Only the overlap of length 1 consists entirely of ones. The remaining part of the array may be ignored: no further sliding to the right is needed.

20 20 We may use the logic operator (AND, &) to find all suffixes of T1 which are equal to some prefixes of P.
T1 = aaba, P = abcbd
Leaving out the unused region of the array, the rows that remain are ANDed column by column:

      0 0 0 1
      0 1 0
      0 1
  &   1
  =   0 0 0 1

A 1-prefix of P is equal to a suffix of T1.
∴ 1-suffix of T1 = 1-prefix of P. Thus we conclude that a matching is found.

21 21 The Bit Pattern Approach Let us consider the following case: T = bcbdc P = abcbd Our job is to determine whether there is a prefix in T which is a suffix of P. Indeed, in this case, we have 4-prefix of T (bcbd) which is also the 4-suffix of P. As indicated before, we may use convolution.

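22 22 Convolution with the AND operation
T = bcbdc (reversed: cdbcb), P = abcbd
The rows of the array are V1 = 0 1 0 1 0, V2 = 0 0 1 0 0, V3 = 0 1 0 1 0, V4 = 0 0 0 0 1 and V5 = 0 0 1 0 0, each shifted one place to the left of the previous one.
The AND of the columns gives 0 0 0 0 0 1 0 0 0: the 4-prefix of T is the 4-suffix of P.
What are the vectors V1, V2, …, V5?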

23 23 Given a string S = s_1 s_2 … s_n and a character α, the α-bit pattern of S is defined as b_1 b_2 … b_n, where b_i = 1 if s_i = α and b_i = 0 otherwise.
Example: S = abcbd
a-bit pattern of S = 1 0 0 0 0
b-bit pattern of S = 0 1 0 1 0
c-bit pattern of S = 0 0 1 0 0
d-bit pattern of S = 0 0 0 0 1
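The definition of the α-bit pattern is a one-line comprehension in Python. The function name `bit_pattern` and the choice of a list of 0/1 integers are assumptions made for illustration.

```python
def bit_pattern(s: str, alpha: str) -> list[int]:
    """Return b_1 ... b_n with b_i = 1 if s_i == alpha and b_i = 0 otherwise."""
    return [1 if ch == alpha else 0 for ch in s]


for alpha in "abcd":
    print(alpha, bit_pattern("abcbd", alpha))
```

For S = abcbd this prints the four patterns listed on the slide, for example `[0, 1, 0, 1, 0]` for b.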

24 24 T = bcbdc, P = abcbd
The rows of the AND array are V1 = 0 1 0 1 0, V2 = 0 0 1 0 0, V3 = 0 1 0 1 0, V4 = 0 0 0 0 1 and V5 = 0 0 1 0 0.
We can now observe that
1. V1 = b-bit pattern of P, as we are comparing T[1] = b with P,
2. V2 = c-bit pattern of P, as we are comparing T[2] = c with P,
3. V3 = b-bit pattern of P, as we are comparing T[3] = b with P,
4. V4 = d-bit pattern of P, as we are comparing T[4] = d with P, and
5. V5 = c-bit pattern of P, as we are comparing T[5] = c with P.

25 25 T = bcbdc, P = abcbd
(1) T[1] = b. We want to decide whether P[5] = T[1] = b.
b-bit pattern of P = 0 1 0 1 0
The last bit is 0, not 1, so T[1] ≠ P[5].
Besides, the two 1s tell us that T[1] = P[2] = P[4].

26 26 T = bcbdc, P = abcbd
(2) T[2] = c. We want to decide whether T[1]T[2] = bc = P[4]P[5].
c-bit pattern of P = 0 0 1 0 0
We AND the T[1]-bit pattern of P with the T[2]-bit pattern of P, shifted by one place against each other:
T[1]-bit pattern of P without its last bit = 0 1 0 1 (the last bit, 0, is ignored)
last 4 bits of the T[2]-bit pattern of P = 0 1 0 0
Result of the AND-operation = 0 1 0 0
The last bit is 0, so the 2-prefix of T ≠ the 2-suffix of P.
What does 0 1 0 0 mean? It means that T[1] = P[2] = b and T[2] = P[3] = c. We keep this result 0 1 0 0.
The result of comparing T[1] and P[5] can be ignored from now on.

27 27 T = bcbdc, P = abcbd
Resulting vector so far = 0 1 0 0
(3) T[3] = b. We want to decide whether T[1]T[2]T[3] = P[3]P[4]P[5].
b-bit pattern of P = 0 1 0 1 0
We only take the last 3 bits, namely 0 1 0, because we are interested in P[3]P[4]P[5].
Previous result without its last bit = 0 1 0 (the last bit of 0 1 0 0 is ignored)
Result of the AND-operation = 0 1 0 & 0 1 0 = 0 1 0
The last bit is 0, so the 3-prefix of T ≠ the 3-suffix of P.
0 1 0 means that T[1] = P[2] = b, T[2] = P[3] = c (previously obtained) and T[3] = P[4] = b.
We keep the resulting vector 0 1 0.

28 28 T = bcbdc, P = abcbd
Resulting vector so far = 0 1 0
(4) T[4] = d. We want to decide whether T[1]T[2]T[3]T[4] = P[2]P[3]P[4]P[5].
d-bit pattern of P = 0 0 0 0 1
We only take the last 2 bits, namely 0 1.
Previous result without its last bit = 0 1 (the last bit of 0 1 0 is ignored)
Result of the AND-operation = 0 1 & 0 1 = 0 1
The last bit is 1, so the 4-prefix of T = the 4-suffix of P.
0 1 means that T[1] = P[2] = b, T[2] = P[3] = c, T[3] = P[4] = b (previously obtained) and T[4] = P[5] = d.

29 29 T = bcbdc, P = abcbd
Resulting vector so far = 0 1
(5) T[5] = c. We want to decide whether T[1]T[2]T[3]T[4]T[5] = P[1]P[2]P[3]P[4]P[5].
c-bit pattern of P = 0 0 1 0 0
We only take the last bit, namely 0.
Previous result without its last bit = 0 (the last bit of 0 1 is ignored)
Result of the AND-operation = 0 & 0 = 0
The last bit is 0, so the 5-prefix of T ≠ the 5-suffix of P.
The equalities T[1] = P[2] = b, T[2] = P[3], T[3] = P[4] and T[4] = P[5] were obtained previously and are unaffected.
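The five steps above can be sketched as a loop that keeps one shrinking result vector. At step x it drops the last bit of the previous result, takes the last |P|-x+1 bits of the T[x]-bit pattern of P, and ANDs the two; the last bit of the new result tells whether the x-prefix of T equals the x-suffix of P. The function name `prefix_suffix_lengths` and the use of Python lists instead of machine words are assumptions of this sketch.

```python
def prefix_suffix_lengths(t: str, p: str) -> list[int]:
    """Return every x such that the x-prefix of t equals the x-suffix of p."""
    m = len(p)
    # One bit pattern per distinct character of p.
    patterns = {ch: [1 if c == ch else 0 for c in p] for ch in set(p)}
    result = [1] * (m + 1)        # all ones, one bit longer than p
    lengths = []
    for x in range(1, min(len(t), m) + 1):
        ch = t[x - 1]
        if ch not in patterns:    # a character outside p ends the search
            break
        tail = patterns[ch][x - 1:]                    # last m-x+1 bits
        result = [a & b for a, b in zip(result[:-1], tail)]
        if not any(result):       # all zero: no longer prefix can match
            break
        if result[-1] == 1:
            lengths.append(x)
    return lengths


print(prefix_suffix_lengths("bcbdc", "abcbd"))  # [4]
```

For T = bcbdc and P = abcbd the intermediate results are 01010, 0100, 010, 01 and 0, as on the slides.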

30 30 Definition:
The Logic Operator (AND, &)
1 & 1 = 1
1 & 0 = 0
0 & 1 = 0
0 & 0 = 0
Bit Pattern Of String (BPS)
Given a string S which is composed of n characters, for example S = abcabcabc. S is composed of 3 distinct characters, which are a, b and c.
BPS means making one bit pattern per character, where each pattern marks the positions at which that character appears in the string.
a-bit pattern = 1 0 0 1 0 0 1 0 0
b-bit pattern = 0 1 0 0 1 0 0 1 0
c-bit pattern = 0 0 1 0 0 1 0 0 1

31 31 The Algorithm
ww(T = t_1 t_2 … t_n, P = p_1 p_2 … p_m)
Preprocessing
    Find the character set of P
    Build the character_bit pattern of P
    Build the character_rbit pattern of reversed P
Search
    For k do
        Open a wide window whose length is 2m-1 and whose center point is at km
        Let the window be denoted as a_1 a_2 … a_{2m-1}
        Let a_1 a_2 … a_{m-1} be denoted as T1
        Let a_m a_{m+1} … a_{2m-1} be denoted as T2
        /* we use the modified convolution method to find the matching */
        Find out all prefixes of T2 which are the suffix of P. (page 33-34)
        state 1: Find out the corresponding prefixes of P which are the suffix of T1. (page 35-36)
        /* each time we can jump the wide window by |P| */
        state 2:
    End For

32 32 Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T 1 is equal to the prefix of P. Having constructed the character bit pattern of P, we may use the character bit pattern of P to find whether the prefix of T 2 is equal to the suffix of P.

33 33 Find out all prefixes of T2 which are the suffix of P.
Suf_bit = 1 … 1        // temporary space for storing the result
|Suf_bit| = |T2|       // the length of the temporary space is equal to the length of T2
x = 1                  // x is the index for reading T2
Read T2 from left to right
if the character read belongs to the character set of P
    // We use the AND-operation to simulate the convolution method.
    // After each simulation, we store the result into the temporary space.
    Suf_bit[|P|…x] = Suf_bit[|P|…x] & character_bit[(|P|-x+1)…1]
    /* Check whether the |P|-th to x-th bits of Suf_bit are zeros. If they are all zeros,
       no more prefix of T2 will be equal to a suffix of P, so we can skip the remaining
       characters of T2. */
    if the |P|-th to x-th bits of Suf_bit are all zero
        goto state 1
    end if
else    // the character read from T2 is not one of the characters of P
    Set the |P|-th to x-th bits of Suf_bit to zero.
    goto state 2       // finish the reading from T2
end if

34 34
// if the x-th bit of Suf_bit is 1, the x-prefix of T2 is equal to the x-suffix of P
if Suf_bit[x] == 1
    if x == |P|        // the matched prefix of T2 is as long as P, so we found a matching
        we found a matching at km
    else
        we found an x-suffix of P
    end if
end if
x++                    // increase the index for reading the next character
Read next character
Fig.: Find out all prefixes of T2 which are the suffix of P.

35 35 Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T 1 is equal to the prefix of P.

36 36 Find out the corresponding prefixes of P which are the suffix of T1.
state 1:
// If the previous processing did not find any prefix of T2 equal to a suffix of P,
// we do not need to look for a suffix of T1 equal to a prefix of P.
if the |P|-th to 1st bits of Suf_bit are all zero
    goto state 2
else
    Pre_bit = 1 … 1         // temporary space for storing the result
    y = |Pre_bit| = |T1|    // the length of the temporary space is equal to the length of T1
    z = 1                   // z is the index for reading T1
    Read T1 from right to left
    if the character read belongs to the character set of P
        // We use the AND-operation to simulate the convolution method.
        Pre_bit[y…z] = Pre_bit[y…z] & character_rbit[(y-z+1)…1]
        /* If the y-th to z-th bits of Pre_bit are all zeros, no more suffix of T1 will be
           equal to a prefix of P, so we can skip the remaining characters of T1. */

37 37
        if the y-th to z-th bits of Pre_bit are all zero
            goto state 2
        end if
    else    // the character read from T1 is not one of the characters of P
        goto state 2        // finish reading from T1
    end if
    // if the z-th bit of Pre_bit is 1, the z-suffix of T1 is equal to the z-prefix of P
    if (Pre_bit[z] == 1) then
        /* If we found a suffix of T1 equal to a prefix of P, we need to check whether
           the corresponding prefix of T2 appeared in the Suf_bit pattern. */
        if (Pre_bit[z] & Suf_bit[|P|-z])
            Found a matching at km - z
        end if
    end if
    z++
    Read next character
end if
state 2:
Fig.: Find out the corresponding prefixes of P which are the suffix of T1.
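Putting the pieces together, the search loop can be sketched in Python as below. This is a simplified reading of the pseudocode, not the authors' implementation: it uses lists of bits instead of machine words, reuses one helper for both halves of the window (the T1 side is handled by reversing T1 and P, which plays the role of the character_rbit patterns), and reports 0-based start positions. The names `overlaps` and `wide_window_search` are hypothetical.

```python
def overlaps(t: str, p: str) -> list[int]:
    """Lengths x such that the x-prefix of t equals the x-suffix of p (AND of bit patterns)."""
    m = len(p)
    patterns = {ch: [1 if c == ch else 0 for c in p] for ch in set(p)}
    result = [1] * (m + 1)
    lengths = []
    for x in range(1, min(len(t), m) + 1):
        ch = t[x - 1]
        if ch not in patterns:
            break
        result = [a & b for a, b in zip(result[:-1], patterns[ch][x - 1:])]
        if not any(result):
            break
        if result[-1] == 1:
            lengths.append(x)
    return lengths


def wide_window_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    hits = []
    if m == 0 or m > n:
        return hits
    # Window centers at text positions m, 2m, 3m, ... (1-based), i.e. a jump of |P|.
    for center in range(m - 1, n, m):
        t1 = text[center - (m - 1):center]
        t2 = text[center:center + m]
        suf = overlaps(t2, pattern)           # r-prefix of T2 == r-suffix of P
        if not suf:
            continue                          # nothing to complete from T1
        # z-suffix of T1 == z-prefix of P, found on the reversed strings; z = 0 always holds.
        pre = set(overlaps(t1[::-1], pattern[::-1])) | {0}
        for r in suf:
            if m - r in pre:                  # |prefix| + |suffix| = |P|
                hits.append(center - (m - r))
    return sorted(hits)


print(wide_window_search("aababcbdcea", "abcbd"))  # [3]
print(wide_window_search("ababa", "aba"))          # [0, 2]
```

Every occurrence of P covers exactly one text position that is a multiple of |P|, which is why moving the window by |P| each time does not miss a matching.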

38 38 Example: T = aababcbdc, P = abcbd
Let us produce a wide window whose length is |P| - 1 + |P| = 2|P| - 1.
In this case, |P| = 5 and 2|P| - 1 = 9.
The window covers all of T: T1 = aaba (length |P|-1) and T2 = bcbdc (length |P|).

39 39 Preprocessing
Build the character bit pattern of P.
P = abcbd
Find all bit patterns of P. P is composed of a, b, c, b, d.
The character set of P = {a, b, c, d}
a_bit = 1 0 0 0 0
b_bit = 0 1 0 1 0
c_bit = 0 0 1 0 0
d_bit = 0 0 0 0 1
Having constructed the character bit pattern of P, we may use it to find whether a prefix of T2 is equal to a suffix of P.

40 40 Having constructed the character bit pattern of P, we may use it to find whether a prefix of T2 is equal to a suffix of P.
Build the character bit pattern of reversed P.
P = abcbd
Find all character bit patterns of reversed P. P is composed of a, b, c, b, d.
The character set of P = {a, b, c, d}
a_rbit = 0 0 0 0 1
b_rbit = 0 1 0 1 0
c_rbit = 0 0 1 0 0
d_rbit = 1 0 0 0 0
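The preprocessing step amounts to building one table for P and one for reversed P. The sketch below stores each pattern as a string of '0' and '1' characters so that it prints like the slides; the function name `build_bit_patterns` is an assumption.

```python
def build_bit_patterns(p: str) -> dict[str, str]:
    """Map each distinct character of p to its bit pattern, written as a string."""
    return {ch: "".join("1" if c == ch else "0" for c in p) for ch in sorted(set(p))}


P = "abcbd"
character_bit = build_bit_patterns(P)          # patterns of P
character_rbit = build_bit_patterns(P[::-1])   # patterns of reversed P

print(character_bit)   # {'a': '10000', 'b': '01010', 'c': '00100', 'd': '00001'}
print(character_rbit)  # {'a': '00001', 'b': '01010', 'c': '00100', 'd': '10000'}
```

The table has one entry per character of the character set {a, b, c, d}, so repeated characters of P do not add entries.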

41 41 Step 1: T2 = bcbdc, P = abcbd
The character is 'b'.
∴ Suf_bit[5…1] = Suf_bit[5…1] & b_bit[5…1] = 11111 & 01010 = 01010
Suf_bit[5…1] = 0 1 0 1 0
∵ the last bit is '0', the 1-prefix of T2 is not equal to the 1-suffix of P (comparison result: 0).

42 42 Step 2: T2 = bcbdc, P = abcbd
The character is 'c'.
∴ Suf_bit[5…2] = Suf_bit[5…2] & c_bit[4…1] = 0101 & 0100 = 0100
Suf_bit[5…1] = 0 1 0 0 0
∵ the last bit is '0', the 2-prefix of T2 is not equal to the 2-suffix of P (comparison result: 10).

43 43 Step 3: T2 = bcbdc, P = abcbd
The character is 'b'.
∴ Suf_bit[5…3] = Suf_bit[5…3] & b_bit[3…1] = 010 & 010 = 010
Suf_bit[5…1] = 0 1 0 0 0
∵ the last bit is '0', the 3-prefix of T2 is not equal to the 3-suffix of P (comparison result: 000).

44 44 Step 4: T2 = bcbdc, P = abcbd
The character is 'd'.
∴ Suf_bit[5…4] = Suf_bit[5…4] & d_bit[2…1] = 01 & 01 = 01
Suf_bit[5…1] = 0 1 0 0 0
∵ the last bit is '1', the 4-prefix of T2 is equal to the 4-suffix of P (comparison result: 1111).

45 45 Step 5: T2 = bcbdc, P = abcbd
We have found one suffix of P, which is the 4-suffix. The corresponding prefix which we need to find is the (|P|-4)-prefix. If we find it, we get a matching.
The character is 'c'.
∴ Suf_bit[5…5] = Suf_bit[5…5] & c_bit[1…1] = 0 & 0 = 0
∴ Suf_bit[5…1] = 0 1 0 0 0
∵ the last bit is '0', the 5-prefix of T2 is not equal to the 5-suffix of P (comparison result: 00000).

46 46 Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T 1 is equal to the prefix of P.

47 47 Step 1: T1 = aaba, P = abcbd (reversed: dbcba)
The character is 'a'.
Pre_bit[4…1] = Pre_bit[4…1] & a_rbit[4…1] = 1111 & 0001 = 0001
∵ the last bit is '1', the 1-suffix of T1 is equal to the 1-prefix of P (comparison result: 1).
if (Pre_bit[1] == 1) then
    if (Pre_bit[1] & Suf_bit[5-1]) Found a matching
∴ Suf_bit[5…1] = 0 1 0 0 0
∴ Pre_bit[4…1] = 0 0 0 1
Check: |prefix| + |suffix| = |P|? Here 1 + 4 = 5, so a matching is found.

48 48 Step 2: T1 = aaba, P = abcbd (reversed: dbcba)
The character is 'b'.
Pre_bit[4…2] = Pre_bit[4…2] & b_rbit[3…1] = 000 & 010 = 000
Pre_bit[4…1] = 0 0 0 1
∵ the last bit is '0', the 2-suffix of T1 is not equal to the 2-prefix of P (comparison result: 00).
if (Pre_bit[2] == 1) then
    if (Pre_bit[2] & Suf_bit[5-2])   → no need to check

49 49 Step 3: T1 = aaba, P = abcbd (reversed: dbcba)
The character is 'a'.
Pre_bit[4…3] = Pre_bit[4…3] & a_rbit[2…1] = 00 & 01 = 00
Pre_bit[4…1] = 0 0 0 1
∵ the last bit is '0', the 3-suffix of T1 is not equal to the 3-prefix of P (comparison result: 110).
if (Pre_bit[3] == 1) then
    if (Pre_bit[3] & Suf_bit[5-3])   → no need to check

50 50 Step 4: T1 = aaba, P = abcbd (reversed: dbcba)
The character is 'a'.
Pre_bit[4…4] = Pre_bit[4…4] & a_rbit[1…1] = 0 & 1 = 0
Pre_bit[4…1] = 0 0 0 1
∵ the last bit is '0', the 4-suffix of T1 is not equal to the 4-prefix of P (comparison result: 1000).
if (Pre_bit[4] == 1) then
    if (Pre_bit[4] & Suf_bit[5-4])   → no need to check
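The final check, |prefix| + |suffix| = |P|, can be sketched on the two vectors obtained in the walkthrough. The sketch assumes the slides' convention that bit 1 is the rightmost character of the written vector; the function name `combine` and the string representation are hypothetical.

```python
def combine(suf_bit: str, pre_bit: str) -> list[int]:
    """Return every z such that a matching has z characters in T1 and |P|-z in T2.

    suf_bit[x] == 1 means the x-prefix of T2 equals the x-suffix of P;
    pre_bit[z] == 1 means the z-suffix of T1 equals the z-prefix of P.
    Bits are indexed from the right, starting at 1.
    """
    m = len(suf_bit)                       # |P|

    def bit(v: str, x: int) -> bool:
        return v[len(v) - x] == "1"

    found = []
    if bit(suf_bit, m):                    # the whole of P lies in T2
        found.append(0)
    for z in range(1, len(pre_bit) + 1):
        if bit(pre_bit, z) and bit(suf_bit, m - z):
            found.append(z)
    return found


print(combine("01000", "0001"))  # [1]
```

With Suf_bit = 01000 and Pre_bit = 0001 the only combination is z = 1, that is, the matching that starts one character before the center of the window.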


56 56 Thank you

