1 The wide window string matching algorithm Longtao He, Binxing Fang, Jie Sui Theoretical Computer Science Volume: 332, Issue: 1-3, February 28, 2005,


Presentation transcript:

1 The wide window string matching algorithm. Longtao He, Binxing Fang, Jie Sui. Theoretical Computer Science, Volume 332, Issue 1-3, February 28, 2005, pp. Professor: R. C. T. Lee. Speaker: K. W. Liu. Department of Computer Science, National Chi Nan University.

2 String Matching Problem. Given: a text string T = t1 t2 t3 … tn and a pattern string P = p1 p2 p3 … pm, where |P| ≤ |T|. Output: all occurrences of the pattern string within the text string. Example: T = ababababaabbaabbabababa, P = aabbaabb. The slide figure uses T = ababababaabbaaabbababab, P = aabbaaab.

3 Traditional Method. Example: T = ababababaabbaabbabababa, P = aabbaabb. [Figure: P = aabbaaab drawn under T = ababababaabbaaabbababab at several successive alignments.]
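The traditional method on this slide can be sketched as a plain sliding comparison. This is a minimal illustration, not code from the talk; the function name `naive_search` and the use of 0-based start positions are assumptions made for the example.

```python
def naive_search(text, pattern):
    """Return the 0-based start positions of pattern in text.

    Tries every alignment of the pattern against the text, one shift
    at a time, and compares the whole pattern at each alignment.
    """
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            positions.append(start)
    return positions


if __name__ == "__main__":
    # The strings drawn in the slide figure.
    print(naive_search("ababababaabbaaabbababab", "aabbaaab"))
```

Each alignment costs up to |P| comparisons, which is the cost the wide-window method tries to reduce by moving |P| positions at a time.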

4 In this talk, we shall provide three ideas: 1. The wide-window method. 2. The convolution method. 3. The bit pattern (modified convolution) method.

5 Basic Idea of the Wide Window Approach. Open a window of size 2|P|-1 on T and divide it into two parts: the first part, denoted as T1, has size |P|-1; the second part, denoted as T2, has size |P|. Since |T1| < |P|, an occurrence of P inside the window cannot fit in T1 alone, so some suffix of P must appear at the start of T2 if an occurrence exists.

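6 Find all prefixes of T2 which are also suffixes of P. Let r denote the length of the longest such prefix. We can be sure that one part of T2 can be ignored, as shown in the figure. [Figure: a window of size 2|P|-1 on T, split into T1 of size |P|-1 and T2 of size |P|, with a region of length r marked in T2 and a region labelled "Can be ignored".]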

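7 For every prefix of T2 which is a suffix of P, we should check whether there exists a suffix of T1 which is also a prefix of P, such that the two lengths add up to |P|. [Figure: the same window of size 2|P|-1 on T, with T1 of size |P|-1, T2 of size |P|, and r marked.]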
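The window idea of the last three slides can be sketched directly with string comparisons, before any convolution or bit patterns are introduced. This is only an illustration under stated assumptions: the name `wide_window_search` is hypothetical, positions are 0-based, and the prefix and suffix tests are done naively rather than with the bit-parallel technique the talk develops later.

```python
def wide_window_search(text, pattern):
    """Return 0-based start positions of pattern in text.

    For each window, T2 starts at index `center` and has length m,
    and T1 is the m-1 characters just before it. An occurrence inside
    the window consists of an (m-r)-suffix of T1 followed by an
    r-prefix of T2, for some r in 1..m.
    """
    n, m = len(text), len(pattern)
    matches = []
    if m == 0 or m > n:
        return matches
    center = m - 1
    while center < n:
        t1 = text[center - m + 1:center]   # size m-1
        t2 = text[center:center + m]       # size m (shorter at the end of text)
        for r in range(1, len(t2) + 1):
            prefix_of_t2_is_suffix_of_p = t2[:r] == pattern[m - r:]
            suffix_of_t1_is_prefix_of_p = t1[r - 1:] == pattern[:m - r]
            if prefix_of_t2_is_suffix_of_p and suffix_of_t1_is_prefix_of_p:
                matches.append(center - (m - r))
        center += m                        # jump the window by |P|
    return matches


if __name__ == "__main__":
    print(wide_window_search("aababcbdcea", "abcbd"))
    print(wide_window_search("ababa", "aba"))
```

Every possible start position falls into exactly one window, because consecutive windows cover the start ranges `[center-m+1, center]` and `[center+1, center+m]`.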

8 Definition. n-suffix: given a string S, the n-suffix of S is the suffix of S whose length is n, where 0 ≤ n ≤ |S|. Example: S = abcde; 0-suffix of S = ε, 1-suffix of S = e, 2-suffix of S = de, 3-suffix of S = cde. n-prefix: given a string S, the n-prefix of S is the prefix of S whose length is n, where 0 ≤ n ≤ |S|. Example: S = abcde; 0-prefix of S = ε, 1-prefix of S = a, 2-prefix of S = ab, 3-prefix of S = abc.
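The two definitions map onto string slicing. The helper names `n_prefix` and `n_suffix` below are chosen for this sketch and do not come from the talk.

```python
def n_prefix(s, n):
    """Return the prefix of s whose length is n, for 0 <= n <= len(s)."""
    if not 0 <= n <= len(s):
        raise ValueError("n must satisfy 0 <= n <= |S|")
    return s[:n]


def n_suffix(s, n):
    """Return the suffix of s whose length is n, for 0 <= n <= len(s)."""
    if not 0 <= n <= len(s):
        raise ValueError("n must satisfy 0 <= n <= |S|")
    # s[-n:] would return the whole string for n == 0, so slice from the front.
    return s[len(s) - n:]


if __name__ == "__main__":
    print(n_suffix("abcde", 3), n_prefix("abcde", 3))
```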

9 An Example of the Wide Window Approach. Given: T = aababcbdcea, P = abcbd. Let us produce a wide window whose length is (|P|-1) + |P| = 2|P| - 1. In this case, |P| = 5 and 2|P| - 1 = 9. The window covers the first 9 characters of T: T1 = aaba (size |P|-1) and T2 = bcbdc (size |P|).

10 We first find all prefixes of T2 which are equal to some suffixes of P. In this case, we obtain bcbd, whose length is 4. |P| - 4 = 5 - 4 = 1. If the 1-suffix of T1 is the 1-prefix of P, we have found a matching. 1-suffix of T1 = a, 1-prefix of P = a, ∴ 1-suffix of T1 = 1-prefix of P. Thus we conclude that a matching is found. [Figure: P = abcbd drawn under T = aababcbdcea, covering the last character of T1 and the first four characters of T2.]

11 Another Example. Given: T = ababa, P = aba. Let us produce a wide window whose length is (|P|-1) + |P| = 2|P| - 1. In this case, |P| = 3 and 2|P| - 1 = 5. The window covers all of T: T1 = ab (size |P|-1) and T2 = aba (size |P|).

12 We first find all prefixes of T2 which are equal to some suffixes of P. In our case, we obtain aba and a, whose lengths are 3 and 1. For length 3: |P| - 3 = 3 - 3 = 0, so the whole of P is equal to T2 and one matching is found. For length 1: |P| - 1 = 3 - 1 = 2. If the 2-suffix of T1 is the 2-prefix of P, we have found a matching. 2-suffix of T1 = ab, 2-prefix of P = ab, ∴ 2-suffix of T1 = 2-prefix of P. Thus we conclude that two matchings are found.

13 Question: How can we find a suffix of a string S1 which is also a prefix of a string S2? Answer: We use the convolution method.

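14 Convolution Method. T = aabc, P = ab, reversed P = ba. Sliding P across T and comparing the overlapping characters gives one column of bits per alignment (1 = equal, 0 = different):
alignment 1 (b under T[1] = a): 0
alignment 2 (ab under T[1..2] = aa): 10
alignment 3 (ab under T[2..3] = ab): 11
alignment 4 (ab under T[3..4] = bc): 00
alignment 5 (a under T[4] = c): 0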
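The columns on this slide can be reproduced with a short sketch that slides the pattern across the text and records one bit per overlapping character. The function name `alignment_columns` is hypothetical, and the code compares characters directly instead of multiplying against the reversed pattern, which gives the same columns.

```python
def alignment_columns(text, pattern):
    """Return one bit string per alignment of pattern against text.

    The pattern starts overhanging the left end of the text (only its
    last character overlaps) and ends overhanging the right end (only
    its first character overlaps). Each bit is 1 where the overlapping
    characters are equal and 0 where they differ.
    """
    n, m = len(text), len(pattern)
    columns = []
    # shift is the text index aligned with pattern[0]; negative means overhang.
    for shift in range(-(m - 1), n):
        bits = []
        for j in range(m):
            i = shift + j
            if 0 <= i < n:
                bits.append("1" if text[i] == pattern[j] else "0")
        columns.append("".join(bits))
    return columns


if __name__ == "__main__":
    print(alignment_columns("aabc", "ab"))
```

A full-length column with no zero marks an occurrence of the pattern; a shorter column with no zero at either end marks a prefix or suffix overlap, which is what the wide-window method needs.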

15 We may use the convolution method to find all prefixes of T2 which are equal to some suffixes of P. T2 = bcbdc, P = abcbd, reversed T2 = cdbcb. [Figure: convolution table of P = abcbd against reversed T2 = cdbcb. One column is marked: a 4-suffix of P is equal to a prefix of T2. Another region is marked as not used for finding a matching.] If any zero appears in a column, we cannot get a matching from that column.

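16 [Figure: the convolution table of P = abcbd against reversed T2 = cdbcb.] Sliding P over T2 = bcbdc gives these columns:
1-suffix of P (d) against the 1-prefix of T2 (b): 0
2-suffix of P (bd) against the 2-prefix of T2 (bc): 10
3-suffix of P (cbd) against the 3-prefix of T2 (bcb): 000
4-suffix of P (bcbd) against the 4-prefix of T2 (bcbd): 1111
May be ignored. No further sliding to the left is needed.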

17 We may also use the logic operator (AND, &) to find all prefixes of T2 which are equal to some suffixes of P. T2 = bcbdc, P = abcbd, reversed T2 = cdbcb. [Figure: the rows of the convolution table are combined with &. One column is marked: a 4-suffix of P is equal to a prefix of T2. Another region is marked as not used for finding a matching.]

18 We may use the convolution method to find all suffixes of T1 which are equal to some prefixes of P. T1 = aaba, P = abcbd, reversed P = dbcba. [Figure: convolution table of T1 = aaba against reversed P = dbcba. One column is marked: a 1-prefix of P is equal to a suffix of T1. Another region is marked as not used for finding a matching.] If any zero appears in a column, we cannot get a matching from that column.

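19 [Figure: the convolution table of T1 = aaba against reversed P = dbcba.] Sliding P over T1 = aaba gives these columns:
1-prefix of P (a) against the 1-suffix of T1 (a): 1
2-prefix of P (ab) against the 2-suffix of T1 (ba): 00
3-prefix of P (abc) against the 3-suffix of T1 (aba): 110
4-prefix of P (abcb) against the 4-suffix of T1 (aaba): 1000
May be ignored. No further sliding to the right is needed.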

20 We may use the logic operator (AND, &) to find all suffixes of T1 which are equal to some prefixes of P. T1 = aaba, P = abcbd, reversed P = dbcba. [Figure: the rows of the convolution table are combined with &. One region is marked as not used for finding a matching.] A 1-prefix of P is equal to a suffix of T1. ∴ 1-suffix of T1 = 1-prefix of P. Thus we conclude that a matching is found.

21 The Bit Pattern Approach. Let us consider the following case: T = bcbdc, P = abcbd. Our job is to determine whether there is a prefix of T which is a suffix of P. Indeed, in this case, the 4-prefix of T (bcbd) is also the 4-suffix of P. As indicated before, we may use convolution.

22 Convolution. P = abcbd, reversed T = cdbcb. [Figure: convolution table whose rows V1, V2, V3, V4, V5 are combined by the AND operation.] The 4-prefix of T is the 4-suffix of P. What are the vectors V1, V2, …, V5?

23 Given a string S = s1 s2 … sn and a character α, the α-bit pattern of S is defined as b1 b2 … bn, where bi = 1 if si = α and bi = 0 otherwise. Example: S = abcbd. a-bit pattern of S = 10000, b-bit pattern of S = 01010, c-bit pattern of S = 00100, d-bit pattern of S = 00001.
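The definition translates into a one-line comparison per position. The helper names `bit_pattern` and `all_bit_patterns` are assumptions made for this sketch; bit patterns are kept as strings of 0 and 1 so that they read the same way as on the slides.

```python
def bit_pattern(s, ch):
    """Return the ch-bit pattern of s: bit i is 1 exactly where s[i] == ch."""
    return "".join("1" if c == ch else "0" for c in s)


def all_bit_patterns(s):
    """Return a dict mapping each distinct character of s to its bit pattern."""
    return {ch: bit_pattern(s, ch) for ch in sorted(set(s))}


if __name__ == "__main__":
    for ch, bits in all_bit_patterns("abcbd").items():
        print(ch, bits)
```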

24 T = bcbdc, P = abcbd, reversed T = cdbcb. [Figure: convolution table whose rows V1, V2, V3, V4, V5 are combined by the AND operation.] We can now observe that:
1. V1 = b-bit pattern of P, as we are comparing T[1] = b with P,
2. V2 = c-bit pattern of P, as we are comparing T[2] = c with P,
3. V3 = b-bit pattern of P, as we are comparing T[3] = b with P,
4. V4 = d-bit pattern of P, as we are comparing T[4] = d with P,
5. V5 = c-bit pattern of P, as we are comparing T[5] = c with P.

25 T = bcbdc, P = abcbd. (1) T[1] = b. We want to decide whether P[5] = T[1] = b. b-bit pattern of P = 01010. The last bit is 0 ≠ 1, so T[1] ≠ P[5]. Besides, we know that T[1] = P[2] = P[4].

26 T = bcbdc, P = abcbd. (2) T[2] = c. We want to decide whether T[1]T[2] = bc = P[4]P[5]. c-bit pattern of P = 00100. We perform the AND-operation of the T[1]-bit pattern of P and the T[2]-bit pattern of P in the following way: the last bit of the T[1]-bit pattern is ignored (01010 becomes 0101), only the last 4 bits of the T[2]-bit pattern are taken (00100 becomes 0100), and 0101 & 0100 = 0100. The last bit is 0, so the 2-prefix of T ≠ the 2-suffix of P. What does 0100 mean? It means that T[1] = P[2] = b and T[2] = P[3] = c. We keep this result. The result of comparing T[1] and P[5] can be ignored from now on.

27 T = bcbdc, P = abcbd. Resulting vector = 0100. (3) T[3] = b. We want to decide whether T[1]T[2]T[3] = P[3]P[4]P[5]. b-bit pattern of P = 01010. We only take the last 3 bits, namely 010, because we are interested in P[3]P[4]P[5]. The last bit of the resulting vector is ignored, leaving 010. AND-operation: 010 & 010 = 010. The last bit is 0, so the 3-prefix of T ≠ the 3-suffix of P. The result 010 means that T[1] = P[2] = b, T[2] = P[3] (previously obtained) and T[3] = P[4]. We keep the resulting vector 010.

28 T = bcbdc, P = abcbd. Resulting vector = 010. (4) T[4] = d. We want to decide whether T[1]T[2]T[3]T[4] = P[2]P[3]P[4]P[5]. d-bit pattern of P = 00001. We only take the last 2 bits, namely 01. The last bit of the resulting vector is ignored, leaving 01. AND-operation: 01 & 01 = 01. The last bit is 1, so the 4-prefix of T = the 4-suffix of P. The result 01 means that T[1] = P[2] = b, T[2] = P[3] = c, T[3] = P[4] = b (previously obtained) and T[4] = P[5] = d.

29 T = bcbdc, P = abcbd. Resulting vector = 01. (5) T[5] = c. We want to decide whether T[1]T[2]T[3]T[4]T[5] = P[1]P[2]P[3]P[4]P[5]. c-bit pattern of P = 00100. We only take the last 1 bit, namely 0. The last bit of the resulting vector is ignored, leaving 0. AND-operation: 0 & 0 = 0. The last bit is 0, so the 5-prefix of T ≠ the 5-suffix of P. The earlier results still hold: T[1] = P[2] = b, T[2] = P[3], T[3] = P[4], T[4] = P[5] (previously obtained).

30 Definition. The Logic Operator (AND, &): 1 & 1 = 1, 1 & 0 = 0, 0 & 1 = 0, 0 & 0 = 0. Bit Pattern Of String (BPS): given a string S which is composed of n distinct characters, BPS means making one bit pattern per character, where each pattern marks the positions at which that character appears in the string. Example: S = abcabcabc is composed of 3 characters, which are a, b and c. a-bit pattern = 100100100, b-bit pattern = 010010010, c-bit pattern = 001001001.

31 The Algorithm
ww(T = t1 t2 … tn, P = p1 p2 … pm)
Preprocessing
    Find the character set of P.
    Build the character_bit pattern of P and the character_rbit pattern of reversed P.
Search
    For k = 1, 2, … do
        Open a wide window whose length is 2m-1 and whose center point is at km.
        Let the window be denoted as a1 a2 … a(2m-1).
        Let a1 a2 … a(m-1) be denoted as T1.
        Let am a(m+1) … a(2m-1) be denoted as T2.
        /* we use the modified convolution method to find the matching */
        Find out all prefixes of T2 which are suffixes of P. (pages 33-34)
        state 1: Find out the corresponding prefixes of P which are suffixes of T1. (pages 35-36)
        /* each time we can jump the wide window by |P| */
        state 2:
    End For

32 Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T 1 is equal to the prefix of P. Having constructed the character bit pattern of P, we may use the character bit pattern of P to find whether the prefix of T 2 is equal to the suffix of P.

33 Find out all prefixes of T2 which are the suffix of P.
Suf_bit = 1…1        // temporary space for storing the result
|Suf_bit| = |T2|     // the length of the temporary space is equal to the length of T2
x = 1                // x is the index for reading T2
Read T2 from left to right
if the character read belongs to the character set of P
    // We use the AND-operation to simulate the convolution method.
    // After each simulation, we store the result into the temporary space.
    Suf_bit[|P|…x] = Suf_bit[|P|…x] & character_bit[(|P|-x+1)…1]
    /* Check whether the |P|-th to x-th bits of Suf_bit are all zeros. If they are, no more prefix of T2 will be equal to a suffix of P, so we can skip the remaining characters of T2. */
    if the |P|-th to x-th bits of Suf_bit are all zero
        goto state 1
    end if
else    // the character read from T2 is not one of the characters of P
    Set the |P|-th to x-th bits of Suf_bit to zero.
    goto state 1    // finish the reading from T2; prefixes found earlier must still be checked against T1
end if

34 Fig.: Find out all prefixes of T2 which are the suffix of P (continued).
// if the x-th bit of Suf_bit is 1, the x-prefix of T2 is equal to the x-suffix of P
if Suf_bit[x] == 1
    if x == |P|
        // the length of the prefix of T2 is equal to the length of P
        we found a matching at km
    else
        we found the x-suffix of P
    end if
end if
x++    // increase the index for reading the next character
Read next character

35 Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T 1 is equal to the prefix of P.

36 Find out the corresponding prefixes of P which are the suffix of T1.
state 1:
// If the previous processing did not find any prefix of T2 equal to a suffix of P,
// we do not need to look for a suffix of T1 equal to a prefix of P.
if the |P|-th to 1st bits of Suf_bit are all zero
    goto state 2
else
    Pre_bit = 1…1            // temporary space for storing the result
    y = |Pre_bit| = |T1|     // the length of the temporary space is equal to the length of T1
    z = 1                    // z is the index for reading T1
    Read T1 from right to left
    if the character read belongs to the character set of P
        // We use the AND-operation to simulate the convolution method.
        Pre_bit[y…z] = Pre_bit[y…z] & character_rbit[(y-z+1)…1]
        /* If the y-th to z-th bits of Pre_bit are all zeros, no more suffix of T1 will be equal to a prefix of P. Therefore we can skip the remaining characters of T1. */

37 Fig.: Find out the corresponding prefixes of P which are the suffix of T1 (continued).
        if the y-th to z-th bits of Pre_bit are all zero
            goto state 2
        end if
    else    // the character read from T1 is not one of the characters of P
        goto state 2    // finish the reading from T1
    end if
    // if the z-th bit of Pre_bit is 1, the z-suffix of T1 is equal to the z-prefix of P
    if Pre_bit[z] == 1 then
        /* If a suffix of T1 is equal to a prefix of P, we need to check whether the corresponding prefix of T2 appeared in Suf_bit. */
        if (Pre_bit[z] & Suf_bit[|P|-z])
            Found a matching at km - z
        end if
    end if
    z++
    Read next character
end if
state 2:
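The pseudocode of the last slides can be condensed into a runnable sketch. This is not the authors' implementation: it assumes Python integers as bit vectors, 0-based positions, and a hypothetical function name `ww_bitparallel`. It also uses one table of bit patterns with right shifts for T2 and left shifts for T1, instead of a separate table for reversed P, which is a simplification made for this example.

```python
def ww_bitparallel(text, pattern):
    """Return sorted 0-based start positions of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Bit j of char_bits[c] is set exactly when pattern[j] == c.
    char_bits = {}
    for j, c in enumerate(pattern):
        char_bits[c] = char_bits.get(c, 0) | (1 << j)
    full = (1 << m) - 1
    matches = []
    for center in range(m - 1, n, m):          # jump the window by |P|
        # Phase 1: read T2 left to right. Bit s of state stays set while
        # T2[0..x-1] == pattern[s..s+x-1]; bit m-x set means the
        # x-prefix of T2 equals the x-suffix of the pattern.
        suffix_lengths = []
        state = full
        for x, c in enumerate(text[center:center + m], start=1):
            state &= char_bits.get(c, 0) >> (x - 1)
            if state == 0:
                break                          # no longer prefix can match
            if (state >> (m - x)) & 1:
                suffix_lengths.append(x)
        if not suffix_lengths:
            continue                           # nothing to pair with T1
        # Phase 2: read T1 right to left. Bit z-1 set after z characters
        # means the z-suffix of T1 equals the z-prefix of the pattern.
        prefix_lengths = {0}
        state = full
        for z in range(1, m):
            c = text[center - z]
            state &= char_bits.get(c, 0) << (z - 1)
            if state == 0:
                break
            if (state >> (z - 1)) & 1:
                prefix_lengths.add(z)
        # Combine: |prefix| + |suffix| must equal |P|.
        for r in suffix_lengths:
            if m - r in prefix_lengths:
                matches.append(center - (m - r))
    return sorted(matches)


if __name__ == "__main__":
    print(ww_bitparallel("aababcbdc", "abcbd"))
    print(ww_bitparallel("ababa", "aba"))
```

A character outside the pattern makes `char_bits.get(c, 0)` zero, so the state becomes zero and the phase stops, while prefix lengths already recorded for T2 are still checked against T1.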

38 Example: T = aababcbdc, P = abcbd. Let us produce a wide window whose length is (|P|-1) + |P| = 2|P| - 1. In this case, |P| = 5 and 2|P| - 1 = 9. The window covers all of T: T1 = aaba (size |P|-1) and T2 = bcbdc (size |P|).

39 Preprocessing. Build the character bit pattern of P. P = abcbd. Find all bit patterns of P. P is composed of a, b, c, b, d, so the character set of P = {a, b, c, d}. a_bit = 10000, b_bit = 01010, c_bit = 00100, d_bit = 00001. Having constructed the character bit pattern of P, we may use the character bit pattern of P to find whether the prefix of T2 is equal to the suffix of P.

40 Build the character bit pattern of reversed P. P = abcbd, reversed P = dbcba. Find all character bit patterns of reversed P. P is composed of a, b, c, b, d, so the character set of P = {a, b, c, d}. a_rbit = 00001, b_rbit = 01010, c_rbit = 00100, d_rbit = 10000.

41 Step 1. T2 = bcbdc, P = abcbd, reversed T2 = cdbcb. The character is 'b'. ∴ Suf_bit[5…1] = Suf_bit[5…1] & b_bit[5…1] = 11111 & 01010 = 01010. ∵ the last bit is '0', the 1-prefix of T2 is not equal to the 1-suffix of P. Suf_bit[5…1] = 01010. Convolution column for this alignment: 0.

42 Step 2. T2 = bcbdc, P = abcbd, reversed T2 = cdbcb. The character is 'c'. ∴ Suf_bit[5…2] = Suf_bit[5…2] & c_bit[4…1] = 0101 & 0100 = 0100. ∵ the last bit is '0', the 2-prefix of T2 is not equal to the 2-suffix of P. Suf_bit[5…1] = 01000. Convolution column for this alignment: 10.

43 Step 3. T2 = bcbdc, P = abcbd, reversed T2 = cdbcb. The character is 'b'. ∴ Suf_bit[5…3] = Suf_bit[5…3] & b_bit[3…1] = 010 & 010 = 010. ∵ the last bit is '0', the 3-prefix of T2 is not equal to the 3-suffix of P. Suf_bit[5…1] = 01000. Convolution column for this alignment: 000.

44 Step 4. T2 = bcbdc, P = abcbd, reversed T2 = cdbcb. The character is 'd'. ∴ Suf_bit[5…4] = Suf_bit[5…4] & d_bit[2…1] = 01 & 01 = 01. ∵ the last bit is '1', the 4-prefix of T2 is equal to the 4-suffix of P. Suf_bit[5…1] = 01000. Convolution column for this alignment: 1111.

45 Step 5. T2 = bcbdc, P = abcbd, reversed T2 = cdbcb. The character is 'c'. ∴ Suf_bit[5…5] = Suf_bit[5…5] & c_bit[1…1] = 0 & 0 = 0. ∵ the last bit is '0', the 5-prefix of T2 is not equal to the 5-suffix of P. ∴ Suf_bit[5…1] = 01000. We have found one matching suffix of P, which is the 4-suffix. The corresponding prefix of P which we need to find is the (|P|-4)-prefix, that is, the 1-prefix. If we find it, we have a matching.

46 Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T 1 is equal to the prefix of P.

47 Step 1. T1 = aaba, P = abcbd, reversed P = dbcba. T1 is read from right to left, so the character is 'a'. Pre_bit[4…1] = Pre_bit[4…1] & a_rbit[4…1] = 1111 & 0001 = 0001. ∵ the last bit is '1', the 1-suffix of T1 is equal to the 1-prefix of P. if (Pre_bit[1] == 1) then if (Pre_bit[1] & Suf_bit[5-1]): found a matching. Check: |prefix| + |suffix| = 1 + 4 = |P|. ∴ Suf_bit[5…1] = 01000, ∴ Pre_bit[4…1] = 0001. Convolution column for this alignment: 1.

48 Step 2. T1 = aaba, P = abcbd, reversed P = dbcba. The character is 'b'. Pre_bit[4…2] = Pre_bit[4…2] & b_rbit[3…1] = 000 & 010 = 000. ∵ the last bit is '0', the 2-suffix of T1 is not equal to the 2-prefix of P. if (Pre_bit[2] == 1) then if (Pre_bit[2] & Suf_bit[5-2]): no need to check. Pre_bit[4…1] = 0001. Convolution column for this alignment: 00.

49 Step 3. T1 = aaba, P = abcbd, reversed P = dbcba. The character is 'a'. Pre_bit[4…3] = Pre_bit[4…3] & a_rbit[2…1] = 00 & 01 = 00. ∵ the last bit is '0', the 3-suffix of T1 is not equal to the 3-prefix of P. if (Pre_bit[3] == 1) then if (Pre_bit[3] & Suf_bit[5-3]): no need to check. Pre_bit[4…1] = 0001. Convolution column for this alignment: 110.

50 Step 4. T1 = aaba, P = abcbd, reversed P = dbcba. The character is 'a'. Pre_bit[4…4] = Pre_bit[4…4] & a_rbit[1…1] = 0 & 1 = 0. ∵ the last bit is '0', the 4-suffix of T1 is not equal to the 4-prefix of P. if (Pre_bit[4] == 1) then if (Pre_bit[4] & Suf_bit[5-4]): no need to check. Pre_bit[4…1] = 0001. Convolution column for this alignment: 1000.


56 Thank you