1 Advisor: Prof. R. C. T. Lee Speaker: G. W. Cheng Two exact string matching algorithms using suffix to prefix rule.


2 Speeding up two string matching algorithms. Algorithmica, Vol. 12, 1994, pp CROCHEMORE, M., CZUMAJ, A., GASIENIEC, L., JAROMINEK, S., LECROQ, T., PLANDOWSKI, W. and RYTTER, W.

3 Problem Definition: We are given a text string T and a pattern string P, and we want to find all occurrences of P in T.

4 Consider the following example: There are two occurrences of P in T as shown below:

5 Rule 1: The Suffix to Prefix Rule. For a window to have any chance to match a pattern, in some way, there must be a suffix of the window which is equal to a prefix of the pattern.

6 Basic Ideas: Open a window W with size |P| in the text T. Find the longest suffix of W that is also a prefix of the pattern. Match! Case 1: [Figure: window W of size |P| in T, aligned with pattern P]

7 Case 2: [Figure: window W of size |P| in T, with pattern P shifted] Case 3: If there is no such suffix, we move W by length |P|.

8 Preprocessing phase: T = GCATCGCAGAGAGTATACAGTACG, P = GCAGAGAG. We construct the suffix automaton of P. [Figure: Suffix Automaton]

9 Preprocessing: Construct a Suffix Tree of the reverse of pattern P. P^R: the reversal of P.

10 T = GCATCGCAGAGAGTATACAGTACG, P = GCAGAGAG. When there is a match, how do we move the window?

11 T = GCATCGCAGAGAGTATACAGTACG, P = GCAGAGAG

12 T = GCATCGCAGGCAGTATACAGTACG, P = GCAGAGAG. Find the longest suffix of W that is also a prefix of the pattern.

13 T = GCATCGCAGGCAGTATACAGTACG, P = GCAGAGAG

14 A Whole Example: T = GCATCGCAGAGAGTATACAGTACG, P = GCAGAGAG. First attempt: Shift by: 5 (8 - 3)

15 Second attempt: T = GCATCGCAGAGAGTATACAGTACG, P = GCAGAGAG. Shift by: 7 (8 - 1)

16 Third attempt: T = GCATCGCAGAGAGTATACAGTACG, P = GCAGAGAG. Shift by: 7 (8 - 1)

17 Third attempt: T = GCATCGCAGAGAGTATACAGTACG, P = GCAGAGAG
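
The shifts in the whole example can be reproduced with a minimal Python sketch. It is not the slides' implementation: instead of a suffix automaton it finds the longest suffix of the window that is a prefix of the pattern by direct comparison, which is slower but easier to follow. The name `suffix_prefix_search` is hypothetical, and positions are 0-based.

```python
def suffix_prefix_search(text, pattern):
    """Slide a window of size m over text. After each attempt, shift by m - k,
    where k is the length of the longest proper suffix of the window that is
    also a prefix of pattern (naive check, no suffix automaton)."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    occurrences, shifts = [], []
    pos = 0
    while pos <= n - m:
        window = text[pos:pos + m]
        if window == pattern:
            occurrences.append(pos)
        # Only k < m is considered: k == m is the match case handled above.
        k = next((k for k in range(m - 1, 0, -1)
                  if window.endswith(pattern[:k])), 0)
        shifts.append(m - k)
        pos += m - k
    return occurrences, shifts


occurrences, shifts = suffix_prefix_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG")
print(occurrences, shifts)
```

For the text and pattern of the whole example, the three attempts shift by 5, 7 and 7, and the single occurrence is found at the second attempt.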

18 Conclusion: The preprocessing phase is O(m). The searching phase is O(mn).



21 A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching. In Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science 1448, Springer-Verlag, Berlin, 14-31. NAVARRO G., RAFFINOT M.

22 Problem Definition: We are given a text string T and a pattern string P, and we want to find all occurrences of P in T.

23 This algorithm compares the pattern P with T within a sliding window, and the sliding window slides from left to right. Example: Text: ABDAACDGAEEGGGGJJ, Pattern: ACDAAC.


26 Basic idea: In this algorithm, we want to find the longest prefix of the pattern which is equal to a suffix of the window.

27 Example: Text: ABDDCACDADEGGGGJJ, Pattern: ACDADCEAD. We want to find the longest suffix of the window "BDDCACDAD" that is also a prefix of the pattern.

28 We find all substrings "D" in the pattern.

29 In effect, we compare the suffix "D" of the window against every position of the pattern ACDADCEAD.

30 Then we try to find all substrings "AD" in the pattern.

31 We succeed in finding all substrings "AD" in the pattern.

32 We try to find all substrings "DAD" in the pattern.

33 We find all substrings "DAD" in the pattern.

34 We try to find all substrings "CDAD" in the pattern.

35 We try to find all substrings "ACDAD" in the pattern.

36 We can align the longest prefix of the pattern, "ACDAD", with the equal suffix of the window.

37 Why do we want to find the longest suffix of the text in the sliding window which is also a prefix of pattern? We will explain this by the following idea.

38 Case 1: Let u be the longest suffix of the window that is also a substring of P. u is not a prefix of P, and no prefix of P is equal to a suffix of the window.

39 So, we can shift the pattern completely past the window.

40 Example: Text: ABDDCCDDADEGGGGJJ, Pattern: ACDADCEAD. P must be shifted in such a way as to avoid comparing any part of P with "DDAD".

41 So, we can shift the pattern past the window by its full length.

42 Case 2: u is not a prefix of P.

43 But a suffix v of the window may be a prefix of P.

44 So, we can shift the pattern to align its prefix v with the suffix v of the window.

45 Example: Text: ABCABCABA, Pattern: CABBCAD. "BCA" is the longest suffix of the window "ABCABCA" that is also a substring of the pattern. "CA" is a suffix of "BCA" that is a prefix of the pattern.

46 So, we can shift the pattern to align its prefix "CA" with that suffix of the window.
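
The two cases can be checked with a short Python sketch. It uses plain substring tests rather than the bit-parallel method the slides go on to describe, and the function name `window_shift` is hypothetical. It returns u (the longest suffix of the window that is a substring of the pattern), v (the longest proper-length suffix that is a prefix of the pattern) and the resulting shift, m - |v|.

```python
def window_shift(window, pattern):
    """Scan the window right to left, one longer suffix at a time,
    stopping as soon as the suffix is no longer a substring of pattern."""
    u = ""  # longest suffix of window that is a substring of pattern
    v = ""  # longest suffix (shorter than pattern) that is a prefix of pattern
    for length in range(1, len(window) + 1):
        suffix = window[-length:]
        if suffix not in pattern:
            break
        u = suffix
        if length < len(pattern) and pattern.startswith(suffix):
            v = suffix
    return u, v, len(pattern) - len(v)


# Case 1: no suffix of the window is a prefix of the pattern.
print(window_shift("BDDCCDDAD", "ACDADCEAD"))
# Case 2: u = "BCA" is not a prefix, but its suffix v = "CA" is.
print(window_shift("ABCABCA", "CABBCAD"))
```

In Case 1 the shift equals the full pattern length, and in Case 2 it is shorter by the length of v.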

47 The idea explained above is the main idea of this algorithm, and we will use a bit-parallel method to implement it.

48 Here, we explain how to use bit-parallelism to find the substrings of the pattern which are equal to a suffix of the window. Example: Text: ABCABCCBA, Pattern: ACBCCBB, ∑={A,B,C}.

49 For every character that appears in the pattern ACBCCBB, we build a bit mask marking its positions: A: 1000000, B: 0010011, C: 0101100, others: 0000000.

50 We use a mask D to record some information. Initially, D: 1111111.

51 The window is "ABCABCC", and we read it from right to left with D: 1111111.
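
As a small illustration, the per-character masks can be built as follows in Python. The helper names `build_masks` and `as_bits` are hypothetical. The sketch assumes the convention of the slides' later example (A: 10101 for the pattern ATATA), where the leftmost bit corresponds to the first character of the pattern.

```python
def build_masks(pattern):
    """Map each character of pattern to an integer whose bits mark the
    positions where it occurs (leftmost bit = first pattern character)."""
    m = len(pattern)
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - j))
    return masks


def as_bits(value, m):
    """Render value as an m-digit binary string for display."""
    return format(value, f"0{m}b")


masks = build_masks("ACBCCBB")
for ch in "ABC":
    print(ch, as_bits(masks[ch], 7))
```

Characters that do not occur in the pattern have no entry in `masks`; a lookup such as `masks.get(ch, 0)` gives them the all-zero mask.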

52 We read "C": D = 1111111 AND 0101100 = 0101100. Where there is a "1", there is a substring "C" in the pattern. Then D << 1 = 1011000.

53 We read "C": D = 1011000 AND 0101100 = 0001000. Where there is a "1", there is a substring "CC" in the pattern. Then D << 1 = 0010000.

54 We read "B": D = 0010000 AND 0010011 = 0010000. Where there is a "1", there is a substring "BCC" in the pattern. Then D << 1 = 0100000.

55 We read "A": D = 0100000 AND 1000000 = 0000000. There is no substring "ABCC" in the pattern. So, we can say that there is no prefix of the pattern which is equal to a suffix of the window.

56 We can shift the pattern past the window by its full length.

57 We give another example: Text: ABCABCABA, Pattern: CABBCAB, ∑={A,B,C}. Masks: A: 0100010, B: 0011001, C: 1000100, others: 0000000. We read "A": D = 1111111 AND 0100010 = 0100010. Then D << 1 = 1000100.

58 We read "C": D = 1000100 AND 1000100 = 1000100. The leftmost bit is "1", so we know that "CA" is a substring of the pattern which starts at position 1, and this means that "CA" is a prefix of the pattern. Then D << 1 = 0001000.

59 We read "B": D = 0001000 AND 0011001 = 0001000. So, we know "BCA" is a substring of the pattern. Then D << 1 = 0010000.

60 We read "A": D = 0010000 AND 0100010 = 0000000. There is no substring "ABCA" in the pattern.

61 "BCA" is the longest suffix of the window "ABCABCA" which is also a substring of the pattern, but the longest prefix of the pattern which is equal to a suffix of the window is "CA".
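
The two bit-parallel walkthroughs above can be traced with a minimal Python sketch. The function name `scan_window_bits` is hypothetical, and the sketch assumes the same mask convention as the slides (leftmost bit = first pattern character). It returns the value of D after each AND, together with the length of the longest suffix of the window found to be a prefix of the pattern.

```python
def scan_window_bits(window, pattern):
    """Read the window right to left, updating the mask D as in the slides."""
    m = len(pattern)
    full = (1 << m) - 1          # m ones: keeps D to m bits after shifting
    top = 1 << (m - 1)           # leftmost bit: substring starts at position 1
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - j))

    d = full
    trace = []
    prefix_len = 0
    for length, ch in enumerate(reversed(window), start=1):
        d &= masks.get(ch, 0)
        trace.append(format(d, f"0{m}b"))
        if d == 0:               # the suffix read so far is not in the pattern
            break
        if d & top and length < m:
            prefix_len = length  # the suffix read so far is a prefix
        d = (d << 1) & full
    return trace, prefix_len


print(scan_window_bits("ABCABCC", "ACBCCBB"))
print(scan_window_bits("ABCABCA", "CABBCAB"))
```

In the first call no prefix is found, so the pattern can move by its full length. In the second call the prefix "CA" of length 2 is recorded before the scan stops at "ABCA".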

62 We take an example of the whole algorithm.

63 We use "read" to store the suffix of the sliding window which we have already read, and "pre-temp" to store the longest suffix read so far which is also a prefix of the pattern.

64 Example: P: ATATA, T: AGATACGATATATAC. Preprocessing: B[A] = 10101, B[T] = 01010, B[*] = 00000.

65 Initial: D: 11111, read: empty, pre-temp: empty.

66 Reading A. D: 11111, read: A, pre-temp: empty. We set pre-temp = "A", which is a prefix of the pattern.

67 After reading A: D = 10101 << 1 = 01010, read: A, pre-temp: A.

68 Reading T. D: 01010, read: TA, pre-temp: A.

69 After reading T: D = 01010 << 1 = 10100, read: TA, pre-temp: A.

70 Reading A. D: 10100, read: ATA, pre-temp: A. We set pre-temp = "ATA", which is a prefix of the pattern.

71 After reading A: D = 10100 << 1 = 01000, read: ATA, pre-temp: ATA.

72 Reading G. D: 01000, read: GATA, pre-temp: ATA.

73 After reading G: D: 00000, read: GATA, pre-temp: ATA. We find that "ATA" is the longest suffix of "AGATA" which is also a prefix of the pattern.

74 So, we shift the pattern by 5 - 3 = 2, aligning its prefix "ATA" with that suffix of the window.

75 Then we reset D = 11111, read = empty and pre-temp = empty.

76 Reading G. D: 11111, read: G, pre-temp: empty.

77 After reading G: D: 00000, read: G, pre-temp: empty. There is no substring "G" in the pattern.

78 So, we can shift the pattern to the right by the length of P. And we reset D = 11111, read = empty and pre-temp = empty.

79 Reading A. D: 11111, read: A, pre-temp: empty. We set pre-temp = "A", which is a prefix of the pattern.

80 After reading A: D = 10101 << 1 = 01010, read: A, pre-temp: A.

81 Reading T. D: 01010, read: TA, pre-temp: A. After reading T: D = 01010 << 1 = 10100.

82 Reading A. D: 10100, read: ATA, pre-temp: A. We set pre-temp = "ATA", which is a prefix of the pattern.

83 After reading A: D = 10100 << 1 = 01000, read: ATA, pre-temp: ATA.

84 Reading T. D: 01000, read: TATA, pre-temp: ATA. After reading T: D = 01000 << 1 = 10000.

85 Reading A. D: 10000, read: ATATA, pre-temp: ATA.

86 We find "ATATA", which is the longest prefix of the pattern equal to a suffix of the window. Its length is m, so an exact match occurs.

87 Besides the full pattern, "ATA" is the longest prefix of the pattern which is equal to a suffix of the window.

88 So, we shift the pattern by 5 - 3 = 2.

89 We reset D = 11111, read = empty and pre-temp = empty, and repeat the above steps until the window slides out of the text.

90 We give an extreme example to show the worst case of the algorithm.

91 Example: P: AAAAA, T: AAAAAAAA. Preprocessing: B[A] = 11111, B[*] = 00000. Initial: D: 11111, read: empty, pre-temp: empty.

92 Reading A. D: 11111, read: A, pre-temp: empty.

93 After reading A: D = 11111 << 1 = 11110, read: A, pre-temp: A.

94 Reading A. D: 11110, read: AA, pre-temp: A.

95 After reading A: D = 11110 << 1 = 11100, read: AA, pre-temp: AA.

96 Reading A. D: 11100, read: AAA, pre-temp: AA.

97 After reading A: D = 11100 << 1 = 11000, read: AAA, pre-temp: AAA.

98 Reading A. D: 11000, read: AAAA, pre-temp: AAA.

99 After reading A: D = 11000 << 1 = 10000, read: AAAA, pre-temp: AAAA.

100 Reading A. D: 10000, read: AAAAA, pre-temp: AAAA. We find "AAAAA", which is the longest prefix of the pattern equal to a suffix of the window. Its length is m, so an exact match occurs.

101 We shift the pattern by 5 - 4 = 1 and reset D = 11111, read = empty and pre-temp = empty.

102 Reading A. D: 11111, read: A, pre-temp: empty.

103 After reading A: D = 11111 << 1 = 11110, read: A, pre-temp: A.

104 Reading A. D: 11110, read: AA, pre-temp: A.

105 After reading A: D = 11110 << 1 = 11100, read: AA, pre-temp: AA.

106 Reading A. D: 11100, read: AAA, pre-temp: AA.

107 After reading A: D = 11100 << 1 = 11000, read: AAA, pre-temp: AAA.

108 Reading A. D: 11000, read: AAAA, pre-temp: AAA.

109 After reading A: D = 11000 << 1 = 10000, read: AAAA, pre-temp: AAAA.

110 Reading A. D: 10000, read: AAAAA, pre-temp: AAAA. We find "AAAAA", which is the longest prefix of the pattern equal to a suffix of the window. Its length is m, so an exact match occurs.

111 Time Complexity: If the length of the text is n and the length of pattern is m, the time complexity of this algorithm is O(mn) in the worst case.
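
The bound can be checked against the extreme example, assuming (as that example shows) that every window is read in full and the window then advances by only one position. With n = 8 and m = 5:

```latex
\[
\underbrace{(n - m + 1)}_{\text{windows}} \cdot \underbrace{m}_{\text{characters read per window}}
  \;=\; (8 - 5 + 1) \cdot 5 \;=\; 20,
\qquad
(n - m + 1)\, m \;\le\; nm \;=\; O(mn).
\]
```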

112 Reference:
M. Crochemore, A. Czumaj, L. Gasieniec, S. Jarominek, T. Lecroq, W. Plandowski, and W. Rytter. Speeding up two string matching algorithms. Algorithmica, 12(4/5), 1994.
G. Navarro and M. Raffinot. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics, 5, 2000.
W. I. Chang and E. L. Lawler. Sublinear approximate string matching and biological applications. Algorithmica, 12(4/5), 1994.

113 Thanks for your attention.

114 Algorithm:
Preprocessing
  For c ∈ ∑ Do B[c] ← 0^m
  For j ∈ 1…m Do B[p_j] ← B[p_j] | 0^(j-1) 1 0^(m-j)
Searching
  pos ← 0
  While pos ≤ n-m Do
    j ← m, last ← m
    D ← 1^m
    While D ≠ 0^m Do
      D ← D & B[t_(pos+j)]
      j ← j-1
      If D & 1 0^(m-1) ≠ 0^m Then
        If j > 0 Then last ← j
        Else report an occurrence at pos+1
      End of if
      D ← D << 1
    End of while
    pos ← pos + last
  End of while
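
A direct Python transcription of the pseudocode above might look like the following sketch. The function name `bndm_search` is hypothetical, and positions are reported 0-based rather than as pos+1. Because Python integers are unbounded, the sketch masks D to m bits after each shift; a fixed-width implementation would instead require m to fit in a machine word.

```python
def bndm_search(text, pattern):
    """Return the 0-based start positions of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1              # 1^m
    top = 1 << (m - 1)               # 1 0^(m-1)

    # Preprocessing: B[c] marks the positions of c, leftmost bit first.
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - j))

    # Searching
    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        d = full
        while d != 0:
            d &= masks.get(text[pos + j - 1], 0)   # t_(pos+j), 1-based
            j -= 1
            if d & top:
                if j > 0:
                    last = j         # a prefix of the pattern ends the window
                else:
                    occurrences.append(pos)
            d = (d << 1) & full
        pos += last
    return occurrences


print(bndm_search("AGATACGATATATAC", "ATATA"))
print(bndm_search("AAAAAAAA", "AAAAA"))
```

On the whole example from the slides, the first window shifts by 2, the second by 5, and the pattern is then found at the third window.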