Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan

Presentation transcript:

Alpha Skip Search Algorithm
Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Charras, C., Lecroq, T. and Pehoushek, J. D., Lecture Notes in Computer Science, Vol. 1448, 1998, pp. 55-64
Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan

The Exact String Matching Problem
We are given a text string T of length n and a pattern string P of length m, and we want to find all occurrences of P in T.
Example:
[Figure: text T with positions 0 to 17 and pattern P.] There are two occurrences of P in T.
Output: 2, 10
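As a reference point for the problem statement above, here is a minimal brute-force sketch in Python. The function name `find_all_occurrences` and the sample strings are illustrative assumptions, not part of the slides; the sample text and pattern are borrowed from the Skip Search example later in the transcript.

```python
def find_all_occurrences(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text.

    Brute force: try each of the n - m + 1 alignments and compare
    the m characters directly. Illustrative baseline only.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


if __name__ == "__main__":
    # Text and pattern taken from the Skip Search example in the slides.
    print(find_all_occurrences("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))  # [5]
```

This baseline costs O(mn) character comparisons in the worst case and is useful mainly for checking the output of the faster algorithms.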

The Alpha Skip Search Algorithm is an improvement of the Skip Search Algorithm. The Skip Search Algorithm uses Rule 2, the substring matching rule, and Rule 4, the two window rule.

Rule 2: The Substring Matching Rule
For any substring u in T, find the nearest u in P which is to the left of it. If such a u in P exists, move P so that the two u's match; otherwise, we may define a new partial window.

Rule 2-2: 1-Suffix Rule (A Special Version of Rule 2)
Consider the 1-suffix x. We may apply Rule 2 to x now.

Rule 4: Two Window Rule
[Figure: T = C G C A C G G T A C C T T A C G G T is divided into windows (W1, W2, W3, W4) and P is checked against adjacent windows. No prefix of P = a suffix of W1. No suffix of P = a prefix of W2. Matched!]

The Skip Search Algorithm
The Skip Search Algorithm uses Rule 2-2 together with Rule 4 in a very clever way.
Example:
T : G C A T C G C A G A G A G T A T A C A G T A C G (positions 0 to 23)
P : G C A G A G A G (positions 0 to 7)
The length of the pattern is m. The length of the two window, which is a wide window, is 2m-1.

Example (continued): for P : G C A G A G A G, the positions of each character of the alphabet in P are stored in decreasing order:
A : (6,4,2)
C : (1)
G : (7,5,3,0)
T : φ
The length of the two window is 2m-1.

The Skip Search Algorithm uses a very special version of Rule 2. In it, the substring is limited to one character. The Alpha Skip Search Algorithm instead uses a substring whose length L may be greater than 1, together with a wide window of length 2m-L.
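The one-character version described above can be sketched in Python as follows. This is a hedged illustration, not the presenter's code: the names `build_buckets` and `skip_search` are hypothetical, and sampling every m-th text character is an assumption taken from the usual formulation of Skip Search (it corresponds to examining wide windows of length 2m-1).

```python
def build_buckets(pattern: str) -> dict[str, list[int]]:
    """Map each character to its positions in pattern, in decreasing order."""
    buckets: dict[str, list[int]] = {}
    for i in range(len(pattern) - 1, -1, -1):
        buckets.setdefault(pattern[i], []).append(i)
    return buckets


def skip_search(text: str, pattern: str) -> list[int]:
    """Return all start positions of pattern in text (Skip Search sketch)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    buckets = build_buckets(pattern)
    matches = []
    # Assumption: look at text positions m-1, 2m-1, 3m-1, ...
    # Every occurrence of the pattern covers exactly one of them.
    for j in range(m - 1, n, m):
        for i in buckets.get(text[j], ()):
            start = j - i  # align pattern[i] with text[j]
            if start + m <= n and text[start:start + m] == pattern:
                matches.append(start)
    return matches


if __name__ == "__main__":
    print(build_buckets("GCAGAGAG"))
    print(skip_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))
```

For the pattern of the example, `build_buckets` reproduces the table A : (6,4,2), C : (1), G : (7,5,3,0), with no entry for T.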

We assume that the size of the alphabet Σ of the text and pattern is σ. In the preprocessing phase, we first use a formula to determine L and then find all substrings of pattern P whose length is L. The information about where these substrings are located in P is stored in a trie. In the searching phase, we use the information stored in the trie to compare text T with pattern P.

Preprocessing phase
If log_σ m > 1, L = floor(log_σ m), where σ is the size of the alphabet and m is the length of pattern P; otherwise L = 1.
Example:
T = aaaababbababbbbbbaabababababbac
P = ababbaba
σ = 3, m = 8, L = floor(log_3 8) = 1
Trie: a : [7,5,2,0], b : [6,4,3,1]
In this case σ is 3 and the length of the pattern is 8, so L is 1; that is, the substrings stored in the trie have length 1.
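The choice of L can be sketched in Python with integer arithmetic. The function name `factor_length` is a hypothetical helper, and the use of the floor of the logarithm is inferred from the example values on the slides (σ = 3, m = 8 gives L = 1; σ = 2, m = 8 gives L = 3).

```python
def factor_length(sigma: int, m: int) -> int:
    """Return L = floor(log_sigma(m)), but never less than 1.

    Repeated multiplication is used instead of math.log to avoid
    floating-point rounding at exact powers of sigma.
    """
    if sigma < 2 or m < 1:
        raise ValueError("need sigma >= 2 and m >= 1")
    length = 0
    power = sigma
    while power <= m:  # invariant: power == sigma ** (length + 1)
        length += 1
        power *= sigma
    return max(length, 1)


if __name__ == "__main__":
    print(factor_length(3, 8))  # 1
    print(factor_length(2, 8))  # 3
```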

Every leaf of the trie stores, in decreasing order, the positions in P at which the corresponding substring of length L starts.
Example:
T : a a a a b a b b a b a b b b b b b a a b a b a b a b a b b a (positions 0 to 29)
P : a b a b b a b a (positions 0 to 7)
σ = 2, m = 8, L = floor(log_2 8) = 3
Leaves: aba : [5,0], abb : [2], bab : [4,1], bba : [3]

Trie
Example:
P : a b a b b a b a (positions 0 to 7)
The substrings of length L = 3 are inserted into the trie in order of their starting position, and each leaf keeps its positions in decreasing order:
position 0, aba : [0]
position 1, bab : [1]
position 2, abb : [2]
position 3, bba : [3]
position 4, bab : [4,1]
position 5, aba : [5,0]
Final trie:
root
  a
    b
      a [5,0]
      b [2]
  b
    a
      b [4,1]
    b
      a [3]
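The trie construction can be sketched in Python with nested dictionaries. This is an illustrative layout rather than the one used in the slides or the paper: `build_factor_trie` and `lookup` are hypothetical names, inner nodes are dicts keyed by character, and the node reached after L characters is a plain list of positions.

```python
def build_factor_trie(pattern: str, length: int) -> dict:
    """Insert every substring of the given length into a nested-dict trie.

    Each leaf is a list of starting positions in decreasing order.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    root: dict = {}
    for start in range(len(pattern) - length + 1):
        factor = pattern[start:start + length]
        node = root
        for ch in factor[:-1]:
            node = node.setdefault(ch, {})
        # Later positions go to the front, keeping the list decreasing.
        node.setdefault(factor[-1], []).insert(0, start)
    return root


def lookup(trie: dict, factor: str) -> list[int]:
    """Return the positions stored for factor, or [] if it is absent."""
    node = trie
    for ch in factor:
        if not isinstance(node, dict) or ch not in node:
            return []
        node = node[ch]
    return node if isinstance(node, list) else []


if __name__ == "__main__":
    trie = build_factor_trie("ababbaba", 3)
    print(lookup(trie, "aba"), lookup(trie, "bab"), lookup(trie, "bbb"))
```

With `length = 1` the same function yields the two-leaf table of the σ = 3 example, a : [7,5,2,0] and b : [6,4,3,1].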

We use a wide window with length 2m-L.
Example:
T : a a a a b a b b a b a b b b b b b a a b a b a b a b a b b a (positions 0 to 29)
P : a b a b b a b a (positions 0 to 7)
σ = 2, m = 8, L = floor(log_2 8) = 3
This is a wide window with length 2m-L = 2*8-3 = 13.

Searching phase example:
T = aaaababbababbbbbbaabababababba (positions 0 to 29)
P = ababbaba (positions 0 to 7)
Trie leaves: aba : [5,0], abb : [2], bab : [4,1], bba : [3]
The substring of T at positions 5 to 7 is abb, which occurs in P at position 2. P is aligned with position 3 of T and compared. Match!
The substring of T at positions 11 to 13 is bbb. No bbb in P, so no comparison is made.
The substring of T at positions 17 to 19 is aab. No aab in P, so no comparison is made.
The substring of T at positions 23 to 25 is bab, which occurs in P at positions 4 and 1. P is aligned with positions 19 and 22 of T and compared; neither alignment matches.
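The searching phase can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the name `alpha_skip_search` is hypothetical, a plain dictionary keyed by substring stands in for the trie, and the text is sampled at positions m-L, 2(m-L)+1, ... (a step of m-L+1), which is consistent with the substrings abb, bbb and aab examined in the example.

```python
def alpha_skip_search(text: str, pattern: str, sigma: int) -> list[int]:
    """Return all start positions of pattern in text (Alpha Skip Search sketch)."""
    n, m = len(text), len(pattern)
    if sigma < 2:
        raise ValueError("sigma must be at least 2")
    if m == 0 or m > n:
        return []

    # L = floor(log_sigma(m)), at least 1, computed with integers.
    length, power = 0, sigma
    while power <= m:
        length += 1
        power *= sigma
    length = max(length, 1)

    # Dictionary stand-in for the trie: substring -> positions, decreasing.
    positions: dict[str, list[int]] = {}
    for start in range(m - length, -1, -1):
        positions.setdefault(pattern[start:start + length], []).append(start)

    matches = []
    step = m - length + 1
    # Each occurrence of the pattern contains exactly one sampled substring.
    for j in range(m - length, n - length + 1, step):
        for p in positions.get(text[j:j + length], ()):
            start = j - p  # align pattern[p:p+L] with text[j:j+L]
            if start + m <= n and text[start:start + m] == pattern:
                matches.append(start)
    return matches


if __name__ == "__main__":
    print(alpha_skip_search("aaaababbababbbbbbaabababababba", "ababbaba", 2))  # [3]
```

A dictionary lookup costs time proportional to L, like a walk down the trie, so the substitution should not change the behaviour being illustrated; a real implementation may prefer the trie to avoid building substring keys.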

Time complexity: the preprocessing phase takes O(m) time and space; the searching phase takes O(mn) time in the worst case.
