Advisor: Prof. R. C. T. Lee Speaker: T. H. Ku


1 Skip Search algorithm
Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Christian, C., Thierry, L. and Joseph, D.P., Lecture Notes in Computer Science, Vol. 1448, 1998, pp

2 The Skip Search algorithm solves the string matching problem.
Input: a text string T of length n and a pattern string P of length m.
Output: all occurrences of P in T.

3 The Skip Search algorithm consists of two phases: preprocessing and searching.
The Skip Search algorithm uses Rule 4 (Two Window Rule) and Rule 2-2 (1-Suffix Rule) to do the string matching.

4 Preprocessing
The preprocessing phase of the Skip Search algorithm preprocesses the pattern by computing the buckets for all characters of the alphabet. The bucket of a character lists every position (counted from 0) at which that character occurs in the pattern.
Example:
Text string T=GCATCGCAGAGAGTATACAGTACG
Pattern string P=GCAGAGAG
The buckets for all characters of the alphabet:
A: (6,4,2)
C: (1)
G: (7,5,3,0)
T: φ (empty)
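The bucket computation described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `build_buckets` and the use of a dictionary of lists are choices made here, not something stated on the slides.

```python
from collections import defaultdict


def build_buckets(pattern):
    """Map each character to the 0-based positions where it occurs in pattern."""
    buckets = defaultdict(list)
    for i, ch in enumerate(pattern):
        # Insert at the front so positions end up in descending order,
        # matching the (6,4,2) ordering used in the example.
        buckets[ch].insert(0, i)
    return dict(buckets)


if __name__ == "__main__":
    buckets = build_buckets("GCAGAGAG")
    for ch in "ACGT":
        # Characters absent from the pattern (here T) have an empty bucket.
        print(ch, buckets.get(ch, []))
```

A character that never occurs in the pattern simply has no entry, which plays the role of the empty bucket φ.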

5 Search phase
The search phase inspects the km-th symbol of the text string for each k with 1 ≤ k ≤ n/m. It aligns every identical symbol of the pattern with the inspected symbol and executes matching at each such alignment. Note that the buckets record every location of each symbol in the pattern.
Example (positions counted from 1):
Text string T=aabcdbdabcabc
Pattern string P=abcabc, m=6
The 6th symbol in T is b. We align it with the 5th symbol of P and execute matching, then align it with the 2nd symbol of P and execute matching. Both alignments mismatch.
The 12th symbol in T is also b. Aligning it with the 5th symbol of P gives an exact match:
T=aabcdbdabcabc
P=       abcabc
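A minimal Python sketch of the two phases together follows. It uses 0-based positions, so the inspected text positions are m-1, 2m-1, and so on. The function name `skip_search` and the explicit bounds check are assumptions made for this sketch and are not taken from the slides.

```python
def skip_search(text, pattern):
    """Return the 0-based start positions of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Preprocessing: bucket of pattern positions for each character.
    buckets = {}
    for i, ch in enumerate(pattern):
        buckets.setdefault(ch, []).append(i)

    matches = []
    # Searching: inspect every m-th text character.
    for j in range(m - 1, n, m):
        # Try every alignment that puts an identical pattern symbol under text[j].
        for i in buckets.get(text[j], ()):
            start = j - i
            # Skip alignments that would run past the end of the text.
            if start + m <= n and text[start:start + m] == pattern:
                matches.append(start)
    return sorted(matches)


if __name__ == "__main__":
    print(skip_search("aabcdbdabcabc", "abcabc"))
```

Every occurrence of the pattern covers exactly one inspected position, so each occurrence is reported once.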

6 Full Example
Text string T=GCATCGCAGAGAGTATACAGTACG
Pattern string P=GCAGAGAG
The buckets for all characters of the alphabet:
A: (6,4,2)
C: (1)
G: (7,5,3,0)
T: φ (empty)

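7 Full Example (continued)
Buckets: A: (6,4,2)  C: (1)  G: (7,5,3,0)  T: φ
We first check T[7]=A and try the three alignments given by the bucket of A:
GCATCGCAGAGAGTATACAGTACG
 GCAGAGAG                  mismatch
   GCAGAGAG                mismatch
     GCAGAGAG              exact match
Then we check T[15]=T. Since there is no “T” in the pattern, we check T[23]=G. Then we shift the pattern to align it with T[16…23], which is a mismatch:
GCATCGCAGAGAGTATACAGTACG
                GCAGAGAG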
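The walk-through above can be reproduced with a small tracing sketch that records each inspected text position and the result of each alignment. The name `skip_search_trace` and the tuple layout are hypothetical, chosen only for this illustration. Positions are 0-based, as in the example.

```python
def skip_search_trace(text, pattern):
    """Return (inspected position, symbol, alignment start, result) for each attempt."""
    n, m = len(text), len(pattern)
    buckets = {}
    for i, ch in enumerate(pattern):
        # Front insertion gives descending order, as in the bucket table.
        buckets.setdefault(ch, []).insert(0, i)

    trace = []
    for j in range(m - 1, n, m):
        for i in buckets.get(text[j], []):
            start = j - i
            if start + m > n:
                continue  # the alignment would run past the end of the text
            same = text[start:start + m] == pattern
            trace.append((j, text[j], start, "exact match" if same else "mismatch"))
    return trace


if __name__ == "__main__":
    for step in skip_search_trace("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"):
        print(step)
```

No entry appears for position 15 because the bucket of T is empty, and the remaining positions in the bucket of G at position 23 are skipped because the pattern would extend beyond the text.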

8 Time Complexity
The space and time complexity of the preprocessing phase is O(m+σ), where σ is the size of the alphabet.
The Skip Search algorithm has a quadratic worst-case time complexity, but the expected number of text character inspections is O(n).
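To make the worst case concrete, the sketch below counts character comparisons during the search phase. A text and a pattern made of one repeated letter force every bucket entry to be tried at every inspected position, which is where the quadratic behaviour comes from. The function name `count_comparisons` and the test strings are assumptions made for illustration and do not come from the slides.

```python
def count_comparisons(text, pattern):
    """Count character comparisons made by a straightforward Skip Search."""
    n, m = len(text), len(pattern)
    buckets = {}
    for i, ch in enumerate(pattern):
        buckets.setdefault(ch, []).append(i)

    comparisons = 0
    for j in range(m - 1, n, m):
        for i in buckets.get(text[j], ()):
            start = j - i
            if start + m > n:
                continue  # the alignment would run past the end of the text
            # Compare left to right, stopping at the first mismatch.
            for k in range(m):
                comparisons += 1
                if text[start + k] != pattern[k]:
                    break
    return comparisons


if __name__ == "__main__":
    # Same lengths, very different amounts of work.
    print(count_comparisons("a" * 16, "a" * 4))
    print(count_comparisons("b" * 16, "a" * 4))
```

In the second call every inspected text symbol has an empty bucket, so no comparisons are made at all.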


