Published by Lucas Sims. Modified over 9 years ago.
1
String Matching
Input: Strings P (pattern) and T (text); |P| = m, |T| = n.
Output: Indices of all occurrences of P in T.
Example: T = discombobulate

P        output
combo    4 (i.e., with shift 3)
ate      12
later    15 > |T| (no occurrence of P)
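The example outputs above can be checked with a short Python sketch. It relies on the built-in `str.find` and converts 0-based shifts to the 1-based indices used in the slides; the helper name `occurrences` is hypothetical and not part of the presentation.

```python
def occurrences(text, pattern):
    """Return the 1-based indices of all (possibly overlapping) occurrences."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start + 1)                 # 0-based shift -> 1-based index
        start = text.find(pattern, start + 1)  # continue one position further
    return hits


print(occurrences("discombobulate", "combo"))  # [4]
print(occurrences("discombobulate", "ate"))    # [12]
print(occurrences("discombobulate", "later"))  # []
```

An empty list plays the role of the "15 > |T|" entry: no valid index exists.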
2
Applications
- Text retrieval
- Computational biology
  - DNA is a one-dimensional (1-D) string of characters: A’s, G’s, C’s, T’s.
    - Searching for DNA patterns
    - Comparing two or more DNA strings for similarities
    - Reconstructing DNA strings from overlapping fragments
  - All information for 3-D protein folding is contained in the protein sequence itself and is independent of the environment.
3
Sliding the Pattern Template
T = b i o l o g y (n = 7), P = l o g i c (m = 5)
Slide P along T one position at a time:
T[1] ≠ P[1]
T[2] ≠ P[1]
T[3] ≠ P[1]
T[4] = P[1], T[5] = P[2], T[6] = P[3], but T[7] ≠ P[4]
No match!
4
Another Example
T = b i o l o g i c a l (n = 10), P = l o g i c (m = 5)

b i o l o g i c a l
      l o g i c

Match found! Return 4.
5
The Naive Matcher
Pattern: P[1..m]
Text: T[1..n]

Naive-String-Matcher(T, P)   // find all occurrences of P in T
  for s = 1 to n - m + 1 do
    if P[1..m] = T[s..s+m-1]
      then print “Pattern occurs at index” s

At each s, P[1..m] is compared with the window T[s..s+m-1] of T.
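The pseudocode above translates almost line for line into Python. The following is a minimal sketch, assuming 0-based Python strings internally and 1-based indices in the result to match the slides; the function name `naive_string_matcher` is chosen here for illustration.

```python
def naive_string_matcher(text, pattern):
    """Return the 1-based indices s with pattern == text[s..s+m-1]."""
    n, m = len(text), len(pattern)
    matches = []
    for s in range(n - m + 1):           # 0-based shifts 0 .. n-m
        if text[s:s + m] == pattern:     # compare the whole window with P
            matches.append(s + 1)        # report a 1-based index
    return matches


print(naive_string_matcher("biological", "logic"))  # [4]
print(naive_string_matcher("biology", "logic"))     # []
```

When the pattern is longer than the text, `range(n - m + 1)` is empty and the function returns an empty list.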
6
Time Complexity
m(n - m + 1) comparisons in the worst case: there are n - m + 1 blocks of T (starting at positions 1, 2, 3, ..., n - m + 1), each requiring m comparisons.
Time complexity is O(mn)!
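The worst-case count can be observed directly by instrumenting the naive matcher. The sketch below assumes a character-by-character comparison that stops at the first mismatch in each block (a detail the slides do not specify), so the bound m(n - m + 1) is reached only when every block matches completely, for example when T and P consist of a single repeated character. The name `count_comparisons` is hypothetical.

```python
def count_comparisons(text, pattern):
    """Count character comparisons made by a naive matcher."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):           # one block per candidate position
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break                    # stop this block at the first mismatch
    return comparisons


n, m = 10, 3
print(count_comparisons("a" * n, "a" * m), m * (n - m + 1))  # 24 24
print(count_comparisons("biology", "logic"))                 # 3
```

For T = biology and P = logic only three comparisons are made, because each of the three full-length blocks fails on its first character.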
7
Finite Automaton
A finite automaton consists of
- a finite set Q of states
- a start state
- a set A of accepting states
- a finite input alphabet Σ
- a transition function δ : Q × Σ → Q

Example 1: states 0 (start state) and 1 (accepting state), with transition function

state   input a   input b
0       1         0
1       0         0
8
Accepting a String

input    state sequence    accepts?
aabba    0 1 0 0 0 1       Yes
bbabb    0 0 0 1 0 0       No

Always begins at the start state.
Accepts a string if it ends at an accepting state after reading all of the string's characters. Otherwise, it rejects the string.
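The two runs in the table can be reproduced with a small simulation. This sketch assumes the Example 1 automaton has start state 0, accepting state 1, and the transitions 0 → 1 on a, 0 → 0 on b, 1 → 0 on a, and 1 → 0 on b, which is the reading consistent with the state sequences listed above. The names `DELTA`, `START`, `ACCEPTING` and `run` are illustrative.

```python
# Assumed transition table for the two-state example automaton.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 0}
START, ACCEPTING = 0, {1}


def run(string):
    """Return the visited state sequence and whether the string is accepted."""
    states = [START]
    for ch in string:
        states.append(DELTA[(states[-1], ch)])  # follow one transition per char
    return states, states[-1] in ACCEPTING


print(run("aabba"))  # ([0, 1, 0, 0, 0, 1], True)
print(run("bbabb"))  # ([0, 0, 0, 1, 0, 0], False)
```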
9
A String Matching Automaton
Pattern P = a a b a

state   input a   input b   P
0       1         0         a
1       2         0         a
2       2         3         b
3       4         0         a
4       2         0

T = a b b a a a b a a b a
state sequence: 0 1 0 0 1 2 2 3 4 2 3 4

Pattern occurs at indices 5 and 8!
aba not rescanned due to transition 4 → 2.
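The slides do not say how the transition table is obtained. One common textbook construction, used here as an assumption rather than as the presenter's method, sets the next state for state q and input character a to the length of the longest prefix of P that is also a suffix of P[1..q] followed by a. The sketch below applies that rule directly and without optimization; the function name `compute_transition_function` is chosen for illustration.

```python
def compute_transition_function(pattern, alphabet):
    """Build the matching automaton's table straight from the definition."""
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            seen = pattern[:q] + a       # text matched so far plus next char
            k = min(m, q + 1)            # longest prefix that could match
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1                   # try the next shorter prefix
            delta[(q, a)] = k
    return delta


delta = compute_transition_function("aaba", "ab")
for q in range(5):
    print(q, delta[(q, "a")], delta[(q, "b")])
# 0 1 0
# 1 2 0
# 2 2 3
# 3 4 0
# 4 2 0
```

For P = a a b a over the alphabet {a, b} this yields the same five rows as the table above.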
10
Key Ideas of Automaton Matching
- Do not rescan chars of T that have already been examined.
- Slide pattern forward by more than one position if possible.
11
The Automaton Matcher

Finite-Automaton-Matcher(T, δ, m)
  n = length[T]
  q = 0                          // current state
  for i = 1 to n do
    q = δ(q, T[i])               // δ function precomputed
    if q = m                     // match succeeds
      then print “Pattern occurs at index” i - m + 1

O(n) if the state transition function δ is available.
But computing δ requires O(m^3 |Σ|)! // details omitted.
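A minimal Python sketch of the matcher follows. It assumes the transition function is supplied as a dictionary keyed by (state, character) and hard-codes the table for P = a a b a from the earlier slide; the names `DELTA` and `finite_automaton_matcher` are illustrative.

```python
# Transition table for P = "aaba" over the alphabet {a, b} (assumed precomputed).
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 0,
    (4, "a"): 2, (4, "b"): 0,
}


def finite_automaton_matcher(text, delta, m):
    """Return the 1-based start indices of all matches, in one pass over text."""
    q = 0                                    # current state
    matches = []
    for i, ch in enumerate(text, start=1):   # i is the 1-based position in T
        q = delta[(q, ch)]                   # each character is read once
        if q == m:                           # reached the accepting state
            matches.append(i - m + 1)
    return matches


print(finite_automaton_matcher("abbaaabaaba", DELTA, 4))  # [5, 8]
```

Each character of the text triggers exactly one dictionary lookup, which is where the O(n) matching time comes from once the table exists.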