Published by Nathan Russell. Modified over 9 years ago.
1
String Searching Algorithms Problem Description
Given two strings P and T over the same alphabet Σ, determine whether P occurs as a substring in T (or find the position(s) at which P occurs as a substring in T). The strings P and T are called the pattern and the target, respectively. [Adapted from G.Plaxton]
2
String Searching Algorithms Some applications [Adapted from K.Wayne]
3
String Searching Algorithms Trivial Approach - Algorithm
SimpleMatcher(string P, string T)
  n ← length[T]
  m ← length[P]
  for s ← 0 to n − m do
    if P[1...m] = T[s+1...s+m] then print s
$T(n,m) = (n - m + 1) \cdot m \cdot \Theta(1) = \Theta(nm)$
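The trivial approach can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the name `simple_matcher` is an assumption, and it returns the 0-based shifts s instead of printing them.

```python
def simple_matcher(pattern, text):
    """Return every 0-based shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):            # s = 0 .. n - m
        if text[s:s + m] == pattern:      # up to m character comparisons
            shifts.append(s)
    return shifts


print(simple_matcher("234", "289462340372392345"))  # [5, 14]
```

Each of the n − m + 1 shifts may cost up to m comparisons, which is where the quadratic bound comes from.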
4
String Searching Algorithms Rabin-Karp Algorithm - Idea
Pattern: P[1…m]. Target: T[1…n].
$p = P[m] + 10\,P[m-1] + 100\,P[m-2] + \dots + 10^{m-1} P[1]$
for s = 0 to n − m:
$t_s = T[s+m] + 10\,T[s+m-1] + 100\,T[s+m-2] + \dots + 10^{m-1} T[s+1]$
P matches T at position i if and only if $p = t_i$.
5
String Searching Algorithms Rabin-Karp Algorithm - Idea
$p = P[m] + 10\,P[m-1] + 100\,P[m-2] + \dots + 10^{m-1} P[1]$
$p = P[m] + 10\,(P[m-1] + 10\,(P[m-2] + \dots + 10\,(P[2] + 10\,P[1]) \dots ))$
$t_0 = T[m] + 10\,T[m-1] + 100\,T[m-2] + \dots + 10^{m-1} T[1]$
$t_0 = T[m] + 10\,(T[m-1] + 10\,(T[m-2] + \dots + 10\,(T[2] + 10\,T[1]) \dots ))$
$t_{s+1} = 10\,(t_s - 10^{m-1} T[s+1]) + T[s+m+1]$
6
String Searching Algorithms Rabin-Karp Algorithm - Example 1
T = 289462340372392345
P = 234
p = 234
t = 289, 894, 946, 462, 623, 234, 340, 403, 37, 372, 723, 239, 392, 923, 234, 345
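The window values in this example can be reproduced with a short Python sketch. It assumes the text is a string of decimal digits; the function name `window_values` is illustrative only. The first value is built with Horner's rule and every later value with the rolling update $t_{s+1} = 10\,(t_s - 10^{m-1} T[s+1]) + T[s+m+1]$.

```python
def window_values(text, m):
    """Decimal value t_s of every length-m window of a digit string."""
    digits = [int(c) for c in text]
    n = len(digits)
    if m == 0 or m > n:
        return []
    high = 10 ** (m - 1)                  # weight of the leading digit
    t = 0
    for d in digits[:m]:                  # Horner's rule for t_0
        t = 10 * t + d
    values = [t]
    for s in range(n - m):                # drop digits[s], append digits[s + m]
        t = 10 * (t - high * digits[s]) + digits[s + m]
        values.append(t)
    return values


t = window_values("289462340372392345", 3)
print(t)
print([s for s, value in enumerate(t) if value == 234])  # [5, 14]
```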
7
String Searching Algorithms Rabin-Karp Algorithm - Problem
What to do if p is too large to be stored in an integer data type?
(The simplest) solution: use p mod q and $t_i$ mod q instead of p and $t_i$.
If $p \bmod q \ne t_i \bmod q$, no match is possible at position i.
If $p \bmod q = t_i \bmod q$, we have to check for a match explicitly.
8
String Searching Algorithms Rabin-Karp Algorithm - Algorithm
RabinKarpMatcher(string P, string T, integer d, integer q)
  n ← length[T]; m ← length[P]
  h ← d^(m−1) mod q
  p ← 0; t_0 ← 0
  for i ← 1 to m do
    p ← (d p + P[i]) mod q
    t_0 ← (d t_0 + T[i]) mod q
  for s ← 0 to n − m do
    if p = t_s then
      if P[1...m] = T[s+1...s+m] then print s
    if s < n − m then
      t_(s+1) ← (d (t_s − T[s+1] h) + T[s+m+1]) mod q
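A minimal Python sketch of this procedure follows. It assumes digit strings (so `int(c)` gives the character value) and defaults to d = 10 and q = 5, the values used in the examples of this deck. The name `rabin_karp` and the separate list of spurious hits are illustrative additions, not part of the slides.

```python
def rabin_karp(pattern, text, d=10, q=5):
    """Return (matches, spurious_hits) as lists of 0-based shifts."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)                  # d^(m-1) mod q
    p = t = 0
    for i in range(m):                    # Horner's rule, reduced mod q
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    matches, spurious = [], []
    for s in range(n - m + 1):
        if p == t:                        # hash hit: verify explicitly
            if text[s:s + m] == pattern:
                matches.append(s)
            else:
                spurious.append(s)
        if s < n - m:                     # roll the window one position right
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return matches, spurious


print(rabin_karp("234", "289462340372392345"))  # ([5, 14], [0, 1, 11])
```

With q = 5 three windows collide with the pattern's hash without matching it; a larger modulus makes such spurious hits rarer.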
9
String Searching Algorithms Rabin-Karp Algorithm - Example 2
T = 289462340372392345
P = 234
q = 5
p = 234, p mod q = 4
t = 289, 894, 946, 462, 623, 234, 340, 403, 37, 372, 723, 239, 392, 923, 234, 345
t mod q = 4, 4, 1, 2, 3, 4, 0, 3, 2, 2, 3, 4, 2, 3, 4, 0
10
String Searching Algorithms Rabin-Karp Algorithm - generalization Instead of calculating numbers mod q, we can use an arbitrary hash function
11
String Searching Algorithms Rabin-Karp Algorithm - Complexity [Adapted from T.Ralphs]
12
String Searching Algorithms Rabin-Karp Algorithm - Complexity
Worst case: $T(n,m) = (n - m + 1) \cdot m \cdot \Theta(1) = \Theta(nm)$
Average case:
number of correct matches: v
expected number of incorrect (spurious) matches: n/q
$T(n,m) = \Theta(n + m) + \Theta(m\,(v + n/q))$
If v is small and $m \le q$, then $T(n,m) = \Theta(n + m)$
13
String Searching Algorithms Two dimensional pattern matching [Adapted from M.Crochemore,T.Lecroq]
14
String Searching Algorithms Two dimensional pattern matching [Adapted from M.Crochemore,T.Lecroq]
15
String Searching Algorithms Knuth-Morris-Pratt Algorithm - Idea [Adapted from A.Cawsey]
16
String Searching Algorithms Knuth-Morris-Pratt Algorithm - Some history [Adapted from K.Wayne]
17
String Searching Algorithms Knuth-Morris-Pratt Algorithm - Idea
T = gadji beri bimba glandridi
P = gadjama
[Figure: P aligned under the text gadjiberibimbaglandridi before and after the shift]
18
String Searching Algorithms Knuth-Morris-Pratt Algorithm - Idea
T = gadjama gramma berida
P = gaga
[Figure: P aligned under the text gagjamagrammaberida before and after the shift]
19
String Searching Algorithms Knuth-Morris-Pratt Algorithm - Idea
For each position q = 1, …, m in P, compute the number of positions by which the pattern can be advanced if a mismatch has been detected at the q-th position.
20
String Searching Algorithms KMP - Algorithm
KnuthMorrisPrattMatcher(string P, string T)
  n ← length[T]
  m ← length[P]
  π ← PrefixFunction(P)
  q ← 0
  for i ← 1 to n do
    while q > 0 & P[q+1] ≠ T[i] do
      q ← π[q]
    if P[q+1] = T[i] then q ← q + 1
    if q = m then
      print i − m + 1
      q ← π[q]
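The matcher can be sketched in Python as below. The names `prefix_function` and `kmp_matcher` are assumptions made for illustration. To stay close to the pseudocode, `pi` is stored with an unused entry at index 0 so that `pi[q]` corresponds to π[q], and the function returns the same 1-based positions i − m + 1 that the pseudocode prints.

```python
def prefix_function(pattern):
    """pi[q] for q = 1..m; pi[0] is unused padding."""
    m = len(pattern)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[q - 1]:    # P[k+1] != P[q]
            k = pi[k]
        if pattern[k] == pattern[q - 1]:                 # P[k+1] = P[q]
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(pattern, text):
    """Return the 1-based positions at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    pi = prefix_function(pattern)
    positions = []
    q = 0                                                # characters matched so far
    for i in range(1, n + 1):
        while q > 0 and pattern[q] != text[i - 1]:       # P[q+1] != T[i]
            q = pi[q]
        if pattern[q] == text[i - 1]:
            q += 1
        if q == m:                                       # full match ends at i
            positions.append(i - m + 1)
            q = pi[q]
    return positions


print(kmp_matcher("aba", "ababa"))  # [1, 3]
```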
21
String Searching Algorithms KMP - Prefix Function
A << B: A is a prefix of B, e.g. ab << abacae
A >> B: A is a suffix of B, e.g. ae >> abacae
$P_s$: initial substring of P of length s
${}_sP$: terminal substring of P of length s
Prefix function: $\pi : \{1, 2, \dots, m\} \to \{0, 1, 2, \dots, m-1\}$
$\pi[q] = \max\{k : k < q \text{ and } P_k >> P_q\}$
22
String Searching Algorithms KMP - Prefix Function
$\pi[q] = \max\{k : k < q \text{ and } P_k >> P_q\}$
$\pi[q]$ is the length of the longest prefix of P that is a proper suffix of $P_q$.
If a mismatch is detected after the first q characters of the pattern have matched, then the pattern can be advanced by $q - \pi[q]$ positions.
23
String Searching Algorithms KMP - Prefix Function - Example
$\pi[q] = \max\{k : k < q \text{ and } P_k >> P_q\}$
$\pi[q]$ is the length of the longest prefix of P that is a proper suffix of $P_q$.
P = abracadabra
π = 0, 0, 0, 1, 0, 1, 0, 1, 2, 3, 4
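The example values can be checked by evaluating the definition literally. The sketch below is deliberately naive (cubic time in the worst case) and only meant to make the definition concrete; the function name is an assumption.

```python
def prefix_function_by_definition(pattern):
    """List of pi[1], ..., pi[m], each computed directly from the definition."""
    m = len(pattern)
    pi = []
    for q in range(1, m + 1):
        p_q = pattern[:q]                 # P_q, the prefix of length q
        # Largest k < q such that P_k is a suffix of P_q (k = 0 always qualifies).
        pi.append(max(k for k in range(q) if p_q.endswith(pattern[:k])))
    return pi


print(prefix_function_by_definition("abracadabra"))
# [0, 0, 0, 1, 0, 1, 0, 1, 2, 3, 4]
```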
24
String Searching Algorithms KMP - Prefix Function - Algorithm
PrefixFunction(string P)
  m ← length[P]
  π[1] ← 0
  k ← 0
  for q ← 2 to m do
    while k > 0 & P[k+1] ≠ P[q] do
      k ← π[k]
    if P[k+1] = P[q] then k ← k + 1
    π[q] ← k
  return π
25
String Searching Algorithms KMP - Complexity
In KnuthMorrisPrattMatcher(string P, string T), the for loop over i is executed Θ(n) times, and in the worst case the inner while loop is executed Θ(m) times for a single i.
Thus T(n,m) = O(nm)...
26
String Searching Algorithms KMP - Complexity
$T(n,m) = T_P(m) + n\,T_{While}(m) = O(nm)$?
In KnuthMorrisPrattMatcher the value of q is increased at most n times, and always $q \ge 0$.
Thus q cannot be decreased more than n times, i.e. the while loop can be executed no more than n times in total.
$T(n,m) = T_P(m) + n\,T_{While}(m) = T_P(m) + \Theta(n)$
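The counting argument can be observed empirically. The following sketch (function name assumed, pattern must be non-empty to do any work) runs the KMP scan and counts how often the body of the inner while loop executes; the total never exceeds the text length n.

```python
def count_kmp_while_steps(pattern, text):
    """Run the KMP scan and count executions of the inner while-loop body."""
    m = len(pattern)
    if m == 0:
        return 0
    pi = [0] * (m + 1)                    # pi[q] for q = 1..m, pi[0] unused
    k = 0
    for q in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[q - 1]:
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    steps = 0
    q = 0
    for ch in text:
        while q > 0 and pattern[q] != ch:
            q = pi[q]                     # q strictly decreases here
            steps += 1
        if pattern[q] == ch:
            q += 1                        # q increases at most once per character
        if q == m:
            q = pi[q]
    return steps


text = "aaaa" * 5
print(count_kmp_while_steps("aaab", text), "<=", len(text))
```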
27
String Searching Algorithms KMP - Prefix Function - Correctness
$\pi[q] = \max\{k : k < q \text{ and } P_k >> P_q\}$
$\pi[q]$ is the length of the longest prefix of P that is a proper suffix of $P_q$.
We define: $\pi^0[q] = q$, $\pi^{i+1}[q] = \pi[\pi^i[q]]$
$\pi^*[q] = \{q, \pi[q], \pi^2[q], \dots, \pi^t[q] = 0\}$
28
String Searching Algorithms KMP - Prefix Function - Correctness
$\pi^0[q] = q$, $\pi^{i+1}[q] = \pi[\pi^i[q]]$
$\pi^*[q] = \{q, \pi[q], \pi^2[q], \dots, \pi^t[q] = 0\}$
Lemma Let P be a pattern of length m with prefix function π. Then, for q = 1, 2, …, m we have $\pi^*[q] = \{k : P_k >> P_q\}$
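The lemma can be sanity-checked on small patterns. In this sketch (all function names are assumptions) π is computed straight from its definition, π*[q] is obtained by iterating π until 0 is reached, and the result is compared with the set of all k for which $P_k$ is a suffix of $P_q$.

```python
def prefix_function_by_definition(pattern):
    """pi[q] for q = 1..m from the definition; pi[0] is unused padding."""
    m = len(pattern)
    pi = [0] * (m + 1)
    for q in range(1, m + 1):
        pi[q] = max(k for k in range(q) if pattern[:q].endswith(pattern[:k]))
    return pi


def pi_star(pi, q):
    """The set {q, pi[q], pi[pi[q]], ..., 0}."""
    chain = {q}
    while q > 0:
        q = pi[q]
        chain.add(q)
    return chain


def suffix_prefix_lengths(pattern, q):
    """All k in 0..q such that P_k is a suffix of P_q."""
    return {k for k in range(q + 1) if pattern[:q].endswith(pattern[:k])}


P = "abracadabra"
pi = prefix_function_by_definition(P)
print(sorted(pi_star(pi, 11)))            # [0, 1, 4, 11]
print(all(pi_star(pi, q) == suffix_prefix_lengths(P, q) for q in range(1, len(P) + 1)))
```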
29
String Searching Algorithms KMP - Prefix Function - Correctness
Lemma Let P be a pattern of length m with prefix function π. For q = 1, 2, …, m, if $\pi[q] > 0$, then $\pi[q] - 1 \in \pi^*[q-1]$.
30
String Searching Algorithms KMP - Prefix Function - Correctness
Corollary Let P be a pattern of length m with prefix function π. For q = 2, …, m we define $E_{q-1} \subseteq \pi^*[q-1]$ by
$E_{q-1} = \{k : k \in \pi^*[q-1] \text{ and } P[k+1] = P[q]\}$
Then, for q = 2, …, m:
$\pi[q] = 0$, if $E_{q-1} = \emptyset$
$\pi[q] = 1 + \max\{k \in E_{q-1}\}$, if $E_{q-1} \ne \emptyset$
31
String Searching Algorithms KMP - Prefix Function - Correctness
We consecutively compute $\pi[1], \pi[2], \dots, \pi[m]$.
$\pi[1] = 0$
For k > 1:
if $P[k] = P[\pi[k-1] + 1]$, then $\pi[k] = \pi[k-1] + 1$,
else, if $P[k] = P[\pi[\pi[k-1]] + 1]$, then $\pi[k] = \pi[\pi[k-1]] + 1$,
else, if $P[k] = P[\pi[\pi[\pi[k-1]]] + 1]$, then $\pi[k] = \pi[\pi[\pi[k-1]]] + 1$,
…
32
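If $P[k+1] = P[\pi[k]+1]$, then we can obtain $\pi[k+1] \ge \pi[k] + 1$.
For a given value of $\pi[k+1]$ it is easy to see that $\pi[k] \ge \pi[k+1] - 1$.
Thus:
$\pi[k+1] = \pi[k] + 1$ iff $P[k+1] = P[\pi[k]+1]$.
$\pi[k+1] < \pi[k] + 1$ iff $P[k+1] \ne P[\pi[k]+1]$.
[Figure: the prefixes of lengths $\pi[\pi[k]]$ and $\pi[k]$, and the comparison of $P[k+1]$ with $P[\pi[k]+1]$]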
33
If $P[k+1] \ne P[\pi[k]+1]$ and $P[k+1] = P[\pi[\pi[k]]+1]$, then we can obtain $\pi[k+1] \ge \pi[\pi[k]] + 1$.
For a given value of $\pi[k+1]$ it is easy to see that $\pi[\pi[k]] \ge \pi[k+1] - 1$.
Thus:
$\pi[k+1] = \pi[\pi[k]] + 1$ iff $P[k+1] \ne P[\pi[k]+1]$ and $P[k+1] = P[\pi[\pi[k]]+1]$.
$\pi[k+1] < \pi[\pi[k]] + 1$ iff $P[k+1] \ne P[\pi[k]+1]$ and $P[k+1] \ne P[\pi[\pi[k]]+1]$.
…
[Figure: the prefixes of lengths $\pi[\pi[k]]$ and $\pi[k]$, and the comparison of $P[k+1]$ with $P[\pi[\pi[k]]+1]$]
34
String Searching Algorithms KMP - Prefix Function - Complexity
$T_P(m) = \text{const} + m\,T_{While}(m) = O(m^2)$?
In PrefixFunction the value of k is increased at most m times, and always $k \ge 0$.
Thus k cannot be decreased more than m times, i.e. the while loop can be executed no more than m times in total.
$T_P(m) = \text{const} + m\,T_{While}(m) = \Theta(m)$
35
String Searching Algorithms KMP - Complexity
For KnuthMorrisPrattMatcher together with PrefixFunction:
$T(n,m) = T_P(m) + n\,T_{While}(m) = T_P(m) + \Theta(n) = \Theta(m) + \Theta(n) = \Theta(m + n)$
36
String Searching Algorithms Boyer-Moore Algorithm - Idea 1
T = gadji beri bimba glandridi
P = lonni
[Figure: P aligned under the text gadjiberibimbaglandridi before and after the shift]
Bad character heuristic
37
String Searching Algorithms Boyer-Moore Algorithm - Idea 2
T = gadji beri bimba glandridi
P = ajiji
[Figure: P aligned under the text gadjiberibimbaglandridi before and after the shift]
Good suffix heuristic
38
String Searching Algorithms Boyer-Moore - Bad Character Function
Bad character function: $\lambda : \Sigma \to \{0, 1, 2, \dots, m\}$
$\lambda[s] = \max\{k : P[k] = s\}$ (if such k exists)
$\lambda[s] = 0$ (otherwise)
[Adapted from M.Goodrich, R.Tamassia]
39
String Searching Algorithms Boyer-Moore - Bad Character Function
BadCharacterFunction(string P, set Σ)
  m ← length[P]
  for a ∈ Σ do λ[a] ← 0
  for j ← 1 to m do λ[P[j]] ← j
  return λ
$T_B(m,|\Sigma|) = \Theta(m + |\Sigma|)$
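A minimal Python sketch of this table, assuming the alphabet is passed as a string of its characters (the function name and the dictionary representation are illustrative choices):

```python
def bad_character_function(pattern, alphabet):
    """lam[a] = largest 1-based k with P[k] = a, or 0 if a does not occur in P."""
    lam = {a: 0 for a in alphabet}        # Theta(|alphabet|) initialisation
    for j, ch in enumerate(pattern, start=1):
        lam[ch] = j                       # later occurrences overwrite earlier ones
    return lam


print(bad_character_function("abacab", "abcd"))
# {'a': 5, 'b': 6, 'c': 4, 'd': 0}
```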
40
String Searching Algorithms Boyer-Moore - Suffix Function
Suffix function: $\gamma : \{1, 2, \dots, m\} \to \{1, 2, \dots, m\}$
$\gamma[j] = m - \max\{k : k < m \text{ and } ({}_jP >> P_k \text{ or } P_k >> {}_jP)\}$
[Adapted from R.Lee, C.Lu]
41
String Searching Algorithms Boyer-Moore - Suffix Function [Adapted from R.Lee, C.Lu]
42
String Searching Algorithms Boyer-Moore - Suffix Function [Adapted from R.Lee, C.Lu]
43
String Searching Algorithms Boyer-Moore - Suffix Function [Adapted from R.Lee, C.Lu]
44
String Searching Algorithms Boyer-Moore - Suffix Function
SuffixFunction(string P)
  m ← length[P]
  π ← PrefixFunction(P)
  P' ← Reverse(P); π' ← PrefixFunction(P')
  for j ← 0 to m do γ[j] ← m − π[m]
  for l ← 1 to m do
    j ← m − π'[l]
    if γ[j] > l − π'[l] then γ[j] ← l − π'[l]
  return γ
$T_S(m) = \Theta(m)$
45
String Searching Algorithms Boyer-Moore Algorithm - Algorithm
BoyerMooreMatcher(string P, string T, set Σ)
  n ← length[T]
  m ← length[P]
  λ ← LastOccurenceFunction(P, m, Σ)
  γ ← GoodSuffixFunction(P, m)
  s ← 0
  while s ≤ n − m do
    j ← m
    while j > 0 & P[j] = T[s + j] do j ← j − 1
    if j = 0 then
      print s
      s ← s + γ[0]
    else
      s ← s + max(γ[j], j − λ[T[s + j]])
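The matcher, together with both preprocessing tables, can be sketched in Python as follows. This is an illustration under a few assumptions: the names are invented here, λ is a dictionary in which characters absent from the pattern default to 0 (so no explicit alphabet is needed), γ is built from the prefix functions of P and of its reverse as in the suffix-function pseudocode, and the result is the list of 0-based shifts s.

```python
def prefix_function(p):
    """pi[q] for q = 1..m; pi[0] is unused padding."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def boyer_moore(pattern, text):
    """Return every 0-based shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    lam = {}                                  # bad character (last occurrence) table
    for j, ch in enumerate(pattern, start=1):
        lam[ch] = j
    pi = prefix_function(pattern)             # good suffix table gamma[0..m]
    pi_rev = prefix_function(pattern[::-1])
    gamma = [m - pi[m]] * (m + 1)
    for l in range(1, m + 1):
        j = m - pi_rev[l]
        gamma[j] = min(gamma[j], l - pi_rev[l])
    shifts = []
    s = 0
    while s <= n - m:
        j = m
        while j > 0 and pattern[j - 1] == text[s + j - 1]:   # compare right to left
            j -= 1
        if j == 0:
            shifts.append(s)
            s += gamma[0]
        else:
            s += max(gamma[j], j - lam.get(text[s + j - 1], 0))
    return shifts


print(boyer_moore("234", "289462340372392345"))  # [5, 14]
```

Because γ[j] is always at least 1, the shift is positive even when the bad character term j − λ[T[s+j]] is zero or negative.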
46
String Searching Algorithms Boyer-Moore Algorithm - Complexity
$T_B(m,|\Sigma|) = \Theta(m + |\Sigma|)$
$T_S(m) = \Theta(m)$
$T(n,m,|\Sigma|) = T_B(m,|\Sigma|) + T_S(m) + n\,T_{While}(m) = \Theta(m + |\Sigma|) + \Theta(m) + O(nm) = O(|\Sigma| + nm)$?
It can be shown that $T(n,m,|\Sigma|) = \Theta(|\Sigma| + n + m)$
47
String Searching Algorithms Boyer-Moore Algorithm - Complexity
It can be shown that:
$T(n,m,|\Sigma|) = \Theta(|\Sigma| + nm)$ using only the bad character rule
$T(n,m,|\Sigma|) = \Theta(|\Sigma| + n + m)$ using only the good suffix rule, if the pattern does not occur in the text
$T(n,m,|\Sigma|) = \Theta(|\Sigma| + nm)$ using only the good suffix rule, if the pattern does occur in the text
48
String Searching Algorithms Boyer-Moore Algorithm - Complexity
With Galil's modification: $T(n,m,|\Sigma|) = \Theta(|\Sigma| + n + m)$ using only the good suffix rule.
There is also a similar Apostolico-Giancarlo algorithm that achieves the $\Theta(|\Sigma| + n + m)$ time bound (which is much easier to prove).
On average the number of character comparisons is n/m (for large $|\Sigma|$).
49
String Searching Algorithms Algorithms - Complexity comparison [Adapted from H.Løvengreen]
50
String Searching Algorithms Algorithms - Efficiency comparison n=5000 [Adapted from I.Spence]
51
String Searching Algorithms Complexity - Lower Bound
Theorem (Rivest) Any string searching algorithm has worst-case time complexity $T(n,m) = \Omega(m + n)$
52
String Searching Algorithms Suffix Trees - The problem
Theorem (Rivest) Any string searching algorithm has worst-case time complexity $T(n,m) = \Omega(m + n)$
Despite this, we probably can do better! (Well, for a slightly different problem...) [Adapted from P.Kilpeläinen]
53
String Searching Algorithms Suffix Trees [Adapted from P.Kilpeläinen]
54
String Searching Algorithms Suffix Trees [Adapted from P.Kilpeläinen]
55
String Searching Algorithms Suffix Trees - Example [Adapted from P.Kilpeläinen]
56
String Searching Algorithms Suffix Trees - Do they always exist? [Adapted from P.Kilpeläinen]
57
String Searching Algorithms Suffix Trees - Application to string matching [Adapted from P.Kilpeläinen]
58
String Searching Algorithms Suffix Trees - Construction [Adapted from P.Kilpeläinen]
59
String Searching Algorithms Suffix Trees - Construction [Adapted from P.Kilpeläinen]
60
String Searching Algorithms Suffix Trees - Construction - Example [Adapted from P.Kilpeläinen]
61
String Searching Algorithms Suffix Trees - Construction - Example [Adapted from P.Kilpeläinen]
62
String Searching Algorithms Suffix Trees - Construction - Example [Adapted from P.Kilpeläinen]
63
String Searching Algorithms Suffix Trees - Construction - Complexity [Adapted from P.Kilpeläinen]
64
String Searching Algorithms Suffix Trees - Compact representation [Adapted from P.Kilpeläinen]
65
String Searching Algorithms Suffix Trees - Compact representation - Example [Adapted from P.Kilpeläinen]
66
String Searching Algorithms Suffix Trees - Some history [Adapted from P.Kilpeläinen]
67
String Searching Algorithms Suffix Trees - Ukkonen's algorithm [Adapted from P.Kilpeläinen]
68
String Searching Algorithms Suffix Trees - Implicit trees [Adapted from P.Kilpeläinen]
69
String Searching Algorithms Suffix Trees - Implicit trees [Adapted from P.Kilpeläinen]
70
String Searching Algorithms Suffix Trees - Implicit trees [Adapted from P.Kilpeläinen]
71
String Searching Algorithms Suffix Trees - String paths [Adapted from P.Kilpeläinen]
72
String Searching Algorithms Suffix Trees - Ukkonen's algorithm [Adapted from P.Kilpeläinen]
73
String Searching Algorithms Suffix Trees - Extensions [Adapted from P.Kilpeläinen]
74
String Searching Algorithms Suffix Trees - Extensions [Adapted from P.Kilpeläinen]
75
String Searching Algorithms Suffix Trees - Extensions - Example [Adapted from P.Kilpeläinen]
76
String Searching Algorithms Suffix Trees - Ukkonen's algorithm - Complexity [Adapted from P.Kilpeläinen]
77
String Searching Algorithms Suffix Trees - Ukkonen's algorithm - Complexity [Adapted from P.Kilpeläinen]
78
String Searching Algorithms Suffix Trees - Ukkonen's algorithm - Complexity [Adapted from P.Kilpeläinen]
79
String Searching Algorithms Suffix Trees - Suffix links [Adapted from P.Kilpeläinen]
80
String Searching Algorithms Suffix Trees - Suffix links [Adapted from P.Kilpeläinen]
81
String Searching Algorithms Suffix Trees - Suffix links [Adapted from P.Kilpeläinen]
82
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
83
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
84
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
85
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
86
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
87
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
88
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
89
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
90
String Searching Algorithms Suffix Trees - Speeding up [Adapted from P.Kilpeläinen]
91
String Searching Algorithms Suffix Trees - Eliminating extensions [Adapted from P.Kilpeläinen]
92
String Searching Algorithms Suffix Trees - Single phase algorithm [Adapted from P.Kilpeläinen]
93
String Searching Algorithms Suffix Trees - Ukkonen's algorithm - Complexity [Adapted from P.Kilpeläinen]
94
String Searching Algorithms Suffix Trees - Ukkonen's algorithm - Complexity [Adapted from P.Kilpeläinen]
95
String Searching Algorithms Suffix Trees - Ukkonen's algorithm - Complexity [Adapted from P.Kilpeläinen]