1
ADVANCED ALGORITHMS
String-Matching Algorithms (UNIT-5)
2
String Matching :
Let T[1..n] be a text array of length n, and let P[1..m] be a pattern array of length m.
The characters of T and P are drawn from a finite alphabet Σ, so T and P are called ‘Strings of Characters’.
The pattern P occurs with shift s in the text T if 0 ≤ s ≤ n – m and T[s+1..s+m] = P[1..m], i.e., T[s+j] = P[j] for 1 ≤ j ≤ m.
If P occurs with shift s in T, then s is a VALID SHIFT. Otherwise, s is an INVALID SHIFT.
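The valid-shift condition can be checked directly. The following is a minimal Python sketch, not part of the slides; the function name is_valid_shift and the sample strings are illustrative assumptions. Because Python strings are indexed from 0, the slide's T[s+1..s+m] corresponds to the slice text[s:s+m].

```python
def is_valid_shift(text: str, pattern: str, s: int) -> bool:
    """Return True if pattern occurs in text with shift s.

    The slides index strings from 1, so T[s+1..s+m] corresponds to the
    0-based Python slice text[s:s+m].
    """
    n, m = len(text), len(pattern)
    # A shift is only meaningful in the range 0 <= s <= n - m.
    return 0 <= s <= n - m and text[s:s + m] == pattern


# Illustrative strings (an assumption, chosen only for demonstration).
print(is_valid_shift("abcabaabcabac", "abaa", 3))   # True: text[3:7] == "abaa"
print(is_valid_shift("abcabaabcabac", "abaa", 0))   # False: text[0:4] == "abca"
```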
3
The String-matching Problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.
Ex-1 : Let text T : a b c a b a a b c a b a c
Let pattern P : a b a a
Find the number of valid shifts and the values of s.
Answer : There is only one valid shift, s = 3.
The symbol Σ* (read as ‘sigma-star’) denotes the set of all finite-length strings formed using characters from the alphabet Σ.
4
The zero-length string, called the ‘Empty String’ and denoted by ‘ɛ’, also belongs to Σ*.
The length of a string x is denoted |x|.
The concatenation of two strings x and y, denoted xy, has length |x| + |y|.
A string ω is a prefix of a string x, denoted ω ⊏ x, if x = ωy for some string y ∊ Σ*. Note that if ω ⊏ x, then |ω| ≤ |x|.
Similarly, a string ω is a suffix of a string x, denoted ω ⊐ x, if x = yω for some string y ∊ Σ*. Note that if ω ⊐ x, then |ω| ≤ |x|.
5
Ex-2 : Consider the string abcca.
Here, ab ⊏ abcca and cca ⊐ abcca.
Note-1 : The empty string ɛ is both a suffix and a prefix of every string.
Note-2 : Both ⊏ (prefix) and ⊐ (suffix) are transitive relations.
Lemma : Suppose that x, y, and z are strings such that x ⊐ z and y ⊐ z.
   If |x| ≤ |y|, then x ⊐ y.
   If |x| ≥ |y|, then y ⊐ x.
   If |x| = |y|, then x = y.
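The prefix and suffix relations, and the lemma on two suffixes of the same string, can be tried out in a few lines. This is a small Python sketch under the assumption that str.startswith and str.endswith are acceptable stand-ins for ⊏ and ⊐; the helper names is_prefix and is_suffix are hypothetical.

```python
def is_prefix(w: str, x: str) -> bool:
    """w is a prefix of x: x = w + y for some string y."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """w is a suffix of x: x = y + w for some string y."""
    return x.endswith(w)


z = "abcca"
print(is_prefix("ab", z), is_suffix("cca", z))   # True True
print(is_prefix("", z), is_suffix("", z))        # True True: the empty string

# Lemma: x and y are both suffixes of z and |x| <= |y|, so x is a suffix of y.
x, y = "ca", "cca"
print(is_suffix(x, z) and is_suffix(y, z) and is_suffix(x, y))   # True
```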
6
2. The Naïve String-matching Algorithm :
This algorithm finds all valid shifts using a loop that checks the condition P[1..m] = T[s+1..s+m] for each of the n – m + 1 possible values of s.
NAÏVE-STRING-MATCHER(T,P)
   n = T.length
   m = P.length
   for s = 0 to n – m
      if P[1..m] == T[s+1..s+m]
         print “Pattern occurs with shift” s
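The naïve procedure translates almost line for line into Python. The sketch below is one possible rendering, not the slides' own code; it returns the list of valid shifts instead of printing them, which is an assumption made to keep the function easy to test.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every valid shift s, trying all n - m + 1 candidates."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # s = 0 .. n - m
        # The slice comparison costs O(m), so the whole loop is O((n - m + 1) m).
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_string_matcher("acaabc", "aab"))   # [2]
print(naive_string_matcher("aaaa", "aa"))      # [0, 1, 2]
```

When the text is a^n and the pattern is a^m, every one of the n – m + 1 shifts is valid and each comparison inspects all m characters, which is the worst case for this method.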
7
Ex-3 : Let T = acaabc and P = aab. Find the value of s.
Answer : s = 2
Ex-4 : Let T = … and P = 0001. Find the values of s.
Answer : s = 1, 5 and 11
Ex-5 : Let T = a^n and P = a^m, i.e., n and m copies of the character a.
Answer : s = 0 to n – m, i.e., there are n – m + 1 valid shifts.
8
3. The Rabin-Karp Algorithm :
Let Σ = {0, 1, 2, … , 9}, so that each character is a decimal digit and d = |Σ| = 10.
For example, the string 31415 represents the number 31,415 in radix-d notation.
Let there be a text T[1..n] and a pattern P[1..m].
Let p denote the decimal value of P[1..m].
Let t_s denote the decimal value of the length-m substring T[s+1..s+m], for s = 0, 1, 2, …, n – m.
Then t_s = p iff T[s+1..s+m] = P[1..m], so s is a valid shift iff t_s = p.
9
The value of p can be computed using Horner’s rule as follows :
p = P[m] + 10 (P[m-1] + 10 (P[m-2] + … + 10 (P[2] + 10 P[1])…)).
Similarly, t_0 can be computed from T[1..m] :
t_0 = T[m] + 10 (T[m-1] + 10 (T[m-2] + … + 10 (T[2] + 10 T[1])…)).
Each t_(s+1) can then be computed from t_s in constant time :
t_(s+1) = 10 (t_s – 10^(m-1) T[s+1]) + T[s+m+1].
Subtracting 10^(m-1) T[s+1] removes the high-order digit, multiplying by 10 shifts the remaining digits one place to the left, and adding T[s+m+1] brings in the new low-order digit.
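Horner's rule and the window update can be checked numerically. The following Python sketch uses hypothetical helper names (horner_value, next_window_value) and the sample digit string 314152, chosen as an assumption so that the first window is 31415.

```python
def horner_value(digits: str) -> int:
    """Decimal value of a digit string, computed with Horner's rule."""
    value = 0
    for ch in digits:
        value = 10 * value + int(ch)    # shift left one place, add next digit
    return value


def next_window_value(t_s: int, leading_digit: int, new_digit: int, m: int) -> int:
    """Compute t_(s+1) from t_s by dropping the leading digit and appending a new one."""
    return 10 * (t_s - 10 ** (m - 1) * leading_digit) + new_digit


text, m = "314152", 5
t0 = horner_value(text[:m])
t1 = next_window_value(t0, int(text[0]), int(text[m]), m)
print(t0, t1)   # 31415 14152
```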
10
Ex-6 : Let m = 5 and t_s = 31415, so that T[s+1] = 3. Let T[s+m+1] = 2.
So, t_(s+1) = 10 (t_s – 10^(m-1) T[s+1]) + T[s+m+1] = 10 (31415 – 10000 · 3) + 2 = 10 · 1415 + 2 = 14152.
Let q be chosen so that dq fits in one computer word. Working modulo q, the above recurrence can be written as :
t_(s+1) = (d (t_s – T[s+1] h) + T[s+m+1]) mod q.
Here, h ≡ d^(m-1) (mod q), i.e., h is the value of the digit ‘1’ in the high-order position of an m-digit text window.
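The modular form of the recurrence can be verified on the same numbers. In this sketch the modulus q = 13 is an assumption chosen for illustration, and rolling_step is a hypothetical helper name. Python's % operator returns a non-negative result for a positive modulus, so the intermediate negative value needs no special handling.

```python
def rolling_step(t_s: int, leading_digit: int, new_digit: int,
                 d: int, h: int, q: int) -> int:
    """One step of t_(s+1) = (d (t_s - T[s+1] h) + T[s+m+1]) mod q."""
    return (d * (t_s - leading_digit * h) + new_digit) % q


d, q, m = 10, 13, 5           # q = 13 is an assumed modulus
h = pow(d, m - 1, q)          # d^(m-1) mod q
t_s = 31415 % q               # window value 31415, reduced mod q
t_next = rolling_step(t_s, 3, 2, d, h, q)

print(h, t_s, t_next)         # 3 7 8
print(t_next == 14152 % q)    # True: agrees with the unreduced value 14152
```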
11
The test t_s ≡ p (mod q) is a fast heuristic test to rule out invalid shifts s.
For any value of s, if t_s ≡ p (mod q) is TRUE but P[1..m] = T[s+1..s+m] is FALSE, then s is called a SPURIOUS HIT.
Note :
a) If t_s ≡ p (mod q) is TRUE, then t_s = p may or may not be TRUE, so the shift must still be verified by comparing P[1..m] with T[s+1..s+m].
b) If t_s ≡ p (mod q) is FALSE, then t_s ≠ p is definitely TRUE, so s is an invalid shift.
12
RABIN-KARP-MATCHER (T,P,d,q)
   n = T.length
   m = P.length
   h = d^(m-1) mod q
   p = 0
   t_0 = 0
   for i = 1 to m // preprocessing
      p = (d p + P[i]) mod q
      t_0 = (d t_0 + T[i]) mod q
   for s = 0 to n – m // matching
      if p == t_s
         if P[1..m] == T[s+1..s+m]
            print “Pattern occurs with shift” s
      if s < n – m
         t_(s+1) = (d (t_s – T[s+1] h) + T[s+m+1]) mod q
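The procedure can be sketched in Python as follows. This is a minimal rendering under some assumptions: the inputs are strings of decimal digits, the defaults d = 10 and q = 13 are illustrative, and the function returns the valid shifts as a list instead of printing them.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 10, q: int = 13) -> list[int]:
    """Return all valid shifts of a digit pattern in a digit text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)                 # weight of the high-order digit, mod q
    p = t = 0
    for i in range(m):                   # preprocessing: Horner's rule mod q
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    shifts = []
    for s in range(n - m + 1):           # matching
        # Equal hashes are only a hint; the explicit comparison rules out
        # spurious hits.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:                    # roll the window one position right
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return shifts


print(rabin_karp_matcher("3141592653", "59"))   # [4]
```

A small q keeps the arithmetic easy to follow by hand but produces more spurious hits; a larger prime q makes them rarer.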
13
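Ex-7 : Let T be a text of n = 19 decimal digits whose first window is T[1..5] = 23590, and let P = 31415.
Here n = 19, m = 5, d = 10, q = 13, h = 3 (= 10^4 mod 13), p = 0, t_0 = 0.
First for statement :
   i = 1 : p = 3, t_0 = 2
   i = 2 : p = 5, t_0 = 10
   i = 3 : p = 2, t_0 = 1
   i = 4 : p = 8, t_0 = 6
   i = 5 : p = 7, t_0 = 8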
14
Second for statement :
s        p   t_s   T[s+1..s+m]   p == t_s   s < n – m   t_(s+1)   Result
0        7   8     23590         FALSE      TRUE        9
1 to 5   7   …     …             FALSE      TRUE        …
6        7   7     31415         TRUE       TRUE        8         VALID MATCH (VM), s = 6
7        7   8     …             FALSE      TRUE        4
8        7   4     …             FALSE      TRUE        5
15
s         p   t_s   T[s+1..s+m]   p == t_s   s < n – m   t_(s+1)   Result
9 to 11   7   …     …             FALSE      TRUE        …
12        7   7     …             TRUE       TRUE        9         SPURIOUS HIT (SH), s = 12
13        7   9     …             FALSE      TRUE        …
14        7   …     …             FALSE      FALSE
Hence, there is only ONE VALID MATCH, at s = 6, and only ONE SPURIOUS HIT, at s = 12.
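The split between valid matches and spurious hits can be reproduced with a short script. The text used below, 2359023141526739921, is an assumption: it is the 19-digit text of the standard textbook version of this example, and it agrees with n = 19 and with the values that survive in the trace. For simplicity the sketch hashes every window directly instead of rolling the hash; classify_hits is a hypothetical helper name.

```python
def classify_hits(text: str, pattern: str, q: int = 13) -> tuple[list[int], list[int]]:
    """Return (valid_shifts, spurious_hits) for a digit text and digit pattern."""
    m = len(pattern)
    p = int(pattern) % q
    valid, spurious = [], []
    for s in range(len(text) - m + 1):
        window = text[s:s + m]
        if int(window) % q == p:         # hash hit: t_s == p (mod q)
            if window == pattern:
                valid.append(s)          # confirmed by explicit comparison
            else:
                spurious.append(s)       # same hash, different digits
    return valid, spurious


# Assumed text (see the note above); pattern and q as in Ex-7.
valid, spurious = classify_hits("2359023141526739921", "31415", q=13)
print(valid, spurious)   # [6] [12]
```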
16
The Knuth-Morris-Pratt Algorithm :
This algorithm is meant for ‘Pattern Matching’. Here, the prefix function π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself.
Ex-8 : Let the text string T and the pattern P be :
T : b a c b a b a b a c a c a c a
P : a b a b a c a
17
COMPUTE-PREFIX-FUNCTION (P) :
   m = P.length
   let π[1..m] be a new array
   π[1] = 0
   k = 0
   for q = 2 to m
      while k > 0 and P[k+1] ≠ P[q]
         k = π[k]
      if P[k+1] == P[q]
         k = k + 1
      π[q] = k
   return π
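The prefix-function computation can be sketched in Python as below. The only change from the pseudocode is the indexing: Python lists start at 0, so pi[q] here holds the slide's π[q+1], and the fallback k = π[k] becomes k = pi[k - 1]. This is one possible rendering, not the slides' own code.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of pattern[:q+1]
    that is also a suffix of pattern[:q+1] (0-based indexing)."""
    m = len(pattern)
    pi = [0] * m
    k = 0                                  # length of the current matched prefix
    for q in range(1, m):
        # Fall back to shorter borders until the next character extends one.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababaca"))   # [0, 0, 1, 2, 3, 0, 1]
```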
18
Ex-8 (contd…)
P : a b a b a c a
INIT : m = 7, π[1] = 0, k = 0
Step : q = 2 : Here, k = 0, P[k+1] = a and P[q] = b.
   So, while : FALSE and if : FALSE.
   Hence, π[2] = 0.
Step : q = 3 : Here, k = 0, P[k+1] = a and P[q] = a.
   So, while : FALSE and if : TRUE, giving k = 1.
   Hence, π[3] = 1.
19
Step : q = 4 : Here, k = 1, P[k+1] = b and P[q] = b.
   So, while : FALSE and if : TRUE, giving k = 2.
   Hence, π[4] = 2.
Step : q = 5 : Here, k = 2, P[k+1] = a and P[q] = a.
   So, while : FALSE and if : TRUE, giving k = 3.
   Hence, π[5] = 3.
Step : q = 6 : Here, k = 3, P[k+1] = b and P[q] = c.
   So, while : TRUE, giving k = 1 ( = π[3] ).
   Now k = 1, P[k+1] = b and P[q] = c.
   So, while : TRUE again, giving k = 0 ( = π[1] ).
   if : FALSE ( P[1] ≠ P[6] ).
   Hence, π[6] = 0.
20
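Step : q = 7 : Here, k = 0, P[k+1] = a and P[q] = a.
   So, while : FALSE and if : TRUE ( P[1] == P[7] ), giving k = 1.
   Hence, π[7] = 1.
Hence the array π is as follows :
   q    : 1 2 3 4 5 6 7
   P[q] : a b a b a c a
   π[q] : 0 0 1 2 3 0 1
Hence, the procedure returns the array π, whose last entry is π[7] = 1.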
21
KMP-MATCHER (T,P) :
   n = T.length
   m = P.length
   π = COMPUTE-PREFIX-FUNCTION(P)
   q = 0
   for i = 1 to n
      while q > 0 and P[q+1] ≠ T[i]
         q = π[q]
      if P[q+1] == T[i]
         q = q + 1
      if q == m
         print “Pattern occurs with shift” i – m
         q = π[q]
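A Python sketch of the matcher is given below. It includes its own copy of the prefix-function routine so that it runs on its own, uses 0-based indexing (so the shift i – m of the pseudocode becomes i - m + 1), and returns the shifts as a list instead of printing them; these choices are assumptions made for illustration.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """0-based prefix function: pi[q] is the slide's π[q+1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return all valid shifts of pattern in text in O(n + m) time."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    pi = compute_prefix_function(pattern)
    q = 0                                  # number of pattern characters matched
    shifts = []
    for i in range(n):                     # scan the text left to right
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]                  # next character does not match
        if pattern[q] == text[i]:
            q += 1                         # next character matches
        if q == m:                         # all of the pattern matched
            shifts.append(i - m + 1)
            q = pi[q - 1]                  # look for the next match
    return shifts


print(kmp_matcher("bacbababacacaca", "ababaca"))   # [4]
```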
22
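Ex-8 contd.. KMP-Matcher (T,P) :
INIT : n = 15, m = 7, π = (0, 0, 1, 2, 3, 0, 1), q = 0
Here, q is the value on entry to iteration i, C1 is the test q > 0, C2 is the test P[q+1] ≠ T[i], and wh is the while condition (C1 and C2).
i    q   C1   C2   wh   q = π[q]   if   q++     if (q == m)   print     q = π[q]
1    0   F    T    F    ---        F    ---     F             ----      ----
2    0   F    F    F    ---        T    q = 1   F             ----      ----
3    1   T    T    T    q = 0      F    ---     F             ----      ----
4    0   F    T    F    ---        F    ---     F             ----      ----
5    0   F    F    F    ---        T    q = 1   F             ----      ----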
23
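i    q   C1   C2   wh   q = π[q]   if   q++     if (q == m)   print     q = π[q]
6    1   T    F    F    ---        T    q = 2   F             ----      ----
7    2   T    F    F    ---        T    q = 3   F             ----      ----
8    3   T    F    F    ---        T    q = 4   F             ----      ----
9    4   T    F    F    ---        T    q = 5   F             ----      ----
10   5   T    F    F    ---        T    q = 6   F             ----      ----
11   6   T    F    F    ---        T    q = 7   T             shift 4   q = 1
12   1   T    T    T    q = 0      F    ---     F             ----      ----
13   0   F    F    F    ---        T    q = 1   F             ----      ----
14   1   T    T    T    q = 0      F    ---     F             ----      ----
15   0   F    F    F    ---        T    q = 1   F             ----      ----
Hence, the pattern occurs only once in T, with shift 4.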