1 Exact Matching Charles Yan 2008
2 Naïve Method
Input: P: pattern; T: text
Output: Occurrences of P in T
Algorithm Naive
  Align P with the left end of T
  Compare from left to right until a mismatch or an occurrence of P is found
  Shift P one place to the right
Time: O(n*m)
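The naive method above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `naive_search` is hypothetical, and positions are reported 1-based to match the slides' indexing. It also assumes n = |P| and m = |T|, which the slides do not state explicitly.

```python
def naive_search(pattern: str, text: str) -> list[int]:
    """Return the 1-based start positions of pattern in text (naive method)."""
    n, m = len(pattern), len(text)  # assumption: n = |P|, m = |T|
    occurrences = []
    # Try every alignment of the pattern against the text, left to right.
    for shift in range(m - n + 1):
        j = 0
        # Compare left to right until a mismatch or the end of the pattern.
        while j < n and text[shift + j] == pattern[j]:
            j += 1
        if j == n:
            occurrences.append(shift + 1)
    return occurrences


print(naive_search("aab", "aabaabcaxaabaabcy"))  # [1, 4, 10, 13]
```

Each of the roughly m alignments may cost up to n comparisons, which is where the O(n*m) bound comes from.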
3 Speeding Up The Naïve Algorithm
Shift P by more than one place at a time
Skip comparisons that have already been made
4 Preprocessing
Goal: To gather the information needed for speeding up the algorithm
Definitions: substring, prefix, suffix, proper prefix, proper suffix
Z_i: For i>1, the length of the longest substring of S that starts at i and matches a prefix of S
Z-box: For any position i>1 where Z_i>0, the Z-box at i starts at i and ends at i+Z_i-1
r_i: For every i>1, r_i is the right-most endpoint of the Z-boxes that begin at or before i
l_i: For every i>1, l_i is the left endpoint of the Z-box that ends at r_i
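To make these definitions concrete, the sketch below computes Z_i, r_i and l_i directly from the definitions by brute force. It is quadratic and only meant as a reference for checking small examples. The function names `z_by_definition` and `box_endpoints` are hypothetical, and the lists carry an unused slot at index 0 so that indices match the 1-based positions used on the slides.

```python
def z_by_definition(s: str) -> list[int]:
    """Z values straight from the definition; z[0] is padding and z[1] stays 0."""
    n = len(s)
    z = [0] * (n + 1)
    for i in range(2, n + 1):
        length = 0
        # 1-based character S[i + length] is s[i + length - 1] in Python.
        while i + length <= n and s[i + length - 1] == s[length]:
            length += 1
        z[i] = length
    return z


def box_endpoints(z: list[int]) -> tuple[list[int], list[int]]:
    """r_i and l_i for every i > 1, derived from the Z values."""
    n = len(z) - 1
    r, l = [0] * (n + 1), [0] * (n + 1)
    best_r = best_l = 0
    for i in range(2, n + 1):
        # A Z-box at i covers positions i .. i + z[i] - 1.
        if z[i] > 0 and i + z[i] - 1 > best_r:
            best_r, best_l = i + z[i] - 1, i
        r[i], l[i] = best_r, best_l
    return r, l


z = z_by_definition("aabaabcaxaabaabcy")
r, l = box_endpoints(z)
print(z[1:])
print(r[1:])
print(l[1:])
```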
5 Preprocessing
i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:    a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z:    0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
r_i:  0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
l_i:  0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
Z-boxes: [2,2], [4,6], [5,5], [8,8], [10,16], [11,11], [13,15], [14,14]
6 Z-Algorithm
Goal: To calculate Z_i for an input string S of length n in linear time
Starting from k=2, calculate Z_2, r_2 and l_2
For k=3; k<=n; k++
  In iteration k, calculate Z_k, r_k and l_k based on Z_j, r_j and l_j for j=2,…,k-1
For iteration k, the algorithm only needs r_{k-1} and l_{k-1}. Thus, there is no need to keep all r_i and l_i. We use r and l to denote r_{k-1} and l_{k-1}
7 Z-Algorithm
In iteration k:
(I) if k<=r
Position k lies inside the Z-box S[l..r], which matches the prefix S[1..r']
k'=k-l+1; r'=r-l+1
S[l..r] = S[1..r']; S[k..r] = S[k'..r']
[Figure: the string S from slide 5 with positions k, l, r and k', l', r' marked]
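8 Z-Algorithm
A) If Z_k' < r-k+1, that is, the Z-box at k' is shorter than S[k'..r'], then Z_k = Z_k'
The match at k' stops at a mismatch that lies inside S[1..r']. The same characters appear inside S[l..r], so the match at k stops at the same place
Example (S from slide 5, k=13, l=10, r=16, k'=4): Z_4 = 3 < r-k+1 = 4, so Z_13 = 3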
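9 Z-Algorithm
B) If Z_k' > r-k+1, that is, the Z-box at k' extends past r', then Z_k = r-k+1
S[k..r] = S[k'..r'] = S[1..r-k+1]
Let x = S[r'+1] and y = S[r+1]. x≠y (because S[l..r] is a Z-box), and S[r-k+2] = x (because the Z-box at k' extends past r'), so the match at k stops at r
Example (k=13, l=10, r=14, k'=4): Z_4 = 3 > r-k+1 = 2, so Z_13 = 2
i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
S:    a  a  b  a  a  b  c  a  x  a  a  b  a  a  c  d
Z:    0  1  0  3  1  0  0  1  0  5  1  0  2  1  0  0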
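10 Z-Algorithm
C) If Z_k' = r-k+1, that is, the Z-box at k' ends exactly at r', then Z_k >= r-k+1
S[k..r] = S[k'..r'] = S[1..r-k+1]
Let x = S[r'+1], y = S[r+1] and z = S[r-k+2]. x≠y (because S[l..r] is a Z-box) and z≠x (because S[k'..r'] is a Z-box), but whether z = y is unknown
Compare S[r+1,…] with S[r-k+2,…] until a mismatch occurs. Update Z_k, r, and l
Example (k=13, l=10, r=14, k'=4): Z_4 = 2 = r-k+1, so S[15,…] is compared with S[3,…]; one further match gives Z_13 = 3, r = 15, l = 13
i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
S:    a  a  b  a  a  e  c  a  x  a  a  b  a  a  b  d
Z:    0  1  0  2  1  0  0  1  0  5  1  0  3  1  0  0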
11 Z-Algorithm
(II) if k>r
Compare the characters starting at k with those starting at 1 until a mismatch occurs. Z_k is the number of matches. Update r and l if necessary
12 Z-Algorithm
Input: String S of length n (for example, a pattern P)
Output: Z_i for i=2,…,n
Z Algorithm
  Calculate Z_2, r_2 and l_2 explicitly by comparisons. r=r_2 and l=l_2
  for k=3; k<=n; k++
    if k<=r
      if Z_{k-l+1} < r-k+1, then Z_k = Z_{k-l+1}
      else if Z_{k-l+1} > r-k+1, then Z_k = r-k+1
      else compare the characters starting at r+1 with those starting at r-k+2. Update Z_k, r, and l if necessary
    else
      Compare the characters starting at k to those starting at 1. Update Z_k, r, and l if necessary
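The pseudocode above translates fairly directly into Python. The following is a sketch under a few assumptions: the name `z_algorithm` is hypothetical, the string is padded so that indices are 1-based as on the slides, and Z_2 is not special-cased because starting with r = 0 sends k = 2 into the explicit-comparison branch anyway.

```python
def z_algorithm(s: str) -> list[int]:
    """Return z with z[k] = Z_k for k = 2..n; z[0] is padding and z[1] stays 0."""
    n = len(s)
    c = " " + s            # pad so that c[i] is the i-th character, 1-based
    z = [0] * (n + 1)
    l = r = 0              # current right-most Z-box is c[l..r]
    for k in range(2, n + 1):
        if k > r:
            # Case II: compare explicitly from k against the prefix.
            q = 0
            while k + q <= n and c[k + q] == c[1 + q]:
                q += 1
            z[k] = q
            if q > 0:
                l, r = k, k + q - 1
        else:
            k1 = k - l + 1          # k' on the slides
            remaining = r - k + 1   # length of c[k..r]
            if z[k1] < remaining:
                z[k] = z[k1]        # case A
            elif z[k1] > remaining:
                z[k] = remaining    # case B
            else:
                # Case C: extend past r, comparing c[r+1..] with c[r-k+2..].
                q = r + 1
                while q <= n and c[q] == c[q - k + 1]:
                    q += 1
                z[k] = q - k
                l, r = k, q - 1
    return z


print(z_algorithm("aabaabcaxaabaabcy")[1:])
# [0, 1, 0, 3, 1, 0, 0, 1, 0, 7, 1, 0, 3, 1, 0, 0, 0]
```

Only `l` and `r` are kept between iterations, as the slides suggest; the full r_i and l_i sequences are never stored.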
13 Preprocessing
i:      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:      a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z:      0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
r:      0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
l:      0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
14 Z-Algorithm
Time complexity
#mismatches <= number of iterations, n (each iteration stops at its first mismatch)
#matches: Let q be the number of matches at iteration k; then r increases by at least q. Since r<=n, total #matches <= n
T=O(#matches + #mismatches + #iterations)=O(n)
i:      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:      a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z:      0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
r:      0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
l:      0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
#m:     0  1  0  3  0  0  0  1  0  7  0  0  0  0  0  0  0
#mis:   0  1  1  1  0  0  1  1  1  1  0  0  0  0  0  0  1
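The counting argument can be checked empirically with an instrumented variant of the algorithm that records, for each iteration k, how many character comparisons matched and how many mismatched. This is a sketch only: the function name `z_comparison_counts` is hypothetical, and it folds cases A and B into one branch since neither performs any comparison.

```python
def z_comparison_counts(s: str) -> tuple[list[int], list[int]]:
    """Per-iteration match and mismatch counts (index 0 is padding, 1-based)."""
    n = len(s)
    c = " " + s
    z = [0] * (n + 1)
    matches = [0] * (n + 1)
    mismatches = [0] * (n + 1)
    l = r = 0
    for k in range(2, n + 1):
        if k <= r and z[k - l + 1] != r - k + 1:
            # Cases A and B: Z_k is copied or capped, no comparisons needed.
            z[k] = min(z[k - l + 1], r - k + 1)
            continue
        # Case II (k > r) starts comparing at k; case C starts at r + 1.
        start = k if k > r else r + 1
        q = start
        while q <= n and c[q] == c[q - k + 1]:
            q += 1
        matches[k] = q - start
        mismatches[k] = 1 if q <= n else 0  # running off the end is not a mismatch
        z[k] = q - k
        if q > k:
            l, r = k, q - 1
    return matches, mismatches


m, mis = z_comparison_counts("aabaabcaxaabaabcy")
print(m[1:])    # [0, 1, 0, 3, 0, 0, 0, 1, 0, 7, 0, 0, 0, 0, 0, 0, 0]
print(mis[1:])  # [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
```

For this string the totals are 12 matches and 8 mismatches, both within the bound of n = 17.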
15 Simplest Linear Time Exact Matching Algorithm
Input: Pattern P, Text T
Output: Occurrences of P in T
Algorithm Simplest
  S=P$T, where $ is a character that does not appear in P or T
  For i=2; i<=|S|; i++
    Calculate Z_i
    If Z_i=|P|, then report that there is an occurrence of P in T starting at position i-|P|-1 of T
Time: O(|P|+|T|+1)=O(n+m)
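A possible Python rendering of this matcher is sketched below. Two things are assumptions rather than content from the slides: the helper `z_array` uses a common compact, 0-based formulation of the Z algorithm that merges the three cases into a single `min` followed by an extension loop, and the names `z_array`, `find_occurrences` and the `sentinel` parameter are hypothetical. Reported positions are 1-based positions in T, as in the slide.

```python
def z_array(s: str) -> list[int]:
    """0-based Z array (compact variant); z[0] is left at 0."""
    n = len(s)
    z = [0] * n
    l = r = 0  # current right-most Z-box is s[l:r], with r exclusive
    for k in range(1, n):
        if k < r:
            # Reuse the value from the matching prefix position, capped at the box end.
            z[k] = min(z[k - l], r - k)
        while k + z[k] < n and s[k + z[k]] == s[z[k]]:
            z[k] += 1
        if k + z[k] > r:
            l, r = k, k + z[k]
    return z


def find_occurrences(pattern: str, text: str, sentinel: str = "$") -> list[int]:
    """1-based start positions of pattern in text, via the Z values of P$T."""
    if sentinel in pattern or sentinel in text:
        raise ValueError("sentinel must not appear in pattern or text")
    if not pattern:
        return []
    p = len(pattern)
    s = pattern + sentinel + text
    z = z_array(s)
    # 0-based index i in S is slide position i + 1, so the start in T is
    # (i + 1) - |P| - 1 = i - |P|.
    return [i - p for i in range(p + 1, len(s)) if z[i] == p]


print(find_occurrences("aab", "aabaabcaxaabaabcy"))  # [1, 4, 10, 13]
```

The sentinel guarantees that no Z value exceeds |P|, so a value equal to |P| marks a full occurrence.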
16 Simplest Linear Time Exact Matching Algorithm
Takes only O(n) extra space
Alphabet-independent linear time