CSC 212 – Data Structures Lecture 36: Pattern Matching
Suffixes and Prefixes
Example string: “I am the Lizard King!”

Prefixes                    Suffixes
"I"                         "!"
"I "                        "g!"
"I a"                       "ng!"
"I am"                      "ing!"
…                           …
"I am the Lizard Kin"       "am the Lizard King!"
"I am the Lizard King"      " am the Lizard King!"
"I am the Lizard King!"     "I am the Lizard King!"
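The prefix and suffix lists on this slide can be generated mechanically. The sketch below is only an illustration; the helper names `prefixes` and `suffixes` are made up for this example and are not part of the lecture code.

```python
def prefixes(s):
    """All non-empty prefixes of s, shortest first."""
    return [s[:k] for k in range(1, len(s) + 1)]


def suffixes(s):
    """All non-empty suffixes of s, shortest first."""
    return [s[-k:] for k in range(1, len(s) + 1)]


text = "I am the Lizard King!"
print(prefixes(text)[:4])   # the four shortest prefixes
print(suffixes(text)[:4])   # the four shortest suffixes
```

The whole string is both its own longest prefix and its own longest suffix; a "proper" prefix or suffix excludes that case.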
KMP Algorithm
- Asymptotically optimal algorithm
  - Means we cannot do better in big-Oh terms
- Compares from left to right
  - So like BruteForce, not Boyer-Moore
- But shifts the pattern intelligently
  - Relies on a Key Insight™
  - Preprocess the pattern to avoid redundant comparisons
- Always go forward; never, ever look back
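To see the redundant comparisons that KMP is designed to avoid, it may help to count character comparisons in a plain brute-force matcher. This is a rough sketch for contrast only; `brute_force_match` is a name chosen for this example, and the lecture's own BruteForce code may differ.

```python
def brute_force_match(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    comparisons = 0
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break          # mismatch: slide the pattern right by one and start over
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons


# Every alignment re-compares the same leading a's before failing on the b.
print(brute_force_match("aaaaaaaa", "aaab"))
```

After each mismatch the brute-force matcher backs up in the text and re-reads characters it has already seen, which is exactly what "never look back" rules out.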
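The KMP Algorithm
[Figure: pattern P = abaaba aligned under text ..abaab....., with a mismatch (x) at rank j; P is then shown again, shifted right. Labels: "Do not repeat these comparisons", "Need to resume comparing here", "Shifting P here ensures these two entries match".]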
KMP Failure Function
- Assume P[j] ≠ T[k]
  - Need the rank in P to compare next against T[k]
  - That is, how should we shift P after a miss?
- Uses the failure function value F(j-1)
  - One value defined for each rank in P
  - Specifies the rank in P at which comparisons must restart
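A small sketch of how F(j-1) is used after a miss. The table below for P = "abaaba" was worked out by hand for this example, and `next_pattern_rank` is a hypothetical helper, not a function from the lecture.

```python
def next_pattern_rank(failure, j):
    """Rank in P to compare against T[k] after P[j] != T[k].

    Returns None when j == 0: nothing can be reused, so k advances instead.
    """
    if j == 0:
        return None
    return failure[j - 1]


pattern = "abaaba"
failure = [0, 0, 1, 1, 2, 3]   # assumed table for "abaaba", computed by hand

j = 5                           # mismatch on the last character of P
new_j = next_pattern_rank(failure, j)
print(new_j)                    # rank in P at which comparing resumes
print(j - new_j)                # how far P shifts to the right
```

Here the matched part "abaab" ends in "ab", which is also a prefix of P, so two characters stay matched and the pattern slides right by three.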
Computing Failure Function
- For rank j, find the longest proper prefix of P[0...j] that is also a suffix of P[0...j]; F(j) is its length
- For speed, store the failure function in an array
- Unlike Boyer-Moore, works with infinite alphabets
- Takes at most 2m loop iterations, so O(2m) = O(m) time
- The algorithm that computes the failure function is similar to KMP itself
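The definition can be checked directly, one rank at a time, before worrying about speed. The sketch below is a deliberately slow version that follows the definition literally; it is not the O(m) algorithm from the lecture, and `failure_by_definition` is a name invented here.

```python
def failure_by_definition(pattern):
    """F(j) = length of the longest proper prefix of P[0..j] that is also its suffix."""
    table = []
    for j in range(len(pattern)):
        sub = pattern[: j + 1]
        best = 0
        # "Proper" means strictly shorter than sub, so stop at len(sub) - 1.
        for length in range(1, len(sub)):
            if sub[:length] == sub[-length:]:
                best = length
        table.append(best)
    return table


print(failure_by_definition("abaaba"))
print(failure_by_definition("abacab"))
```

Because it re-compares substrings for every rank, this version takes far more than O(m) time; it is useful mainly as a reference to test a faster implementation against.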
Computing Failure Function
Algorithm KMPFailureFunction(String P)
    F[0] ← 0
    i ← 1
    j ← 0
    while i < P.length()
        if P[i] = P[j]              // So, P[0…j] = P[i - j…i]
            F[i] ← j + 1            // Record the length of this prefix/suffix
            i ← i + 1               // Advance a character and see if still matches
            j ← j + 1
        else if j > 0               // No match, need to restart our computation
            j ← F[j - 1]            // Skip over longest prefix that is also a suffix
        else
            F[i] ← 0                // No proper prefix of P[0…i] is a suffix of P[0…i]
            i ← i + 1               // Move to the next character
    return F
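One possible Python rendering of the KMPFailureFunction pseudocode, kept as close to the slide as practical. The snake_case function name and the use of a list for F are choices made for this sketch.

```python
def kmp_failure_function(pattern):
    """Return F, where F[i] is the length of the longest proper prefix of
    pattern[0..i] that is also a suffix of pattern[0..i]."""
    m = len(pattern)
    failure = [0] * m
    i, j = 1, 0
    while i < m:
        if pattern[i] == pattern[j]:
            # pattern[0..j] equals pattern[i-j..i]: record its length.
            failure[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            # Fall back to the next-longest prefix that is also a suffix.
            j = failure[j - 1]
        else:
            # No proper prefix of pattern[0..i] is also a suffix.
            failure[i] = 0
            i += 1
    return failure


print(kmp_failure_function("abaaba"))
print(kmp_failure_function("abacab"))
```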
KMP Failure Function
j       0  1  2  3  4  5
P[j]    a  b  a  a  b  a
F(j)    0  0  1  1  2  3
The KMP Algorithm
Algorithm KMPMatch(String T, String P)
    F ← KMPFailureFunction(P)
    i ← 0
    j ← 0
    while i < T.length()
        if P[j] = T[i]              // So, P[0…j] = T[i - j…i]
            if j = P.length() - 1
                return i - j        // Matched all of P; return where the match starts
            i ← i + 1               // Advance and see if still a match
            j ← j + 1
        else if j > 0               // No match, but a prefix of P[0…j-1] matches
            j ← F[j - 1]            // So skip past longest prefix that is a suffix
        else
            i ← i + 1               // Nothing to reuse, move to the next character
    return -1                       // No match found
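A possible Python version of KMPMatch, with the failure function included so the block runs on its own. Returning -1 when there is no match and 0 for an empty pattern are conventions assumed for this sketch.

```python
def kmp_failure_function(pattern):
    """F[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix."""
    failure = [0] * len(pattern)
    i, j = 1, 0
    while i < len(pattern):
        if pattern[i] == pattern[j]:
            failure[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = failure[j - 1]
        else:
            i += 1
    return failure


def kmp_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0                     # assumed convention: empty pattern matches at 0
    failure = kmp_failure_function(pattern)
    i = j = 0
    while i < len(text):
        if pattern[j] == text[i]:
            if j == len(pattern) - 1:
                return i - j         # matched all of the pattern
            i += 1
            j += 1
        elif j > 0:
            j = failure[j - 1]       # reuse the prefix already known to match
        else:
            i += 1                   # nothing to reuse; move on in the text
    return -1


print(kmp_match("abacaabaccabacabaabb", "abacab"))
```

Note that `i` never decreases: the text is read strictly left to right, and only the position `j` in the pattern moves backward.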
Example
j       0  1  2  3  4  5
P[j]    a  b  a  c  a  b
F(j)    0  0  1  0  1  2
The KMP Algorithm
- In each pass of the loop in KMPMatch, either:
  - P[j] = T[i]: i increases by one, or
  - P[j] ≠ T[i] & j > 0: P is shifted right by at least 1, or
  - P[j] ≠ T[i] & j = 0: i increases by 1
- So at most 2n iterations of the loop (i and the shift i - j each increase at most n times, and neither ever decreases)
  - KMPMatch takes O(2n) = O(n) time
- KMPFailureFunction needs O(m) time
- Thus, the algorithm runs in O(m + n) time
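The 2n bound can be checked empirically by counting loop iterations. The sketch below instruments a Python version of the matcher; the counter and the function name `kmp_match_counting` are additions for this illustration, and a non-empty pattern is assumed.

```python
def kmp_failure_function(pattern):
    """F[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix."""
    failure = [0] * len(pattern)
    i, j = 1, 0
    while i < len(pattern):
        if pattern[i] == pattern[j]:
            failure[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = failure[j - 1]
        else:
            i += 1
    return failure


def kmp_match_counting(text, pattern):
    """Return (match index or -1, loop iterations). Assumes a non-empty pattern."""
    failure = kmp_failure_function(pattern)
    i = j = iterations = 0
    while i < len(text):
        iterations += 1
        if pattern[j] == text[i]:
            if j == len(pattern) - 1:
                return i - j, iterations
            i += 1                   # case 1: i increases
            j += 1
        elif j > 0:
            j = failure[j - 1]       # case 2: the shift i - j increases
        else:
            i += 1                   # case 3: i increases
    return -1, iterations


for text, pattern in [("a" * 30, "aaab"), ("abacaabaccabacabaabb", "abacab")]:
    index, iterations = kmp_match_counting(text, pattern)
    print(index, iterations, "bound:", 2 * len(text))
```

The first input is a classic bad case for brute force, yet the iteration count stays within 2n here.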
Your Turn
- Get back into groups and do the activity
Before Next Lecture…
- Finish up assignments
- Start thinking about questions for the Final