String matching
Exact String Matching
Input: Two strings T[1…n] and P[1…m], containing symbols from an alphabet Σ.
Example: Σ = {A,C,G,T}, T[1…12] = “CAGTACATCGAT”, P[1…3] = “AGT”
Goal: find all “shifts” 0 ≤ s ≤ n-m such that T[s+1…s+m] = P. (In the example, the only such shift is s = 1, since T[2…4] = “AGT”.)
Simple Algorithm
for s ← 0 to n-m
    Match ← 1
    for j ← 1 to m
        if T[s+j] ≠ P[j] then
            Match ← 0
            exit loop
    if Match = 1 then output s
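The simple algorithm above can be sketched in Python as follows. This is a minimal translation, assuming 0-indexed Python strings: the slides' shift s compares T[s+1…s+m] with P, which is `text[s:s+m]` here. The function name `naive_match` is hypothetical.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every shift s (0 <= s <= n - m) with text[s:s+m] == pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):        # empty range when m > n
        match = True
        for j in range(m):
            if text[s + j] != pattern[j]:
                match = False
                break                 # first mismatch: stop checking this shift
        if match:
            shifts.append(s)
    return shifts


print(naive_match("CAGTACATCGAT", "AGT"))  # [1]
```

The `break` plays the role of "exit loop": it leaves only the inner loop over j, and the outer loop moves on to the next shift.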
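Analysis
Running time of the simple algorithm:
Worst case: O(nm). For example, with T = A…A (n copies) and P = A…A (m copies), every one of the n-m+1 shifts needs all m comparisons.
Average case (random text): O(n) in expectation.
Let T_s be the time spent checking shift s, i.e. the number of comparisons up to and including the first mismatch (at most m).
E[T_s] < 2. Why: if each text symbol is drawn uniformly and independently from Σ with |Σ| ≥ 2, each comparison succeeds with probability 1/|Σ| ≤ 1/2, so $E[T_s] = \sum_{k=1}^{m} \Pr[T_s \ge k] = \sum_{k=1}^{m} |\Sigma|^{-(k-1)} < 2$.
By linearity of expectation, $E[\sum_s T_s] = \sum_s E[T_s] = O(n)$.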
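To see the gap between the two cases empirically, the sketch below counts T_s (comparisons up to the first mismatch) for every shift. It assumes a uniformly random DNA text and pattern generated with Python's `random` module under a fixed seed; the sizes, the seed and the helper name `comparisons_per_shift` are illustrative choices, not part of the slides.

```python
import random


def comparisons_per_shift(text: str, pattern: str) -> list[int]:
    """For each shift s, count the comparisons the simple algorithm makes
    up to and including the first mismatch (m if the shift matches)."""
    n, m = len(text), len(pattern)
    counts = []
    for s in range(n - m + 1):
        t_s = 0
        for j in range(m):
            t_s += 1
            if text[s + j] != pattern[j]:
                break
        counts.append(t_s)
    return counts


rng = random.Random(0)  # fixed seed so the run is reproducible
alphabet = "ACGT"
text = "".join(rng.choice(alphabet) for _ in range(100_000))
pattern = "".join(rng.choice(alphabet) for _ in range(20))
random_counts = comparisons_per_shift(text, pattern)
# Average T_s on random text; expected to be close to 4/3 when |Sigma| = 4.
print(sum(random_counts) / len(random_counts))

# Worst case: every shift matches, so the total is (n - m + 1) * m.
worst = comparisons_per_shift("A" * 1000, "A" * 20)
print(sum(worst))  # 19620 = 981 * 20
```

For a 4-letter alphabet the geometric sum gives E[T_s] just under 1/(1 - 1/4) = 4/3, comfortably below the bound of 2.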
Approximate String Matching
Input: Two strings T[1…n] and P[1…m], containing symbols from an alphabet Σ.
Goal: find all “shifts” 0 ≤ s ≤ n-m such that T[s+1…s+m] is “highly similar” to P, i.e. within a chosen distance threshold of P.
Two common metrics for comparing strings
Given two strings T[1…n] and P[1…m]:
Hamming distance: the number of positions at which the two strings differ, i.e. the number of substitutions needed to turn one into the other (defined only when n = m).
Edit distance: the minimum number of edit operations (substitutions, insertions, and deletions) needed to transform one string into the other.
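Both metrics can be computed with a few lines of Python. The slides do not give an algorithm for edit distance, so this sketch swaps in the standard Levenshtein dynamic program (Wagner–Fischer), keeping only one row of the table at a time; the function names `hamming` and `edit_distance` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where a and b differ; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to b[:j] is j insertions
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)   # distance from a[:i] to "" is i deletions
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # substitute (or keep a match)
        prev = curr
    return prev[len(b)]


print(hamming("ACGT", "CGTA"))        # 4
print(edit_distance("ACGT", "CGTA"))  # 2
```

The second pair shows why the two metrics differ: every position of “ACGT” and “CGTA” mismatches, yet deleting the leading A and appending an A takes only two edits.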
Simple Algorithm for Hamming Distance
for s ← 0 to n-m
    Mismatch ← 0
    for j ← 1 to m
        if T[s+j] ≠ P[j] then
            Mismatch ← Mismatch + 1
            if Mismatch > threshold then exit loop
    if Mismatch ≤ threshold then output s
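The Hamming-distance variant above translates to Python in the same way as the exact algorithm, again assuming 0-indexed strings. The function name `hamming_match` is hypothetical; `threshold` is the parameter from the pseudocode.

```python
def hamming_match(text: str, pattern: str, threshold: int) -> list[int]:
    """Return every shift s whose window text[s:s+m] differs from pattern
    in at most `threshold` positions (Hamming distance <= threshold)."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[s + j] != pattern[j]:
                mismatches += 1
                if mismatches > threshold:
                    break  # this shift can no longer qualify
        if mismatches <= threshold:
            shifts.append(s)
    return shifts


print(hamming_match("CAGTACATCGAT", "AGT", 0))  # [1]
print(hamming_match("CAGTACATCGAT", "AGT", 2))  # [1, 4, 5, 6, 8, 9]
```

With threshold 0 this reduces to exact matching. The early exit keeps the average cost per shift small on random text, while the worst case stays O(nm).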