Published by Kalevi Lattu. Modified over 6 years ago.
1
Approximate Matching of Run-Length Compressed Strings
Algorithmica (2003). Veli Mäkinen, Gonzalo Navarro, and Esko Ukkonen
2
Run-length encoding: aaabb → (a,3),(b,2)
Outline:
Edit Distance on Run-Length Compressed Strings
Extending to Weighted Edit Distance
Approximate Searching
Improving a Greedy Algorithm for LCS
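The run-length encoding on this slide can be sketched in a few lines of Python. The function names run_length_encode and run_length_decode are illustrative and do not come from the paper.

```python
from itertools import groupby


def run_length_encode(s):
    """Collapse each maximal run of equal letters into a (letter, length) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def run_length_decode(runs):
    """Expand (letter, length) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)


print(run_length_encode("aaabb"))  # [('a', 3), ('b', 2)]
```

The compressed length of a string (m’ or n’ in the following slides) is the number of pairs in this list.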
3
Part 1: An O(mn’+m’n) Algorithm for the Levenshtein Distance
String A = a_1 a_2 ··· a_m, compressed length m’
String B = b_1 b_2 ··· b_n, compressed length n’
Levenshtein distance D_L(A, B):
d_{i,j} = min(d_{i-1,j} + 1, d_{i,j-1} + 1, d_{i-1,j-1} + (if a_i = b_j then 0 else 1))
D_ID(A, B) (insertions and deletions only):
d_{i,j} = min(d_{i-1,j} + 1, d_{i,j-1} + 1, d_{i-1,j-1} + (if a_i = b_j then 0 else ∞))
Both are computed with dynamic programming.
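As a baseline, the two recurrences above can be filled in directly on the uncompressed strings. This is a minimal sketch of the plain O(mn) dynamic program, not the O(mn’+m’n) algorithm of the paper. The function names and the sub_cost parameter are assumptions made for illustration, and the border values d_{i,0} = i and d_{0,j} = j are the usual ones for global edit distance.

```python
import math


def edit_distance(a, b, sub_cost=1):
    """Fill the full (m+1) x (n+1) matrix d and return d[m][n].

    sub_cost is the cost added on the diagonal when a_i != b_j:
    1 gives the Levenshtein distance, math.inf forbids substitutions.
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete the first i letters of a
    for j in range(1, n + 1):
        d[0][j] = j  # insert the first j letters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, diag)
    return d[m][n]


def levenshtein(a, b):
    return edit_distance(a, b, sub_cost=1)


def indel_distance(a, b):
    return edit_distance(a, b, sub_cost=math.inf)
```

For example, levenshtein("ab", "ac") costs one substitution, while indel_distance("ab", "ac") needs one deletion plus one insertion.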
4
Relationship between DID and LCS
2 × |LCS(A, B)| = m + n − D_ID(A, B)
m + n = 2 × |LCS(A, B)| + x + y
D_ID(A, B) = x + y
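The identity above can be checked numerically on small inputs. The sketch below computes |LCS(A, B)| and D_ID(A, B) with two independent textbook dynamic programs and compares both sides. All function names are illustrative.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b."""
    m, n = len(a), len(b)
    t = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t[m][n]


def indel_distance(a, b):
    """Edit distance with insertions and deletions only (no substitutions)."""
    m, n = len(a), len(b)
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(n + 1)]
         for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(d[i - 1][j] + 1, d[i][j - 1] + 1)
            if a[i - 1] == b[j - 1]:
                best = min(best, d[i - 1][j - 1])  # free diagonal step on a match
            d[i][j] = best
    return d[m][n]


def identity_holds(a, b):
    """Check 2 * |LCS(A, B)| == m + n - D_ID(A, B)."""
    return 2 * lcs_length(a, b) == len(a) + len(b) - indel_distance(a, b)


print(identity_holds("aaabb", "abbb"))
```

Here x and y in the slide correspond to the letters of A and B left outside the common subsequence, which is why they sum to D_ID(A, B).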
5
Notations
6
Known: top and left borders. Goal: right and bottom borders.
Equal letter box:
7
Different letter box:
Observation: consecutive cells in the (d_{i,j}) matrix differ by at most one.
8
Algorithm:
9
Time Complexity of the Algorithm
10
Part 2: Extending to Weighted Edit Distance
11
Which one is correct? (Two alternative figures.)
12
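path(d, r) = C_s · min(d, r) + C_d · max(d − r, 0) + C_i · max(r − d, 0)
Case d < r: d substitutions (C_s each) and r − d insertions (C_i each).
Case d = r: d substitutions (C_s each).
Case d > r: r substitutions (C_s each) and d − r deletions (C_d each).
(Figure labels: (q,0), (s,t), s−q, s−t, t.)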
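The path cost formula translates directly into code. In this sketch the parameter names cs, cd and ci stand for C_s, C_d and C_i; the numeric weights in the example are arbitrary values chosen for illustration and do not come from the slides.

```python
def path(d, r, cs, cd, ci):
    """Cost of a path as given by the slide's formula.

    cs, cd, ci are the substitution, deletion and insertion weights.
    """
    return cs * min(d, r) + cd * max(d - r, 0) + ci * max(r - d, 0)


# Arbitrary illustrative weights: substitution 3, deletion 2, insertion 1.
print(path(2, 5, cs=3, cd=2, ci=1))  # d < r: 2 substitutions + 3 insertions = 9
print(path(3, 3, cs=3, cd=2, ci=1))  # d = r: 3 substitutions = 9
print(path(5, 2, cs=3, cd=2, ci=1))  # d > r: 2 substitutions + 3 deletions = 12
```

With unit weights the function reduces to max(d, r), which matches the unweighted case.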
13
How to evaluate min value in constant time
The problem is that path is no longer a constant.
(Figure labels: points (s1,t1), (s2,t2), (s3,t3), (s4,t4); increments +C_s − C_i and +C_s − C_d.)
14
Part 3: Approximate Searching
Find all approximate occurrences of A (short pattern) in B (long string).
Let all d_{0,j} = 0 and find all d_{m,j} ≤ k.
More efficient approach: evaluate only the first m columns in each long run.
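The searching rule above (set d_{0,j} = 0, report every j with d_{m,j} ≤ k) can be sketched on uncompressed strings as follows. This is the plain column-by-column dynamic program with unit costs, assumed here for illustration; it does not include the run-length speedup from the slides. The name approx_search is hypothetical, and reported positions are 1-based end positions in B.

```python
def approx_search(pattern, text, k):
    """Return all 1-based end positions j in text with d[m][j] <= k."""
    m = len(pattern)
    col = list(range(m + 1))  # column j = 0: d[i][0] = i
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # d[0][j-1], always 0
        col[0] = 0          # d[0][j] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            old = col[i]    # d[i][j-1], needed as the next diagonal value
            col[i] = min(
                col[i] + 1,      # d[i][j-1] + 1
                col[i - 1] + 1,  # d[i-1][j] + 1
                prev_diag + (0 if pattern[i - 1] == c else 1),
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_search("abc", "xxabcxx", 0))  # [5]
```

Only one column is kept in memory, so the sketch uses O(m) space and O(mn) time.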
15
Time Complexity
Short run in B with length r ≤ m: O(m’r+m)
Long run: O(m’m+m+m)
Total time complexity is O(n’m’m+R), where R is the number of occurrences.
16
Part 4: Improving a Greedy Algorithm for LCS
Basic idea: fill only the corners of the boxes.
Different letter box: (figure labels: ←x→, +s, +t)
17
Equal letter box: recursively trace an optimal path.
Tracing a path takes O(m’+n’) time.
The algorithm takes O(m’n’(m’+n’)) time.
18
Analysis of Time Complexity
Observation: each cell in the borders of the boxes can be visited only once.
This also gives an O(m’n+n’m) bound.
Time complexity is O(min(m’n’(m’+n’), m’n+n’m)).
Space complexity is O(m’n’).