Approximate Matching of Run-Length Compressed Strings


Approximate Matching of Run-Length Compressed Strings
Algorithmica (2003)
Veli Mäkinen, Gonzalo Navarro, and Esko Ukkonen

Run-length encoding: aaabb → (a,3),(b,2)
Outline:
Edit Distance on Run-Length Compressed Strings
Extending to Weighted Edit Distance
Approximate Searching
Improving a Greedy Algorithm for LCS
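The encoding on this slide can be reproduced with a minimal Python sketch. The function names `rle_encode` and `rle_decode` are illustrative choices, not names from the paper.

```python
from itertools import groupby


def rle_encode(s):
    """Return the run-length encoding of s as a list of (letter, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(runs):
    """Expand a list of (letter, run_length) pairs back into a plain string."""
    return "".join(ch * length for ch, length in runs)


print(rle_encode("aaabb"))  # [('a', 3), ('b', 2)]
```

Here the uncompressed length is 5 and the compressed length (number of runs) is 2.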

Part 1: An O(mn’+m’n) Algorithm for the Levenshtein Distance
String $A = a_1 a_2 \cdots a_m$, compressed length m’
String $B = b_1 b_2 \cdots b_n$, compressed length n’
Levenshtein distance $D_L(A, B)$: $d_{i,j} = \min(d_{i-1,j} + 1,\ d_{i,j-1} + 1,\ d_{i-1,j-1} + (\text{if } a_i = b_j \text{ then } 0 \text{ else } 1))$
$D_{ID}(A, B)$: $d_{i,j} = \min(d_{i-1,j} + 1,\ d_{i,j-1} + 1,\ d_{i-1,j-1} + (\text{if } a_i = b_j \text{ then } 0 \text{ else } \infty))$
Both are computed with dynamic programming.
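As a baseline for comparison, the two recurrences can be evaluated directly on the uncompressed strings in O(mn) time. The sketch below is that plain dynamic program, not the run-length algorithm of the paper. The boundary values d[i][0] = i and d[0][j] = j are the standard ones and are an assumption here, since the slide does not state them. The function names are illustrative.

```python
import math


def edit_distance(a, b, mismatch_cost=1):
    """Fill the (m+1) x (n+1) matrix d and return d[m][n].

    mismatch_cost=1 gives the Levenshtein recurrence; mismatch_cost=math.inf
    forbids substitutions and gives the D_ID recurrence.
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # assumed boundary: delete the first i letters of a
    for j in range(1, n + 1):
        d[0][j] = j  # assumed boundary: insert the first j letters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = 0 if a[i - 1] == b[j - 1] else mismatch_cost
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + diagonal)
    return d[m][n]


def levenshtein(a, b):
    return edit_distance(a, b, 1)


def indel_distance(a, b):
    return edit_distance(a, b, math.inf)
```

For example, `levenshtein("aaabb", "aabbb")` is 1 (one substitution), while `indel_distance("aaabb", "aabbb")` is 2 (one deletion plus one insertion).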

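Relationship between $D_{ID}$ and LCS
$2 \times |LCS(A, B)| = m + n - D_{ID}(A, B)$
Let x be the number of characters of A outside the LCS and y the number of characters of B outside the LCS. Then
$m + n = 2 \times |LCS(A, B)| + x + y$
$D_{ID}(A, B) = x + y$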
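The identity can be checked numerically with a small sketch that computes both quantities independently on uncompressed strings. The function names are illustrative, and the boundary row 0, 1, ..., n for the distance is the standard assumption.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence, using two rolling rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance(a, b):
    """Edit distance with insertions and deletions only (substitution cost infinite)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            best = min(prev[j] + 1, cur[j - 1] + 1)
            if x == y:
                best = min(best, prev[j - 1])  # free diagonal step on a match
            cur.append(best)
        prev = cur
    return prev[-1]


a, b = "aaabb", "aabbb"
print(2 * lcs_length(a, b), len(a) + len(b) - indel_distance(a, b))  # 8 8
```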

Notations

Known: top and left borders. Goal: right and bottom borders. Equal letter box:

Different letter box: Observation: consecutive cells in the $(d_{i,j})$ matrix differ by at most one.
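The observation can be checked empirically on a full unit-cost Levenshtein matrix. This is only a sanity check on uncompressed strings, not part of the paper's algorithm. "Consecutive" is read here as horizontally or vertically adjacent, which is an assumption. The function names are illustrative.

```python
def levenshtein_matrix(a, b):
    """Return the full (m+1) x (n+1) unit-cost Levenshtein matrix."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + diagonal)
    return d


def max_adjacent_difference(d):
    """Largest absolute difference between horizontally or vertically adjacent cells."""
    rows, cols = len(d), len(d[0])
    diffs = [0]
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                diffs.append(abs(d[i + 1][j] - d[i][j]))
            if j + 1 < cols:
                diffs.append(abs(d[i][j + 1] - d[i][j]))
    return max(diffs)


print(max_adjacent_difference(levenshtein_matrix("aaabbbcc", "aabbccc")) <= 1)  # True
```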

Algorithm:

Time Complexity of the Algorithm

Part 2: Extending to Weighted Edit Distance

Which one is correct? or

$\mathrm{path}(d, r) = C_s \min(d, r) + C_d \max(d - r, 0) + C_i \max(r - d, 0)$
[Figure: the three cases d < r, d = r, and d > r, with path segments labelled C_s, C_i, and C_d.]
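The path formula translates directly into code. In this sketch the parameter names `c_s`, `c_d`, and `c_i` stand for C_s, C_d, and C_i. The slide does not define d and r, so the function treats them simply as non-negative integers.

```python
def path_cost(d, r, c_s, c_d, c_i):
    """Evaluate path(d, r) = C_s*min(d, r) + C_d*max(d - r, 0) + C_i*max(r - d, 0)."""
    return c_s * min(d, r) + c_d * max(d - r, 0) + c_i * max(r - d, 0)


# The three cases listed with the formula:
print(path_cost(2, 5, c_s=1, c_d=3, c_i=4))  # d < r: 1*2 + 4*3 = 14
print(path_cost(3, 3, c_s=1, c_d=3, c_i=4))  # d = r: 1*3 = 3
print(path_cost(5, 2, c_s=1, c_d=3, c_i=4))  # d > r: 1*2 + 3*3 = 11
```

At most one of the two `max` terms is non-zero, so the cost has a C_s part for min(d, r) steps and then a pure C_d or pure C_i part for the remainder.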

How to evaluate the minimum value in constant time
The problem is that path is no longer a constant.
[Figure: points (s1,t1), (s2,t2), (s3,t3), and (s4,t4), with slopes labelled +Cs-Ci and +Cs-Cd.]

Part 3: Approximate Searching
Find all approximate occurrences of A (short pattern) in B (long string).
Set $d_{0,j} = 0$ for all j and report every j with $d_{m,j} \le k$.
More efficient approach: evaluate only the first m columns in each long run.
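The effect of setting the top row to zero can be illustrated on uncompressed strings with the classical column-by-column dynamic program. This sketch is the plain O(mn) baseline under unit costs, not the run-length algorithm, and the function name is illustrative.

```python
def approximate_occurrences(pattern, text, k):
    """Return the 1-based end positions j in text where d[m][j] <= k.

    The zero top row (d[0][j] = 0) lets an occurrence start at any text position.
    """
    m = len(pattern)
    col = list(range(m + 1))  # column j = 0: d[i][0] = i
    ends = []
    for j, ch in enumerate(text, 1):
        new = [0]  # d[0][j] = 0
        for i in range(1, m + 1):
            diagonal = 0 if pattern[i - 1] == ch else 1
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + diagonal))
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


print(approximate_occurrences("abc", "xxabcxx", 0))  # [5]
print(approximate_occurrences("abc", "xxabcxx", 1))  # [4, 5, 6]
```

With k = 1 the pattern matches text ending at positions 4 ("ab", one missing letter), 5 ("abc", exact), and 6 ("abcx", one extra letter).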

Time Complexity
Short run in B with length r ≤ m: O(m’r+m)
Long run: O(m’m+m+m)
Total time complexity is O(n’m’m+R), where R is the number of occurrences.

Part 4: Improving a Greedy Algorithm for LCS
Basic idea: fill in only the corners of the boxes.
Different letter box: [Figure: box with labels x, +s, and +t.]

Equal letter box: recursively trace an optimal path.
Tracing a path takes O(m’+n’) time.
The algorithm takes O(m’n’(m’+n’)) time.

Analysis of Time Complexity
Observation: each cell on the borders of the boxes can be visited only once, so the algorithm also achieves the O(m’n+n’m) bound.
Time complexity is O(min(m’n’(m’+n’), m’n+n’m)).
Space complexity is O(m’n’).