SMAWK

REVISE

Global alignment (Revise)
Alignment graph for S = aacgacga, T = ctacgaga
Complexity: O(n²)
$$V(i,j) = \max\{\, V(i-1,j-1) + \delta(S[i], T[j]),\ V(i-1,j) + \delta(S[i], -),\ V(i,j-1) + \delta(-, T[j]) \,\}$$
where δ is the scoring function and "-" denotes a gap.
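The recurrence above can be written as a minimal Python sketch. It assumes a caller-supplied scoring function `delta(x, y)` in which "-" stands for a gap. The unit scores in `unit_score` (+1 for a match, -1 for a mismatch or gap) are a hypothetical choice and are not the slides' scoring. Python strings are 0-indexed, so S[i] in the recurrence corresponds to `S[i - 1]` in the code.

```python
from typing import Callable

Score = Callable[[str, str], int]


def unit_score(x: str, y: str) -> int:
    """Hypothetical scoring: +1 for a match, -1 for a mismatch or a gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 1 if x == y else -1


def global_alignment_score(S: str, T: str, delta: Score = unit_score) -> int:
    """Return V(|S|, |T|), filling the table row by row with the recurrence (O(nm) time)."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # first column: S[:i] against gaps only
        V[i][0] = V[i - 1][0] + delta(S[i - 1], "-")
    for j in range(1, m + 1):  # first row: T[:j] against gaps only
        V[0][j] = V[0][j - 1] + delta("-", T[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + delta(S[i - 1], T[j - 1]),  # (S[i], T[j])
                V[i - 1][j] + delta(S[i - 1], "-"),           # (S[i], -)
                V[i][j - 1] + delta("-", T[j - 1]),           # (-, T[j])
            )
    return V[n][m]


print(global_alignment_score("aacgacga", "ctacgaga"))
```

Each cell is filled in O(1) from three neighbours, which gives the O(n²) bound stated on the slide for two strings of length n.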

DIST and OUT matrix (Revise)
[Figure: a block of the alignment graph for the sub-sequences “acg” and “ag”, with input border values I0 = 1, I1 = 2, I2 = 3, I3 = 2, I4 = 1, I5 = 3 and output border O0…O5, shown next to its DIST matrix and its OUT matrix.]
$\mathrm{OUT}[i,j] = I_i + \mathrm{DIST}[i,j]$, and each output $O_j = \max_i \mathrm{OUT}[i,j]$ is a column maximum of OUT.
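The relation between DIST, the input border I and the output border O can be sketched in Python by brute force: materialize OUT, then take its column maxima. Two things are assumptions of this sketch: representing undefined DIST entries (unreachable input/output pairs) as `-math.inf`, and the names `output_border` and `NEG_INF`, which are illustrative.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf  # assumed stand-in for undefined DIST entries


def output_border(I: Sequence[float], DIST: Sequence[Sequence[float]]) -> List[float]:
    """O[j] = max_i OUT[i][j], where OUT[i][j] = I[i] + DIST[i][j]."""
    rows, cols = len(DIST), len(DIST[0])
    OUT = [[I[i] + DIST[i][j] for j in range(cols)] for i in range(rows)]
    # Column maxima of the explicit OUT matrix: every entry is built and read.
    return [max(OUT[i][j] for i in range(rows)) for j in range(cols)]


print(output_border([1, 2], [[0, -1], [NEG_INF, -2]]))  # prints [1, 0]
```

This explicit version touches all rows × cols entries. The point of SMAWK is to get the same maxima while evaluating only O(n) entries of OUT on demand.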

Compute O without explicit OUT
[Figure: the same block for “acg” and “ag”, showing only the DIST matrix and the input border I0…I5; SMAWK computes the output border O0…O5 directly.]

Aggarwal, Park and Schmidt observed that the DIST and OUT matrices are Monge arrays.
Definition: a matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a, b = 0…m and c, d = 0…n with a < b and c < d:
1. Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$
2. Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$
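The definition can be checked by brute force over every pair of rows a < b and every pair of columns c < d. This Python sketch assumes a dense list-of-lists matrix. `is_totally_monotone` is a hypothetical helper, and its O(m²n²) cost makes it a testing aid only, not something SMAWK would call.

```python
from itertools import combinations
from typing import Sequence


def is_totally_monotone(M: Sequence[Sequence[float]], convex: bool = True) -> bool:
    """Test condition 1 (convex=True) or condition 2 (convex=False) on all 2x2 submatrices."""
    rows, cols = len(M), len(M[0])
    for a, b in combinations(range(rows), 2):      # a < b
        for c, d in combinations(range(cols), 2):  # c < d
            if convex and M[a][c] >= M[b][c] and not M[a][d] >= M[b][d]:
                return False
            if not convex and M[a][c] <= M[b][c] and not M[a][d] <= M[b][d]:
                return False
    return True


square = [[(i - j) ** 2 for j in range(4)] for i in range(4)]
print(is_totally_monotone(square, convex=True), is_totally_monotone(square, convex=False))
# prints: True False
```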

SMAWK
Aggarwal et al. gave a recursive algorithm, called SMAWK, which finds all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.

Presentation Outline
– What are Monge arrays? Monge ⇒ totally monotone
– Why is the DIST alignment matrix a Monge array?
– How can totally monotone arrays be searched efficiently? SMAWK: given a totally monotone array, compute all column maxima in O(n)

MONGE AND TOTALLY MONOTONE PROPERTIES

Monge
A matrix M[0…m, 0…n] is Monge if either condition 1 or 2 below holds for all a, b = 0…m and c, d = 0…n with a < b and c < d:
1. $M[a,c] + M[b,d] \le M[a,d] + M[b,c]$
2. $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$
[Figure: the 2×2 submatrix of M formed by rows a, b and columns c, d, with entries M[a,c], M[a,d], M[b,c], M[b,d].]
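For finite entries it is enough to test the inequality on adjacent rows and columns (b = a + 1, d = c + 1). Summing the adjacent inequalities over the rectangle between (a, c) and (b, d) telescopes to the general one. Below is a minimal Python sketch of that O(mn) test. `is_monge` is a hypothetical name, and the shortcut is not valid when entries may be infinite.

```python
from typing import Sequence


def is_monge(M: Sequence[Sequence[float]], convex: bool = True) -> bool:
    """Condition 1 (convex=True): M[a,c] + M[b,d] <= M[a,d] + M[b,c];
    condition 2 (convex=False): the same with >=.  Finite entries only."""
    for i in range(len(M) - 1):
        for j in range(len(M[0]) - 1):
            lhs = M[i][j] + M[i + 1][j + 1]
            rhs = M[i][j + 1] + M[i + 1][j]
            if (lhs > rhs) if convex else (lhs < rhs):
                return False
    return True


square = [[(i - j) ** 2 for j in range(4)] for i in range(4)]
print(is_monge(square), is_monge(square, convex=False))  # prints: True False
```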

Totally monotone
A matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a, b = 0…m and c, d = 0…n with a < b and c < d:
1. Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$
2. Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$
Monge ⇒ totally monotone
[Figure: the 2×2 submatrix formed by rows a, b and columns c, d.]
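The implication Monge ⇒ totally monotone takes a single rearrangement. For condition 1:
$$M[a,c] + M[b,d] \le M[a,d] + M[b,c] \iff M[a,c] - M[b,c] \le M[a,d] - M[b,d].$$
If $M[a,c] \ge M[b,c]$, the left-hand difference is non-negative, so the right-hand one is too. That gives $M[a,d] \ge M[b,d]$, which is the convex condition. Under condition 2 both inequalities flip, and $M[a,c] \le M[b,c]$ forces $M[a,d] \le M[b,d]$, which is the concave condition.

The converse fails in general. For example, the matrix with rows [1 0 5] and [0 0 0] satisfies the convex condition but neither Monge inequality.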

Intuition
Monge: quadrangle inequality $M[a,c] + M[b,d] \le M[a,d] + M[b,c]$
[Figure: a quadrangle with corners a, b, c, d illustrating the inequality, beside the 2×2 submatrix with rows a, b and columns c, d.]

History: computational geometry
– All nearest neighbor problem: Shamos and Hoey proved Ω(n log n) in 1975
– All farthest neighbor problem: F. P. Preparata proved Ω(n log n) in 1977
– All farthest neighbor problem in a convex polygon: Lee and Preparata proved O(n) in 1978

SMAWK
Aggarwal et al. proved O(n) for the all farthest neighbor problem in a convex polygon in 1987.
Aggarwal et al. gave a recursive algorithm, called SMAWK, which finds all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.

DIST AND OUT MATRICES

Assumption: the row and column maxima of a totally monotone matrix can be computed in O(n).
Why are the DIST and OUT matrices of the alignment problem totally monotone?

DIST is Monge
[Figure: the alignment block for “acg” and “ag”, with input border I and output border O.]

DIST is a Monge array
Monge: $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$
Totally monotone by the concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$
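The crossing-paths argument behind this claim can be checked numerically on a small block. The Python sketch below makes simplifying assumptions that differ from the slides' figure:
– inputs are only the top-row nodes (0, x) and outputs only the bottom-row nodes (h, y);
– unreachable pairs get `-math.inf`;
– `unit_score` is a hypothetical scoring function.

Under these assumptions DIST satisfies the ≥ form of the Monge inequality (condition 2). The adjacent-entry shortcut fails with infinite entries, so all 2x2 submatrices are tested.

```python
import math
from itertools import combinations
from typing import List

NEG_INF = -math.inf


def unit_score(x: str, y: str) -> int:
    """Hypothetical scoring: +1 match, -1 mismatch or gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 1 if x == y else -1


def block_dist(S_blk: str, T_blk: str, delta=unit_score) -> List[List[float]]:
    """DIST[x][y] = best path score from node (0, x) to node (h, y) of the block.

    Rows follow S_blk (moving down), columns follow T_blk (moving right).
    """
    h, w = len(S_blk), len(T_blk)
    dist = []
    for x in range(w + 1):
        best = [[NEG_INF] * (w + 1) for _ in range(h + 1)]  # (0, x) -> (i, j)
        best[0][x] = 0
        for i in range(h + 1):
            for j in range(x, w + 1):  # columns left of x are unreachable
                if i == 0 and j == x:
                    continue
                cand = NEG_INF
                if i > 0:  # vertical edge: S_blk[i-1] against a gap
                    cand = max(cand, best[i - 1][j] + delta(S_blk[i - 1], "-"))
                if j > x:  # horizontal edge: a gap against T_blk[j-1]
                    cand = max(cand, best[i][j - 1] + delta("-", T_blk[j - 1]))
                if i > 0 and j > x:  # diagonal edge
                    cand = max(cand, best[i - 1][j - 1] + delta(S_blk[i - 1], T_blk[j - 1]))
                best[i][j] = cand
        dist.append(best[h])
    return dist


def satisfies_condition_2(M: List[List[float]]) -> bool:
    """M[a][c] + M[b][d] >= M[a][d] + M[b][c] for all a < b, c < d."""
    for a, b in combinations(range(len(M)), 2):
        for c, d in combinations(range(len(M[0])), 2):
            if M[a][c] + M[b][d] < M[a][d] + M[b][c]:
                return False
    return True


DIST = block_dist("acg", "ag")
print(DIST)
print(satisfies_condition_2(DIST))
```

Take x1 < x2 and y1 < y2. The optimal paths (0, x1) → (h, y2) and (0, x2) → (h, y1) must share a node. Swapping their suffixes at that node yields paths (0, x1) → (h, y1) and (0, x2) → (h, y2) with the same total score, which is exactly the inequality being tested.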

Comment on this approach
Advantages
– Easy to parallelize
– Easy to combine
Disadvantages
– Need to compute/keep more information

Applications
– Parallel sequence alignment: O(log m log n) time using O(mn / log m) processors (CREW PRAM)
– Best non-overlapping alignment score: O(n² log² n) time
– Tandem approximate repeat: O(n² log n) time
– Common Substring Alignment

SMAWK

Find all column minima of the following totally monotone array:
[a b]
[c d]
Observation 1: $b < d \Rightarrow a < c$ and $b = d \Rightarrow a \le c$
Observation 2: $a > c \Rightarrow b > d$ and $a = c \Rightarrow b \ge d$
In other words, the row holding a column's topmost minimum never moves up from one column to the next.
SMAWK is a recursive algorithm of 2 steps:
– REDUCE removes rows
– INTERPOLATE removes half of the columns

REDUCE
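REDUCE can be sketched in Python for column minima. The code follows the style of common stack-based SMAWK implementations rather than being taken from the slides. It assumes:
– the 2x2 rule a > c ⇒ b > d holds for every pair of rows i < i' and columns j < j', i.e. M(i, j) > M(i', j) implies M(i, j') > M(i', j');
– ties go to the upper row;
– the matrix is given as a function `M(i, j)`, so entries are evaluated only on demand.

`reduce_rows` is a hypothetical name.

```python
from typing import Callable, List

Matrix = Callable[[int, int], float]


def reduce_rows(rows: List[int], cols: List[int], M: Matrix) -> List[int]:
    """REDUCE: keep at most len(cols) rows that may hold a topmost column minimum."""
    stack: List[int] = []  # stack[k] only competes for columns cols[k:]
    for r in rows:
        while stack:
            col = cols[len(stack) - 1]  # first column that stack[-1] competes for
            if M(stack[-1], col) > M(r, col):
                # r is strictly smaller here, so by the 2x2 rule also in every
                # later column; stack[-1] is not needed anywhere: discard it.
                stack.pop()
            else:
                break
        if len(stack) < len(cols):
            stack.append(r)
        # Otherwise r ties or loses in the last column, so by the 2x2 rule it is
        # never strictly smaller in any column, and ties go to the upper row.
    return stack


print(reduce_rows(list(range(10)), [0, 1, 2], lambda i, j: (i - 3 * j) ** 2))
# prints [0, 3, 6]
```

Each row is pushed once and popped at most once. REDUCE therefore makes O(m) comparisons on an m × n matrix and leaves at most n rows.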

INTERPOLATE
Remove all odd-indexed columns
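With 1-based column numbers, the removed columns (1st, 3rd, 5th, …) are positions 0, 2, 4, … of a 0-based Python list. Once the recursion has returned the topmost minimum of each remaining column, the minimum of a removed column must lie between the minima of its two neighbours. A single left-to-right scan therefore fills them all in.

This is a sketch under the same assumptions as REDUCE: a function `M(i, j)`, total monotonicity, and ties to the upper row. `interpolate` and `known` are hypothetical names.

```python
from typing import Callable, Dict, List

Matrix = Callable[[int, int], float]


def interpolate(rows: List[int], cols: List[int], M: Matrix,
                known: Dict[int, int]) -> Dict[int, int]:
    """Given minima rows for cols[1::2] (rows drawn from `rows`), return minima for all cols."""
    result = dict(known)
    k = 0  # index into rows; it never moves backwards
    for c in range(0, len(cols), 2):
        col = cols[c]
        # The answer lies between the previous neighbour's row and the next one's.
        last = rows[-1] if c == len(cols) - 1 else known[cols[c + 1]]
        best = rows[k]
        while rows[k] != last:
            k += 1
            if M(rows[k], col) < M(best, col):  # strict: ties keep the upper row
                best = rows[k]
        result[col] = best
    return result


def M(i: int, j: int) -> int:
    return (i - 3 * j) ** 2


print(dict(sorted(interpolate(list(range(10)), [0, 1, 2, 3], M, {1: 3, 3: 9}).items())))
# prints {0: 0, 1: 3, 2: 6, 3: 9}
```

The scan pointer k never moves backwards, so filling in the removed columns costs O(m + n) evaluations of M.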

RECURSIVE Find all row minima
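Putting REDUCE, the recursive call on half of the columns, and INTERPOLATE together gives the whole algorithm. The Python sketch below follows the column-minima formulation of the earlier slides, with REDUCE acting on rows. It relies on two standard reductions:
– for a Monge matrix, row minima follow by calling it on the transpose `lambda j, i: M(i, j)`;
– for a matrix satisfying Monge condition 2 (the ≥ form claimed for DIST), column maxima are the column minima of the negated matrix.

As before, the matrix is an on-demand function, the 2x2 rule a > c ⇒ b > d is assumed, and ties go to the upper row. `smawk_column_minima` is a hypothetical name.

```python
from typing import Callable, Dict, List

Matrix = Callable[[int, int], float]


def smawk_column_minima(rows: List[int], cols: List[int], M: Matrix) -> Dict[int, int]:
    """Map each column to the row of its topmost minimum with O(len(rows) + len(cols)) lookups."""
    if not cols:
        return {}
    # REDUCE: keep at most len(cols) candidate rows.
    stack: List[int] = []
    for r in rows:
        while stack and M(stack[-1], cols[len(stack) - 1]) > M(r, cols[len(stack) - 1]):
            stack.pop()
        if len(stack) < len(cols):
            stack.append(r)
    rows = stack
    # RECURSIVE: solve the columns at odd 0-based positions on the reduced rows.
    result = smawk_column_minima(rows, cols[1::2], M)
    # INTERPOLATE: fill in the even positions between their neighbours' answers.
    k = 0
    for c in range(0, len(cols), 2):
        last = rows[-1] if c == len(cols) - 1 else result[cols[c + 1]]
        best = rows[k]
        while rows[k] != last:
            k += 1
            if M(rows[k], cols[c]) < M(best, cols[c]):
                best = rows[k]
        result[cols[c]] = best
    return result


minima = smawk_column_minima(list(range(10)), [0, 1, 2], lambda i, j: (i - 3 * j) ** 2)
print(dict(sorted(minima.items())))  # prints {0: 0, 1: 3, 2: 6}
```

After the first REDUCE there are at most n rows. Each level then costs O(n) and halves the number of columns, so the total over all levels is O(m + n) lookups on an m × n matrix.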

APPROXIMATE TANDEM REPEAT Application of DIST and SMAWK

Tandem repeat
Example: IRQI QLWLR QIWIR LRQL

Social City

Observation
Approximate tandem repeat with mid-point c:
– its alignments start at column c and end at row c
[Figure: the n × n self-alignment matrix (indices 0…n) with column c and row c marked.]
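This observation leads to a brute-force Python sketch for one fixed mid-point c. For each start row i, it fills the grid region between node (i, c) and row c, then keeps the best score reached on row c. The following are assumptions for illustration:
– a repeat S[i:c] + S[c:j] is scored by the global alignment of its two halves;
– the unit scores are hypothetical;
– the name `best_tandem_at` is illustrative.

The slides' method instead obtains these path scores from DIST matrices and SMAWK.

```python
import math


def unit_score(x: str, y: str) -> int:
    """Hypothetical scoring: +1 match, -1 mismatch or gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 1 if x == y else -1


def best_tandem_at(S: str, c: int, delta=unit_score) -> float:
    """Best score of a repeat S[i:c] + S[c:j] (i < c < j) with mid-point c.

    In the self-alignment grid of S against S, such a repeat is a path that
    starts at node (i, c) on column c and ends at node (c, j) on row c.
    """
    n = len(S)
    if not 0 < c < n:
        return -math.inf
    best = -math.inf
    for i in range(c):  # start row on column c
        V = {(i, c): 0}  # V[r, q]: best path score (i, c) -> (r, q)
        for r in range(i, c + 1):
            for q in range(c, n + 1):
                if (r, q) == (i, c):
                    continue
                cand = -math.inf
                if r > i:  # vertical: S[r-1] against a gap
                    cand = max(cand, V[r - 1, q] + delta(S[r - 1], "-"))
                if q > c:  # horizontal: a gap against S[q-1]
                    cand = max(cand, V[r, q - 1] + delta("-", S[q - 1]))
                if r > i and q > c:  # diagonal: S[r-1] against S[q-1]
                    cand = max(cand, V[r - 1, q - 1] + delta(S[r - 1], S[q - 1]))
                V[r, q] = cand
        # End on row c at some column j > c (the second half is non-empty).
        best = max(best, max(V[c, q] for q in range(c + 1, n + 1)))
    return best


print(best_tandem_at("IRQIQLWLRQIWIRLRQL", 9))
```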

4 cases
– Cross column n/2
– Cross row n/2
– Inside sub-triangle [0, n/2]
– Inside sub-triangle [n/2, n]

Algorithm
1. Find all repeats that cross
   – row n/2
   – column n/2
2. Recursively solve the
   – sub-array [0..n/2, 0..n/2]
   – sub-array [n/2..n, n/2..n]
[Figure: example mid-points c1, c2, c3 in the matrix split at n/2.]
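The recursion's control flow can be shown in a deliberately naive Python sketch. Step 1 is done by brute force, trying every start i < n/2 < end j and every mid-point c between them, instead of using DIST and SMAWK. It only illustrates that the four cases together cover every repeat exactly once. Scoring a repeat S[i:c] + S[c:j] by the global alignment of its halves and the unit scores are assumptions; `best_tandem_dc` and `align` are hypothetical names.

```python
import math


def unit_score(x: str, y: str) -> int:
    """Hypothetical scoring: +1 match, -1 mismatch or gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 1 if x == y else -1


def align(A: str, B: str, delta=unit_score) -> int:
    """Global alignment score of A and B, keeping one DP row at a time."""
    prev = [0]
    for j in range(1, len(B) + 1):
        prev.append(prev[-1] + delta("-", B[j - 1]))
    for i in range(1, len(A) + 1):
        cur = [prev[0] + delta(A[i - 1], "-")]
        for j in range(1, len(B) + 1):
            cur.append(max(prev[j - 1] + delta(A[i - 1], B[j - 1]),
                           prev[j] + delta(A[i - 1], "-"),
                           cur[j - 1] + delta("-", B[j - 1])))
        prev = cur
    return prev[-1]


def best_tandem_dc(S: str, delta=unit_score) -> float:
    """Best score over all repeats S[i:c] + S[c:j] with 0 <= i < c < j <= len(S)."""
    def solve(lo: int, hi: int) -> float:
        if hi - lo < 2:  # no room for two non-empty halves
            return -math.inf
        mid = (lo + hi) // 2
        best = -math.inf
        # Step 1: repeats that cross row or column mid, i.e. i < mid < j.
        for i in range(lo, mid):
            for j in range(mid + 1, hi + 1):
                for c in range(i + 1, j):
                    best = max(best, align(S[i:c], S[c:j], delta))
        # Step 2: repeats entirely inside [lo, mid] or entirely inside [mid, hi].
        return max(best, solve(lo, mid), solve(mid, hi))
    return solve(0, len(S))


print(best_tandem_dc("abcabc"))  # prints 3
```

A repeat with j ≤ n/2 lies in the first sub-array and one with i ≥ n/2 in the second. Every other repeat has i < n/2 < j and is handled in step 1, so no repeat is counted twice.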

Cross column n/2
Combine:
– Best path from column c to (k, n/2)
– Best path from (k, n/2) to row c
[Figure: the n × n matrix with column c, row c and column n/2 marked.]

Cross column n/2
Sub-problems:
– $\mathrm{DIST}_{\mathrm{col}}^{(c,n/2)}[i,j]$
– $\mathrm{DIST}_{\mathrm{row}}^{(c,n/2)}[i,j]$
[Figure: mid-points c1, c2 relative to n/2.]

Cross column n/2
– Storing $\mathrm{DIST}_{\mathrm{col}}^{(c,n/2)}[i,j]$ explicitly takes O(n³) words
– Encode it in an array of binary trees using O(n² log n) words: B[j,c] is a binary tree, and B[j,c](i) is a leaf of that tree
– An entry of $\mathrm{DIST}_{\mathrm{col}}^{(c,n/2)}[i,j]$ can then be read in O(log n) time
[Figure: mid-points c1, c2 relative to n/2.]

Algorithm
1. Find all repeats that cross row n/2 or column n/2: O(n² log n)
2. Recursively solve the
   – sub-array [0..n/2, 0..n/2]
   – sub-array [n/2..n, n/2..n]
[Figure: example mid-points c1, c2, c3 in the matrix split at n/2.]
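The running time follows from the recurrence for the two steps. This assumes step 1 costs O(n² log n) as stated and the two sub-arrays have side n/2. At recursion depth k there are 2^k sub-arrays of side n/2^k:
$$T(n) = 2\,T(n/2) + O(n^2 \log n) \;\Longrightarrow\; T(n) \le \sum_{k \ge 0} 2^k \, O\!\left(\frac{n^2}{4^k}\log n\right) = O(n^2 \log n)\sum_{k \ge 0} 2^{-k} = O(n^2 \log n).$$
The top level dominates, which matches the O(n² log n) bound listed for tandem approximate repeats under Applications.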

References
– Aggarwal, A. and Park, J. Notes on Searching in Multidimensional Monotone Arrays. IEEE.
– Schmidt, J. P. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM.
– Larmore, L. L. The SMAWK Algorithm. UNLV.
– Apostolico, A., Atallah, M. J., Larmore, L. L. and McFaddin, S. Efficient Parallel Algorithms for String Editing and Related Problems. SIAM J. Comput.
– Landau, G. M. and Ziv-Ukelson, M. On the Common Substring Alignment Problem. J. of Algorithms.