1
LEAP: A Generalization of the Landau-Vishkin Algorithm with Custom Gap Penalties
Hongyi Xin (1), Jeremie Kim (1), Sunny Nahar (1,4), Can Alkan (3) and Onur Mutlu (1,2)
(1) Carnegie Mellon University, (2) ETH Zürich, (3) Bilkent University, (4) Google
Manuscript on bioRxiv. We also have a poster.
2
Approximate String Matching Problem (ASM)
A fundamental problem in bioinformatics
  DNA and protein sequence mapping and comparison
Computes a similarity score between two strings
  Differences include mismatches as well as indels
Common scoring schemes
  Edit distance (mismatches and indels have the same penalty score)
  Affine gapping (gap opening is penalized more than gap extension)
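The two scoring schemes can be made concrete with a small sketch. The function names and the default penalty values are illustrative assumptions, not taken from the slides, and the affine convention used here (opening penalty plus one extension penalty per gapped position) is only one of several in common use.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook row-by-row DP: mismatches and indels all cost 1."""
    prev = list(range(len(b) + 1))            # cost of aligning "" with b[:j]
    for i, ca in enumerate(a, 1):
        cur = [i]                             # cost of aligning a[:i] with ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                  # skip a character of a (indel)
                cur[j - 1] + 1,               # skip a character of b (indel)
                prev[j - 1] + (ca != cb),     # match (0) or mismatch (1)
            ))
        prev = cur
    return prev[-1]


def affine_gap_cost(length: int, gap_open: int = 4, gap_extend: int = 1) -> int:
    """Cost of one gap of `length` positions under an assumed affine scheme."""
    if length == 0:
        return 0
    return gap_open + gap_extend * length
```

Under edit distance a gap of three positions costs 3, while under the assumed affine values it costs 4 + 3 = 7, so one long gap is cheaper than three separate gaps of length one (3 × 5 = 15).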
3
The Landau-Vishkin (1989) Algorithm
Dynamic-programming algorithm for edit distance
Diagonally oriented
  Find the furthest nodes reachable at edit distance e
  Use the nodes at e to compute the furthest nodes at edit distance e+1
Procedure (3 steps)
  Pick the furthest starting position at e (initiation)
  Traverse diagonally until hitting a mismatch (elongation)
  Notify neighbor diagonals of new potential furthest starting positions at e+1 (termination)
[Figure: DP table over two example DNA strings, cells labeled with edit distances 1 and 2; callout: "Furthest node reachable at e"]
Speaker notes: LV uses this secondary observation and proposes a different DP method. In a nutshell, LV computes the deepest …. and uses that to compute …. Here is an example of the procedure…
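The three steps (initiation, elongation, termination) can be sketched for plain edit distance as follows. This is a minimal illustration written for this transcript, not the authors' code; the function name, the dictionary layout and the diagonal convention k = j - i are assumptions.

```python
def lv_edit_distance(a: str, b: str) -> int:
    """Landau-Vishkin style global edit distance between a and b."""
    n, m = len(a), len(b)

    def elongate(i: int, k: int) -> int:
        # Elongation: slide along diagonal k while characters match.
        j = i + k
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return i

    target = m - n                       # diagonal that contains cell (n, m)
    furthest = {0: elongate(0, 0)}       # furthest[k] = furthest row on diagonal k
    e = 0
    while furthest.get(target, -1) < n:
        e += 1
        nxt = {}
        for k in range(max(-e, -n), min(e, m) + 1):
            # Initiation: best starting row offered by diagonal k and its neighbors.
            start = max(
                furthest.get(k, -2) + 1,      # mismatch on the same diagonal
                furthest.get(k + 1, -2) + 1,  # skip one character of a
                furthest.get(k - 1, -1),      # skip one character of b
            )
            if start < 0:
                continue
            start = min(start, n, m - k)      # stay inside the DP table
            nxt[k] = elongate(start, k)
        furthest = nxt                        # termination: hand over to e + 1
    return e
```

For example, `lv_edit_distance("kitten", "sitting")` needs only three rounds, and each round stores one number per diagonal instead of a full row of the DP table.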
4
Benefit of the Landau-Vishkin Algorithm
Compared against Ukkonen's banded algorithm
Less bookkeeping
  Only stores starting positions and elongation lengths
Computes fewer nodes
  Only the start and end positions in each diagonal
Less compute per node
  Elongation only checks for the next mismatch
  Termination updates the furthest node for the next edit iteration
  Only one termination per edit per diagonal
Less work! (although still O(k*m))
[Figure: DP table for an example string pair, showing only the cells that LV computes]
Speaker notes: Compared to the canonical DP method, LV has two advantages. First, it considers fewer elements in the matrix, as we can see in our example. Second, it only checks scores against neighbor diagonals during cost transitions; these elements are highlighted with underlines. The remaining elements are only checked for mismatches.
5
Limitations
Landau-Vishkin was proposed for edit distance
Correctness for custom gap penalties has not been proven
6
Our Contribution
Prove that Landau-Vishkin also works for custom gap penalties
  Including affine gapping
7
Problem Setup
Convert the process of computing the DP table into traversing a graph
  Nodes in the table → vertices
  Horizontal, vertical and diagonal transitions → edges
  Mismatches, indels → weights on the edges
Goal: find a path from start to destination with the minimum total edge weight
[Figure: DP table for two example DNA strings drawn as a graph, with cost labels 1 to 5]
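The graph view can be checked directly with a generic shortest-path search. The sketch below runs Dijkstra's algorithm over the implicit DP graph with separate mismatch and indel weights. It covers only linear (non-affine) penalties, and the function and parameter names are assumptions made for illustration.

```python
import heapq


def min_cost_path(a: str, b: str, mismatch: int = 1, indel: int = 1):
    """Cheapest path from cell (0, 0) to cell (len(a), len(b))."""
    n, m = len(a), len(b)
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]                       # (cost so far, row i, column j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == (n, m):
            return d
        if d > dist[(i, j)]:
            continue                         # stale heap entry
        edges = []
        if i < n and j < m:                  # diagonal edge: match or mismatch
            edges.append((i + 1, j + 1, 0 if a[i] == b[j] else mismatch))
        if i < n:                            # vertical edge: indel
            edges.append((i + 1, j, indel))
        if j < m:                            # horizontal edge: indel
            edges.append((i, j + 1, indel))
        for ni, nj, w in edges:
            nd = d + w
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    return None                              # not reached: (n, m) is always reachable
```

With unit weights this reproduces edit distance. With `mismatch=5, indel=2`, aligning "A" with "C" costs 4, because two indels are cheaper than one mismatch.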
8
Problem Statement of LEAP
A toad swims in a swimming pool with hurdles in it
[Figure labels: leap, stride, swim]
9
Three Theorems
1. Delaying a leap in the path until the next hurdle does not add cost to the path
2. There exists an optimal path in which the toad only leaps right before a hurdle
3. The Landau-Vishkin algorithm finds such an optimal path
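One way to see what the third theorem buys is to run the furthest-reaching recurrence with unequal penalties. The sketch below is an illustration under stated assumptions, not the authors' LEAP implementation: it handles only linear penalties (no affine gap opening), requires both penalties to be positive integers, and uses hypothetical names throughout.

```python
def furthest_reach_cost(a: str, b: str, mismatch: int = 1, indel: int = 1) -> int:
    """Minimum alignment cost, computed one diagonal wave per cost value."""
    n, m = len(a), len(b)

    def elongate(i: int, k: int) -> int:
        # Free moves: follow matches along diagonal k (k = j - i).
        j = i + k
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return i

    target = m - n
    waves = [{0: elongate(0, 0)}]        # waves[s][k] = furthest row at cost s
    s = 0
    while waves[s].get(target, -1) < n:
        s += 1
        sub = waves[s - mismatch] if s >= mismatch else {}
        gap = waves[s - indel] if s >= indel else {}
        wave = {}
        for k in set(sub) | {d + 1 for d in gap} | {d - 1 for d in gap}:
            if not -n <= k <= m:
                continue                  # diagonal lies outside the table
            start = max(
                sub.get(k, -2) + 1,       # pay `mismatch`, stay on diagonal k
                gap.get(k + 1, -2) + 1,   # pay `indel`, arrive from diagonal k + 1
                gap.get(k - 1, -1),       # pay `indel`, arrive from diagonal k - 1
            )
            start = min(start, n, m - k)  # stay inside the table
            wave[k] = elongate(start, k)
        waves.append(wave)
    return s
```

With `mismatch=1, indel=1` this reduces to the edit-distance case. Some cost values may have no reachable diagonal at all (for example cost 1 when both penalties exceed 1), which is why empty waves are stored rather than skipped.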
10
Pseudo-Proof 1
The blue path has the same cost as the red path
No change in leaps, no additional strides
11
Pseudo-Proof 2
There must exist an optimal path in which every leap is right before a hurdle
Apply Theorem 1 iteratively
12
Pseudo-Proof 3
Can be proved by induction
Energy cost increases monotonically along the path
Check out our paper/poster for further details!
13
Further Optimizations
Elongation finds the next mismatch
Elongation can be sped up using the de Bruijn sequence technique
  Think of it as a hashing function
  No need to iteratively check for letter matches. Use bit-parallel operations instead!
[Figure: bit-parallel pipeline on two example DNA strings: XOR, shift, 2's complement, AND, multiply by the de Bruijn sequence, lookup]
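A minimal sketch of the bit-parallel idea, assuming a 2-bit encoding per base and 32-bit words (16 bases per word). The encoding, the word size and all names are assumptions made here. The constant `0x077CB531` is a widely used 32-bit de Bruijn multiplier, and the lookup table is generated from it rather than typed in by hand.

```python
DEBRUIJN32 = 0x077CB531
MASK32 = 0xFFFFFFFF

# LOOKUP maps the top 5 bits of (isolated_bit * DEBRUIJN32) to the bit's index.
LOOKUP = [0] * 32
for bit in range(32):
    LOOKUP[((1 << bit) * DEBRUIJN32 & MASK32) >> 27] = bit

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit encoding


def pack(seq: str) -> int:
    """Pack up to 16 bases into one word, first base in the lowest bits."""
    word = 0
    for pos, base in enumerate(seq[:16]):
        word |= CODE[base] << (2 * pos)
    return word


def matching_prefix(x: int, y: int, length: int = 16) -> int:
    """Number of leading bases on which two packed words agree."""
    diff = (x ^ y) & MASK32               # XOR: nonzero bits mark differences
    if diff == 0:
        return length                     # no mismatch in this word
    lowest = diff & -diff                 # 2's complement + AND: lowest set bit
    bit = LOOKUP[(lowest * DEBRUIJN32 & MASK32) >> 27]
    return min(bit // 2, length)          # two bits per base
```

Here `matching_prefix(pack("ACTGACTG"), pack("ACTGACTA"), 8)` finds the mismatch at the eighth base without a character-by-character loop. Both inputs are assumed to hold the same number of bases, passed as `length`.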
14
Results
Two implementations
  LEAP (without the de Bruijn sequence optimization)
  LEAP-BV (with the optimization)
Compared to 3 state-of-the-art implementations
  Myers' bit vector
    SeqAn
  NW-SIMD
  Canonical LV (equivalent to LEAP for Levenshtein distance)
Takeaway: LEAP-BV attains as much as 7.4x speedup over Myers' bit vector and 32x speedup over NW-SIMD
15
Conclusion
We prove that the Landau-Vishkin algorithm can be extended to support custom gap penalties
We further optimize the Landau-Vishkin method
We achieve up to 7.4x speedup over the state-of-the-art implementation
16
Acknowledgement
Special thanks to:
17
Q&A