Published by Melanie Small. Modified over 8 years ago.
1
Input-Sensitive Algorithms for Multiple Sequence Alignment
Pankaj Agarwal @Duke, Yonatan Bilu @Hebrew University, Rachel Kolodny @Stanford
2
Multiple Sequence Alignment
Quantifies similarities among DNA or protein sequences
Detects highly conserved motifs and remote homologues, which gives:
–Evolutionary insights
–Transfer of annotation
–Representation of protein families
3
Multiple Sequence Alignment
Input: k sequences
Output: an optimal alignment
–Gap-infused sequences (gaps shown as "-"), one per row
–Restrictions on the column patterns
Example input:
(1) GARFIELD MET NERMAL
(2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
(3) GARFIELD AND HIS ASSOCIATE NERMAL
Alignment:
----GARFIELD MET----------------- NERMAL ------------------------------
ODIE------------AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
----GARFIELD ---AND HIS ASSOCIATE NERMAL ------------------------------
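A minimal Python sketch of what makes a set of rows a valid alignment: all rows have the same width, deleting the gaps from row i gives back sequence i, and (our assumed reading of "restrictions on the column patterns") no column consists only of gaps. The function name `is_alignment` is hypothetical, and the example uses the space-free spelling of the sequences.

```python
from typing import List

GAP = "-"

def is_alignment(seqs: List[str], rows: List[str]) -> bool:
    """Check that `rows` is a gap-infused alignment of `seqs` (sketch)."""
    # One row per sequence, and all rows of equal width.
    if len(rows) != len(seqs) or len({len(r) for r in rows}) > 1:
        return False
    # Removing the gaps from each row must give back its input sequence.
    if any(r.replace(GAP, "") != s for r, s in zip(rows, seqs)):
        return False
    # Assumed column restriction: no column made only of gaps.
    return all(any(c != GAP for c in col) for col in zip(*rows))
```

For instance, `is_alignment(["AB", "AB"], ["A-B", "A-B"])` is rejected because its middle column is all gaps.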
4
Multiple Sequence Alignment
Input: k sequences
Output: an optimal alignment
–Minimal width
–Score function: a summation over columns, e.g., sum of pairs (SP)
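A sketch of a column-summed score, using sum of pairs: each column contributes the sum of a pairwise score over all pairs of its rows. The unit scores in `pair_score` (match +1, mismatch -1, letter against gap -1, gap against gap 0) are an assumption for illustration only; the slides do not fix a scoring matrix.

```python
from itertools import combinations
from typing import List

GAP = "-"

def pair_score(x: str, y: str) -> int:
    # Hypothetical unit scores, chosen only for illustration.
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return -1
    return 1 if x == y else -1

def sp_score(rows: List[str]) -> int:
    """Sum of pairs: add pair_score over every column and every pair of rows."""
    return sum(pair_score(col[i], col[j])
               for col in zip(*rows)
               for i, j in combinations(range(len(rows)), 2))
```

Under these scores, the two-row GARFIELD alignment earns +8 for GARFIELD, -3 for MET against gaps, -15 for ANDHISASSOCIATE against gaps, and +6 for NERMAL, a total of -4.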
5
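DP Solves MSA
–Build a score matrix: a k-dimensional hypercube
–An alignment is a path through it
–Time: $O(2^k n^k)$ for k sequences of length n, i.e., (number of nodes, $O(n^k)$) × (neighbors per node, $2^k - 1$)
(Figure: the 2-D score matrix of GARFIELDMETNERMAL against GARFIELDANDHISASSOCIATENERMAL, with the path of this alignment:)
GARFIELDMET---------------NERMAL
GARFIELD---ANDHISASSOCIATENERMAL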
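The hypercube DP above can be sketched in Python: every grid point is a node, and each of the $2^k - 1$ nonzero 0/1 vectors is a move that consumes one character from the sequences marked 1 and puts a gap in the others. The SP scoring with unit scores and the name `msa_dp` are assumptions for illustration; this is the textbook exhaustive DP, not the input-sensitive algorithm of the talk.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

GAP = "-"

def pair_score(x: str, y: str) -> int:
    # Hypothetical unit scores (illustration only): match +1,
    # mismatch -1, letter vs gap -1, gap vs gap 0.
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return -1
    return 1 if x == y else -1

def msa_dp(seqs: List[str]) -> Tuple[int, List[str]]:
    """Exact SP-optimal MSA by DP over the k-dimensional grid (hypercube)."""
    k = len(seqs)
    end = tuple(len(s) for s in seqs)
    # The 2^k - 1 nonzero 0/1 vectors: which sequences advance in a column.
    dirs = [d for d in product((0, 1), repeat=k) if any(d)]
    best: Dict[Tuple[int, ...], int] = {}
    back: Dict[Tuple[int, ...], Tuple[int, ...]] = {}
    # product() visits nodes in lexicographic order, so every predecessor
    # node - d is finished before node itself.
    for node in product(*(range(n + 1) for n in end)):
        if not any(node):
            best[node] = 0
            continue
        for d in dirs:
            prev = tuple(p - x for p, x in zip(node, d))
            if min(prev) < 0:
                continue
            col = [seqs[i][node[i] - 1] if d[i] else GAP for i in range(k)]
            val = best[prev] + sum(pair_score(a, b) for a, b in combinations(col, 2))
            if node not in best or val > best[node]:
                best[node], back[node] = val, d
    # Walk the back-pointers from the far corner to rebuild the rows.
    rows: List[List[str]] = [[] for _ in range(k)]
    node = end
    while any(node):
        d = back[node]
        for i in range(k):
            rows[i].append(seqs[i][node[i] - 1] if d[i] else GAP)
        node = tuple(p - x for p, x in zip(node, d))
    return best[end], ["".join(reversed(r)) for r in rows]
```

The inner loop over `dirs` is exactly the "neighbors per node" factor, which is why the running time grows as $2^k$ times the number of grid points.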
6
Previous Work
MSA heuristics:
–[Carrillo, Lipman 88]
–MACAW [Schuler, Altschul, Lipman 91]
–ClustalW [Thompson et al. 94]
–DIAlign [Werner, Morgenstern, Dress 96]
–T-Coffee [Notredame et al. 00]
–POA [Lee et al. 02]
–…
MSA complexity analysis:
–Optimizing over the space of all possible inputs is NP-hard [Jiang, Wang 94]
–NP-hard for SP [Just 01]
–NP-hard for SP that is a metric [Bonizzoni, Della Vedova 01]
Faster pairwise sequence alignment:
–Assuming many common subsequences [Wilbur, Lipman 83]
–Convex/concave score functions [Eppstein et al. 92]
–Exploiting compressibility of sequences [Landau, Crochemore, Ziv-Ukelson 02]
–…
Review: Biological Sequence Analysis [Durbin et al.]
7
Pairwise Restriction
The "true" information: the aligned subsequences and their relative positions
Study the pairwise alignments first, then restrict the multiple alignment accordingly
–Time:
–Focus effort on the "true" tradeoffs
(Figure: GARFIELDMETNERMAL aligned with GARFIELDANDHISASSOCIATENERMAL.)
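One way to make "restrict the alignment" concrete, under our own assumption: every multiple alignment induces a pairwise alignment on each pair of rows (keep those two rows, drop the columns where both are gaps), and a restricted MSA must induce exactly the pairwise alignments fixed in advance. The names `project` and `respects_restriction` are hypothetical; the talk's restriction is expressed through the segment graph on the next slide.

```python
from typing import Dict, List, Tuple

GAP = "-"

def project(rows: List[str], i: int, j: int) -> Tuple[str, str]:
    """Pairwise alignment induced on rows i and j: drop columns where both are gaps."""
    cols = [(x, y) for x, y in zip(rows[i], rows[j]) if not (x == GAP and y == GAP)]
    return "".join(x for x, _ in cols), "".join(y for _, y in cols)

def respects_restriction(rows: List[str],
                         fixed: Dict[Tuple[int, int], Tuple[str, str]]) -> bool:
    """True if, for every pair (i, j) in `fixed`, the MSA induces exactly that alignment."""
    return all(project(rows, i, j) == pa for (i, j), pa in fixed.items())
```

For example, the rows `["A-C", "A-C", "ABC"]` induce `("AC", "AC")` on rows 0 and 1, because their middle column is gap against gap.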
8
Segments Matching Graph (SMG)
Sequences are partitioned into segments
(Figure: the example sequences cut into segments such as GARFIELD, MET, NERMAL, ODIE, and ANDHISASSOCIATE.)
–Nodes: segments
–Edges: self edges, and edges between two equal-length segments of different sequences; edges have scores
–The SMG defines the allowed paths and their scores
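A rough sketch of building an SMG, under assumptions that are ours and not the slides': the pairwise information arrives as matched blocks `(seq a, start in a, seq b, start in b, length)`, block endpoints cut their sequences, and interior cuts are mirrored across each block until nothing changes, so that every edge joins two equal-length segments. The edge score (number of identical characters) and the name `build_smg` are hypothetical, and the self edges mentioned above are not modeled.

```python
from typing import Dict, List, Set, Tuple

Block = Tuple[int, int, int, int, int]  # (seq a, start in a, seq b, start in b, length)
Node = Tuple[int, int, int]             # (sequence index, start, end) of a segment

def build_smg(seqs: List[str], blocks: List[Block]) -> Tuple[List[Node], Dict[Tuple[Node, Node], int]]:
    # 1. Every block endpoint cuts its sequence.
    cuts: List[Set[int]] = [{0, len(s)} for s in seqs]
    for a, sa, b, sb, n in blocks:
        cuts[a].update((sa, sa + n))
        cuts[b].update((sb, sb + n))
    # 2. Mirror interior cuts across each block until stable, so matched
    #    pieces on both sides have equal lengths.
    changed = True
    while changed:
        changed = False
        for a, sa, b, sb, n in blocks:
            for off in range(1, n):
                if (sa + off in cuts[a]) != (sb + off in cuts[b]):
                    cuts[a].add(sa + off)
                    cuts[b].add(sb + off)
                    changed = True
    # 3. Nodes: the segments between consecutive cut points.
    nodes: List[Node] = []
    for i, c in enumerate(cuts):
        pts = sorted(c)
        nodes.extend((i, s, e) for s, e in zip(pts, pts[1:]))
    # 4. Edges: one per piece of each block, joining equal-length segments.
    edges: Dict[Tuple[Node, Node], int] = {}
    for a, sa, b, sb, n in blocks:
        offs = sorted(o - sa for o in cuts[a] if sa <= o <= sa + n)
        for o1, o2 in zip(offs, offs[1:]):
            u, v = (a, sa + o1, sa + o2), (b, sb + o1, sb + o2)
            left, right = seqs[a][sa + o1:sa + o2], seqs[b][sb + o1:sb + o2]
            # Hypothetical score: number of identical characters.
            edges[(u, v)] = sum(x == y for x, y in zip(left, right))
    return nodes, edges
```

With the blocks GARFIELD and NERMAL shared by GARFIELDMETNERMAL and GARFIELDANDHISASSOCIATENERMAL, this yields the segments GARFIELD, MET, NERMAL and GARFIELD, ANDHISASSOCIATE, NERMAL, with edges only between the matching pairs.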
9
(Figure: the pairwise grid of GARFIELDANDHISASSOCIATENERMAL against ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE, with segments such as GARFIELD and ANDHISASSOCIATE marked.)
11
Extreme paths:
(Figure: the same pairwise grid of GARFIELDANDHISASSOCIATENERMAL against ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE.)
13
(Figure: the sets of all paths, extreme paths, and optimal paths.)
Lemma: there is an optimal path that is extreme.
14
(Figure: the pairwise grid of GARFIELDANDHISASSOCIATENERMAL against ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE.)
Improved algorithm: DP on the segments
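A hedged sketch of "DP on the segments": the grid now has one coordinate per segment boundary instead of per character, and a move advances a subset of sequences by one whole segment each. We assume (our reading of the lemma, not a statement from the slides) that it suffices to consider paths that move segment by segment; that sequences may advance together only if their segments are pairwise joined by SMG edges, which join equal-length segments; and that scores are SP with a unit gap penalty. `segment_dp` and the `(sequence, segment)` key encoding are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

Seg = Tuple[int, int]  # hypothetical encoding: (sequence index, segment index)

def segment_dp(seg_lens: List[List[int]],
               edges: Dict[Tuple[Seg, Seg], int],
               gap: int = -1) -> float:
    """SP score of the best segment-by-segment alignment (sketch).

    seg_lens[i][p] is the length of segment p of sequence i; edges maps
    ((i, p), (j, q)) with i < j to the SMG score of that segment pair.
    """
    k = len(seg_lens)
    end = tuple(len(s) for s in seg_lens)
    dirs = [d for d in product((0, 1), repeat=k) if any(d)]
    best: Dict[Tuple[int, ...], float] = {}
    for node in product(*(range(n + 1) for n in end)):
        if not any(node):
            best[node] = 0
            continue
        val = float("-inf")
        for d in dirs:
            prev = tuple(p - x for p, x in zip(node, d))
            if min(prev) < 0:
                continue
            chosen = [(i, prev[i]) for i in range(k) if d[i]]
            pairs = list(combinations(chosen, 2))
            # Sequences may advance together only along SMG edges.
            if any(pr not in edges for pr in pairs):
                continue
            length = seg_lens[chosen[0][0]][chosen[0][1]]
            # Chosen pairs score via their edges; each chosen residue
            # faces a gap in each of the k - len(chosen) other rows.
            score = sum(edges[pr] for pr in pairs)
            score += gap * length * len(chosen) * (k - len(chosen))
            val = max(val, best[prev] + score)
        best[node] = val
    return best[end]
```

On the GARFIELD pair, with segment lengths `[8, 3, 6]` and `[8, 15, 6]` and edges scoring 8 and 6, the DP touches a 4 × 4 grid instead of an 18 × 30 one and returns the same SP score, -4, as the character-level alignment.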
15
Transitive PR-MSA
More restrictions:
–Transitivity
–The scoring function is the shortest path*
Faster algorithms
–DNA sequences
*No scores in the SMG, only matches
16
Maximal Directions
Transitivity implies that, at any point of the hypercube, the directions are partitioned into cliques
–This defines the maximal directions
The shortest path can be taken over maximal directions only
–Pushes down the work per node
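A small sketch of how maximal directions could be computed at one point of the hypercube. We assume the input is the list of sequence pairs that match (can advance together) at that point; under transitivity this relation is an equivalence, so its connected components, found here with union-find, are cliques, and each clique gives one maximal direction that advances all its sequences at once. The name `maximal_directions` is hypothetical. Since the cliques partition the k sequences, this sketch yields at most k moves per node, compared with the $2^k - 1$ of the plain DP.

```python
from typing import Dict, Iterable, List, Tuple

def maximal_directions(k: int, matched: Iterable[Tuple[int, int]]) -> List[Tuple[int, ...]]:
    """One 0/1 direction per clique of mutually matching sequences (sketch)."""
    parent = list(range(k))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union every matching pair; with transitivity, components are cliques.
    for i, j in matched:
        parent[find(i)] = find(j)
    classes: Dict[int, List[int]] = {}
    for i in range(k):
        classes.setdefault(find(i), []).append(i)
    # Each class advances together: a 1 for each member, 0 elsewhere.
    dirs = []
    for members in classes.values():
        d = [0] * k
        for i in members:
            d[i] = 1
        dirs.append(tuple(d))
    return sorted(dirs, reverse=True)
```

Without transitivity, a connected component need not be a clique, so this shortcut would no longer be valid.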
17
Obvious Directions
(Figure: the segmented example sequences, drawn twice.)
Obvious:
Non-obvious: ?
18
Obvious Directions
Lemma: an optimal path is found even when the obvious decisions are made
–Not all nodes are relevant
–Work for every node increases to
19
Special Vertices
(Figure: starting from (0,0), a straight junction and a corner junction.)
20
Thank you
21
Special Vertices
A vertex is special w.r.t. vertex
–dominates
–There is a maximal-edges path between the vertices
–No other vertex satisfies all the above and dominates
22
Other pieces of information
–Somewhere, a slide with the circle showing which paths we are looking at
Remember to add:
–Partial order in the proof of Lemma 1
Remember to think about:
–Diagonals that are not diagonals
–Overlapping streaks in the first bit
–Non-diagonal diagonals in transitive MSAs