Input Sensitive Algorithms for Multiple Sequence Alignment

1 Input Sensitive Algorithms for Multiple Sequence Alignment
Pankaj Agarwal @Duke, Yonatan Bilu @Hebrew University, Rachel Kolodny @Stanford

2 Multiple Sequence Alignment
Quantifies similarities among sequences (DNA, protein).
Detects highly conserved motifs and remote homologues, which supports:
– Evolutionary insights
– Transfer of annotation
– Representation of protein families

3 Multiple Sequence Alignment
Input: k sequences
Output: an optimal alignment
– Gap-infused sequences (gaps shown as -), one per row
– Restrictions on the column pattern
Example input:
(1) GARFIELD MET NERMAL
(2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
(3) GARFIELD AND HIS ASSOCIATE NERMAL
Example alignment:
----GARFIELD MET----------------- NERMAL ------------------------------
ODIE------------AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
----GARFIELD ---AND HIS ASSOCIATE NERMAL ------------------------------
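A minimal Python sketch of what "gap-infused sequences, one per row" implies, checked on the two-row alignment from slide 5. The slides do not spell out the column restrictions, so the rules encoded here (equal row widths, each row reducing to its input sequence once gaps are deleted, no all-gap columns) are assumptions, and `is_valid_alignment` is a hypothetical helper name.

```python
GAP = "-"

def is_valid_alignment(sequences: list[str], rows: list[str]) -> bool:
    """Check that `rows` is a gapped alignment of `sequences` (assumed rules).

    * one row per input sequence, in the same order;
    * every row has the same width;
    * deleting the gap character from row i gives back sequence i;
    * no column consists of gaps only.
    """
    if not rows or len(rows) != len(sequences):
        return False
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        return False
    if any(row.replace(GAP, "") != seq for row, seq in zip(rows, sequences)):
        return False
    # Reject all-gap columns: they add width without aligning anything.
    return all(any(row[c] != GAP for row in rows) for c in range(width))
```

For example, `is_valid_alignment(["GARFIELDMETNERMAL", "GARFIELDANDHISASSOCIATENERMAL"], ["GARFIELDMET" + "-" * 15 + "NERMAL", "GARFIELD---ANDHISASSOCIATENERMAL"])` returns `True`.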

4 Multiple Sequence Alignment
Input: k sequences
Output: an optimal alignment
– Minimal width
– Score function: a summation over columns, e.g. sum of pairs (SP)
Example: the three sequences and the alignment from slide 3.
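The sum-of-pairs (SP) objective, a score summed over columns, can be sketched in Python as below. The values of `MATCH`, `MISMATCH` and `GAP_PENALTY` are hypothetical illustration choices, and scoring gap against gap as 0 is a common SP convention rather than something the slides state. In this sketch higher is better.

```python
from itertools import combinations

GAP = "-"

# Hypothetical scoring values, chosen only for illustration.
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -1

def pair_score(a: str, b: str) -> int:
    """Score one pair of characters that share a column."""
    if a == GAP and b == GAP:
        return 0  # common SP convention: gap against gap costs nothing
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return MATCH if a == b else MISMATCH

def column_score(column: tuple[str, ...]) -> int:
    """Sum of pairs: add pair_score over every unordered pair of rows."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def sp_score(rows: list[str]) -> int:
    """Score of an alignment = sum of its column scores."""
    return sum(column_score(col) for col in zip(*rows))
```

On the slide 5 alignment, `sp_score(["GARFIELDMET" + "-" * 15 + "NERMAL", "GARFIELD---ANDHISASSOCIATENERMAL"])` gives -4: 14 matching columns (GARFIELD and NERMAL) minus 18 gap columns.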

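5 DP solves MSA
– Build a score matrix: a k-dimensional hypercube
– An alignment is a path through it
– Time: $O(n^k \cdot 2^k)$ for k sequences of length n, i.e. $n^k$ nodes times $O(2^k)$ neighbors per node
[Figure: 2D score matrix for GARFIELDANDHISASSOCIATENERMAL vs. GARFIELDMETNERMAL, with the alignment below drawn as a path]
GARFIELDMET---------------NERMAL
GARFIELD---ANDHISASSOCIATENERMAL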
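A small Python sketch of the hypercube DP for the SP objective, using hypothetical match, mismatch and gap scores. Each node of the k-dimensional grid is a tuple of prefix lengths, and every node looks back along all 2^k - 1 nonzero 0/1 directions, which is where the $n^k \cdot 2^k$ cost comes from. It returns only the optimal score (no traceback) and is practical only for a few short sequences.

```python
from itertools import combinations, product

GAP = "-"
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -1  # hypothetical scores

def column_score(column):
    """Sum-of-pairs score of one column; gap against gap scores 0."""
    total = 0
    for a, b in combinations(column, 2):
        if a == GAP and b == GAP:
            continue
        if a == GAP or b == GAP:
            total += GAP_PENALTY
        else:
            total += MATCH if a == b else MISMATCH
    return total

def msa_dp(seqs: list[str]) -> int:
    """Optimal sum-of-pairs score by DP over the k-dimensional grid.

    Node (i_1, ..., i_k) means prefixes of lengths i_1..i_k are aligned.
    A step along a 0/1 direction d emits one column: sequences with d=1
    contribute their next character, the others contribute a gap.
    """
    k = len(seqs)
    origin = (0,) * k
    directions = [d for d in product((0, 1), repeat=k) if any(d)]
    best = {origin: 0}
    # product() enumerates nodes in lexicographic order, so every
    # predecessor (node minus a direction) is filled before the node.
    for node in product(*(range(len(s) + 1) for s in seqs)):
        if node == origin:
            continue
        candidates = []
        for d in directions:
            prev = tuple(i - step for i, step in zip(node, d))
            if min(prev) < 0:
                continue  # this direction would step outside the grid
            column = tuple(s[p] if step else GAP
                           for s, p, step in zip(seqs, prev, d))
            candidates.append(best[prev] + column_score(column))
        best[node] = max(candidates)
    return best[tuple(len(s) for s in seqs)]
```

For instance, `msa_dp(["AB", "AB", "AB"])` returns 6: two columns, each with three matching pairs.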

6 Previous Work
MSA heuristics: [Carrillo, Lipman 88]; MACAW [Schuler, Altschul, Lipman 91]; ClustalW [Thompson et al. 94]; DIAlign [Werner, Morgenstern, Dress 96]; T-Coffee [Notredame et al. 00]; POA [Lee et al. 02]; …
MSA complexity analysis: optimizing over the space of all possible inputs is NP-hard [Jiang, Wang 94]; NP-hard for SP [Just 01]; NP-hard for SP that is a metric [Bonizzoni, Della Vedova 01]
Faster pairwise SA: assuming many common subsequences [Wilbur, Lipman 83]; convex/concave score functions [Eppstein et al. 92]; exploiting compressibility of sequences [Landau, Crochemore, Ziv-Ukelson 02]; …
Review: Biological Sequence Analysis [Durbin et al.]

7 Pairwise Restriction
The “true” information: the aligned subsequences and their relative positioning.
Study the pairwise alignments first, then use them to restrict the multiple alignment.
– Time: focus the effort on the “true” tradeoffs
[Figure: pairwise alignment of GARFIELDMETNERMAL and GARFIELDANDHISASSOCIATENERMAL]
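One way to read off "the aligned subsequences and their relative positioning" from a pairwise alignment is to collect the maximal runs of columns in which both rows carry a residue. The helper `aligned_segments` below is a hypothetical sketch of that step, not the authors' procedure; it reports each segment as (start in sequence 1, start in sequence 2, length), with 0-based positions in the ungapped sequences.

```python
GAP = "-"

def aligned_segments(row1: str, row2: str) -> list[tuple[int, int, int]]:
    """Maximal runs of columns where both rows have a residue.

    Matches and substitutions both count as aligned; a gap in either
    row ends the current run.
    """
    segments = []
    pos1 = pos2 = 0      # next ungapped position in each sequence
    run_start = None     # (pos1, pos2) where the current run began
    run_len = 0
    for a, b in zip(row1, row2):
        if a != GAP and b != GAP:
            if run_start is None:
                run_start = (pos1, pos2)
            run_len += 1
        elif run_start is not None:
            segments.append((run_start[0], run_start[1], run_len))
            run_start, run_len = None, 0
        if a != GAP:
            pos1 += 1
        if b != GAP:
            pos2 += 1
    if run_start is not None:
        segments.append((run_start[0], run_start[1], run_len))
    return segments
```

On the slide 5 alignment (`"GARFIELDMET" + "-" * 15 + "NERMAL"` against `"GARFIELD---ANDHISASSOCIATENERMAL"`) it yields GARFIELD at (0, 0, 8) and NERMAL at (11, 23, 6).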

8 Segments Matching Graph (SMG)
Nodes: the sequences are partitioned into segments, and each segment is a node.
[Figure: the example sequences cut into segments such as GARFIELD, ANDHISASSOCIATE, NERMAL, MET and ODIE]
Edges: self edges, and edges between two equal-length segments of different sequences; edges have scores.
The SMG defines the allowed paths and their scores.
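A minimal Python data structure for the SMG, assuming nodes are (sequence index, segment index) pairs and each edge stores one numeric score. The class and method names are hypothetical, self edges are not modeled, and the scores in the usage lines (number of matching characters) are illustrative only. The two checks in `add_edge` enforce the slide's edge rule: different sequences, equal segment lengths.

```python
from dataclasses import dataclass, field

Node = tuple[int, int]  # (sequence index, segment index)

@dataclass
class SegmentsMatchingGraph:
    """Hypothetical SMG sketch: segments[s][j] is the j-th segment of sequence s."""
    segments: list[list[str]]
    edges: dict[frozenset, float] = field(default_factory=dict)

    def add_edge(self, u: Node, v: Node, score: float) -> None:
        (su, ju), (sv, jv) = u, v
        if su == sv:
            raise ValueError("edges must join segments of different sequences")
        if len(self.segments[su][ju]) != len(self.segments[sv][jv]):
            raise ValueError("edges must join segments of equal length")
        self.edges[frozenset((u, v))] = score

    def neighbors(self, u: Node) -> list[tuple[Node, float]]:
        """All (neighbor, score) pairs for node u, in insertion order."""
        return [(next(iter(e - {u})), s) for e, s in self.edges.items() if u in e]

smg = SegmentsMatchingGraph([["GARFIELD", "MET", "NERMAL"],
                             ["GARFIELD", "ANDHISASSOCIATE", "NERMAL"]])
smg.add_edge((0, 0), (1, 0), 8.0)  # GARFIELD <-> GARFIELD
smg.add_edge((0, 2), (1, 2), 6.0)  # NERMAL <-> NERMAL
```

Trying to connect MET (length 3) with ANDHISASSOCIATE (length 15) raises `ValueError`, since only equal-length segments may share an edge.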

9 [Figure: segments matching graph for the example sequences]

10 [Figure: segments matching graph for the example sequences, continued]

11 Extreme paths: [Figure: extreme paths in the segments matching graph]

12 Extreme paths: [Figure: extreme paths in the segments matching graph, continued]

13 [Figure: the sets of all paths, extreme paths and optimal paths]
Lemma: there is an optimal path that is extreme.

14 [Figure: segments matching graph for the example sequences]
Improved algorithm: DP on the segments
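The slides do not spell out the segment-level DP. As a stand-in for the two-sequence case, the Python sketch below uses classic fragment chaining: each SMG edge between two equal-length segments is treated as a block `(start1, start2, length, score)`, and the DP picks a highest-scoring chain of blocks that stay in order in both sequences. It runs in O(m^2) time for m blocks, so its cost depends on the number of segments rather than on the sequence lengths. All names are hypothetical and this is not presented as the authors' algorithm.

```python
from typing import NamedTuple, Optional

class Block(NamedTuple):
    start1: int    # start in sequence 1
    start2: int    # start in sequence 2
    length: int    # common length of the two matched segments
    score: float

def precedes(a: Block, b: Block) -> bool:
    """True if b starts after a ends, in both sequences."""
    return b.start1 >= a.start1 + a.length and b.start2 >= a.start2 + a.length

def best_chain(blocks: list[Block]) -> tuple[float, list[Block]]:
    """O(m^2) DP: best total score of a chain of mutually ordered blocks."""
    if not blocks:
        return 0.0, []
    order = sorted(blocks)                # by start1, then start2
    best = [b.score for b in order]       # best chain ending at order[i]
    back: list[Optional[int]] = [None] * len(order)
    for i, b in enumerate(order):
        for j in range(i):                # any predecessor sorts before b
            if precedes(order[j], b) and best[j] + b.score > best[i]:
                best[i], back[i] = best[j] + b.score, j
    i = max(range(len(order)), key=best.__getitem__)
    total, chain = best[i], []
    while i is not None:                  # follow back-pointers to the start
        chain.append(order[i])
        i = back[i]
    return total, chain[::-1]
```

With blocks GARFIELD `(0, 0, 8, 8.0)`, NERMAL `(11, 23, 6, 6.0)` and a conflicting block `(8, 24, 3, 3.0)`, the best chain is GARFIELD followed by NERMAL with score 14.0.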

15 Transitive PR-MSA (MSA under pairwise restriction)
More restrictions:
– Transitivity
– The scoring function is shortest path
Faster algorithms
DNA sequences
*No scores in the SMG, only matches
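Transitivity is read here as: if residue a of one sequence is matched to residue b of another, and b to c, then a and c must be matched too. This reading is an assumption. The Python sketch below uses union-find to compute the transitive closure of the pairwise matches, with residues written as hypothetical (sequence, position) pairs, and flags a closure that would force two residues of the same sequence into one column.

```python
class UnionFind:
    """Disjoint sets over hashable items, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def transitive_classes(matches):
    """Group (sequence, position) residues matched directly or transitively."""
    uf = UnionFind()
    for a, b in matches:
        uf.union(a, b)
    classes = {}
    for x in list(uf.parent):
        classes.setdefault(uf.find(x), set()).add(x)
    return list(classes.values())

def is_consistent(classes):
    """Each class may hold at most one residue per sequence."""
    return all(len({seq for seq, _ in c}) == len(c) for c in classes)
```

For example, matching residue 0 of sequence 0 to residue 5 of sequence 1, and that one to residue 3 of sequence 2, yields a single class of three residues; additionally matching residue 1 of sequence 0 to the same residue makes the closure inconsistent.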

16 Maximal Directions
Transitivity implies that at any point in the hypercube, the directions are partitioned into cliques.
– This defines the maximal directions
The shortest path can be taken over maximal directions only.
This pushes down the work per node.
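One reading of this slide, sketched in Python under assumptions: at a node, two sequences are linked when their next characters can be placed in the same column. Under transitivity the linked classes are cliques, and advancing a whole clique at once gives a maximal direction, so a node would consider at most k maximal directions instead of 2^k - 1 arbitrary ones. The function name and the input format (`linked_pairs`) are hypothetical.

```python
def maximal_directions(k: int, linked_pairs: list[tuple[int, int]]) -> list[tuple[int, ...]]:
    """Partition sequences 0..k-1 into linked classes; one 0/1 direction per class.

    linked_pairs: pairs (i, j) of sequences whose next characters may share
    a column at the current node (assumed input).
    """
    parent = list(range(k))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in linked_pairs:
        parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(k):
        groups.setdefault(find(i), []).append(i)

    directions = []
    for members in groups.values():
        d = [0] * k
        for i in members:
            d[i] = 1            # advance every sequence in this clique together
        directions.append(tuple(d))
    return directions
```

With k = 4 and links (0, 1) and (1, 2), the sequences split into {0, 1, 2} and {3}, giving the two maximal directions (1, 1, 1, 0) and (0, 0, 0, 1).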

17 Obvious Directions
[Figure: the segmented example sequences, shown twice; one panel labeled "Obvious", the other "Non-Obvious: ?"]

18 Obvious Directions
Lemma: the optimal path is found even when obvious decisions are made.
Not all nodes are relevant.
Work for every node increases to

19 Special Vertices
[Figure: special vertices starting from (0,0), with a straight junction and a corner junction marked]

20 Thank you

21 Special Vertices
A vertex is special w.r.t. a vertex if:
– one of the two vertices dominates the other
– there is a maximal-edges path between the vertices
– no other vertex satisfies all the above and dominates

22 Other pieces of information
– Somewhere, a slide with the circle showing which paths we are looking at
Remember to add:
– Partial order in the proof of lemma 1
Remember to think about:
– Diagonals that are not diagonals
– Overlapping streaks in the first bit
– Non-diagonal diagonals in transitive MSAs

