Input Sensitive Algorithms for Multiple Sequence Alignment Pankaj Yonatan University Rachel

Multiple Sequence Alignment
Quantifies similarities among [DNA, protein] sequences
Detects highly conserved motifs & remote homologues
– Evolutionary insights
– Transfer of annotation
– Representation of protein families

Multiple Sequence Alignment
Input: k sequences
Output: optimal alignment
– Gap-infused sequences (-), one per row
– Restrictions on the column pattern
Example input:
(1) GARFIELD MET NERMAL
(2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
(3) GARFIELD AND HIS ASSOCIATE NERMAL
Example alignment (column spacing lost in extraction):
----GARFIELD MET NERMAL
ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
----GARFIELD ---AND HIS ASSOCIATE NERMAL

Multiple Sequence Alignment
Input: k sequences
Output: optimal alignment
– Minimal width
– Score function: summation over columns, e.g. sum of pairs
Example input:
(1) GARFIELD MET NERMAL
(2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
(3) GARFIELD AND HIS ASSOCIATE NERMAL
Example alignment (column spacing lost in extraction):
----GARFIELD MET NERMAL
ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
----GARFIELD ---AND HIS ASSOCIATE NERMAL
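The sum-of-pairs score named on this slide can be sketched as follows. This is a minimal illustration, not the presenters' scoring scheme: the function names `pair_cost` and `sum_of_pairs` and the unit mismatch and gap costs are assumptions, and the score is treated as a cost to be minimized.

```python
from itertools import combinations


def pair_cost(a, b, mismatch=1, gap=1):
    """Cost of one pair of symbols in a column (hypothetical unit costs)."""
    if a == b:
        return 0  # a match, or two gaps facing each other
    if a == "-" or b == "-":
        return gap  # a residue facing a gap
    return mismatch  # two different residues


def sum_of_pairs(rows, mismatch=1, gap=1):
    """Sum the pair costs over every column of an alignment.

    `rows` holds the gap-infused sequences, one string per row,
    all of the same width.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same width")
    total = 0
    for column in zip(*rows):  # walk the alignment column by column
        for a, b in combinations(column, 2):  # every unordered pair of rows
            total += pair_cost(a, b, mismatch, gap)
    return total
```

For example, `sum_of_pairs(["AC-", "ACG", "A-G"])` adds the costs of three columns; the first column is all matches and contributes nothing.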

DP solves MSA
– Build a score matrix: a k-dimensional hypercube
– An alignment is a path
– Time: (num of nodes) × (num neighbors per node)
Example sequences:
GARFIELDANDHISASSOCIATENERMAL
GARFIELDMETNERMAL
Example alignment (column spacing lost in extraction):
GARFIELDMET NERMAL
GARFIELD---ANDHISASSOCIATENERMAL
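A minimal sketch of the hypercube DP follows, assuming a sum-of-pairs cost with unit mismatch and gap penalties (the function name `msa_cost` and the penalties are illustrative, not taken from the slides). Each node is a tuple of prefix lengths, so there are (n_1 + 1) × ... × (n_k + 1) nodes, and each node has up to 2^k − 1 predecessors, one per non-empty subset of sequences that advance in the last column.

```python
from functools import lru_cache
from itertools import combinations, product
from math import inf


def msa_cost(seqs, mismatch=1, gap=1):
    """Minimum sum-of-pairs cost of aligning `seqs` (exponential in k)."""
    k = len(seqs)
    # Every non-zero 0/1 vector is a direction: 1 means "consume a symbol".
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    def column_cost(pos, move):
        # The column that ends at `pos` when arriving via `move`.
        column = [seqs[i][pos[i] - 1] if move[i] else "-" for i in range(k)]
        cost = 0
        for a, b in combinations(column, 2):
            if a == b:
                continue  # match or gap-gap
            cost += gap if "-" in (a, b) else mismatch
        return cost

    @lru_cache(maxsize=None)
    def best(pos):
        if not any(pos):
            return 0  # the origin of the hypercube
        result = inf
        for move in moves:
            if all(p >= m for p, m in zip(pos, move)):
                prev = tuple(p - m for p, m in zip(pos, move))
                result = min(result, best(prev) + column_cost(pos, move))
        return result

    return best(tuple(len(s) for s in seqs))
```

The recursion is memoized rather than filled as an explicit k-dimensional array; this keeps the sketch short but only suits short toy inputs.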

Previous Work
MSA Heuristics
– [Carrillo, Lipman 88]
– MACAW [Schuler, Altschul, Lipman 91]
– ClustalW [Thompson et al. 94]
– DIAlign [Werner, Morgenstern, Dress 96]
– T-Coffee [Notredame et al. 00]
– POA [Lee et al. 02]
– …
MSA Complexity Analysis
– Optimizing over the space of all possible inputs is NP-hard [Jiang, Wang 94]
– NP-hard for SP [Just 01]
– NP-hard for SP that is a metric [Bonizzoni, Della Vedova 01]
Faster pairwise SA
– Assuming many common subsequences [Wilbur, Lipman 83]
– Convex/concave score functions [Eppstein et al. 92]
– Exploiting compressibility of sequences [Landau, Crochemore, Ziv-Ukelson 02]
– …
Review: Biological Sequence Analysis [Durbin et al.]

Pairwise Restriction
The “true” information: the aligned subsequences and their relative positioning
Study pairwise alignment first and restrict the alignment
– Time:
Focus efforts on “true” tradeoffs
Example sequences:
GARFIELDMETNERMAL
GARFIELDANDHISASSOCIATENERMAL

Segments Matching Graph (SMG)
Sequences are partitioned into segments (the nodes)
Edges:
– between two equal-length segments of different sequences
– have scores
Defines allowed paths and their score
[Figure: the segments GARFIELD, ANDHISASSOCIATE, NERMAL, MET and ODIE of the example sequences drawn as nodes; labels: "nodes", "self edges"]
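One way to picture the SMG as a data structure is sketched below. It assumes the sequences arrive already partitioned into segments and that two segments match when their text is identical; the slides instead derive matches and scores from pairwise alignments, so `build_smg` and the equality test are illustrative stand-ins.

```python
from itertools import combinations


def build_smg(partitioned):
    """Build nodes and edges of a toy segment matching graph.

    `partitioned` is a list of sequences, each given as a list of segments.
    A node is (sequence index, start offset, segment text).
    An edge joins two identical segments that lie in different sequences.
    """
    nodes = []
    for seq_index, segments in enumerate(partitioned):
        start = 0
        for text in segments:
            nodes.append((seq_index, start, text))
            start += len(text)  # the next segment begins where this one ends
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if a[0] != b[0] and a[2] == b[2]  # different sequences, same text
    ]
    return nodes, edges


# The three example sequences from the slides, split at word boundaries.
example = [
    ["GARFIELD", "MET", "NERMAL"],
    ["ODIE", "ANDHISASSOCIATE", "NERMAL", "MET", "GARFIELD", "ANDHISASSOCIATE"],
    ["GARFIELD", "ANDHISASSOCIATE", "NERMAL"],
]
nodes, edges = build_smg(example)
```

Identical text implies equal length, so every edge produced here satisfies the equal-length condition on the slide; edge scores are omitted.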

[Figure: sequences GARFIELDANDHISASSOCIATENERMAL and ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE, with the segments GARFIELD, ANDHISASSOCIATENERMAL, ODIEANDHISASSOCIATE and GARFIELDMET marked]

Extreme paths:
[Figure: sequences GARFIELDANDHISASSOCIATENERMAL and ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE, with the segments GARFIELD, ANDHISASSOCIATENERMAL, ODIEANDHISASSOCIATE and GARFIELDMET marked]

Lemma: there is an optimal path that is extreme
[Figure: sets labeled "All paths", "Extreme paths", "Optimal paths"]

Improved algorithm: DP on the segments
[Figure: sequences GARFIELDANDHISASSOCIATENERMAL and ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE]
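The idea of running the DP over segments rather than characters can be illustrated with a deliberately simplified pairwise sketch. It is not the algorithm from the slides: it treats each segment as an atomic token, matches only identical segments, and maximizes the number of matched characters. The name `segment_dp` and these simplifications are assumptions. The point it shows is that the table has one cell per pair of segment boundaries, not per pair of characters.

```python
def segment_dp(a, b):
    """Best total length of matched segments between two segment lists.

    `a` and `b` are sequences given as lists of segments. The table has
    (len(a) + 1) x (len(b) + 1) cells, independent of segment lengths.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Skip a segment of either sequence...
            best = max(table[i - 1][j], table[i][j - 1])
            # ...or match two identical segments in one diagonal step.
            if a[i - 1] == b[j - 1]:
                best = max(best, table[i - 1][j - 1] + len(a[i - 1]))
            table[i][j] = best
    return table[n][m]
```

With the segments `["GARFIELD", "MET", "NERMAL"]` and `["GARFIELD", "ANDHISASSOCIATE", "NERMAL"]`, the table has 4 × 4 cells, whereas a character-level table for the same sequences would have 18 × 30.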

Transitive PR-MSA
More restrictions:
– Transitivity
– Scoring function is shortest path
Faster algorithms
DNA sequences
*no scores in SMG, only matches

Maximal Directions
Transitivity implies that for any point in the hypercube, the directions are partitioned into cliques
– Defines maximal directions
The shortest path can be taken over maximal directions
Pushes down the work per node
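The partition into cliques can be sketched with a stand-in match relation. The code below assumes two sequences match at a point when their next characters are equal, which is transitive by construction; in the slides the matches come from the SMG, so `maximal_directions` and the character test are illustrative only. Each group of mutually matching sequences yields one maximal direction, so a node has at most k of them instead of 2^k − 1.

```python
from collections import defaultdict


def maximal_directions(seqs, pos):
    """Group sequences by their next character at hypercube point `pos`.

    Returns one 0/1 direction vector per group: a 1 means that sequence
    advances. Sequences that are already exhausted advance in no direction.
    """
    groups = defaultdict(list)
    for index, (seq, p) in enumerate(zip(seqs, pos)):
        if p < len(seq):
            groups[seq[p]].append(index)  # equal next character = same clique
    k = len(seqs)
    return [
        tuple(1 if i in members else 0 for i in range(k))
        for members in groups.values()
    ]
```

At the origin of `["AC", "AG", "TC"]` the first two sequences share the next character, so the directions are `(1, 1, 0)` and `(0, 0, 1)`.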

Obvious Directions
[Figure: two copies of the segment diagram (GARFIELD, ANDHISASSOCIATE, NERMAL, MET, ODIE), one labeled "Obvious", the other "Non-obvious: ?"]

Obvious Directions
Lemma: Optimal path is found, even when making obvious decisions
Not all nodes are relevant
Work for every node increases to

Special Vertices
[Figure labels: (0,0), "Straight junction", "Corner junction"]

Thank you

Special Vertices
A vertex is special w.r.t. vertex
– dominates
– There is a maximal-edges path between the vertices
– No other vertex satisfies all the above and dominates

Other pieces of information
Somewhere a slide with the circle and which paths are you looking at
Remember to add:
– Partial order in proof of lemma 1
Remember to think:
– Diagonals that are not diagonals
– Overlapping streaks in first bit
Non-diagonal diagonals in transitive MSAs