Multiple Sequence Alignment Mult-Seq-Align allows to detect similarities which cannot be detected with Pairwise-Seq-Align methods. Detection of family.

Slides:



Advertisements
Similar presentations
Lecture 15 Hidden Markov Models Dr. Jianjun Hu mleg.cse.sc.edu/edu/csce833 CSCE833 Machine Learning University of South Carolina Department of Computer.
Advertisements

Optimal Sum of Pairs Multiple Sequence Alignment David Kelley.
Multiple alignment: heuristics. Consider aligning the following 4 protein sequences S1 = AQPILLLV S2 = ALRLL S3 = AKILLL S4 = CPPVLILV Next consider the.
Problem Set 2 Solutions Tree Reconstruction Algorithms
. Class 5: Multiple Sequence Alignment. Multiple sequence alignment VTISCTGSSSNIGAG-NHVKWYQQLPG VTISCTGTSSNIGS--ITVNWYQQLPG LRLSCSSSGFIFSS--YAMYWVRQAPG.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
4 -1 Chapter 4 The Sequence Alignment Problem The Longest Common Subsequence (LCS) Problem A string : S 1 = “ TAGTCACG ” A subsequence of S 1 :
Distance methods. UPGMA: similar to hierarchical clustering but not additive Neighbor-joining: more sophisticated and additive What is additivity?
Computational Genomics Lecture #3a Much of this class has been edited from Nir Friedman’s lecture which is available at Changes.
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.
4 - 1 Chap 4 The Sequence Alignment Problem The Sequence Alignment Problem Introduction –What, Who, Where, Why, When, How The Sequence Alignment.
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Multiple Sequence Alignment Mult-Seq-Align allows to detect similarities which cannot be detected with Pairwise-Seq-Align methods. Detection of family.
Multiple alignment: heuristics
Multiple sequence alignment
Multiple Sequence alignment Chitta Baral Arizona State University.
Aligning Alignments Exactly By John Kececioglu, Dean Starrett CS Dept. Univ. of Arizona Appeared in 8 th ACM RECOME 2004, Presented by Jie Meng.
. Sequence Alignment Tutorial #3 © Ydo Wexler & Dan Geiger.
PAM250. M. Dayhoff Scoring Matrices Point Accepted Mutations or PAM matrices Proteins with 85% identity were used -> the function is not significantly.
Multiple sequence alignment methods 1 Corné Hoogendoorn Denis Miretskiy.
Project Phase II Report l Due on 10/20, send me through l Write on top of Phase I report. l 5-20 Pages l Free style in writing (use 11pt font or.
Multiple Sequence Alignment
CECS Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka Lecture 3: Multiple Sequence Alignment Eric C. Rouchka,
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Introduction to Bioinformatics Algorithms Multiple Alignment.
1 Seminar in Structural Bioinformatics - Multiple sequence alignment algorithms. Elya Flax & Inbar Matarasso Multiple sequence alignment algorithms.
CISC667, F05, Lec8, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Multiple Sequence Alignment Scoring Dynamic Programming algorithms Heuristic algorithms.
CS 473 All Pairs Shortest Paths1 CS473 – Algorithms I All Pairs Shortest Paths.
Dynamic Programming Introduction to Algorithms Dynamic Programming CSE 680 Prof. Roger Crawfis.
Multiple Sequence Alignment S 1 = AGGTC S 2 = GTTCG S 3 = TGAAC Possible alignment A-TA-T GGGGGG G--G-- TTATTA -TA-TA CCCCCC -G--G- AG-AG- GTTGTT GTGGTG.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Multiple Sequence Alignments
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Multiple Alignment – Υλικό βασισμένο στο κεφάλαιο 14 του βιβλίου: Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University Press.
1 Generalized Tree Alignment: The Deferred Path Heuristic Stinus Lindgreen
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Phylogenetic Trees Tutorial 5. Agenda How to construct a tree using Neighbor Joining algorithm Phylogeny.fr tool Cool story of the day: Horizontal gene.
Multiple Sequence Alignments Craig A. Struble, Ph.D. Department of Mathematics, Statistics, and Computer Science Marquette University.
Multiple Sequence Alignment Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW:
Using Traveling Salesman Problem Algorithms to Determine Multiple Sequence Alignment Orders Weiwei Zhong.
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Input Sensitive Algorithms for Multiple Sequence Alignment Pankaj Yonatan University Rachel
Chapter 3 Computational Molecular Biology Michael Smith
CS 8833 Algorithms Algorithms Dynamic Programming.
Using traveling salesman problem algorithms for evolutionary tree construction Chantal Korostensky and Gaston H. Gonnet Presentation by: Ben Snider.
394C, Spring 2013 Sept 4, 2013 Tandy Warnow. DNA Sequence Evolution AAGACTT TGGACTTAAGGCCT -3 mil yrs -2 mil yrs -1 mil yrs today AGGGCATTAGCCCTAGCACTT.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Multiple sequence comparison (MSC) Reading: Setubal/Meidanis, 3.4 Gusfield, Algorithms on Strings, Trees and Sequences, chapter 14.
Intro to Alignment Algorithms: Global and Local Intro to Alignment Algorithms: Global and Local Algorithmic Functions of Computational Biology Professor.
COT 6930 HPC and Bioinformatics Multiple Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
Comp. Genomics Recitation 10 Clustering and analysis of microarrays.
Tutorial 5 Phylogenetic Trees.
UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures Chapter : Multiple Alignment.
Multiple Sequence Alignment Vasileios Hatzivassiloglou University of Texas at Dallas.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
Multiple String Comparison – The Holy Grail. Why multiple string comparison? It is the most critical cutting-edge toοl for extracting and representing.
Multiple sequence alignment (msa)
Bioinformatics Algorithms and Data Structures
Sequence Alignment 11/24/2018.
Computational Biology Lecture #6: Matching and Alignment
Computational Biology Lecture #6: Matching and Alignment
Multiple Sequence Alignment
Multiple Sequence Alignment (I)
Lecture 14 Shortest Path (cont’d) Minimum Spanning Tree
Computational Genomics Lecture #3a
Clustering.
Lecture 13 Shortest Path (cont’d) Minimum Spanning Tree
Multiple Sequence Alignment
Presentation transcript:

Multiple Sequence Alignment Mult-Seq-Align allows to detect similarities which cannot be detected with Pairwise-Seq-Align methods. Detection of family characteristics. Three questions: 1.Scoring 2.Computation of Mult-Seq-Align. 3.Family representation.

Multiple Sequence Alignment

Example of MSA (Multiple Sequence Alignment)

Scoring: SP (sum of pairs) SP – the sum of pairwise scores of all pairs of symbols in the column. ρ 3 (-,A,A) = (-,A)+(-,A)+(A,A) SP Total Score = Σ ρ i Here, we will assume that: (-,-) = 0

Induced pairwise alignment Induced pairwise alignment or projection of a multiple alignment. a(S 1, S 2 ) a(S 2, S 3 ) a(S 1, S 3 ) (-,-) = 0 SP Total Score = Σ i<j score[ a(S i, S j ) ]

Dyn.Prog. Solution

Dynamic Programming Solution The best multiple alignment of r sequences is calculated using an r- dimensional hyper-cube The size of the hyper-cube is O( Πn i ) Time complexity O(2 r n r ) * O( computation of the ρ function ). Exact problem is NP-Hard (metrics: sum-of-pairs or evolutionary tree). more efficient solution is needed

Multiple Alignment from Pairwise Alignments ? Problem: The best pairwise alignment does not necessary lead to the best multiple alignment.

Pattern-APattern-X Pattern-APattern-X Pattern-B Pattern-XPattern-B Pattern-D S1 S3 S2 S1S2S1S3S2S3 Pattern-APattern-BPattern-D Empty Correct Solution S1S2S3 Pattern-X

Center Star Alignment S1S1 S2S2 S3S3 SkSk ScSc S k-1 S k-2 (a)Scoring scheme – distance. (b)Scoring scheme satisfies the triangle inequality: for any character a,b,c dist(a,c) ≤ dist(a,b) + dist(b,c) (in practice not all scoring matrices satisfy the triangle inequality) (c) D(S i, S j ) – score of the optimal pairwise alignment. (d) D(M) = Σ i<j a M (S i, S j ) – score of the multiple alignment M. (e) a M (S i, S j ) – pairwise alignment/score induced by M.

S1S1 S2S2 S3S3 SkSk ScSc S k-1 S k-2 The Center Star Algorithm: (a) Find S c minimizing Σ i  c D(S c, S i ). (b) Iteratively construct the multiple alignment M c : 1. M c ={S c } 2. Add the sequences in S\{S c } to M c one by one so that the induced alignment a Mc (S c, S i ) of every newly added sequence S i with S c is optimal. Add spaces, when needed, to all pre-aligned sequences. Running time: * O(n 2 ). AC-BC DCABC AC--BC DCAAB C AC--BC DCA-BC DCAAB C

D(M c ) is at most twice the score of the D(M opt ) D (M c ) / D (M opt ) ≤ 2(k-1)/k ( < 2 ) Proof: (a) a(S i, S j ) ≥ D (S i, S j ) (any induced align. is not better than optimal align.) a Mc (S c, S j ) = D (S c, S j ) (b) a Mc (S i, S j ) ≤ a Mc (S i, S c ) + a Mc (S c, S j ) = D (S i, S c ) + D (S c, S j ) (follows from the triangle inequality) (c) 2 D(M c ) = Σ i=1..k Σ j=1..k,j  i a Mc (S i, S j ) ≤ Σ i=1..k Σ j=1..k,j  i ( a Mc (S i, S c ) + a Mc (S c, S j ) )= 2(k-1) Σ j  c a Mc (S c, S j ) = 2(k-1) Σ j  c D(S c, S j )

(d) k Σ j=1..k,j  c D(S c, S j ) = Σ i=1..k Σ j=1..k,j  c D(S c, S j ) ≤ Σ i=1..k Σ j=1..k,j  i D(S i, S j ) ≤ Σ i=1..k Σ j=1..k,j  i a Mopt (S i, S j ) = 2 D(M opt ) (e) → 2 D(M c ) ≤ 2(k-1) Σ j  c D(S c, S j ) k Σ j  c D(S c, S j ) ≤ 2 D(M opt ) → D(M c )/(k-1) ≤ Σ j  c D(S c, S i ) Σ j  c D(S c, S i ) ≤ 2 D(M opt )/k → D (M c ) / D (M opt ) ≤ 2(k-1)/k