Multiple Sequence Alignment Tutorial #4 © Ilan Gronau


2 Multiple Sequence Alignment Reminder

S_1 = AGGTC
S_2 = GTTCG
S_3 = TGAAC

Possible alignments:

    AGGT-C-
    -G-TTCG
    TG-AAC-

    AGGT-C-
    GTT--CG
    -TGAAAC

3 Multiple Sequence Alignment Reminder

Input: sequences S_1, S_2, …, S_k over the same alphabet.
Output: gapped sequences S'_1, S'_2, …, S'_k of equal length, such that:
1. |S'_1| = |S'_2| = … = |S'_k|
2. Removing the spaces from S'_i yields S_i.

The sum-of-pairs (SP) score of a multiple global alignment is the sum of the scores of all pairwise alignments induced by it.
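The SP definition above can be sketched in a few lines of Python. The scoring scheme is an assumption made for illustration (unit cost for a mismatch and for a letter facing a gap, zero for a match or for two gaps); the slides do not fix one. The names `column_cost` and `sp_score` are hypothetical.

```python
from itertools import combinations

def column_cost(a, b, mismatch=1, gap=1):
    """Cost of one aligned pair of symbols (assumed unit-cost scheme)."""
    if a == b:                 # match, or two gaps facing each other
        return 0
    if a == "-" or b == "-":   # a letter facing a gap
        return gap
    return mismatch

def sp_score(alignment, mismatch=1, gap=1):
    """Sum the costs of all pairwise alignments induced by `alignment`."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all gapped sequences must have equal length")
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(column_cost(a, b, mismatch, gap)
                     for a, b in zip(row_a, row_b))
    return total

# A valid alignment of AGGTC, GTTCG, TGAAC.
alignment = ["AGGT-C-", "-G-TTCG", "TG-AAC-"]
print(sp_score(alignment))  # 12 under the assumed unit costs
```

Each of the three induced pairwise alignments costs 4 here, which gives the total of 12.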

4 Multiple Sequence Alignment Reminder

The 'star' algorithm:
Input: Γ, a set of k strings S_1, …, S_k.
1. Find the string S_1 (the center) that minimizes Σ_{i=2..k} D(S_1, S_i), the sum of its pairwise alignment costs to all other strings.
2. Iteratively add S_2, …, S_k to the alignment.

Finds an MA costing at most twice the optimal cost!

Problem: conventional MA does not correctly model evolutionary relationships.
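Step 1 of the star algorithm can be sketched as follows. This is only an illustration: it assumes that the pairwise cost D is the unit-cost edit distance, which the slides do not specify, and the function names are hypothetical.

```python
def edit_distance(s, t):
    """Unit-cost edit distance, computed row by row (assumed stand-in for D)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]

def star_center(strings):
    """Return the string whose summed distance to all the others is minimal."""
    def total_distance(candidate):
        return sum(edit_distance(candidate, s) for s in strings)
    return min(strings, key=total_distance)

print(star_center(["AGGTC", "GTTCG", "TGAAC"]))
```

Choosing the center needs all k(k-1)/2 pairwise distances; a real implementation would cache them instead of recomputing each one twice as this sketch does.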

5 Tree Alignment

Input: X, a set of sequences; T, a phylogenetic tree on X (leaves labeled by X).
Output: labels on the internal vertices of T such that the sum of the costs of all edges of T is minimal.

How do we label internal vertices?
- Sequences
- Profiles (multiple alignments)

6 Profile Alignment

A profile of an MA of length n over alphabet Σ is a (|Σ|+1) × n table. Column i holds the distribution of Σ (and gap) in position i.

    AGGT-C-
    -G-TTCG
    TG-AAC-

[Figure: profile table of the alignment above, with rows A, T, G, C]
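A minimal sketch of building such a profile, assuming the DNA alphabet `ACGT` and relative frequencies as table entries (the function name `profile` is hypothetical):

```python
from collections import Counter

def profile(alignment, alphabet="ACGT"):
    """Return {symbol: [frequency in column 0, column 1, ...]}.

    The table has len(alphabet) + 1 rows: one per letter plus one for the gap.
    """
    symbols = alphabet + "-"
    k = len(alignment)
    table = {s: [] for s in symbols}
    for column in zip(*alignment):   # iterate over alignment columns
        counts = Counter(column)
        for s in symbols:
            table[s].append(counts[s] / k)
    return table

alignment = ["AGGT-C-", "-G-TTCG", "TG-AAC-"]
for symbol, row in profile(alignment).items():
    print(symbol, [round(x, 2) for x in row])
```

Every column of the resulting table sums to 1, since each of the k rows contributes exactly one symbol per column.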

7 Profile Alignment

Aligning a sequence to a profile:
- Matching a letter to a position: weighted average of scores.
- Indels: introducing new columns gets special consideration.
(The same goes for aligning two profiles.)

[Figure: profile table with rows A, T, G, C]
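The "weighted average of scores" for matching a letter to a profile position can be sketched as below. The unit-cost pair scheme and the names `pair_cost` and `letter_vs_column` are assumptions for illustration; the special handling of indels mentioned on the slide is not modeled.

```python
def pair_cost(a, b, mismatch=1.0, gap=1.0):
    """Assumed unit-cost scheme for one aligned pair of symbols."""
    if a == b:
        return 0.0
    if a == "-" or b == "-":
        return gap
    return mismatch

def letter_vs_column(letter, column):
    """Cost of matching `letter` to a profile column.

    `column` maps each symbol (letters and '-') to its frequency in that
    position; the result is the frequency-weighted average pair cost.
    """
    return sum(freq * pair_cost(letter, symbol)
               for symbol, freq in column.items())

# First column of the profile of AGGT-C- / -G-TTCG / TG-AAC-
column = {"A": 1 / 3, "T": 1 / 3, "-": 1 / 3}
print(letter_vs_column("A", column))
```

With this column, an A matches one third of the rows for free and pays one unit against the T and against the gap, so the weighted cost is about 0.67.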

8 Clustal Algorithm

Iteratively constructs MAs for intermediate nodes. At each point it holds profiles for all leaves.
- Chooses the closest pair of neighbors
  - neighbors: have a common father in T
  - distance: cost of the optimal (pairwise) alignment
- Aligns the two profiles to get the 'father-profile'
- Replaces the two leaves with their father

Analysis:
- Initialization: O(k^2) alignments
- k-1 iterations
- Iteration i involves k-i-1 new pairwise alignments

ClustalW: a more advanced version, in which sequences/profiles are weighted.
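The counts in the analysis can be added up explicitly. The computation below assumes that the initialization aligns every pair of the k input sequences, which is one natural reading of "O(k^2) alignments":

```latex
\[
\underbrace{\binom{k}{2}}_{\text{initialization}}
+ \underbrace{\sum_{i=1}^{k-1} (k-i-1)}_{\text{iterations}}
= \frac{k(k-1)}{2} + \frac{(k-1)(k-2)}{2}
= (k-1)^2 = O(k^2).
\]
```

Under that assumption, the iterations do not raise the asymptotic number of pairwise alignments beyond that of the initialization.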

9 Lifted Tree Alignments

Lifted tree alignment: each internal node is labeled by one of the labels of its daughters. Internal nodes are sequences and not profiles.

Example: [Figure: tree with leaves S_1, …, S_6 whose internal nodes carry the lifted labels S_2, S_4, S_4, S_5]

We'll show:
1. A DP algorithm for optimal lifted tree alignment.
2. The optimal lifted alignment is a 2-approximation of the optimal tree alignment.

10 Lifted Tree Alignments: Algorithm

Input: X, a set of sequences; T, a phylogenetic tree on X (leaves labeled by X).
Output: lifted labels on the internal vertices of T such that the sum of the costs of all edges of T is minimal.

Basic principle: calculate, for every node v in T and every sequence S in X,
d(v,S): the optimal cost of v's subtree when v is labeled by S.

The cost of the optimal tree is the minimum of d(r,S) over all S in X, where r is the root of T.

[Figure: the example lifted tree from slide 9]

11 Lifted Tree Alignments: Algorithm

d(v,S): the optimal cost of v's subtree when v is labeled by S.

Initialization: for a leaf v labeled S_v: d(v,S_v) = 0, and d(v,S) = ∞ for every S ≠ S_v.

Recurrence: for an internal node v with daughters u_1, …, u_l:
d(v,S) = Σ_{i=1..l} min over S' of [ D(S,S') + d(u_i,S') ],
where S is a leaf label in v's subtree and S' ranges over the leaf labels in u_i's subtree.

Correctness: check for the optimal substructure property.

Complexity:
- O(k^2) pairwise alignments: O(n^2 k^2)
- k-1 iterations; an internal node v takes O(k_v^2) work, which sums to O(k^2 depth(T)) = O(k^3)
- Total: O(k^2 (n^2 + depth(T)))

[Figure: the example lifted tree from slide 9]
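The DP can be sketched in Python under several assumptions that are not in the slides: the tree is given as nested tuples with strings at the leaves, D is the unit-cost edit distance, and the recurrence is taken to be d(v,S) = Σ_i min_{S'} [D(S,S') + d(u_i,S')], with the daughter whose subtree contains S forced to keep the label S so that the labeling is truly lifted. All names are hypothetical.

```python
def edit_distance(s, t):
    """Unit-cost edit distance (assumed stand-in for the pairwise cost D)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def lifted_costs(tree):
    """Return {S: d(v, S)} for the root v of `tree`.

    Only labels S that occur at a leaf below v appear in the table; every
    other label has infinite cost in a lifted alignment.
    """
    if isinstance(tree, str):            # leaf labeled by its own sequence
        return {tree: 0}
    child_tables = [lifted_costs(child) for child in tree]
    table = {}
    for own in child_tables:
        for label, cost_below in own.items():
            # The daughter that holds `label` keeps it, so that edge costs 0.
            total = cost_below
            for other in child_tables:
                if other is own:
                    continue
                # Every other daughter picks its cheapest label S'.
                total += min(edit_distance(label, s) + c
                             for s, c in other.items())
            table[label] = min(total, table.get(label, float("inf")))
    return table

def optimal_lifted_cost(tree):
    """Cost of the optimal lifted tree alignment: min over S of d(root, S)."""
    return min(lifted_costs(tree).values())

tree = (("AAAA", "AAAT"), ("AATT", "ATTT"))
print(optimal_lifted_cost(tree))  # 3
```

In this example the optimum labels the left internal node and the root with AAAT and the right internal node with AATT, which gives three edges of cost 1 and three of cost 0. The sketch recomputes pairwise distances on demand; caching all O(k^2) of them up front matches the complexity analysis above more closely.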

12 Lifted Tree Alignments: Approximation analysis

Claim: the optimal LTA 2-approximates general tree alignments.

We'll show the construction of an LTA that costs at most twice the optimal TA with sequence-labeled nodes (can this be generalized to profile-labeled nodes?).

Notations:
- T*: optimal TA labels
- S_v*: label of node v in T*
- T^L: our constructed LTA
- S_v^L: label of node v in T^L

[Figure: the example lifted tree from slide 9]

13 Lifted Tree Alignments: Approximation analysis

Construction: we label the nodes bottom-up. For a node v with daughters u_1, …, u_l, we choose the label (from S_u1^L, …, S_ul^L) closest to S_v*.

We need to show: D(T^L) ≤ 2D(T*)

[Figure: the example lifted tree from slide 9]

14 Lifted Tree Alignments: Approximation analysis

Analysis:
- Some edges in T^L have cost 0.
- Observe an edge (v,u) of cost > 0:
  - S_i: label of the father (v)
  - S_j: label of the daughter (u)
  - P(v,u): the path in T* from v to the leaf labeled by S_j

D(S_i,S_j) ≤ D(S_i,S_v*) + D(S_j,S_v*)   (triangle inequality)
           ≤ 2D(S_j,S_v*)                 (choice of i)
           ≤ 2D(P(v,u))                   (triangle inequality)

[Figure: the example lifted tree from slide 9]

15 Lifted Tree Alignments: Approximation analysis

D(S_i,S_j) ≤ 2D(P(v,u))

If (v,u) and (v',u') are two different edges with cost > 0 in T^L, then P(v,u) and P(v',u') are edge-disjoint.

Final remarks:
- The lifted tree alignment T^L is only conceptual (we don't have T*).
- The optimal LTA cannot cost more than T^L.
- In the case of profile-labeled nodes: the construction and analysis still hold as long as the cost is a distance function.

Q.E.D.

[Figure: the example lifted tree from slide 9]
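The final step, summing the per-edge bound over the whole tree, can be written out as follows. The symbol E^+ is introduced here for the set of edges of T^L with cost > 0; it does not appear in the slides.

```latex
\begin{align*}
D(T^L) &= \sum_{(v,u) \in E^+} D\bigl(S_v^L, S_u^L\bigr)
  && \text{(all other edges cost 0)} \\
&\le \sum_{(v,u) \in E^+} 2\, D\bigl(P(v,u)\bigr)
  && \text{(per-edge bound)} \\
&\le 2\, D(T^*)
  && \text{(the paths } P(v,u) \text{ are edge-disjoint in } T^*\text{)}
\end{align*}
```

The last inequality holds because each edge of T* lies on at most one of the paths, so the path costs together count every edge of T* at most once.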