Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments
Susan Bibeault
June 9, 2000

Outline
- Problem Statement and Importance
- Terminology
- Current Approaches
- Our Alignment Heuristic
- Performance Results
- Conclusions
- Future Work

Multiple Sequence Alignment Problem

Given a sequence set, insert gaps into the sequences so that evolutionarily conserved regions are aligned. Example input:

VLSPADNVKAAWGKVGAHAGEYGAEALERMF
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY
GLSDGEWQLVLNVWGKVEADIPGHVLIRLFK

One alignment:

V-LSPADN--VKAAWGKVGAHAGEYGAEALERM---F-
VHLTPEEKSAVTALWGKVNVD--EVGGEALGRLLVVYP
G-LSDGEWQLVLNVWGKVEA---DIPGHVLIRL---FK

An alternative alignment that differs only in columns 1-2 and 34-38 (shown on the slide as -VF----, VHLVVYP, -GFK---):

-VLSPADN--VKAAWGKVGAHAGEYGAEALERMF----
VHLTPEEKSAVTALWGKVNVD--EVGGEALGRLLVVYP
-GLSDGEWQLVLNVWGKVEA---DIPGHVLIRLFK---

Multiple alignment is an important tool to:
- relate homologous proteins
- discover conserved regions
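
The problem statement says an alignment may only insert gaps. A minimal sketch of that definition as a validity check, assuming "-" is the gap symbol and adopting the common convention that no column may consist of gaps only; the function names and toy sequences are hypothetical:

```python
GAP = "-"  # assumed gap symbol


def degap(row: str) -> str:
    """Remove the gap characters from one alignment row."""
    return row.replace(GAP, "")


def is_valid_alignment(sequences: list[str], rows: list[str]) -> bool:
    """Check that `rows` is obtained from `sequences` only by inserting gaps."""
    if not rows or len(rows) != len(sequences):
        return False
    width = len(rows[0])
    # All rows must have the same length so that they form columns.
    if any(len(row) != width for row in rows):
        return False
    # Deleting the gaps must give back each original sequence unchanged.
    if any(degap(row) != seq for row, seq in zip(rows, sequences)):
        return False
    # Convention (assumed): a column made only of gaps is not allowed.
    return all(any(row[c] != GAP for row in rows) for c in range(width))


# Hypothetical toy example: two shorter sequences padded with trailing gaps.
print(is_valid_alignment(["GKVGAH", "GKVNVDEV", "GKVEAD"],
                         ["GKVGAH--", "GKVNVDEV", "GKVEAD--"]))
```

Which of several valid alignments is best is then a question of scoring, taken up on the following slides.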

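Scoring Multiple Alignments

Tree-based score, summed over the edges of a phylogenetic tree:
$$\sum_{\text{edges } e} \mathrm{cost}(e)$$
Sum-of-pairs score, summed over all pairs of sequences:
$$\sum_{i<j} \mathrm{cost}(i,j)$$
Example on a tree of five primates (gorilla, orangutan, gibbon, chimpanzee, human): $\sum \mathrm{cost}(i,j) = 6$, whereas $\sum \mathrm{cost}(e) = 1$.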
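
One situation that produces the slide's 6 versus 1 is a single alignment column in which two primates share one residue and the other three share another. Every cross pair then mismatches (2 × 3 = 6 pairs), while on a tree that groups the two, one substitution on one edge suffices. The sketch below assumes that column, a unit mismatch cost, and a hypothetical tree topology; it uses Fitch's small-parsimony algorithm for the tree-based count:

```python
from itertools import combinations

# Hypothetical rooted binary tree (nested pairs) and one hypothetical column.
PRIMATE_TREE = ((("human", "chimpanzee"), "gorilla"), ("orangutan", "gibbon"))
EXAMPLE_COLUMN = {"human": "A", "chimpanzee": "A",
                  "gorilla": "G", "orangutan": "G", "gibbon": "G"}


def sp_column_cost(column: dict[str, str]) -> int:
    """Sum-of-pairs cost of one column: 1 for every pair of taxa that differ."""
    return sum(1 for a, b in combinations(column.values(), 2) if a != b)


def fitch_column_cost(tree, column: dict[str, str]) -> int:
    """Tree-based cost of one column: the minimum number of residue changes
    over the edges of a rooted binary tree (Fitch's algorithm).
    A tree node is either a leaf name (str) or a pair (left, right)."""
    def visit(node):
        if isinstance(node, str):
            return {column[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:  # children can agree: no extra change needed
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


print(sp_column_cost(EXAMPLE_COLUMN), fitch_column_cost(PRIMATE_TREE, EXAMPLE_COLUMN))
```

The contrast is the point of the slide: sum of pairs counts the same evolutionary event once per affected pair, while the tree-based score counts it once.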

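Alignment Scoring
- Cost matrix: C(aa1, aa2)
- Gap penalties:
  - Simple: C(aa, -) per gap position
  - Affine: C(-) + Len * C(aa, -), i.e. a gap-opening cost plus a per-position cost

Pairwise recurrence with a simple gap cost g:
$$\mathrm{Cost}(s[1..i], t[1..j]) = \min\begin{cases} \mathrm{Cost}(s[1..i], t[1..j-1]) + g \\ \mathrm{Cost}(s[1..i-1], t[1..j-1]) + C(s[i], t[j]) \\ \mathrm{Cost}(s[1..i-1], t[1..j]) + g \end{cases}$$

Example dynamic-programming matrix axes: s = VLSPADNVKA, t = GLSDGEWQLVL.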
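
The recurrence fills a table of prefix costs. A minimal sketch, assuming a hypothetical unit substitution cost (0 for identical residues, `mismatch` otherwise) in place of a PAM-derived cost matrix; the affine helper only evaluates the penalty formula and is not wired into the table:

```python
def alignment_cost(s: str, t: str, g: int = 2, mismatch: int = 1) -> int:
    """Minimum global alignment cost of s and t with a simple gap cost g.

    cost[i][j] holds Cost(s[1..i], t[1..j]); the substitution cost C is a
    hypothetical stand-in: 0 if the residues match, `mismatch` otherwise.
    """
    n, m = len(s), len(t)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):      # s prefix aligned against gaps only
        cost[i][0] = i * g
    for j in range(1, m + 1):      # t prefix aligned against gaps only
        cost[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = 0 if s[i - 1] == t[j - 1] else mismatch
            cost[i][j] = min(cost[i][j - 1] + g,          # gap in s
                             cost[i - 1][j - 1] + c,      # s[i] aligned with t[j]
                             cost[i - 1][j] + g)          # gap in t
    return cost[n][m]


def affine_gap_cost(length: int, open_cost: int, extend_cost: int) -> int:
    """Affine penalty for one gap of the given length: C(-) + Len * C(aa, -)."""
    return open_cost + length * extend_cost


print(alignment_cost("VLSPADNVKA", "GLSDGEWQLVL"))
```

Using the affine penalty inside the table requires extra state (for example Gotoh's three-matrix formulation), because the cost of a gap position then depends on whether the previous column was already a gap.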

Current Approaches
- Global methods
  - Optimal algorithms (MSA, MWT, MUSEQAL)
  - Progressive (MULTALIGN, PILEUP, CLUSTAL, MULTAL, AMULT, DFALIGN, MAP, PRRP, AMPS)
- Local methods (PIMA, DIALIGN, PRALIGN, MACAW, BlockMaker, Iteralign)
- Combined (GENALIGN, ASSEMBLE, DCA)
- Statistical (HMMT, SAGA, SAM, Match Box)
- Parsimony (MALIGN, TreeAlign)

Global alignment:
ABCDEFGHI
:::: ::::
ABCD-FGHI

Local alignment:
XXXABCDYYY
   ::::
ZZZABCDEEEE
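
The local example differs from the global one in that only the shared segment ABCD is reported. A minimal Smith-Waterman sketch reproduces this, assuming hypothetical scores (+1 match, -1 mismatch, -1 per gap position) and score maximisation rather than the cost minimisation used for the global recurrence:

```python
def local_alignment(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Smith-Waterman local alignment with linear gaps.

    Returns (score, aligned_s, aligned_t) for the best-scoring local segment.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(s), len(t)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The 0 option lets an alignment start afresh anywhere.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the segment.
    i, j = best_pos
    out_s, out_t = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(out_s)), "".join(reversed(out_t))


print(local_alignment("XXXABCDYYY", "ZZZABCDEEEE"))
```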

Our Heuristic
- Distance estimation
- Tree construction
- Node initialization
- Tree partitioning
- Iteration
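
The slides do not say which method builds the tree from the estimated distances. As one simple possibility, and explicitly a swapped-in technique rather than the method used here, UPGMA (average linkage) repeatedly merges the closest pair of clusters; the input format and names are assumptions:

```python
from itertools import combinations


def upgma(names: list[str], dist: dict[frozenset, float]):
    """Build a rooted tree from pairwise distances by UPGMA (average linkage).

    `dist` maps frozenset({x, y}) to the distance between leaves x and y.
    The tree is returned as nested 2-tuples of leaf names.
    """
    # Cluster id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between clusters, keyed by a frozenset of two cluster ids.
    d = {frozenset({i, j}): dist[frozenset({names[i], names[j]})]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[frozenset({a, b})]
        for k in clusters:  # size-weighted average distance to the merged cluster
            d_ak = d.pop(frozenset({a, k}))
            d_bk = d.pop(frozenset({b, k}))
            d[frozenset({k, next_id})] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

UPGMA assumes a roughly constant rate of evolution; neighbor joining is a common alternative when that assumption is doubtful.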

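Estimation of Protein Distance

Aligned sequences:
PEAAALYGRFT---IKSDVW
PESAALYGRFT---IKSDVW
PESLALYNKF---SIKSDVW
PEALNYGRY----SSESDVW
PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY

Pair distances are estimated from pairs of rows of this alignment, for example rows 5 and 6:
PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY

Issue: implied vs. optimal pair alignments. Rows 3 and 4,
PESLALYNKF---SIKSDVW
PEALNYGRY----SSESDVW
imply, once the columns that are gaps in both rows are dropped, the pair alignment
PESLALYNKFSIKSDVW
PEALNYGRY-SSESDVW
whereas a separate pairwise alignment of the same two sequences places the gap elsewhere:
PESLALYNKFSIKSDVW
PEAL-NYGRYSSESDVW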
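
The implied pair alignment is a projection of two rows of the multiple alignment. A minimal sketch of that projection plus a simple distance, assuming the p-distance (fraction of differing residues over columns where neither sequence has a gap); the slides do not state which distance formula is used:

```python
GAP = "-"  # assumed gap symbol


def implied_pair(row_a: str, row_b: str) -> tuple[str, str]:
    """Project two rows of a multiple alignment onto a pairwise alignment
    by dropping the columns in which both rows have a gap."""
    kept = [(x, y) for x, y in zip(row_a, row_b) if not (x == GAP and y == GAP)]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


def p_distance(a: str, b: str) -> float:
    """Fraction of gap-free aligned positions at which the residues differ.
    (Assumed distance; raises ZeroDivisionError if no such position exists.)"""
    compared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    return sum(x != y for x, y in compared) / len(compared)


pair = implied_pair("PESLALYNKF---SIKSDVW", "PEALNYGRY----SSESDVW")
print(pair, p_distance(*pair))
```

For rows 3 and 4 this yields the implied pair shown on the slide, with 8 of 16 gap-free positions identical.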

Optimal Pair vs. Implied Pair

Interior Node Classification
- Interior nodes are classified by percent identity (PID):
  PID = (# matched residues) / (# total residues)
- User-specified tiers
- User-specified cost criterion

Example:
- PID > 60%: PAM 40, high gap penalties
- PID > 40%: PAM 120, medium gap penalties
- PID < 40%: PAM 200, low gap penalties
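
The tiers amount to a lookup from PID to a scoring scheme. A minimal sketch using the example tiers as a user-replaceable table; the table layout is hypothetical, and assigning exactly 40% (which the slide leaves open) to the low tier is an assumption:

```python
# Example tiers from the slide, as (PID must exceed this, matrix, gap penalties).
# Hypothetical layout; PID == 0.40 falling into the low tier is an assumption.
EXAMPLE_TIERS = [
    (0.60, "PAM 40", "high"),
    (0.40, "PAM 120", "medium"),
    (0.00, "PAM 200", "low"),
]


def classify(pid: float, tiers=EXAMPLE_TIERS) -> tuple[str, str]:
    """Return (substitution matrix, gap-penalty level) for a node's PID (0..1).

    Tiers are checked from the highest threshold down; a PID that exceeds
    none of them falls back to the last tier.
    """
    for threshold, matrix, gap_level in tiers:
        if pid > threshold:
            return matrix, gap_level
    return tiers[-1][1], tiers[-1][2]


print(classify(0.5))
```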

Ordering Alignments
- Isolate subtrees using a threshold PID
- Order the alignments:
  1. Subtree
  2. Border nodes
  3. Integrate all

Interior Alignments
- Sum-of-pairs bounded search
- Implementation: modular, reentrant, flexible cost criterion
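
A bounded search explores candidate alignments and discards any partial alignment whose cost already exceeds the best complete one found so far. The search itself is not reproduced here. The sketch below shows only the objective it minimises, the sum-of-pairs cost of a finished alignment, under hypothetical costs (unit mismatch, `gap_cost` for residue against gap, 0 for gap against gap):

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def pair_cost(x: str, y: str, gap_cost: int = 2) -> int:
    """Hypothetical cost of one pair of symbols in a column."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return gap_cost
    return 0 if x == y else 1


def sum_of_pairs(rows: list[str], gap_cost: int = 2) -> int:
    """Sum-of-pairs cost: pair_cost summed over every pair of rows and every column."""
    return sum(pair_cost(x, y, gap_cost)
               for a, b in combinations(rows, 2)
               for x, y in zip(a, b))


print(sum_of_pairs(["AC-T", "AC-T", "AGGT"]))
```

This charges gaps per position (the simple gap model); an affine version would need to know where each gap run starts.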

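Generating a Consensus Alignment (A1, A2, A3)

The consensus X minimizes the total distance to A1, A2 and A3:
$$X = \arg\min_{X} \sum_{k=1}^{3} D_k(A_k, X)$$
For each position i:
$$X_i = \arg\min_{\sigma} \left( \mathrm{cost}(\sigma, A1_i) + \mathrm{cost}(\sigma, A2_i) + \mathrm{cost}(\sigma, A3_i) \right)$$
[Figure: X at the center, connected to A1, A2 and A3 by distances D1, D2 and D3]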
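
When each distance D_k is a sum of symbol costs over positions, minimizing the total splits into independent choices, one per position, which is what the second formula expresses. A minimal sketch, assuming equal-length inputs, a hypothetical unit symbol cost, and ties broken by alphabet order:

```python
def unit_cost(x: str, y: str) -> int:
    """Hypothetical symbol cost: 0 for identical symbols, 1 otherwise."""
    return 0 if x == y else 1


def consensus(rows: list[str], alphabet: str, cost=unit_cost) -> str:
    """Position-wise consensus X_i = argmin over sigma of sum_k cost(sigma, A_k[i]).

    `rows` are equal-length strings A_1..A_k; ties go to the symbol listed
    first in `alphabet` (min returns the first minimal element).
    """
    width = len(rows[0])
    return "".join(
        min(alphabet, key=lambda sigma: sum(cost(sigma, row[i]) for row in rows))
        for i in range(width)
    )


print(consensus(["ACGT", "ACGA", "TCGA"], "ACGT-"))
```

With the unit cost this reduces to a majority vote per position; a substitution matrix in place of `unit_cost` can pick a symbol that none of the inputs contains.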

Testing the Method
- BAliBASE benchmark
  - "Correct" reference alignments
  - Core blocks of conserved motifs
  - Typical "hard problem" sets
- Protein parsimony
  - Measures the "evolutionary steps" of an alignment
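
The SP and TC results that follow compare a test alignment with a BAliBASE reference. A minimal sketch, assuming the usual definitions: SP is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC is the fraction of reference columns reproduced exactly. It scores whole alignments rather than only the core blocks, and the function names are hypothetical:

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def columns(rows: list[str]) -> list[tuple]:
    """Each column as a tuple of per-row residue indices (None for a gap)."""
    counters = [0] * len(rows)
    cols = []
    for c in range(len(rows[0])):
        col = []
        for r, row in enumerate(rows):
            if row[c] == GAP:
                col.append(None)
            else:
                col.append(counters[r])
                counters[r] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(cols: list[tuple]) -> set:
    """All (row_i, residue_i, row_j, residue_j) placed in the same column."""
    return {(i, col[i], j, col[j])
            for col in cols
            for i, j in combinations(range(len(col)), 2)
            if col[i] is not None and col[j] is not None}


def sp_and_tc(test_rows: list[str], ref_rows: list[str]) -> tuple[float, float]:
    """(SP, TC) of a test alignment against a reference of the same sequences."""
    test_cols, ref_cols = columns(test_rows), columns(ref_rows)
    ref_pairs = aligned_pairs(ref_cols)
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    tc = len(set(ref_cols) & set(test_cols)) / len(ref_cols)
    return sp, tc


print(sp_and_tc(["ABC", "ABC"], ["AB-C", "A-BC"]))
```

In this toy case the test alignment recovers both residue pairs of the reference (SP = 1.0) but reproduces only 2 of its 4 columns (TC = 0.5), which is why the two scores are reported separately.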

Baseline: BAliBASE SP score [figure]

Baseline: BAliBASE TC score [figure]

Baseline: ProtPars [figure]

Orphans/Families: BAliBASE SP score [figure]

Orphans/Families: ProtPars [figure]

Larger Families [figure]

Conclusions
- Solution quality
- Captures evolutionary information
- Iterations converge quickly
- Useful tool

Future Work
- Improved alignment consensus
- Multiple partitioning thresholds
- Multiple solutions
- Integrated phylogeny modifications
- Parallel implementation