Coffee Shop F91921025 黃仁暐 F92921029 戴志華 F92921041 施逸優 R93921142 吳於芳 R94921035 林與絜.



2005/12/14
Menu
- Coffee Shop Opening
- Why coffee shop?
- Three Flavors: COFFEE, T-Coffee, 3DCoffee
- Remarks
- Recipes

Multiple Sequence Alignment
Multiple sequence alignment is one of the most important tools for analyzing biological sequences. It is used for:
- structure prediction
- phylogenetic analysis
- function prediction
- polymerase chain reaction (PCR) primer design

Multiple Sequence Alignment
However, the accuracy is not good enough:
- It is difficult to evaluate the quality of a multiple alignment.
- It is algorithmically very hard to produce the optimal alignment.
To increase the accuracy of multiple sequence alignment, we opened a coffee shop to share three kinds of coffee.

Before (drinking) COFFEE
Why comparative genomics?
- Understanding the process of evolution at the gross level and at the local level
- Translating DNA sequence data into proteins of known function
- Understanding the meaning of conserved regions
- E. coli, C. elegans, Drosophila, human… What is their relationship?

Classification of genes by function
Species shown: Arabidopsis, E. coli, yeast, Synechocystis (blue-green algae), nematode, fruit fly, human
Adapted from "Principles of Genome Analysis and Genomics", Fig. 7.5 (p. 129), by S. B. Primrose and R. M. Twyman, 3rd edition

Comparative genomics vs. multiple sequence alignment
- Alignment → conserved region
- Conserved region → gene location
- Evidence of evolution

The alignment between the chromosomes
- A: human chromosome I
- B: human chromosome II
- C: human chromosome III
- Chromosome III region Mb was magnified 120X

Our Flavors
- COFFEE: A New Objective Function for Multiple Sequence Alignment. C. Notredame, L. Holm and D. G. Higgins, Bioinformatics, Vol 14 (5), 1998
- T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp , 2000
- 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments. O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp , 2004

COFFEE

COFFEE
An objective function for multiple sequence alignments
Cédric Notredame, Liisa Holm and Desmond G. Higgins
SAGA with COFFEE score

Introduction
- COFFEE: Consistency based Objective Function For alignmEnt Evaluation
- An objective function, the COFFEE score, is proposed to measure the quality of multiple sequence alignments.
- The COFFEE score of a multiple sequence alignment is optimized with the genetic algorithm package SAGA (Sequence Alignment Genetic Algorithm).

Overview of their method
Given a set of sequences to be aligned and a library containing all pairwise alignments between them, the COFFEE score reflects the level of consistency between a multiple sequence alignment and the library.

COFFEE score

$$\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}$$

with $\mathrm{SCORE}(A_{i,j})$ = number of aligned pairs of residues that are shared between $A_{i,j}$ and the library.
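The ratio above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout of the library, and the residue-index representation are all hypothetical. It also assumes that LEN(A_{i,j}) is the number of columns left after projecting the multiple alignment onto sequences i and j, and that the weights W_{i,j} are supplied by the caller.

```python
from itertools import combinations


def project_pairs(row_i, row_j):
    """Project two gapped rows of an MSA onto a pairwise alignment.

    Returns the set of aligned residue-index pairs and the projected length
    (columns where at least one of the two rows has a residue).
    """
    pairs, length = set(), 0
    pos_i = pos_j = 0
    for a, b in zip(row_i, row_j):
        if a == "-" and b == "-":
            continue  # column is empty in this pairwise projection
        length += 1
        if a != "-" and b != "-":
            pairs.add((pos_i, pos_j))
        if a != "-":
            pos_i += 1
        if b != "-":
            pos_j += 1
    return pairs, length


def coffee_score(msa, library, weights):
    """Weighted fraction of the MSA's pairwise projections found in the library.

    library[(i, j)] is a set of residue-index pairs (assumed layout);
    weights[(i, j)] plays the role of W_{i,j}.
    """
    num = den = 0.0
    for i, j in combinations(range(len(msa)), 2):
        pairs, length = project_pairs(msa[i], msa[j])
        num += weights[(i, j)] * len(pairs & library[(i, j)])
        den += weights[(i, j)] * length
    return num / den if den else 0.0


# Two sequences "ACT" and "AGT"; the library aligns them without gaps.
msa = ["AC-T", "A-GT"]
library = {(0, 1): {(0, 0), (1, 1), (2, 2)}}
weights = {(0, 1): 1.0}
print(coffee_score(msa, library, weights))  # 0.5
```

In the toy example, two of the library's three residue pairs survive in a projection of length four, so the score is 2/4.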

COFFEE score

Using COFFEE in SAGA
- Iteratively, SAGA generates a multiple sequence alignment with a higher COFFEE score, until the COFFEE score cannot be improved.
- SAGA follows the general principle of a genetic algorithm: the notion of survival of the fittest.
- In each iteration, SAGA:
  - evaluates the score of the alignments;
  - selects survivors: the fitter an alignment, the more likely it is to survive and produce offspring;
  - keeps surviving alignments unchanged, randomly modifies them (mutation), or combines them with another alignment (cross-over).
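The loop described above can be sketched generically. This is not SAGA itself: the selection rule (keep the fitter half), the toy bit-string individuals, and the names `evolve`, `flip_one` and `one_point` are assumptions chosen only to show the evaluate / survive / keep-mutate-combine cycle. In SAGA the individuals are alignments and the fitness is the COFFEE score.

```python
import random


def evolve(population, fitness, mutate, crossover, generations, rng):
    """Generic genetic-algorithm loop: evaluate, select, then keep/mutate/cross."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: len(ranked) // 2]  # simplification: fitter half survives
        children = []
        while len(survivors) + len(children) < len(population):
            op = rng.choice(("keep", "mutate", "cross"))
            parent = rng.choice(survivors)
            if op == "keep":
                children.append(parent)  # kept unchanged
            elif op == "mutate":
                children.append(mutate(parent, rng))  # random modification
            else:
                children.append(crossover(parent, rng.choice(survivors), rng))
        population = survivors + children
    return max(population, key=fitness)


def flip_one(bits, rng):
    """Toy mutation: flip one randomly chosen bit."""
    k = rng.randrange(len(bits))
    return bits[:k] + (1 - bits[k],) + bits[k + 1:]


def one_point(a, b, rng):
    """Toy cross-over: prefix of one parent, suffix of the other."""
    k = rng.randrange(1, len(a))
    return a[:k] + b[k:]


rng = random.Random(0)
start = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(10)]
best = evolve(start, sum, flip_one, one_point, 20, rng)
print(best, sum(best))
```

Because the top-ranked individual always survives, the best fitness never decreases from one generation to the next, which mirrors the slide's stopping rule of iterating until the score no longer improves.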

Results
- COFFEE function
- SAGA
- Optimization of COFFEE function
- Effect of optimization
- Comparison: COFFEE and others
- Others: PRRP, ClustalW, PILEUP, SAGA MSA, SAM
- COFFEE score & alignment accuracy
You are about to see a lot of tables. They are rather dry, so please bear with us…

Optimization
The COFFEE function was optimized by SAGA:
- using ClustalW alignments
- using SAGA alignments

Comparison
- Multiple alignments of SAGA-COFFEE and 5 other methods: PRRP, ClustalW, PILEUP, SAGA MSA, SAM
- Performance of SAGA and ClustalW
- Comparison of the other 5 methods
- Even when SAGA-COFFEE does not give the best result, it is not far from the best.
- Lower identity level → better SAGA-COFFEE results


Ratio of (E+H) residues correctly aligned
- Better or worse alignment? SAGA-COFFEE vs. others
- There is NO such thing as an ideal method
- Plot: correctly aligned ratio, with regions marked "Better than PRRP" and "Worse than PRRP"

COFFEE score and alignment accuracy
- Plot axes: COFFEE sequence score, E+H accuracy (%), average identity (%); r = 0.65
- The COFFEE score is used to predict alignment accuracy.
- Average identity cannot predict alignment accuracy.
- More than 85% of the sequences can be predicted (error ~ ±10%).

Correlation between score and accuracy
- Higher score → higher accuracy
- SAGA produces more high-scoring sequences than ClustalW

Coffee Break ?

T-Coffee

T-Coffee
A novel method for multiple sequence alignments
C. Notredame, D. Higgins, J. Heringa
ClustalW with extended library

ClustalW
ClustalW is the core alignment strategy of T-Coffee. It follows the procedure below:
- Pairwise alignment: calculate the distance matrix
- Guide tree
  - Unrooted neighbor-joining tree
  - Rooted neighbor-joining tree: guide tree with sequence weights
- Progressive alignment: align following the guide tree

Calculate distance matrix
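The distance-matrix step can be illustrated as follows. This is a sketch under stated assumptions: the pairwise alignments are taken as already computed, the distance is taken to be one minus the fraction of identical residues over gap-free columns, and the names `fractional_identity` and `distance_matrix` are hypothetical. ClustalW's actual scoring options may differ.

```python
def fractional_identity(row_a, row_b):
    """Fraction of identical residues over columns where neither row has a gap."""
    matches = compared = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # gapped columns are not compared
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def distance_matrix(n, pairwise):
    """Build a symmetric n x n matrix from pairwise alignments.

    pairwise[(i, j)] is a tuple of two gapped rows (assumed layout).
    """
    dist = [[0.0] * n for _ in range(n)]
    for (i, j), (row_i, row_j) in pairwise.items():
        d = 1.0 - fractional_identity(row_i, row_j)
        dist[i][j] = dist[j][i] = d
    return dist


pairwise = {
    (0, 1): ("ACGT", "ACGA"),
    (0, 2): ("AC-GT", "ACTGT"),
    (1, 2): ("ACGA", "TTTT"),
}
for row in distance_matrix(3, pairwise):
    print(row)
```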

Guide tree
Use the neighbor-joining method to build the guide tree from the distance matrix. First construct an unrooted neighbor-joining tree, then convert it to a rooted neighbor-joining tree, the guide tree.
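One step of neighbor joining, choosing which two nodes to join, can be sketched as below. It uses the standard criterion Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k) and joins the pair with the smallest Q. The function name `nj_pick_pair` and the five-taxon matrix are illustrative assumptions; a full implementation would also compute branch lengths, shrink the matrix, and repeat.

```python
def nj_pick_pair(dist):
    """Return the pair (i, j) minimising the neighbor-joining Q criterion, and its Q."""
    n = len(dist)
    totals = [sum(row) for row in dist]  # sum of distances from each node
    best, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best, best_q = (i, j), q
    return best, best_q


# Hypothetical distances between five sequences a..e
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(dist))  # ((0, 1), -50)
```

Note that the pair joined first (a, b) is not the pair with the smallest raw distance (d, e): the Q criterion corrects for how far each node is from everything else.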

Unrooted Neighbor-Joining Tree

Rooted Neighbor-Joining Tree

Progressive Alignment: align following the guide tree
Figure: Seq1, Seq2, Seq3, Seq4 and Seq5 are merged step by step into Alignment 1, Alignment 2 and Alignment 3, and then into the final alignment.
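The order in which a guide tree dictates the merges can be sketched with a post-order traversal. The tree shape below is a hypothetical example (the slide's figure does not survive in the transcript), and `merge_order` only records which groups would be aligned at each step; the actual alignment of two groups is left out.

```python
def merge_order(tree, steps):
    """Walk a binary guide tree (nested 2-tuples of names) bottom-up.

    Appends (left_group, right_group) to `steps` for every internal node and
    returns the list of sequence names under `tree`.
    """
    if isinstance(tree, str):
        return [tree]  # a leaf is a single sequence
    left = merge_order(tree[0], steps)
    right = merge_order(tree[1], steps)
    steps.append((tuple(left), tuple(right)))  # these two groups get aligned
    return left + right


# Hypothetical guide tree over five sequences
guide_tree = (("Seq1", "Seq2"), (("Seq3", "Seq4"), "Seq5"))
steps = []
merge_order(guide_tree, steps)
for number, (left, right) in enumerate(steps, start=1):
    print(number, left, "+", right)
```

With five sequences there are four merges; the last one joins the two sub-alignments at the root and yields the final alignment.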

Progressive-alignment strategy
- Pros: faster and saves space (compared with computing all possible multiple alignments).
- Cons: may not find the optimum solution. Errors made in the first alignments cannot be rectified later as the rest of the sequences are added in: "Once a gap, always a gap!"
- T-Coffee is an attempt to minimize that effect!

T-Coffee Algorithm
- Generating a primary library of alignments
- Derivation of the primary library weights
- Combination of the libraries
- Extending the library
- Progressive alignment strategy

Flow diagram: ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library

Primary Library

Flow diagram: ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library → extension → extended library

Extended Library
- Weight(A-C-B) = min( Weight(A-C), Weight(B-C) ) = min( 77, 100 ) = 77
- Weight(A-D-B) = min( Weight(A-D), Weight(B-D) ) = min( 100, 100 ) = 100
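The min rule above is easy to express in code. In this sketch, `triplet_weight` reproduces the two calculations on the slide. `extended_weight` goes one step further and adds the triplet contributions to the direct pairwise weight; that summation is an assumption here (it is not stated on the slide), and both function names are hypothetical.

```python
def triplet_weight(w_first, w_second):
    """Weight of aligning two residues through an intermediate sequence:
    the weaker of the two pairwise links."""
    return min(w_first, w_second)


def extended_weight(direct, via):
    """Direct weight plus the contribution of every intermediate sequence.

    `via` is a list of (weight_to_intermediate, weight_from_intermediate)
    tuples for the same residue pair. Summing is an assumption of this sketch.
    """
    return direct + sum(triplet_weight(a, b) for a, b in via)


print(triplet_weight(77, 100))   # A-C-B: 77
print(triplet_weight(100, 100))  # A-D-B: 100
```

Taking the minimum means that a path through a third sequence is only as trustworthy as its weakest pairwise alignment.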

Extended Library
SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT


Progressive Alignment
Flow diagram: ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library → extension → extended library → multiple alignment information

Progressive Alignment

Complexity Analysis
Complexity of the whole procedure, for N sequences that can be aligned in a multiple alignment of length L:

$$O(N^2 L^2) + O(N^3 L) + O(N^3) + O(N L^2)$$

- $O(N^2 L^2)$: computation of the pairwise library
- $O(N^3 L)$: computation of the extended pairwise library
- $O(N^3)$: computation of the NJ tree
- $O(N L^2)$: computation of the progressive alignment
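To see which term dominates, the four terms can simply be evaluated for concrete sizes. The values of N and L below are hypothetical, constant factors are ignored, and the function names are illustrative only.

```python
def tcoffee_cost_terms(n, length):
    """Evaluate each asymptotic term for n sequences of alignment length `length`."""
    return {
        "pairwise library": n**2 * length**2,
        "library extension": n**3 * length,
        "NJ tree": n**3,
        "progressive alignment": n * length**2,
    }


def dominant_term(n, length):
    """Name of the largest term (first one wins on ties)."""
    terms = tcoffee_cost_terms(n, length)
    return max(terms, key=terms.get)


print(dominant_term(10, 100))    # pairwise library
print(dominant_term(1000, 100))  # library extension
```

Since N³L exceeds N²L² exactly when N > L, the library extension becomes the bottleneck once there are more sequences than alignment columns; otherwise the pairwise library dominates.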

Experiment
- Implementation environment
- Result 1: effect of combining local and global alignments without extension; effect of the library extension
- Result 2: comparison with other multiple sequence alignment methods

Implementation environment
- Programming language: ANSI C
- Hardware: Linux platform with Pentium II processors (330 MHz)
- Test case: BaliBase database of multiple sequence alignments

Result 1
Table 1: The effect of combining local and global alignments
Columns: Name | global/local/extend | Cat1 (81) | Cat2 (23) | Cat3 (4) | Cat4 (12) | Cat5 (11) | Total (141) | Significance
Rows (Name | global/local/extend):
C | ClustalW pw / ... /
CE | ClustalW pw / ... / ex
L | ... / Lalign pw /
LE | ... / Lalign pw / ex
CL | ClustalW pw / Lalign pw /
CLE | ClustalW pw / Lalign pw / ex

Result 2
Table 2: T-Coffee compared with other multiple sequence alignment methods
Columns: Method | Cat1 (81) | Cat2 (23) | Cat3 (4) | Cat4 (12) | Cat5 (11) | Total1 (141) | Total2 (141) | Significance
Rows (Method): Dialign, ClustalW, Prrp, T-Coffee

3DCoffee

3DCoffee
Combining protein sequences and structures within multiple sequence alignments
O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins, C. Notredame
T-Coffee with structure information

3DCoffee
Structural information can help to improve the quality of multiple sequence alignments.
3DCoffee:
- combines protein sequences and structures;
- is based on T-Coffee version 2.00;
- uses a mixture of pairwise sequence alignments and pairwise structure comparison methods.

3DCoffee
Use T-Coffee to compile:
- a primary library: a list of weighted pairs of residues;
- an extended library: uses the column consistency relationship between all sequences.
Structure information comes from: Fugue, SAP, LSQman.

3DCoffee
- Fugue: a threading method that aligns a protein sequence with a 3D structure
- SAP: uses DP to compute a pairwise alignment based on a non-rigid structure superposition
- LSQman: a rigid-body structure superposition package

3DCoffee
- Set the weight of each new alignment to 100, which is the highest score in the primary library.
- Add the weighted alignments to the library.
- Carry out progressive alignment in the same way as T-Coffee.
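Adding structure-derived residue pairs to the library with weight 100 might look like the sketch below. The library layout (a dictionary keyed by residue pairs), the name `add_structure_pairs`, and the rule of summing weights when a pair is already present are assumptions made for illustration; the slide only states the weight of 100.

```python
STRUCTURE_WEIGHT = 100  # weight given to structure-based alignments on the slide


def add_structure_pairs(library, structure_pairs, weight=STRUCTURE_WEIGHT):
    """Return a new library with structure-derived residue pairs added.

    Keys are ((seq_name, position), (seq_name, position)) tuples (assumed layout).
    If a pair already exists, the weights are summed (assumption of this sketch).
    """
    merged = dict(library)
    for pair in structure_pairs:
        merged[pair] = merged.get(pair, 0) + weight
    return merged


sequence_library = {(("A", 3), ("B", 3)): 88}
structure_pairs = [(("A", 3), ("B", 3)), (("A", 4), ("B", 5))]
print(add_structure_pairs(sequence_library, structure_pairs))
```

The original library is left untouched, so the sequence-only and the sequence-plus-structure libraries can be compared side by side.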

Remarks
- COFFEE: An objective function for multiple sequence alignments. SAGA with COFFEE score.
- T-Coffee: A novel method for multiple sequence alignments. ClustalW with extended library.
- 3DCoffee: Combining protein sequences and structures within multiple sequence alignments. T-Coffee with structure information.

Recipes
- CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Julie D. Thompson, Desmond G. Higgins and Toby J. Gibson
- COFFEE: A New Objective Function for Multiple Sequence Alignment. C. Notredame, L. Holm and D. G. Higgins, Bioinformatics, Vol 14 (5), 1998
- T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp , 2000
- 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments. O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp , 2004

Q & A

Thank You

Residue score
- Sequence score measurement: a global measurement
- A residue is scored 9 if more than 90% of the pairs it is involved in are also present in the reference library.
- Residue score evaluated → substitution defined
- Class 5 substitution → residue score ≥ 5

vsdvprdlevvaatptslliswdap
gslevvaatptslliswdap

- Correct substitutions: SAGA > ClustalW
- Lower accuracy: more false positives in the SAGA alignment

- High-scoring residues have high accuracy
- Higher substitution category → smaller number of predictions