Coffee Shop F 黃仁暐 F 戴志華 F 施逸優 R 吳於芳 R 林與絜
2005/12/14 2 Menu Coffee Shop Opening Why coffee shop? Three Flavors COFFEE T-Coffee 3DCoffee Remarks Recipes
Multiple Sequence Alignment: Multiple sequence alignment is one of the most important tools for analyzing biological sequences. Applications: structure prediction, phylogenetic analysis, function prediction, and polymerase chain reaction (PCR) primer design.
Multiple Sequence Alignment: However, the accuracy is still not good enough. It is difficult to evaluate the quality of a multiple alignment algorithmically, and very hard to produce the optimal alignment. To increase the accuracy of multiple sequence alignment, we opened a coffee shop to share three kinds of coffee.
Before (drinking) COFFEE: Why comparative genomics? Understanding the process of evolution at the gross level and at the local level. Translating DNA sequence data into proteins of known function. The meaning of conserved regions. E. coli, C. elegans, Drosophila, human… What is their relationship?
Arabidopsis, E. coli, yeast, Synechocystis (blue-green algae), nematode, fruit fly, human. Classification of genes by function. Adapted from “Principles of Genome Analysis and Genomics”, Fig. 7.5 (p. 129), by S. B. Primrose and R. M. Twyman, 3rd edition.
Comparative genomics vs. multiple sequence alignment: Alignment → conserved regions. Conserved regions → gene location. Evidence of evolution.
Figure: the alignment between the chromosomes. A: human chromosome I; B: human chromosome II; C: human chromosome III. Chromosome III region Mb was magnified 120X.
Our Flavors:
COFFEE: A New Objective Function For Multiple Sequence Alignment. C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), 1998
T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp , 2000
3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments. O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp , 2004
COFFEE
COFFEE: An objective function for multiple sequence alignments. Cédric Notredame, Liisa Holm and Desmond G. Higgins. SAGA with COFFEE score.
Introduction: COFFEE stands for Consistency based Objective Function For alignmEnt Evaluation. An objective function, the COFFEE score, is proposed to measure the quality of multiple sequence alignments. The COFFEE score of a multiple sequence alignment is optimized with the genetic algorithm package SAGA (Sequence Alignment Genetic Algorithm).
Overview of their method: Given a set of sequences to be aligned and a library containing all pairwise alignments between them, the COFFEE score reflects the level of consistency between a multiple sequence alignment and the library.
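COFFEE score:
$$\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}$$
with $\mathrm{SCORE}(A_{i,j})$: the number of aligned pairs of residues that are shared between $A_{i,j}$ and the library.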
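A minimal Python sketch of the COFFEE score may make the ratio concrete. It rests on several assumptions that the slide does not state: the multiple alignment is a list of gapped strings with "-" as the gap character, the library stores residue pairs as (position in sequence i, position in sequence j), LEN is taken as the number of columns of the pairwise projection that are not all-gap, and the weights are supplied by the caller. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def aligned_pairs(row_i, row_j):
    """Project two gapped rows onto a pairwise alignment.

    Returns the set of aligned residue pairs (as ungapped positions)
    and the length of the projection (columns that are not gap/gap).
    """
    pairs, length = set(), 0
    pos_i = pos_j = 0
    for a, b in zip(row_i, row_j):
        if a == "-" and b == "-":
            continue  # column is empty in this projection
        length += 1
        if a != "-" and b != "-":
            pairs.add((pos_i, pos_j))
        if a != "-":
            pos_i += 1
        if b != "-":
            pos_j += 1
    return pairs, length

def coffee_score(alignment, library, weights):
    """Weighted fraction of the alignment that is consistent with the library."""
    num = den = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        pairs, length = aligned_pairs(alignment[i], alignment[j])
        w = weights[(i, j)]
        num += w * len(pairs & library[(i, j)])  # W * SCORE(A_ij)
        den += w * length                        # W * LEN(A_ij)
    return num / den if den else 0.0

# Toy data (hypothetical): three sequences, equal weights.
alignment = ["AC-G", "ACTG", "A-TG"]
library = {
    (0, 1): {(0, 0), (1, 1), (2, 3)},
    (0, 2): {(0, 0), (1, 1), (2, 2)},  # the library pairs C with T here
    (1, 2): {(0, 0), (2, 1), (3, 2)},
}
weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
score = coffee_score(alignment, library, weights)
```

In this toy case 8 of the library-supported pairs are recovered over a total projected length of 12, so the score is 8/12.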
COFFEE score
Using COFFEE in SAGA: SAGA iteratively generates a multiple sequence alignment with a higher COFFEE score until the score cannot be improved. SAGA follows the general principle of genetic algorithms: survival of the fittest. In each iteration SAGA evaluates the score of the alignments; the fitter an alignment, the more likely it is to survive and produce offspring. Surviving alignments may be kept unchanged, randomly modified (mutation), or combined with another alignment (cross-over).
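The loop described above can be sketched as a generic genetic algorithm in Python. This is not SAGA's actual code: the function name `evolve`, the fixed number of generations (SAGA stops when the score no longer improves), the 50/50 choice between mutation and cross-over, and the toy bit-string problem are all assumptions made for illustration. In SAGA the individuals would be alignments and the fitness would be the COFFEE score.

```python
import random

def evolve(population, fitness, mutate, crossover, generations=50, seed=0):
    """Keep the fitter half unchanged, refill with mutated or recombined children."""
    rng = random.Random(seed)
    population = list(population)  # assumes at least two individuals
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, len(ranked) // 2)]  # survival of the fittest
        children = []
        while len(survivors) + len(children) < len(population):
            parent = rng.choice(survivors)
            if rng.random() < 0.5:
                children.append(mutate(parent, rng))  # random modification
            else:
                children.append(crossover(parent, rng.choice(survivors), rng))
        population = survivors + children
        best = max(population + [best], key=fitness)
    return best

# Toy stand-ins (hypothetical): individuals are bit tuples, fitness counts the ones.
def toy_fitness(bits):
    return sum(bits)

def toy_mutate(bits, rng):
    k = rng.randrange(len(bits))
    return bits[:k] + (1 - bits[k],) + bits[k + 1:]

def toy_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

start = [(0,) * 8, (1, 0, 0, 0, 0, 0, 0, 0), (0, 1, 1, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 1)]
best = evolve(start, toy_fitness, toy_mutate, toy_crossover, generations=30, seed=1)
```

Because the best individuals always survive unchanged, the best score in the population never decreases from one generation to the next.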
Results: COFFEE function; SAGA; optimization of the COFFEE function; effect of optimization. Comparison of COFFEE with other methods: PRRP, Clustal W, PILEUP, SAGA MSA, SAM. COFFEE score and alignment accuracy. A lot of rather dry tables are coming up, so please bear with us…
Optimization: The COFFEE function was optimized by SAGA. Using ClustalW alignments. Using SAGA alignments.
Comparison: Multiple alignments from SAGA-COFFEE and 5 other methods (PRRP, ClustalW, PILEUP, SAGA MSA, SAM). Performance of SAGA and ClustalW. Comparison with the other 5 methods: even when SAGA-COFFEE does not give the best result, it is not far from the best. Lower identity level → better SAGA-COFFEE results.
Ratio of (E+H) residues correctly aligned. Better or worse alignment? SAGA-COFFEE vs. others: there is NO such thing as an ideal method. Correctly aligned ratio: better than PRRP; worse than PRRP.
COFFEE score and alignment accuracy: r = 0.65. Plot axes: COFFEE sequence score, E+H accuracy (%), average identity (%). The COFFEE score can be used to predict the accuracy of an alignment; average identity cannot. Accuracy can be predicted for >85% of the sequences (error ~ ±10%).
Correlation between score and accuracy: Higher score → higher accuracy. SAGA produces more high-scoring sequences than ClustalW.
Coffee Break ?
T-Coffee
T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa. ClustalW with extended library.
ClustalW: ClustalW is the core alignment strategy of T-Coffee. It follows the procedure below: (1) Pairwise alignment: calculate the distance matrix. (2) Guide tree: build an unrooted Neighbor-Joining tree, then a rooted Neighbor-Joining tree (the guide tree, with sequence weights). (3) Progressive alignment: align following the guide tree.
Calculate distance matrix
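As a rough illustration of this step, the Python sketch below turns pairwise alignments into a distance matrix using distance = 1 - fractional identity, where identity is the number of identical aligned residues divided by the number of residue pairs compared (gap positions excluded). The pairwise alignments are assumed to be given already; the function names and toy sequences are hypothetical.

```python
def identity_distance(row_a, row_b):
    """Distance between two pairwise-aligned rows: 1 - fractional identity."""
    compared = identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # gap positions are not compared
        compared += 1
        identical += (a == b)
    return 1.0 - identical / compared if compared else 1.0

def distance_matrix(pairwise, n):
    """Build a symmetric n x n matrix from {(i, j): (row_i, row_j)}."""
    d = [[0.0] * n for _ in range(n)]
    for (i, j), (row_i, row_j) in pairwise.items():
        d[i][j] = d[j][i] = identity_distance(row_i, row_j)
    return d

# Toy pairwise alignments (hypothetical data).
pairwise = {
    (0, 1): ("ACGT", "ACGA"),
    (0, 2): ("ACGT", "A-GT"),
    (1, 2): ("ACGA", "A-GT"),
}
d = distance_matrix(pairwise, 3)
```

The resulting matrix is the input to the guide-tree construction.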
Guide tree: Use the Neighbor-Joining method to build the guide tree from the distance matrix. First construct an unrooted Neighbor-Joining tree, then convert it to a rooted Neighbor-Joining tree, the guide tree.
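The core of Neighbor-Joining is choosing which two nodes to join next. The Python sketch below shows a single joining step using the standard Q criterion, Q(i,j) = (n-2)·d(i,j) - r_i - r_j, where r_i is the row sum of the distance matrix. It is a simplified illustration rather than ClustalW's implementation, and the function name and the 5-taxon example matrix are assumptions. A full tree is obtained by repeating the step on a reduced matrix until three nodes remain.

```python
def nj_pick_pair(d):
    """One Neighbor-Joining step: pick the pair with the smallest Q value.

    Returns (i, j, branch_i, branch_j), where the branch lengths connect
    i and j to the newly created internal node. Requires len(d) >= 3.
    """
    n = len(d)
    r = [sum(row) for row in d]  # total distance from each node to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    branch_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    branch_j = d[i][j] - branch_i
    return i, j, branch_i, branch_j

# Illustrative additive distance matrix for five taxa a..e (hypothetical data).
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
i, j, branch_i, branch_j = nj_pick_pair(d)
```

For this matrix the first pair joined is (a, b), with branch lengths 2 and 3, even though d and e are the closest pair by raw distance.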
Unrooted Neighbor-Joining Tree
Rooted Neighbor-Joining Tree
Progressive Alignment: align following the guide tree. Figure labels: Seq1, Seq2, Seq3, Seq4, Seq5; Alignment 1, Alignment 2, Alignment 3; Final alignment.
Progressive-alignment strategy. Pros: faster and saves space (compared with computing all possible multiple alignments). Cons: may not find the optimal solution; errors made in the first alignments cannot be rectified later as the rest of the sequences are added in (“Once a gap, always a gap!”). T-Coffee is an attempt to minimize that effect!
T-Coffee Algorithm: (1) Generating a primary library of alignments. (2) Derivation of the primary library weights. (3) Combination of the libraries. (4) Extending the library. (5) Progressive alignment strategy.
ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library
Primary Library
ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library → extension → extended library
Extended Library: Weight(A-C-B) = min(Weight(A-C), Weight(B-C)) = min(77, 100) = 77. Weight(A-D-B) = min(Weight(A-D), Weight(B-D)) = min(100, 100) = 100.
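The two computations above follow one rule: the weight of aligning A with B through an intermediate sequence is the minimum of the two primary weights involved. A small Python sketch, using only the four weights shown on this slide, is given below. The function names are hypothetical, each key stands for one residue in the named sequence, and summing the triplet contributions with the direct weight follows the description in the T-Coffee paper (the slide gives no direct A-B weight, so it is taken as 0 here).

```python
def triplet_weights(primary, x, y):
    """Weight of aligning x with y through each intermediate z: min(W(x,z), W(y,z))."""
    def w(a, b):
        return primary.get((a, b), primary.get((b, a)))

    others = {name for pair in primary for name in pair} - {x, y}
    result = {}
    for z in sorted(others):
        w_xz, w_yz = w(x, z), w(y, z)
        if w_xz is not None and w_yz is not None:
            result[z] = min(w_xz, w_yz)
    return result

def extended_weight(primary, x, y):
    """Direct weight (0 if absent) plus the contributions of all triplets."""
    direct = primary.get((x, y), primary.get((y, x), 0))
    return direct + sum(triplet_weights(primary, x, y).values())

# Primary weights as given on the slide.
primary = {("A", "C"): 77, ("B", "C"): 100, ("A", "D"): 100, ("B", "D"): 100}
via = triplet_weights(primary, "A", "B")    # {'C': 77, 'D': 100}
total = extended_weight(primary, "A", "B")  # 177 with no direct A-B weight
```

A pair of residues supported by many intermediate sequences therefore accumulates a high extended weight, which is what guides the later progressive alignment.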
Extended Library (figure): SeqA: GARFIELD THE LAST FAT CAT; SeqB: GARFIELD THE FAST CAT.
Progressive Alignment: ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library → extension → extended library → multiple alignment information
Progressive Alignment
Complexity Analysis: the complexity of the whole procedure is $O(N^2L^2) + O(N^3L) + O(N^3) + O(NL^2)$, for N sequences that can be aligned in a multiple alignment of length L.
$O(N^2L^2)$: computation of the pairwise library
$O(N^3L)$: computation of the extended pairwise library
$O(N^3)$: computation of the NJ tree
$O(NL^2)$: computation of the progressive alignment
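To get a feel for which term dominates, the four terms can be evaluated for concrete values of N and L. The Python sketch below ignores constant factors, so the numbers are only relative orders of magnitude; the function name and the example sizes are assumptions.

```python
def tcoffee_cost_terms(n, l):
    """Evaluate each complexity term for n sequences and alignment length l."""
    return {
        "pairwise library": n**2 * l**2,
        "library extension": n**3 * l,
        "NJ tree": n**3,
        "progressive alignment": n * l**2,
    }

small = tcoffee_cost_terms(100, 300)
large = tcoffee_cost_terms(1000, 300)
dominant_small = max(small, key=small.get)
dominant_large = max(large, key=large.get)
```

With 100 sequences of length 300 the pairwise library term is the largest; once N exceeds L, the library extension term takes over.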
Experiment: Implementation environment. Result 1: effect of combining local and global alignments without extension; effect of the library extension. Result 2: comparison with other multiple sequence alignment methods.
Implementation environment: Programming language: ANSI C. Hardware: Linux platform with Pentium II processors (330 MHz). Test case: BAliBASE database of multiple sequence alignments.
Result 1. Table 1: The effect of combining local and global alignments
Columns: Name; global/local/extend; Cat1 (81); Cat2 (23); Cat3 (4); Cat4 (12); Cat5 (11); Total (141); Significance
C: ClustalW pw / ... /
CE: ClustalW pw / ... / ex
L: ... / Lalign pw /
LE: ... / Lalign pw / ex
CL: ClustalW pw / Lalign pw /
CLE: ClustalW pw / Lalign pw / ex
Result 2. Table 2: T-Coffee compared with other multiple sequence alignment methods
Columns: Method; Cat1 (81); Cat2 (23); Cat3 (4); Cat4 (12); Cat5 (11); Total1 (141); Total2 (141); Significance
Methods: Dialign, ClustalW, Prrp, T-Coffee
3DCoffee
3DCoffee: Combining protein sequences and structures within multiple sequence alignments. O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame. T-Coffee with structure information.
3DCoffee: Structural information can help to improve the quality of multiple sequence alignments. 3DCoffee combines protein sequences and structures, is based on T-Coffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods.
3DCoffee: Use T-Coffee to compile a primary library (a list of weighted pairs of residues) and an extended library (which uses the column consistency relationships between all sequences). The structure information comes from Fugue, SAP, and LSQman.
3DCoffee: Fugue: a threading method that aligns a protein sequence with a 3D structure. SAP: uses dynamic programming (DP) to compute a pairwise alignment based on a non-rigid structure superposition. LSQman: a rigid-body structure superposition package.
3DCoffee: Set the weight of each new alignment to 100, the highest score in the primary library. Add the weighted alignments to the library. Carry out the progressive alignment in the same way as T-Coffee.
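A minimal Python sketch of adding structure-derived residue pairs to the library with weight 100 is shown below. The library layout (a dict keyed by residue pairs), the function name, and the toy data are hypothetical. Summing the weights when a pair is already present mirrors how T-Coffee merges duplicate entries when combining libraries; that 3DCoffee treats structure-derived pairs the same way is an assumption here.

```python
STRUCTURE_WEIGHT = 100  # highest score in the primary library, as stated above

def add_structure_pairs(library, structure_pairs, weight=STRUCTURE_WEIGHT):
    """Return a new library with structure-based residue pairs added.

    library: {((seq, pos), (seq, pos)): weight}
    structure_pairs: iterable of residue pairs in the same key format
    """
    merged = dict(library)  # leave the caller's library untouched
    for pair in structure_pairs:
        # Assumption: a pair already in the library accumulates weight.
        merged[pair] = merged.get(pair, 0) + weight
    return merged

# Toy data (hypothetical): one sequence-based pair, two structure-based pairs.
sequence_library = {(("A", 3), ("B", 3)): 88}
structure_pairs = [(("A", 3), ("B", 3)), (("A", 4), ("B", 5))]
combined = add_structure_pairs(sequence_library, structure_pairs)
```

The combined library can then be extended and used for progressive alignment exactly as in T-Coffee.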
Remarks:
COFFEE: An objective function for multiple sequence alignments. SAGA with COFFEE score.
T-Coffee: A novel method for multiple sequence alignments. ClustalW with extended library.
3DCoffee: Combining protein sequences and structures within multiple sequence alignments. T-Coffee with structure information.
Recipes:
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Julie D. Thompson, Desmond G. Higgins and Toby J. Gibson
COFFEE: A New Objective Function For Multiple Sequence Alignment. C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), 1998
T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp , 2000
3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments. O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp , 2004
Q & A
Thank You
Residue score. Sequence score measurement: global measurement. A residue is scored 9 if >90% of the pairs it is involved in are also present in the reference library. Once residue scores are evaluated, substitution classes are defined: a class 5 substitution means residue score ≥ 5.
Figure: sequence fragments vsdvprdlevvaatptslliswdap and gslevvaatptslliswdap.
Correct substitutions: SAGA > ClustalW. Lower accuracy: more false positives in the SAGA alignment.
High-scoring residues have high accuracy. Higher substitution category → smaller number of predictions.