Published by Joseph Hutchinson. Modified over 9 years ago.
1
Coffee Shop F91921025 黃仁暐 F92921029 戴志華 F92921041 施逸優 R93921142 吳於芳 R94921035 林與絜
2
2005/12/14 2 Menu Coffee Shop Opening Why coffee shop? Three Flavors COFFEE T-Coffee 3DCoffee Remarks Recipes
3
Multiple Sequence Alignment: Multiple sequence alignment is one of the most important tools for analyzing biological sequences. Applications: structure prediction, phylogenetic analysis, function prediction, and polymerase chain reaction (PCR) primer design.
4
Multiple Sequence Alignment: However, the accuracy is still not good enough. It is difficult to evaluate the quality of a multiple alignment algorithmically, and very hard to produce the optimal alignment. To increase the accuracy of multiple sequence alignment, we opened a coffee shop to share three kinds of coffee.
5
Before (drinking) COFFEE: comparative genomics, and why? Understanding the process of evolution at the gross level and at the local level. Translating DNA sequence data into proteins of known function. The meaning of conserved regions. E. coli, C. elegans, Drosophila, human… What is their relationship?
6
Arabidopsis, E. coli, yeast, Synechocystis (blue-green algae), nematode, fruit fly, human: classification of genes by function. Adapted from “Principles of Genome Analysis and Genomics”, Fig. 7.5 (p. 129), by S. B. Primrose and R. M. Twyman, 3rd edition.
7
Comparative genomics vs. multiple sequence alignment: alignment → conserved regions; conserved regions → gene locations; evidence of evolution. http://www.public.iastate.edu/~semrich/compgen/
8
http://gchelpdesk.ualberta.ca/news/02jun05/cbhd_news_02jun05.php A: human chromosome I; B: human chromosome II; C: human chromosome III. The chromosome III region at 125-128 Mb was magnified 120X. The alignment between the chromosomes.
9
Our Flavors: COFFEE: A New Objective Function for Multiple Sequence Alignment. C. Notredame, L. Holm and D. G. Higgins. Bioinformatics, Vol. 14(5), pp. 407-422, 1998. T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa. Journal of Molecular Biology, Vol. 302, pp. 205-217, 2000. 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments. O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins, C. Notredame. Journal of Molecular Biology, Vol. 340, pp. 385-395, 2004.
10
COFFEE
11
COFFEE: An objective function for multiple sequence alignments. Cédric Notredame, Liisa Holm and Desmond G. Higgins. SAGA with COFFEE score
12
Introduction: COFFEE stands for Consistency based Objective Function For alignmEnt Evaluation. An objective function, the COFFEE score, is proposed to measure the quality of multiple sequence alignments. The COFFEE score of a multiple sequence alignment is optimized with the genetic algorithm package SAGA (Sequence Alignment Genetic Algorithm).
13
Overview of their method: given a set of sequences to be aligned and a library containing all pairwise alignments between them, the COFFEE score reflects the level of consistency between a multiple sequence alignment and the library.
14
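COFFEE score:
$$\mathrm{COFFEE}(A) = \frac{\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}$$
where A_{i,j} is the pairwise projection of the multiple alignment A on sequences i and j, and SCORE(A_{i,j}) is the number of aligned pairs of residues that are shared between A_{i,j} and the library.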
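A minimal sketch of the COFFEE score formula, for illustration only. The helper names `project` and `coffee_score` are hypothetical, the library is assumed to be given as sets of aligned residue-index pairs per sequence pair, W_{i,j} defaults to 1, and LEN(A_{i,j}) is assumed to be the number of columns left in the projection after dropping gap-only columns.

```python
def project(row_i, row_j):
    """Project two rows of a multiple alignment onto a pairwise alignment.

    Returns the set of aligned residue pairs (pos_i, pos_j), counted with
    ungapped residue indices, and the number of columns that remain after
    gap-only columns are dropped (assumed meaning of LEN).
    """
    pairs = set()
    length = 0
    pos_i = pos_j = 0
    for a, b in zip(row_i, row_j):
        if a == "-" and b == "-":
            continue  # gap-only column: not part of the pairwise projection
        length += 1
        if a != "-" and b != "-":
            pairs.add((pos_i, pos_j))
        if a != "-":
            pos_i += 1
        if b != "-":
            pos_j += 1
    return pairs, length


def coffee_score(msa, library, weights=None):
    """COFFEE score of a multiple alignment against a pairwise library.

    msa: list of equal-length gapped strings.
    library: {(i, j): set of (pos_i, pos_j)} for i < j.
    weights: optional {(i, j): W_ij}; every pair defaults to 1.0 (assumption).
    """
    weights = weights or {}
    num = den = 0.0
    n = len(msa)
    for i in range(n - 1):
        for j in range(i + 1, n):
            pairs, length = project(msa[i], msa[j])
            w = weights.get((i, j), 1.0)
            num += w * len(pairs & library.get((i, j), set()))  # W_ij * SCORE(A_ij)
            den += w * length  # W_ij * LEN(A_ij)
    return num / den if den else 0.0


msa = ["AC-GT", "ACAGT"]
library = {(0, 1): {(0, 0), (1, 1), (2, 3), (3, 4)}}
print(coffee_score(msa, library))  # 4 shared pairs over 5 columns
```

A fully consistent, gap-free alignment scores 1.0 under these assumptions; every library pair missing from the projection, and every gapped column, lowers the score.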
15
COFFEE score
16
Using COFFEE in SAGA: SAGA iteratively generates multiple sequence alignments with higher COFFEE scores until the score can no longer be improved. SAGA follows the general principle of genetic algorithms, the notion of survival of the fittest. In each iteration, SAGA evaluates the score of the alignments; the fitter an alignment, the more likely it is to survive and produce offspring; surviving alignments may be kept unchanged, randomly modified (mutation), or combined with another alignment (cross-over).
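The loop described above can be sketched as a generic genetic algorithm. This is not SAGA: SAGA's real operators act on gapped alignments, whereas here the fitness, mutation and cross-over functions are pluggable and demonstrated on a toy bit-string problem. The names `evolve`, `flip_one` and `one_point`, rank-based selection, the operator probabilities and elitism are all illustrative assumptions.

```python
import random


def evolve(population, fitness, mutate, crossover, generations=100, seed=0):
    """Generic genetic-algorithm loop (survival of the fittest).

    Each generation ranks individuals by fitness, picks parents with
    rank-based probability (fitter means more likely), and fills the next
    generation with unchanged copies, mutants, or cross-over children.
    The best individual is always carried over (elitism), so the best
    fitness never decreases.
    """
    rng = random.Random(seed)
    pop = list(population)
    size = len(pop)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        rank_weights = [size - r for r in range(size)]  # best gets the largest weight
        nxt = [ranked[0]]  # elitism
        while len(nxt) < size:
            parent = rng.choices(ranked, weights=rank_weights, k=1)[0]
            op = rng.random()
            if op < 0.2:
                nxt.append(parent)  # kept unchanged
            elif op < 0.6:
                nxt.append(mutate(parent, rng))  # random modification
            else:
                other = rng.choices(ranked, weights=rank_weights, k=1)[0]
                nxt.append(crossover(parent, other, rng))  # combination
        pop = nxt
    return max(pop, key=fitness)


# Toy problem (not alignments): maximize the number of 1s in a bit list.
def flip_one(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


start_rng = random.Random(1)
start = [[start_rng.randint(0, 1) for _ in range(20)] for _ in range(10)]
best = evolve(start, sum, flip_one, one_point)
print(sum(best), "of", len(best))
```

Plugging in a COFFEE-style score as `fitness` and alignment-aware operators would give the shape of SAGA-COFFEE, but those operators are beyond this sketch.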
17
Results: the COFFEE function; SAGA optimization of the COFFEE function; effect of optimization; comparison of COFFEE with others (PRRP, ClustalW, PILEUP, SAGA MSA, SAM); COFFEE score and alignment accuracy. A pile of tables is coming up; they are dry, so please bear with us…
18
Optimization: the COFFEE function was optimized by SAGA, starting from ClustalW alignments and from SAGA alignments.
19
Comparison: multiple alignments from SAGA-COFFEE and 5 other methods (PRRP, ClustalW, PILEUP, SAGA MSA, SAM). Performance of SAGA and ClustalW; comparison with the other 5 methods. Even when SAGA-COFFEE does not give the best result, it is not far from the best. Lower identity level → better SAGA-COFFEE results.
20
21
Ratio of (E+H) residues correctly aligned. Better or worse alignment? SAGA-COFFEE vs. others: there is no such thing as an ideal method. Correctly aligned ratio: better than PRRP / worse than PRRP.
22
COFFEE score and alignment accuracy (r = 0.65). Axes: COFFEE sequence score; E+H accuracy (%); average identity (%). The COFFEE score can be used to predict alignment accuracy, whereas average identity cannot. Accuracy can be predicted for >85% of the sequences (error ~±10%).
23
Correlation between score and accuracy: higher score → higher accuracy. SAGA produces more high-scoring sequences than ClustalW.
24
Coffee Break ?
25
T-Coffee
26
T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa. ClustalW with extended library
27
ClustalW: ClustalW is the core alignment strategy of T-Coffee. It follows this procedure: (1) Pairwise alignment: calculate the distance matrix. (2) Guide tree: an unrooted Neighbor-Joining tree, then a rooted Neighbor-Joining tree (the guide tree, with sequence weights). (3) Progressive alignment: align following the guide tree.
28
Calculate distance matrix
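A sketch of the distance-matrix step, under simplifying assumptions: each pair of sequences is aligned with a plain Needleman-Wunsch global alignment using toy scores (match +1, mismatch -1, gap -1), and the distance is 1 minus the fraction of identical residues over gap-free aligned positions. ClustalW's own pairwise stage uses its own scoring options, so `global_align`, `distance` and `distance_matrix` here are hypothetical illustrations, not ClustalW code.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - fraction of identical residues among gap-free aligned positions."""
    x, y = global_align(a, b)
    compared = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not compared:
        return 1.0
    same = sum(1 for p, q in compared if p == q)
    return 1.0 - same / len(compared)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances with zeros on the diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = distance(seqs[i], seqs[j])
    return d


print(distance_matrix(["ACGT", "ACGT", "AGGT"]))
```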
29
Guide tree: use the Neighbor-Joining method to build the guide tree from the distance matrix. First construct an unrooted Neighbor-Joining tree, then convert it to a rooted Neighbor-Joining tree, the guide tree.
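The Neighbor-Joining step can be sketched with the textbook algorithm: repeatedly join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the row sum of distances, and update distances to the new node. This is a generic illustration with a hypothetical `neighbor_joining` helper, not ClustalW's code; in particular, the final rooting here is an arbitrary placeholder (midpoint of the last edge), not ClustalW's rooting rule.

```python
def neighbor_joining(labels, d):
    """Neighbor-joining on a symmetric distance matrix; returns a Newick string."""
    nodes = list(labels)
    dist = {(a, b): d[i][j] for i, a in enumerate(labels) for j, b in enumerate(labels)}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a, k] for k in nodes) for a in nodes}
        # Q(i, j) = (n - 2) d(i, j) - r_i - r_j; join the pair that minimizes it.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * dist[a, b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node u.
        la = dist[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a, b] - la
        u = f"({a}:{la:.2f},{b}:{lb:.2f})"
        for k in nodes:
            if k not in (a, b):
                dist[u, k] = dist[k, u] = (dist[a, k] + dist[b, k] - dist[a, b]) / 2
        dist[u, u] = 0.0
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    # The last edge joins the two remaining clusters; writing Newick forces a
    # root, placed arbitrarily here at the midpoint of that edge.
    half = dist[a, b] / 2
    return f"({a}:{half:.2f},{b}:{half:.2f});"


labels = ["a", "b", "c", "d", "e"]
d = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(labels, d))
```

On this small example matrix, a and b are joined first, with branch lengths 2 and 3.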
30
Unrooted Neighbor-Joining Tree
31
Rooted Neighbor-Joining Tree
32
Progressive Alignment: align following the guide tree. Seq1, Seq2, Seq3, Seq4, Seq5 → Alignment 1, Alignment 2, Alignment 3 → Final alignment
33
Progressive-alignment strategy. Pros: faster and saves space (compared with computing all possible multiple alignments). Cons: may not find the optimal solution; errors made in the early alignments cannot be rectified later as the rest of the sequences are added in. “Once a gap, always a gap!” T-Coffee is an attempt to minimize that effect!
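A compact sketch of progressive alignment along a guide tree, written as nested tuples of sequence names. Profiles are aligned by dynamic programming with an assumed column score (the average of toy pairwise scores: match +1, mismatch -1, gap -1); ClustalW's weighted, position-specific gap scheme is not modeled. Note how existing gaps are never removed, only whole gap columns are added, which is exactly the "once a gap, always a gap" weakness. All names are hypothetical.

```python
MATCH, MISMATCH, GAP = 1, -1, -1  # assumed toy scores


def col_score(cp, cq):
    """Average pairwise score between a column of profile p and one of profile q."""
    total = 0
    for x in cp:
        for y in cq:
            if x == "-" and y == "-":
                continue  # gap against gap costs nothing
            if x == "-" or y == "-":
                total += GAP
            else:
                total += MATCH if x == y else MISMATCH
    return total / (len(cp) * len(cq))


def align_profiles(p, q):
    """Align two profiles (lists of equal-length gapped strings) by DP.

    Existing gaps are kept: new gaps are inserted only as whole columns.
    """
    n, m = len(p[0]), len(q[0])
    cols_p = [[row[i] for row in p] for i in range(n)]
    cols_q = [[row[j] for row in q] for j in range(m)]
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        h[i][0] = i * GAP
    for j in range(1, m + 1):
        h[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(h[i - 1][j - 1] + col_score(cols_p[i - 1], cols_q[j - 1]),
                          h[i - 1][j] + GAP,
                          h[i][j - 1] + GAP)
    out_p = [[] for _ in p]
    out_q = [[] for _ in q]
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and h[i][j] == h[i - 1][j - 1] + col_score(cols_p[i - 1], cols_q[j - 1]):
            col_a, col_b = cols_p[i - 1], cols_q[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and h[i][j] == h[i - 1][j] + GAP:
            col_a, col_b = cols_p[i - 1], ["-"] * len(q)
            i -= 1
        else:
            col_a, col_b = ["-"] * len(p), cols_q[j - 1]
            j -= 1
        for row, ch in zip(out_p + out_q, col_a + col_b):
            row.append(ch)
    return ["".join(reversed(row)) for row in out_p + out_q]


def progressive(tree, seqs):
    """Align sequences following a guide tree given as nested tuples of names."""
    if isinstance(tree, str):
        return [seqs[tree]]
    left, right = tree
    return align_profiles(progressive(left, seqs), progressive(right, seqs))


seqs = {"s1": "ACGT", "s2": "ACGT", "s3": "AGT", "s4": "CGT"}
for row in progressive((("s1", "s2"), ("s3", "s4")), seqs):
    print(row)
```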
34
T-Coffee Algorithm: generating a primary library of alignments; derivation of the primary library weights; combination of the libraries; extending the library; progressive alignment strategy.
35
ClustalW Primary Library (Global) Lalign Primary Library (Local) Weighting Primary Library
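A sketch of building and weighting the primary library. As we read the T-Coffee paper, every residue pair in a pairwise alignment receives the percent identity of that alignment as its weight, and a pair present in both the global (ClustalW) and local (Lalign) libraries is merged with the sum of the weights. Computing identity over gap-free columns, the residue keys `(sequence_name, position)`, the offsets for local fragments, and the helper names are assumptions for illustration.

```python
from collections import defaultdict


def library_from_alignment(name_a, row_a, name_b, row_b, start_a=0, start_b=0):
    """Turn one pairwise alignment into weighted residue pairs.

    Every aligned residue pair gets the same weight: the percent identity
    of the alignment over gap-free columns (assumed definition).
    start_a / start_b offset the positions of a local alignment fragment.
    """
    pairs = []
    pos_a, pos_b = start_a, start_b
    same = compared = 0
    for x, y in zip(row_a, row_b):
        if x != "-" and y != "-":
            pairs.append(((name_a, pos_a), (name_b, pos_b)))
            compared += 1
            same += int(x == y)
        if x != "-":
            pos_a += 1
        if y != "-":
            pos_b += 1
    weight = 100.0 * same / compared if compared else 0.0
    return {pair: weight for pair in pairs}


def combine(*libraries):
    """Merge libraries; a pair found in several libraries gets the sum of its weights."""
    merged = defaultdict(float)
    for lib in libraries:
        for pair, w in lib.items():
            merged[pair] += w
    return dict(merged)


glob = library_from_alignment("A", "ACGT", "B", "ACGA")  # 75% identity
local = library_from_alignment("A", "CG", "B", "CG", start_a=1, start_b=1)  # 100%
primary = combine(glob, local)
print(primary[(("A", 1), ("B", 1))])  # found in both libraries: 75 + 100
```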
36
Primary Library
37
ClustalW Primary Library (Global) Lalign Primary Library (Local) Weighting Primary Library Extension Extended Library
38
Extended Library: Weight(A-C-B) = min(Weight(A-C), Weight(B-C)) = min(77, 100) = 77; Weight(A-D-B) = min(Weight(A-D), Weight(B-D)) = min(100, 100) = 100
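A sketch of library extension using the triplet rule from the slide: each intermediate residue contributes min(Weight(A-C), Weight(C-B)) to the pair A-B. Summing those contributions over all intermediates, and adding the direct weight if present, follows our reading of the T-Coffee paper. Residue keys `(sequence_name, position)` and the helper names are hypothetical. With the slide's numbers, A-B gains 77 via C and 100 via D.

```python
from collections import defaultdict


def ordered(a, b):
    """Canonical key for an unordered residue pair."""
    return (a, b) if a <= b else (b, a)


def extend_library(primary):
    """Extend a primary library through intermediate residues.

    primary: {(res_a, res_b): weight}, res = (sequence_name, position),
    each unordered pair stored once. For every residue c and every two of
    its library partners a and b (from different sequences), the pair
    a-b receives min(W(a, c), W(c, b)); direct weights are kept as well.
    """
    partners = defaultdict(dict)
    for (a, b), w in primary.items():
        partners[a][b] = w
        partners[b][a] = w
    extended = defaultdict(float)
    for (a, b), w in primary.items():
        extended[ordered(a, b)] += w
    for c, links in partners.items():
        items = sorted(links.items())
        for x in range(len(items)):
            for y in range(x + 1, len(items)):
                (a, wa), (b, wb) = items[x], items[y]
                if a[0] == b[0]:
                    continue  # a and b must come from different sequences
                extended[ordered(a, b)] += min(wa, wb)
    return dict(extended)


primary = {
    (("A", 0), ("C", 0)): 77,
    (("C", 0), ("B", 0)): 100,
    (("A", 0), ("D", 0)): 100,
    (("D", 0), ("B", 0)): 100,
}
ext = extend_library(primary)
print(ext[(("A", 0), ("B", 0))])  # via C: 77, via D: 100
```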
39
Extended Library SeqA: GARFIELD THE LAST FAT CAT SeqB: GARFIELD THE FAST CAT
40
Extended Library SeqA: GARFIELD THE LAST FAT CAT SeqB: GARFIELD THE FAST CAT
41
Progressive Alignment ClustalW Primary Library (Global) Lalign Primary Library (Local) Weighting Primary Library Extension Extended Library Multiple Alignment Information
42
Progressive Alignment
43
Complexity Analysis: for N sequences whose multiple alignment has length L, the complexity of the whole procedure is O(N^2 L^2) + O(N^3 L) + O(N^3) + O(N L^2). O(N^2 L^2): computation of the pairwise library. O(N^3 L): computation of the extended pairwise library. O(N^3): computation of the NJ tree. O(N L^2): computation of the progressive alignment.
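To see which term dominates, the four terms can simply be evaluated for concrete sizes. Constants are ignored, so the numbers are only indicative, and the sizes below are hypothetical. Since N^2 L^2 / (N^3 L) = L / N, the pairwise library dominates while L > N, and the extension term takes over once N > L.

```python
def cost_terms(n, l):
    """Evaluate each term of the O(N^2 L^2 + N^3 L + N^3 + N L^2) bound."""
    return {
        "pairwise library": n**2 * l**2,
        "extended library": n**3 * l,
        "NJ tree": n**3,
        "progressive alignment": n * l**2,
    }


for n, l in [(50, 400), (1000, 100)]:
    terms = cost_terms(n, l)
    print(n, l, max(terms, key=terms.get), terms)
```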
44
Experiment: implementation environment. Result 1: the effect of combining local and global alignments without extension, and the effect of the library extension. Result 2: comparison with other multiple sequence alignment methods.
45
Implementation environment: programming language: ANSI C; hardware: Linux platform with Pentium II processors (330 MHz); test cases: the BAliBASE database of multiple sequence alignments.
46
Result 1. Table 1: The effect of combining local and global alignments
Name | Global / Local / Extend | Cat1 (81) | Cat2 (23) | Cat3 (4) | Cat4 (12) | Cat5 (11) | Total (141) | Significance
C | ClustalW pw / ... / ... | 70.6 | 26.7 | 43.0 | 56.0 | 60.0 | 58.9 | 7.8
CE | ClustalW pw / ... / ex | 77.1 | 33.6 | 47.6 | 64.8 | 75.9 | 66.3 | 17.7
L | ... / Lalign pw / ... | 65.4 | 12.1 | 22.8 | 53.9 | 66.0 | 52.0 | 7.8
LE | ... / Lalign pw / ex | 72.6 | 25.6 | 47.2 | 77.5 | 85.5 | 64.2 | 16.3
CL | ClustalW pw / Lalign pw / ... | 76.2 | 32.0 | 48.3 | 76.2 | 74.6 | 66.5 | 12.1
CLE | ClustalW pw / Lalign pw / ex | 80.6 | 37.1 | 52.9 | 83.2 | 88.6 | 72.0 |
47
Result 2. Table 2: T-Coffee compared with other multiple sequence alignment methods
Method | Cat1 (81) | Cat2 (23) | Cat3 (4) | Cat4 (12) | Cat5 (11) | Total1 (141) | Total2 (141) | Significance
Dialign | 71.0 | 25.2 | 35.1 | 74.7 | 80.4 | 61.5 | 57.3 | 11.3
ClustalW | 78.5 | 32.2 | 42.5 | 65.7 | 74.3 | 66.4 | 58.6 | 26.2
Prrp | 78.6 | 32.5 | 50.2 | 51.1 | 82.7 | 66.4 | 59.0 | 36.9
T-Coffee | 80.6 | 37.1 | 52.9 | 83.2 | 88.6 | 72.0 | 68.6 |
48
3DCoffee
49
3DCoffee: Combining protein sequences and structures within multiple sequence alignments. O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins, C. Notredame. T-Coffee with structure information
50
3DCoffee: structural information can help improve the quality of multiple sequence alignments. 3DCoffee combines protein sequences and structures, is based on T-Coffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods.
51
3DCoffee uses T-Coffee to compile a primary library (a list of weighted pairs of residues) and an extended library (using the column-consistency relationships between all sequences). The structure information comes from Fugue, SAP and LSQman.
52
3DCoffee: Fugue: a threading method that aligns a protein sequence with a 3D structure. SAP: uses dynamic programming (DP) to compute a pairwise alignment based on a non-rigid structure superposition. LSQman: a rigid-body structure superposition package.
53
3DCoffee: set the weight of each new structure-based alignment to 100, the maximum weight in the primary library. Add the weighted alignments to the library. Carry out the progressive alignment in the same way as T-Coffee.
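A sketch of the "add the weighted alignments to the library" step: residue pairs from a structure-based pairwise alignment (from Fugue, SAP or LSQman in the original) are given weight 100 and added to a library keyed by residue pairs. Summing with a weight already present for the same pair is an assumption that mirrors how T-Coffee merges duplicate pairs; the slide only says the alignments are added. The helper name and residue keys are hypothetical.

```python
from collections import defaultdict

STRUCTURE_WEIGHT = 100.0  # maximum weight in the primary library, per the slide


def add_structural_pairs(library, structural_pairs, weight=STRUCTURE_WEIGHT):
    """Return a new library with structure-derived residue pairs added.

    library: {(res_a, res_b): weight}, res = (sequence_name, position).
    structural_pairs: iterable of (res_a, res_b) from a structural alignment.
    Pairs already present have the weights summed (assumption).
    """
    merged = defaultdict(float, library)  # copy, so the input is left untouched
    for a, b in structural_pairs:
        key = (a, b) if a <= b else (b, a)
        merged[key] += weight
    return dict(merged)


lib = {(("A", 0), ("B", 0)): 60.0}
out = add_structural_pairs(lib, [(("B", 0), ("A", 0)), (("A", 1), ("B", 1))])
print(out)
```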
54
Remarks COFFEE : An objective function for multiple sequence alignments SAGA with COFFEE score T-Coffee : A novel method for multiple sequence alignments ClustalW with extended library 3DCoffee : Combining protein sequences and structures within multiple sequence alignments T-Coffee with structure information
55
Recipes: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. J. D. Thompson, D. G. Higgins and T. J. Gibson. 1994. COFFEE: A New Objective Function for Multiple Sequence Alignment. C. Notredame, L. Holm and D. G. Higgins. Bioinformatics, Vol. 14(5), pp. 407-422, 1998. T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa. Journal of Molecular Biology, Vol. 302, pp. 205-217, 2000. 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments. O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins, C. Notredame. Journal of Molecular Biology, Vol. 340, pp. 385-395, 2004.
56
Q & A
57
Thank You
58
Residue score: sequence score measurement; global measurement. A residue was scored 9 if >90% of the pairs it was involved in were also present in the reference library. Once residue scores are evaluated, substitution classes are defined: a class 5 substitution means a residue score ≥ 5.
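A sketch of a 0-9 residue score. The slide fixes only the top of the scale (more than 90% of a residue's pairs found in the reference library gives 9); the full mapping used here, score k when more than 10k% of the pairs are found, is an assumption, as are the helper name and the pair representation.

```python
def residue_score(residue_pairs, library):
    """Consistency score (0-9) for one residue of a multiple alignment.

    residue_pairs: the residue pairs this residue forms with the other
    residues in its column; library: set of pairs in the reference library.
    Assumed mapping: score k when more than 10*k percent of the pairs are
    in the library, so more than 90% gives 9.
    """
    if not residue_pairs:
        return 0
    found = sum(1 for pair in residue_pairs if pair in library)
    ceil_tenths = -(-10 * found // len(residue_pairs))  # ceil(10 * found / n)
    return max(0, ceil_tenths - 1)


pairs = [("r", i) for i in range(20)]
print(residue_score(pairs, set(pairs[:19])))  # 95% of pairs in the library
```

Under this mapping, a class 5 substitution (residue score ≥ 5) corresponds to more than half of the residue's pairs being supported by the library.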
59
Residue scores (above) for each aligned sequence (below):
5566677788888888899999877
vsdvprdlevvaatptslliswdap
-----66666666788888888887
-----gslevvaatptslliswdap
60
Correct substitutions: SAGA > ClustalW. Lower accuracy: more false positives in SAGA alignments.
61
High-scoring residues have high accuracy. Higher substitution category → fewer predictions.