Coffee Shop F91921025 黃仁暐 F92921029 戴志華 F92921041 施逸優 R93921142 吳於芳 R94921035 林與絜.

1 Coffee Shop F91921025 黃仁暐 F92921029 戴志華 F92921041 施逸優 R93921142 吳於芳 R94921035 林與絜

2 2005/12/14 2 Menu Coffee Shop Opening Why coffee shop? Three Flavors COFFEE T-Coffee 3DCoffee Remarks Recipes

3 2005/12/14 3 Multiple Sequence Alignment Multiple sequence alignment is one of the most important tools for analyzing biological sequences. Applications: structure prediction; phylogenetic analysis; function prediction; polymerase chain reaction (PCR) primer design.

4 2005/12/14 4 Multiple Sequence Alignment However, the accuracy is not yet good enough: it is difficult to evaluate the quality of a multiple alignment, and it is algorithmically very hard to produce the optimal alignment. To increase the accuracy of multiple sequence alignment, we opened a coffee shop to share three kinds of coffee.

5 2005/12/14 5 Before (drinking) COFFEE Why comparative genomics? Understanding the process of evolution at the gross level and the local level. Translating DNA sequence data into proteins of known function. The meaning of conserved regions. E. coli, C. elegans, Drosophila, human… What is their relationship?

6 2005/12/14 6 Arabidopsis, E. coli, yeast, Synechocystis (blue-green algae), nematode, fruit fly, human. Classification of genes with different functions. Adapted from “Principles of Genome Analysis and Genomics”, Fig. 7.5 (p. 129), by S. B. Primrose and R. M. Twyman, 3rd edition

7 2005/12/14 7 Comparative genomics vs. multiple sequence alignment Alignment → conserved region. Conserved region → gene location. Evidence of evolution. http://www.public.iastate.edu/~semrich/compgen/

8 2005/12/14 8 The alignment between the chromosomes. A: human chromosome I. B: human chromosome II. C: human chromosome III. The chromosome III region 125-128 Mb was magnified 120X. Source: http://gchelpdesk.ualberta.ca/news/02jun05/cbhd_news_02jun05.php

9 2005/12/14 9 Our Flavors COFFEE: A New Objective Function For Multiple Sequence Alignment. C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), pp 407-422, 1998. T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp 205-217, 2000. 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments. O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp 385-395, 2004.

10 COFFEE

11 2005/12/14 11 COFFEE An objective function for multiple sequence alignments Cédric Notredame, Liisa Holm and Desmond G. Higgins SAGA with COFFEE score

12 2005/12/14 12 Introduction COFFEE: Consistency based Objective Function For alignmEnt Evaluation. An objective function, the COFFEE score, is proposed to measure the quality of multiple sequence alignments. The COFFEE score of a multiple sequence alignment is optimized with the genetic algorithm package SAGA (Sequence Alignment Genetic Algorithm).

13 2005/12/14 13 Overview of their method Given a set of sequences to be aligned and a library containing all pairwise alignments between them, the COFFEE score reflects the level of consistency between a multiple sequence alignment and the library.

14 2005/12/14 14 COFFEE score
$$\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}$$
with $\mathrm{SCORE}(A_{i,j})$: number of aligned pairs of residues that are shared between $A_{i,j}$ and the library.
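A minimal Python sketch of the COFFEE score on this slide, under stated assumptions: A_{i,j} is taken to be the pairwise projection of the multiple alignment onto sequences i and j, LEN(A_{i,j}) is taken to be the number of columns of that projection (columns where both rows are gaps are dropped), and only residue-residue pairs are counted. The function names and the data layout (`library` and `weights` as dictionaries keyed by sequence index pairs) are hypothetical and are not taken from SAGA or the COFFEE software.

```python
from itertools import combinations

def pairwise_projection(row_a, row_b):
    """Return the set of (residue index in a, residue index in b) pairs aligned in two gapped rows."""
    pairs = set()
    ia = ib = 0  # residue counters; gap characters are not counted
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.add((ia, ib))
        if ca != "-":
            ia += 1
        if cb != "-":
            ib += 1
    return pairs

def projection_length(row_a, row_b):
    """Assumed LEN(A_ij): columns left after dropping columns where both rows are gaps."""
    return sum(1 for ca, cb in zip(row_a, row_b) if not (ca == "-" and cb == "-"))

def coffee_score(msa, library, weights):
    """msa: list of gapped strings; library[(i, j)]: set of residue pairs; weights[(i, j)]: W_ij."""
    numerator = denominator = 0.0
    for i, j in combinations(range(len(msa)), 2):
        shared = pairwise_projection(msa[i], msa[j]) & library[(i, j)]
        numerator += weights[(i, j)] * len(shared)      # W_ij * SCORE(A_ij)
        denominator += weights[(i, j)] * projection_length(msa[i], msa[j])  # W_ij * LEN(A_ij)
    return numerator / denominator if denominator else 0.0

# Toy data: the library is built from the alignment itself, with all weights set to 1.
msa = ["AC-GT", "ACTGT", "A--GT"]
library = {(i, j): pairwise_projection(msa[i], msa[j]) for i, j in combinations(range(3), 2)}
weights = {pair: 1.0 for pair in library}
score = coffee_score(msa, library, weights)
```

In this sketch, a library that disagrees with the alignment on some residue pairs lowers the numerator while leaving the denominator unchanged, which is the consistency behaviour the slide describes.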

15 2005/12/14 15 COFFEE score

16 2005/12/14 16 Using COFFEE in SAGA SAGA iteratively generates multiple sequence alignments with higher COFFEE scores until the score cannot be improved. SAGA follows the general principle of genetic algorithms: survival of the fittest. In each iteration SAGA evaluates the score of the alignments. The fitter an alignment, the more likely it is to survive and produce offspring. Surviving alignments may be kept unchanged, randomly modified (mutation), or combined with another alignment (cross-over).
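The loop described on this slide can be sketched as a generic genetic algorithm. This is only an illustration of the principle: the individuals here are toy bit-tuples rather than alignments, the fitness is a stand-in for the COFFEE score, the run uses a fixed number of generations instead of SAGA's stopping rule, and `evolve`, `flip_one` and `one_point` are hypothetical names, not SAGA's operators.

```python
import random

def evolve(population, fitness, mutate, crossover, generations=30, seed=0):
    """Generic survival-of-the-fittest loop; returns the best individual found."""
    rng = random.Random(seed)
    population = list(population)
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        low = min(scores)
        # Shift scores so every selection weight is strictly positive.
        weights = [s - low + 1e-9 for s in scores]
        next_gen = [best]  # elitism: the current best always survives unchanged
        while len(next_gen) < len(population):
            parent = rng.choices(population, weights=weights, k=1)[0]
            roll = rng.random()
            if roll < 1 / 3:
                child = parent                       # kept unchanged
            elif roll < 2 / 3:
                child = mutate(parent, rng)          # random modification
            else:
                mate = rng.choices(population, weights=weights, k=1)[0]
                child = crossover(parent, mate, rng)  # combined with another individual
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best

def flip_one(bits, rng):
    """Toy mutation: flip one randomly chosen bit."""
    k = rng.randrange(len(bits))
    return bits[:k] + (1 - bits[k],) + bits[k + 1:]

def one_point(a, b, rng):
    """Toy cross-over: prefix of one parent, suffix of the other."""
    k = rng.randrange(1, len(a))
    return a[:k] + b[k:]

start_rng = random.Random(1)
start = [tuple(start_rng.randint(0, 1) for _ in range(12)) for _ in range(20)]
winner = evolve(start, sum, flip_one, one_point)  # fitness = number of ones
```

Because the best individual is always carried over, the best score in this sketch never decreases from one generation to the next.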

17 2005/12/14 17 Results COFFEE function SAGA Optimization of COFFEE function Effect of optimization Comparison: COFFEE and others Others: PRRP, ClustalW, PILEUP, SAGA MSA, SAM COFFEE score & alignment accuracy You are about to see a lot of tables. They are rather dry, so please bear with us…

18 2005/12/14 18 Optimization COFFEE function was optimized by SAGA Using ClustalW alignments Using SAGA alignments

19 2005/12/14 19 Comparison Multiple alignments of SAGA COFFEE and 5 other methods: PRRP, ClustalW, PILEUP, SAGA MSA, SAM. Performance of SAGA and ClustalW. Comparison with the other 5 methods. Even when SAGA-COFFEE does not give the best result, it is not far from the best. Lower identity level → better SAGA-COFFEE results

20 2005/12/14 20

21 2005/12/14 21 Ratio of (E+H) residues correctly aligned Better or worse alignment? SAGA-COFFEE & others NO such thing as an ideal method Correctly aligned ratio Better than PRRP Worse than PRRP

22 2005/12/14 22 COFFEE score and alignment accuracy r=0.65 Plot axes: COFFEE sequence score, E+H accuracy (%), average identity (%). The COFFEE score can be used to predict alignment accuracy. Average identity cannot predict alignment accuracy. Accuracy can be predicted for >85% of the sequences (error ~ ±10%)

23 2005/12/14 23 Correlation between score and accuracy Higher score → higher accuracy SAGA produces more high-scoring sequences than ClustalW

24 Coffee Break ?

25 T-Coffee

26 2005/12/14 26 T-Coffee A novel method for multiple sequence alignments C.Notredame, D. Higgins, J. Heringa ClustalW with extended library

27 2005/12/14 27 ClustalW ClustalW is the core alignment strategy of T-Coffee. It follows the procedure below: (1) Pairwise alignment: calculate the distance matrix. (2) Guide tree: build an unrooted neighbor-joining tree, then a rooted neighbor-joining tree (the guide tree, with sequence weights). (3) Progressive alignment: align following the guide tree.

28 2005/12/14 28 Calculate distance matrix

29 2005/12/14 29 Guide tree Use the neighbor-joining method to build the guide tree from the distance matrix. First construct an unrooted neighbor-joining tree, then convert it to a rooted neighbor-joining tree, the guide tree.
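As a rough illustration of the neighbor-joining step named on this slide, the sketch below builds an unrooted topology from a distance matrix using the standard Q criterion. It is a simplified sketch, not ClustalW's code: branch lengths, the rooting step and the sequence weights are omitted, and the function name, the `frozenset`-keyed distance dictionary and the toy distances are all assumptions made for the example.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """names: leaf labels; dist[frozenset((a, b))]: distance. Returns a nested-tuple topology."""
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # r[a]: sum of distances from a to every other current node.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Q criterion: join the pair minimizing (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        u = (a, b)  # the new internal node is represented by the pair it joins
        for c in nodes:
            if c != a and c != b:
                # Distance from the new node to every remaining node.
                d[frozenset((u, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c != a and c != b] + [u]
    return tuple(nodes)

# Toy additive distances for five hypothetical sequences.
raw = {
    ("A", "B"): 3, ("A", "C"): 6, ("A", "D"): 8, ("A", "E"): 8,
    ("B", "C"): 7, ("B", "D"): 9, ("B", "E"): 9,
    ("C", "D"): 8, ("C", "E"): 8, ("D", "E"): 2,
}
dist = {frozenset(pair): value for pair, value in raw.items()}
tree = neighbor_joining(["A", "B", "C", "D", "E"], dist)
```

On these toy distances the closest pairs (D with E, A with B) end up as sibling leaves, which is the grouping a progressive aligner would then align first.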

30 2005/12/14 30 Unrooted Neighbor-Joining Tree

31 2005/12/14 31 Rooted Neighbor-Joining Tree

32 2005/12/14 32 Progressive Alignment: align following the guide tree Seq1 Seq2 Seq3 Seq4 Seq5 Alignment 1 Alignment 2 Alignment 3 Final alignment

33 2005/12/14 33 Progressive-alignment strategy Pros: faster and saves space (compared with computing all possible multiple alignments). Cons: may not find the optimum solution. Errors made in the first alignments cannot be rectified later as the rest of the sequences are added in. “Once a gap, always a gap!” T-Coffee is an attempt to minimize that effect!

34 2005/12/14 34 T-Coffee Algorithm Generating a primary library of alignments Derivation of the primary library weights Combination of the libraries Extending the library Progressive alignment strategy

35 2005/12/14 35 ClustalW Primary Library (Global) Lalign Primary Library (Local) Weighting Primary Library

36 2005/12/14 36 Primary Library

37 2005/12/14 37 ClustalW Primary Library (Global) Lalign Primary Library (Local) Weighting Primary Library Extension Extended Library

38 2005/12/14 38 Extended Library A Weight(A-C-B) = min( Weight(A-C), Weight(B-C) ) = min( 77, 100 ) = 77 Weight(A-D-B) = min( Weight(A-D), Weight(B-D) ) = min( 100, 100 ) = 100
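The arithmetic on this slide can be reproduced with a few lines of Python. The `triplet_weight` function mirrors the min rule shown above. The `extended_weight` function goes one step further than the slide: it assumes, following the published description of T-Coffee as I understand it, that the final weight of a residue pair is its direct primary weight plus the sum of the contributions through every intermediate sequence. Both function names are hypothetical.

```python
def triplet_weight(w_left, w_right):
    """Weight of aligning A with B through an intermediate sequence: the weaker of the two links."""
    return min(w_left, w_right)

def extended_weight(direct, via):
    """direct: primary weight of the pair (0 if absent).

    via: list of (weight to intermediate, weight from intermediate) tuples.
    Assumption: contributions are summed with the direct weight.
    """
    return direct + sum(triplet_weight(left, right) for left, right in via)

through_c = triplet_weight(77, 100)   # Weight(A-C-B) from the slide
through_d = triplet_weight(100, 100)  # Weight(A-D-B) from the slide
combined = extended_weight(0, [(77, 100), (100, 100)])  # direct weight set to 0 for illustration
```

Taking the minimum means a chain of alignments is only as trustworthy as its weakest link, so one poorly supported pairwise alignment cannot inflate the weight of a residue pair.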

39 2005/12/14 39 Extended Library SeqA: GARFIELD THE LAST FAT CAT SeqB: GARFIELD THE FAST CAT SeqA: GARFIELD THE LAST FAT CAT SeqB: GARFIELD THE FAST CAT A

40 2005/12/14 40 Extended Library SeqA: GARFIELD THE LAST FAT CAT SeqB: GARFIELD THE FAST CAT A SeqA: GARFIELD THE LAST FAT CAT SeqB: GARFIELD THE FAST CAT

41 2005/12/14 41 Progressive Alignment ClustalW Primary Library (Global) Lalign Primary Library (Local) Weighting Primary Library Extension Extended Library Multiple Alignment Information

42 2005/12/14 42 Progressive Alignment

43 2005/12/14 43 Complexity Analysis Complexity of the whole procedure, for N sequences that can be aligned in a multiple alignment of length L: $O(N^2L^2) + O(N^3L) + O(N^3) + O(NL^2)$. $O(N^2L^2)$: computation of the pair-wise library. $O(N^3L)$: computation of the extended pair-wise library. $O(N^3)$: computation of the NJ tree. $O(NL^2)$: computation of the progressive alignment.
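To see which of the four terms on this slide dominates for a given input size, the following sketch evaluates each term for concrete N and L. The values are unitless growth terms with all constant factors ignored, so they only indicate relative scaling, not run time. The function names and the example sizes are hypothetical.

```python
def tcoffee_cost_terms(n, l):
    """Return the four asymptotic terms for n sequences and alignment length l."""
    return {
        "pairwise library": n**2 * l**2,
        "library extension": n**3 * l,
        "NJ tree": n**3,
        "progressive alignment": n * l**2,
    }

def dominant_term(n, l):
    """Name of the largest term for the given sizes."""
    terms = tcoffee_cost_terms(n, l)
    return max(terms, key=terms.get)

few_long = dominant_term(10, 300)     # few sequences, long alignment
many_short = dominant_term(1000, 300)  # many sequences, same length
```

Comparing the first two terms, N²L² against N³L, shows that the library extension takes over once N exceeds L, which is why the number of sequences matters more than their length for large data sets.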

44 2005/12/14 44 Experiment Implementation environment Result 1: Effect of combining local and global alignments without extension; effect of the library extension Result 2: compared with other multiple sequence alignment methods

45 2005/12/14 45 Implementation environment Programming language: ANSI C. Hardware: Linux platform with Pentium II processors (330 MHz). Test case: BAliBASE database of multiple sequence alignments.

46 2005/12/14 46 Result 1 Table 1: The effect of combining local and global alignments
Name | global/local/extend | Cat1(81) | Cat2(23) | Cat3(4) | Cat4(12) | Cat5(11) | Total(141) | Significance
C | ClustalW pw/.../... | 70.6 | 26.7 | 43.0 | 56.0 | 60.0 | 58.9 | 7.8
CE | ClustalW pw/.../ex | 77.1 | 33.6 | 47.6 | 64.8 | 75.9 | 66.3 | 17.7
L | .../Lalign pw/... | 65.4 | 12.1 | 22.8 | 53.9 | 66.0 | 52.0 | 7.8
LE | .../Lalign pw/ex | 72.6 | 25.6 | 47.2 | 77.5 | 85.5 | 64.2 | 16.3
CL | ClustalW pw/Lalign pw/... | 76.2 | 32.0 | 48.3 | 76.2 | 74.6 | 66.5 | 12.1g
CLE | ClustalW pw/Lalign pw/ex | 80.6 | 37.1 | 52.9 | 83.2 | 88.6 | 72.0 |

47 2005/12/14 47 Result 2 Table 2: T-Coffee compared with other multiple sequence alignment methods
Method | Cat1(81) | Cat2(23) | Cat3(4) | Cat4(12) | Cat5(11) | Total1(141) | Total2(141) | Significance
Dialign | 71.0 | 25.2 | 35.1 | 74.7 | 80.4 | 61.5 | 57.3 | 11.3
ClustalW | 78.5 | 32.2 | 42.5 | 65.7 | 74.3 | 66.4 | 58.6 | 26.2
Prrp | 78.6 | 32.5 | 50.2 | 51.1 | 82.7 | 66.4 | 59.0 | 36.9
T-Coffee | 80.6 | 37.1 | 52.9 | 83.2 | 88.6 | 72.0 | 68.6 |

48 3DCoffee

49 2005/12/14 49 3DCoffee Combining protein sequences and structures within multiple sequence alignments O. O'Sullivan, K Suhre, C. Abergel, D.G. Higgins, C. Notredame T-Coffee with structure information

50 2005/12/14 50 3DCoffee Structural information can help to improve the quality of multiple sequence alignments 3DCoffee Combines protein sequences and structures Is based on T-Coffee version 2.00 Uses a mixture of pairwise sequence alignments and pairwise structure comparison methods.

51 2005/12/14 51 3DCoffee Use T-Coffee to compile: a primary library, which is a list of weighted pairs of residues; and an extended library, which uses the column consistency relationship between all sequences. Structure information comes from Fugue, SAP and LSQman.

52 2005/12/14 52 3DCoffee Fugue – a threading method that aligns a protein sequence with a 3D-structure SAP – uses DP to compute a pairwise alignment based on a non-rigid structure superposition LSQman – a rigid body structure superposition package

53 2005/12/14 53 3DCoffee Set the weight of each new alignment to 100, which is the highest score in the primary library. Add the weighted alignments to the library. Carry out the progressive alignment in the same way as T-Coffee.
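A minimal sketch of the library update on this slide, assuming the library is a dictionary that maps a residue pair to its weight. The key layout `((sequence, position), (sequence, position))`, the function name and the toy entries are hypothetical. Summing the weights when a structure-derived pair is already in the library is also an assumption, borrowed from the way T-Coffee is described as combining libraries; the slide itself only says that the new alignments get weight 100.

```python
STRUCTURE_WEIGHT = 100  # weight given to structure-derived pairs on the slide

def add_structure_pairs(library, structure_pairs, weight=STRUCTURE_WEIGHT):
    """Return a new library with each structure-derived residue pair added at the given weight."""
    merged = dict(library)  # leave the caller's library untouched
    for pair in structure_pairs:
        # Assumption: a pair already present accumulates weight instead of being overwritten.
        merged[pair] = merged.get(pair, 0) + weight
    return merged

# Toy primary library: pairs of (sequence name, residue index).
primary = {
    (("A", 0), ("B", 0)): 88,
    (("A", 1), ("B", 1)): 88,
}
from_structure = [(("A", 1), ("B", 1)), (("A", 2), ("B", 3))]
combined_library = add_structure_pairs(primary, from_structure)
```

The merged library can then be extended and used for progressive alignment exactly as a sequence-only library would be.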

54 2005/12/14 54 Remarks COFFEE : An objective function for multiple sequence alignments SAGA with COFFEE score T-Coffee : A novel method for multiple sequence alignments ClustalW with extended library 3DCoffee : Combining protein sequences and structures within multiple sequence alignments T-Coffee with structure information

55 2005/12/14 55 Recipes CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Julie D. Thompson, Desmond G. Higgins and Toby J. Gibson, 1994. COFFEE: A New Objective Function For Multiple Sequence Alignment. C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), pp 407-422, 1998. T-Coffee: A novel method for multiple sequence alignments. C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp 205-217, 2000. 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments. O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp 385-395, 2004.

56 2005/12/14 56 Q & A

57 2005/12/14 57 Thank You

58 2005/12/14 58 Residue score Sequence score: a global measurement. Residue score: a residue is scored 9 when >90% of the pairs it is involved in are also present in the reference library. Once residue scores are evaluated, substitution classes are defined: a class 5 substitution involves residues with score ≥ 5.

59 2005/12/14 59 5566677788888888899999877 - - - - -66666666788888888887 vsdvprdlevvaatptslliswdap gslevvaatptslliswdap

60 2005/12/14 60 Correct substitutions: SAGA > ClustalW Lower accuracy: more false positives in the SAGA alignment

61 2005/12/14 61 High-scoring residues with high accuracy Higher substitution category → fewer predictions

