Published by Alvin Caldwell. Modified over 9 years ago.
1
Fragment Assembly Method
Mika Takata
2
Outline
- Fragment assembly
  - Basic theory
  - Process
  - Techniques
- David Baker's group approaches
- Other top-ranked approaches at CASP7
- Discussion
3
Fragment theory
- Short fragments (5 to 9 residues) tend to adopt specific conformations.
- These tendencies recur throughout protein structures.
- Sequences that share a particular short structural motif are somewhat similar [Unger et al., 1989; Rooman et al., 1990].
- A protein can be "reconstructed" from fragments excised from other proteins in a library [Jones and Thirup, 1986; Claessens et al., 1989].
4
Appendix: other basics
- Levinthal's paradox (Levinthal, 1968)
- Local structural bias: Pauling and Corey (1951); Rooman et al. (1990); Bystroff et al. (1996); Han et al. (1997); Bystroff and Baker (1998); Camproux et al. (1999); Gerard (1999); Li et al. (2008)
- Recurrent sequence patterns: HMMSTR (Bystroff et al., 2000), a probabilistic version of a structural motif library
- Ramachandran basin technique: both the φ and ψ angle intervals are divided into six ranges
5
FA: Fragment Assembly – basic theory
- Highly rated at CASP
- Developed by David Baker's team
- Assembles consecutive short fragments
- Applied to proteins with low sequence identity (less than 30%)
- Fragment extraction
  - Choose several candidates
  - Runs of about 10 consecutive residues
- Assemble the local structures into the whole structure
  - Lowest free energy sampling
  - Global optimization
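The core move behind "assemble consecutive short fragments" can be sketched as follows. This is a minimal illustration, not the ROSETTA implementation: it assumes the chain is stored as a list of (phi, psi) backbone torsion pairs and that the candidate fragments for each window start are kept in a dictionary. The names `insert_fragment`, `random_fragment_move`, and `library` are hypothetical.

```python
import random


def insert_fragment(torsions, fragment, start):
    """Return a copy of `torsions` with `fragment` pasted in at index `start`.

    `torsions` and `fragment` are lists of (phi, psi) pairs in degrees.
    """
    if start < 0 or start + len(fragment) > len(torsions):
        raise ValueError("fragment does not fit at this position")
    new_torsions = list(torsions)
    new_torsions[start:start + len(fragment)] = fragment
    return new_torsions


def random_fragment_move(torsions, library, rng=random):
    """Pick a random window start and one of its candidate fragments.

    `library` maps a window start index to a list of candidate fragments
    (an assumed layout, chosen only for this sketch).
    """
    start = rng.choice(sorted(library))
    fragment = rng.choice(library[start])
    return insert_fragment(torsions, fragment, start)


# Toy example: a 12-residue extended chain and one 9-residue helical candidate.
chain = [(-120.0, 120.0)] * 12
library = {2: [[(-60.0, -45.0)] * 9]}
moved = random_fragment_move(chain, library)
```

In a full search, each proposed `moved` conformation would be scored by an energy function and kept or discarded; the move itself only swaps local torsion angles, which is why local structure stays protein-like.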
6
Building blocks library
- How large a data set is needed to cover all protein structures?
  - 78% of all hexamer structures in a test group were covered by a library of 81 hexamers [Unger et al., 1989].
- What fragment size should be used?
  - Fixed size
    - Most studies use 6 amino acids [Unger et al.]
    - Others use 5 to 8
    - Length 9 works better than other fragment lengths below 15 amino acids [Bystroff et al., 1996]
    - Combination of lengths 3, 9, and 12 [Baker et al., 1998]
  - Adjusted size
    - Library of "natural" building blocks, from 3 or 4 up to 10 to 12 residues
7
Library of building blocks
- The polypeptide chain is represented by a sequence of rigid fragments, concatenated without any degrees of freedom [Kolodny et al., 2002].
- The quality of the overall conformation depends on
  i. the length (f) of the fragments used
  ii. the size (s) of the library
- Accuracy by library complexity: from 2.9 Å RMSD (complexity 2.7, f = 7) to 0.76 Å RMSD (complexity 15, f = 5)
8
Overlapping manner
- Using the library: building blocks are concatenated in an overlapping manner
- Superimposition: one block is fused onto the next
- The level of "superimposability" measures how well the overlapping parts match
  - Too low: the two fragments in the query do not belong together
  - Too high: the two fragments are connected rigidly, so the chain is not flexible enough to reconstruct the overall conformation
9
Concatenation
- Non-local interactions
  I. Knowledge-based potential functions derived from the protein database
  II. Potential functions based on chemical intuition
- From residue-level to all-atom conformation
10
Backbone evaluation
- Naïve approach [Unger et al., 1989]
- Fragment clustering and clustering algorithms [Bonneau et al., 2001]
- Global evaluation
  I. Approaches based on graph algorithms
  II. Optimization algorithms such as Monte Carlo and genetic algorithms [Unger and Moult, 1993; Pedersen and Moult, 1997a, 1997b; Yadgari et al., 1998]
- [Figure labels: backbone structure; simple energy function; C-alpha conformation search; non-local interaction; full all-atom models; all-atom energy evaluation; high / low / best]
11
Baker's method I (top of Free Modeling), CASP7
- Low-resolution fragment assembly (backbone structure) + full-chain refinement
  1. Sampling
     - Short fragments from the query and its homologs (max. 30 sequences)
     - Long-range beta-strand pairing based on secondary structure prediction
  2. All-atom energy function
- High computational power: Rosetta@home
- Simple-topology targets of about 100 residues are tractable
- Good secondary structure prediction -> high accuracy (<3 Å) for targets of about 100 residues
12
Baker's method II - Constructing the main-chain structure
- Overlapping: all-atom refinement
  - Conformations with two atoms within 2.5 Å of each other are rejected
  - Metropolis criterion [eq. (2)]
- The nearest neighbors of a segment indicate the structure of the sequence mapping around the segment [Han & Baker, 1995]
- Reliable even without knowledge of the true structure [Yi & Lander, 1993]
- 25 nearest neighbors used [Baker et al., 1997] [eq. (1)]
- [Figure labels: fragment i-1, fragment i, fragment i+1]
13
Equations
- (1) The nearest neighbor [equation image not preserved]
- (2) All-atom refinement [equation image not preserved]
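The slide's equation (2) is not preserved in this transcript, so the following is only the textbook form of the Metropolis criterion, shown to make the acceptance rule concrete. It assumes the energy change `delta_e` and the `temperature` are expressed in the same units (temperature already multiplied by the Boltzmann constant); both function names are illustrative.

```python
import math
import random


def acceptance_probability(delta_e, temperature):
    """Textbook Metropolis rule: min(1, exp(-delta_e / temperature))."""
    if delta_e <= 0.0:
        return 1.0  # downhill or neutral moves are always accepted
    return math.exp(-delta_e / temperature)


def metropolis_accept(delta_e, temperature, rng=random):
    """Decide stochastically whether to keep a proposed move."""
    # rng.random() lies in [0, 1), so a probability of 1.0 always passes.
    return rng.random() < acceptance_probability(delta_e, temperature)
```

Uphill moves are sometimes accepted, which lets the search escape local minima of a rugged energy landscape; lowering `temperature` makes such escapes rarer.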
14
Baker's method III - overlapping
- Side-chain refinement
- Expected neighbor density around each residue
  - The number of atoms of other residues within 10 Å of an atom of the residue
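The neighbor count described above can be sketched directly from its definition. This is a plain counting illustration, not Baker's scoring term: the atom layout `(residue_id, x, y, z)`, the function name, and the inclusive treatment of the 10 Å boundary are assumptions made for the example.

```python
import math


def neighbor_count(atoms, center_index, cutoff=10.0):
    """Count atoms of OTHER residues within `cutoff` angstroms of one atom.

    `atoms` is a list of (residue_id, x, y, z) tuples (assumed layout).
    """
    center_residue, cx, cy, cz = atoms[center_index]
    count = 0
    for residue_id, x, y, z in atoms:
        if residue_id == center_residue:
            continue  # skip the atom itself and atoms of the same residue
        if math.dist((cx, cy, cz), (x, y, z)) <= cutoff:
            count += 1
    return count


# Toy coordinates along the x axis, in angstroms.
atoms = [
    (1, 0.0, 0.0, 0.0),
    (1, 1.0, 0.0, 0.0),   # same residue as the center, ignored
    (2, 3.0, 0.0, 0.0),   # inside the cutoff
    (3, 10.0, 0.0, 0.0),  # exactly on the cutoff, counted here
    (4, 10.5, 0.0, 0.0),  # outside the cutoff
]
print(neighbor_count(atoms, 0))
```

Buried atoms have many neighbors and exposed atoms have few, so comparing the observed count with the count expected for that residue type gives a simple burial signal.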
15
Baker's method IV: all-atom refinement
- CASP7 improvement
  - Main-chain accuracy changes little, but the final conformation is greatly improved
- Problems
  - 500k CPU hours per domain
  - 140k computers with a combined performance of 37 TFLOPS
  - Long proteins are not tractable
  - The all-atom energy landscape is rugged
16
Top of CASP7 (FM section)

Group          Method
Baker          ROSETTA (FA)
Zhang          I-TASSER (FA, Replica Exchange, Lattice Model)
Zhang-server   I-TASSER (FA, Replica Exchange, Lattice Model)
SBC            Server results (Meta Selector)
POEM-REFINE    ROSETTA (FA), Full-atom Refinement
GeneSilico     ROSETTA (FA)
ROBETTA        ROSETTA (FA)
ROKKO          SimFold + FA
Jones-UCL      FRAGFOLD (FA)
SAM-T06        Frag finder, Undertaker (FA)
TASSER         FA, Replica Exchange, Lattice Model

- CASP7 had two sections: Template Based Modeling (TBM) and Free Modeling (FM)
- Baker, Zhang, and Zhang-server dominated both sections
17
CASP7 I-TASSER protocol (top of template-based modeling)
- Various lengths
- 1 to 2 days per sequence to submit a final prediction
- About 4 Å (TBA) and about 11 Å (FA) RMSD [8]
18
Summary
- The fragment assembly method simplifies the protein folding problem
  - It does not require building a new structure for the query; it selects the correct parts and fits them together into an accurate conformation
  - Local compactness is captured by using known data
- Baker's great success: all-atom refinement using high computational power
- Problems
  - High computational cost
    - Distributed computing, e.g. Rosetta@home
  - Sampling methods
19
To improve fragment assembly
- How to choose fragments
  - Where to cut and separate
  - What the optimal length is
- How to constrain
  - Competitive learning?
- Scoring function
  - Cf. statistical potential energy, Bayesian scoring function
20
References
1. Ron Unger, The Building Block Approach to Protein Structure Prediction, The New Avenues in Bioinformatics, 2004, 177-188; http://www.springerlink.com/content/h63474928680757x/
2. Kim T. Simons, Charles Kooperberg, Enoch Huang, and David Baker, Assembly of Protein Tertiary Structures from Fragments with Similar Local Sequences using Simulated Annealing and Bayesian Scoring Functions, J. Mol. Biol. (1997) 268, 209-225
3. Shuai Cheng Li, Dongbo Bu, Jinbo Xu, and Ming Li, Fragment-HMM: A new approach to protein structure prediction, Protein Science (2008), 17: 1925-1934
4. Vladimir Yarov-Yarovoy, Jack Schonbrun, and David Baker, Multipass Membrane Protein Structure Prediction Using Rosetta, Proteins. 2006 March 1; 62(4): 1010-1025
5. Rhiju Das and David Baker, Prospects for de novo phasing with de novo protein models, Biological Crystallography, ISSN 0907-4449
6. Arthur M. Lesk, Loredana Lo Conte, and Tim J.P. Hubbard, Assessment of Novel Fold Targets in CASP4: Predictions of Three-Dimensional Structures, Secondary Structures, and Interresidue Contacts, Proteins: Structure, Function, and Genetics Suppl 5: 98-118 (2001)
7. David Baker, CASP 7; http://www.cs.nott.ac.uk/CFNJC/slides/Presentation_Pawel_JC_04-2008.pdf
8. Y. Zhang, I-TASSER; http://zhang.bioinformatics.ku.edu/I-TASSER/
21
Previous approach and experiments using fragment assembly
22
Main backbone
- Need to improve the main backbone structure (Cα conformation)
- Need to apply FA theory
- Need to use classification
  - SCOP: class, fold, superfamily, family, domain level
23
Cα conformation prediction
- Remote homology profiling
  - PSI-BLAST profile
- Classification
  - Fold, superfamily, and family levels of SCOP
- Fragment assembly
  - 5 to 11 residues
24
Previous search approach based on FA
25
From the previous experiment
- Target: 1aa2 (108 residues)
- Fragments: 7-amino-acid fragments
- Classification: family and superfamily level
  - Training data: the 10 hits with the lowest e-values
- Global scoring function: HCF (hydrophobic compactness function)
26
Result (1): dRMS (Å), family-level classification

Lattice model      dRMS (Å)
Cubic lattice      14.09
FCC lattice         8.68

Set                  Best   Mean    Max     SD
Low-energy top 10    1.56   0.870   -       0.886
All FA sets (68)     0.20   8.49    20.96   5.98
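The slides report accuracy as dRMS without defining it. A common definition, assumed here, is the root-mean-square difference between corresponding inter-residue distances in the model and in the reference structure, which needs no superposition. The function name `drms` and the coordinate layout are illustrative.

```python
import math
from itertools import combinations


def drms(coords_a, coords_b):
    """Distance RMS between two conformations with matched residues.

    Each argument is a list of (x, y, z) tuples, e.g. C-alpha positions.
    Assumed definition: sqrt(mean over pairs i<j of (dA_ij - dB_ij)^2).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must have the same number of residues")
    n = len(coords_a)
    if n < 2:
        raise ValueError("need at least two residues")
    total = 0.0
    for i, j in combinations(range(n), 2):
        dist_a = math.dist(coords_a[i], coords_a[j])
        dist_b = math.dist(coords_b[i], coords_b[j])
        total += (dist_a - dist_b) ** 2
    n_pairs = n * (n - 1) / 2
    return math.sqrt(total / n_pairs)


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(drms(model, reference))
```

Because only internal distances are compared, dRMS is unchanged by translation and rotation, but it also cannot distinguish a structure from its mirror image.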
27
Result (2): dRMS (Å), family + superfamily level

Set                  Best   Mean    Max     SD
Low-energy top 10    1.56   0.870   -       0.886
All FA sets (68)     0.20   5.83    21.0    5.98
28
Appendix (ii): Experiment, all-atom
- Purpose: all-atom complexity
- Data: 10 relatively small proteins
- Lattice model
- Potential energy function
  - Scoring function based on chemical features: hydrophilicity, hydrophobicity, electric charge, side-chain tendency
- Accuracy measurement: RMSD
29
Face-Centered Cubic (FCC) Lattice Model
- Nearest neighbors: 12 residues
- Closest to the real model -> takes the space among residues into account
- Difficulties
  - Accuracy of the model
  - How to evaluate energy: the energy function
  - How to search: optimization
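The 12 nearest neighbors of an FCC lattice site can be enumerated as every arrangement of (±1, ±1, 0). The sketch below assumes integer lattice coordinates in which nearest neighbors are sqrt(2) units apart; the function name is illustrative.

```python
from itertools import permutations, product


def fcc_neighbor_vectors():
    """Return the 12 nearest-neighbor offsets of an FCC lattice site."""
    vectors = set()
    for sign_x, sign_y in product((1, -1), repeat=2):
        # All placements of the two non-zero components among x, y, z.
        vectors.update(permutations((sign_x, sign_y, 0)))
    return sorted(vectors)


neighbors = fcc_neighbor_vectors()
print(len(neighbors))
```

A simple cubic lattice offers only 6 neighbor directions per site, so the FCC lattice gives a chain more ways to turn, which is consistent with the lower FCC dRMS values reported in these slides.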
30
Appendix (ii): lattice to all-atom result, dRMS (Å)

Protein (PDB id)  Size  Cubic         FCC           Ref. ※        All-atom  Ref. ※
                        (main chain)  (main chain)  (main chain)            (all-atom)
1alg                24   9.44          8.58         -             10.33     -
1ku5                70  19.11         16.24         -             20.41     -
1aa2               108  14.09          8.68         6.06          16.12     12.09
1beo                98  13.82         11.52         6.36          15.89     12.01
1ctf                68  12.58          9.40         5.45          14.19      9.20
1dkt-A              72  13.75         10.66         5.59          15.62     10.98
1fca                55  11.18          8.34         5.16          12.30      9.00
1fgp                70  12.37          8.20         5.98          14.02     11.16
1jer               110  15.64         10.99         7.53          16.90     13.79
1nkl                78  12.68          9.78         5.70          14.61     10.13
Average                 13.47         10.24         5.98          15.04     11.05
31
Discussion
- Classification should be applied to improve accuracy
  - To choose fragment data
- Accuracy of the energy function
32
References
- Yu Xia, Enoch S. Huang, Michael Levitt, and Ram Samudrala, Ab Initio Construction of Protein Tertiary Structures Using a Hierarchical Approach, Journal of Molecular Biology (2000), 300, 171-185
- G. Raghunathan and R.L. Jernigan, Ideal architecture of residue packing and its observation in protein structures, Cambridge University, 1997, Protein Science
- Feng Jiao, Jinbo Xu, Libo Yu, and Dale Schuurmans, Protein Fold Recognition Using the Gradient Boost Algorithm, University of Alberta, May 22 2006, WSPC
- http://www.ecosci.jp/amino/amino2.html