
1 Fragment Assembly method Mika Takata

2 Outline
- Fragment assembly
- Basic theory
- Process
- Techniques
- David Baker's group approaches
- Other top-ranked approaches at CASP7
- Discussion

3 Fragment theory
- Short fragments (5 to 9 residues) tend to adopt specific conformations.
- These tendencies recur throughout protein structures.
- Sequences that share a particular short structural motif are somewhat similar [Unger et al., 1989; Rooman et al., 1990].
- A protein can be "re-constructed" from fragments excised from other proteins in a library [Jones and Thirup, 1986; Claessens et al., 1989].

4 Appendix: Other basics
- Levinthal's paradox (Levinthal 1968)
- Local structural bias: Pauling and Corey (1951); Rooman et al. 1990; Bystroff et al. 1996; Han et al. 1997; Bystroff and Baker 1998; Camproux et al. 1999; Gerard 1999; Li et al. 2008
- Recurrent sequence patterns
- HMMSTR (Bystroff et al. 2000): a probabilistic version of a structural motif library
- Ramachandran basin technique: divides both the φ and ψ angle intervals into six ranges

5 FA: Fragment Assembly - basic theory
- Highly rated at CASP
- Developed by David Baker's team
- Assembles consecutive short fragments
- Applied to proteins with low sequence identity (less than 30%)
Process:
1. Fragment extraction: choose several candidates of about 10 consecutive residues
2. Assemble the local structures into a whole structure
3. Search for the lowest free energy (sampling, global optimization)
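The process on this slide can be illustrated with a deliberately small sketch. It is not ROSETTA's protocol: the chain is just a list of (phi, psi) pairs, the two-entry library and `toy_energy` are hypothetical stand-ins, and moves are accepted greedily rather than by simulated annealing.

```python
import random

HELIX = (-60.0, -45.0)    # toy (phi, psi) pair for a helical residue
STRAND = (-120.0, 130.0)  # toy (phi, psi) pair for an extended residue

def insert_fragment(torsions, fragment, start):
    """Return a copy of the chain with `fragment` pasted in at `start`."""
    new = list(torsions)
    new[start:start + len(fragment)] = fragment
    return new

def assemble(length, library, energy, steps=2000, seed=0):
    """Greedy fragment assembly: keep an insertion unless it raises the energy."""
    rng = random.Random(seed)
    frag_len = len(library[0])
    chain = [STRAND] * length          # start from a fully extended chain
    best = energy(chain)
    for _ in range(steps):
        start = rng.randrange(length - frag_len + 1)
        trial = insert_fragment(chain, rng.choice(library), start)
        e = energy(trial)
        if e <= best:                  # downhill or neutral moves only
            chain, best = trial, e
    return chain, best

def toy_energy(chain):
    """Hypothetical energy: number of residues that are not helical."""
    return sum(1 for residue in chain if residue != HELIX)

# Hypothetical two-fragment library of 3-residue building blocks.
library = [[HELIX] * 3, [STRAND] * 3]
chain, e = assemble(20, library, toy_energy)
```

A real energy function would score non-local interactions, and the library would hold several candidate fragments per sequence position rather than the same two everywhere.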

6 Building blocks library
- How large a data set is needed to cover all protein structures?
  - 78% of all hexamer structures in a test group were covered by a library of 81 hexamers [Unger et al., 1989].
- What fragment size should be used?
  - Fixed size
    - Most studies use 6 amino acids [Unger et al.]
    - Others use 5 to 8
    - Length 9 performs better than other fragment lengths below 15 amino acids [Bystroff et al., 1996]
    - Combination of lengths 3, 9, and 12 [Baker et al., 1998]
  - Adjusted size
    - Library of "natural" building blocks
    - From 3-4 up to 10-12 residues
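The coverage question on this slide can be made concrete with a small sketch. The distance function, the cutoff, and the toy data are assumptions for illustration; a structural study would compare fragment coordinates by RMSD instead.

```python
def windows(items, length):
    """All overlapping windows of `length` consecutive items."""
    return [tuple(items[i:i + length]) for i in range(len(items) - length + 1)]

def coverage(fragments, library, distance, cutoff):
    """Fraction of fragments that lie within `cutoff` of some library block."""
    hits = sum(
        1 for frag in fragments
        if any(distance(frag, block) <= cutoff for block in library)
    )
    return hits / len(fragments)

def max_abs_diff(a, b):
    """Toy distance between two equal-length tuples of numbers."""
    return max(abs(x - y) for x, y in zip(a, b))

# Toy "structure": one angle per residue, six at 0 and six at 90.
angles = [0, 0, 0, 0, 0, 0, 90, 90, 90, 90, 90, 90]
hexamers = windows(angles, 6)          # fixed fragment size of 6
library = [(0,) * 6, (90,) * 6]        # hypothetical two-block library
frac = coverage(hexamers, library, max_abs_diff, cutoff=10)
```

Here only the two pure windows are covered; the five windows that straddle the transition are not, which is the kind of gap a larger library or a different fragment length is meant to close.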

7 Library of building blocks
- The polypeptide chain is represented by a sequence of rigid fragments concatenated without any degrees of freedom [Kolodny et al., 2002].
- The quality of the total conformation depends on
  i. the length (f) of the fragments used
  ii. the size (s) of the library
- Complexity of a library: [equation not recovered]
- Accuracy ranges from 2.9 Å RMSD (complexity 2.7, f=7) to 0.76 Å (complexity 15, f=5)
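The complexity formula did not survive in this transcript. The sketch below assumes the definition usually attributed to Kolodny et al. (2002), complexity = s^(1/(f-3)), i.e. the effective number of states per residue when consecutive fragments overlap by three residues. Treat both the formula and the library sizes used here as assumptions, not as a reading of the slide.

```python
def library_complexity(size, fragment_length):
    """Assumed definition: states per residue, size ** (1 / (fragment_length - 3))."""
    if fragment_length <= 3:
        raise ValueError("fragment_length must exceed the 3-residue overlap")
    return size ** (1.0 / (fragment_length - 3))

# Library sizes are illustrative guesses chosen to match the slide's complexities.
high = library_complexity(225, 5)   # 225 fragments of length 5
low = library_complexity(50, 7)     # 50 fragments of length 7
```

Under this assumed definition, a 225-fragment library with f=5 gives a complexity of 15 and a 50-fragment library with f=7 gives about 2.7, consistent with the two complexities quoted on the slide.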

8 Overlapping manner
- Building blocks from the library are concatenated in an overlapping manner.
- Superimposition: one block is fused onto another.
- The level of "superimposability" measures how well the overlapping parts match.
  - Too low: the two fragments in the query do not belong together.
  - Too high: the two fragments are connected rigidly, so the chain is not flexible enough to reconstruct the overall conformation.

9 Concatenation
- Non-local interactions
  I. Knowledge-based potential functions derived from the protein database
  II. Potential functions based on chemical intuition
- Residue-level to all-atom conformation

10
- Backbone evaluation
  - Naive approach [Unger et al., 1989]
  - Fragment clustering and clustering algorithm [Bonneau et al., 2001]
- Global evaluation
  I. Approach based on graph algorithms
  II. Optimization algorithms such as Monte Carlo and genetic algorithms [Unger and Moult, 1993; Pedersen and Moult, 1997a, 1997b; Yadgari et al., 1998]
[Diagram labels: C-alpha conformation search; non-local interaction; simple energy function; backbone structure; full all-atom models; all-atom energy evaluation (high to low); best]

11 Baker's method I (top of Free Modeling), CASP7
- Low-resolution fragment assembly (backbone structure) + full-chain refinement
- 1. Sampling
  - Query + its homologs (max 30 sequences)
  - of short fragments
  - Long-range beta-strand pairing based on secondary structure prediction
- 2. All-atom energy function
  - High computational power: ROSETTA@Home
- Simple-topology targets of about 100 residues are tractable
- Good secondary structure prediction -> high accuracy for targets of about 100 residues (<3 Å)

12 Baker's method II - Constructing main-chain structure
Overlapping
- The nearest neighbors of a segment indicate which structures the sequence around the segment maps to [Han & Baker, 1995]
- Reliable even without knowledge of the true structure [Yi & Lander, 1993]
- 25 nearest neighbors are used [Baker et al., 1997] [eq. (1)]
- [Diagram: fragment i-1, fragment i, fragment i+1]
All-atom refinement
- Conformations with two atoms within 2.5 Å of each other are rejected
- Metropolis criterion [eq. (2)]

13 Equations
- (1) The nearest neighbor [equation not recovered]
- (2) All-atom refinement [equation not recovered]
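Since the slide's equations were not preserved, the following is only a sketch of the two refinement ingredients named on the previous slide: a steric rejection test and the Metropolis criterion. It uses the textbook Metropolis rule, accept with probability min(1, exp(-ΔE/T)), which may differ in detail from eq. (2). The function names and the flat coordinate-list input are assumptions.

```python
import math
import random

def has_clash(coords, min_dist=2.5):
    """True if any two of the given atoms are closer than `min_dist` angstroms.

    Simplification: callers should pass only atoms that are not bonded to each
    other, since bonded atoms are always closer than 2.5 angstroms.
    """
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < min_dist:
                return True
    return False

def metropolis_accept(delta_e, temperature, rng=random):
    """Textbook Metropolis rule: always accept downhill, sometimes accept uphill."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def accept_move(coords, delta_e, temperature, rng=random):
    """Reject clashing conformations outright, then apply the Metropolis rule."""
    if has_clash(coords):
        return False
    return metropolis_accept(delta_e, temperature, rng)
```

Checking for clashes first avoids evaluating the acceptance probability for conformations that would be discarded anyway.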

14 Baker's method III - Overlapping; side-chain refinement
- Expected neighbor density around each residue
- The number of atoms of other residues within 10 Å of the atom of the residue
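The neighbor count described on this slide can be sketched as below. The slide does not say which atom represents a residue, so this example assumes the first atom listed for each residue; that choice and the input layout are illustrative only.

```python
import math

def neighbor_counts(residues, radius=10.0):
    """For each residue, count atoms of other residues near its reference atom.

    `residues` is a list of residues, each a list of (x, y, z) atom positions.
    Assumption: the first atom of each residue is its reference atom.
    """
    counts = []
    for i, residue in enumerate(residues):
        center = residue[0]
        count = 0
        for j, other in enumerate(residues):
            if i == j:
                continue                      # skip the residue's own atoms
            count += sum(1 for atom in other if math.dist(center, atom) <= radius)
        counts.append(count)
    return counts

toy = [
    [(0, 0, 0), (1, 0, 0)],
    [(5, 0, 0), (20, 0, 0)],
    [(30, 0, 0)],
]
densities = neighbor_counts(toy)
```

Buried residues end up with high counts and exposed residues with low ones, which is what makes the quantity useful as a burial measure.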

15 Baker's method IV: All-atom refinement
CASP7 improvement
- Main-chain accuracy is not very different, but the final conformation is greatly improved
Problems
- 500k CPU hours per domain
- 140k computers with a combined performance of 37 TFLOPS
- Long proteins are not tractable
- The all-atom energy landscape is rugged

16 Top of CASP7 (FM section)
Group          Method
Baker          ROSETTA (FA)
Zhang          I-TASSER (FA, Replica Exchange, Lattice Model)
Zhang-server   I-TASSER (FA, Replica Exchange, Lattice Model)
SBC            Server results (Meta Selector)
POEM-REFINE    ROSETTA (FA), Full-atom Refinement
GeneSilico     ROSETTA (FA)
ROBETTA        ROSETTA (FA)
ROKKO          SimFold + FA
Jones-UCL      FRAGFOLD (FA)
SAM-T06        Frag finder, Undertaker (FA)
TASSER         FA, Replica Exchange, Lattice Model
CASP7 has 2 sections: Template Based Modeling (TBM) and Free Modeling (FM).
Baker, Zhang, and Zhang-server predominated in both sections.

17 CASP7 I-TASSER protocol (top of template-based modeling)
- Various lengths
- 1 to 2 days per sequence to submit a final prediction
- ~4 Å (TBA), ~11 Å (FA) RMSD [8]

18 Summary
- The fragment assembly method simplifies the protein folding problem
- It does not require a new structure for a query; it selects the correct parts to fit together into an accurate conformation
- Local compactness is accounted for by using known data
- Baker's high success: all-atom refinement using high computational power
Problems
- High computational cost
- Distributed computation, e.g. Rosetta@home
- Sampling methods

19 To improve
- Fragment assembly
  - How to choose fragments: where to cut and separate, and what the optimal length is
  - How to apply constraints
  - Competitive learning?
- Scoring function
  - Cf. statistical potential energy, Bayesian scoring function

20 Reference
1. Ron Unger, The Building Block Approach to Protein Structure Prediction, The New Avenues in Bioinformatics, 2004, 177-188; http://www.springerlink.com/content/h63474928680757x/
2. Kim T. Simons, Charles Kooperberg, Enoch Huang and David Baker, Assembly of Protein Tertiary Structures from Fragments with Similar Local Sequences using Simulated Annealing and Bayesian Scoring Functions, J. Mol. Biol. (1997) 268, 209-225
3. Shuai Cheng Li, Dongbo Bu, Jinbo Xu, and Ming Li, Fragment-HMM: A new approach to protein structure prediction, Protein Science (2008), 17: 1925-1934
4. Vladimir Yarov-Yarovoy, Jack Schonbrun, and David Baker, Multipass Membrane Protein Structure Prediction Using Rosetta, Proteins. 2006 March 1; 62(4): 1010-1025
5. Rhiju Das and David Baker, Prospects for de novo phasing with de novo protein models, Biological Crystallography, ISSN 0907-4449
6. Arthur M. Lesk, Loredana Lo Conte, and Tim J.P. Hubbard, Assessment of Novel Fold Targets in CASP4: Predictions of Three-Dimensional Structures, Secondary Structures, and Interresidue Contacts, Proteins: Structure, Function, and Genetics Suppl 5:98-118 (2001)
7. David Baker, CASP 7; http://www.cs.nott.ac.uk/CFNJC/slides/Presentation_Pawel_JC_04-2008.pdf
8. Y. Zhang, I-TASSER; http://zhang.bioinformatics.ku.edu/I-TASSER/

21 Previous approach and experiments using fragment assembly

22 Main backbone
- Need to improve the main-chain structure (Cα conformation)
- Need to apply FA theory
- Need to use classification
  - SCOP: class, fold, superfamily, family, domain levels

23 Cα conformation prediction
- Remote homology profiling: PSI-BLAST profile
- Classification: fold, superfamily, and family levels (SCOP)
- Fragment assembly: 5 to 11 residues

24 Previous searching approach based on FA

25 From previous experiment
- Target: 1aa2 (108 residues)
- Fragment: 7-amino-acid fragments
- Classification: family and superfamily levels
  - Training data: the 10 hits with the lowest e-values
- Global scoring function: HCF (hydrophobic compactness function)

26 Result (1)
dRMS (Å), family-level classification
Lattice model:        cubic lattice 14.09; FCC lattice 8.68
Low energy, top 10:   best 1.56; mean 0.870; SD 0.886
All FA sets (68):     best 0.20; mean 8.49; max 20.96; SD 5.98

27 Result (2)
dRMS (Å), family + superfamily level
Low energy, top 10:   best 1.56; mean 0.870; SD 0.886
All FA sets (68):     best 0.20; mean 5.83; max 21.0; SD 5.98
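The results above are reported as dRMS, the root-mean-square difference between corresponding inter-residue distances in two structures. A minimal sketch of the standard definition follows; the function name and the plain-tuple coordinates are choices made for this example.

```python
import math
from itertools import combinations

def drms(coords_a, coords_b):
    """Distance RMS between two structures with the same number of points.

    Compares every pairwise distance d_ij in A with the matching distance in B,
    so no superposition of the two structures is required.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    if len(coords_a) < 2:
        raise ValueError("need at least two points")
    pairs = list(combinations(range(len(coords_a)), 2))
    total = sum(
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(total / len(pairs))

value = drms([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (3, 0, 0)])
```

Because it depends only on internal distances, dRMS is unchanged by translating or rotating either structure, although it cannot tell a structure from its mirror image.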

28 Appendix(ii) - Experiment: all-atom
- Purpose: all-atom complexity
- Data: 10 relatively small proteins
- Lattice model potential energy function
  - Scoring function based on chemical features
  - Hydrophilicity, hydrophobicity, electric charge, and tendencies of side chains and electric charge
- Accuracy measurement: RMSD

29 Face-Centered Cubic Lattice Model
- Nearest neighbors: 12 residues
- Closest to a real model -> difficulty: accounting for the space between residues
- Accuracy of the model
  - How to evaluate energy: energy function
  - How to search: optimization
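The count of 12 nearest neighbors can be checked by enumerating lattice vectors. In one common integer representation of the FCC lattice (an assumption about the coordinates, since the slide does not give them), the nearest-neighbor steps are the vectors with exactly two nonzero components, each ±1.

```python
from itertools import product

def fcc_neighbor_vectors():
    """The 12 nearest-neighbor steps on an FCC lattice, e.g. (1, 1, 0)."""
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in v) == 2]

def cubic_neighbor_vectors():
    """The 6 nearest-neighbor steps on a simple cubic lattice, e.g. (1, 0, 0)."""
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in v) == 1]

def neighbors(site, vectors):
    """Lattice sites reachable from `site` in one step."""
    return [tuple(s + d for s, d in zip(site, step)) for step in vectors]

fcc = fcc_neighbor_vectors()
cubic = cubic_neighbor_vectors()
```

With twice as many directions per step as the cubic lattice, an FCC chain can follow a real backbone more closely, at the cost of a larger search space.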

30 Appendix(ii) - Lattice to all-atom result
dRMS (Å); columns marked ※ref. are reference values

Protein (PDB id)  Size  Cubic         FCC           ※ref.         All-atom  ※ref.
                        (main chain)  (main chain)  (main chain)            (all-atom)
1alg                24   9.44          8.58         -             10.33     -
1ku5                70  19.11         16.24         -             20.41     -
1aa2               108  14.09          8.68         6.06          16.12     12.09
1beo                98  13.82         11.52         6.36          15.89     12.01
1ctf                68  12.58          9.40         5.45          14.19      9.20
1dkt-A              72  13.75         10.66         5.59          15.62     10.98
1fca                55  11.18          8.34         5.16          12.30      9.00
1fgp                70  12.37          8.20         5.98          14.02     11.16
1jer               110  15.64         10.99         7.53          16.90     13.79
1nkl                78  12.68          9.78         5.70          14.61     10.13
average                 13.47         10.24         5.98          15.04     11.05

31 Discussion
- Classification should be applied to improve accuracy
- How to choose fragment data
- Accuracy of the energy function

32 Reference
- Yu Xia, Enoch S. Huang, Michael Levitt and Ram Samudrala, Ab Initio Construction of Protein Tertiary Structures Using a Hierarchical Approach, Journal of Molecular Biology (2000), 300, 171-185
- G. Raghunathan and R.L. Jernigan, Ideal architecture of residue packing and its observation in protein structures, Cambridge University, 1997, Protein Science
- Feng Jiao, Jinbo Xu, Libo Yu, Dale Schuurmans, Protein Fold Recognition Using the Gradient Boost Algorithm, University of Alberta, May 22 2006, WSPC
- http://www.ecosci.jp/amino/amino2.html

