Fragment Assembly method
Mika Takata


Outline
• Fragment assembly
• Basic theory
• Process
• Techniques
• David Baker's group approaches
• Other top-ranked approaches at CASP7
• Discussion

Fragment theory
• Short fragments (5 to 9 residues) tend to adopt specific conformations.
• These tendencies recur throughout protein structures.
• Sequences that share a particular short structural motif are somewhat similar [Unger et al., 1989; Rooman et al., 1990].
• A protein can be "reconstructed" from fragments excised from other proteins in a library [Jones and Thirup, 1986; Claessens et al., 1989].

Appendix: other basics
• Levinthal's paradox (Levinthal, 1968)
• Local structural bias
  - Pauling and Corey (1951); Rooman et al. (1990); Bystroff et al. (1996); Han et al. (1997); Bystroff and Baker (1998); Camproux et al. (1999); Gerard (1999); Li et al.
• Recurrent sequence patterns
• HMMSTR (Bystroff et al., 2000): a probabilistic version of a structural motif library
• Ramachandran basin technique: the φ and ψ angle intervals are each divided into six ranges

FA: Fragment Assembly, basic theory
• High [...]
• Developed by David Baker's team
• Assembles consecutive short fragments
• Applied to proteins with low sequence identity (less than 30%)
Process
1. Fragment extraction: choose several candidates for each run of about 10 consecutive residues.
2. Assemble the local structures into a whole structure.
3. Sample for the lowest free energy (global optimization).
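The fragment extraction step can be illustrated with a small sketch. It slides a fixed-size window along a query sequence and keeps the best-matching library fragments for each window. Everything here is an assumption made for illustration: the function names, the toy sequences, and the use of plain sequence identity as the matching score (real fragment pickers typically use profile-based scores rather than identity).

```python
from typing import Dict, List, Tuple


def windows(seq: str, size: int) -> List[Tuple[int, str]]:
    """Return every consecutive window of `size` residues with its start index."""
    return [(i, seq[i:i + size]) for i in range(len(seq) - size + 1)]


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_candidates(query: str, library: List[str],
                    size: int, top_n: int) -> Dict[int, List[str]]:
    """For each query window, keep the top_n library fragments by identity."""
    picks: Dict[int, List[str]] = {}
    for start, window in windows(query, size):
        # sorted() is stable, so ties keep their library order.
        ranked = sorted(library, key=lambda frag: identity(window, frag),
                        reverse=True)
        picks[start] = ranked[:top_n]
    return picks


# Toy data (hypothetical, not taken from the slides).
QUERY = "MTYKLILNGK"
LIBRARY = ["MTYKL", "KLILN", "AAAAA", "LNGKT", "TYKLI"]

candidates = pick_candidates(QUERY, LIBRARY, size=5, top_n=2)
for start, frags in candidates.items():
    print(start, QUERY[start:start + 5], frags)
```

Keeping several candidates per window, rather than only the best one, is what gives the later assembly step something to search over.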

Building blocks library
• How large a data set is needed to cover all protein structures?
  - 78% of all hexamer structures in a test group were covered by a library of 81 hexamers [Unger et al., 1989].
• What fragment size should be used?
  - Fixed size
    · Most studies use 6 amino acids [Unger et al.]
    · Others use 5 to 8
    · Length 9 works better than other fragment lengths below 15 amino acids [Bystroff et al., 1996]
    · Combination of lengths 3, 9, and 12 [Baker et al., 1998]
  - Adjusted size
    · Library of "natural" building blocks
    · From 3 or 4 up to 10 or 12 residues

Library of building blocks
• The polypeptide chain is represented as a sequence of rigid fragments, concatenated without any degrees of freedom [Kolodny et al., 2002].
• The quality of the overall conformation depends on:
  i. the length (f) of the fragments used
  ii. the size (s) of the library
• Complexity of a library = [equation missing in source]
• Accuracy ranges from 2.9 Å RMSD (complexity 2.7, f = 7) to 0.76 Å RMSD (complexity 15, f = 5).

Overlapping manner
Using the library: building blocks are concatenated in an overlapping manner.
• Superimposition: one block is fused onto the next.
• The level of "superimposability" measures how well the overlapping parts match.
  - Too low: the two fragments do not belong together in the query.
  - Too high: the two fragments are connected rigidly, so the chain is not flexible enough to reconstruct the overall conformation.

Concatenation
Non-local interactions
I. Knowledge-based potential functions derived from the protein database
II. Potential functions based on chemical intuition
From residue-level to all-atom conformation

• Backbone evaluation
  - Naïve approach [Unger et al., 1989]
  - Fragment clustering and clustering algorithm [Bonneau et al., 2001]
• Global evaluation
  I. Approaches based on graph algorithms
  II. Optimization algorithms such as Monte Carlo and genetic algorithms [Unger and Moult, 1993; Pedersen and Moult, 1997a, 1997b; Yadgari et al., 1998]
[Diagram labels: backbone structure; simple energy function; high; full all-atom models; all-atom energy evaluation; best; low; Cα conformation search; non-local interaction]

Baker's method I (top of Free Modeling), CASP7
• Low-resolution fragment assembly (backbone structure) + full-chain refinement
• 1. Sampling
  - Query + its homologs (max 30 sequences)
  - [...] of short fragments
  - Long-range β-strand pairing based on secondary structure prediction
• 2. All-atom energy function
• High computational power:
  - Simple-topology targets of about 100 residues are treatable
  - Good secondary structure prediction -> high accuracy (<3 Å) for targets of about 100 residues

Baker's method II: constructing the main-chain structure
Overlapping: fragment i-1, fragment i, fragment i+1
• The nearest neighbors of a segment reflect the sequence-to-structure mapping around that segment [Han & Baker, 1995].
• Reliable even without knowledge of the true structure [Yi & Lander, 1993].
• 25 nearest neighbors are used [Baker et al., 1997] [eq. (1)].
All-atom refinement
• Moves that bring two atoms within 2.5 Å of each other are rejected.
• Metropolis criterion [eq. (2)]
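The two acceptance rules named on this slide, the 2.5 Å rejection and the Metropolis criterion, can be sketched as follows. This is a minimal illustration, not the slide's eq. (2): it uses the textbook Metropolis form (accept downhill moves always, accept uphill moves with probability exp(-ΔE/T)), and the function names, the all-pairs clash check, and the `temperature` parameter are assumptions.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def has_clash(coords: Sequence[Point], min_dist: float = 2.5) -> bool:
    """True if any two points lie closer than min_dist (in Å).

    Assumption: every pair is checked, which suits a Cα-only trace where
    consecutive atoms sit about 3.8 Å apart.
    """
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < min_dist:
                return True
    return False


def metropolis_accept(delta_e: float, temperature: float,
                      rng: random.Random) -> bool:
    """Textbook Metropolis rule for an energy change delta_e."""
    if delta_e <= 0:
        return True  # downhill or neutral moves are always accepted
    return rng.random() < math.exp(-delta_e / temperature)


def accept_move(new_coords: Sequence[Point], delta_e: float,
                temperature: float, rng: random.Random) -> bool:
    """Reject clashing conformations first, then apply the Metropolis rule."""
    if has_clash(new_coords):
        return False
    return metropolis_accept(delta_e, temperature, rng)


rng = random.Random(0)
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(accept_move(trace, delta_e=-1.0, temperature=1.0, rng=rng))
```

Checking for clashes before evaluating the energy avoids spending time scoring conformations that would be discarded anyway.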

Equations
(1) The nearest neighbor: [equation missing in source]
(2) All-atom refinement: [equation missing in source]

Baker's method III: overlapping, side-chain refinement
• Expected neighbor density around each residue
• Measured as the number of atoms of other residues within 10 Å of the atom of the residue
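The neighbor count described here is simple to state in code. The sketch below counts atoms belonging to other residues that fall within 10 Å of a chosen center. The data layout (a list of `(residue index, xyz)` pairs), the function name, and the toy coordinates are assumptions for illustration; the slide does not say which atom of the residue serves as the center.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[int, Point]  # (residue index, coordinates in Å)


def neighbor_count(atoms: List[Atom], residue: int, center: Point,
                   radius: float = 10.0) -> int:
    """Count atoms of other residues within `radius` of `center`."""
    return sum(
        1
        for res, xyz in atoms
        if res != residue and math.dist(center, xyz) <= radius
    )


# Toy atoms on a line (hypothetical coordinates).
ATOMS: List[Atom] = [
    (0, (0.0, 0.0, 0.0)),
    (0, (1.0, 0.0, 0.0)),   # second atom of residue 0, never counted for residue 0
    (1, (3.8, 0.0, 0.0)),
    (2, (7.6, 0.0, 0.0)),
    (3, (11.4, 0.0, 0.0)),
]

print(neighbor_count(ATOMS, residue=0, center=(0.0, 0.0, 0.0)))
```

A buried residue will have a high count and an exposed one a low count, which is why such a density is useful as a term in side-chain scoring.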

Baker's method IV: all-atom refinement
• CASP7 improvement
  - Main-chain accuracy does not differ much, but the final conformation is greatly improved.
Problems
• 500k CPU hours per domain
• 140k computers delivering 37 TFLOPS
• Long proteins are not treatable
• The all-atom energy landscape is rugged

Top of CASP7 (FM section)

Group          Method
Baker          ROSETTA (FA)
Zhang          I-TASSER (FA, Replica Exchange, Lattice Model)
Zhang-server   I-TASSER (FA, Replica Exchange, Lattice Model)
SBC            Server results (Meta Selector)
POEM-REFINE    ROSETTA (FA), Full-atom Refinement
GeneSilico     ROSETTA (FA)
ROBETTA        ROSETTA (FA)
ROKKO          SimFold + FA
Jones-UCL      FRAGFOLD (FA)
SAM-T06        Frag finder, Undertaker (FA)
TASSER         FA, Replica Exchange, Lattice Model

CASP7 has 2 sections: Template-Based Modeling (TBM) and Free Modeling (FM). Baker, Zhang, and Zhang-server predominated in both sections.

CASP7 I-TASSER protocol (top of template-based modeling)
• Various lengths
• 1 to 2 days per sequence to submit a final prediction
• ~4 Å (TBA), ~11 Å (FA) RMSD [8]

Summary
• The fragment assembly method simplifies the protein folding problem.
• It does not require building a new structure for a query; it selects the correct parts and fits them together into an accurate conformation.
• Local compactness is accounted for by using known data.
• Baker's high success: all-atom refinement using high computational power.
Problems
• High computational cost
• Computational distribution (e.g. ...)
• Sampling methods

To improve
• Fragment assembly
  - How to choose fragments
  - Where to cut and separate
  - What the optimal length is
• How to apply constraints
  - Competitive learning?
• Scoring function
  - Cf. statistical potential energy, Bayesian scoring function

References
1. Ron Unger, The Building Block Approach to Protein Structure Prediction, The New Avenues in Bioinformatics, 2004.
2. Kim T. Simons, Charles Kooperberg, Enoch Huang, and David Baker, Assembly of Protein Tertiary Structures from Fragments with Similar Local Sequences using Simulated Annealing and Bayesian Scoring Functions, J. Mol. Biol. (1997) 268.
3. Shuai Cheng Li, Dongbo Bu, Jinbo Xu, and Ming Li, Fragment-HMM: A new approach to protein structure prediction, Protein Science (2008) 17.
4. Vladimir Yarov-Yarovoy, Jack Schonbrun, and David Baker, Multipass Membrane Protein Structure Prediction Using Rosetta, Proteins, March 1; 62(4).
5. Rhiju Das and David Baker, Prospects for de novo phasing with de novo protein models, Biological Crystallography.
6. Arthur M. Lesk, Loredana Lo Conte, and Tim J.P. Hubbard, Assessment of Novel Fold Targets in CASP4: Predictions of Three-Dimensional Structures, Secondary Structures, and Interresidue Contacts, Proteins: Structure, Function, and Genetics, Suppl. 5 (2001).
7. David Baker, CASP 7.
8. Y. Zhang, I-TASSER.

Previous approach and experiments using fragment assembly

Main backbone
• Need to improve the main-chain structure (Cα conformation)
• Need to apply FA theory
• Need to use classification
  - SCOP: class, fold, superfamily, family, domain level

Cα conformation prediction
• Remote homology profiling: PSI-BLAST profile
• Classification: fold, superfamily, and family level (SCOP)
• Fragment assembly: 5 to 11 residues

Previous search approach based on FA

From the previous experiment
• Target: 1aa2 (108 residues)
• Fragment: 7-amino-acid fragments
• Classification: family and superfamily level
  - Training data: the 10 hits with the lowest e-values
• Global scoring function: HCF (hydrophobic compactness function)

Result (1): dRMS (Å), family-level classification
[Table: rows are the lattice models (cubic lattice, FCC lattice); columns are low-energy top 10 (best, mean, SD) and all FA sets (68) (best, mean, max, SD); values missing in source]

Result (2): dRMS (Å), family + superfamily level
[Table: columns are low-energy top 10 (best, mean, SD) and all FA sets (68) (best, mean, max, SD); values missing in source]
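The results above are reported as dRMS. For readers unfamiliar with the measure, here is a minimal sketch of the usual definition: the root mean square of the differences between corresponding intra-structure distances, taken over all residue pairs. The function name and toy coordinates are assumptions; the slides do not state which atoms were compared.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def drms(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Distance RMS between two structures with the same number of points.

    Compares pairwise distances inside each structure, so no superposition
    of the two structures is needed.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of points")
    if len(a) < 2:
        raise ValueError("need at least two points")
    squared = [
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
        for i, j in combinations(range(len(a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy structures (hypothetical coordinates).
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in native]

print(drms(native, shifted))
```

Because only internal distances enter the formula, dRMS is unchanged by translation and rotation, and it also cannot tell a structure from its mirror image.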

Appendix (ii): experiment, all-atom
• Purpose: all-atom complexity
• Data: 10 relatively small proteins
• Lattice model potential energy function
• Scoring function based on chemical features
  - Hydrophilicity, hydrophobicity, electric charge, tendency of side chain and electric charge
• Accuracy measurement: RMSD

Face-centered cubic (FCC) lattice model
• Nearest neighbors: 12 residues
• Closest to a real model when the space among residues is considered
Difficulties
• Accuracy of the model
• How to evaluate energy: the energy function
• How to search for the optimum
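The count of 12 nearest neighbors follows directly from the lattice geometry, which a few lines can confirm. In the sketch below, FCC sites are taken as integer points whose nearest neighbors are the offsets with exactly two non-zero components of ±1; the simple cubic lattice, with 6 neighbors, is included for comparison. The function names are illustrative assumptions.

```python
from itertools import product
from typing import List, Tuple

Vec = Tuple[int, int, int]


def cubic_neighbors() -> List[Vec]:
    """The 6 unit steps of a simple cubic lattice: (±1, 0, 0) and permutations."""
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in v) == 1]


def fcc_neighbors() -> List[Vec]:
    """The 12 nearest-neighbor offsets of an FCC lattice: (±1, ±1, 0) and permutations."""
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in v) == 2]


def step(site: Vec, move: Vec) -> Vec:
    """Move from one lattice site to a neighboring site."""
    return (site[0] + move[0], site[1] + move[1], site[2] + move[2])


print(len(cubic_neighbors()), len(fcc_neighbors()))
```

With twice as many directions available at each step, an FCC walk can follow a real backbone more closely than a cubic walk, at the price of a larger search space.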

Appendix (ii): lattice to all-atom result
[Table: columns are Protein (PDB id); Size; dRMS (Å) for Cubic (main chain), FCC (main chain), ※ref. (main chain), All-atom, ※ref. (all-atom). Rows as preserved in source: 1 alg, ku, aa, beo, ctf, dkt-A, fca, fgp, jer, nkl, average. Values missing in source]

Discussion
• Classification should be applied to improve accuracy
• How to choose fragment data
• Accuracy of the energy function

References
• Yu Xia, Enoch S. Huang, Michael Levitt, and Ram Samudrala, Ab Initio Construction of Protein Tertiary Structures Using a Hierarchical Approach, Journal of Molecular Biology (2000) 300.
• G. Raghunathan and R.L. Jernigan, Ideal architecture of residue packing and its observation in protein structures, Cambridge University, 1997, Protein Science.
• Feng Jiao, Jinbo Xu, Libo Yu, and Dale Schuurmans, Protein Fold Recognition Using the Gradient Boost Algorithm, University of Alberta, May, WSPC.