AMPLE – Using de novo or ab initio protein structure modelling techniques to create and enhance search models for use in Molecular Replacement Jaclyn Bibby,

Presentation transcript:

AMPLE – Using de novo or ab initio protein structure modelling techniques to create and enhance search models for use in Molecular Replacement. Jaclyn Bibby, Jens Thomas, Olga Mayans and Daniel Rigden (Institute of Integrative Biology); Ronan Keegan and Martyn Winn (Collaborative Computational Project 4, CCP4).

AMPLE: ab initio modelling of proteins for molecular replacement. Joint development by CCP4 and the University of Liverpool. AMPLE is a comprehensive project to assess the suitability of cheaply obtained ab initio models for molecular replacement. An additional goal of the project is to make AMPLE an automated software tool that is generally available to potential users through CCP4.

Ab initio structure prediction: Ab initio (or de novo) structure prediction is the prediction of a target's fold based purely on its sequence information. Methods have greatly improved in recent years with the aid of the CASP experiments (Critical Assessment of protein Structure Prediction). Examples include Rosetta, I-TASSER and QUARK.

Ab initio structure prediction: Thousands of “decoys” are assembled from fragments of PDB structures. Decoys are clustered, and centroid representatives of the largest cluster are considered candidate fold predictions. Side chains are added to selected decoys, followed by refinement under a more realistic physics-based force field. The initial fragment assembly stage requires relatively modest computing power; the refinement stage can require supercomputing resources.

Ab initio modelling and Molecular Replacement: Combining ab initio modelling with molecular replacement can be a powerful technique for solving the phase problem in cases where there are no obvious homologous structures available. Two approaches have been taken. (1) All-atom modelling to produce single search models of maximum completeness and accuracy (Qian et al., 2007); this solved 1/3 of a test set of 30 targets (Das et al., 2009) but is computationally expensive. (2) Taking cheaply obtained decoys from the initial fragment assembly step and preparing them, which has been shown to produce successful MR search models (Rigden et al., 2008; Caliandro et al., 2009).

Synergies between ab initio modelling and Molecular Replacement: Ab initio modelling produces clusters of similar model structures; MR works effectively with superposed ensembles approximating the target. Within and between clusters, similarity indicates accuracy, so inaccurate regions can be trimmed to leave a more reliable core; MR may only require a partial model. Fast modelling is polyAla only; side chains are often (partially) removed for MR.

The AMPLE Pipeline: Uses Rosetta to perform ab initio modelling and generate decoy models. Can also accept models generated externally. Currently designed for targets of < 120 residues and resolution better than 2.2 Å, but may work outside these restrictions (e.g. transmembrane and coiled-coil proteins).

Decoy Generation: Decoys were generated first with Rosetta; QUARK has also been used during development. The typical number of decoys required is 1000, but this can be varied. In easier cases as few as 50 decoys can be sufficient.

Decoy Clustering: Clustering with SPICKER identifies the most likely fold for the target. A large top cluster is usually indicative of a correct prediction. A subset of decoys (max. 200) closest to the centroid of each of the 3 largest clusters is selected for further processing.
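
The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration, not AMPLE's or SPICKER's actual code: the function name `select_decoys`, the input tuple layout `(decoy_name, cluster_id, rmsd_to_centroid)` and the use of RMSD as the distance to the centroid are all assumptions made for the example.

```python
from collections import defaultdict


def select_decoys(assignments, n_clusters=3, max_per_cluster=200):
    """Pick the decoys closest to the centroid of each of the largest clusters.

    assignments: iterable of (decoy_name, cluster_id, rmsd_to_centroid) tuples
    (a hypothetical layout; real clustering output would need parsing first).
    Returns a dict mapping cluster_id -> list of decoy names, closest first.
    """
    clusters = defaultdict(list)
    for name, cluster_id, rmsd in assignments:
        clusters[cluster_id].append((rmsd, name))

    # Rank clusters by size, largest first, and keep the top n_clusters.
    largest = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    largest = largest[:n_clusters]

    selected = {}
    for cluster_id, members in largest:
        members.sort()  # ascending distance to the centroid
        selected[cluster_id] = [name for _, name in members[:max_per_cluster]]
    return selected
```

The defaults mirror the numbers quoted on the slide (3 clusters, at most 200 decoys each); both are parameters so that smaller trial runs can be tested.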

Decoy Clustering: Each cluster is then structurally aligned using the maximum likelihood algorithm implemented in Theseus (Theobald & Wuttke, 2006). This helps to identify structurally conserved regions and gives a variance score which can later be used to guide truncation.

Ensemble Truncation: We’ve found that success or failure in the molecular replacement step is highly sensitive to the accuracy of the search model. Sampling many degrees of truncation with different levels of side chain inclusion is essential. High-variance regions are cut away in steps to give a set of ensemble models.
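
As a rough illustration of stepwise truncation, the sketch below removes the highest-variance residues in equal-sized steps. The slides do not say how AMPLE chooses its steps, so the equal-fraction scheme, the function name `truncation_levels` and the `n_levels` parameter are assumptions; the per-residue variances stand in for the Theseus variance score.

```python
def truncation_levels(variances, n_levels=5):
    """Return residue-index lists for progressively more truncated models.

    variances: per-residue variance scores, one per residue (index 0 = first
    residue). The first level keeps every residue; each later level drops a
    further share of the highest-variance residues (assumed scheme).
    """
    n = len(variances)
    # Residue indices ordered from most conserved to most variable.
    order = sorted(range(n), key=lambda i: variances[i])

    levels = []
    for step in range(n_levels):
        # Integer arithmetic avoids rounding surprises; keep at least 1 residue.
        keep = max(1, (n * (n_levels - step)) // n_levels)
        levels.append(sorted(order[:keep]))
    return levels
```

Each returned list could then be used to cut the same residues out of every decoy in a cluster, giving one truncated ensemble per level.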

Further processing: These truncated clusters are further processed and side chains are added to give a large set of search models for molecular replacement.

Molecular replacement: Molecular replacement is performed using MrBUMP from the CCP4 suite, which automates the procedure. Search models are processed by both Phaser and Molrep. After molecular replacement, positioned search models are refined using Refmac5 to get an initial indication of success or failure. This is followed by C-alpha tracing with SHELXE and model building with Buccaneer and ARP/wARP.

Testing: Test set of 295 small proteins (40-120 residues) from the PDB, with structure factor data available, resolution of 2.2 Å or better, and a single molecule in the asymmetric unit. Mixture of all-α, all-β and mixed α-β secondary structure. 1000 decoys were generated for each case using Rosetta. Information from any homologues was excluded from the fragment generation step.
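
The selection criteria for the test set can be written as a simple filter. The sketch below is illustrative only: the `Entry` record and its field names are hypothetical, and the example identifiers in the usage note are placeholders, not real PDB entries.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical summary of a deposited structure."""
    entry_id: str
    n_residues: int
    resolution: float        # in Å; a lower value means better resolution
    molecules_in_asu: int    # molecules in the asymmetric unit
    has_structure_factors: bool


def in_test_set(entry):
    """Apply the criteria quoted on the slide."""
    return (
        40 <= entry.n_residues <= 120
        and entry.resolution <= 2.2
        and entry.molecules_in_asu == 1
        and entry.has_structure_factors
    )


# Placeholder entries for demonstration only.
candidates = [
    Entry("demo1", 95, 1.8, 1, True),
    Entry("demo2", 150, 1.5, 1, True),   # too long
    Entry("demo3", 80, 2.6, 1, True),    # resolution too low
    Entry("demo4", 60, 2.0, 2, True),    # two molecules in the asymmetric unit
]
accepted = [e.entry_id for e in candidates if in_test_set(e)]
```

With these placeholder entries only `demo1` passes all four checks.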

Assessing Solutions: Initially we used Reforigin to compare solutions with the deposited structures. A more stringent method is to attempt to rebuild the structures: a SHELXE trace with a partial CC of > 25% and an average fragment length of 10 or more counts as a success. Further confirmation was provided through building with ARP/wARP and Buccaneer.
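
The SHELXE success criterion is a pair of thresholds and can be expressed directly. This is a minimal sketch: the function name and arguments are made up for illustration, and it assumes the CC and average fragment length have already been read from the SHELXE output by some other means.

```python
def is_traced_solution(cc_percent, avg_fragment_length,
                       min_cc=25.0, min_length=10.0):
    """True if a trace meets both thresholds quoted on the slide.

    cc_percent: partial correlation coefficient, in percent (must exceed 25).
    avg_fragment_length: mean traced fragment length, in residues (10 or more).
    """
    return cc_percent > min_cc and avg_fragment_length >= min_length
```

Note the asymmetry taken from the slide: the CC must be strictly greater than 25%, whereas an average fragment length of exactly 10 is accepted.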

Using these guidelines, 126 successes out of 295 (~43%) were achieved. These are solutions that could be successfully traced in SHELXE. Other well-positioned solutions existed but could not be traced; these may be solvable through manual model building.

Results based on secondary structure type Overall success rate: all-α = 80%; all-β = 2%; mixed α-β = 37%

Variance and Truncation: Variability between decoys in each cluster corresponds to their deviation from the deposited native structure. 2P5K example: the C-terminal region is predicted as the least reliably modelled portion by the Theseus alignment variance score.

Search model ensemble size/truncation

Running Times: The average time for a complete run (decoy generation, preparation, MR and chain tracing) was 2 CPU days. A parallelised version of the code, which uses Sun Grid Engine for batch farming of model generation and molecular replacement, significantly speeds up the process: results can be achieved in less than 1 hour.
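
To put the two figures side by side, the sketch below estimates how many workers would be needed to fit 2 CPU days into 1 hour of wall-clock time. It assumes perfect parallel scaling with no queueing or serial overhead, which real grid runs will not achieve, so the result is a lower bound rather than a description of the actual cluster used.

```python
import math


def min_workers(cpu_hours, wall_hours):
    """Smallest number of workers that fits cpu_hours of work into wall_hours,
    assuming ideal (perfectly linear) parallel scaling."""
    return math.ceil(cpu_hours / wall_hours)


# 2 CPU days = 48 CPU hours, target of 1 hour wall-clock time.
print(min_workers(48, 1))  # prints 48
```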

Exploiting distant homologues a. Clustered decoy models, b. Truncated ensemble, c. Positioned MR solution, d. Shelxe c-alpha trace, e. Completed structure

Remodelling related NMR structures or distant homologues: AMPLE can be provided with a template for the target, which could be a related NMR structure or a distant homologue. AMPLE will use Rosetta to “re-model” this template into something that should in theory be closer to the target.

Transmembrane Proteins: Experimentally very difficult to work with and crystallise. They represent ~30% of all proteins but make up < 3% of structures in the Protein Data Bank. Their presence in the membrane constrains their shape, so they can be easier to model. MR with ab initio transmembrane models hasn't been tried yet. [Figure labels: extracellular (aqueous); transmembrane region (hydrophobic); intracellular (aqueous).] Image: http://en.wikiversity.org/wiki/File:Cytochrome_C_Oxidase_1OCC_in_Membrane_2.png

Selected 18 transmembrane proteins: 23 – 249 residues, 1.45 – 2.5 Å resolution. 7 clear successes and 5 possible successes. A 223-residue structure (3GD8) could be the largest ever solved with ab initio modelling.

Coiled-coil targets: Difficult to solve by MR even with good homologues. Their α-helical nature makes them suitable targets for AMPLE. Initial testing has been very promising, with an 80% success rate. Some novel structures have also been solved.

AMPLE Availability: A beta version is available as part of CCP4 6.3.0. An improved and more robust version is to be released as part of CCP4 6.4.0. Requires installation of several non-CCP4 packages: Rosetta, SHELXE, Theseus, SPICKER and Maxcluster. Future versions will have a reduced number of dependencies.

Documentation available from http://ccp4wiki.org

Summary: AMPLE is a pipeline designed to prepare cheaply obtained decoy models from ab initio modelling for use as search models in molecular replacement. Results show that the method works well for smaller proteins, particularly those containing α-helical secondary structure. Tests were limited to structures of up to 120 residues in length, but the method has worked for cases of up to 250 residues. New avenues: NMR and homologue remodelling. Several real successes. Currently available as a beta version in CCP4 6.3.0.

Acknowledgements: Jaclyn Bibby, Daniel Rigden, Jens Thomas, University of Liverpool. Olga Mayans, University of Liverpool. Martyn Winn, Daresbury Laboratory. Andrea Thorn, Tim Gruene & George Sheldrick (SHELX). Developers of Rosetta and QUARK. Refmac: Garib Murshudov, LMB-MRC Cambridge. Molrep: Alexei Vagin & Andrey Lebedev. Phaser: Randy Read, Airlie McCoy & Gabor Bunkóczi. Thanks to authors of all underlying programs. Funding: BBSRC. Support from CCP4 & the Research Complex at Harwell. Poster: MS04-12 (Rootes Building).