RECOORD REcalculated COORdinates Database Aart Nederveen Bijvoet Center for Biomolecular Research Utrecht University Jurgen Doreleijers.


RECOORD: REcalculated COORdinates Database
Aart Nederveen, Bijvoet Center for Biomolecular Research, Utrecht University
Jurgen Doreleijers, Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison
Wim Vranken, Macromolecular Structure Database, European Bioinformatics Institute

Aim: recalculation of protein structures from deposited NMR restraints using state-of-the-art methods.
Goals:
- decrease user- and software-dependent biases
- allow a better comparison between structures
- compare different structure calculation programs
- provide a database for the development and assessment of validation tools and calculation protocols

Overview of the recalculation project
- BMRB: STAR files (Doreleijers et al.)
- PDB: coordinates and restraints
- EBI/UU: generation of consistent STAR files (restraint manipulation)
- Recalculation (design of RECOORD): CNS (topology, MD simulated annealing, refinement) and CYANA (sequence, MD simulated annealing, …)
- Analysis: improvement? correlations? …

Databases now publicly available
- DOCR/FRED (BMRB): databases containing converted and filtered restraints
- RECOORD (EBI): database containing recalculated coordinates

Selection
Restraint formats (if distance restraints are available): CNS/XPLOR, DIANA/DYANA/CYANA, DISCOVER/MSI
PDB entries selected:
- proteins only
- no HET atoms
- at least 20 residues
- multimers allowed (not yet recalculated)
In total, 545 monomers were selected.
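The selection criteria above can be expressed as a simple filter. This is a minimal sketch, assuming a hypothetical per-entry summary record (`EntrySummary` and its field names are illustrative, not part of RECOORD); it separates "passes the selection" from "recalculated now", since multimers were allowed but not yet recalculated.

```python
from dataclasses import dataclass

# Restraint formats named on the slide; the string labels are illustrative.
SUPPORTED_FORMATS = {"CNS/XPLOR", "DIANA/DYANA/CYANA", "DISCOVER/MSI"}


@dataclass
class EntrySummary:
    """Hypothetical per-entry summary; field names are not from RECOORD."""
    pdb_id: str
    is_protein: bool
    has_het_atoms: bool
    n_residues: int
    n_chains: int
    restraint_format: str
    has_distance_restraints: bool


def passes_selection(entry: EntrySummary) -> bool:
    """Selection criteria from the slide (multimers are allowed here)."""
    return (
        entry.has_distance_restraints
        and entry.restraint_format in SUPPORTED_FORMATS
        and entry.is_protein
        and not entry.has_het_atoms
        and entry.n_residues >= 20
    )


def recalculated_now(entry: EntrySummary) -> bool:
    """Multimers pass the selection but were not yet recalculated."""
    return passes_selection(entry) and entry.n_chains == 1
```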

Conversion issues
Data are converted to formats readable by the calculation software (e.g. XPLOR/CNS and CYANA) with the FormatConverter, available within the CCPN software (Wim Vranken, EBI).
Problems:
- differences between coordinate and restraint data, e.g. one chain in the PDB entry but two chains in the restraint list
- residue numbering can differ between the PDB entry and the restraint list
- restraints for residues not present in the PDB entry
- …
- nomenclature in the restraint list
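One of the conversion problems listed above, differing residue numbering, can be illustrated with a small offset search. This is a hypothetical sketch, not how the CCPN FormatConverter works: it assumes residue names are available by number for both the PDB entry and the restraint list, tries integer shifts, keeps the one with the most matching names, and then lists restraint residues that have no counterpart in the PDB entry.

```python
def best_numbering_offset(pdb_residues, restraint_residues, max_shift=50):
    """Return the shift s maximising matches restraint[n] == pdb[n + s].

    Both arguments map residue number -> residue name, e.g. {1: "MET"}.
    Ties are broken in favour of the smallest |s|.
    """
    best_shift, best_matches = 0, -1
    # sorted(..., key=abs) tries 0, -1, 1, -2, 2, ... so small shifts win ties
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        matches = sum(
            1
            for number, name in restraint_residues.items()
            if pdb_residues.get(number + shift) == name
        )
        if matches > best_matches:
            best_shift, best_matches = shift, matches
    return best_shift


def unmapped_residues(pdb_residues, restraint_residues, shift):
    """Restraint residues with no counterpart in the PDB entry after shifting."""
    return sorted(n for n in restraint_residues if (n + shift) not in pdb_residues)
```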

Violation analysis
What did the authors do?
- pseudo atoms
- stereospecific assignments
- floating chirality
- calculation of distance restraints: sum, r^-6, center, …
Our method: sum averaging as implemented in CNS/XPLOR. Distances may come out shorter than those calculated by the authors; this is acceptable if pseudo-atom corrections are included.
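The averaging options listed above give different effective distances for the same atom groups. A minimal sketch, assuming the usual textbook definitions: sum averaging as (Σ r^-6)^(-1/6) over all atom pairs, r^-6 averaging as the same sum divided by the number of pairs, and center averaging as the distance between the geometric centers of the two groups. It shows why sum averaging can yield shorter distances than the authors' choice.

```python
import itertools
import math


def sum_average(group_a, group_b):
    """(sum over all atom pairs of r^-6)^(-1/6)."""
    s = sum(math.dist(a, b) ** -6 for a, b in itertools.product(group_a, group_b))
    return s ** (-1 / 6)


def r6_average(group_a, group_b):
    """(mean over all atom pairs of r^-6)^(-1/6)."""
    pairs = list(itertools.product(group_a, group_b))
    s = sum(math.dist(a, b) ** -6 for a, b in pairs) / len(pairs)
    return s ** (-1 / 6)


def center_average(group_a, group_b):
    """Distance between the geometric centers of the two atom groups."""
    ca = [sum(p[i] for p in group_a) / len(group_a) for i in range(3)]
    cb = [sum(p[i] for p in group_b) / len(group_b) for i in range(3)]
    return math.dist(ca, cb)


# One proton versus two equivalent protons (e.g. a methylene pair)
h = [(0.0, 0.0, 0.0)]
ch2 = [(3.0, 1.0, 0.0), (3.0, -1.0, 0.0)]
for f in (sum_average, r6_average, center_average):
    print(f.__name__, round(f(h, ch2), 3))
```

With two equidistant partners, sum averaging shortens the single-pair distance by a factor 2^(-1/6) ≈ 0.89.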

Violation analysis: recipe for modifying the restraint list
1. Swap methylene protons and methyl groups in the restraint list (after recalculation of all protons) if the NOE energy is lower in > 75 % of the models in the ensemble.
2. Deassign stereospecific atoms if violating in > 50 % of the models by more than 1 Å / more than 2 Å.
3. Remove violating restraints for recalculation? (not implemented for RECOORD)
4. Surplus check using Wattos (Doreleijers et al. 2003).
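Steps 1 and 2 of the recipe are fraction-of-ensemble tests. A minimal sketch, assuming per-model NOE energies for the original and the swapped assignment, and per-model violations (Å) for a stereospecific restraint, are already available; the function names are hypothetical, and the cutoff is a parameter because the slide lists both 1 Å and 2 Å.

```python
def should_swap(energy_original, energy_swapped, fraction=0.75):
    """Swap a stereo pair if the swapped assignment gives a lower NOE energy
    in more than `fraction` of the ensemble models."""
    if len(energy_original) != len(energy_swapped) or not energy_original:
        raise ValueError("need one energy per model for both assignments")
    lower = sum(1 for eo, es in zip(energy_original, energy_swapped) if es < eo)
    return lower / len(energy_original) > fraction


def should_deassign(violations, cutoff, fraction=0.5):
    """Deassign if the violation exceeds `cutoff` (Å) in more than
    `fraction` of the models."""
    if not violations:
        raise ValueError("need one violation value per model")
    n_bad = sum(1 for v in violations if v > cutoff)
    return n_bad / len(violations) > fraction
```

Note the strict inequalities: exactly 75 % of models with a lower energy does not trigger a swap.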

Building topology
Starting script: generate_easy.inp from CNS.
Automated detection in the original ensemble of:
- disulfide bridges (S-S distance < 3 Å in the first model of the original ensemble)
- cis peptides (|ω| < 25° in the first model of the original ensemble)
- protonation state of histidines (using the CNS patches HISD, HISE)
CYANA: sequence based on the CNS topology
- add CYSS, HIST, HIST+, cPRO to the sequence
- automated generation of disulfide restraints
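Both geometric checks above are simple to express. A minimal sketch, assuming coordinates are given as (x, y, z) tuples in Å and that ω is the CA(i)-C(i)-N(i+1)-CA(i+1) dihedral; the helper names are hypothetical. The dihedral uses the standard projection-and-atan2 formula, which only matters here through |ω|.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # components of b0 and b2 perpendicular to the central bond p1-p2
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def is_cis_peptide(ca_i, c_i, n_next, ca_next, cutoff_deg=25.0):
    """cis if |omega| = |CA(i)-C(i)-N(i+1)-CA(i+1)| is below the cutoff."""
    return abs(dihedral(ca_i, c_i, n_next, ca_next)) < cutoff_deg


def is_disulfide(sg_a, sg_b, cutoff=3.0):
    """Disulfide bridge if the SG-SG distance is below the cutoff (Å)."""
    return math.dist(sg_a, sg_b) < cutoff
```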

Calculation protocol: CNS
Calculate 200 structures with ARIA 1.2 (refine.inp):
- torsion angle MD at 10,000 K, 2000 steps
- torsion angle MD cooling phase from 10,000 K, 2000 steps
- two Cartesian space MD cooling phases, 8000 steps each
- 200 steps of restrained Powell minimization
Time step: 3 fs for Cartesian space dynamics (CSD); 3 × 8 = 24 fs for torsion angle dynamics (TAD).
Standard water refinement (re_h20.inp) from ARIA for the 50 best-energy structures; CNS runs outside ARIA.
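The step counts and time steps above fix the simulated time of each stage. A small arithmetic sketch, assuming the torsion angle stages use the 24 fs TAD time step and the Cartesian stages the 3 fs CSD time step; the stage labels are hypothetical, and the minimization steps are left out because they have no time step.

```python
# (stage label, dynamics type, number of steps); labels are illustrative
STAGES = [
    ("TAD high temperature", "tad", 2000),
    ("TAD cooling", "tad", 2000),
    ("Cartesian cooling 1", "cartesian", 8000),
    ("Cartesian cooling 2", "cartesian", 8000),
]
TIMESTEP_FS = {"cartesian": 3.0, "tad": 3.0 * 8}


def simulated_time_ps(stages):
    """Simulated time per stage in picoseconds (steps x time step)."""
    return {name: steps * TIMESTEP_FS[kind] / 1000.0 for name, kind, steps in stages}


times = simulated_time_ps(STAGES)
for name, ps in times.items():
    print(f"{name}: {ps:.0f} ps")
print(f"total: {sum(times.values()):.0f} ps")  # 144 ps
```

Each TAD stage covers 48 ps and each Cartesian stage 24 ps, so the larger TAD time step buys twice the sampling time with a quarter of the steps.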

Calculation protocol: CYANA
Simulated annealing protocol entirely in torsion angle dynamics (TAD):
- 2000 steps at 9600 K
- 8000 steps, cooling from 9600 K to 0 K
- 1000 steps of conjugate gradient minimization
The time step depends on the change of energy per time step.
Calculate 200 structures and choose the 50 best-energy structures.
No water refinement is available within CYANA.
Much faster than CNS.
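Both protocols compute 200 models and keep the 50 with the lowest energy. A minimal sketch of that selection, assuming one final energy (or target function value) per model is available; `select_best` is a hypothetical helper name.

```python
import heapq


def select_best(energies, n=50):
    """Indices of the n lowest-energy models, best first."""
    return heapq.nsmallest(n, range(len(energies)), key=energies.__getitem__)


# Usage: energies[i] is the final energy of model i
energies = [3.0, 1.0, 2.0, 0.5]
print(select_best(energies, n=2))  # [3, 1]
```

`heapq.nsmallest` avoids sorting all 200 values, although for ensembles this small a plain `sorted` would work equally well.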

CONDOR computer cluster, Computer Sciences Department, University of Wisconsin-Madison
- more than 800 processors used
- total CPU time: 31,169 hours (≈ 3.5 years on a single workstation)
Example 2EZM, calculation of one model (101 amino acids, 2.2 GHz P4 computer):
- CYANA: 31 seconds
- CNS: 340 seconds
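The figures above can be checked with a little arithmetic. This sketch converts the total CPU time to years and, as an illustration only, extrapolates the 2EZM per-model timings to a 200-model ensemble on one CPU (assuming models are computed one after another and ignoring water refinement).

```python
HOURS_PER_YEAR = 24 * 365


def cpu_years(cpu_hours):
    return cpu_hours / HOURS_PER_YEAR


def ensemble_hours(seconds_per_model, n_models=200):
    """Single-CPU time to compute one ensemble of n_models, in hours."""
    return seconds_per_model * n_models / 3600


print(f"total: {cpu_years(31_169):.2f} CPU years")  # 3.56
print(f"CNS/CYANA time ratio for 2EZM: {340 / 31:.1f}")  # 11.0
print(f"2EZM, 200 models: CYANA {ensemble_hours(31):.1f} h, "
      f"CNS {ensemble_hours(340):.1f} h")  # 1.7 h, 18.9 h
```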

Evaluation of structure quality
- agreement with experimental restraints
- improvement?
- comparison of CNS and CYANA
- relation between NMR data quality and structural quality

Distance restraint violations (histogram: frequency vs. RMS distance restraint violation, Å)
ORG: 0.08 Å (0.14 Å), original entries
CNW: 0.04 Å (0.05 Å), recalculated in CNS and refined in water
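An RMS distance restraint violation of the kind plotted above can be computed per model from the restraint bounds. A minimal sketch, assuming each restraint has a lower and an upper bound in Å and that satisfied restraints count as zero violations (other conventions average only over violated restraints); the function names are hypothetical.

```python
import math


def distance_violation(d, lower, upper):
    """Amount (Å) by which distance d lies outside [lower, upper]; 0 if inside."""
    if d > upper:
        return d - upper
    if d < lower:
        return lower - d
    return 0.0


def rms_distance_violation(distances, bounds):
    """RMS violation over all restraints of one model (zeros included)."""
    if len(distances) != len(bounds) or not distances:
        raise ValueError("need one distance per restraint")
    sq = [distance_violation(d, lo, up) ** 2 for d, (lo, up) in zip(distances, bounds)]
    return math.sqrt(sum(sq) / len(sq))
```

An ensemble value is then the mean of the per-model RMS values.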

Dihedral restraint violations (histogram: frequency vs. RMS dihedral restraint violation, degrees)
ORG: 1.6° (4.6°), original entries
CNW: 0.5° (0.5°), recalculated in CNS and refined in water
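Dihedral restraint violations differ from distance violations because angles wrap around at ±180°. A minimal sketch, assuming each restraint is a range lower ≤ angle ≤ upper in degrees with 0 ≤ upper - lower < 360 (ranges written across ±180 with lower > upper are not handled); names are hypothetical.

```python
import math


def dihedral_violation(angle, lower, upper):
    """Violation (degrees) of the restraint lower <= angle <= upper,
    taking the 360-degree periodicity into account."""
    center = (lower + upper) / 2.0
    half_width = (upper - lower) / 2.0
    # signed difference to the range center, wrapped into [-180, 180)
    delta = (angle - center + 180.0) % 360.0 - 180.0
    return max(0.0, abs(delta) - half_width)


def rms_dihedral_violation(angles, ranges):
    """RMS violation over all dihedral restraints of one model."""
    if len(angles) != len(ranges) or not angles:
        raise ValueError("need one angle per restraint")
    sq = [dihedral_violation(a, lo, hi) ** 2 for a, (lo, hi) in zip(angles, ranges)]
    return math.sqrt(sum(sq) / len(sq))


# 175 degrees is only 15 degrees away from -170 once wrapping is considered
print(dihedral_violation(175.0, -170.0, -150.0))  # 15.0
```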

Results: quality indicators, performance of CNS vs. CYANA (no water refinement yet)
Average values over 545 entries (columns: original PDB | CNS recalculation | CYANA recalculation)
RMS distance restraint violations (Å): 0.08 ± … | … ± … | … ± 0.05
RMS dihedral restraint violations (degrees): 1.6 ± … | … | … ± 0.7
Packing quality, WHATCHECK (Z-score): -3.5 ± … | … ± … | … ± 1.8
Bumps per 100 residues: 73 ± 63 | 11 ± 9 | 86 ± 37
% most favoured, PROCHECK: 69 ± 14 | 69 ± 13 | 61 ± 14

Results: quality indicators, performance of CNS before and after water refinement
Average values over 545 entries (columns: original PDB | CNS recalculation | CNS + water refinement)
RMS distance restraint violations (Å): 0.08 ± … | … ± … | … ± 0.05
RMS dihedral restraint violations (degrees): 1.6 ± … | … ± … | … ± 0.5
Packing quality, WHATCHECK (Z-score): -3.5 ± … | … ± … | … ± 2.0
Bumps per 100 residues: 73 ± 63 | 11 ± 9 | 10 ± 7
% most favoured, PROCHECK: 69 ± 14 | 69 ± 13 | 76 ± 11

Improvement: packing and Ramachandran Z-scores (panels: improvement in packing; improvement in Ramachandran)
Improvement in Z-score: ΔZ = Z_refined - Z_original
For ~5 % of the entries no improvement is possible because NMR data are missing compared with those available to the authors.
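The improvement ΔZ defined above is a per-entry difference. A minimal sketch, assuming paired Z-scores for the same entries before and after refinement, and that a higher Z-score means a more normal structure (as for the WHATCHECK quality Z-scores); the helper names are hypothetical.

```python
def z_improvements(z_original, z_refined):
    """Per-entry improvement dZ = Z_refined - Z_original (positive = better)."""
    if len(z_original) != len(z_refined):
        raise ValueError("need paired Z-scores")
    return [zr - zo for zo, zr in zip(z_original, z_refined)]


def fraction_not_improved(deltas, tolerance=0.0):
    """Fraction of entries whose Z-score did not increase by more than tolerance."""
    return sum(1 for d in deltas if d <= tolerance) / len(deltas)


deltas = z_improvements([-3.5, -2.0], [-1.0, -2.5])
print(deltas)  # [2.5, -0.5]
print(fraction_not_improved(deltas))  # 0.5
```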

In search of correlations (Pearson coefficient)
Correlation matrices of data density, RMS violations, circular variance, packing (Z-score), Ramachandran (Z-score) and bumps, for the original entries (correlations lower) and the refined entries (correlations higher).
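The correlation matrices above are built from Pearson coefficients between per-entry quality indicators. A minimal sketch, assuming each indicator is a list of per-entry values in the same entry order; `correlation_matrix` is a hypothetical helper. On Python 3.10 or later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict indicator name -> list of per-entry values."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}
```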

In search of correlations (bumps)
Correlation matrices of data density, RMS violations, circular variance, packing (Z-score), Ramachandran (Z-score) and bumps, for the original and the refined entries.

In search of correlations (NMR data density)
Correlation matrices of data density, RMS violations, circular variance, packing (Z-score), Ramachandran (Z-score) and bumps, for the original and the refined entries.

Correlation between NMR data density and Ramachandran Z-score: r = 0.31 (scatter plot of Ramachandran Z-score vs. NMR data density)

Correlation between NOE completeness and packing Z-score: r = 0.20 (scatter plot of packing Z-score vs. NOE completeness)
NMR data-based indicators cannot give any indication of the normality of the structures.

In search of correlations (precision)
Correlation matrices of data density, RMS violations, circular variance, packing (Z-score), Ramachandran (Z-score) and bumps, for the original and the refined entries.

Correlation between precision and data density: r = -0.46 (scatter plot of circular variance vs. NMR data density)
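Precision is expressed above as a circular variance. A minimal sketch of the standard definition, 1 - R with R the length of the mean unit vector of a set of angles; how RECOORD aggregated it is not stated on the slide, so the ensemble helper assumes (hypothetically) that it is taken per residue over one backbone dihedral across the models and then averaged over residues.

```python
import math


def circular_variance(angles_deg):
    """1 - R, with R the length of the mean unit vector of the angles.
    0 = all angles identical, 1 = no preferred direction."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return 1.0 - math.hypot(c, s)


def ensemble_circular_variance(angles_by_residue):
    """Average over residues; angles_by_residue maps residue -> angles across models."""
    values = [circular_variance(v) for v in angles_by_residue.values()]
    return sum(values) / len(values)
```

Unlike an arithmetic variance, this treats 350° and 10° as 20° apart rather than 340°.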

Correlation between precision and Ramachandran Z-score: r = -0.67 (scatter plot of circular variance vs. Ramachandran plot appearance Z-score; entry 1SUT highlighted)
A protein with a high Ramachandran normality will have a small circular variance.

Correlation between RMSD and structural uncertainty (QUEEN): r = -0.69 (scatter plot of backbone RMSD (Å) vs. structural uncertainty)
Structural uncertainty imposes a lower limit on the RMSD.
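Backbone RMSD between ensemble models is normally computed after optimal superposition. The slide does not say how RECOORD computed it; this is a minimal sketch using the Kabsch algorithm with NumPy (an assumption), plus a hypothetical mean pairwise RMSD as one common ensemble measure. Inputs are N×3 arrays of matched backbone atom coordinates.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched coordinate sets p and q (N x 3) after optimal
    translation and rotation (Kabsch algorithm, reflections excluded)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # correct for a possible reflection so that R is a proper rotation
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_pairwise_rmsd(models):
    """Average superposed RMSD over all pairs of models in an ensemble."""
    pairs = [(i, j) for i in range(len(models)) for j in range(i + 1, len(models))]
    return sum(kabsch_rmsd(models[i], models[j]) for i, j in pairs) / len(pairs)
```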

Conclusions I
- NMR-STAR files made consistent for 545 out of ~1700 entries
- protocols and scripts available for recalculation in CYANA and CNS
- validation database available for testing new protocols
Improvement compared with the original data:
- 1 standard deviation closer to the X-ray database
- violations in the original data do not limit the recalculation effort
- refinement in water required
- for 5 % no improvement: data missing

Conclusions II
- correlations are higher after recalculation and refinement, though most of them are still weak
- highest correlations: precision vs. Ramachandran score and vs. structural uncertainty (QUEEN)

Acknowledgements
Utrecht University: Alexandre Bonvin, Rob Kaptein
EBI Cambridge: Wim Vranken
CESG/BMRB: Jurgen Doreleijers, Zachary Miller, Eldon Ulrich, John Markley
Radboud University Nijmegen: Chris Spronk, Sander Nabuurs
RIKEN Japan: Peter Güntert
Institut Pasteur Paris: Michael Nilges