RECOORD REcalculated COORdinates Database Aart Nederveen Bijvoet Center for Biomolecular Research Utrecht University Jurgen Doreleijers.


1 RECOORD: REcalculated COORdinates Database
Aart Nederveen, Bijvoet Center for Biomolecular Research, Utrecht University, a.j.nederveen@chem.uu.nl
Jurgen Doreleijers, Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison, jurgen@bmrb.wisc.edu
Wim Vranken, Macromolecular Structure Database, European Bioinformatics Institute, wim@ebi.ac.uk

2 Aim
Recalculation of protein structures based on deposited NMR restraints using state-of-the-art methods.
Goals:
- decrease user- and software-dependent biases
- allow a better comparison between structures
- compare different structure calculation programs
- provide a database for the development and assessment of validation tools and calculation protocols

3 Overview of the recalculation project
[Flowchart with six numbered steps, grouped into the stages: design of RECOORD, restraint manipulation, recalculation, analysis]
- Steps 1 and 2: PDB (coordinates, restraints) and BMRB STAR files (Doreleijers et al. 2003)
- Step 3: EBI/UU, generation of consistent STAR files
- Steps 4 and 5: CYANA (sequence, MD SA, …) and CNS (topology, MD SA, refinement)
- Step 6: analysis (improvement? correlations? …)

4 Databases now publicly available
- DOCR/FRED (BMRB): databases containing converted and filtered restraints
  http://www.bmrb.wisc.edu/servlets/MRGridServlet
- RECOORD (EBI): database containing recalculated coordinates
  http://www.ebi.ac.uk/msd/recoord

5 Selection
[Overview steps 1 and 2: PDB coordinates and restraints; BMRB STAR files (Doreleijers et al. 2003)]
Formats (if distance restraints available):
- CNS/XPLOR
- DIANA/DYANA/CYANA
- DISCOVER/MSI
PDB entries selected:
- only proteins
- no HET atoms
- multimers allowed (not yet recalculated)
- at least 20 residues
Finally, 545 monomers were selected.

6 Conversion issues
[Overview step 3: EBI/UU, generation of consistent STAR files]
Data are converted to formats readable by the calculation software (e.g. XPLOR/CNS and CYANA) by the FormatConverter available within the CCPN software (Wim Vranken, EBI).
Problems:
- Differences between coordinate and restraint data, e.g.:
  - 1 chain in the PDB entry, 2 chains in the restraint list
  - residue numbering can differ between the PDB entry and the restraint list
  - restraints for residues not present in the PDB entry
- Nomenclature in the restraint list

7 Violation analysis
[Overview step 3: EBI/UU, generation of consistent STAR files]
What did the authors do?
- Pseudo atoms
- Stereospecific assignments
- Floating chirality
- Calculation of distance restraints: sum, r^-6, center, …
Our method: sum averaging as implemented in CNS/XPLOR
- Distances might be shorter than calculated by the authors
- OK if pseudo atom corrections are included
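To illustrate why sum averaging can give shorter distances than the authors used, here is a minimal sketch of the r^-6 sum-averaged effective distance, (Σ r_i^-6)^(-1/6), taken over all proton pairs that contribute to one ambiguous or pseudo-atom restraint. The function name and the example distances are illustrative assumptions, not taken from the RECOORD scripts.

```python
def sum_averaged_distance(distances):
    """Effective distance under r^-6 sum averaging: (sum of r_i^-6)^(-1/6).

    `distances` holds the individual proton-proton distances (in Å) that
    contribute to a single restraint. Hypothetical helper for illustration.
    """
    if not distances:
        raise ValueError("need at least one distance")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


# Illustrative values: two methylene protons at 3.0 Å and 4.0 Å from a partner.
effective = sum_averaged_distance([3.0, 4.0])
print(f"effective distance: {effective:.2f} Å")
```

With more than one contribution the effective distance is always below the shortest individual distance, which is why restraints derived with center averaging need a pseudo atom correction before they can be used with sum averaging.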

8 Violation analysis
[Overview step 3: EBI/UU, generation of consistent STAR files]
Recipe for modifying the restraint list:
1. Swap methylene protons and methyl groups (after recalculation of all protons) in the restraint list if the NOE energy is lower in > 75 % of the models in the ensemble
2. Deassign stereospecific atoms if violating in > 50 % of the models (thresholds listed: more than 1 Å, more than 2 Å)
3. Remove violating restraints for recalculation? (not implemented for RECOORD)
4. Surplus check using Wattos (Doreleijers et al. 2003)
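The two ensemble-based decisions in steps 1 and 2 can be sketched as simple fraction tests. This is a hedged illustration only: the function names, the per-model input lists and the default 1 Å cutoff are assumptions, and the actual implementation in the RECOORD/Wattos scripts may differ.

```python
def should_swap(energy_original, energy_swapped, min_fraction=0.75):
    """Step 1 sketch: swap if the swapped assignment has a lower NOE energy
    in more than `min_fraction` of the models (inputs: one value per model)."""
    if not energy_original or len(energy_original) != len(energy_swapped):
        raise ValueError("need two non-empty lists of equal length")
    lower = sum(1 for e0, e1 in zip(energy_original, energy_swapped) if e1 < e0)
    return lower / len(energy_original) > min_fraction


def should_deassign(violations, cutoff=1.0, min_fraction=0.5):
    """Step 2 sketch: deassign if the restraint is violated by more than
    `cutoff` Å in more than `min_fraction` of the models.
    The 1 Å default is an assumption; the slide also lists 2 Å."""
    if not violations:
        raise ValueError("need at least one model")
    violating = sum(1 for v in violations if v > cutoff)
    return violating / len(violations) > min_fraction


# Illustrative four-model ensemble (made-up numbers).
print(should_swap([10.0, 10.0, 10.0, 10.0], [5.0, 5.0, 5.0, 5.0]))
print(should_deassign([1.5, 1.2, 1.1, 0.0]))
```

Both tests use strict inequalities, so exactly 75 % (or exactly 50 %) of the models does not trigger a change.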

9 Building topology
[Overview steps 4 and 5: CYANA (sequence, MD SA, …) and CNS (topology, MD SA, refinement)]
Starting script: generate_easy.inp from CNS
Automated detection in the original ensemble of:
- Disulfide bridges (< 3 Å S-S distance in original first models)
- Cis peptides (if |ω| < 25° in original first models)
- Protonation state of histidines (use CNS patches HISD, HISE)
CYANA: sequence based on CNS topology
- Add CYSS, HIST, HIST+, cPRO in sequence
- Automated generation of disulfide restraints
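The two geometric detection criteria on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the input layout (a dict of cysteine SG coordinates keyed by residue number, and ω in degrees) and the function names are hypothetical and not taken from the RECOORD scripts.

```python
import math
from itertools import combinations


def find_disulfides(sg_coords, max_distance=3.0):
    """Return residue-number pairs whose SG atoms are closer than max_distance (Å).

    sg_coords: dict mapping residue number -> (x, y, z) of the Cys SG atom
    (hypothetical input format).
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(sg_coords.items()), 2):
        if math.dist(a, b) < max_distance:
            pairs.append((i, j))
    return pairs


def is_cis_peptide(omega_deg, max_abs_omega=25.0):
    """True if the peptide dihedral omega lies within max_abs_omega of 0 degrees."""
    wrapped = (omega_deg + 180.0) % 360.0 - 180.0  # map into [-180, 180)
    return abs(wrapped) < max_abs_omega


# Made-up coordinates: residues 5 and 30 are 2.05 Å apart, residue 55 is far away.
sg = {5: (0.0, 0.0, 0.0), 30: (2.05, 0.0, 0.0), 55: (10.0, 0.0, 0.0)}
print(find_disulfides(sg))
print(is_cis_peptide(10.0), is_cis_peptide(178.0))
```

Wrapping ω into [-180, 180) before taking the absolute value keeps the test correct for angles reported in the 0 to 360 degree convention.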

10 Calculation protocol CNS
[Overview steps 4 and 5: CYANA (sequence, MD SA, …) and CNS (topology, MD SA, refinement)]
- Calculate 200 structures with ARIA 1.2 refine.inp:
  - Torsion angle MD at 10,000 K, 2000 steps
  - Torsion angle MD cooling phase 10,000-50 K, 2000 steps
  - Cartesian space MD cooling phase 2000-1000 K, 8000 steps
  - Cartesian space MD cooling phase 1000-50 K, 8000 steps
  - 200 steps of restrained Powell minimization
  - timestep CSD: 3 fs; timestep TAD: 3 x 8 = 24 fs
- Standard water refinement (re_h20.inp) from ARIA for the 50 best energy structures
- CNS runs outside ARIA

11 Calculation protocol CYANA
[Overview steps 4 and 5: CYANA (sequence, MD SA, …) and CNS (topology, MD SA, refinement)]
- All torsion angle dynamics (TAD)
- MD simulated annealing protocol:
  - 2000 steps at 9600 K
  - 8000 steps from 9600 K to 0 K
  - 1000 steps of conjugate gradient minimization
- Timestep depends on the change of energy per timestep
- Calculate 200 structures, choose the 50 best energy structures
- No water refinement available within CYANA
- Much faster than CNS
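The "calculate 200, keep the 50 best" selection used in both protocols amounts to ranking the models by energy. A minimal sketch, assuming one scalar energy (or target function value) per model; the function name and the synthetic energies are illustrative only.

```python
def select_lowest_energy(energies, n_keep=50):
    """Return the indices of the n_keep lowest-energy models, best first."""
    ranked = sorted(range(len(energies)), key=lambda i: energies[i])
    return ranked[:n_keep]


# Synthetic energies for 200 models: a shuffled permutation of 0.0 .. 199.0.
energies = [float((i * 37) % 200) for i in range(200)]
kept = select_lowest_energy(energies)
print(len(kept), energies[kept[0]], energies[kept[-1]])
```

Returning indices rather than energies makes it easy to pick the corresponding coordinate files afterwards.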

12 CONDOR computer cluster, CS, University Madison
[Overview steps 4 and 5: CYANA (sequence, MD SA, …) and CNS (topology, MD SA, refinement)]
- More than 800 processors used
- Total CPU time: 31,169 hours (≈ 3.5 years on a single workstation)
- Example 2EZM, calculation of 1 model (101 a.a., 2.2 GHz P4 computer):
  - CYANA: 31 seconds
  - CNS: 340 seconds
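The figures on this slide can be checked with a few lines of arithmetic. The sketch below only restates the slide's numbers; a 365-day year is assumed for the conversion.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year

total_cpu_hours = 31_169
cpu_years = total_cpu_hours / HOURS_PER_YEAR

cyana_seconds = 31   # one 2EZM model with CYANA
cns_seconds = 340    # one 2EZM model with CNS
time_ratio = cns_seconds / cyana_seconds

print(f"total CPU time: {cpu_years:.2f} years on a single workstation")
print(f"CNS takes {time_ratio:.1f} times longer than CYANA per model")
```

The conversion gives roughly 3.56 years, in line with the ≈ 3.5 years quoted, and the per-model timings put CNS at about eleven times the CYANA run time for this example.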

13 Evaluation of structure quality
[Overview step 6: analysis (improvement? correlations? …)]
- Agreement with experimental restraints
- Improvement?
- Comparison CNS and CYANA
- Relation between NMR data quality and structural quality

14 Distance restraints violations
[Histogram: frequency vs. RMS distance restraints violations (Å)]
- ORG (original entries): 0.08 Å (0.14 Å)
- CNW (recalculated in CNS and refined in water): 0.04 Å (0.05 Å)
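As a reminder of what the RMS violation measures, here is a minimal sketch for a single model. It assumes each restraint has a lower and an upper bound and that the RMS is taken over all restraints, satisfied ones included (counted as zero); the slides do not state which convention was used, and the names and numbers are illustrative.

```python
import math


def restraint_violation(distance, lower, upper):
    """Amount (Å) by which a distance falls outside its [lower, upper] bounds."""
    if distance > upper:
        return distance - upper
    if distance < lower:
        return lower - distance
    return 0.0


def rms_violation(distances, bounds):
    """RMS violation over all restraints; satisfied restraints contribute zero."""
    if not distances or len(distances) != len(bounds):
        raise ValueError("need matching, non-empty distances and bounds")
    violations = [restraint_violation(d, lo, up)
                  for d, (lo, up) in zip(distances, bounds)]
    return math.sqrt(sum(v * v for v in violations) / len(violations))


# Made-up example: one satisfied restraint, one upper and one lower violation.
measured = [3.0, 5.5, 1.5]
limits = [(1.8, 4.0), (1.8, 5.0), (1.8, 6.0)]
print(f"RMS violation: {rms_violation(measured, limits):.3f} Å")
```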

15 Dihedral restraints violations
[Histogram: frequency vs. RMS dihedral restraints violations (degrees)]
- ORG (original entries): 1.6° (4.6°)
- CNW (recalculated in CNS and refined in water): 0.5° (0.5°)

16 Results: quality indicators, performance CNS vs. CYANA (no water refinement yet)
[Overview step 6: analysis (improvement? correlations? …)]
Average value over 545 entries
Indicator | Original PDB | CNS recalculation | CYANA recalculation
RMS distance restraints violations (Å) | 0.08 ± 0.14 | 0.04 ± 0.06 | 0.04 ± 0.05
RMS dihedral restraints violations (degrees) | 1.6 ± 4.6 | 0.5 ± 0.7 | (no value)
Packing quality (Z-score), WHATCHECK | -3.5 ± 1.9 | -4.1 ± 1.9 | -4.3 ± 1.8
Bumps per 100 residues | 73 ± 63 | 11 ± 9 | 86 ± 37
% most favoured, PROCHECK | 69 ± 14 | 69 ± 13 | 61 ± 14

17 Results: quality indicators, performance CNS before and after water refinement
[Overview step 6: analysis (improvement? correlations? …)]
Average value over 545 entries
Indicator | Original PDB | CNS recalculation | CNS + water refinement
RMS distance restraints violations (Å) | 0.08 ± 0.14 | 0.04 ± 0.06 | 0.04 ± 0.05
RMS dihedral restraints violations (degrees) | 1.6 ± 4.6 | 0.5 ± 0.7 | 0.5 ± 0.5
Packing quality (Z-score), WHATCHECK | -3.5 ± 1.9 | -4.1 ± 1.9 | -2.5 ± 2.0
Bumps per 100 residues | 73 ± 63 | 11 ± 9 | 10 ± 7
% most favoured, PROCHECK | 69 ± 14 | 69 ± 13 | 76 ± 11

18 Improvement: packing and Ramachandran Z-scores
[Plots: improvement in packing Z-score; improvement in Ramachandran Z-score; entries with missing data marked]
Improvement in Z-score: ΔZ = Z(refined) - Z(original)
For ~5 % of entries no improvement is possible because NMR data are missing compared to the data available to the authors.

19 In search of correlations (Pearson coefficient)
[Overview step 6: analysis (improvement? correlations? …)]
Matrix labels: original (correlations lower), refined (correlations higher). The values above the diagonal match the r values quoted on slides 22, 25 and 26.
 | data density | RMS violations | circular variance | packing (Z-score) | Ramachandran (Z-score) | bumps
data density | - | -0.23 | -0.46 | 0.35 | 0.31 | -0.03
RMS violations | -0.11 | - | 0.22 | -0.25 | -0.37 | 0.58
circular variance | -0.32 | 0.00 | - | -0.60 | -0.67 | 0.25
packing (Z-score) | 0.32 | -0.06 | -0.49 | - | 0.69 | -0.39
Ramachandran (Z-score) | 0.16 | -0.11 | -0.48 | 0.48 | - | -0.51
bumps | 0.04 | (no value) | 0.07 | -0.21 | -0.47 | -
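For reference, the Pearson coefficient reported in this matrix can be computed as below. This is a generic textbook sketch, not the analysis code used for RECOORD; the per-entry indicator values in the example are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up indicator values for four entries.
r = pearson_r([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(f"r = {r:.2f}")
```

In the setting of this slide, each pair of sequences would hold one quality indicator per database entry, for example data density and circular variance.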

20 In search of correlations (Bumps)
[Correlation matrix repeated from slide 19, labelled original / refined]

21 In search of correlations (NMR data density)
[Correlation matrix repeated from slide 19, labelled original / refined]

22 Correlation: NMR data density and Ramachandran Z-score
[Scatter plot: Ramachandran Z-score vs. NMR data density; r = 0.31]

23 Correlation: NOE completeness and packing Z-score
[Scatter plot: packing Z-score vs. NOE completeness; r = 0.20]
NMR data-based indicators cannot yield any indication of the normality of the structures.

24 In search of correlations (Precision)
[Correlation matrix repeated from slide 19, labelled original / refined]

25 Correlation between precision and data density
[Scatter plot: circular variance vs. NMR data density; r = -0.46]
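Circular variance is the precision measure plotted here. A minimal sketch of the standard definition, 1 - R/N with R the length of the summed unit vectors of N angles, is given below. Which dihedral angles were used and how values were averaged per entry is not stated on the slides, so the input here is simply one angle observed across the models of an ensemble.

```python
import math


def circular_variance(angles_deg):
    """Circular variance of a set of angles (degrees): 0 = identical, 1 = fully spread."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("need at least one angle")
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    return 1.0 - math.hypot(cos_sum, sin_sum) / n


# Made-up example: a well-defined angle and a poorly defined one across models.
print(f"{circular_variance([-60.0, -62.0, -58.0, -61.0]):.4f}")
print(f"{circular_variance([0.0, 90.0, 180.0, 270.0]):.4f}")
```

Unlike an ordinary variance, this measure is unaffected by the 360 degree periodicity, so angles near -180 and +180 degrees count as close together.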

26 Correlation between precision and Ramachandran
[Scatter plot: circular variance vs. Ramachandran plot appearance (Z-score); r = -0.67; entry 1SUT labelled]
Proteins with high Ramachandran normality tend to have a small circular variance.

27 Correlation between RMSD and structural uncertainty (QUEEN)
[Scatter plot: backbone RMSD (Å) vs. structural uncertainty; r = -0.69]
Structural uncertainty imposes a lower limit on the RMSD.

28 Conclusions I
- NMR-STAR files made consistent for 545 out of approximately 1700 entries
- Protocols and scripts available for recalculation in CYANA and CNS
- Validation database available for testing of new protocols
- Improvement compared to original data:
  - 1 standard deviation closer to X-ray db
  - violations in original data do not limit the recalculation effort
  - refinement in water required
  - 5 % no improvement: data missing

29 Conclusions II
- Correlations are higher after recalculation and refinement, though most of them are still weak
- Highest correlation: precision vs. Ramachandran score and structural uncertainty (QUEEN)

30 Acknowledgements
- Utrecht University: Alexandre Bonvin, Rob Kaptein
- EBI Cambridge: Wim Vranken
- CESG/BMRB: Jurgen Doreleijers, Zachary Miller, Eldon Ulrich, John Markley
- Radboud University Nijmegen: Chris Spronk, Sander Nabuurs
- RIKEN Japan: Peter Güntert
- Institut Pasteur Paris: Michael Nilges

