Seminar series 2: Protein structure validation
In the past lies the present; in the now, what is to be. The past: Linus Pauling, ‘inventor’ of the helix and the strand. Inventor of bioinformatics?! He worked on proteins.
The history of bioinformatics is proteins. The future of bioinformatics is proteins. Only the present is a bit confused…
Structure validation: everything that can go wrong will go wrong, especially with things as complicated as protein structures.
What is real?
ATOM 1 N LEU
ATOM 2 CA LEU
ATOM 3 C LEU
ATOM 4 O LEU
ATOM 5 CB LEU
ATOM 6 CG LEU
ATOM 7 CD1 LEU
ATOM 8 CD2 LEU
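A PDB entry is, in the end, just fixed-width text. The slide shows no coordinates, so the sample record below is made up. It is built with explicit field widths so that the columns line up. The parser is a minimal sketch that assumes the standard PDB column layout for ATOM/HETATM records.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column PDB ATOM/HETATM record (Python slices are 0-based)."""
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        b_factor=float(line[60:66]),
    )


# Made-up CA record, formatted with explicit widths so every field lands in its column.
example = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
    "ATOM", 2, " CA", "", "LEU", "A", 1, "", 11.104, 6.134, -6.504, 1.00, 20.00
)
atom = parse_atom_line(example)
print(atom.name, atom.res_name, atom.x, atom.b_factor)
```

The point of the slide survives in the code. Nothing in the record says whether the coordinates are right; the file only states where the depositors put each atom.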
X-ray
[Figure: ‘FFT-inv’ (inverse Fourier transform)]
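This toy one-dimensional sketch makes the ‘FFT-inv’ step concrete. It is an illustration built on assumptions, not crystallographic software. A made-up density is transformed into complex ‘structure factors’ and then transformed back. The code uses a naive discrete Fourier transform; an FFT computes the same result faster. The sign convention in the exponent is an arbitrary choice here. The sketch also shows why phases matter: transforming back with the amplitudes alone does not return the original density.

```python
import cmath


def dft(values):
    """Forward discrete Fourier transform: density samples -> complex structure factors."""
    n = len(values)
    return [sum(values[x] * cmath.exp(-2j * cmath.pi * h * x / n) for x in range(n))
            for h in range(n)]


def inverse_dft(coeffs):
    """Inverse transform ('FFT-inv'): structure factors -> density samples."""
    n = len(coeffs)
    return [sum(coeffs[h] * cmath.exp(2j * cmath.pi * h * x / n) for h in range(n)) / n
            for x in range(n)]


# Made-up 1D "electron density" sampled on 8 grid points.
density = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
factors = dft(density)

# With amplitudes and phases, the inverse transform gives the density back.
recovered = [round(v.real, 6) for v in inverse_dft(factors)]

# The experiment measures only amplitudes |F|; dropping the phases loses the density.
amplitudes_only = [complex(abs(f), 0.0) for f in factors]
no_phases = [round(v.real, 6) for v in inverse_dft(amplitudes_only)]

print(recovered)
print(no_phases)
```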
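X-ray R-factor: Error = $\sum w\,(F_{\mathrm{obs}} - F_{\mathrm{calc}})^2$ and R-factor = $\sum |F_{\mathrm{obs}} - F_{\mathrm{calc}}| \,/\, \sum F_{\mathrm{obs}}$, summed over all reflections, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure-factor amplitudes and $w$ are weights.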
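Both quantities can be computed in a few lines. This is a minimal sketch that assumes plain lists of observed and calculated amplitudes; the four reflections and the unit weights are made up.

```python
def refinement_error(f_obs, f_calc, weights):
    """Weighted sum of squared differences between observed and calculated amplitudes."""
    return sum(w * (o - c) ** 2 for o, c, w in zip(f_obs, f_calc, weights))


def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum |Fobs - Fcalc| divided by sum |Fobs|."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(abs(o) for o in f_obs)


# Made-up amplitudes for four reflections; unit weights.
f_obs = [100.0, 50.0, 25.0, 80.0]
f_calc = [90.0, 55.0, 20.0, 85.0]
weights = [1.0, 1.0, 1.0, 1.0]

error = refinement_error(f_obs, f_calc, weights)
r = r_factor(f_obs, f_calc)
print(f"error = {error:.1f}, R = {r:.3f}")
```

Dividing by $\sum F_{\mathrm{obs}}$ makes R dimensionless. That lets R values from different data sets be compared, which the raw error sum does not allow.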
X-ray resolution
NMR data collection
NMR data: NMR data consist mainly of short distances between atoms, which we call NOEs. Most NOEs are between atoms that are close neighbours in the sequence, and those hold little information. The ‘good’ NOEs are between atoms far apart in the sequence; normally there are few of those. NOEs are known only with low precision. E.g. NOEs are binned , , and
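The difference between ‘uninformative’ and ‘good’ NOEs comes down to sequence separation. This minimal sketch assumes commonly used cut-offs, and those boundaries should be treated as an assumption:

- |i − j| = 1 is sequential.
- 2 to 4 is medium-range.
- 5 or more is long-range.

The NOE list in the code is made up.

```python
from collections import Counter


def noe_class(res_i: int, res_j: int) -> str:
    """Classify an NOE by the sequence separation of the two residues involved."""
    sep = abs(res_i - res_j)
    if sep == 0:
        return "intraresidue"
    if sep == 1:
        return "sequential"
    if sep <= 4:
        return "medium-range"
    return "long-range"


# Made-up NOE list: (residue number of atom 1, residue number of atom 2).
noes = [(10, 10), (10, 11), (12, 15), (3, 40), (5, 6), (7, 52)]
counts = Counter(noe_class(i, j) for i, j in noes)
print(counts)
```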
NMR Q-factor: Error = $\sum (\text{NOE violation})^2 + E_{\text{energy}}$
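This target can be sketched under a few stated assumptions:

- Each NOE is a model distance with lower and upper bounds.
- There is no penalty inside the bounds.
- The energy term is a single number supplied by the caller.

The distances, bounds and energy value below are made up. The two terms are simply added, with no relative weighting.

```python
def noe_violation(distance: float, lower: float, upper: float) -> float:
    """How far a model distance lies outside its NOE bounds (0 when inside)."""
    if distance < lower:
        return lower - distance
    if distance > upper:
        return distance - upper
    return 0.0


def nmr_error(distances, bounds, energy_term: float) -> float:
    """Sum of squared NOE violations plus an energy term (no relative weighting)."""
    restraint_part = sum(noe_violation(d, lo, hi) ** 2 for d, (lo, hi) in zip(distances, bounds))
    return restraint_part + energy_term


# Made-up model distances (Å), bounds (Å) and energy value.
distances = [2.5, 4.0, 6.2]
bounds = [(0.0, 3.0), (0.0, 3.5), (0.0, 5.0)]
total = nmr_error(distances, bounds, energy_term=10.0)
print(round(total, 2))
```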
NMR versus X-ray
                     NMR          X-ray
‘Error’              1-2 Å        Å
Mobility             yes          not really
Crystal artefacts    no           yes
Material needed      20 mg        1 mg
Cost of hardware     4 M Euro     near infinite (share)
Drug design          no           almost
Better to combine them and use the best of both worlds.
Why? Why does a sane (?) human being spend fourteen years searching for millions of errors in the PDB?
Because everything we know about proteins comes from PDB files. If a template is wrong, the model will be wrong. Errors become less dangerous when you know about them.
What do we check? Administrative errors. Crystal-specific errors. NMR-specific errors. Really wrong things. Improbable things. Things worth looking at. Ad hoc things.
1FCC
Smile or cry? A: 5RXN, 1.2 Å; B: 7GPB, 2.9 Å; C: 1DLP, 3.3 Å; D: 1BIW, 2.5 Å
X-ray specific
Further…
4  The SCALE matrix gives a left-handed axis system
26  Scale matrix represents wrong crystal class
4  Negated value in scale matrix
11  Value in first row of scale matrix mistyped
10  Value in second row of scale matrix mistyped
6  Value in third row of scale matrix mistyped
88  Determinant of MTRIX is incorrect
195  Warning: New symmetry found
62  Warning: MTRIX is not a pure rotation matrix
165  Warning: Duplicate atoms encountered
57  Error: Threonine nomenclature problem
324  Error: Weights outside the range
709  Error: Weights outside the range
520  Error: Decreasing residue numbers
362  Error: Water clusters without contacts
10973  Warning: Water molecules need moving
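Several of the matrix checks above are plain linear algebra. This minimal sketch assumes MTRIX and SCALE matrices are given as 3×3 lists of rows, with translation parts ignored, and it uses an arbitrary tolerance. A pure rotation has orthonormal rows and determinant +1. By the reasoning assumed here, a SCALE matrix with a negative determinant points to a left-handed axis system. All matrices in the example are made up.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_pure_rotation(m, tol=1e-3):
    """True if the rows are orthonormal (M times its transpose is I) and det(M) is +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(m[i][k] * m[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return abs(det3(m) - 1.0) <= tol


def is_left_handed(scale):
    """A negative determinant is taken to mean a left-handed axis system."""
    return det3(scale) < 0.0


rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
mirror = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]      # det = -1
mistyped = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]     # one value typed wrong
scale_bad = [[1 / 50, 0.0, 0.0], [0.0, 1 / 60, 0.0], [0.0, 0.0, -1 / 70]]

print(is_pure_rotation(rot_z_90), is_pure_rotation(mirror), is_pure_rotation(mistyped))
print(is_left_handed(scale_bad))
```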
Further, further…
1599  Error: B-factor over-refinement
901  Error: Atoms too close to symmetry axes
21090  Error: Abnormally short interatomic distances
169  Note: No Van der Waals overlaps
9100  Warning: Unusual bond lengths
8214  Warning: Possible cell scaling problem
18458  Warning: Unusual bond angles
2515  Error: Ramachandran Z-score very low
15408  Warning: Omega angles too tightly restrained
4987  Error: Side chain planarity problems
780  Warning: Inside/Outside residue distribution
12684  Warning: Backbone oxygen evaluation
18612  Error: HIS, ASN, GLN side chain flips
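‘Omega angles too tightly restrained’ is a statement about the spread of the peptide torsion around 180°. This minimal sketch illustrates the idea; it is not the check as implemented in any particular program. It assumes the following:

- Coordinates are (x, y, z) tuples.
- Omega is taken over CA(i), C(i), N(i+1), CA(i+1).
- The spread threshold is hypothetical.

```python
import math
from statistics import pstdev


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) defined by four points."""
    b0 = [a - b for a, b in zip(p0, p1)]
    b1 = [a - b for a, b in zip(p2, p1)]
    b2 = [a - b for a, b in zip(p3, p2)]
    norm = math.sqrt(sum(c * c for c in b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    d0 = sum(a * b for a, b in zip(b0, b1))
    d2 = sum(a * b for a, b in zip(b2, b1))
    v = [a - d0 * b for a, b in zip(b0, b1)]
    w = [a - d2 * b for a, b in zip(b2, b1)]
    x = sum(a * b for a, b in zip(v, w))
    cross = [b1[1] * v[2] - b1[2] * v[1],
             b1[2] * v[0] - b1[0] * v[2],
             b1[0] * v[1] - b1[1] * v[0]]
    y = sum(a * b for a, b in zip(cross, w))
    return math.degrees(math.atan2(y, x))


def omega_spread(omegas):
    """Population standard deviation of omega around trans (180 degrees), with wrap-around."""
    deviations = [(o % 360.0) - 180.0 for o in omegas]
    return pstdev(deviations)


def omega_too_tight(omegas, min_spread=4.0):
    """Flag a suspiciously narrow omega distribution (min_spread is a hypothetical threshold)."""
    return omega_spread(omegas) < min_spread


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))   # trans geometry
print(omega_too_tight([179.5, -179.8, 180.0, 179.9, -179.9]))
print(omega_too_tight([170.0, -170.0, 175.0, -175.0]))
```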
Little things hurt big
How bad is bad?
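One way to put ‘how bad’ on a sliding scale is a Z-score: the number of standard deviations a value lies from its expected value. This minimal sketch for bond lengths assumes a small table of target values; the numbers are illustrative assumptions, not an authoritative dictionary.

```python
import math

# Illustrative target bond lengths and standard deviations in Å (assumed values).
TARGETS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def bond_z(bond, length):
    """Z-score: deviation from the target length in units of its standard deviation."""
    mean, sd = TARGETS[bond]
    return (length - mean) / sd


def rms_z(z_scores):
    """Root-mean-square Z-score over a set of bonds."""
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))


# Made-up observed bond lengths for one residue.
observed = [(("N", "CA"), 1.470), (("CA", "C"), 1.500), (("C", "O"), 1.231)]
zs = [bond_z(bond, length) for bond, length in observed]
print([round(z, 2) for z in zs], round(rms_z(zs), 2))
```

If the target standard deviations describe real variation, a correct structure should give an RMS Z-score near 1. A much larger value suggests errors, and a much smaller one suggests over-restrained geometry.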
Errors or discoveries? Buried histidine. A warning about a buried histidine triggered biochemical follow-up work, which revealed a new mechanism for the KH module of Vigilin (A. Pastore, 1VIG).
Contact Probability
DACA
Contact probability box
Using contact probability
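The slides only name the idea, so what follows is a generic, heavily simplified illustration, not the DACA method itself. The sketch works in three steps:

1. Count how often each atom type occurs in each small box around a reference fragment in trusted structures.
2. Turn those counts into probabilities.
3. Score a new structure by how probable its contacts are.

The local frame, box size, normalisation and floor value are all assumptions, and the ‘database’ is made up.

```python
import math
from collections import Counter


def box_index(position, box_size=1.0):
    """Map a position (in a fragment-fixed local frame) onto an integer grid box."""
    return tuple(math.floor(c / box_size) for c in position)


def train(contacts, box_size=1.0):
    """Turn (atom_type, position) contacts from trusted structures into box probabilities."""
    counts = Counter((atom_type, box_index(pos, box_size)) for atom_type, pos in contacts)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def contact_score(contacts, probabilities, box_size=1.0, floor=1e-4):
    """Mean log-probability of a model's contacts; unseen boxes get a small floor value."""
    logs = [math.log(probabilities.get((t, box_index(p, box_size)), floor)) for t, p in contacts]
    return sum(logs) / len(logs)


# Made-up "database": oxygens usually sit in one box, nitrogens in another.
database = [("O", (0.5, 0.2, 0.3))] * 8 + [("N", (2.5, 0.1, 0.1))] * 2
probs = train(database)

usual = [("O", (0.4, 0.6, 0.1))]
unusual = [("O", (3.2, -1.5, 0.0))]
print(contact_score(usual, probs), contact_score(unusual, probs))
```

A low score does not prove an error. It marks an atom environment that rarely occurs in trusted structures and is therefore worth a look.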
His, Asn, Gln ‘flips’
Where are the protons?
Hydrogen bond network
15% of His, Asn and Gln side chains should be flipped
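X-ray data hardly distinguish N from O in these amide groups, or N from C in the His ring, so the orientation is usually chosen from the hydrogen-bond network. This minimal sketch rests on three assumptions:

- We already know which neighbouring atoms donate or accept hydrogen bonds.
- A made-up 3.5 Å cut-off decides whether two atoms are close enough.
- A simple rule applies: the orientation with more hydrogen bonds wins.

For His, the donor and acceptor sets also depend on where the protons are, and the caller must supply them.

```python
import math

# Atom pairs whose coordinates swap when the terminal group is rotated by 180 degrees.
FLIP_SWAPS = {
    "ASN": [("OD1", "ND2")],
    "GLN": [("OE1", "NE2")],
    "HIS": [("ND1", "CD2"), ("CE1", "NE2")],
}


def flipped(res_name, atoms):
    """Return a copy of {atom_name: (x, y, z)} with the flip applied."""
    new = dict(atoms)
    for a, b in FLIP_SWAPS[res_name]:
        new[a], new[b] = atoms[b], atoms[a]
    return new


def hbond_count(atoms, donors, acceptors, partners, cutoff=3.5):
    """Count donor-acceptor pairs within cutoff; partners are (kind, xyz) tuples."""
    count = 0
    for name, xyz in atoms.items():
        for kind, pos in partners:
            if math.dist(xyz, pos) > cutoff:
                continue
            if (name in donors and kind == "acceptor") or (name in acceptors and kind == "donor"):
                count += 1
    return count


def should_flip(res_name, atoms, donors, acceptors, partners):
    """Flip only if the flipped orientation makes more hydrogen bonds."""
    before = hbond_count(atoms, donors, acceptors, partners)
    after = hbond_count(flipped(res_name, atoms), donors, acceptors, partners)
    return after > before


# Made-up Asn amide group: OD1 accepts, ND2 donates.
asn = {"OD1": (0.0, 0.0, 0.0), "ND2": (2.2, 0.0, 0.0)}
near_acceptor = [("acceptor", (-2.9, 0.0, 0.0))]   # e.g. a backbone oxygen next to OD1
near_donor = [("donor", (-2.9, 0.0, 0.0))]         # e.g. a backbone NH next to OD1
print(should_flip("ASN", asn, {"ND2"}, {"OD1"}, near_acceptor))
print(should_flip("ASN", asn, {"ND2"}, {"OD1"}, near_donor))
```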
Your best check:
How difficult can it be? 1CBQ, 2.2 Å
How difficult can it be?
Progress: A, chirality; B, bond length; C, planarity; D, bond angle
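A chirality check can be reduced to the sign of a triple product. This minimal sketch assumes the atom order N, C, CB around CA. With that order, our reading of the ‘CORN’ rule for L-amino acids gives a positive volume; check the sign against a known L residue before relying on it. The coordinates are idealised bond directions made up for the example, not real bond lengths.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ca_chiral_volume(n, ca, c, cb):
    """Signed volume spanned by the N, C and CB bond vectors around CA."""
    return dot(sub(n, ca), cross(sub(c, ca), sub(cb, ca)))


def ca_handedness(n, ca, c, cb):
    """'L' for a positive chiral volume, 'D' otherwise (sign convention assumed)."""
    return "L" if ca_chiral_volume(n, ca, c, cb) > 0 else "D"


def mirror(p):
    """Reflect a point through the yz-plane (x -> -x)."""
    return (-p[0], p[1], p[2])


# Idealised bond directions around CA for an L residue (made up).
ca = (0.0, 0.0, 0.0)
n = (-0.866, -0.5, -0.33)
c = (0.0, 1.0, -0.33)
cb = (0.866, -0.5, -0.33)
print(ca_handedness(n, ca, c, cb))
print(ca_handedness(mirror(n), mirror(ca), mirror(c), mirror(cb)))   # mirror image
```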
Progress: E, water island; F, bond angle; G, atom on axis; H, chain name
Progress: chi-1 versus chi-2 and Ramachandran plots for structures at 1.8–2.0 Å
Conclusions: Everything that could go wrong has gone wrong. Errors lie on a ‘sliding scale’. Error detection finds a lot, but surely not everything (yet).
Acknowledgements: Rob Hooft, Elmar Krieger, Sander Nabuurs, Chris Spronk, Robbie Joosten, Maarten Hekkelman