Seminar series 2 Protein structure validation
Structure validation Everything that can go wrong, will go wrong, especially with things as complicated as protein structures.
What is real?
ATOM 1 N LEU
ATOM 2 CA LEU
ATOM 3 C LEU
ATOM 4 O LEU
ATOM 5 CB LEU
ATOM 6 CG LEU
ATOM 7 CD1 LEU
ATOM 8 CD2 LEU
X-ray
‘FFT-inv’ And now move the atoms around until the calculated reflections best match the observed ones.
X-ray refinement / multiple minima
X-ray R-factor
Error = Σ w·(obs − calc)²
R-factor = Σ w·|obs − calc|
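A minimal Python sketch of the two sums on this slide. The function names and the sample numbers are invented for illustration. The slide gives the R-factor without a denominator; the third function shows the conventional crystallographic form, Σ||Fobs| − |Fcalc|| / Σ|Fobs|, for comparison.

```python
def squared_error(obs, calc, weights=None):
    """Least-squares refinement target: sum of w * (obs - calc)^2."""
    if weights is None:
        weights = [1.0] * len(obs)
    return sum(w * (o - c) ** 2 for w, o, c in zip(weights, obs, calc))


def r_factor_slide(obs, calc, weights=None):
    """R-factor as written on the slide: sum of w * |obs - calc|."""
    if weights is None:
        weights = [1.0] * len(obs)
    return sum(w * abs(o - c) for w, o, c in zip(weights, obs, calc))


def r_factor_normalised(obs, calc):
    """Conventional R-factor: sum of ||Fobs| - |Fcalc|| divided by sum of |Fobs|."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(obs, calc))
    denominator = sum(abs(o) for o in obs)
    return numerator / denominator


# Hypothetical structure-factor amplitudes, for illustration only.
obs = [10.0, 20.0, 30.0]
calc = [12.0, 18.0, 30.0]

print(squared_error(obs, calc))
print(r_factor_slide(obs, calc))
print(r_factor_normalised(obs, calc))
```

Refinement minimises the squared error; the R-factor is the number that is usually reported afterwards to say how well the model fits the data.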
X-ray resolution
NMR data collection
From NMR spectra to structure If proton A and proton B are close in space, so-called ‘cross peaks’ appear in a spectrum due to the Nuclear Overhauser Effect (NOE). The NOE depends on the distance between proton A and proton B.
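How a cross-peak intensity becomes a distance can be sketched as follows, assuming the usual approximation that the NOE intensity falls off with the sixth power of the distance (I ∝ r⁻⁶) and that one proton pair of known distance is available as a reference. The function name, the reference distance and the intensities are hypothetical.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a proton-proton distance from an NOE cross-peak intensity.

    Assumes I is proportional to r**-6, so r = r_ref * (I_ref / I) ** (1/6).
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical reference: a proton pair at a known distance of 1.8 Å
# giving a cross peak of intensity 100 (arbitrary units).
REF_DISTANCE = 1.8
REF_INTENSITY = 100.0

for peak in (100.0, 10.0, 1.0):
    print(peak, round(noe_distance(peak, REF_INTENSITY, REF_DISTANCE), 2))
```

Because of the sixth root, a peak that is 64 times weaker corresponds to a distance that is only twice as long, which is why NOEs are only seen between protons that are close in space.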
From NMR spectra to structure The NMR data thus contains distance information! Most NOEs are between close neighbours in the sequence. Those hold little information. The ‘good’ NOEs are between atoms far away in the sequence. There are few of those, normally.
H1   H2   Distance (Å)
A    B    3
A    C    4
B    D    2
C    D
From NMR spectra to structure The list of distances can be used in a computer simulation that is reminiscent of protein folding.
From NMR spectra to structure ... until the protein is ‘folded’.
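A toy version of such a restraint-driven simulation, as a sketch only: four points start at random positions and are pushed around until their distances match a list of target distances. Real structure calculation programs use molecular dynamics with a full force field; the plain gradient descent, the step size, the number of steps and all names here are assumptions made for illustration. The three target distances are the complete rows of the distance table shown earlier.

```python
import math
import random

# Target distances in Å between protons (three complete rows of the table).
RESTRAINTS = [("A", "B", 3.0), ("A", "C", 4.0), ("B", "D", 2.0)]


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def restraint_error(coords, restraints):
    """Sum of squared deviations from the target distances."""
    return sum((distance(coords[i], coords[j]) - target) ** 2
               for i, j, target in restraints)


def fold(restraints, steps=2000, rate=0.05, seed=0):
    """Move randomly placed points until the restraints are satisfied."""
    rng = random.Random(seed)
    names = sorted({n for i, j, _ in restraints for n in (i, j)})
    coords = {n: [rng.uniform(-5.0, 5.0) for _ in range(3)] for n in names}
    for _ in range(steps):
        for i, j, target in restraints:
            d = distance(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined; skip this pair for now
            # Gradient of (d - target)^2 with respect to point i is
            # 2 * (d - target) * (p_i - p_j) / d; point j gets the opposite.
            scale = rate * 2.0 * (d - target) / d
            for k in range(3):
                delta = scale * (coords[i][k] - coords[j][k])
                coords[i][k] -= delta
                coords[j][k] += delta
    return coords


coords = fold(RESTRAINTS)
for i, j, target in RESTRAINTS:
    print(i, j, target, round(distance(coords[i], coords[j]), 3))
```

Starting from a different random seed gives a different arrangement that satisfies the same distances, which is the reason an NMR structure is reported as an ensemble.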
From NMR spectra to structure NMR ‘ensemble’
Multiple minima NMR refinement
Green lines: distance OK. Red lines: protons too far away from each other.
NMR Q-factor
Error = Σ (NOE violations)² + Energy term
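A small sketch of how such an error term could be evaluated. It assumes that each NOE gives an upper bound on a proton-proton distance, that a violation is the amount by which the model exceeds that bound, and that violations are squared before summing. The function names, distances and bounds are hypothetical.

```python
def noe_violations(model_distances, upper_bounds):
    """Per-restraint violation in Å: how far the model exceeds the NOE bound."""
    return {pair: max(0.0, model_distances[pair] - bound)
            for pair, bound in upper_bounds.items()}


def nmr_error(model_distances, upper_bounds, energy_term=0.0):
    """Sum of squared NOE violations plus an energy term supplied by the caller."""
    violations = noe_violations(model_distances, upper_bounds)
    return sum(v ** 2 for v in violations.values()) + energy_term


# Hypothetical upper bounds from NOEs and distances measured in a model, in Å.
upper_bounds = {("A", "B"): 3.0, ("A", "C"): 4.0, ("B", "D"): 2.0}
model_distances = {("A", "B"): 2.5, ("A", "C"): 5.0, ("B", "D"): 2.0}

for pair, violation in noe_violations(model_distances, upper_bounds).items():
    status = "OK" if violation == 0.0 else "too far"
    print(pair, status, violation)
print(nmr_error(model_distances, upper_bounds, energy_term=0.5))
```

A restraint that is satisfied contributes nothing, so only proton pairs that are too far apart add to the error.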
NMR versus X-ray With X-ray you measure reflections. Each reflection holds information about each atom. With NMR you measure pair-wise distances, angles, and orientations. These all hold local information. X-ray requires crystals, and crystals cause/are artefacts. NMR is in solution, but provides much less precision.
NMR versus X-ray
                    NMR          X-ray
‘Error’             1-2 Å        Å
Mobility            yes          not really
Crystal artefacts   no           yes
Material needed     20 mg        1 mg
Cost of hardware    4 M Euro     near infinite (share)
Drug design         no           almost
Better combine and use the best of both worlds.
More about (protein) crystallography and NMR in: Kristalstructuur Magnetische Resonantie Fourier Analyse and Structuur, Functie, Bioinformatica, of course
Why validation? Why have we spent twenty years searching for millions of errors in the PDB?
Validation because: Everything we know about proteins comes from PDB files. Errors become less dangerous when you know about them. And, going back to the connecting thread through this series, if a template is wrong the model will be wrong.
What kind of errors can the software find? Administrative errors. Crystal-specific errors. NMR-specific errors. Really wrong things. Improbable things. Things worth looking at. Ad hoc things.
Smile or cry?
A 5RXN 1.2
B 7GPB 2.9
C 1DLP 3.3
D 1BIW 2.5
Little things hurt big
X-ray specific
His, Asn, Gln ‘flips’
Hydrogen bond network
Your best check:
Contact Probability
Contact probability box A positive nitrogen around a Phe
How bad is bad?
In a normal distribution, half of the points are above and half of the points are below average.
In a normal distribution, 68% of the points are within 1 standard deviation (sd) of the mean.
95% are within 2 sd.
Less than 1 in points is more than 4 sd away from the mean.
ΔG = -RT ln K
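The fractions for a normal distribution can be checked with the error function from the Python standard library. The sketch also evaluates ΔG = -RT ln K for a given K; the default temperature of 298.15 K and the function names are assumptions made for this example.

```python
import math

GAS_CONSTANT = 8.314462618  # R in J/(mol*K)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


def fraction_beyond(k):
    """Fraction of a normal distribution more than k standard deviations from the mean."""
    return math.erfc(k / math.sqrt(2.0))


def delta_g(k_eq, temperature=298.15):
    """Free energy difference in J/mol from an equilibrium constant: -R*T*ln(K)."""
    return -GAS_CONSTANT * temperature * math.log(k_eq)


for k in (1, 2, 4):
    print(k, "sd: within", fraction_within(k), "beyond", fraction_beyond(k))

# A state that is ten times more populated than the alternative (K = 10).
print(delta_g(10.0) / 1000.0, "kJ/mol")
```

A validation score expressed in standard deviations from the average can be read this way: a value around 1 or 2 sd off is ordinary, a value 4 sd off needs a very good explanation.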
One slide about homology modelling
How difficult can it be? 1CBQ 2.2 Å
How difficult can it be? 1CBQ 2.2 Å Even if the oxygen labels had been reversed, the so-called ‘asymmetric unit’ could also have been chosen in a much better way… This is what standard viewers show:
Errors or discoveries? Buried histidine. A warning about a buried histidine triggered biochemical follow-up and led to a new mechanism for the KH module of vigilin (A. Pastore, 1VIG).
Acknowledgements: Elmar Krieger, Sander Nabuurs, Chris Spronk, Maarten Hekkelman, Rob Hooft, Robbie Joosten