Structure validation Everything that can go wrong, will go wrong, especially with things as complicated as protein structures.
Why? Why does a sane (?) human being spend fourteen years searching for millions of errors in the PDB?
Because: Everything we know about proteins comes from PDB files. If a template is wrong, the model will be wrong. Errors become less dangerous when you know about them.
What do we check? Administrative errors. Crystal-specific errors. NMR-specific errors. Really wrong things. Improbable things. Things worth looking at. Ad hoc things.
How difficult can it be?
Contact Probability
DACA (directional atomic contact analysis)
Contact probability box
Using contact probability
Other errors
His, Asn, Gln ‘flips’
Where are the protons?
Hydrogen bond network
Hydrogen bond force field
A typical case: 5TIM
15% should be flipped
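A His, Asn, or Gln 'flip' is a 180° rotation of the side-chain end group, which in practice exchanges the positions of atoms that are hard to tell apart in electron density. The following is a minimal sketch of that exchange, not the method used in the talk. It assumes standard PDB atom names and approximates the rotation by simply swapping coordinates of the paired atoms; the function name `flip_side_chain` and the dictionary layout are hypothetical.

```python
# Atom pairs whose positions are exchanged by a 180-degree flip of the
# terminal group (standard PDB atom names).
FLIP_PAIRS = {
    "HIS": [("ND1", "CD2"), ("CE1", "NE2")],
    "ASN": [("OD1", "ND2")],
    "GLN": [("OE1", "NE2")],
}


def flip_side_chain(resname, atoms):
    """Return a copy of `atoms` (name -> (x, y, z)) with the flip applied.

    Hypothetical helper: the flip is approximated by swapping the
    coordinates of each pair, not by an exact rotation about the bond.
    """
    pairs = FLIP_PAIRS.get(resname)
    if pairs is None:
        raise ValueError(f"{resname} has no flippable side chain")
    flipped = dict(atoms)
    for a, b in pairs:
        flipped[a], flipped[b] = atoms[b], atoms[a]
    return flipped


# Made-up coordinates, for illustration only.
asn = {"CG": (0.0, 0.0, 0.0), "OD1": (1.0, 0.0, 0.0), "ND2": (-1.0, 0.0, 0.0)}
flipped_asn = flip_side_chain("ASN", asn)
```

Deciding whether a flip is warranted would then mean comparing the hydrogen bonds of the original and flipped states, which this sketch does not attempt.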
Little things hurt big
Improbable things
Your best check:
Planarity
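One simple way to put a number on planarity is the distance of one atom from the plane through three others. The sketch below assumes that measure purely for illustration; the slides do not say which planarity measure is actually used, and the function name `out_of_plane_distance` is hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def out_of_plane_distance(p0, p1, p2, q):
    """Distance of point q from the plane through p0, p1 and p2."""
    normal = _cross(_sub(p1, p0), _sub(p2, p0))
    length = math.sqrt(_dot(normal, normal))
    if length == 0.0:
        raise ValueError("p0, p1 and p2 are collinear; no plane is defined")
    return abs(_dot(_sub(q, p0), normal)) / length


# Made-up coordinates: three atoms in the z = 0 plane, a fourth 0.2 above it.
deviation = out_of_plane_distance(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.3, 0.3, 0.2)
)
```

For groups with more than four atoms, such as aromatic rings, a least-squares plane through all atoms would be the more natural choice.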
How wrong is wrong?
Conclusions Everything that could go wrong has gone wrong. Errors are on a ‘sliding scale’. Error detection can detect a lot, but surely not everything (yet).
Acknowledgements: Rob Hooft