Evaluation of Structure Quality Using RCSB PDB Tools
Kyle Burkhardt, Lead Data Annotator
The RCSB PDB at Rutgers University
Workshop format
First section
– Learn how to use the RCSB PDB Validation Server and evaluate the results
Second section
– Follow the RCSB PDB Validation Server tutorial for a particular structure (with a partner)
– Answer questions about the structure and evaluate it
Why assess structure quality?
Over 26,000 PDB structures (August 2004)
– Many structures have multiple entries (redundancy)
– Some structures are represented only once
Is the structure well determined?
– Is the overall or detailed information derived from it reliable?
How to assess structure quality?
RCSB PDB Validation Server
No single standard (magic number) that all structures must meet to be of acceptable quality
Quality is dependent on
– experimental data
– method of structure solution
– agreement with available geometrical and chemical standards
Steps to validate structural data
1. Download the mmCIF coordinate and structure factor (sf) files
2. Upload them to the RCSB PDB Validation Server
3. Validate
4. Evaluate the reports (think!)
Validation Reports Contain
Close contacts
Bond and angle deviations
Chirality errors
Sequence/coordinate alignment
Missing atoms or residues
Distant waters
NUCheck [1], PROCHECK [2], and SFCHECK [3] reports

1. Feng Z, Westbrook J, Berman HM. (1998) NUCheck: Rutgers University, New Brunswick, NJ. Report No.: NDB
2. Laskowski RA, MacArthur MW, Moss DS, et al. (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26:
3. Vaguine AA, Richelle J, Wodak SJ. (1999) SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Crystallogr. D55:
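To make the "close contacts" item concrete, here is a minimal Python sketch of how such a check could work in principle: it flags pairs of non-bonded atoms that sit closer together than a distance cutoff. This is not the Validation Server's actual implementation. The function name `close_contacts`, the atom labels, the coordinates, and the 2.2 Å cutoff are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

# Assumed cutoff in angstroms, for illustration only; the server's own
# criteria may differ.
CLOSE_CONTACT_CUTOFF = 2.2


def close_contacts(atoms, bonded=frozenset(), cutoff=CLOSE_CONTACT_CUTOFF):
    """Return (label_a, label_b, distance) for non-bonded pairs closer than cutoff.

    atoms  : dict mapping a (hypothetical) atom label to (x, y, z) in angstroms
    bonded : collection of frozenset({label_a, label_b}) pairs to skip
    """
    hits = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(atoms.items(), 2):
        # Covalently bonded atoms are expected to be close, so skip them.
        if frozenset((name_a, name_b)) in bonded:
            continue
        distance = math.dist(xyz_a, xyz_b)
        if distance < cutoff:
            hits.append((name_a, name_b, round(distance, 2)))
    return hits


# Made-up coordinates: one water oxygen sits 1.8 angstroms from a CA atom.
atoms = {
    "A:GLY10:CA": (0.0, 0.0, 0.0),
    "A:GLY10:C": (1.5, 0.0, 0.0),
    "B:HOH201:O": (0.0, 1.8, 0.0),
    "A:LEU45:CD1": (10.0, 10.0, 10.0),
}
bonded = {frozenset(("A:GLY10:CA", "A:GLY10:C"))}

print(close_contacts(atoms, bonded))
```

The all-pairs loop is fine for a toy example; for a real structure with thousands of atoms, a spatial grid or neighbor search would be the more practical choice.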
What is the overall quality of a structure?
Dependent on:
– The size of the structure
– The resolution and R-factor
– Your evaluation of the validation letter
– Your evaluation of the Ramachandran plot
– Your evaluation of the SFCHECK report
– What you’re studying
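For reference when weighing the R-factor, the conventional crystallographic definition is R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, summed over reflections. The short Python sketch below evaluates that formula on made-up amplitudes. The function name `r_factor` and the sample values are illustrative assumptions, and the sketch assumes the observed and calculated amplitudes are already on a common scale.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    f_obs, f_calc : equal-length sequences of structure-factor amplitudes,
                    assumed to be on the same scale already.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Made-up amplitudes for four reflections (illustration only).
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]

print(r_factor(f_obs, f_calc))  # 25 / 200 = 0.125
```

A lower value means closer agreement between model and data, but as the slide stresses, the number should be read together with the resolution and the other validation reports rather than on its own.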
Acknowledgements
The Protein Data Bank (PDB) is operated by
– Rutgers, The State University of New Jersey
– San Diego Supercomputer Center at the University of California, San Diego
– Center for Advanced Research in Biotechnology/UMBI/NIST
The RCSB PDB is supported by funds from
– National Science Foundation (NSF)
– National Institute of General Medical Sciences (NIGMS)
– Office of Science, Department of Energy (DOE)
– National Library of Medicine (NLM)
– National Cancer Institute (NCI)
– National Center for Research Resources (NCRR)
– National Institute of Biomedical Imaging and Bioengineering (NIBIB)
– National Institute of Neurological Disorders and Stroke (NINDS)
The worldwide PDB (wwPDB) is a collaboration between
– RCSB
– MSD/EBI
– PDBj