X-ray Validation Package: Present Status
Swanand Gore
PDBe D&A meeting, 21-Oct-2010
VTF recommendations
Model-based indicators
– Covalent geometry: outliers relative to Engh & Huber (E&H) targets
– Protein backbone (Ramachandran) and sidechain (rotamericity, flips) outliers
– RNA backbone (atypical suites)
– Carbohydrates: chirality and naming
– Ligands: features not observed in high-quality small-molecule crystal structures or in other instances in the PDB
– Packing
  » Bad vdW clashes
  » Underpacking, voids
  » Unusual contacts
  » Unsatisfied H-bond donors and acceptors
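Covalent-geometry outliers are usually expressed as Z-scores against a restraint library: Z = (observed − ideal) / σ. The sketch below flags bond-length outliers under stated assumptions: the restraint targets are supplied by the caller (any Engh & Huber-style values would do), the cutoff |Z| > 5 is an illustrative choice rather than a VTF value, and `BondRestraint` and `bond_outliers` are hypothetical names. Bond angles would follow the same pattern with angles in place of distances.

```python
import math
from dataclasses import dataclass


@dataclass
class BondRestraint:
    atom1: str
    atom2: str
    ideal: float  # target bond length in Å (caller-supplied, hypothetical values below)
    sigma: float  # standard deviation of the target in Å


def bond_z_score(coords, restraint):
    """Return (observed length, Z) with Z = (observed - ideal) / sigma."""
    observed = math.dist(coords[restraint.atom1], coords[restraint.atom2])
    return observed, (observed - restraint.ideal) / restraint.sigma


def bond_outliers(coords, restraints, z_cutoff=5.0):
    """Return (atom1, atom2, observed length, Z) for every bond with |Z| > z_cutoff."""
    outliers = []
    for r in restraints:
        observed, z = bond_z_score(coords, r)
        if abs(z) > z_cutoff:
            outliers.append((r.atom1, r.atom2, observed, z))
    return outliers


# Usage with made-up coordinates and targets: the N-CA bond is stretched to 1.60 Å.
coords = {"C": (0.0, 0.0, 0.0), "N": (1.33, 0.0, 0.0), "CA": (2.93, 0.0, 0.0)}
restraints = [BondRestraint("C", "N", 1.33, 0.02), BondRestraint("N", "CA", 1.46, 0.02)]
print(bond_outliers(coords, restraints))
```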
VTF recommendations
Data-based indicators
– Wilson plot
– Data anisotropy plot
– Twinning (Padilla-Yeates plot)
– Mislabelling of amplitudes / intensities
– Translational NCS
– Missed symmetry
Data- and model-based indicators
– R, Rfree
  » Reproducibility and difference
– Real-space R (RSR)
  » Per-residue measure of fit to the 2Fo-Fc map, normalized per residue type
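The two fit measures above have standard definitions: R = Σ| |Fo| − |Fc| | / Σ|Fo|, computed over the working reflections (R) and over the held-out test set (Rfree), and RSR = Σ|ρobs − ρcalc| / Σ|ρobs + ρcalc| over the map grid points around one residue. A minimal sketch, assuming Fc is already scaled to Fo and that the per-residue-type normalization is a Z-score against a mean and standard deviation per residue name (`rsr_z` and its statistics table are hypothetical):

```python
def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - |Fc| | / sum |Fo|; assumes Fc is already scaled to Fo."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, is_free):
    """R on the working reflections, Rfree on the reflections flagged as test set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


def real_space_r(rho_obs, rho_calc):
    """RSR = sum |rho_obs - rho_calc| / sum |rho_obs + rho_calc| over one residue's grid points."""
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    return numerator / denominator


def rsr_z(rsr, resname, stats_by_type):
    """Normalize RSR per residue type as (RSR - mean) / sd (assumed normalization)."""
    mean, sd = stats_by_type[resname]
    return (rsr - mean) / sd
```

A large gap between R and Rfree is what the "difference" bullet refers to: the free set is never used in refinement, so it is not fitted by the model.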
VTF recommendations
Percentile scores
– For each criterion, calculate the entry's percentile rank against the whole set of X-ray entries and against the entries in its resolution bin
– Update the percentiles periodically
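One way to compute the two percentile ranks is to count how many reference entries score strictly worse than the entry. The sketch below assumes a reference archive of (resolution, metric value) pairs and a resolution bin of ±0.1 Å around the entry; both the binning rule and the function names are hypothetical. Ties count as "not worse", and `lower_is_better` covers metrics such as Rfree or clashscore.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, reference, lower_is_better=True):
    """Percentage of reference values strictly worse than `value`; None if no reference."""
    if not reference:
        return None
    ref = sorted(reference)
    n = len(ref)
    if lower_is_better:
        worse = n - bisect_right(ref, value)  # values strictly greater than `value`
    else:
        worse = bisect_left(ref, value)       # values strictly smaller than `value`
    return 100.0 * worse / n


def entry_percentiles(value, resolution, archive, half_width=0.1, lower_is_better=True):
    """Percentile rank against the whole archive and against entries of similar resolution.

    `archive` is a list of (resolution, metric value) pairs; the bin is
    resolution +/- half_width Å (a hypothetical binning choice).
    """
    all_values = [v for _, v in archive]
    bin_values = [v for res, v in archive if abs(res - resolution) <= half_width]
    return (percentile_rank(value, all_values, lower_is_better),
            percentile_rank(value, bin_values, lower_is_better))


# Usage with a toy archive of Rfree values.
archive = [(1.5, 0.20), (1.55, 0.22), (2.5, 0.30), (3.0, 0.18), (2.0, 0.25)]
print(entry_percentiles(0.21, 1.5, archive))
```

Periodic updating then amounts to rebuilding `archive` from a new snapshot of the PDB and recomputing.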
VTF recommendations
Presentation of results for various consumers
– Depositors (and annotators)
– Reviewers
  » Concise PDF report highlighting any unusual features
– End-users (experts and non-experts)
  » Web-based front ends with adjustable level of detail
– Developers
  » Webservices and XML files
VTF recommendations
Validation package
– Be open-source and freely distributable (to wwPDB sites, labs, companies)
– Import/wrap existing 3rd-party functionality
  » EDS (Uppsala), MolProbity, CCDC Mogul, WHAT IF
  » Phenix, CCP4
  » RosettaHoles, pdb-care, DACA, ProSA
– Calculate the recommended validation metrics and publish an XML file per entry
– Present the XML contents in various kinds of reports
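Wrapping 3rd-party programs suggests a plug-in design: each wrapper runs one program and translates its output into a common annotation format, while the package records which program versions produced the results. The sketch below is a hypothetical illustration of that pattern, not the package's actual design; `Annotation`, `WrapperModule`, `StubModule` and `ValidationPackage` are invented names, and `StubModule` merely stands in for a real wrapper such as one around MolProbity.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Annotation:
    level: str       # e.g. "entry", "chain", "residue", "atom"
    identifier: str  # e.g. a chain id such as "A"
    attributes: dict = field(default_factory=dict)


class WrapperModule(ABC):
    """Hypothetical interface implemented by each wrapper around a 3rd-party program."""
    name = "unnamed"
    program_version = "unknown"

    @abstractmethod
    def run(self, entry_path: str) -> list[Annotation]:
        """Run the wrapped program on one entry and translate its output into annotations."""


class StubModule(WrapperModule):
    """Stand-in for a real wrapper; returns one fixed entry-level annotation."""
    name = "stub"
    program_version = "0.0"

    def run(self, entry_path):
        return [Annotation("entry", entry_path, {"clashscore": 0.0})]


class ValidationPackage:
    def __init__(self, version, modules=()):
        self.version = version
        self.modules = list(modules)

    def validate(self, entry_path):
        """Run every module, collecting program versions and annotations."""
        programs = {}
        annotations = []
        for module in self.modules:
            programs[module.name] = module.program_version
            annotations.extend(module.run(entry_path))
        return {"package_version": self.version, "programs": programs,
                "annotations": annotations}
```

Adding support for a new program then means adding one module, without touching the others.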
Prototypes – Validation Viewer
– Entry viewer
– Residue and maps viewer
– Raw data and plots of phi-psi, omega, chi, B-factor, occupancy, RSR, RSCC
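The phi, psi, omega and chi values shown in the viewer are all torsion angles defined by four consecutive atoms. A minimal pure-Python sketch of the torsion computation, using the standard atan2 formulation with the IUPAC sign convention (the function names are illustrative):

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in [-180, 180], IUPAC sign convention."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Backbone and sidechain torsions for residue i (coordinates are (x, y, z) tuples):
#   phi   = dihedral(C[i-1], N[i], CA[i], C[i])
#   psi   = dihedral(N[i], CA[i], C[i], N[i+1])
#   omega = dihedral(CA[i], C[i], N[i+1], CA[i+1])
#   chi1  = dihedral(N[i], CA[i], CB[i], XG[i])  # XG = the gamma atom (CG, OG, SG, ...)
```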
New ligand-validation functionality
Mogul is a chemical mining engine developed by CCDC for small-molecule crystal structures in the CSD
– Splits the query molecule into bond, angle, torsion and ring substructures
– Finds comparable substructures in high-quality small-molecule structures in the CSD
Compares query substructures against the CSD distributions
– Bonds, angles: Z-scores can be computed
– Torsions: a Z-score is undefined, but Mogul indicates where the torsion lies relative to the distribution
– Rings: computes the torsion RMSD (tRMSD) of the query ring against each comparable CSD ring, then uses the mean and standard deviation of these tRMSDs to estimate a Z-score for the ring
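The comparisons above reduce to a few small calculations. The sketch below shows a bond/angle Z-score against a sample of CSD values, a ring torsion RMSD with angle differences wrapped to account for torsion periodicity, and the mean and standard deviation of the tRMSDs. It is not Mogul's implementation. It assumes that query and hit ring torsions are given in corresponding atom order, so it performs no search over ring atom orderings or mirror images, which a real comparison would need. It also stops at the tRMSD statistics because the slide does not say exactly how they are turned into a ring Z-score. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def z_score(value, sample):
    """Bond/angle Z-score of a query value against comparable CSD values (>= 2 needed)."""
    return (value - mean(sample)) / stdev(sample)


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_rmsd(query, hit):
    """RMSD over corresponding ring torsions (degrees), assuming matching atom order."""
    if len(query) != len(hit):
        raise ValueError("rings must have the same number of torsions")
    return math.sqrt(sum(angle_diff(q, h) ** 2 for q, h in zip(query, hit)) / len(query))


def ring_trmsd_stats(query, hits):
    """Mean and standard deviation of the query ring's tRMSDs to each comparable ring."""
    trmsds = [torsion_rmsd(query, hit) for hit in hits]
    return mean(trmsds), stdev(trmsds)
```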
Prototypes – Mogul webservice
– Upload or select a ligand
– 2D & 3D views of the ligand
– Bonds, angles, torsions, rings with comparable CSD fragments
– Mogul distribution for a selected angle
Validation package (installed on each site): architecture diagram components
– D&A pipeline on all sites: mmCIF under deposition, D&A API, D&A webservers, D&A clients
– Validation XML file (data, percentiles); released validation XML file for public access
– Distributions calculator (runs yearly); distributions Oracle database (time-stamped by year); distributions webservice (if the DB is only at PDBe)
– wwPDB sites (PDBe - ?)
Validation XML contents
– Administrative
  » Versions of the validation package and of the various 3rd-party programs
  » Creation date
  » Distribution database version
– Hierarchy of validation XML for data
  » Entry (id) > Model (id) > Chain (id) > Residue (seqnum, icode, resname, gri) > Atom (name, altcode, gai)
– Annotations
  » Level (e.g. chain), identifier (chain_id), attributes
  » Support modular development of the validation package, because annotations can be appended as and when new wrapper modules are ready
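The layout above (administrative header, Entry > Model > Chain > Residue > Atom hierarchy, and a flat list of level/identifier/attribute annotations) can be produced with the standard library's ElementTree. In this sketch all element and attribute names are hypothetical, the `gri`/`gai` fields are omitted, and every attribute value is stored as a string. The point it illustrates is that annotations are simply appended, so a new wrapper module can add its results without changing the hierarchy.

```python
import xml.etree.ElementTree as ET


def new_validation_xml(entry_id, package_version, db_version, created, programs):
    """Create the skeleton: administrative block, entry node and an empty annotation list."""
    root = ET.Element("validation")
    admin = ET.SubElement(root, "administrative", {
        "package_version": package_version,
        "distribution_db_version": db_version,
        "created": created,
    })
    for name, version in programs.items():
        ET.SubElement(admin, "program", {"name": name, "version": version})
    ET.SubElement(root, "entry", {"id": entry_id})
    ET.SubElement(root, "annotations")
    return root


def _get_or_create(parent, tag, key):
    """Return the child with matching attributes, creating it if absent."""
    for child in parent.findall(tag):
        if all(child.get(k) == v for k, v in key.items()):
            return child
    return ET.SubElement(parent, tag, key)


def add_atom(root, model, chain, seqnum, resname, atom, icode="", altcode=""):
    """Insert one atom, creating its model, chain and residue nodes as needed."""
    m = _get_or_create(root.find("entry"), "model", {"id": model})
    c = _get_or_create(m, "chain", {"id": chain})
    r = _get_or_create(c, "residue", {"seqnum": seqnum, "icode": icode, "resname": resname})
    return _get_or_create(r, "atom", {"name": atom, "altcode": altcode})


def append_annotation(root, level, identifier, **attributes):
    """Append one annotation, e.g. level="chain", identifier="A", average_rsr=0.12."""
    ann = ET.SubElement(root.find("annotations"), "annotation",
                        {"level": level, "identifier": identifier})
    for key, value in attributes.items():
        ann.set(key, str(value))
    return ann
```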
Example annotations
– Atom level: clashes
– Residue level
  » Average B-factor, occupancy
  » Phi-psi, Ramachandran outliers
  » Sidechain flips, rotamer outliers
  » RNA backbone and pucker values
– Atom-group level: covalent bond-length and bond-angle outliers
– Chain level: WHAT IF Ramachandran score, average RSR, NCS deviation
– Entry level
  » Rfree, clashscore
  » Twinning, tNCS, anisotropy, fit to ideal Wilson plot
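The entry-level clashscore is the number of serious steric overlaps per 1000 atoms; MolProbity counts overlaps of at least 0.4 Å as serious. The sketch below is a naive O(n²) illustration of that definition only. It assumes caller-supplied van der Waals radii and an explicit set of bonded pairs to skip. It ignores what a real all-atom contact analysis handles (hydrogens, H-bond pairs, 1-3 neighbours, altlocs), and the function name is hypothetical.

```python
import math
from itertools import combinations


def clashscore(atoms, bonded_pairs=frozenset(), min_overlap=0.4):
    """Serious clashes per 1000 atoms (naive sketch).

    atoms: list of (x, y, z, vdw_radius) tuples.
    bonded_pairs: set of (i, j) index pairs, i < j, that are covalently bonded
    and therefore not counted as clashes.
    """
    if not atoms:
        return 0.0
    clashes = 0
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if (i, j) in bonded_pairs:
            continue
        # Overlap = sum of vdW radii minus the interatomic distance.
        overlap = a[3] + b[3] - math.dist(a[:3], b[:3])
        if overlap >= min_overlap:
            clashes += 1
    return 1000.0 * clashes / len(atoms)
```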
Summary
– The VTF recommendations will be implemented in a validation package.
– The package will consist of modules that import/wrap 3rd-party functionality.
– The package will be open-source and freely distributable.
– A process for periodically updating the distributions and the validation XML files will be implemented.