Coordinate handling and exploitation
- An overview of coordinate functionality in the CCP4 suite
- Coordinate functionality in the REFMAC group of programs (A. Vaguine)
- New CCP4 project “Protein Interfaces” (E. Krissinel)
Coordinate support in CCP4
[Diagram: coordinate support by application group]
- Old FORTRAN coordinate-related applications not using RWBrook (42%): own coordinate functions
- Refmac group of programs: own coordinate functions
- Old FORTRAN coordinate-related applications using RWBrook (33%): RWBrook emulator, MMDB (C++ Coordinate Library)
- New C & C++ coordinate-related applications (a few): Clipper, Molecular Graphics, Coot, SSM; MMDB (C++ Coordinate Library)
- DNA group: own coordinate functions
- Other: own coordinate functions
CCP4 Coordinate Library (MMDB)
[Diagram labels: Manager; Interface API; PDB file, mmCIF file, binary file; one or more C++ classes; Header, Cryst, Sequence, Model; Model, Chain, Residue, Atom]
- C++ class hierarchy
- PDB/mmCIF support
- Database features
- ~600 interface functions
- Emulates RWBrook
- Wealth of retrieval, selection, transformation and edit tools
- User-defined data
- Built-in high-level functionality (contacts, alignment, superposition etc.)
- Monomer database
- SWIG interface
- Stable and documented
E. Krissinel et al. (2004) Acta Cryst. D
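To make the Model, Chain, Residue, Atom hierarchy and the "selection" and "contacts" features more concrete, here is a minimal Python sketch. All class and function names below are hypothetical stand-ins chosen for illustration; they are not MMDB's C++ classes, its interface functions, or its SWIG bindings.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Atom:
    name: str
    xyz: tuple  # (x, y, z) in angstroms


@dataclass
class Residue:
    name: str
    seq_num: int
    atoms: list = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: list = field(default_factory=list)


@dataclass
class Model:
    chains: list = field(default_factory=list)

    def select_atoms(self, chain_id=None, atom_name=None):
        """Return (chain_id, seq_num, atom) triples; None matches anything."""
        return [
            (chain.chain_id, res.seq_num, atom)
            for chain in self.chains if chain_id in (None, chain.chain_id)
            for res in chain.residues
            for atom in res.atoms if atom_name in (None, atom.name)
        ]


def contacts(selection_a, selection_b, cutoff=4.0):
    """Brute-force contact search between two selections (illustrative only)."""
    return [
        (a, b)
        for a in selection_a
        for b in selection_b
        if dist(a[2].xyz, b[2].xyz) <= cutoff
    ]


# Hypothetical two-chain model with made-up coordinates.
model = Model([
    Chain("A", [Residue("GLY", 1, [Atom("CA", (0.0, 0.0, 0.0))])]),
    Chain("B", [
        Residue("ALA", 1, [Atom("CA", (3.0, 0.0, 0.0))]),
        Residue("SER", 2, [Atom("CA", (9.0, 0.0, 0.0))]),
    ]),
])
ca_a = model.select_atoms(chain_id="A", atom_name="CA")
ca_b = model.select_atoms(chain_id="B", atom_name="CA")
pairs = contacts(ca_a, ca_b)
```

The point of the sketch is only that selections and contact searches fall out naturally from an object hierarchy, which is the kind of functionality that is awkward to express through a flat FORTRAN-style interface.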
General remarks
- Approximately 40% of the CCP4 suite now uses a common set of coordinate functions provided by MMDB. This should help greatly in maintenance and adaptation to possible format changes.
- Conversion of older FORTRAN applications that do not use RWBrook to MMDB in most cases means a complete rewrite. This does not seem to be necessary at the moment.
- All on-going developments in FORTRAN seem to be using their own coordinate functions and libraries.
- MMDB delivers all its power only through the C++ interface. Most MMDB functionality cannot be expressed in traditional FORTRAN terms.
- Should we encourage new coordinate developments in C/C++ using MMDB, as a shift away from FORTRAN thinking?
- New coordinate-related CCP4 projects (MG, Coot, SSM and Protein Interfaces) are all based on MMDB, and that seems to be an advantage for the projects.
PIAS
Protein Interactions, Assemblies and Searches
E. Krissinel
CCP4 - EBI/MSD project
PIAS Project goals
Develop a tool and a publicly available interactive service to aid the solution of different tasks that involve structural and chemical analysis of protein interactions, such as:
- prediction of oligomeric states
- analysis of structure-function relationship
- analysis and prediction of protein interactions
- search for interface homologues
- active site recognition and analysis
- protein surface analysis
- structure specificity analysis
- other
Project started in 2004.
PIAS Project overview
[Diagram components]
- PIAS database
- Interactive Web server
- Crystal interfaces
- Interface calculations, analysis, scoring & biological significance
- Interfaces & structure similarity searches
- Interface fingerprinting
- Applied studies (e.g. discovery of multispecific proteins)
- Active site recognition
- Docking
- Procedures for CCP4 MG
- Prediction of interfaces
- Prediction of oligomeric states (PQS-3)
- Interfaces & surface similarity searches
Legend: provisional parts, subject to progress and feasibility
PIAS Project schedule
[Diagram components]
- PIAS database
- Crystal interfaces
- Interface calculations, analysis, scoring & biological significance
- Interfaces & structure similarity searches
- Interface fingerprinting
- Applied studies (e.g. discovery of multispecific proteins)
- Active site recognition
- Docking
- Procedures for CCP4 MG
- Prediction of interfaces
- Prediction of oligomeric states (PQS-3)
- Interfaces & surface similarity searches
PIAS Database
An interface is defined as the area that becomes inaccessible to solvent upon complex formation.
The database contains interfaces between polypeptides found in all PDB entries: all crystal contacts for X-ray entries and chain contacts for NMR entries. It also contains predicted protein assemblies.
Databased properties for interfacing structures:
- Interface area per residue (+ selection of interfacing atoms and residues)
- Number of atoms and residues involved
- Solvation energy gain (per residue) and P-value of hydrophobic patches
- List of potential hydrogen bonds and salt bridges
- Complexation significance score
Databased properties for interfaces:
- Size, weight
- Solvent accessible area per residue (+ selection of surface atoms and residues)
Databased properties for assemblies:
- Composition, chemical formula
- List of engaged interfaces
- Transformation matrices
- Solvation energy gain
- Solvent accessible and buried surface area
- Dissociation pattern and barrier
- Solvation energy per residue
- SSM data for structure search
- Structure and sequence alignment
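The interface definition on this slide (area that becomes inaccessible to solvent upon complex formation) can be sketched numerically. The example below is only an illustration: the function names and all surface-area values are hypothetical, and halving the total buried area to report a per-side interface area is an assumed convention, not something stated on the slide.

```python
def interface_area(asa_a, asa_b, asa_complex):
    """Interface area (square angstroms) from solvent-accessible surface areas.

    asa_a, asa_b: ASA of the two isolated structures.
    asa_complex:  ASA of the complex.
    Assumption: the interface area is half of the total buried area.
    """
    buried = asa_a + asa_b - asa_complex
    if buried < 0:
        raise ValueError("complex cannot expose more surface than its parts")
    return buried / 2.0


def buried_area_per_residue(asa_free, asa_bound, tol=1e-6):
    """Residues that lose accessible area on complexation, with the area lost.

    asa_free / asa_bound map a residue label to its ASA in the isolated
    structure and in the complex. Residues whose loss does not exceed
    `tol` are not treated as interfacing.
    """
    buried = {}
    for residue, free in asa_free.items():
        lost = free - asa_bound[residue]
        if lost > tol:
            buried[residue] = lost
    return buried


# Hypothetical values, chosen only to exercise the functions.
area = interface_area(asa_a=5000.0, asa_b=4000.0, asa_complex=7800.0)

free = {"A:LEU10": 80.0, "A:GLY11": 30.0, "A:ASP12": 55.5}
bound = {"A:LEU10": 20.0, "A:GLY11": 30.0, "A:ASP12": 50.0}
interfacing = buried_area_per_residue(free, bound)
```

With these inputs the selection of interfacing residues is simply the set of keys of `interfacing`; the same bookkeeping applied per atom would give the atom selection.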
Prediction of oligomeric states (PQS-3)
Existing tools for the calculation of quaternary structures
PQS, MSD (Kim Henrick) (PQS-1)
- Method: recursive splitting of the largest complexes allowed by crystal symmetry. The termination criterion is derived from the individual statistical scores of crystal contacts.
- The results are not curated.
PITA, Thornton group, EBI (Hannes Ponstingl) (PQS-2)
- Method: progressive build-up by addition of monomeric chains that suit the selection criteria.
- The results are partly curated.
Prediction of oligomeric states (PQS-3)
Graph-chemical approach
- The crystal is represented as a periodic graph of monomers (a “supermolecule”)
- All possible assemblies that obey the symmetry criteria are recursively enumerated as subgraphs covering the whole crystal
- Only sets of chemically stable assemblies are left as the answer
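The idea of enumerating assemblies as subgraphs and keeping only the chemically stable sets can be illustrated on a toy example. This is a rough sketch under several assumptions, not the PIAS algorithm: the graph is finite rather than periodic, the enumeration is a brute-force loop over interface classes rather than a recursive search, the free-energy numbers are invented, and "stable" is taken to mean that every way of splitting an assembly in two costs positive energy.

```python
from itertools import combinations

# Hypothetical toy crystal: four monomers, two symmetry classes of interface.
MONOMERS = ["A", "B", "C", "D"]
INTERFACES = {
    # class: (assumed dissociation free energy per contact, list of contacts)
    "x": (5.0, [("A", "B"), ("C", "D")]),
    "y": (-2.0, [("A", "C"), ("B", "D")]),
}


def connected_components(nodes, edges):
    """Split the monomer graph into assemblies (connected components)."""
    seen, parts = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            n = stack.pop()
            if n in part:
                continue
            part.add(n)
            stack.extend(b if a == n else a for a, b in edges if n in (a, b))
        seen |= part
        parts.append(frozenset(part))
    return parts


def dissociation_barrier(assembly, weighted_edges):
    """Smallest summed energy of interfaces broken by any two-way split."""
    members = sorted(assembly)
    if len(members) < 2:
        return float("inf")  # a lone monomer cannot dissociate
    first, rest = members[0], members[1:]
    best = float("inf")
    for k in range(len(rest)):  # the side holding `first` is never everything
        for extra in combinations(rest, k):
            side = {first, *extra}
            cut = sum(w for a, b, w in weighted_edges
                      if (a in side) != (b in side))
            best = min(best, cut)
    return best


def stable_assembly_sets(monomers, interfaces):
    """Engage whole symmetry classes of interfaces; keep the stable outcomes."""
    results = []
    names = sorted(interfaces)
    for k in range(len(names) + 1):
        for engaged in combinations(names, k):
            edges = [(a, b, interfaces[c][0])
                     for c in engaged for a, b in interfaces[c][1]]
            parts = connected_components(monomers, [(a, b) for a, b, _ in edges])
            if all(dissociation_barrier(p, edges) > 0 for p in parts):
                results.append((engaged, parts))
    return results


results = stable_assembly_sets(MONOMERS, INTERFACES)
```

Engaging a symmetry class as a whole is the toy counterpart of the symmetry criterion: if one contact of a class is part of an assembly, all its symmetry mates are too. In this toy case only the empty set (free monomers) and the "x" class (two dimers) survive the stability filter.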
Prediction of oligomeric states (PQS-3)
Success rate obtained on a benchmark set of 212 structures (H. Ponstingl)

Method          Success rate   Notes
PQS (MSD)       78%            not optimised on the benchmark set
PITA software   84%            optimised with 18 parameters
PIAS            89%            optimised with 8 parameters, underfit

Early results outside the benchmark set suggest that PIAS performs somewhat better; however, the actual differences may be less significant.
Prediction of oligomeric states
PQS may be predicted only up to a certain level of confidence. It seems that 85-90% of correct predictions may be reached.
Main reasons why a 100% success rate can never be achieved:
- theoretical models for protein affinity and entropy change upon complexation are primitive
- coordinate (experimental) data are of limited accuracy
- there is no feasible way to take conformational changes into account
- experimental data on multimeric states are very limited and not always reliable, so calibration of parameters is difficult
- assemblies may exist in some environments and dissociate in others, so a definitive answer is simply not there
Interfaces and structure similarity searches
Searching the PIAS database for structurally similar interfaces and interfaces between similar structures
Questions to answer:
- What interfaces are formed by structures similar to the given one(s) in the PDB?
- What are the interface partners of a given structure in the PDB?
- What is the relation between sequence and biological (complexation) significance of the interface (function)?
- What PQS may be formed by structures similar to the given one(s), and how may the PQS depend on the sequence?
- Is a given structure interaction-specific and/or multispecific?
PIAS web server
A preliminary version of the MSD protein interaction service has been set up. The version includes:
- Calculations, for uploaded files or database retrievals by PDB ID code, of:
  - Solvent-accessible surface area
  - Crystal contacts / interfaces
  - Protein interface parameters and scoring:
    - Interface area
    - Solvation energy gain
    - Hydrogen bonds and salt bridges
    - Hydrophobic P-value
    - Biological relevance score
    - Selection of interfacing residues and atoms
  - Protein quaternary structures
- Interface and structure searches in the protein interface database derived from the PDB
- Visualisation of the structures, interfaces and PQS
PIAS web server
[Figure: 3gcb hexamer]
[Figure: Dissociation of 3gcb hexamer]
Concluding remarks
The PIAS software is almost ready for its first release. It may be released in two months' time, after:
- catching up with on-line help and documentation
- minor cleaning and re-design of output pages
- enhancement of structural search options
- further entropy calibration to increase the accuracy of PQS prediction
Further work will concentrate on:
- surface calculation and analysis
- surface / active site searches
- possibly docking
- additions to and improvements of existing functions (based on users’ feedback and own needs)