CCDC Tools for Mining Structural Databases, or Building Solid Foundations for a Structure-Based Design Campaign (John Liebeschuetz, www.ccdc.cam.ac.uk)

CCDC Tools for Mining Structural Databases, or Building Solid Foundations for a Structure-Based Design Campaign
John Liebeschuetz, Peter Carlqvist, Simon Bowden
Cambridge Crystallographic Data Centre, 12 Union Rd., Cambridge, UK

Assessment and Comparison of Ligand–Protein Structural Models
For the crystallographer:
– What is wrong with my model?
– What interesting features of, or differences from, related structures can I highlight in my publication?
For the molecular modeller:
– What is wrong with the crystallographer's model?
– What interesting features of, or differences from, related structures can I use to inform my structure-based drug design campaign?
– Are there non-homologous structures with similar features that I need to watch out for?

Why can't I take a structure from the PDB and just use it?
Validation of ligand structures bound to proteins: in an in-house analysis using Relibase+/Mogul, 15 of 100 recent PDB entries (15%) had ligand geometry that is almost certainly in significant error.

How much ligand strain is accommodated by the protein?
Accepted view: many ligands adopt strained conformations when bound to proteins, and 60% do not even bind in a local-minimum conformation (Perola & Charifson, J. Med. Chem. 2004, 47).
Alternative view: ligands usually (but not always) bind in a local minimum; many 'strained' structures found in the PDB are imperfectly refined (OpenEye: B. Kelley and G. Warren, EuroCUP).
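
Both views turn on how "strain" is measured: typically as the energy of the bound conformation minus that of a relaxed reference conformation. The toy sketch below assumes a single torsion with a hypothetical threefold potential E(φ) = (V/2)(1 + cos 3φ), V = 3.0 kcal/mol, relaxes the bound torsion to the nearest local minimum by steepest descent, and reports the local strain. Real strain estimates use a full force field over all degrees of freedom; this only illustrates the "local minimum" idea.

```python
import math

V3 = 3.0  # hypothetical barrier height, kcal/mol (illustrative assumption)

def torsion_energy(phi):
    """Toy threefold torsion potential; phi in radians, minima at 60, 180, 300 degrees."""
    return 0.5 * V3 * (1.0 + math.cos(3.0 * phi))

def torsion_gradient(phi):
    """dE/dphi of the toy potential."""
    return -1.5 * V3 * math.sin(3.0 * phi)

def local_strain(phi_bound_deg, step=0.01, tol=1e-10, max_iter=100_000):
    """Return (E(bound) - E(nearest local minimum), minimum torsion in degrees)."""
    phi = math.radians(phi_bound_deg)
    for _ in range(max_iter):
        g = torsion_gradient(phi)
        if abs(g) < tol:
            break
        phi -= step * g  # small steepest-descent steps stay in the starting basin
    strain = torsion_energy(math.radians(phi_bound_deg)) - torsion_energy(phi)
    return strain, math.degrees(phi)
```

For example, `local_strain(90.0)` relaxes to about 60 degrees and reports a local strain of about 1.5 kcal/mol, while a torsion already at 180 degrees reports zero strain.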

CCDC Tools that Can Help You
Relibase/Relibase+: web-based database system for searching, retrieving and analysing 3D structures of protein–ligand complexes in the Brookhaven Protein Data Bank (PDB)
– Relibase is freely available to academics
– Relibase+ has extra features (some of these will be used in this workshop)
The Cambridge Structural Database System: a database of >400,000 small-molecule crystal structures, with associated query software
– Mogul and IsoStar knowledge bases of molecular geometry and intermolecular interactions
– Directly linked access from Relibase+

The Workshop
Part 1: Validation of models and structural analysis
– Analysing a protein structure for errors and interesting features
– Comparing a structure with structures related by homology or by functionality
Part 2: Probing the protein–ligand interface
– Substructure searching in Relibase/Relibase+
– Comparing the interactions of different ligands with the same target
– Validating an unusual interaction using substructure searching in Relibase+

Relibase+
– Web-based database system for searching, retrieving and analysing 3D structures of protein–ligand complexes in the Brookhaven Protein Data Bank (PDB)
– Successor to ReLiBase, developed by Manfred Hendlich et al. (Merck, Marburg University); M. Hendlich, Acta Cryst. D54, 1998
– Relibase: free on the WWW for academics

Relibase+: Basic Functionality
– Keyword searching
– FASTA protein sequence searching
– 2D substructure searching
– 3D protein–ligand interaction searching
– Protein–protein interaction searching
– Similarity searching for ligands
– SMILES substructure matching
– Automatic superposition of related binding sites to compare ligand binding modes, water positions, etc.
– 3D visualisation with AstexViewer and ReliView (Hermes)

Relibase+: Advanced Functionality
– Generation and searching of proprietary databases of protein–ligand complexes alongside the PDB
– Links to the Mogul and IsoStar modules of the CSDS for geometry validation
– Additional modules: Crystal Packing, WaterBase, CavBase
– Detailed analysis of superimposed binding sites
– Enhanced treatment of hitlists
– Reliscript: command-line access via a Python-based toolkit
– Coming soon: SecBase, including turn classification

CavBase
– Detects unexpected similarities among protein cavities (e.g. active sites) that share little or no sequence homology
– Similarity is judged by matching 3D property descriptors (pseudocentres) that encode the shape and chemical characteristics of each cavity
– No sequence information is used, so similar cavities can be detected even when they have no obvious secondary-structure relationship
– Developed by S. Schmitt et al., J. Mol. Biol. (2002)

Cambridge Structural Database
– Repository for the world's small-molecule organic and metal-organic crystal structures (up to 500 non-H atoms)
– Experimentally determined 3D structures from X-ray and neutron diffraction
– The 2007 release contains 423,798 entries, with approximately 32,000 entries added per year
– Derived from around 1,200 published sources: official depository for >80 major journals; the majority of data is deposited electronically (CIF)
– Increasing number of private communications

How Much Data Is Available?
[Chart: growth of the CSD, …,768 entries in June 2007; predicted growth to 2010, passing 500,000 entries during 2009]
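
A quick linear check of that prediction, using the figures quoted for the 2007 release (423,798 entries, about 32,000 added per year). Strictly linear growth is an assumption of this sketch; if annual additions keep rising, the 500,000 mark is reached sooner.

```python
def years_to_reach(target, current=423_798, per_year=32_000):
    """Years of linear growth needed to go from `current` to `target` entries."""
    return (target - current) / per_year

def projected_entries(years, current=423_798, per_year=32_000):
    """Entry count after `years` of linear growth."""
    return current + per_year * years

print(f"{years_to_reach(500_000):.2f} years")  # 2.38 years
```

Roughly 2.4 years after the 2007 release lands in 2009, consistent with the predicted growth curve.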

CSD Information Content
Crystal structure data: atomic coordinates, unit cell, space-group symmetry (fully validated)

CSD Information Content (continued)
Bibliographic and chemical information: bibliographic and chemical text and properties (all searchable). Example entry:
4-Oxonicotinamide-1-(1'-beta-D-2',3',5'-tri-O-acetyl-ribofuranoside); source: Rothmannia longiflora; colour: pale yellow; habit: acicular; polymorph: Form IV; C17 H20 N2 O9; G. Bringmann, M. Ochse, K. Wolf, J. Kraus, K. Peters, E-M. Peters, M. Herderich, L. Ake, F. Tayman, Phytochemistry 51 (1999), p. 271; R-factor: 0.0506
– Chemical diagram and chemical connectivity to enable 2D and 3D searching for substructures, pharmacophores and intermolecular interactions
– Cross-referencing between entries

Cambridge Structural Database System
– Cambridge Structural Database
– PreQuest: database production
– ConQuest: database search
– Mercury: graphical display, packing analysis
– VISTA: statistical analysis
– Knowledge bases: Mogul (library of molecular geometry) and IsoStar (library of intermolecular interactions)

Mogul: A Knowledge Base of Molecular Geometries (Bruno et al., J. Chem. Inf. Comput. Sci., 44, 2004)

Mogul: Rapid Access to CSD Information
– Incorporates pre-computed libraries of bond lengths, valence angles and torsion angles, derived entirely from the CSD
– Sketch or import a molecule, then click on a feature of interest to view its distribution, mean value and statistics
– Very fast search speeds, with hyperlinks to the CSD to view specific structures
– Complete geometry: retrieve distributions for all bonds, angles and torsions in the molecule
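
In validation, the key step is comparing an observed parameter with the CSD distribution for the same chemical environment. The sketch below expresses that comparison as a z-score; the function name `mogul_style_check`, the sample bond lengths and the |z| > 2 cut-off are all hypothetical illustrations, not Mogul's own interface or rule.

```python
import statistics

def mogul_style_check(observed, csd_values, z_cut=2.0):
    """Compare an observed value with a distribution of CSD values (hypothetical helper).

    Returns (mean, sample standard deviation, |z|, flagged).
    """
    mean = statistics.mean(csd_values)
    sd = statistics.stdev(csd_values)
    z = abs(observed - mean) / sd
    return mean, sd, z, z > z_cut

# Invented sample: C=N-like bond lengths (Å) retrieved for one environment.
values = [1.33, 1.34, 1.35, 1.34, 1.33, 1.35]
print(mogul_style_check(1.45, values)[3])   # True: far outside the distribution
print(mogul_style_check(1.345, values)[3])  # False
```

A real check should also consider how many hits the distribution contains; a z-score from a handful of CSD fragments is much less reliable than one from hundreds.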

IsoStar: A Knowledge Base of Intermolecular Interactions
– Experimental data from the Cambridge Structural Database and the Protein Data Bank (protein–ligand complexes only)
– Theoretical potential-energy minima (DMA, IMPT)
– Interaction distributions displayed immediately as scatterplots or contour surfaces
– >20,000 CSD scatterplots, >5,500 PDB scatterplots, 1,500 energy minima

IsoStar Methodology
Example: central group -CONH2, contact group NH
– Search the CSD or PDB for structures containing the desired contact
– Superimpose the hits and display them as scatterplots
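
Superimposing hits amounts to expressing every contact atom in a coordinate frame fixed to the central group. A minimal sketch, assuming three atoms of the central group are enough to define that frame (for -CONH2: carbonyl C as origin, O and N to set the axes); IsoStar's own superposition procedure may differ, for example a least-squares fit over more atoms.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)

def local_frame(origin, p1, p2):
    """Right-handed orthonormal axes fixed to three atoms of the central group."""
    e1 = _unit(_sub(p1, origin))                              # along origin -> p1
    v = _sub(p2, origin)
    e2 = _unit(_sub(v, tuple(_dot(v, e1) * c for c in e1)))   # in plane, orthogonal to e1
    e3 = _cross(e1, e2)                                       # normal to the plane
    return e1, e2, e3

def to_local(point, origin, p1, p2):
    """Coordinates of a contact atom expressed in the central group's frame."""
    e1, e2, e3 = local_frame(origin, p1, p2)
    d = _sub(point, origin)
    return (_dot(d, e1), _dot(d, e2), _dot(d, e3))
```

Because the frame moves rigidly with the central group, a hit that is rotated and translated as a whole gives the same local coordinates; plotting `to_local(contact, C, O, N)` for every hit produces the superimposed scatterplot.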

Density Maps: distributions can also be represented as density maps.
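
One simple way to turn a scatterplot into a density map is to count contact positions in cubic grid cells and normalise. A sketch under that assumption: the 0.5 Å spacing is arbitrary, and a production tool would typically smooth the counts before contouring.

```python
import math
from collections import Counter

def density_grid(points, spacing=0.5):
    """Fraction of contact points falling in each cubic cell of edge `spacing` (Å)."""
    counts = Counter(
        tuple(math.floor(c / spacing) for c in p) for p in points
    )
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}
```

Cells are keyed by integer indices, so contouring a given fraction means selecting the cells whose value exceeds that level.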

How to access the workshop s1mple Webpage address Password

Cavity Detection: based on the LIGSITE program (M. Hendlich et al., J. Mol. Graph. 1997)
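
LIGSITE marks grid points not occupied by protein and counts protein–solvent–protein (PSP) events: scan lines along x, y, z and the four cubic diagonals on which the point is enclosed by protein on both sides. A minimal sketch on a set of occupied grid cells; the grid construction and the threshold `min_events=5` are assumptions for illustration.

```python
# Scan directions: the three axes plus the four cubic diagonals.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]

def _hits_protein(start, step, occupied, shape):
    """Walk from `start` along `step` until protein is met or the grid ends."""
    x, y, z = start
    while True:
        x, y, z = x + step[0], y + step[1], z + step[2]
        if not (0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]):
            return False
        if (x, y, z) in occupied:
            return True

def psp_counts(occupied, shape):
    """Number of PSP events (0-7) for every empty grid point."""
    counts = {}
    for x in range(shape[0]):
        for y in range(shape[1]):
            for z in range(shape[2]):
                p = (x, y, z)
                if p in occupied:
                    continue
                counts[p] = sum(
                    1 for d in DIRECTIONS
                    if _hits_protein(p, d, occupied, shape)
                    and _hits_protein(p, tuple(-c for c in d), occupied, shape)
                )
    return counts

def pocket_points(occupied, shape, min_events=5):
    """Empty grid points enclosed by protein in at least `min_events` directions."""
    return {p for p, n in psp_counts(occupied, shape).items() if n >= min_events}
```

For a closed 5 × 5 × 5 shell of protein cells, every one of the 27 interior points scores 7 and is reported as pocket; an empty grid yields no pocket points.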

The Pseudocentre Concept: Coding Molecular Recognition into Simple Descriptors
Pseudocentre types: donor, acceptor, aliphatic, pi/aromatic

Protein Cavity: 3D Property Description

Similarity Search

Similarity Search: Clique Detection (Bron–Kerbosch)
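
Clique detection finds the largest set of pseudocentre pairings that are mutually consistent in 3D. A sketch, assuming pseudocentres are given as (type, coordinate) pairs: build an association graph whose nodes pair same-type pseudocentres of the two cavities, join two nodes when their intra-cavity distances agree within a tolerance, then enumerate maximal cliques with the pivoting variant of Bron–Kerbosch. The 1.0 Å tolerance and the data layout are assumptions; CavBase's own implementation details are not given here.

```python
import itertools
import math

def association_graph(cav_a, cav_b, tol=1.0):
    """Nodes pair same-type pseudocentres; edges join geometrically consistent pairs."""
    nodes = [(i, j) for i, (ta, _) in enumerate(cav_a)
             for j, (tb, _) in enumerate(cav_b) if ta == tb]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i == k or j == l:
            continue  # a pseudocentre may be matched only once
        da = math.dist(cav_a[i][1], cav_a[k][1])
        db = math.dist(cav_b[j][1], cav_b[l][1])
        if abs(da - db) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def bron_kerbosch(r, p, x, adj, cliques):
    """Collect all maximal cliques (Bron–Kerbosch with pivoting)."""
    if not p and not x:
        cliques.append(r)
        return
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, cliques)
        p = p - {v}
        x = x | {v}

def best_match(cav_a, cav_b, tol=1.0):
    """Largest consistent set of (index_in_a, index_in_b) pseudocentre pairs."""
    adj = association_graph(cav_a, cav_b, tol)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return max(cliques, key=len, default=set())
```

The match is independent of where each cavity sits in space, which is why no superposition or sequence alignment is needed beforehand; the matched pairs can then drive a superposition and scoring step.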

Similarity Analysis: scoring based on matching pseudocentres and their associated surface patches

An Example: 1OXO/1F2D
Overlay of PLP ligands, with matching pseudocentres and surface patches shown

Crystal Packing: important, e.g., when docking ligands
Example: binding site of concanavalin A (1cjp) in Relibase+

1mtw, No Packing: reference ligand in green, first-rank solution atom-coloured

1mtw, Packing Included: reference ligand compared with GOLD's first-rank solutions obtained without packing and with neighbouring chains included
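
Including packing means generating symmetry- and lattice-related copies of the protein and keeping those close to the binding site. A sketch under several assumptions: an orthogonal cell (α = β = γ = 90°, so fractional-to-Cartesian conversion is a per-axis scale), symmetry operators supplied as Python functions on fractional coordinates with the identity first, and a 4 Å cutoff. General cells need the full fractional-to-Cartesian matrix.

```python
import itertools
import math

def frac_to_cart(f, cell):
    """Orthogonal cell only: scale each fractional coordinate by its edge length (Å)."""
    return tuple(fi * li for fi, li in zip(f, cell))

def packing_neighbours(ligand_frac, protein_frac, symops, cell, cutoff=4.0):
    """Protein atoms from symmetry/lattice copies lying within `cutoff` Å of the ligand."""
    ligand = [frac_to_cart(f, cell) for f in ligand_frac]
    hits = []
    for k, op in enumerate(symops):
        for shift in itertools.product((-1, 0, 1), repeat=3):
            if k == 0 and shift == (0, 0, 0):
                continue  # symops[0] is assumed to be the identity: skip the original copy
            for atom in protein_frac:
                moved = tuple(c + s for c, s in zip(op(atom), shift))
                xyz = frac_to_cart(moved, cell)
                if any(math.dist(xyz, lig) <= cutoff for lig in ligand):
                    hits.append((k, shift, xyz))
    return hits
```

For example, with a 10 Å cubic cell, a ligand atom at fractional x = 0.05 and a protein atom at x = 0.9 are 8.5 Å apart in the original copy, but the copy shifted by one cell along -x lies only 1.5 Å away; that neighbouring-chain contact is what a docking run without packing cannot see.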