Increasing the Value of Crystallographic Databases
  Derived knowledge bases
  Knowledge-based applications programs
  Data mining tools for protein-ligand complexes
Mogul
  Knowledge base of molecular geometry information taken from the CSD
  Bond length, valence angle and torsion angle distributions
  Aim: click on a molecular parameter of interest and get the observed distribution with no intervening steps
Mogul - Search Setup
  User loads a molecule, then specifies a bond length, bond angle or torsion angle of interest
Mogul - Results Substructure
Mogul - Search Algorithm
  Substructures stored in a hierarchical tree (fragment A-B-C-D):
  Properties of B, C
  Properties of A-B & C-D bonds
  Properties of atoms bound to B and C
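The tree lookup described on this slide can be sketched as a nested dictionary with one level per class of property. This is only an illustration: the dictionary layout, the field names (`central_atoms`, `outer_bonds`, `neighbours`) and the atom-type labels are assumptions made for the example, not Mogul's actual data structures.

```python
from collections import defaultdict


def make_tree():
    """Nested dict that creates child nodes on demand."""
    return defaultdict(make_tree)


def fragment_keys(fragment):
    """Return one key per tree level for a fragment A-B-C-D.

    Level 1: properties of the central atoms B and C.
    Level 2: properties of the A-B and C-D bonds.
    Level 3: properties of the atoms bound to B and C.
    """
    return (
        tuple(fragment["central_atoms"]),
        tuple(fragment["outer_bonds"]),
        tuple(fragment["neighbours"]),
    )


def store(tree, fragment, value):
    """File an observed geometric value under the fragment's key path."""
    node = tree
    for key in fragment_keys(fragment):
        node = node[key]
    node.setdefault("values", []).append(value)


def lookup(tree, fragment):
    """Walk the key path; return the stored values, or [] if the path is absent."""
    node = tree
    for key in fragment_keys(fragment):
        if key not in node:
            return []
        node = node[key]
    return node.get("values", [])


# Illustrative data only (hypothetical atom-type labels and torsion values).
tree = make_tree()
torsion = {
    "central_atoms": ("C.sp3", "C.sp3"),
    "outer_bonds": ("single", "single"),
    "neighbours": ("O", "N"),
}
store(tree, torsion, 62.5)
store(tree, torsion, 178.1)
print(lookup(tree, torsion))
```

Because each level narrows the candidates, a query reaches its distribution in a fixed number of dictionary lookups rather than by scanning every stored substructure.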
Mogul - Getting More Hits
  Allow certain atoms to be more general
  Generification rules
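One way to picture generification is as a mapping from specific elements to broader classes, applied to selected atoms of the query. The sketch below is a guess at the general idea only: the `GENERIC_CLASS` table and the tuple representation of a fragment are hypothetical, and Mogul's real rules are not given in these slides.

```python
# Hypothetical generification table: specific element -> more general class.
GENERIC_CLASS = {"F": "Hal", "Cl": "Hal", "Br": "Hal", "I": "Hal"}


def generify(fragment, positions):
    """Return a copy of `fragment` with the atoms at `positions` generalised."""
    return tuple(
        GENERIC_CLASS.get(atom, atom) if i in positions else atom
        for i, atom in enumerate(fragment)
    )


def matches(query, candidate):
    """True if every query atom equals the candidate atom or its generic class."""
    return len(query) == len(candidate) and all(
        q == c or q == GENERIC_CLASS.get(c) for q, c in zip(query, candidate)
    )


query = ("C", "C", "C", "Cl")
generic_query = generify(query, {3})  # generalise only the terminal atom

print(generic_query)
print(matches(query, ("C", "C", "C", "Br")))          # exact query: no match
print(matches(generic_query, ("C", "C", "C", "Br")))  # generic query: match
```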
Mogul - Generic Search Results
  Substructures sorted by 2D similarity with original query
IsoStar and SuperStar
  IsoStar: knowledge base of information about intermolecular interactions
  SuperStar: program for predicting binding points in an enzyme active site
  SuperStar predictions based solely on IsoStar data
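Sorting hits by 2D similarity can be illustrated with a Tanimoto coefficient over sets of structural features. The slides do not say which similarity measure Mogul uses, so the coefficient, the feature sets and the hit records below are all assumptions for the sake of the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query_features, hits):
    """Return hits sorted from most to least similar to the query."""
    return sorted(
        hits,
        key=lambda hit: tanimoto(query_features, hit["features"]),
        reverse=True,
    )


# Illustrative feature sets only.
query_features = {"a", "b", "c", "d"}
hits = [
    {"name": "hit1", "features": {"a", "b"}},            # similarity 0.5
    {"name": "hit2", "features": {"a", "b", "c", "d"}},  # similarity 1.0
    {"name": "hit3", "features": {"a", "x"}},            # similarity 0.2
]
ranked = rank_by_similarity(query_features, hits)
print([hit["name"] for hit in ranked])  # ['hit2', 'hit1', 'hit3']
```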
IsoStar Scatterplots
CSD vs. PDB scatterplots
  Similarity index distribution for 72 comparisons
IsoStar Density Surfaces
Scaling of IsoStar Surfaces
  Densities at grid point i are converted to propensities by:
  Average density is the density of contacts expected by random chance:
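The scaling step reads as dividing the density at each grid point by the density expected by chance, so that a propensity above 1 marks a favoured position. A minimal sketch follows. It assumes a regular grid with equal cell volumes and estimates the chance-expected density as total contacts over total grid volume. That estimate is an assumption, and the function name and inputs are hypothetical.

```python
def propensities(counts, cell_volume):
    """Convert per-grid-point contact counts to propensities.

    density_i    = counts[i] / cell_volume
    avg_density  = total contacts / total grid volume
                   (assumed estimate of the random-chance density)
    propensity_i = density_i / avg_density
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no contacts in grid")
    avg_density = total / (len(counts) * cell_volume)
    return [(count / cell_volume) / avg_density for count in counts]


# Illustrative counts only: four grid cells of 0.5 cubic Å each.
result = propensities([0, 2, 6, 0], cell_volume=0.5)
print(result)  # [0.0, 1.0, 3.0, 0.0]
```

A value of 1.0 means the contact is seen as often as chance predicts, and 3.0 means three times as often.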
SuperStar
  Calculate binding positions for specific probe atoms in protein active sites
  Identify functional groups in the binding site
  Look up relevant IsoStar scatterplots and overlay on functional groups
  Contour, combining by taking products
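Combining maps "by taking products" can be sketched as a point-by-point multiplication of the propensity maps contributed by each functional group. The sketch assumes every map is already sampled on the same grid, with 1.0 wherever a group contributes no information. The flat-list representation is a simplification for illustration.

```python
from math import prod


def combine_maps(maps):
    """Multiply propensities point by point across overlapping maps."""
    if not maps:
        raise ValueError("need at least one map")
    size = len(maps[0])
    if any(len(m) != size for m in maps):
        raise ValueError("maps must share the same grid")
    return [prod(values) for values in zip(*maps)]


# Illustrative propensities only: two groups, three shared grid points.
map_a = [1.0, 2.0, 0.5]
map_b = [3.0, 0.5, 2.0]
combined = combine_maps([map_a, map_b])
print(combined)  # [3.0, 1.0, 1.0]
```

Multiplication means a position favoured by one group but disfavoured by another ends up near neutral, as at the second and third grid points here.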
SuperStar - Example Map OH
SuperStar Features
  Cavity detection
  Surface or pharmacophore point display
  Metal coordination
  Hyperlinking to IsoStar scatterplots
  Choice of CSD- or PDB-based maps
  Gaussian fits
SuperStar Validation
  265 PDB complexes
  Generate four maps (Me, C=O, NH, OH)
  See whether maps discriminate correctly, e.g. does Me have the highest propensity where a ligand Me group is observed?
  Compute percentage success rate
  CSD 74%   PDB 75%   Gaussian CSD %
  PDB maps fuzzier, fewer probes possible
  Gaussian 4-5 times faster
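The success-rate test can be sketched as follows: for each observed ligand group, compare the propensities of the four probe maps at that position and count a success when the matching probe scores highest. The data layout and the numbers below are invented for illustration and are not taken from the validation set.

```python
def success_rate(cases):
    """Percentage of cases where the observed group's probe has the top propensity.

    Each case is (observed_probe, {probe: propensity at the observed position}).
    Ties go to whichever probe appears first in the dictionary.
    """
    if not cases:
        raise ValueError("no cases")
    hits = sum(
        1 for observed, scores in cases
        if max(scores, key=scores.get) == observed
    )
    return 100.0 * hits / len(cases)


# Illustrative propensities only.
cases = [
    ("Me",  {"Me": 3.1, "C=O": 0.8, "NH": 0.5, "OH": 0.9}),  # correct
    ("OH",  {"Me": 0.4, "C=O": 1.2, "NH": 2.0, "OH": 2.6}),  # correct
    ("NH",  {"Me": 0.7, "C=O": 0.6, "NH": 1.1, "OH": 1.9}),  # wrong: OH wins
    ("C=O", {"Me": 0.9, "C=O": 2.2, "NH": 0.3, "OH": 0.5}),  # correct
    ("Me",  {"Me": 1.0, "C=O": 1.4, "NH": 0.6, "OH": 0.8}),  # wrong: C=O wins
]
rate = success_rate(cases)
print(rate)  # 60.0
```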
Relibase+
  Protein-ligand database system
  Based on original software developed by Manfred Hendlich and colleagues at Merck and Marburg University
  Enables searching of the PDB and of in-house proprietary databases
Some Relibase+ Options
  Text searching
  Sequence searching
  2D substructure and similarity searching
  3D substructure searching
  Logical combination of hit lists
  Searching for intermolecular interactions
  Auto-superposition of similar binding sites
  Scripting facility based on Python
Analysis of 3D Queries
  Distance Distribution
  Torsion Distribution
  Benzamidine-Carboxylate Interactions
Binding Site Superposition
Example Python Script
    # Find all benzamidines
    # and check contacts to ASP under 3Å
    relibase.load('dbase1')
    ba = relibase.Hitlist({'smiles':'c1ccccc1C(=N)N'})
    new = relibase.Hitlist()
    for ligand in ba:
        for chain in ligand.contacts():
            for residue in chain.residues():
                if residue.name() == 'ASP':
                    ligatoms = ligand.atoms()
                    resatoms = residue.atoms()
                    d = mindist(ligatoms,resatoms)
                    if d < 3.0:
                        new.append(ligand)
    new.saveas('contact')
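The script calls `mindist` without showing it. A stand-alone sketch of such a helper is given here for illustration, assuming atoms are plain (x, y, z) coordinate tuples in Å. Relibase+ may supply its own function, and its atom objects may expose coordinates differently.

```python
from math import dist


def mindist(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom in atoms_a and any in atoms_b.

    Atoms are assumed to be (x, y, z) tuples in Å (an assumption for this
    sketch, not the Relibase+ atom type). Raises ValueError if either list
    is empty.
    """
    return min(dist(a, b) for a in atoms_a for b in atoms_b)


# Illustrative coordinates only.
ligatoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
resatoms = [(4.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
d = mindist(ligatoms, resatoms)
print(d)        # 2.0
print(d < 3.0)  # True, so this ligand would be kept by the 3 Å cut-off
```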
Acknowledgements
  Manfred Hendlich, Gerhard Klebe, Ingo Dramburg, Andreas Bergner, Ian Bruno, Jason Cole, Paul Edgington, Magnus Kessler, Jie Luo, Clare Macrae, Patrick McCabe, Willem Nissink, Jon Pearson, Scott Rowland, Barry Smith, Marcel Verdonk