ChemAxon's Chemical Fingerprints-Based Clustering to Assess AurSCOPE Databases' Chemical Diversity

2 The Aureus Pharma System: Knowledge Base, Integration Platform, Query Interface, Analysis/Display Applications

3 AurSCOPE Statistics: March 2006
- GPCR: 17 250 publications (including 3 525 patents); 635 000 activities; 152 300 ligands
- Ion Channel: 7 100 publications (including 2 685 patents); 217 600 activities; 58 400 ligands
- Kinase: 2 565 publications (including 1 069 patents); 163 700 activities; 51 800 ligands
- ADME/Drug-Drug Interactions: 6 530 publications; 179 000 activities; 9 100 parent compounds + metabolites
- hERG: 800 publications; 14 300 activities; 3 530 ligands

4 AurQUEST: Query Management Software for AurSCOPE
- Web-based application integrating ChemAxon technology
- Powerful query builder
  - Biological and chemical queries
  - Structural search using ChemAxon tools
- Efficient navigation
- Different export formats (SDF, RDF, …)

5 Data Preprocessing
From the AurSCOPE database to 2D unique structures in four steps (1-4), handling: counterions; MW > 700; inorganics (Inorg); NAS; stereo-duplicates; identical molecules but different salts; …
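The preprocessing steps can be illustrated with a minimal sketch. It assumes each database entry is already split into fragments carrying a precomputed standard InChIKey, heavy-atom count, molecular weight, and an inorganic flag; the `Fragment` record, the `preprocess` function, and the "no carbon means inorganic" rule are hypothetical, and the slide's "NAS" step is not modeled. It assumes "MW > 700" means such molecules are removed.

```python
from dataclasses import dataclass


# Hypothetical record layout: the slides do not describe AurSCOPE's storage format.
@dataclass(frozen=True)
class Fragment:
    inchikey: str     # standard InChIKey of the fragment (assumed precomputed)
    heavy_atoms: int  # heavy-atom count, used to choose the parent fragment
    mw: float         # molecular weight in Da
    inorganic: bool   # assumed flag, e.g. True when the fragment has no carbon


def preprocess(records):
    """Return one parent fragment per unique 2D structure, in input order."""
    seen = set()
    unique = []
    for fragments in records:
        # Strip counterions: keep the largest fragment as the parent.
        parent = max(fragments, key=lambda f: f.heavy_atoms)
        # Drop inorganic parents and parents heavier than 700 Da.
        if parent.inorganic or parent.mw > 700:
            continue
        # Collapse stereo-duplicates: the first 14 characters of an InChIKey
        # hash the connectivity layer only, so stereoisomers share them.
        # Salt variants of one parent collapse too, assuming the keys were
        # computed on the same (neutralized) parent structure.
        skeleton = parent.inchikey[:14]
        if skeleton in seen:
            continue
        seen.add(skeleton)
        unique.append(parent)
    return unique
```

Keying on the connectivity block rather than the full InChIKey is what makes this a 2D (stereo-agnostic) uniqueness test.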

6 AurSCOPE Ion Channels: Retrieving Active Molecules
- Protocols: binding or electrophysiology
- Target: all
- Target type: wild
- Parameter filter: Ki, EC50, IC50 < 300 nM
Result: 11519 molecules (*) (9897 unique)
(*) November 2005
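The query filters on this slide can be expressed as a simple predicate over activity records. This is a sketch only: the `Activity` record, its field names, and the `retrieve_actives` function are hypothetical and do not reflect the AurSCOPE schema or the AurQUEST API. It returns the number of matching records alongside the set of distinct molecules, since one molecule can have several qualifying measurements.

```python
from dataclasses import dataclass


# Hypothetical activity record; field names are illustrative only.
@dataclass(frozen=True)
class Activity:
    molecule_id: str
    protocol: str     # e.g. "binding", "electrophysiology", "functional"
    target_type: str  # e.g. "wild", "mutant"
    parameter: str    # e.g. "Ki", "EC50", "IC50", "Kd"
    value_nm: float   # measured value in nM


ALLOWED_PROTOCOLS = {"binding", "electrophysiology"}
ALLOWED_PARAMETERS = {"Ki", "EC50", "IC50"}
CUTOFF_NM = 300.0


def retrieve_actives(activities):
    """Apply the slide's filters; return (matching record count, distinct molecule ids)."""
    hits = [
        a for a in activities
        if a.protocol in ALLOWED_PROTOCOLS
        and a.target_type == "wild"
        and a.parameter in ALLOWED_PARAMETERS
        and a.value_nm < CUTOFF_NM  # strict "< 300 nM"
    ]
    return len(hits), {a.molecule_id for a in hits}
```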

7 AurSCOPE Ion Channels: Activity Distribution

8 Encoding Chemical Space and Clustering
- Standardization of molecules
- Generating chemical fingerprints (CF)
- Optimization of different CF parameters
- CF-based Jarvis-Patrick clustering with various adjusted parameters

9 Parameters for Generating Hashed Chemical Fingerprints
- Fingerprint length: the number of bits in the bit string. A longer fingerprint has more capacity for storing information about a molecule.
- Maximum pattern length: the maximum length (number of bonds) of the linear paths considered during fragmentation of the molecule. (The length of cyclic patterns is not limited.) Longer patterns, and more of them, hold more information about the molecule.
- Bits to be set for patterns: after a pattern is detected, some bits of the bit string are set to "1"; the number of bits used to code each pattern is constant. Setting more bits increases the information coded from each pattern.
- Darkness of the fingerprint: the percentage of "1" digits in the bit string. Fingerprints with more ones are considered "darker" than those with fewer ones.
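To make the three generation parameters and darkness concrete, here is a minimal sketch of a hashed linear-path fingerprint. It assumes a toy molecular graph given as atom symbols plus bonds (bond orders ignored), chooses bits with an MD5-seeded pseudo-random generator, and enumerates only linear paths, not the cyclic patterns mentioned above. The functions `linear_paths`, `hashed_fingerprint`, and `darkness` are hypothetical; ChemAxon's actual hashing scheme is not described in the slides.

```python
import hashlib
import random


def linear_paths(atoms, bonds, max_bonds):
    """Enumerate simple linear paths of 0..max_bonds bonds as canonical label strings."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    paths = set()

    def extend(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        paths.add(min(forward, backward))  # same pattern regardless of direction
        if len(path) - 1 == max_bonds:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only: no revisited atoms
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return paths


def hashed_fingerprint(atoms, bonds, length=2048, max_bonds=7, bits_per_pattern=5):
    """Set up to bits_per_pattern bits per pattern (bits from different draws may coincide)."""
    fp = [0] * length
    for pattern in linear_paths(atoms, bonds, max_bonds):
        # Deterministic seed derived from the pattern string.
        seed = int.from_bytes(hashlib.md5(pattern.encode()).digest()[:8], "big")
        rng = random.Random(seed)
        for _ in range(bits_per_pattern):
            fp[rng.randrange(length)] = 1
    return fp


def darkness(fp):
    """Percentage of bits set to 1."""
    return 100.0 * sum(fp) / len(fp)
```

For ethanol (`["C", "C", "O"]` with bonds `(0, 1)` and `(1, 2)`) the sketch finds five patterns: `C`, `O`, `C-C`, `C-O`, and `C-C-O`. Because every pattern sets a fixed number of bits into a fixed-length string, shorter fingerprints, longer paths, and more bits per pattern all push darkness up.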

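10 Chemical Fingerprints: Effect of Parameters
FP length (bits) | Max. #bonds | Max. #bits | Aver. darkness (%) | Max. darkness (%)
512 | 7 | 3 | 68.5 | 97.5
512 | 7 | 4 | 82.2 | 99.4
512 | 7 | 5 | 84.9 | 99.4
512 | 8 | 3 | 76.1 | 99.2
512 | 8 | 4 | 87.7 | 99.4
512 | 8 | 5 | 89.8 | 99.4
1024 | 7 | 3 | 46.1 | 83.3
1024 | 7 | 4 | 61.5 | 94.8
1024 | 7 | 5 | 65.5 | 95.9
1024 | 8 | 3 | 54.8 | 91.9
1024 | 8 | 4 | 70.2 | 98.5
1024 | 8 | 5 | 73.8 | 98.9
2048 | 7 | 3 | 26.8 | 58.6
2048 | 7 | 4 | 39.1 | 78.6
2048 | 7 | 5 | 42.4 | 81.6
2048 | 8 | 3 | 33.4 | 73.7
2048 | 8 | 4 | 47.5 | 89.6
2048 | 8 | 5 | 50.9 | 91.6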

11 CF-based Jarvis-Patrick Clustering
1. For each structure, collect the list of nearest neighbors whose dissimilarity (distance) is less than a threshold value T.
2. Two structures cluster together if (a) they are in each other's lists of nearest neighbors and (b) they have at least R min of their nearest neighbors in common, where R min is a ratio of the length of the shorter list.
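The two rules above can be sketched in a few lines. Assumptions, none of which the slides confirm: fingerprints are given as sets of on-bit indices, the distance is Tanimoto dissimilarity, each structure counts itself in its own neighbor list, and clusters are the connected components of the resulting pairwise links. The functions `tanimoto_distance` and `jarvis_patrick` are hypothetical; this is not JKlustor's code or API.

```python
def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def jarvis_patrick(fps, t, r_min):
    """Cluster fingerprints; return clusters as sorted lists of indices (singletons included)."""
    n = len(fps)
    # Rule 1: neighbor lists under threshold T (self is included, distance 0).
    neighbors = [
        {j for j in range(n) if tanimoto_distance(fps[i], fps[j]) < t}
        for i in range(n)
    ]
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in neighbors[i]:
            # Rule 2a: mutual neighbors (always true for a symmetric distance
            # threshold, kept to mirror the rule as stated).
            if j <= i or i not in neighbors[j]:
                continue
            # Rule 2b: enough shared neighbors relative to the shorter list.
            shared = len(neighbors[i] & neighbors[j])
            if shared >= r_min * min(len(neighbors[i]), len(neighbors[j])):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Raising T lengthens neighbor lists and links more structures, while raising R min makes links harder to form; clusters of size one are the singletons counted on the next slide.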

12 CF-based Jarvis-Patrick Clustering
Chemical fingerprint length in bits: 2048; maximum number of bonds in patterns: 7; maximum number of bits to set for each pattern: 5
T | R min | # Clusters | # Singletons
0.15 | 0.2 | 932 | 1663
0.15 | 0.3 | 938 | 1663
0.15 | 0.4 | 945 | 1663
0.15 | 0.5 | 977 | 1663
0.16 | 0.3 | 865 | 1499
0.16 | 0.5 | 910 | 1500
0.17 | 0.3 | 819 | 1372
0.17 | 0.5 | 860 | 1373
0.18 | 0.3 | 787 | 1238
0.18 | 0.5 | 826 | 1238
0.19 | 0.3 | 752 | 1140
0.19 | 0.5 | 780 | 1141
0.20 | 0.3 | 722 | 1051
0.20 | 0.5 | 752 | 1051

13 Similarity threshold = 0.85 (*)
(*) Martin Y.C. et al. Do structurally similar molecules have similar biological activity? J. Med. Chem. 2002, 45, 4350-4358.
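If the dissimilarity used for clustering is the Tanimoto distance (an assumption; the slides do not name the metric), the 0.85 similarity threshold maps directly onto the smallest T value tested. For fingerprints A and B with a and b bits set and c bits in common:

```latex
S_{AB} = \frac{c}{a + b - c}, \qquad D_{AB} = 1 - S_{AB}

S_{AB} \ge 0.85 \iff D_{AB} \le 1 - 0.85 = 0.15
```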

14 Most Populated Clusters

15 Jarvis-Patrick Clustering: Misclassifications?

16 Jarvis-Patrick Clustering: Diverse Singletons

17 Most Populated Clusters: Biological "Projection"
Targets labeled in the figure: gamma-aminobutyric acid A receptor; voltage-gated calcium channel; nicotinic acetylcholine receptor; gamma-aminobutyric acid A receptor; nicotinic acetylcholine receptor; gamma-aminobutyric acid A receptor

18 Targets labeled in the figure: potassium channel; gamma-aminobutyric acid A receptor; voltage-gated calcium channel; 5-HT3; nicotinic acetylcholine receptor; gamma-aminobutyric acid A receptor
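A biological "projection" of this kind amounts to summarizing the target annotations of each cluster's members. A minimal sketch, assuming each molecule carries a single target label and clusters are lists of molecule indices; `project_clusters` is a hypothetical helper, not part of any tool named in the slides:

```python
from collections import Counter


def project_clusters(clusters, targets):
    """Return (cluster size, dominant target, share of members with that target),
    most populated clusters first."""
    summary = []
    for members in clusters:
        counts = Counter(targets[m] for m in members)
        target, n = counts.most_common(1)[0]  # ties keep first-seen target
        summary.append((len(members), target, n / len(members)))
    summary.sort(key=lambda row: row[0], reverse=True)
    return summary
```

A share close to 1.0 means the chemically defined cluster is also biologically homogeneous; a low share flags clusters whose members hit different targets.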

19 Conclusions
- JKlustor integrates computationally rapid and efficient clustering tools.
- Shortcomings remain to be addressed in dealing with artificial singletons.
- Future work: combination with a Maximum Common Substructure approach (LibMCS); other algorithms (Ward, …).
