Chemical Similarity – An overview Dr. Nina Jeliazkova.

Slides:



Advertisements
Similar presentations
Complex Networks for Representation and Characterization of Images For CS790g Project Bingdong Li 9/23/2009.
Advertisements

Analysis of High-Throughput Screening Data C371 Fall 2004.
Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"
PROBABILISTIC ASSESSMENT OF THE QSAR APPLICATION DOMAIN Nina Jeliazkova 1, Joanna Jaworska 2 (1) IPP, Bulgarian Academy of Sciences, Sofia, Bulgaria (2)
Lipinski’s rule of five
ChE 553 Lecture 25 Theory Of Activation Barriers 1.
Faculty of Computer Science © 2006 CMPUT 605February 04, 2008 Novel Approaches for Small Bio-molecule Classification and Structural Similarity Search Karakoc.
Cheminformatics II Apr 2010 Postgrad course on Comp Chem Noel M. O’Boyle.
Quantitative Structure-Activity Relationships (QSAR) Comparative Molecular Field Analysis (CoMFA) Gijs Schaftenaar.
Bioinformatics IV Quantitative Structure-Activity Relationships (QSAR) and Comparative Molecular Field Analysis (CoMFA) Martin Ott.
Periodicity of Atomic Properties Elements in the same group have the same number of valence electrons and related electron configurations; hence have similar.
Quantitative Structure- Activity Relationships (QSAR)
Quantitative Structure-Activity Relationships (QSAR)  Attempts to identify and quantitate physicochemical properties of a drug in relation to its biological.
Lecture 7: Computer aided drug design: Statistical approach. Lecture 7: Computer aided drug design: Statistical approach. Chen Yu Zong Department of Computational.
1 Data mining of toxic chemicals & database-based toxicity prediction Jiansuo Wang & Luhua Lai Institute of Physical Chemistry, Peking University P. R.
Pharmacophore and FTrees
Molecular Modeling Part I Molecular Mechanics and Conformational Analysis ORG I Lab William Kelly.
Protein Tertiary Structure Prediction
Application of Toxicology Databases in Drug Development (Estimating potential toxicity) Joseph F. Contrera, Ph.D. Director, Regulatory Research and Analysis.
Molecular Descriptors
1. Ionic Compounds They are formed by the transfer of one or more valence electrons from one atom to another Electropositive atoms: give up electrons.
Ch 23 pages Lecture 15 – Molecular interactions.
Science and Technology of Nano Materials
Review of methods to assess a QSAR Applicability Domain Joanna Jaworska Procter & Gamble European Technical Center Brussels, Belgium and Nina Nikolova.
Similarity Methods C371 Fall 2004.
Leiden University. The university to discover. Enhancing Search Space Diversity in Multi-Objective Evolutionary Drug Molecule Design using Niching 1. Leiden.
Toxmatch - a tool to assess chemical similarity
Chem 1140; Molecular Modeling Molecular Mechanics Semiempirical QM Modeling CaCHE.
Ch 9 pages Lecture 23 – The Hydrogen Atom.
Copyright © 2005 Pearson Education, Inc. publishing as Benjamin Cummings Chapter 2 The Chemical Context of Life.
High Throughput Experimentation: Computational Requirements John M. Newsam Molecular Simulations Inc. (A Pharmacopeia subsidiary) “Workshop on Combinatorial.
“Topological Index Calculator” A JavaScript application to introduce quantitative structure-property relationships (QSPR) in undergraduate organic chemistry.
Développement "IN SILICO" de nouveaux extractants et complexants de métaux Alexandre Varnek Laboratoire d’Infochimie, Université Louis Pasteur, Strasbourg,
Use of Machine Learning in Chemoinformatics Irene Kouskoumvekaki Associate Professor December 12th, 2012 Biological Sequence Analysis course.
3D- QSAR. QSAR A QSAR is a mathematical relationship between a biological activity of a molecular system and its physicochemical parameters. QSAR attempts.
Identifying Applicability Domains for Quantitative Structure Property Relationships Mordechai Shacham a, Neima Brauner b Georgi St. Cholakov c and Roumiana.
Ligand-based drug discovery No a priori knowledge of the receptor What information can we get from a few active compounds.
AN APPROACH TO DETERMINE THE APPLICATION DOMAIN OF GROUP CONTRIBUTION MODELS Nina Jeliazkova 1 Joanna Jaworska 2, (2) Central Product Safety, Procter &
SAR vs QSAR or “is QSAR different from SAR”
Paola Gramatica, Elena Bonfanti, Manuela Pavan and Federica Consolaro QSAR Research Unit, Department of Structural and Functional Biology, University of.
Virtual Screening C371 Fall INTRODUCTION Virtual screening – Computational or in silico analog of biological screening –Score, rank, and/or filter.
Lecture 16 – Molecular interactions
Selecting Diverse Sets of Compounds C371 Fall 2004.
McKim Conference on Predictive Toxicology
Computer-aided drug discovery (CADD)/design methods have played a major role in the development of therapeutically important small molecules for several.
Quantum Mechanics/ Molecular Mechanics (QM/MM) Todd J. Martinez.
Atomic Structure The theories of atomic and molecular structure depend on quantum mechanics to describe atoms and molecules in mathematical terms.
Review Session BS123A/MB223 UC-Irvine Ray Luo, MBB, BS.
MUTAGENICITY OF AROMATIC AMINES: MODELLING, PREDICTION AND CLASSIFICATION BY MOLECULAR DESCRIPTORS M.Pavan and P.Gramatica QSAR Research Unit, Dept. of.
Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.
Use of Machine Learning in Chemoinformatics
CHAPTER 2.3 PROBABILITY DISTRIBUTIONS. 2.3 GAUSSIAN OR NORMAL ERROR DISTRIBUTION  The Gaussian distribution is an approximation to the binomial distribution.
Advanced Gene Selection Algorithms Designed for Microarray Datasets Limitation of current feature selection methods: –Ignores gene/gene interaction: single.
4. Molecular Similarity. 2 Similarity and Searching Historical Progression Similarity Measures Fingerprint Construction “Pathological” Cases MinMax- Counts.
Identification of structurally diverse Growth Hormone Secretagogue (GHS) agonists by virtual screening and structure-activity relationship analysis of.
Computational Approach for Combinatorial Library Design Journal club-1 Sushil Kumar Singh IBAB, Bangalore.
Physical Behavior of Matter Review. Matter is classified as a substance or a mixture of substances.
SMA5422: Special Topics in Biotechnology Lecture 11: Computer aided drug design: QSAR approach. SMA5422: Special Topics in Biotechnology Lecture 11: Computer.
So that k k E 5 = - E 2 = = x J = x J Therefore = E 5 - E 2 = x J Now so 631.
Lipinski’s rule of five
5 Linear Free-Energy Relationships.
In Search of the Optimal Set of Indicators when Classifying Histopathological Images Catalin Stoean University of Craiova, Romania
General Concepts in QSAR for Using the QSAR Application Toolbox
Daylight and Discovery
Virtual Screening.
High Throughput Experimentation: Computational Requirements
Current Status at BioChemtek
The Chemical Context of Life
Giovanni Settanni, Antonino Cattaneo, Paolo Carloni 
Describing a crystal to a computer: How to represent and predict material structure with machine learning Keith T Butler.
Presentation transcript:

Chemical Similarity – An overview Dr. Nina Jeliazkova

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)2 Similarity : philosophers’ view exploiting the similarity concept is a sign of immature science (Quine) “it is ill defined to say “A is similar to B” and it is only meaningful to say “A is similar to B with respect to C” A chemical “A” cannot be similar to a chemical “B” in absolute terms but only with respect to some measurable key feature

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)3 Similarity : chemists’ view Intuitively, based on expert judgment A chemist would describe “similar” compounds in terms of “approximately similar backbone and almost the same functional groups”. Chemists have different views on similarity Experience, context Lajiness et al. (2004). Assessment of the Consistency of Medicinal Chemists in Reviewing Sets of Compounds, J. Med. Chem., 47(20),

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)4 Chemical similarity Computerized similarity assessment needs unambiguous definitions Structurally similar molecules have similar biological activities –The basic tenet of chemical similarity –Long supporting experience Exceptions are important! –Many exceptions Exceptions are important! Avoiding information loss is important!Identification of the most informative representation of molecular structures Avoiding information loss is important! Similarity measures

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)5 Chemical similarity quantified Numerical representation of chemical structure –Structural similarity –Descriptor –based similarity –3D similarity –Field –based –Spectral –Quantum mechanics –More… Comparison between numerical representations –Distance-like –Association, –Correlation

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)6 Structural similarity Substructure searching Maximum Common Substructure Fragment approach –Atom, bond or ring counts, degree of connectivity –Atom-centred, bond-centred, ring-centred fragments –Fingerprints, molecular holograms, atom environments Topological descriptors –Hosoya’ Z, Wiener number, Randic index, indices on distance matrices of graph (Bonchev & Trinajstic), bonding connectivity indices (Basak), Balaban J indices, etc. –Initially designed to account for branching, linearity, presence of cycles and other topological features –Attempts to include 3D information (e.g. distance matrices instead of adjacency matrices) Molecular eigenvalues (BCUT)

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)7 Structural similarity Oral LD 50 for male rats = 2.5g/kg Dermal LD 50 for male rats = 3.54g/kg Not irritating to eyes of rabbits Slightly irritating to skin of rabbits Not mutagenic in Salmonella strainsNot mutagenic in Salmonella strains Higher potential binding affinity to the estrogen receptor than the nitrophenyl acetateHigher potential binding affinity to the estrogen receptor than the nitrophenyl acetate 3-(2-chloro-4-(trifluoromethyl)phenoxy-) phenyl acetate, CAS# (2-chloro-4-(trifluoromethyl)phenoxy)-2- nitrophenyl acetate, CAS# Higher potential to cause cancer than the phenyl acetateHigher potential to cause cancer than the phenyl acetate Walker. J. (2003),QSARs for pollution prevention, Toxicity Screening, Risk Assessment and Web Applications, SETAC Press Isosteric replacements of groups Substituents: F, Cl, Br, I, CF 3,NO 2 Methyl,Ethyl, Isoprpyl, Cyclopropyl, t- Butyl,-OH,-SH,-NH 2,-OMe,-N(Me) 2 Atoms and groups in rings: -CH=,-N= -CH 2 -,-NH-,-O-,-S- More … Depends on the endpoint! (e.g. lipophilicity, receptor binding, many nice examples in Kubinyi H. Chemical similarity and biological activities ) A single group makes difference … but …

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)8 Structural similarity Rosenkranz H.S., Cunningham A.R. (2001) Chemical Categories for Health Hazard Identification: A feasibility Study, Regulatory Toxicology and Pharmacology 33, Examined the reliability of using chemical categories to classify HPV chemicals as toxic or nontoxic “most often only a proportion of chemicals in a category were toxic”Found: “most often only a proportion of chemicals in a category were toxic” "traditional organic chemical categories do not encompass groups of chemical that are predominately either toxic or nontoxic across a number of toxicological endpoints or even for specific toxic activities”Conclusion: "traditional organic chemical categories do not encompass groups of chemical that are predominately either toxic or nontoxic across a number of toxicological endpoints or even for specific toxic activities” The bold portion of the chemical in the Category column defined the fragment used to query each data set. Abbreviations: EyI,eye irritation;LD50, rat LD50; Dev, developmental toxicity;CA, rodent carcinogenesis; Mnt, in vivo induction of micronuclei; Sal, Salmonella mutagenesis; MLA, mutagenesis in cultured mouse lymphoma cells.

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)9 3D Similarity Distance-based and angle-based descriptors (e.g. inter-atomic distance) Field similarity (not exhaustive list) –Comparative Molecular Field Analysis (CoMFA), CoMSIA –Electrostatic potential –Shape –Electron density –Test probe –Any grid-based structural property Molecular multi-pole moments (CoMMA) Shape descriptors (not exhaustive list) –van der Waals volume and surface (reflect the size of substituents) –Taft steric parameter –STERIMOL –Molecular Shape Analysis –4D QSAR –WHIM descriptors Receptor binding

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)10 Structurally similar compounds can have very different 3D properties Kubinyi, H., Chemical Similarity and Biological activity

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)11 Physicochemical properties Molecular weight Octanol - water partition coefficient Total energy Heat of formation Ionization potential Molar refractivity More…

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)12 Quantum chemistry approaches The wave function and the density function contain all the information of a system. –All the information about any molecule could be extracted from the electron density. Bond creation and bond breaking in chemical reactions, as well as the shape changes in conformational processes, are expressed by changes in the electronic density of molecules. The electronic density fully determines the nuclear distribution, hence the electronic density and its changes account for all the relevant chemical information about the molecule. –In principle, quantum-chemical theory should be able to provide precise quantitative descriptions of molecular structures and their chemical properties.

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)13 Quantum chemistry approaches Quantum chemical descriptors - characterize the reactivity, shape and binding properties of a complete molecule or molecular fragments and substituents: HOMO and LUMO energies, total energy, number of filled orbitals, standard deviation of partial atomic charges and electron densities, dipole moment, partial atomic charges Approaches from The Theory of Atom in Molecules – BCP space, TAE/RECON, MEDLA, QShAR (additive density fragments) Quantum chemistry calculations depend on several levels of approximation Computationally intensive

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)14 Reactivity Similarity between reactions Similarity of chemical structures assessed by generalized reaction types and by gross structural features. Two structures are considered similar if they can be converted by reactions belonging to the same predefined groups (for example oxidation or substitution reactions).

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)15 Similarity indices Association, correlation, distance coefficients Most popular : –Tanimoto distance (fingerprints) –Euclidean distance (descriptors) –Carbo index (fields) Essentially a classification problem has to be solved (decide if a query compound is closer to one or another set of compounds) –Many methods available (Discriminant Analysis, Neural networks, SVM, Bayesian classification, etc.) –Statistical assumptions and statistical error is involved

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)16 Similarity indices Association indices Correlation indices J. D. Holliday, C-Y. Hu† and P. Willett,(2002) Grouping of Coefficients for the Calculation of Inter- Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High Throughput Screening,5,

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)17 Fingerprint similarity Information loss – fragments presence and absence instead of counts Bit string saturation – within a large database almost all bits are set Can give nonintuitive results The average similarity appears to increase with the complexity of the query compound Larger queries are more discriminating (flatter curve, Tanimoto values spread wider) Smaller queries have sharp peak, unable to distinguish between molecules Flower D., On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998 The distribution of Tanimoto values found in database searches with a range of query molecules

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)18 Distance indices Euclidean distance City-block distance Mahalanobis distance Equidistant contours = Points on the equal distance from the query point Distances obey triangle inequality

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)19 Similarity in descriptor space Comparison between a point and groups of points is a classification problem. Euclidean distance performs very well if groups are separable (left). Other classification methods help in other cases.

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)20 What do we measure We compare numerical representations of chemical compounds –The numerical representation is not unique –The numerical representation includes only part of all the information about the compound –A distance measure reflects “closeness” only if the data holds specific assumptions

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)21 Example: Y. Martin et al ( 2002) Do structurally similar molecules have similar biological activity ? Set of 1645 chemicals with IC50s for monoamine oxidase inhibition Daylight fingerprints 1024 bits long ( 0-7 bonds) When using Tanimoto coefficient with a cut off value of 0.85 only 30 % of actives were detected J. Med. Chem. 2002,45, Cutoff values % of actives detected% False positives

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)22 Chemical similarity caveats The similarity computation may not correctly represent the intuitive similarity between two chemical structures –The properties of a chemical might not be implicit in its molecular structure –Molecular structure might not be fully measured and represented by a set of numbers (information loss) –Comparison by similarity indices may be counterintuitive Intuitively similar chemical structures may not have similar biological activity –Bioisosteric compounds –Structurally similar molecules may have different mechanisms of action

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)23 Similarity and Activity “Neighbourhood principle” Proximity with respect to descriptors does not necessary mean proximity with respect to the activity Depends on the relationship between descriptor and activity –True if a continuous & monotonous (e.g. linear,…) relationship holds between descriptors and activity –The linear relationship is only a special case, given the complexity of biochemical interactions. Its use should be justified in every specific case and/or used only locally Neighbourhood in the descriptor space Similar activity values

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)24 Similarity vs. Activity Black square: Salmonella mutagenicity of aromatic amines [Debnath et al. 1992] (log TA98) Red circle: Glende et al set: alkyl-substituted (ortho to the amino function) derivatives not included in original Debnath data set logP, E homo, E lumo Similar compounds, Relatively small data set

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)25 Similarity by atom environments vs. logP Syracuse Research KOWWin training set, 2400 compounds (diverse compounds, large data set)

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)26 Neighbourhood principle (Paterson plot) The differences between the descriptor values are plotted on X axis, while the differences between activity values are plotted on Y axis. For a good neighbourhood behaviour, the upper left triangle region should be empty (no large differences in activity for small differences in descriptors).

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)27 Neighbourhood principle (Paterson plot)

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)28 Molecular representation requirements Information preserving or allowing only controlled loss of information Feature selection –By domain knowledge (e.g. receptor binding, any knowledge of mechanism of action) –By verification of the « neighbourhood » assumption –By feature selection methods Examples: PCA, Entropy, Gini index, Kullback-Leibler distance, filter and wrapper methods Compounds should cluster tightly within a class and be far apart for different classes Combining different measures (consensus approach)

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)29 Structure is not the sole factor for biological activity Interactions with environment –Solvation effects –Metabolism –Time dependence –More... Biological activity in different species

ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)30 Conclusions Molecular similarity is relative Molecular representation and similarity index have to account for the underlying bio-chemistry Validation of the similarity formulation and its algorithmic solution is essential –“Neighbourhood” assumption has to be proved case by case “As understanding of the chemistry and biology of drug action improves and a greater ability to model the underlying mechanisms appears, the need for ‘similarity’ approaches will diminish.” Bender, A.; Glen, R. C. (2004) Molecular similarity: a key technique in molecular informatics. Org. Biomol. Chem., 2(22),

Thank you!

Nikolova N., Jaworska J., Approaches to Measure Chemical Similarity - a Review, QSAR Comb. Sci. 22 (2003) pp