Quantitative Structure-Activity Relationships (QSAR) Comparative Molecular Field Analysis (CoMFA) Gijs Schaftenaar.

Outline
Introduction
Structures and activities
Analysis techniques: Free-Wilson, Hansch
Regression techniques: PCA, PLS
Comparative Molecular Field Analysis

QSAR: The Setting
Quantitative structure-activity relationships are used when there is little or no receptor information, but measured activities are available for (many) compounds.

From Structure to Property: EC50

From Structure to Property: LD50

From Structure to Property

QSAR: Which Relationship?
Quantitative structure-activity relationships correlate chemical or biological activities with structural features, or with atomic, group, or molecular properties, within a range of structurally similar compounds.

Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the reaction constants of ligand-receptor complex formation:
$\Delta G_{\text{binding}} = -2.303\,RT \log K = -2.303\,RT \log (k_{\text{on}} / k_{\text{off}})$
$K$: equilibrium constant
$k_{\text{on}}$, $k_{\text{off}}$: rate constants of association and dissociation
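
As a small illustration of the relation above, the following Python sketch converts an association constant into a binding free energy. The rate constants and the temperature of 298.15 K are hypothetical values chosen only for the example; they do not come from the slides.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def association_constant(k_on, k_off):
    """Equilibrium (association) constant K = k_on / k_off."""
    return k_on / k_off


def binding_free_energy(k_assoc, temperature=298.15):
    """Free energy of binding in kJ/mol: dG = -2.303 * R * T * log10(K)."""
    return -2.303 * R_KJ * temperature * math.log10(k_assoc)


# Hypothetical rate constants: k_on in 1/(M*s), k_off in 1/s.
k = association_constant(k_on=1.0e6, k_off=1.0e-3)
dg = binding_free_energy(k)
print(f"K = {k:.1e} 1/M, dG_binding = {dg:.1f} kJ/mol")
```

With these numbers K is about 10^9 per molar, which corresponds to roughly -51 kJ/mol; each factor of ten in K shifts the free energy by about 5.7 kJ/mol at room temperature.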

Concentration as Activity Measure
A critical molar concentration C that produces the biological effect is related to the equilibrium constant K.
Usually log(1/C) is used (cf. pH).
For meaningful QSARs, activities need to be spread out over at least 3 log units.
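
The conversion to log(1/C) and the "3 log units" check can be sketched in a few lines of Python. The EC50 values below are invented for illustration, and the function names are not taken from any library.

```python
import math


def log_inverse_concentration(conc_micromolar):
    """Return log10(1/C) with C converted from micromolar to molar."""
    return -math.log10(conc_micromolar * 1e-6)


def activity_spread(activities):
    """Range covered by a set of log(1/C) values, in log units."""
    return max(activities) - min(activities)


# Hypothetical EC50 values in micromolar.
ec50_um = [0.05, 0.8, 12.0, 150.0]
activities = [log_inverse_concentration(c) for c in ec50_um]
spread = activity_spread(activities)

print([round(a, 2) for a in activities])
print(f"spread = {spread:.2f} log units, sufficient: {spread >= 3.0}")
```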

Free Energy of Binding
$\Delta G_{\text{binding}} = \Delta G_0 + \Delta G_{\text{hb}} + \Delta G_{\text{ionic}} + \Delta G_{\text{lipo}} + \Delta G_{\text{rot}}$
Term        Contribution                                 Energy (kJ/mol per unit feature)
ΔG_0        entropy loss (translational + rotational)    +5.4
ΔG_hb       ideal hydrogen bond                          -4.7
ΔG_ionic    ideal ionic interaction                      -8.3
ΔG_lipo     lipophilic contact                           -0.17
ΔG_rot      entropy loss (rotatable bonds)               +1.4
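
A minimal Python sketch of this additive estimate is given below. It assumes that each term simply scales with a count of the corresponding feature; the slide does not state the unit of lipophilic contact, and the feature counts in the example are hypothetical.

```python
# Per-feature contributions in kJ/mol, as listed on the slide.
DG_0 = 5.4        # entropy loss (translation + rotation), counted once
DG_HB = -4.7      # per ideal hydrogen bond
DG_IONIC = -8.3   # per ideal ionic interaction
DG_LIPO = -0.17   # per unit of lipophilic contact (unit not given in the source)
DG_ROT = 1.4      # per rotatable bond


def estimate_binding_free_energy(n_hbonds, n_ionic, lipo_contact, n_rot_bonds):
    """Sum the per-feature terms; ignores any geometry-dependent scaling."""
    return (DG_0
            + n_hbonds * DG_HB
            + n_ionic * DG_IONIC
            + lipo_contact * DG_LIPO
            + n_rot_bonds * DG_ROT)


# Hypothetical ligand: 2 H-bonds, 1 salt bridge, 100 contact units, 3 rotatable bonds.
dg_estimate = estimate_binding_free_energy(2, 1, 100.0, 3)
print(f"estimated dG_binding = {dg_estimate:.1f} kJ/mol")
```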

Molecules Are Not Numbers!
Where are the numbers? Numerical descriptors.

Basic Assumption in QSAR
The structural properties of a compound contribute in a linearly additive way to its biological activity, provided that transport and binding do not depend non-linearly on any of these properties.

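An Example: Capsaicin Analogs
Table columns: X, EC50 (µM), log(1/EC50)
Substituents X: H, Cl, NO2, CN, C6H5, NMe2, I, NHCHO
(The numeric entries are not preserved in this transcript; both values for NHCHO are marked "?".)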

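An Example: Capsaicin Analogs
Table columns: X, log(1/EC50), MR, π, σ, Es
Substituents X: H, Cl, NO2, CN, C6H5, NMe2, I, NHCHO (log(1/EC50) for NHCHO marked "?"; numeric entries not preserved in this transcript)
MR = molar refractivity (polarizability) parameter; π = hydrophobicity parameter; σ = electronic sigma constant (para position); Es = Taft size parameter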

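An Example: Capsaicin Analogs
$\log(1/\mathrm{EC}_{50}) = a_0 + a_1\,\mathrm{MR} + a_2\,\pi + a_3\,\sigma + a_4\,E_s$
(The fitted numeric coefficients are not preserved in this transcript; $a_0$ to $a_4$ stand in for them.)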
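
An equation of this form is obtained by multiple linear regression of the activities on the descriptor columns. The Python sketch below shows the mechanics with NumPy least squares. All data are synthetic (random descriptor values and made-up coefficients), since the capsaicin values are not available here, and the function names are illustrative only.

```python
import numpy as np


def fit_linear_qsar(descriptors, activity):
    """Least-squares fit of activity = a0 + sum_k a_k * descriptor_k."""
    design = np.column_stack([np.ones(len(descriptors)), descriptors])
    coef, *_ = np.linalg.lstsq(design, activity, rcond=None)
    return coef  # coef[0] is the intercept a0


def predict_activity(coef, descriptors):
    """Apply the fitted equation to one or more descriptor rows."""
    descriptors = np.atleast_2d(descriptors)
    return coef[0] + descriptors @ coef[1:]


# Synthetic stand-ins for the columns MR, pi, sigma, Es (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
assumed_coef = np.array([5.0, 0.5, 0.3, -0.4, -0.2])  # a0..a4, made up
y = assumed_coef[0] + X @ assumed_coef[1:] + rng.normal(scale=0.05, size=20)

coef = fit_linear_qsar(X, y)
print("fitted coefficients:", np.round(coef, 2))
print("prediction for a new compound:", predict_activity(coef, [0.1, 0.2, 0.0, -0.5]))
```

Once the coefficients are known, the activity of a compound with an unknown value (such as the NHCHO analog in the table) is predicted by inserting its descriptor values into the equation.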

First Approaches: The Early Days
Free-Wilson Analysis
Hansch Analysis

Free-Wilson Analysis
$\log(1/C) = \sum_i a_i x_i + \mu$
$x_i$: presence of group i (0 or 1)
$a_i$: activity contribution of group i
$\mu$: activity value of the unsubstituted compound
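
In practice the group contributions are found by regressing the activities on a 0/1 indicator matrix. The following Python sketch assumes three hypothetical substituent groups and noise-free synthetic activities, so the fit recovers the made-up contributions exactly; real data would of course contain noise.

```python
import itertools

import numpy as np


def free_wilson_fit(indicators, activity):
    """Fit log(1/C) = sum_i a_i * x_i + mu; return (a, mu)."""
    design = np.column_stack([indicators, np.ones(len(indicators))])
    solution, *_ = np.linalg.lstsq(design, activity, rcond=None)
    return solution[:-1], solution[-1]


# Every combination of three hypothetical groups being present (1) or absent (0).
x = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)

# Made-up group contributions and parent activity, used to generate the data.
assumed_a = np.array([0.5, -0.3, 1.0])
assumed_mu = 4.0
activity = x @ assumed_a + assumed_mu

a, mu = free_wilson_fit(x, activity)
print("group contributions:", np.round(a, 3), "unsubstituted:", round(float(mu), 3))
```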

Free-Wilson Analysis
+ Computationally straightforward
- Predictions only for substituents already included
- Requires a large number of compounds

Hansch Analysis
Drug transport and binding affinity depend nonlinearly on lipophilicity:
$\log(1/C) = a\,(\log P)^2 + b \log P + c\,\sigma + k$
$P$: n-octanol/water partition coefficient
$\sigma$: Hammett electronic parameter
$a$, $b$, $c$: regression coefficients
$k$: constant term
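
The Hansch equation is linear in its coefficients, so it can be fitted by ordinary least squares. When a is negative the parabola has a maximum at log P = -b/(2a), the lipophilicity optimum. The Python sketch below uses noise-free synthetic data with made-up coefficients and hypothetical sigma values.

```python
import numpy as np


def hansch_fit(log_p, sigma, activity):
    """Fit log(1/C) = a*(logP)^2 + b*logP + c*sigma + k; return (a, b, c, k)."""
    design = np.column_stack([log_p ** 2, log_p, sigma, np.ones_like(log_p)])
    coef, *_ = np.linalg.lstsq(design, activity, rcond=None)
    return tuple(coef)


def optimum_log_p(a, b):
    """Vertex of the parabola; only a maximum when a < 0."""
    if a >= 0:
        raise ValueError("no lipophilicity optimum unless a < 0")
    return -b / (2.0 * a)


# Synthetic data set: hypothetical logP and sigma values, made-up coefficients.
log_p = np.linspace(-1.0, 5.0, 10)
sigma = np.array([0.0, 0.23, 0.78, 0.66, -0.17, -0.27, 0.06, 0.45, -0.66, 0.54])
activity = -0.2 * log_p ** 2 + 1.0 * log_p + 0.5 * sigma + 3.0

a, b, c, k = hansch_fit(log_p, sigma, activity)
print(f"a={a:.2f} b={b:.2f} c={c:.2f} k={k:.2f}, optimum logP={optimum_log_p(a, b):.2f}")
```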

Hansch Analysis
+ Fewer regression coefficients needed for correlation
+ Interpretation in physicochemical terms
+ Predictions for other substituents possible

Molecular Descriptors
Simple counts of features, e.g. atoms, rings, H-bond donors; molecular weight
Physicochemical properties, e.g. polarisability, hydrophobicity (log P), water solubility
Group properties, e.g. Hammett and Taft constants, volume
2D fingerprints based on fragments
3D screens based on fragments

2D Fingerprints
Fragment keys: C, N, O, P, S, X, F, Cl, Br, I, Ph, CO, NH, OH, Me, Et, Py, CHO, SO, C=C, C≡C, C=N, Am, Im
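
A fragment-based 2D fingerprint is a bit vector with one position per fragment key. The Python sketch below assumes the fragments present in a molecule have already been identified by some substructure search, which is outside its scope; the key list follows the slide as far as it can be read, and the example fragment set is hypothetical.

```python
FRAGMENT_KEYS = [
    "C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph", "CO",
    "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "C#C", "C=N", "Am", "Im",
]  # "C#C" denotes the triple bond


def fingerprint(fragments_present):
    """Return a list of 0/1 bits, one per fragment key."""
    present = set(fragments_present)
    unknown = present - set(FRAGMENT_KEYS)
    if unknown:
        raise ValueError(f"unknown fragment keys: {sorted(unknown)}")
    return [1 if key in present else 0 for key in FRAGMENT_KEYS]


# Hypothetical fragment set for some molecule.
bits = fingerprint({"C", "N", "O", "Ph", "OH"})
print("".join(str(b) for b in bits))
```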

Regression Techniques
Principal Component Analysis (PCA)
Partial Least Squares (PLS)

Principal Component Analysis (PCA)
Many (>3) variables to describe objects means high dimensionality of the descriptor data.
PCA is used to reduce dimensionality.
PCA extracts the most important factors (principal components or PCs) from the data.
It is useful when correlations exist between descriptors.
The result is a new, small set of variables (PCs) which explain most of the data variation.

PCA – From 2D to 1D

PCA – From 3D to 3D-

Different Views on PCA
Statistically, PCA is a multivariate analysis technique closely related to eigenvector analysis.
In matrix terms, PCA is a decomposition of matrix X into two smaller matrices plus a set of residuals: $X = TP^{T} + R$.
Geometrically, PCA is a projection technique in which X is projected onto a subspace of reduced dimensions.
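
The matrix view can be made concrete with a singular value decomposition. The Python sketch below computes scores T, loadings P, and residuals R for a mean-centered data matrix; the centering step and the synthetic, deliberately correlated descriptor data are assumptions of the example, not details taken from the slides.

```python
import numpy as np


def pca(X, n_components):
    """Decompose mean-centered X as T @ P.T + R using the SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components].T                      # loadings (orthonormal columns)
    R = Xc - T @ P.T                             # residuals
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, R, explained


# Synthetic data: 5 correlated descriptors driven by 2 underlying factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + rng.normal(scale=0.05, size=(50, 5))

T, P, R, explained = pca(X, n_components=2)
print("explained variance fractions:", np.round(explained, 3))
```

Because the five descriptors are generated from two factors, two principal components capture nearly all of the variance and R contains little more than the added noise.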

Partial Least Squares (PLS)
$y_1 = a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + \dots + e_1$ (compound 1)
$y_2 = a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + \dots + e_2$ (compound 2)
$y_3 = a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + \dots + e_3$ (compound 3)
…
$y_n = a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + \dots + e_n$ (compound n)
In matrix form: $Y = XA + E$
X = independent variables
Y = dependent variables
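
One common way to obtain the coefficients A for a single activity column is the NIPALS variant of PLS (often called PLS1). The slides do not say which algorithm is used, so the Python sketch below should be read as one possible implementation on synthetic data, not as the method behind the original figures.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1; returns (a0, a) such that y is approximated by a0 + X @ a."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # scores of the current component
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X and y
        yk = yk - c * t
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W, P, q = np.array(weights).T, np.array(loadings).T, np.array(y_loadings)
    a = W @ np.linalg.solve(P.T @ W, q)
    return y_mean - x_mean @ a, a


# Synthetic data with made-up coefficients.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.1, size=30)

a0, a = pls1_fit(X, y, n_components=3)
print("intercept:", round(float(a0), 3), "coefficients:", np.round(a, 3))
```

With as many components as descriptors, as here, the result coincides with ordinary least squares; the practical benefit of PLS comes from using fewer components when descriptors are numerous and correlated.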

PLS – Cross-validation
Squared correlation coefficient R²: value between 0 and 1 (should be > 0.9), indicating the explanatory power of the regression equation.
With cross-validation, squared correlation coefficient Q²: value between 0 and 1 (should be > 0.5), indicating the predictive power of the regression equation.
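
The difference between R² and Q² can be shown with a leave-one-out loop. The Python sketch below assumes the usual definition Q² = 1 - PRESS/SS, with SS taken about the overall mean activity, and uses ordinary least squares rather than PLS as the regression step to keep it short; the data are synthetic. With this definition Q² can become negative for a model that predicts worse than the mean.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients with the intercept in position 0."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]


def r_squared(X, y):
    """Fraction of variance explained on the training data itself."""
    residuals = y - predict_ols(fit_ols(X, y), X)
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)


def q_squared_loo(X, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict_ols(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic descriptors and activities with made-up coefficients.
rng = np.random.default_rng(3)
X = rng.normal(size=(25, 3))
y = 2.0 + X @ np.array([1.0, -0.7, 0.4]) + rng.normal(scale=0.3, size=25)

r2, q2 = r_squared(X, y), q_squared_loo(X, y)
print(f"R2 = {r2:.3f}, Q2 (leave-one-out) = {q2:.3f}")
```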

PCA vs PLS
PCA: The principal components describe the variance in the independent variables (descriptors).
PLS: The principal components describe the variance in both the independent variables (descriptors) and the dependent variable (activity).

Comparative Molecular Field Analysis (CoMFA)
Set of chemically related compounds
Common substructure required
3D structures needed (e.g., Corina-generated)
Bioactive conformations of the active compounds are to be aligned

CoMFA Alignment

CoMFA Grid and Field Probe (Only one molecule shown for clarity)

Electrostatic Potential Contour Lines

CoMFA Model Derivation
Molecules are positioned in a regular grid according to the alignment.
Probes are used to determine the molecular field:
Van der Waals field (probe is neutral carbon): $E_{\text{vdw}} = \sum_i \left( A_i r_{ij}^{-12} - B_i r_{ij}^{-6} \right)$
Electrostatic field (probe is charged atom): $E_c = \sum_i q_i q_j / (D\, r_{ij})$
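
The field calculation can be sketched in Python as a sum over atoms i for each grid point j. The atom parameters, charges, grid spacing, and the small minimum distance used to avoid division by zero are all hypothetical choices for this example, and energies are left in arbitrary units because the formula above omits the Coulomb constant.

```python
import numpy as np


def regular_grid(lower, upper, spacing):
    """Grid points of a regular 3D lattice, shape (n_points, 3)."""
    axes = [np.arange(lo, hi + spacing / 2.0, spacing) for lo, hi in zip(lower, upper)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)


def comfa_fields(coords, A, B, charges, grid_points,
                 probe_charge=1.0, dielectric=1.0, min_dist=1e-3):
    """Return (E_vdw, E_c) at every grid point for one aligned molecule."""
    diff = grid_points[:, None, :] - coords[None, :, :]
    r = np.maximum(np.linalg.norm(diff, axis=2), min_dist)  # (n_grid, n_atoms)
    e_vdw = np.sum(A / r ** 12 - B / r ** 6, axis=1)
    e_c = np.sum(charges * probe_charge / (dielectric * r), axis=1)
    return e_vdw, e_c


# Hypothetical two-atom "molecule" with made-up parameters.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
A = np.array([1.0, 1.0])
B = np.array([2.0, 2.0])
charges = np.array([0.4, -0.4])

grid = regular_grid(lower=(-4.0, -4.0, -4.0), upper=(4.0, 4.0, 4.0), spacing=2.0)
e_vdw, e_c = comfa_fields(coords, A, B, charges, grid)
print(grid.shape, e_vdw.shape, e_c.shape)
```

In a full CoMFA study the field values of all aligned molecules at all grid points form the descriptor matrix X that is then related to the activities, typically with PLS.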

3D Contour Map for Electronegativity

CoMFA Pros and Cons
+ Suitable to describe receptor-ligand interactions
+ 3D visualization of important features
+ Good correlation within related set
+ Predictive power within scanned space
- Alignment is often difficult
- Training required