Published by Daisy Couzens. Modified over 10 years ago.
2
Milano Chemometrics and QSAR Research Group. Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro. Topics: chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control. Department of Environmental Sciences, University of Milano - Bicocca, P.za della Scienza, 1 - 20126 Milano (Italy). Website: michem.unimib.it/chm/
3
An introduction to molecular descriptors and QSAR. Roberto Todeschini, Milano Chemometrics and QSAR Research Group. Iran - February 2009
4
The chemical data. Synthesis: chemistry produces the objects of its own study. Chemical composition: a unifying concept for all the experimental sciences. Molecular structure: one of the most fruitful scientific concepts of this century.
5
Molecular structure. The concept of molecular structure is one of the richest of the last 140 years.
6
The basic assumptions are that different molecular structures have different chemical properties and similar molecular structures have similar molecular properties. Molecular structure congenericity principle
7
Each molecular representation represents a different way to look at the molecular structure and its chemical meaning is strongly immersed in the framework of the chemical theories. Molecular structure
8
Some historical notes
9
Some historical notes. 1874, Wilhelm KÖRNER: Studi sull’isomeria delle così dette sostanze aromatiche a sei atomi di carbonio [Studies on the isomerism of the so-called aromatic substances with six carbon atoms]. Gazzetta Chimica Italiana, vol. IV, p. 305.
10
Some historical notes. 1874, Wilhelm KÖRNER: to tell apart the different di-substituted benzenes that had been observed, he proposed classifying them as ortho-, meta-, and para-. These can be considered the first 3 molecular descriptors.
11
Some historical notes. 1964, Corwin HANSCH: based on these descriptors, 90 years later, Corwin Hansch proposed the first QSAR approach, with lipophilic, electronic and steric descriptors for ortho-, meta-, and para-substituents.
12
“The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.” R. Todeschini and V. Consonni Definition of molecular descriptor Molecular descriptors
13
3300 molecular descriptors Molecular descriptors
14
Molecular descriptors. [Figure: a composite creature; labels: lion forefeet, eagle hind legs, scorpion tail, dragon head, bull body, unicorn, snake neck]
15
size symmetry branching steric shape cyclicity hydrophobicity H - bonding electronic aspects reactivity Molecular descriptors
16
size symmetry branching steric shape cyclicity hydrophobicity H - bonding electronic aspects several meanings in just one number reactivity Molecular descriptors
18
Molecular descriptors. Derived from: graph theory, discrete mathematics, physical chemistry, information theory, quantum chemistry, organic chemistry, differential topology, algebraic topology. Processed by: statistics, chemometrics, chemoinformatics. Applied in: QSAR/QSPR, medicinal chemistry, pharmacology, genomics, drug design, toxicology, proteomics, analytical chemistry, environmetrics, virtual screening, library searching.
19
molecule physico - chemical properties biological activities molecular descriptors Molecular descriptors
20
Historical note: fragment approach. The biological activity of a molecule is the sum of its fragment properties: a common reference skeleton, with molecule properties gradually modified by substituents. Congenericity principle: QSAR strategies can be applied ONLY to classes of similar compounds.
21
Historical note: Hansch approach (Corwin Hansch, 1964). $\text{Biological response} = f_1(L) + f_2(E) + f_3(S) + f_4(M)$, where (1) $L$ = lipophilic properties, (2) $E$ = electronic properties, (3) $S$ = steric properties, (4) $M$ = other molecular properties.
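The additive scheme on this slide can be sketched in code. The snippet below is only an illustration, assuming that each of the first three terms is a simple linear function of a single descriptor and that the fourth term reduces to a constant; the function name, the parameter names and every coefficient value are hypothetical and do not come from the presentation.

```python
def hansch_response(log_p, sigma, steric, coeffs):
    """Linear additive scheme: response = a*L + b*E + c*S + d.

    `coeffs` is a dict with hypothetical keys 'a', 'b', 'c', 'd';
    in practice such values would be fitted to a congeneric series.
    """
    lipophilic_term = coeffs["a"] * log_p
    electronic_term = coeffs["b"] * sigma
    steric_term = coeffs["c"] * steric
    return lipophilic_term + electronic_term + steric_term + coeffs["d"]


# Made-up coefficients and descriptor values, for illustration only.
toy_coeffs = {"a": 0.5, "b": -1.0, "c": 0.25, "d": 2.0}
toy_response = hansch_response(log_p=2.0, sigma=0.5, steric=-1.0,
                               coeffs=toy_coeffs)
print(toy_response)
```

Because the terms are simply summed, the contribution of each property can be read off separately, which is the practical appeal of a linear additive scheme.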
22
Historical note: Hansch approach. 1. Congenericity approach. 2. Linear additive scheme. 3. Limited representation of global molecular properties. 4. No 3D and conformational information.
23
The role of the molecular descriptors. Physico-chemical properties: boiling point, melting point, dipole moment, molar refractivity, parachor, octanol/water partition coefficient, vapor pressure, density, solubility, ...
24
The role of the molecular descriptors. Biological activities: binding affinity, lethal dose, inhibition concentration, mutagenicity, carcinogenicity, ...
25
The role of the molecular descriptors. Environmental properties: biodegradation, bioconcentration, BOD, COD, half-life time, mobility, atmospheric persistence, ...
26
The role of the molecular descriptors. ... and more: conductivity, retention time, rheological behaviours, ...
27
Representations of a molecular structure. molecule (a real object) → molecular structure representation → molecular descriptors (numbers)
29
Representations of a molecular structure. [Figure: one molecule drawn in several representations: 0D - counts, 1D - fragment counts, 2D - topostructural, 2D - topochemical, 3D - geometrical]
30
Representations of a molecular structure: 4D. [Figure: probes; an interaction energy value at each point for each probe (steric, electronic, hydrophobic)]
31
0D: atom list → counting, summing. 1D: substructure list → counting (structural keys). 2D: molecular graph → graph invariants: topostructural descriptors, topochemical descriptors, topographic descriptors, topological information indices. 3D: molecular geometry (x, y, z coordinates) → geometrical descriptors, quantum-chemical descriptors, bulk descriptors, molecular surface descriptors. 4D: interaction energy values → grid-based QSAR techniques.
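As a small illustration of the 0D level ("atom list", "counting", "summing"), the sketch below counts atoms in a molecular formula and sums atomic weights to get a molecular weight. It assumes a flat formula string without brackets or charges; the function names and the choice of chlorobenzene as the example are illustrative and not taken from the slides, and the atomic weights are rounded standard values.

```python
import re

# Rounded standard atomic weights for the few elements used here.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C6H5Cl' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(counts):
    """Sum atomic weights over the atom list (a 0D 'summing' descriptor)."""
    return sum(ATOMIC_WEIGHT[symbol] * n for symbol, n in counts.items())


chlorobenzene = atom_counts("C6H5Cl")
print(chlorobenzene)                              # atom counts per element
print(round(molecular_weight(chlorobenzene), 3))  # molecular weight in g/mol
```

Descriptors of this kind need no information about connectivity or geometry, which is why isomers share the same values.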
32
Molecular graph → graph invariants. Topostructural descriptors: Wiener index, Hosoya Z index, Zagreb indices, Mohar indices, Randic connectivity index, Balaban distance connectivity index, Schultz molecular topological index, Kier shape descriptors, eigenvalues of the adjacency matrix, eigenvalues of the distance matrix, Kirchhoff number, detour index, topological charge indices, ... Topological information indices: total information content on ..., mean information content on ... Topochemical descriptors: Kier-Hall valence connectivity indices, Burden eigenvalues, BCUT descriptors, Kier alpha-modified shape descriptors, 2D autocorrelation descriptors, ... Topographic descriptors (molecular graph plus molecular geometry, x, y, z coordinates): 3D-Wiener index, 3D-Balaban index, D/D index, ...
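Three of the graph invariants named on this slide are simple enough to compute by hand, and a short sketch may make the idea of "a number from a molecular graph" concrete. The code below assumes a connected, hydrogen-depleted graph stored as an adjacency list with integer atom labels; the function names are illustrative. It uses the standard textbook definitions: the Wiener index as the sum of shortest-path distances over all atom pairs, the first Zagreb index as the sum of squared vertex degrees, and the Randic connectivity index as the sum over bonds of 1/sqrt(d_i * d_j).

```python
from collections import deque
from math import sqrt


def wiener_index(graph):
    """Sum of shortest-path (bond count) distances over all atom pairs."""
    total = 0
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            atom = queue.popleft()
            for neighbor in graph[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    return total // 2  # each pair was counted from both ends


def first_zagreb_index(graph):
    """Sum of squared vertex degrees."""
    return sum(len(neighbors) ** 2 for neighbors in graph.values())


def randic_index(graph):
    """Sum over bonds of 1 / sqrt(degree_i * degree_j)."""
    total = 0.0
    for atom, neighbors in graph.items():
        for neighbor in neighbors:
            if atom < neighbor:  # visit each bond once
                total += 1.0 / sqrt(len(graph[atom]) * len(graph[neighbor]))
    return total


# Hydrogen-depleted graphs as adjacency lists (atom labels are arbitrary).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

for name, graph in [("n-butane", n_butane), ("isobutane", isobutane)]:
    print(name, wiener_index(graph), first_zagreb_index(graph),
          round(randic_index(graph), 3))
```

The two isomers have the same atom counts but different values for all three indices, which shows how 2D descriptors capture branching that 0D counts cannot.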
33
Molecular geometry (x, y, z coordinates). Geometrical descriptors: gravitational indices, 3D-Morse descriptors, EVA descriptors, EEVA descriptors, WHIM descriptors, GETAWAY descriptors, ... Interaction energy values, grid-based QSAR techniques: CoMFA, GRID, G-WHIM descriptors, ... Volume descriptors: van der Waals volume, geometric volume, ... Quantum-chemical descriptors: charges, electronegativities, superdelocalizability, hardness, softness, E_LUMO, E_HOMO, ... Molecular surface descriptors: solvent-accessible surface area, CPSA descriptors, molecular shape analysis, Mezey 3D shape analysis, ...
34
Properties of a molecular descriptor. Several scientists are searching for new molecular descriptors able to capture new aspects of the molecular structure. This kind of research calls for creativity and imagination together with a solid theoretical basis, so that the resulting numbers carry some structural chemical meaning. "There are no restriction on the design of structural invariants, the limiting factor is one's own imagination." [1] [1] M. Randic (1996), Molecular bonding profiles, J. Math. Chem., 19, 375-392.
35
Properties of a molecular descriptor. A descriptor MUST have: invariance with respect to labeling and numbering of atoms; invariance with respect to roto-translation; an unambiguous, algorithmically computable definition; values in a suitable numerical range for the set of molecules to which it is applicable.
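The two invariance requirements can be checked numerically for a candidate descriptor. The sketch below uses the sum of all interatomic Euclidean distances (in the spirit of the 3D-Wiener index) as the descriptor and verifies that its value does not change when the molecule is rotated and translated, or when the atoms are renumbered. The coordinates are made up for illustration and do not represent a real geometry; the function names are hypothetical.

```python
import math
import random


def distance_sum(coords):
    """Sum of Euclidean distances over all atom pairs."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += math.dist(coords[i], coords[j])
    return total


def rotate_z_and_translate(coords, angle, shift):
    """Rotate every point about the z axis by `angle` (radians), then translate."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    moved = []
    for x, y, z in coords:
        moved.append((cos_a * x - sin_a * y + shift[0],
                      sin_a * x + cos_a * y + shift[1],
                      z + shift[2]))
    return moved


# Made-up coordinates for four atoms; not a real optimized geometry.
toy_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (3.5, 1.4, 0.5)]

reference = distance_sum(toy_coords)

# Same molecule after a rigid motion (roto-translation).
moved = rotate_z_and_translate(toy_coords, angle=0.7, shift=(4.0, -2.0, 1.0))

# Same molecule with the atoms renumbered.
relabeled = toy_coords[:]
random.Random(0).shuffle(relabeled)

print(math.isclose(reference, distance_sum(moved)))
print(math.isclose(reference, distance_sum(relabeled)))
```

A quantity such as "the x coordinate of atom 1" would fail both checks, which is why raw coordinates are not used directly as descriptors.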
36
Properties of a molecular descriptor. A descriptor should have: a structural interpretation; a good correlation with at least one property; no trivial correlation with other molecular descriptors; gradual change in its values with gradual changes in the molecular structure; no experimental properties in its definition; applicability not restricted to too small a class of molecular structures; preferably, some discrimination power among isomers; preferably, no other molecular descriptors trivially included in its definition; preferably, reversible decoding (back from the descriptor value to the structure).
37
QSAR strategy. Models: regression models (quantitative response); classification models (qualitative response); ranking models (ordered response).
38
QSAR strategy - Regression
39
QSAR strategy - Classification
40
QSAR strategy - Ranking
41
QSAR strategy. Set of molecules → training set (molecular descriptors, experimental responses) → fitting → MODEL, SRC (QSAR, QSPR, ...). Test set (molecular descriptors, experimental responses) → prediction power. New molecules → molecular descriptors → MODEL → predicted new responses. Reversible decoding.
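The workflow on this slide (fit on a training set, estimate prediction power on a test set, then predict responses for new molecules) can be sketched with a one-descriptor regression model. Everything below is an illustrative assumption: the data are synthetic toy numbers rather than measurements, the model is ordinary least squares on a single descriptor, and the split simply alternates molecules between the two sets.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    tss = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - rss / tss


# Synthetic toy data (descriptor value, response); not experimental measurements.
molecules = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
             (5.0, 10.1), (6.0, 12.0), (7.0, 13.8), (8.0, 16.1)]

training_set = molecules[0::2]  # molecules used for fitting
test_set = molecules[1::2]      # molecules held out to estimate prediction power

slope, intercept = fit_line([d for d, _ in training_set],
                            [r for _, r in training_set])


def predict(descriptor):
    """Predicted response from the fitted model."""
    return slope * descriptor + intercept


fit_r2 = r_squared([r for _, r in training_set],
                   [predict(d) for d, _ in training_set])
test_r2 = r_squared([r for _, r in test_set],
                    [predict(d) for d, _ in test_set])
print(round(fit_r2, 3), round(test_r2, 3))

# Predicted response for a new molecule whose descriptor value is 9.0.
print(round(predict(9.0), 2))
```

The test-set statistic is the one that speaks to prediction power, because those molecules played no part in estimating the slope and intercept.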
42
QSAR strategy. The true interest is in the predictive power of the model. Model validation. Chemometrics.
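One common way to estimate predictive power without a separate test set is leave-one-out cross-validation: each molecule is predicted by a model fitted on all the others, and the squared prediction errors are compared with the total variance of the response. The slide does not name a specific validation technique, so this choice is an assumption made for illustration; the data are synthetic toy numbers and the helper names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_all(x_fit, y_fit, x_new):
    """Fit on (x_fit, y_fit) and predict the responses for x_new."""
    slope, intercept = fit_line(x_fit, y_fit)
    return [slope * xi + intercept for xi in x_new]


def explained_fraction(y, y_pred):
    """1 - (sum of squared errors / total sum of squares)."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / tss


# Synthetic toy data; not experimental measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.1]

# Fitting power: every molecule is predicted by a model that has seen it.
r2 = explained_fraction(y, predict_all(x, y, x))

# Predictive power: every molecule is predicted by a model fitted without it.
loo_predictions = [
    predict_all(x[:i] + x[i + 1:], y[:i] + y[i + 1:], [x[i]])[0]
    for i in range(len(x))
]
q2 = explained_fraction(y, loo_predictions)

print(round(r2, 4), round(q2, 4))
```

For least-squares models the cross-validated value cannot exceed the fitted one, so a large gap between the two is a warning sign of overfitting.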
43
… towards conclusions …
44
FAQ - Frequently Asked Questions. 1. What is the meaning of that descriptor? 2. Why are there some models with the same prediction power but different molecular descriptors? 3. Why use a huge number of molecular descriptors?
45
FGA - our Frequently Given Answers. 1. What is the meaning of that descriptor? A molecular descriptor is a number extracted by a well defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that often our difficulties to attribute a meaning to this number ultimately flow from the lacking of deeper chemical theories and higher level languages and not from exoteric approaches to the descriptor definition. R. Todeschini and V. Consonni
46
FGA - our Frequently Given Answers. 2. Why are there some models with the same prediction power but different molecular descriptors? Molecular descriptors are often intercorrelated, so different molecular descriptors can stand in for one another in a model. "Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view." Hans Primas
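Intercorrelation is easy to see on a small example. The sketch below compares two different descriptors for the linear alkanes from ethane to octane: the number of carbon atoms and the Wiener index, which for an unbranched chain of n atoms equals (n^3 - n) / 6. The choice of this series and of these two descriptors is an assumption made for illustration, not something stated on the slide.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Linear alkanes from ethane (2 carbons) to octane (8 carbons).
carbon_count = list(range(2, 9))

# Wiener index of an unbranched chain of n atoms: (n**3 - n) / 6.
wiener = [(n ** 3 - n) // 6 for n in carbon_count]

r = pearson_r(carbon_count, wiener)
print(wiener)
print(round(r, 3))
```

With a correlation this high inside the series, a model built on either descriptor would describe much of the same variation, which is one way two models with different descriptors can reach similar prediction power.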
47
FGA - our Frequently Given Answers. 3. Why use a huge number of molecular descriptors? Complexity is not an intrinsic property of systems, but rather arises from the number of ways in which we are able (or desire) to interact with a system. A molecule is undoubtedly a complex system.
48
www.moleculardescriptors.eu
49
THANK YOU. Milano Chemometrics and QSAR Research Group. Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro. Topics: chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control. Department of Environmental Sciences, University of Milano - Bicocca, P.za della Scienza, 1 - 20126 Milano (Italy). Website: michem.disat.unimib.it/chm/
51
coffee break
52
... since December 2006: www.moleculardescriptors.eu, with news, software, books, tutorials and a forum
54
FGA - our Frequently Given Answers. 4. Is a model explaining the known facts of a system better than a model predicting the future events of that system? Fitting versus prediction. Don’t forget your goal! An understanding of the behavior of a system does not always coincide with the prediction of the system’s future behavior!
55
QSAR strategy - Regression
56
"GENTLEMEN, One might ask what is the most profitable way to draw from a hypothesis the greatest benefit for the development of a given doctrine. To many it may perhaps seem advisable in this regard to proceed with great caution, so as not to introduce into science hypothetical conceptions that are too bold and that later turn out not to agree with the facts. I believe instead that the progress of science has been held back by excessive caution rather than by excessive boldness. In science, as in matters of love, one must know when to dare: to dare at once and to go all the way; later complaints and regrets are of no use." Giacomo Ciamician. From the opening address on the scientific work of Wilhelm KÖRNER, Milano, 15 May 1910.
57
Fragment approach. The biological activity of a molecule is the sum of its fragment properties. Congeneric molecules, i.e. a common reference skeleton. Substituent properties.
58
Fragment approach. Parametric approach (Hammett-Hansch, 1964); Group approach (Free-Wilson and Fujita-Ban, 1976); DARC-PELCO approach (Dubois, 1966); Sterimol approach (Verloop, 1976).
59
Hansch approach: Hansch molecular descriptors. Lipophilic properties: partition coefficients (logP, logKow), chromatographic parameters (Rf, RT), solubility, ... Electronic properties: Hammett constants, molar refraction, dipole moment, HOMO, LUMO, ionization potential, ... Steric properties: molecular weight, VDW volume, molar volume, surface area, ...
60
The role of the molecular descriptors
61
Introduction
62
Conclusions. A molecular descriptor is a number extracted by a well defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that often our difficulties to attribute a meaning to this number ultimately flow from the lacking of deeper chemical theories and higher level languages and not from exoteric approaches to the descriptor definition. R. Todeschini and V. Consonni
63
Properties of a molecular descriptor
64
Conclusions. "Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view." Hans Primas
65
X
66
molecule physico - chemical properties biological activities molecular descriptors
67
Representations of a molecular structure. [Figure: one molecule shown in 0D, 1D, 2D and 3D representations]
68
Just a question … molecular structure?
69
Some historical notes. “... : although relationships between their chemical constitution (composition and structure) and their physical properties can certainly already be glimpsed, the number of facts is still by far too small to draw from them conclusions that, beyond the character of a simple hypothesis, could also claim that of probability. In any case such relationships are not of so simple a nature as one might perhaps have expected a priori. Certainly the physical properties of bodies are in the first place a function of their composition and structure, about whose form nothing is yet known; a function that is probably very complex, and whose study will require an unforeseeable number of facts in order to narrow sufficiently the range of possible representations.”