Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Presentation transcript:

Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro
chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control
Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences, University of Milano - Bicocca, P.za della Scienza, Milano (Italy)
Website: michem.unimib.it/chm/

Molecular descriptors: constitutional descriptors and graph invariants
Roberto Todeschini, Milano Chemometrics and QSAR Research Group
Iran - February 2009

Content
- Counting descriptors
- Empirical descriptors
- Fragment descriptors
- Molecular graphs
- Topological descriptors

Counting descriptors
Each descriptor represents the number of elements of some defined chemical quantity. For example:
- the number of atoms or bonds
- the number of carbon or chlorine atoms
- the number of OH or C=O functional groups
- the number of benzene rings
- the number of defined molecular fragments

Counting descriptors
The sum of an atomic or bond property, as well as its average, is also considered a count descriptor. For example:
- molecular weight and average molecular weight
- sum of the atomic electronegativities
- sum of the atomic polarizabilities
- sum of the bond orders
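The counts and property sums listed above can be illustrated with a minimal sketch. The function name `counting_descriptors`, the input layout (a list of element symbols plus a list of bonds) and the rounded atomic masses are assumptions made for this example, not part of the original slides.

```python
from collections import Counter

# Rounded standard atomic masses; only the elements used below (assumption).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}

def counting_descriptors(atoms, bonds):
    """Return a few count descriptors.

    atoms: list of element symbols, one per atom.
    bonds: list of (i, j, order) tuples with 0-based atom indices.
    """
    element_counts = Counter(atoms)
    mw = sum(ATOMIC_MASS[a] for a in atoms)
    return {
        "n_atoms": len(atoms),
        "n_bonds": len(bonds),
        "n_C": element_counts["C"],
        "n_O": element_counts["O"],
        "molecular_weight": mw,
        "average_molecular_weight": mw / len(atoms),
        "sum_bond_orders": sum(order for _, _, order in bonds),
    }

# Ethanol, CH3-CH2-OH, with explicit hydrogens.
ethanol_atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ethanol_bonds = [(0, 1, 1), (1, 2, 1), (0, 3, 1), (0, 4, 1), (0, 5, 1),
                 (1, 6, 1), (1, 7, 1), (2, 8, 1)]
ethanol = counting_descriptors(ethanol_atoms, ethanol_bonds)
print(ethanol)
```

Many different molecules share the same counts, which is the high degeneracy that the slides attribute to this descriptor class.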

Counting descriptors
A counting descriptor n is a semi-positive (non-negative) variable, i.e. n ≥ 0. Its statistical distribution is usually a Poisson distribution.
Main characteristics:
- simple
- the most used
- local information
- high degeneracy
- discriminant modelling power

Empirical descriptors Descriptors based on specific structural aspects present in sets of congeneric compounds and usually not applicable (or giving a single default value) to compounds of different classes.

Empirical descriptors: Index of Taillander (Taillander et al., 1983)
It is a descriptor dedicated to the modelling of benzene rings and is defined as the sum of the six lengths joining the adjacent substituent groups.
[Figure: benzene ring with substituents H, H, H, H, CH3 and Cl]

Empirical descriptors: Hydrophilicity index (Hy) (Todeschini et al., 1999)
It is a descriptor dedicated to the modelling of hydrophilicity and is based on a function of the counts of hydrophilic groups (OH-, SH-, NH-, ...) and carbon atoms.
- nHy: number of hydrophilic groups
- nC: number of carbon atoms
- n: total number of non-hydrogen atoms
Range: -1 ≤ Hy ≤ 3.64

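Empirical descriptors: hydrophilicity index (Hy) of example compounds
Table columns: Compound, nHy, nC, n, Hy (the numerical entries did not survive extraction).
Compounds, in the order listed: hydrogen peroxide, carbonic acid, water, butanetetraol, propanetriol, ethanediol, methanol, ethanol, decanediol, propanol, butanol, pentanol, methane, the case nHy = 0 and nC = 0 (Hy = 0.00), decanol, ethane, pentane, decane, alkane with nC = ...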

Fragment approach
- Parametric approach (Hammett – Hansch, 1964)
- Substituent approach (Free-Wilson, Fujita-Ban, 1976)
- DARC-PELCO approach (Dubois, 1966)
- Sterimol approach (Verloop, 1976)

Fragment approach
The biological activity of a molecule is the sum of its fragment properties:
- common reference skeleton
- molecule properties gradually modified by substituents
Congenericity principle: QSAR strategies can be applied ONLY to classes of similar compounds.

Hansch approach (Corwin Hansch, 1964)
$\text{Biological response} = f_1(L) + f_2(E) + f_3(S) + f_4(M)$
- L: lipophilic properties
- E: electronic properties
- S: steric properties
- M: other molecular properties

Hansch approach
1. Congenericity approach
2. Linear additive scheme
3. Limited representation of global molecular properties
4. No 3D and conformational information

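Free-Wilson approach (Free-Wilson, 1964)
[Figure: reference skeleton with substitution positions 1 and 2; the substituents F, Br and I can occupy Pos. 1 and Pos. 2]
$I_{ks}$: absence/presence of the k-th substituent in the s-th site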
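The indicator variables of the Free-Wilson scheme can be sketched as follows. This is only an illustration of the encoding step: the function name `free_wilson_indicators`, the example molecules and the use of `None` for an unsubstituted site are assumptions, and the regression that would estimate the substituent contributions is not shown.

```python
def free_wilson_indicators(molecule, substituents, n_sites):
    """Encode one molecule as a flat list of 0/1 indicators.

    molecule: tuple giving the substituent at each site (None if unsubstituted).
    The output is ordered site by site, and within a site by substituent.
    """
    row = []
    for s in range(n_sites):
        for k in substituents:
            row.append(1 if molecule[s] == k else 0)
    return row

SUBSTITUENTS = ["F", "Br", "I"]          # candidates at every site
molecules = [("F", "Br"), ("I", "I"), ("Br", None)]  # hypothetical compounds

design = [free_wilson_indicators(m, SUBSTITUENTS, 2) for m in molecules]
for m, row in zip(molecules, design):
    print(m, row)
```

Each row would become one line of the design matrix of a linear additive model, with one coefficient per substituent and site.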

Fragment approach: fingerprints
A fingerprint is a binary vector in which each position encodes the presence or absence of a fragment. Typical use: similarity searching.
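A small sketch of fingerprint-based similarity searching is given below. The slides do not name a similarity measure; the Tanimoto coefficient is used here as an assumption because it is a common choice for binary vectors. The fragment keys and the helper names are hypothetical.

```python
def fingerprint(fragments_present, fragment_keys):
    """Binary vector: 1 if the fragment is present, 0 if absent."""
    return [1 if key in fragments_present else 0 for key in fragment_keys]

def tanimoto(fp_a, fp_b):
    """Shared 'on' bits divided by bits that are 'on' in either vector."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0

# Hypothetical fragment dictionary and molecules.
KEYS = ["OH", "C=O", "benzene ring", "Cl", "NH2"]
phenol = fingerprint({"OH", "benzene ring"}, KEYS)
chlorobenzene = fingerprint({"Cl", "benzene ring"}, KEYS)
benzoic_acid = fingerprint({"OH", "C=O", "benzene ring"}, KEYS)

print(tanimoto(phenol, chlorobenzene))
print(tanimoto(phenol, benzoic_acid))
```

Ranking a database by this score against a query fingerprint is one simple way to perform the similarity search mentioned on the slide.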

Molecular graph

Molecular graph
A mathematical object defined as G = (V, E), where the set V contains the vertices (atoms) and the set E contains the edges (bonds).

Molecular graph
Usually hydrogen atoms are not considered in the molecular graph; the result is the H-depleted molecular graph.
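As a minimal sketch, an H-depleted graph can be derived from a full atom and bond list by dropping the hydrogen vertices and every edge that touches them. The function name `hydrogen_depleted` and the input layout are assumptions made for this example.

```python
def hydrogen_depleted(atoms, bonds):
    """Return (vertices, edges) of the H-depleted molecular graph.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    Vertices are renumbered from 0 after the hydrogens are removed.
    """
    keep = [i for i, element in enumerate(atoms) if element != "H"]
    new_index = {old: new for new, old in enumerate(keep)}
    vertices = [atoms[i] for i in keep]
    edges = [(new_index[i], new_index[j])
             for i, j in bonds
             if i in new_index and j in new_index]
    return vertices, edges

# Ethanol with explicit hydrogens.
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]
vertices, edges = hydrogen_depleted(atoms, bonds)
print(vertices, edges)
```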

Molecular graph
A walk in G is a sequence of vertices $w = (v_1, v_2, v_3, \ldots, v_k)$ such that $\{v_j, v_{j+1}\} \in E$. The length of a walk is the number of edges traversed by the walk.
A path in G is a walk without any repeated vertices. The length of a path $(v_1, v_2, v_3, \ldots, v_{k+1})$ is k.
Examples:
- $v_1 v_2 v_3 v_2 v_5$: walk of length 4
- $v_1 v_2 v_3 v_4 v_5$: path of length 4
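The two definitions can be checked mechanically. In the sketch below the edge list is an assumption: the figure is not available, so the graph (a four-membered ring v2-v3-v4-v5 with v1 attached to v2) was inferred from the walk, path and cycle examples quoted in the slides. The helper names are hypothetical.

```python
# Assumed example graph, inferred from the slide examples.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 2)]
EDGE_SET = {frozenset(edge) for edge in EDGES}

def is_walk(seq):
    """Every pair of consecutive vertices must be joined by an edge."""
    return all(frozenset((a, b)) in EDGE_SET for a, b in zip(seq, seq[1:]))

def is_path(seq):
    """A path is a walk without repeated vertices."""
    return is_walk(seq) and len(set(seq)) == len(seq)

def length(seq):
    """Number of edges traversed."""
    return len(seq) - 1

print(is_walk([1, 2, 3, 2, 5]), is_path([1, 2, 3, 2, 5]), length([1, 2, 3, 2, 5]))
print(is_walk([1, 2, 3, 4, 5]), is_path([1, 2, 3, 4, 5]), length([1, 2, 3, 4, 5]))
```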

Molecular graph
The topological distance $d_{ij}$ is the length of the shortest path between the vertices $v_i$ and $v_j$. Example: $d_{15} = 2$.
The detour distance $\Delta_{ij}$ is the length of the longest path between the vertices $v_i$ and $v_j$. Example: $\Delta_{15} = 4$.
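Both distances can be computed on a small graph with a breadth-first search (shortest path) and an exhaustive depth-first enumeration (longest path). This is a sketch under assumptions: the edge list is inferred from the slide examples because the figure is missing, the function names are hypothetical, and the exhaustive search is only practical for small graphs.

```python
from collections import deque

def adjacency(edges):
    """Build an undirected adjacency mapping from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def topological_distance(adj, start, goal):
    """Length of the shortest path, found by breadth-first search."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            return seen[v]
        for w in adj[v]:
            if w not in seen:
                seen[w] = seen[v] + 1
                queue.append(w)
    return None  # goal not reachable

def detour_distance(adj, start, goal):
    """Length of the longest path, found by trying every simple path."""
    best = None

    def extend(v, visited, length):
        nonlocal best
        if v == goal:
            if best is None or length > best:
                best = length
            return
        for w in adj[v]:
            if w not in visited:
                extend(w, visited | {w}, length + 1)

    extend(start, {start}, 0)
    return best

# Assumed example graph: ring v2-v3-v4-v5 with v1 attached to v2.
ADJ = adjacency([(1, 2), (2, 3), (3, 4), (4, 5), (5, 2)])
print(topological_distance(ADJ, 1, 5), detour_distance(ADJ, 1, 5))
```

In an acyclic graph the two values coincide for every vertex pair, so the detour distance adds information only when rings are present.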

Molecular graph
A self-returning walk is a walk closed in itself, i.e. a walk starting and ending on the same vertex.
A cycle is a walk with no repeated vertices other than its first and last ones ($v_1 = v_k$).
Examples:
- $v_1 v_2 v_3 v_2 v_1$: self-returning walk of length 4
- $v_2 v_3 v_4 v_5 v_2$: cycle

Molecular graph
The molecular walk (path) count MWCk (MPCk) of order k is the total number of walks (paths) of length k in the molecular graph.
MWC0 = nSK (no. of atoms)
MWC1 = nBO (no. of bonds)
These counts are related to:
- molecular size
- branching
- graph complexity
DRAGON descriptors: MWC1, MWC2, …, MWC10
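Walk counts can be obtained by repeatedly summing, for each atom, the counts of its neighbours. The sketch below halves the total for k ≥ 1 so that each walk and its reverse are counted once; this normalization is an assumption chosen to agree with MWC1 = nBO as stated above. The function name and the example graph (inferred from the walk examples in these slides, renumbered from 0) are also assumptions.

```python
def molecular_walk_counts(n_vertices, edges, max_order):
    """Return [MWC0, MWC1, ..., MWC_max_order] for an undirected graph."""
    adj = [[] for _ in range(n_vertices)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    atomic = [1] * n_vertices   # walks of length 0 starting at each atom
    counts = [n_vertices]       # MWC0 = number of atoms
    for _ in range(max_order):
        # Walks of length k from v = sum of walks of length k-1 from its neighbours.
        atomic = [sum(atomic[w] for w in adj[v]) for v in range(n_vertices)]
        counts.append(sum(atomic) // 2)  # halved: assumed convention
    return counts

# Ring 1-2-3-4 with vertex 0 attached to vertex 1 (0-based labels).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]
print(molecular_walk_counts(5, EDGES, 3))
```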

Molecular graph
The self-returning walk count SRWk of order k is the total number of self-returning walks of length k in the graph. The SRW counts are the spectral moments of the adjacency matrix, i.e. linear combinations of counts of certain fragments contained in the molecular graph (embedding frequencies).
SRW1 = nSK
SRW2 = nBO
DRAGON descriptors: SRW1, SRW2, …, SRW10

Molecular graph
Local vertex invariants (LOVIs) are quantities associated with each vertex of a molecular graph.
Graph invariants are molecular descriptors representing graph properties that are preserved by isomorphism. They are obtained from:
- the characteristic polynomial
- local vertex invariants

Molecular graph and more
Molecular graph -> topological matrix -> algebraic operator -> local vertex invariants -> graph invariants -> molecular descriptors

Molecular graph: graph invariants
Topostructural descriptors:
- Wiener index, Hosoya Z index
- Zagreb indices, Mohar indices
- Randic connectivity index
- Balaban distance connectivity index
- Schultz molecular topological index
- Kier shape descriptors
- eigenvalues of the adjacency matrix
- eigenvalues of the distance matrix
- Kirchhoff number
- detour index
- topological charge indices
Topological information indices:
- total information content on ...
- mean information content on ...
Topochemical descriptors:
- Kier-Hall valence connectivity indices
- Burden eigenvalues
- BCUT descriptors
- Kier alpha-modified shape descriptors
- 2D autocorrelation descriptors
Topographic descriptors (molecular geometry: x, y, z coordinates):
- 3D-Wiener index
- 3D-Balaban index
- D/D index

Molecule graph invariants
Numerical chemical information extracted from molecular graphs. The mathematical representation of a molecular graph is made by the topological matrices:
- adjacency matrix
- atom connectivity matrix
- distance matrix
- edge distance matrix
- incidence matrix
- ... more than 60 matrix representations of the molecular structure

Local vertex invariants
Local vertex invariants (LOVIs) are quantities associated with each vertex of a molecular graph. Examples:
- atom vertex degree
- valence vertex degree
- sum of the vertex distance degree
- maximum vertex distance degree

Topological matrices: adjacency matrix
Derived from a molecular graph, it represents the whole set of connections between adjacent pairs of atoms.
$a_{ij} = \begin{cases} 1 & \text{if atoms } i \text{ and } j \text{ are bonded} \\ 0 & \text{otherwise} \end{cases}$

Topological matrices: bond number B
It is the simplest graph invariant obtained from the adjacency matrix. It is the number of bonds in the molecular graph, calculated as:
$B = \frac{1}{2} \sum_{i} \sum_{j} a_{ij}$
where $a_{ij}$ is the entry of the adjacency matrix.

Local vertex invariants: atom vertex degree
It is the row sum of the vertex adjacency matrix:
$\delta_i = \sum_{j} a_{ij}$
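The adjacency matrix, the bond number and the vertex degrees can be sketched in a few lines. The example molecule (2-methylbutane as an H-depleted graph), the vertex numbering and the function names are assumptions made for illustration.

```python
def adjacency_matrix(n, edges):
    """Symmetric 0/1 matrix: a[i][j] = 1 if atoms i and j are bonded."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1
    return a

def bond_number(a):
    """Half the sum of all entries, since each bond appears twice."""
    return sum(sum(row) for row in a) // 2

def vertex_degrees(a):
    """Row sums of the adjacency matrix."""
    return [sum(row) for row in a]

# 2-methylbutane: chain 0-1-2-3 with a methyl group (4) on atom 1.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
A = adjacency_matrix(5, EDGES)
print(bond_number(A), vertex_degrees(A))
```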

Local vertex invariants: valence vertex degree
For atoms of the 2nd principal quantum number (C, N, O, F):
$\delta_i^v = Z_i^v - h_i$
- $Z_i^v$: number of valence electrons of the i-th atom
- $h_i$: number of hydrogens bonded to the i-th atom

Local vertex invariants: valence vertex degree
The vertex degree of the i-th atom is the count of edges incident with the i-th atom, i.e. the count of σ bonds or σ electrons.

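Local vertex invariants: valence vertex degree
For atoms with principal quantum number > 2:
$\delta_i^v = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$
- $Z_i$: total number of electrons of the i-th atom (atomic number)
- $Z_i^v$: number of valence electrons of the i-th atom
- $h_i$: number of hydrogens bonded to the i-th atom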
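The two cases of the valence vertex degree can be sketched as one function. This is a hedged illustration: it assumes the usual Kier-Hall forms, Z^v - h for second-row atoms and (Z^v - h) / (Z - Z^v - 1) for heavier atoms, and the lookup tables and function name are hypothetical.

```python
# Small lookup tables for the elements used below (assumption).
VALENCE_ELECTRONS = {"C": 4, "N": 5, "O": 6, "F": 7, "S": 6, "Cl": 7, "Br": 7}
ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8, "F": 9, "S": 16, "Cl": 17, "Br": 35}

def valence_vertex_degree(element, n_hydrogens):
    """Valence vertex degree of one non-hydrogen atom."""
    zv = VALENCE_ELECTRONS[element]
    z = ATOMIC_NUMBER[element]
    if z <= 10:
        # Second-row atoms (principal quantum number 2).
        return zv - n_hydrogens
    # Heavier atoms: scale by the number of core electrons minus one.
    return (zv - n_hydrogens) / (z - zv - 1)

print(valence_vertex_degree("C", 3))   # methyl carbon
print(valence_vertex_degree("O", 1))   # hydroxyl oxygen
print(valence_vertex_degree("Cl", 0))  # chlorine substituent
```

Unlike the plain vertex degree, this value distinguishes a hydroxyl oxygen from a methyl carbon even though both are terminal vertices.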

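Topological descriptors: Zagreb indices (Gutman, 1975)
$M_1 = \sum_{i} \delta_i^2$
$M_2 = \sum_{b} (\delta_i \delta_j)_b$
where $\delta_i$ is the vertex degree of the i-th atom and the second sum runs over all bonds b joining atoms i and j.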
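A minimal sketch of the two Zagreb indices, assuming the standard definitions (sum of squared vertex degrees, and sum over bonds of the product of the two end-point degrees). The example graph is 2-methylbutane with an arbitrary vertex numbering; the function name is hypothetical.

```python
def zagreb_indices(n, edges):
    """Return (M1, M2) for an undirected graph given as an edge list."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    m1 = sum(d * d for d in degree)                 # over atoms
    m2 = sum(degree[i] * degree[j] for i, j in edges)  # over bonds
    return m1, m2

# 2-methylbutane: chain 0-1-2-3 with a methyl group (4) on atom 1.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
print(zagreb_indices(5, EDGES))
```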

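Topological descriptors: Kier-Hall connectivity indices (1986), Randic branching index (1975)
They are based on molecular graph decomposition into fragments (subgraphs) of different size and complexity and use atom vertex degrees as subgraph weights.
Randic branching index:
$\chi = \sum_{b} (\delta_i \delta_j)_b^{-1/2}$
where the sum runs over all bonds b and the term $(\delta_i \delta_j)^{-1/2}$ is called edge connectivity.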
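The Randic branching index can be sketched as a sum of edge connectivities over all bonds. The example graph (2-methylbutane) and the function name are assumptions made for illustration.

```python
def randic_index(n, edges):
    """Sum over bonds of (degree_i * degree_j) ** -0.5."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum((degree[i] * degree[j]) ** -0.5 for i, j in edges)

# 2-methylbutane: chain 0-1-2-3 with a methyl group (4) on atom 1.
branched = randic_index(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
# n-pentane: unbranched chain 0-1-2-3-4.
linear = randic_index(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(round(branched, 4), round(linear, 4))
```

The branched isomer gives the smaller value, which illustrates why the index is described as a branching index.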

Topological descriptors mean Randic branching index

Topological descriptors: atom connectivity indices of m-th order
$^m\chi_q = \sum_{k=1}^{^mP} \left( \prod_{a=1}^{n} \delta_a \right)_k^{-1/2}$
- $^mP$: number of m-th order paths
- q: subgraph type (Path, Cluster, Path/Cluster, Chain)
- n = m for the Chain (Ring) subgraph type, n = m + 1 otherwise
The immediate bonding environment of each atom is encoded by the subgraph weight. The number of terms in the sum depends on the molecular structure. The connectivity indices show a good capability of isomer discrimination and reflect some features of molecular branching.
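For the Path subgraph type, the m-th order index can be sketched by enumerating every path with m edges once and summing the inverse square root of the product of its vertex degrees. Cluster, Path/Cluster and Chain subgraphs are not covered here. The function name and the example graph (2-methylbutane) are assumptions, and the graph is assumed to be connected so that no vertex has degree zero.

```python
def path_connectivity_index(n, edges, m):
    """m-th order connectivity index over Path subgraphs only."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    degree = [len(neighbours) for neighbours in adj]
    total = 0.0

    def extend(path):
        nonlocal total
        if len(path) == m + 1:
            # Each path is found from both ends; keep one orientation.
            if m == 0 or path[0] < path[-1]:
                product = 1
                for v in path:
                    product *= degree[v]
                total += product ** -0.5
            return
        for w in adj[path[-1]]:
            if w not in path:
                extend(path + [w])

    for v in range(n):
        extend([v])
    return total

# 2-methylbutane: chain 0-1-2-3 with a methyl group (4) on atom 1.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
for order in range(3):
    print(order, round(path_connectivity_index(5, EDGES, order), 4))
```

With m = 1 the function reduces to a sum over bonds, i.e. the Randic branching index.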

Topological descriptors: valence connectivity indices of m-th order
They encode atom identities as well as the connectivities in the molecular graph.

Kier-Hall electronegativity
- correlation with the Mulliken-Jaffe electronegativity
- defined in terms of the principal quantum number
Kier-Hall relative electronegativity: the electronegativity of carbon sp3 is taken as zero.

Distance matrix
The distance $d_{ij}$ between two vertices is the smallest number of edges between them.
Vertex distance degree $s_i$: it is the row sum of the vertex distance matrix,
$s_i = \sum_{j} d_{ij}$
$s_i$ is high for terminal vertices and low for central vertices.

Local vertex invariants: eccentricity
The eccentricity $\eta_i$ of the i-th atom is the upper bound of the distance $d_{ij}$ between the atom i and the other atoms j:
$\eta_i = \max_{j} d_{ij}$
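The distance matrix and the two local invariants derived from it (vertex distance degree and eccentricity) can be sketched with one breadth-first search per vertex. The function names and the example graph (2-methylbutane) are assumptions, and the graph is assumed to be connected.

```python
from collections import deque

def distance_matrix(n, edges):
    """Topological distances between all vertex pairs of a connected graph."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    matrix = []
    for start in range(n):
        dist = [None] * n
        dist[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if dist[w] is None:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        matrix.append(dist)
    return matrix

def vertex_distance_degrees(d):
    """Row sums of the distance matrix."""
    return [sum(row) for row in d]

def eccentricities(d):
    """Row maxima of the distance matrix."""
    return [max(row) for row in d]

# 2-methylbutane: chain 0-1-2-3 with a methyl group (4) on atom 1.
D = distance_matrix(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
print(vertex_distance_degrees(D), eccentricities(D))
```

The branching atom (index 1) has the lowest distance degree and the terminal atom farthest from it (index 3) the highest, in line with the statement about central and terminal vertices.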

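Topological descriptors: Petitjean shape index (1992)
A simple shape descriptor:
$I_{PJ} = \frac{D - R}{R}$
where the radius R is the minimum atom eccentricity and the diameter D is the maximum atom eccentricity.
- $I_{PJ} = 0$ for a strictly cyclic structure
- $I_{PJ} = 1$ for a strictly acyclic structure with an even diameter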
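A short sketch of the Petitjean index, assuming the usual definition (D - R) / R with R the smallest and D the largest atom eccentricity. The eccentricity lists are written out by hand for three small H-depleted graphs; the function name is hypothetical.

```python
def petitjean_index(eccentricities):
    """(diameter - radius) / radius from a list of atom eccentricities."""
    radius = min(eccentricities)
    diameter = max(eccentricities)
    return (diameter - radius) / radius

cyclobutane = [2, 2, 2, 2]         # every ring atom is equivalent
propane = [2, 1, 2]                # acyclic, diameter 2 (even)
methylbutane = [3, 2, 2, 3, 3]     # 2-methylbutane, diameter 3 (odd)

print(petitjean_index(cyclobutane),
      petitjean_index(propane),
      petitjean_index(methylbutane))
```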

Topological descriptors: Wiener index (1947)
$W = \frac{1}{2} \sum_{i} \sum_{j} d_{ij}$
where $d_{ij}$ are the topological distances.
- high values for big molecules and for linear molecules
- low values for small molecules and for branched or cyclic molecules
The Average Wiener index is independent of the molecular size.
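The Wiener index follows directly from a distance matrix. In the sketch below the two matrices are written out by hand, and the average is taken over the number of vertex pairs, which is an assumption about the normalization since the slide formula is not available. The function names are hypothetical.

```python
def wiener_index(d):
    """Half the sum of all entries of a symmetric distance matrix."""
    return sum(sum(row) for row in d) // 2

def average_wiener_index(d):
    """Wiener index divided by the number of vertex pairs (assumed form)."""
    n = len(d)
    return 2 * wiener_index(d) / (n * (n - 1))

# 2-methylbutane: chain 0-1-2-3 with a methyl group (4) on atom 1.
D_BRANCHED = [
    [0, 1, 2, 3, 2],
    [1, 0, 1, 2, 1],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 3],
    [2, 1, 2, 3, 0],
]
# n-pentane: unbranched chain 0-1-2-3-4.
D_LINEAR = [[abs(i - j) for j in range(5)] for i in range(5)]

print(wiener_index(D_BRANCHED), wiener_index(D_LINEAR))
print(average_wiener_index(D_BRANCHED))
```

The linear isomer gives the larger value, matching the trend stated above.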

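Topological descriptors: Balaban distance connectivity index (1982)
$J = \frac{B}{C + 1} \sum_{b} (s_i s_j)_b^{-1/2}$
- B: number of bonds
- C: number of cycles
- $s_i$: sum of the i-th row distances
The sum runs over all bonds b joining atoms i and j. It is one of the most discriminant indices.
Also defined on the slide: the average sum of the i-th row distances and the number of atoms.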
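A sketch of the Balaban index, assuming the standard form J = B / (C + 1) times the sum over bonds of (s_i s_j)^(-1/2). The number of cycles is taken as B - A + 1, which holds for a connected graph. The row sums are supplied by hand for 2-methylbutane; the function name is hypothetical.

```python
def balaban_index(n_atoms, edges, distance_sums):
    """Balaban J from the bond list and the distance-matrix row sums."""
    n_bonds = len(edges)
    n_cycles = n_bonds - n_atoms + 1  # valid for a connected graph
    total = sum((distance_sums[i] * distance_sums[j]) ** -0.5
                for i, j in edges)
    return n_bonds / (n_cycles + 1) * total

# 2-methylbutane: chain 0-1-2-3 with a methyl group (4) on atom 1.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
S = [8, 5, 6, 9, 8]   # row sums of its distance matrix
J = balaban_index(5, EDGES, S)
print(round(J, 4))
```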

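Edge descriptors
[Figure: molecular graph with its bonds labelled a to f and the corresponding edge matrix with rows and columns a to f; atoms correspond to vertices and bonds to edges]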

Topographic descriptors
Some geometrical descriptors are derived from the corresponding topological descriptors by substituting the topological distances $d_{st}$ with the geometrical distances $r_{st}$. They are called topographic descriptors.
For example, the 3D-Wiener index:
$^{3D}W = \frac{1}{2} \sum_{s} \sum_{t} r_{st}$

Molecular geometry
The geometry matrix G (or geometric distance matrix) is a square symmetric matrix whose entry $r_{st}$ is the geometric distance calculated as the Euclidean distance between the atoms s and t:
$r_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2 + (z_s - z_t)^2}$
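The geometry matrix and a Wiener-type index built on it can be sketched as follows. The coordinates are made up for illustration and do not describe a real molecule, and the function names are hypothetical; the 3D index is taken as half the sum of all geometric distances, by analogy with the topological Wiener index.

```python
import math

def geometry_matrix(coords):
    """Square symmetric matrix of Euclidean interatomic distances."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in coords]
            for p in coords]

def wiener_3d(coords):
    """Half the sum of all entries of the geometry matrix."""
    g = geometry_matrix(coords)
    return sum(sum(row) for row in g) / 2

# Hypothetical (x, y, z) coordinates of three atoms.
COORDS = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]
G = geometry_matrix(COORDS)
print(G[0][2], wiener_3d(COORDS))
```

Because the distances depend on the conformation, two conformers of the same molecule can give different values, unlike the purely topological index.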

THANK YOU
Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro
chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control
Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences, University of Milano - Bicocca, P.za della Scienza, Milano (Italy)
Website: michem.disat.unimib.it/chm/

coffee break

Goal

Molecular graph

Molecule graph invariants

Molecular graph

Hansch approach: Hansch molecular descriptors
Lipophilic properties:
- partition coefficients: logP, logKow
- chromatographic parameters: Rf, RT
- solubility
- ...
Electronic properties:
- Hammett constants
- molar refraction
- dipole moment
- HOMO, LUMO
- ionization potential
- ...
Steric properties:
- molecular weight
- VDW volume
- molar volume
- surface area
- ...

Molecular graph