Milano Chemometrics and QSAR Research Group
Department of Environmental Sciences, University of Milano-Bicocca, P.za della Scienza, Milano (Italy)
Website: michem.unimib.it/chm/
Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro
Research topics: chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control
Molecular descriptors: autocorrelations, eigenvalue-based and information indices
Roberto Todeschini, Milano Chemometrics and QSAR Research Group
Iran, February 2009
Contents
- Autocorrelation descriptors
- Molecule representation by matrices
- Eigenvalue-based descriptors
- Information content
- Information indices
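Autocorrelation on a molecular graph
- quadratic molecular property: $P = \sum_{i=1}^{A} w_i^{2} = \mathbf{w}^{\mathrm{T}}\mathbf{w}$
- quadratic molecular property with interaction terms: $P = \sum_{i=1}^{A}\sum_{j=1}^{A} w_i w_j$, which is the sum of all entries, from $(1,1)$ to $(A,A)$, of the $A \times A$ matrix whose $(i,j)$ entry is $w_i w_j$
where w is the vector collecting the weights of each atom and A is the number of atoms.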
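A minimal Python sketch of the two quadratic properties, using hypothetical atomic weights. It also builds the $A \times A$ matrix of products $w_i w_j$. The diagonal of that matrix gives the quadratic property, and the sum of all its entries gives the property with interaction terms.

```python
def quadratic_property(w):
    """Quadratic molecular property: sum of squared atomic weights (w^T w)."""
    return sum(wi * wi for wi in w)


def interaction_property(w):
    """Quadratic property with interaction terms: sum over all ordered pairs (i, j) of w_i * w_j."""
    return sum(wi * wj for wi in w for wj in w)


def product_matrix(w):
    """A x A matrix whose (i, j) entry is w_i * w_j."""
    return [[wi * wj for wj in w] for wi in w]


w = [1.0, 2.0, 3.0]  # hypothetical weights of a 3-atom molecule
P = product_matrix(w)
print(quadratic_property(w))                # 14.0
print(interaction_property(w))              # 36.0
print(sum(P[i][i] for i in range(len(w))))  # 14.0: the diagonal of P
```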
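Autocorrelation on a molecular graph
Moreau-Broto autocorrelation of a topological structure (ATS), 1984:
$ATS_d = \sum_{i=1}^{A}\sum_{j=1}^{A} \delta(d_{ij};d)\, w_i w_j$
where the lag d is a topological distance, $d_{ij}$ is the topological distance between atoms i and j, and $\delta(d_{ij};d) = 1$ if $d_{ij} = d$, 0 otherwise. At lag 0 this is the quadratic molecular property. Summed over all lags, it gives the quadratic property with interaction terms.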
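The lag decomposition can be sketched in Python as follows. This is an illustration, not a reference implementation. The molecule is a hydrogen-depleted graph given as a hypothetical list of bonded index pairs, and topological distances come from a breadth-first search. Each ordered pair (i, j) adds $w_i w_j$ to the lag $d = d_{ij}$.

```python
from collections import deque


def topological_distances(n_atoms, bonds):
    """All-pairs topological distances (number of bonds) via BFS from every atom.

    bonds: iterable of (i, j) index pairs. Unreachable pairs stay None.
    """
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for source in range(n_atoms):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def moreau_broto(w, bonds, max_lag):
    """ATS_d = sum_i sum_j delta(d_ij; d) w_i w_j over ordered pairs, for d = 0..max_lag."""
    dist = topological_distances(len(w), bonds)
    ats = [0.0] * (max_lag + 1)
    for i, wi in enumerate(w):
        for j, wj in enumerate(w):
            d = dist[i][j]
            if d is not None and d <= max_lag:
                ats[d] += wi * wj
    return ats


# n-butane, hydrogen-depleted chain C0-C1-C2-C3, unit weights
print(moreau_broto([1.0] * 4, [(0, 1), (1, 2), (2, 3)], max_lag=3))  # [4.0, 6.0, 4.0, 2.0]
```

Some conventions count each unordered pair only once, which halves the d > 0 terms. The ordered double sum is kept here so that the lags add up to the interaction-term property. For weights 1 to 4 on the same chain, the lags sum to $(1+2+3+4)^2 = 100$.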
Autocorrelation on a molecular graph
Example: 4-hydroxy-2-butanone
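The following worked illustration assumes the hydrogen-depleted graph of 4-hydroxy-2-butanone (CH3-CO-CH2-CH2-OH). The atom numbering is chosen here for convenience and may not match the slide: 1 = CH3, 2 = carbonyl C, 3 = carbonyl O, 4 = CH2, 5 = CH2, 6 = hydroxyl O. With unit weights, each lag of the ordered double sum counts the atoms (d = 0) or twice the number of atom pairs at distance d (5, 5, 3 and 2 pairs for d = 1 to 4). The lags therefore add up to $6^2$.

```latex
\mathbf{D} =
\begin{pmatrix}
0 & 1 & 2 & 2 & 3 & 4\\
1 & 0 & 1 & 1 & 2 & 3\\
2 & 1 & 0 & 2 & 3 & 4\\
2 & 1 & 2 & 0 & 1 & 2\\
3 & 2 & 3 & 1 & 0 & 1\\
4 & 3 & 4 & 2 & 1 & 0
\end{pmatrix}
\qquad
w_i = 1 \;\Rightarrow\;
ATS_0 = 6,\; ATS_1 = 10,\; ATS_2 = 10,\; ATS_3 = 6,\; ATS_4 = 4,
\qquad \sum_{d} ATS_d = 36 = 6^2
```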
Eigenvalue-based descriptors
Eigenvalue descriptors are obtained by diagonalizing symmetric matrices derived from a molecular graph, such as:
- adjacency matrix
- vertex distance matrix
- edge adjacency matrix
- edge distance matrix
- detour matrix
- geometrical distance matrix
- covariance matrix
- ... and any weighted symmetric matrix
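Below is a minimal pure-Python sketch, with hypothetical helper names, of how two of the listed matrices can be built from a list of bonds. The adjacency matrix is filled in directly, and the topological distance matrix is obtained by breadth-first search. Both matrices are symmetric with a zero diagonal, which guarantees real eigenvalues.

```python
from collections import deque


def adjacency_matrix(n_atoms, bonds):
    """A[i][j] = 1 if atoms i and j are bonded, else 0 (symmetric, zero diagonal)."""
    A = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        A[i][j] = A[j][i] = 1
    return A


def distance_matrix(A):
    """Topological distance matrix by BFS on the adjacency matrix (assumes a connected graph)."""
    n = len(A)
    D = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            u, d = queue.popleft()
            D[s][u] = d
            for v in range(n):
                if A[u][v] and v not in seen:
                    seen.add(v)
                    queue.append((v, d + 1))
    return D


def is_symmetric(M):
    return all(M[i][j] == M[j][i] for i in range(len(M)) for j in range(len(M)))


# Isobutane, hydrogen-depleted: central carbon 0 bonded to three methyl carbons.
A = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3)])
D = distance_matrix(A)
print(D)  # [[0, 1, 1, 1], [1, 0, 2, 2], [1, 2, 0, 2], [1, 2, 2, 0]]
```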
Lovász-Pelikan index (leading eigenvalue), 1973
The largest eigenvalue of the adjacency matrix.
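The leading eigenvalue can be estimated by power iteration. The sketch below assumes a connected graph. It iterates with the adjacency matrix shifted by the identity so that, for bipartite graphs (where $-\lambda_1$ is also an eigenvalue), the iteration still converges to $\lambda_1$. The adjacency matrices of n-butane and isobutane are written out by hand.

```python
import math


def matvec(M, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def leading_eigenvalue(M, iterations=10_000, tol=1e-12):
    """Largest eigenvalue of a symmetric non-negative matrix by shifted power iteration.

    Iterating with (M + I) keeps -lambda_1 (present for bipartite graphs) from
    competing with lambda_1; the estimate is the Rayleigh quotient x^T M x.
    """
    n = len(M)
    x = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iterations):
        y = [mx + xi for mx, xi in zip(matvec(M, x), x)]  # (M + I) x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        new_lam = sum(xi * mxi for xi, mxi in zip(x, matvec(M, x)))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


butane = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
isobutane = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(leading_eigenvalue(butane))     # close to (1 + sqrt(5)) / 2
print(leading_eigenvalue(isobutane))  # close to sqrt(3)
```

For n-butane the result is the golden ratio, $2\cos(\pi/5) \approx 1.618$. For isobutane it is $\sqrt{3} \approx 1.732$.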
Eigenvalue-based descriptors
General functions of eigenvalues
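Eigenvalue-based descriptors
The trace of the adjacency matrix (and of the distance matrix) is equal to zero, because all diagonal entries are zero. Since the trace equals the sum of the eigenvalues, the eigenvalues of these matrices sum to zero.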
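The trace argument can be written out for the adjacency matrix $\mathbf{A}$ of a graph with A atoms. Here $\delta_i$ denotes the vertex degree of atom i and B the number of bonds. These are standard identities and are listed only as a worked check. Because $\mathbf{A}$ is symmetric with 0/1 entries, $a_{ij}a_{ji} = a_{ij}$, so the sum over j of row i equals $\delta_i$. The first identity also holds for the distance matrix. The second identity is specific to the adjacency matrix.

```latex
\sum_{i=1}^{A}\lambda_i = \operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{A} a_{ii} = 0,
\qquad
\sum_{i=1}^{A}\lambda_i^{2} = \operatorname{tr}(\mathbf{A}^{2})
  = \sum_{i=1}^{A}\sum_{j=1}^{A} a_{ij}a_{ji}
  = \sum_{i=1}^{A}\delta_i = 2B
```

For example, n-butane has eigenvalues $\pm 1.618$ and $\pm 0.618$. They sum to 0, and their squares sum to $2(2.618 + 0.382) = 6 = 2 \cdot 3$ bonds.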
Eigenvalue-based descriptors
VAA indices (from the adjacency matrix), Balaban et al., 1991
Eigenvector-based descriptors
VEA indices (from the adjacency matrix), Balaban et al., 1991
where $\lambda_A$ is the largest negative eigenvalue derived from the adjacency matrix.
Eigenvalue-based descriptors
VAD, VED and VRD indices (from the distance matrix), Balaban et al., 1991
The same indices defined above are calculated on the topological distance matrix.
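Molecular geometry
The geometry matrix G (or geometric distance matrix) is a square symmetric matrix whose entry $r_{st}$ is the geometric distance between atoms s and t, calculated as their Euclidean distance:
$r_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2 + (z_s - z_t)^2}$
where (x, y, z) are the Cartesian coordinates of the atoms.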
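A short Python sketch of the geometry matrix built from Cartesian coordinates. The coordinates are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def geometry_matrix(coords):
    """Geometry matrix G: entry (s, t) is the Euclidean distance between atoms s and t.

    coords: list of (x, y, z) Cartesian coordinates, one tuple per atom.
    """
    return [[math.dist(p, q) for q in coords] for p in coords]


# Hypothetical coordinates (arbitrary units) for three atoms.
G = geometry_matrix([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)])
print(G[0][2])  # 5.0
```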
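Distance/distance matrix (DD), Randić et al., 1994
Its off-diagonal entries are the ratio $r_{ij}/d_{ij}$ between the geometric distance and the topological distance of atoms i and j. Its diagonal entries are zero.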
Eigenvalue-based descriptors
Folding degree index, Randić et al., 1994
$\phi = \lambda_1^{DD} / A$
where $\lambda_1^{DD}$ is the largest eigenvalue of the distance/distance matrix and A is the number of atoms. This quantity tends to 1 for linear molecules of infinite length and decreases as the molecule folds.
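The sketch below builds the distance/distance matrix and takes the folding degree index as its leading eigenvalue divided by the number of atoms. The chains are hypothetical and assume a unit bond length, so in a straight chain every ratio $r_{ij}/d_{ij}$ equals 1. The DD matrix is then all ones off the diagonal, with leading eigenvalue $n - 1$, which gives $(n-1)/n$. This value tends to 1 as n grows. Folding the same chain lowers the ratios and hence the index.

```python
import math


def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def leading_eigenvalue(M, iterations=10_000, tol=1e-12):
    """Largest eigenvalue of a symmetric non-negative matrix (shifted power iteration)."""
    n = len(M)
    x = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iterations):
        y = [mx + xi for mx, xi in zip(matvec(M, x), x)]  # (M + I) x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        new_lam = sum(xi * mxi for xi, mxi in zip(x, matvec(M, x)))  # Rayleigh quotient
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def dd_matrix(coords, topo):
    """Distance/distance matrix: r_ij / d_ij off the diagonal, 0 on the diagonal."""
    n = len(coords)
    return [[0.0 if i == j else math.dist(coords[i], coords[j]) / topo[i][j]
             for j in range(n)] for i in range(n)]


def folding_degree(coords, topo):
    """Leading eigenvalue of the DD matrix divided by the number of atoms."""
    return leading_eigenvalue(dd_matrix(coords, topo)) / len(coords)


def chain_topology(n):
    """Topological distances |i - j| of a hypothetical unbranched chain."""
    return [[abs(i - j) for j in range(n)] for i in range(n)]


straight = [(float(i), 0.0, 0.0) for i in range(10)]
print(folding_degree(straight, chain_topology(10)))  # about 0.9 = (10 - 1) / 10

linear4 = [(float(i), 0.0, 0.0) for i in range(4)]
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(folding_degree(linear4, chain_topology(4)))  # about 0.75
print(folding_degree(square, chain_topology(4)))   # smaller: the chain is folded
```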
Conventional bond order π*
- single bond: π* = 1
- double bond: π* = 2
- triple bond: π* = 3
- conjugated bond: π* = 1.5
Eigenvalue-based descriptors
BCUT descriptors (Burden - CAS - University of Texas eigenvalues), 1997
The largest absolute eigenvalues λ1, λ2, λ3, ..., λL derived from the B matrix. Its diagonal entries are the atomic properties $w_i$, and its off-diagonal entries for bonded atoms i and j are derived from the conventional bond order $\pi^*_{ij}$.
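The sketch below illustrates BCUT-style eigenvalues in Python. The exact off-diagonal values of the B matrix differ between implementations. This version assumes $\sqrt{\pi^*_{ij}}$ for bonded pairs and 0.001 for non-bonded pairs, and it uses atomic masses as hypothetical weights. All eigenvalues are computed with the cyclic Jacobi rotation method to stay within the standard library. In practice a library routine such as NumPy's `eigvalsh` would be the usual choice.

```python
import math


def burden_matrix(w, bonds):
    """Burden-type B matrix under an assumed convention.

    w: atomic properties on the diagonal.
    bonds: {(i, j): conventional bond order pi*} for bonded atom pairs.
    Off-diagonal entries: sqrt(pi*) for bonded pairs, 0.001 otherwise (assumption).
    """
    n = len(w)
    B = [[0.001] * n for _ in range(n)]
    for i in range(n):
        B[i][i] = float(w[i])
    for (i, j), order in bonds.items():
        B[i][j] = B[j][i] = math.sqrt(order)
    return B


def jacobi_eigenvalues(M, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in M]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def bcut(w, bonds, L=3):
    """The L eigenvalues of the B matrix with the largest absolute values."""
    return sorted(jacobi_eigenvalues(burden_matrix(w, bonds)), key=abs, reverse=True)[:L]


# 4-hydroxy-2-butanone, hydrogen-depleted (0 = CH3, 1 = carbonyl C, 2 = carbonyl O,
# 3 = CH2, 4 = CH2, 5 = hydroxyl O); hypothetical weights = atomic masses.
masses = [12.011, 12.011, 15.999, 12.011, 12.011, 15.999]
bond_orders = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1, (4, 5): 1}
print(bcut(masses, bond_orders, L=3))
```

Jacobi rotations are similarity transforms, so the trace is preserved. A quick sanity check is therefore that the eigenvalues sum to the total of the weights on the diagonal.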
Topological information indices
Indices based on the information content and entropy measures derived from molecular graphs.
Information content
The information content of a system having n elements is a measure of the degree of diversity of the elements in the set,
where G is the number of different equivalence classes, $n_g$ is the number of elements in the g-th class, and $\sum_{g=1}^{G} n_g = n$.
Information content
- Maximum information content: $I_{MAX} = n \log_2 n$, reached when every element forms its own class
- Total information content
Information content
The Shannon entropy of a system having n elements is the mean information content of the set of elements:
$H = -\sum_{g=1}^{G} p_g \log_2 p_g$
where G is the number of different equivalence classes, $p_g = n_g / n$ is the probability of the g-th class, and $\sum_{g=1}^{G} p_g = 1$.
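Substituting $p_g = n_g / n$ and using $\sum_g n_g = n$ links the entropy to the class sizes. This standard identity is shown as a short derivation: n times the Shannon entropy equals $n \log_2 n$ minus $\sum_g n_g \log_2 n_g$.

```latex
H = -\sum_{g=1}^{G} \frac{n_g}{n}\log_2\frac{n_g}{n}
  = \log_2 n \cdot \frac{1}{n}\sum_{g=1}^{G} n_g \;-\; \frac{1}{n}\sum_{g=1}^{G} n_g \log_2 n_g
  = \log_2 n - \frac{1}{n}\sum_{g=1}^{G} n_g \log_2 n_g
\quad\Longrightarrow\quad
n\,H = n\log_2 n - \sum_{g=1}^{G} n_g \log_2 n_g
```

H is 0 when all elements fall into a single class and $\log_2 n$ when every element forms its own class.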
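- Maximum entropy: $H_{MAX} = \log_2 n$, obtained when each element belongs to a different class
- Standardized entropy: $H^{*} = H / H_{MAX}$, which ranges from 0 to 1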
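Information content: on atoms
Molecule 1: n = 9 (C = 7, F = 2)
$I_{MAX} = 9 \log_2 9 = 28.529$
$I_C = 7 \log_2 7 + 2 \log_2 2 = 21.651$
$I_T = I_{MAX} - I_C = 6.878$
$H = -(7/9) \log_2 (7/9) - (2/9) \log_2 (2/9) = 0.764$
$H_{MAX} = \log_2 9 = 3.170$
$H^{*} = 0.764 / 3.170 = 0.241$
Molecule 2: n = 9 (C = 7, F = 1, Br = 1)
$I_C = 7 \log_2 7 + 2\,(1 \log_2 1) = 19.651$
$I_T = I_{MAX} - I_C = 8.878$
$H = -(7/9) \log_2 (7/9) - 2\,(1/9) \log_2 (1/9) = 0.986$
$H^{*} = 0.986 / 3.170 = 0.311$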
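A minimal Python sketch that reproduces the quantities of the atom-based example from the class sizes alone. The labels follow the layout of that example: $I_C$ is the sum of $n_g \log_2 n_g$, and $I_T = I_{MAX} - I_C$, which also equals $n \cdot H$. Hydrogen-depleted atom counts are assumed.

```python
import math


def information_indices(class_sizes):
    """Information-content quantities for n elements split into equivalence classes.

    class_sizes: list of n_g values (each >= 1), with at least two elements in total.
    """
    n = sum(class_sizes)
    i_max = n * math.log2(n)                                    # I_MAX = n log2 n
    i_c = sum(ng * math.log2(ng) for ng in class_sizes)         # sum_g n_g log2 n_g
    i_t = i_max - i_c                                           # I_T = I_MAX - I_C
    h = -sum(ng / n * math.log2(ng / n) for ng in class_sizes)  # Shannon entropy
    h_max = math.log2(n)                                        # maximum entropy
    return {"I_MAX": i_max, "I_C": i_c, "I_T": i_t,
            "H": h, "H_MAX": h_max, "H*": h / h_max}


for label, sizes in [("C7F2", [7, 2]), ("C7FBr", [7, 1, 1])]:
    res = information_indices(sizes)
    print(label, {k: round(v, 3) for k, v in res.items()})
```

For the two molecules of the example, H* comes out at about 0.241 and 0.311.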
Information content: on vertex degrees
n = 9, V1 = 3, V2 = 3, V3 = 3 (V_k: number of vertices with degree k)
$H = 3 \cdot [-(3/9) \log_2 (3/9)] = 1.585$
Information content: on vertex degree magnitudes
SV1 = 3, SV2 = 6, SV3 = 9 (SV_k = k · V_k), n = 18
$H = -(3/18) \log_2 (3/18) - (6/18) \log_2 (6/18) - (9/18) \log_2 (9/18) = 1.459$
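Both entropies can be computed from a vertex-degree list, sketched here in Python. The degree list is hypothetical and simply matches the counts of the example, with three vertices each of degree 1, 2 and 3. In the first partition, vertices are grouped by degree. In the second, each class is weighted by its degree magnitude $SV_k = k \cdot V_k$.

```python
import math
from collections import Counter


def entropy(weights):
    """Shannon entropy (base 2) of a list of non-negative class weights."""
    total = sum(weights)
    return -sum((x / total) * math.log2(x / total) for x in weights if x > 0)


def degree_entropies(degrees):
    counts = Counter(degrees)                         # V_k: vertices with degree k
    h_degree = entropy(list(counts.values()))         # classes of equal degree
    magnitudes = [k * v for k, v in counts.items()]   # SV_k = k * V_k
    h_magnitude = entropy(magnitudes)
    return h_degree, h_magnitude


print(degree_entropies([1, 1, 1, 2, 2, 2, 3, 3, 3]))  # about (1.585, 1.459)
```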