Basic Steps of QSAR/QSPR Investigations
In the name of GOD
M.H. Fatemi, Mazandaran University
QSAR: Quantitative Structure-Activity Relationships
Can one predict activity (or properties, in QSPR) simply from knowledge of the structure of the molecule? In other words, if one systematically changes a structural component, will it have a systematic effect on the activity?
What is QSAR? A QSAR is a mathematical relationship between a biological activity of a molecular system and its geometric and chemical characteristics. QSAR attempts to find consistent relationships between biological activity and molecular properties, so that these “rules” can be used to evaluate the activity of new compounds.
Why QSAR? The number of compounds that would have to be synthesized in order to place 10 different groups in 4 positions of a benzene ring is 10^4 = 10,000. Solution: synthesize a small number of compounds and, from their data, derive rules to predict the biological activity of other compounds.
QSXR
X = A: Activity
X = P: Property
X = R: Retention
X = b0 + b1D1 + b2D2 + ... + bnDn
where bi are the regression coefficients, Di the descriptors, and n the number of descriptors.
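To make the model concrete, here is a minimal sketch of fitting such a linear QSAR/QSPR model by ordinary least squares, assuming NumPy is available; the descriptor matrix and activity values below are made-up numbers for illustration only.

```python
import numpy as np

# Minimal sketch of the linear model X = b0 + b1*D1 + ... + bn*Dn.
# D holds one row per molecule and one column per descriptor;
# all numbers below are synthetic and for illustration only.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 3))                     # 20 molecules, 3 descriptors
y = 1.5 + D @ np.array([0.8, -0.4, 0.2]) + rng.normal(scale=0.1, size=20)

A = np.column_stack([np.ones(len(D)), D])        # leading column of ones gives b0
coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # [b0, b1, b2, b3]
y_pred = A @ coef
print(coef)
print(np.corrcoef(y, y_pred)[0, 1] ** 2)         # squared correlation (R^2)
```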
History
Early Examples: Hammett (1930s-1940s)
Hammett (cont.) Now suppose we have a related reaction series: log(kX/kH) = ρσ.
σ (sigma) reflects the sensitivity to the substituent; ρ (rho) reflects the sensitivity of the particular reaction system.
Free-Wilson Analysis
Log 1/C = Σ ai + μ
where C = predicted activity, ai = activity contribution of each substituent group, and μ = activity of the reference compound.
Free-Wilson example
Log 1/C = -0.30 [m-F] + 0.21 [m-Cl] + 0.43 [m-Br]
[Table: observed activities (Log 1/C) of the analogs against indicator variables for the substituents m-F, m-Cl, m-Br, m-I, m-Me, p-F, p-Cl, p-Br, p-I, p-Me]
Problems include: at least two substituent positions are necessary, and the model can only predict new combinations of the substituents already used in the analysis.
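Purely as an illustration of the additive Free-Wilson idea, the sketch below predicts Log 1/C as a reference activity plus the group contributions present in an analog; the three contributions come from the equation above, while the reference value mu and the substituent combinations are assumed, not taken from the slides.

```python
# Additive Free-Wilson prediction: log(1/C) = mu + sum of group contributions.
# The m-F, m-Cl and m-Br values come from the equation above; mu is an
# assumed reference activity used only to make the arithmetic concrete.
group_contrib = {"m-F": -0.30, "m-Cl": 0.21, "m-Br": 0.43}
mu = 4.0  # assumed activity of the reference compound

def free_wilson(substituents):
    """Predicted log(1/C) for an analog carrying the listed substituents."""
    return mu + sum(group_contrib[s] for s in substituents)

print(free_wilson(["m-Cl"]))          # 4.0 + 0.21 = 4.21
print(free_wilson(["m-Cl", "m-Br"]))  # 4.0 + 0.21 + 0.43 = 4.64
```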
Hansch Analysis
Log 1/C = a π + b σ + c
where π(X) = log P(RX) - log P(RH), and log P is the octanol-water partition coefficient. This is also a linear free energy relationship.
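A hedged sketch of a Hansch-type regression follows, assuming NumPy; every number (parent and substituted log P values, sigma constants, activities) is invented solely to show how π(X) = log P(RX) - log P(RH) feeds into the fit of Log 1/C = a π + b σ + c.

```python
import numpy as np

# Illustrative Hansch analysis: fit log(1/C) = a*pi + b*sigma + c,
# where pi(X) = log P(RX) - log P(RH). All values below are assumed.
logP_RH  = 2.13                                   # assumed parent log P
logP_RX  = np.array([2.27, 2.84, 2.99, 3.25])     # assumed substituted log P values
sigma    = np.array([0.06, 0.23, 0.23, 0.28])     # assumed electronic constants
log_invC = np.array([3.9, 4.4, 4.5, 4.7])         # assumed activities

pi = logP_RX - logP_RH                            # hydrophobic substituent constant
A = np.column_stack([pi, sigma, np.ones_like(pi)])
a, b, c = np.linalg.lstsq(A, log_invC, rcond=None)[0]
print(a, b, c)
```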
Applications of QSAR
1. Drug design
2. Prediction of chemical toxicity
3. Prediction of environmental activity
4. Prediction of molecular properties
5. Investigation of retention mechanisms
Steps in QSPR/QSAR
1. Structure entry & molecular modeling
2. Descriptor generation
3. Feature selection
4. Model construction (MLRA or CNN)
5. Model validation
Data set selection
1. Structural similarity of the studied molecules
2. Data collected under the same conditions
3. A data set that is as large as possible
Introduction to Molecular Descriptors
Molecular descriptors are numerical values that characterize properties of molecules: they encode structural features as numbers. Descriptors vary in the complexity of the information they encode and in the time needed to compute them. Examples: physicochemical properties (empirical) and values computed by algorithms, such as 2D fingerprints.
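For concreteness, here is a small sketch of descriptor generation assuming the open-source RDKit toolkit (the slides do not name any particular software); phenol is used only as an example input.

```python
# Descriptor generation sketch, assuming RDKit is installed.
from rdkit import Chem
from rdkit.Chem import Descriptors, AllChem

mol = Chem.MolFromSmiles("c1ccccc1O")   # phenol, example input only

print(Descriptors.MolWt(mol))           # physicochemical: molecular weight
print(Descriptors.MolLogP(mol))         # physicochemical: calculated logP
print(Descriptors.TPSA(mol))            # physicochemical: topological polar surface area

# A 2D structural fingerprint (Morgan bit vector) as an algorithmic descriptor
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
print(fp.GetNumOnBits())                # number of bits set for this molecule
```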
Classical Classification of Molecular Descriptors
- Constitutional and topological (from the 2-D structural formula)
- Geometrical (from the 3-D shape and structure)
- Quantum chemical
- Physicochemical
- Hybrid descriptors
Topological Indexes: Examples
Wiener index: counts the number of bonds between pairs of atoms and sums these distances over all pairs.
Molecular connectivity indexes:
- Randić branching index: defines the "degree" of an atom as the number of adjacent non-hydrogen atoms; the bond connectivity value is the reciprocal of the square root of the product of the degrees of the two atoms in the bond; the branching index is the sum of the bond connectivities over all bonds in the molecule.
- Chi indexes: introduce valence values to encode sigma, pi, and lone-pair electrons.
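Both indices can be computed directly from the hydrogen-suppressed molecular graph; below is an illustrative NumPy sketch using the adjacency matrix of 2-methylbutane as an assumed example molecule.

```python
import numpy as np

# Hydrogen-suppressed adjacency matrix of 2-methylbutane (example molecule),
# atoms ordered C1-C2(-C5)-C3-C4.
A = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
])

def wiener_index(adj):
    """Sum of topological distances (bond counts) over all atom pairs."""
    n = len(adj)
    dist = np.where(adj == 1, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):                      # Floyd-Warshall shortest paths
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist[np.triu_indices(n, k=1)].sum()

def randic_index(adj):
    """Sum over bonds of 1/sqrt(degree_i * degree_j)."""
    deg = adj.sum(axis=1)
    i, j = np.triu_indices(len(adj), k=1)
    bonds = adj[i, j] == 1
    return float(np.sum(1.0 / np.sqrt(deg[i[bonds]] * deg[j[bonds]])))

print(wiener_index(A))   # 18 for 2-methylbutane
print(randic_index(A))   # about 2.27
```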
Electronic descriptors
Electronic interactions play a very important role in controlling molecular properties. Electronic descriptors are calculated to encode aspects of the structure that are related to the electrons; electronic interaction is a function of the charge distribution over the molecule.
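As an illustration, the sketch below derives a few simple charge-based descriptors from empirical Gasteiger partial charges, assuming RDKit; such charges are a cheap stand-in for a quantum-chemically computed charge distribution.

```python
# Charge-based electronic descriptors, assuming RDKit; acetic acid is only an example.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)O")
AllChem.ComputeGasteigerCharges(mol)
charges = [atom.GetDoubleProp("_GasteigerCharge") for atom in mol.GetAtoms()]

print(max(charges))                    # most positive partial atomic charge
print(min(charges))                    # most negative partial atomic charge
print(sum(abs(q) for q in charges))    # a crude overall polarity descriptor
```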
Physicochemical Properties Used in this QSAR
- Liquid solubility Sw,L (mg/L and mmol/m3)
- Octanol-water partition coefficient Kow
- Liquid vapor pressure Pv,L (Pa)
- Henry's law constant Hc (Pa·m3/mol)
- Boiling point
Feature Selection
Comparing faces, for example, first requires the identification of key features; how do we identify these? The same applies to molecules: the second step of comparing items involves the selection of features. Many of our methods in molecular similarity are taken from psychology or computer science. In face recognition, comparing every pixel (tens of thousands of features) would introduce a great deal of noise; instead, about 20 characteristic points are selected, which retain much of the information while discarding much of the noise. The same step can be employed in the comparison of molecules.
Objective feature selection
After descriptors have been calculated for each compound, the set must be reduced to one that is as information-rich but as small as possible:
1. Deletion of constant or near-constant descriptors
2. Pair-correlation cut-off selection
3. Cluster analysis
4. Principal component analysis
5. K correlation analysis
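A sketch of the first two reduction steps in this list (deleting near-constant descriptors and applying a pair-correlation cut-off), assuming NumPy; the tolerance, the 0.95 cut-off, and the random descriptor matrix are arbitrary illustrative choices.

```python
import numpy as np

def objective_selection(X, var_tol=1e-6, corr_cut=0.95):
    """Return indices of descriptors kept after two objective filters."""
    # 1) delete constant or near-constant descriptors
    keep = np.where(X.std(axis=0) > var_tol)[0]
    X = X[:, keep]
    # 2) pair-correlation cut-off: drop one member of each highly correlated pair
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < corr_cut for k in selected):
            selected.append(j)
    return keep[selected]

X = np.random.default_rng(1).normal(size=(30, 10))   # 30 molecules, 10 descriptors
X[:, 3] = 0.0                                        # a constant descriptor
X[:, 5] = 2.0 * X[:, 2] + 0.01                       # a redundant descriptor
print(objective_selection(X))                        # columns 3 and 5 are dropped
```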
Variable reduction: Principal Component Analysis
Principal Components
PC1 = a1,1 x1 + a1,2 x2 + ... + a1,n xn
Keep only those components that carry the largest share of the variation; the PCs are orthogonal to each other.
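A minimal PCA sketch via singular value decomposition of the mean-centred descriptor matrix, assuming NumPy; the 90% explained-variance cut-off and the random data are illustrative assumptions.

```python
import numpy as np

# PCA by SVD: each PC is an orthogonal linear combination of the descriptors,
# and only the components carrying the largest variance are kept.
X = np.random.default_rng(2).normal(size=(30, 8))   # 30 molecules, 8 descriptors
Xc = X - X.mean(axis=0)                             # mean-centre each descriptor

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)                     # variance fraction of each PC
n_keep = int(np.searchsorted(np.cumsum(explained), 0.90)) + 1   # keep ~90 % variance

scores = Xc @ Vt[:n_keep].T     # PC scores, used in place of the raw descriptors
loadings = Vt[:n_keep]          # the coefficients a_k,i of each component
print(n_keep, scores.shape)
```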
Subjective Feature Selection
The aim is to reach the optimal model:
1. Search of all possible models (best MLR)
2. Forward, backward & stepwise methods
3. Genetic algorithm
4. Mutation and selection uncover models
5. Cluster significance analysis
6. Leaps-and-bounds regression
Feature Selection: ACS
Most existing feature selection algorithms consist of:
- a starting point in the feature space
- a search procedure
- an evaluation function
- a criterion for stopping the search
Feature Selection: ACS (starting point in the feature space)
- no features
- all features
- a random subset of features
Forward Selection
1. Variables are sequentially entered into the model. The first variable considered for entry into the equation is the one with the largest positive or negative correlation with the dependent variable; it is entered only if it satisfies the criterion for entry.
2. If the first variable is entered, the independent variable not in the equation that has the largest partial correlation is considered next.
3. The procedure stops when no remaining variable meets the entry criterion.
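A hedged sketch of this forward-selection loop, assuming NumPy; it uses a simple relative reduction in the residual sum of squares as the entry criterion rather than a formal F-test, and the data and threshold are made up for illustration.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an ordinary least-squares fit on the chosen columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def forward_selection(X, y, min_improvement=0.05):
    selected, current = [], rss(X, y, [])
    while len(selected) < X.shape[1]:
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        scores = {c: rss(X, y, selected + [c]) for c in candidates}
        best = min(scores, key=scores.get)           # largest RSS reduction
        if current - scores[best] < min_improvement * current:
            break                                    # entry criterion no longer met
        selected.append(best)
        current = scores[best]
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))                         # 40 molecules, 6 descriptors
y = 2.0 * X[:, 1] - 1.0 * X[:, 4] + rng.normal(scale=0.2, size=40)
print(forward_selection(X, y))                       # descriptors 1 and 4 enter first
```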
Forward Selection example
Backward Elimination
1. All variables are entered into the equation and then sequentially removed.
2. The variable with the smallest partial correlation with the dependent variable is considered first for removal; if it meets the criterion for elimination, it is removed.
3. After the first variable is removed, the variable remaining in the equation with the smallest partial correlation is considered next.
4. The procedure stops when no variable in the equation satisfies the removal criterion.
Stepwise
At each step, the independent variable not in the equation that has the smallest probability of F is entered, if that probability is sufficiently small. Variables already in the regression equation are removed if their probability of F becomes sufficiently large. The method terminates when no more variables are eligible for inclusion or removal.
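A sketch of this stepwise procedure, assuming NumPy and SciPy; partial-F probabilities govern entry and removal, and the thresholds p_enter = 0.05 and p_remove = 0.10, like the synthetic data, are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def rss(X, y, cols):
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def partial_f_pvalue(X, y, cols_with, cols_without):
    """Probability of F for the single variable distinguishing the two models."""
    rss_full, rss_red = rss(X, y, cols_with), rss(X, y, cols_without)
    df_resid = len(y) - len(cols_with) - 1
    F = (rss_red - rss_full) / (rss_full / df_resid)
    return float(stats.f.sf(F, 1, df_resid))

def stepwise(X, y, p_enter=0.05, p_remove=0.10):
    selected = []
    while True:
        # entry step: smallest probability of F among variables outside the model
        outside = [c for c in range(X.shape[1]) if c not in selected]
        pvals = {c: partial_f_pvalue(X, y, selected + [c], selected) for c in outside}
        if pvals and min(pvals.values()) < p_enter:
            selected.append(min(pvals, key=pvals.get))
        else:
            break
        # removal step: drop variables whose probability of F has become too large
        for c in list(selected):
            others = [k for k in selected if k != c]
            if partial_f_pvalue(X, y, selected, others) > p_remove:
                selected.remove(c)
    return selected

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 6))
y = 1.5 * X[:, 0] + 0.7 * X[:, 3] + rng.normal(scale=0.3, size=40)
print(stepwise(X, y))   # descriptors 0 and 3 should be among those retained
```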
Stepwise Example
Forward, Backward & Stepwise variable selection methods
Advantages: fast and simple; can be carried out with nearly every statistics package.
Limitation: risk of ending in a local minimum, so the globally best descriptor subset may be missed.
Genetic Algorithm
Search Space
Definition: A genetic algorithm is a general-purpose search and optimization method, based on genetic principles and Darwin's law of natural selection, that is applicable to a wide variety of problems.
Darwin's rules
- Survival of the fittest individuals
- Recombination
- Mutation
Biological background
Chromosome, gene, reproduction, mutation, fitness
GA basic operations
1. Population generation (chromosomes)
2. Selection (according to fitness)
3. Recombination and mutation (offspring)
4. Repetition
GA flow chart
- Initialize: population generation
- Evaluate: compute fitness for each chromosome
- Exploit: perform natural selection
- Explore: recombination & mutation operations
Binary Encoding
Every chromosome is a string of bits (0 or 1), e.g. Chromosome A and Chromosome B.
Selection
The best chromosomes should survive and create new offspring. Common schemes:
- Roulette wheel selection
- Rank selection
- Steady-state selection
Roulette wheel selection
Each chromosome gets a share of the wheel proportional to its fitness (here fitness 1 > 2 > 3 > 4).
Crossover (binary encoding)
- Single-point crossover
- Two-point crossover
Mutation
- Bit inversion (binary encoding)
- Ordering change (permutation encoding)
GA flow chart
Start → Population generation → Fitness → Selection → Crossover → Mutation → Replace → Test → End
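A compact, illustrative genetic-algorithm sketch for descriptor selection that follows this flow chart, assuming NumPy; binary chromosomes mark which descriptors are used, selection is roulette-wheel, crossover is single-point and mutation is bit inversion. The fitness function (R^2 penalised by subset size) and all parameter values are assumptions made for the example, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(4)

def fitness(chrom, X, y):
    """R^2 of an OLS fit on the flagged descriptors, minus a subset-size penalty."""
    cols = np.flatnonzero(chrom)
    if cols.size == 0:
        return 1e-9
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1.0 - np.sum((y - A @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    return max(r2 - 0.01 * cols.size, 1e-9)     # keep fitness positive for roulette

def roulette(pop, fit):
    """Roulette-wheel selection: parents are drawn with probability proportional to fitness."""
    return pop[rng.choice(len(pop), size=len(pop), p=fit / fit.sum())]

def crossover(a, b):
    point = rng.integers(1, len(a))              # single-point crossover
    return np.concatenate([a[:point], b[point:]])

def mutate(chrom, rate=0.02):
    flip = rng.random(len(chrom)) < rate         # bit-inversion mutation
    return np.where(flip, 1 - chrom, chrom)

def ga_select(X, y, pop_size=30, generations=40):
    pop = rng.integers(0, 2, size=(pop_size, X.shape[1]))    # random initial population
    for _ in range(generations):
        fit = np.array([fitness(c, X, y) for c in pop])
        parents = roulette(pop, fit)
        pop = np.array([mutate(crossover(parents[i], parents[(i + 1) % pop_size]))
                        for i in range(pop_size)])
    fit = np.array([fitness(c, X, y) for c in pop])
    return np.flatnonzero(pop[np.argmax(fit)])   # descriptor subset of the fittest chromosome

X = rng.normal(size=(50, 12))                    # 50 molecules, 12 descriptors
y = 1.2 * X[:, 0] - 0.8 * X[:, 7] + rng.normal(scale=0.3, size=50)
print(ga_select(X, y))                           # descriptors 0 and 7 should be among those chosen
```

A production implementation would usually add elitism (carrying the best chromosome into the next generation) and use a cross-validated fitness measure rather than the plain penalised R^2 assumed here.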
Parameters of a GA
- Crossover rate
- Mutation rate
- Population size
- Selection type
- Encoding
- Crossover and mutation type
Advantages of a GA
- Parallelism
- Provides a group of potential solutions
- Easy to implement
- Better chance of reaching the global optimum
How many descriptors can be used in a QSAR model?
Rule of thumb: at least 5 data points (molecules) must exist in the model per descriptor; for example, a model with 6 descriptors should be built on at least 30 compounds. Otherwise the possibility of finding a coincidental (chance) correlation is too high.
Questions?