Basic Steps of QSAR/QSPR Investigations

Presentation transcript:

Basic Steps of QSAR/QSPR Investigations. In the name of God. M.H. Fatemi, Mazandaran University, mhfatemi@umz.ac.ir

QSAR: Quantitative Structure-Activity Relationships. Can one predict activity (or properties, in QSPR) simply on the basis of knowledge of the structure of the molecule? In other words, if one systematically changes a component, will it have a systematic effect on the activity?

What is QSAR? A QSAR is a mathematical relationship between a biological activity of a molecular system and its geometric and chemical characteristics. QSAR attempts to find a consistent relationship between biological activity and molecular properties, so that these “rules” can be used to evaluate the activity of new compounds.

Why QSAR? The number of compounds that would have to be synthesized in order to place 10 different groups in 4 positions of a benzene ring is $10^4$. Solution: synthesize a small number of compounds and from their data derive rules to predict the biological activity of other compounds.
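
The count on this slide can be checked by enumeration. This is a minimal sketch that treats the 4 positions as distinguishable and ignores ring symmetry, as the slide's figure does; the group labels `R1`..`R10` are placeholders, not real substituents.

```python
from itertools import product

groups = [f"R{i}" for i in range(1, 11)]  # 10 placeholder substituent labels
positions = 4                             # ring positions being varied

# One compound per way of assigning a group to each of the 4 positions.
n_compounds = sum(1 for _ in product(groups, repeat=positions))
print(n_compounds)  # 10000
```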

QSXR: X = A (activity), X = P (property), X = R (retention). $X = b_0 + b_1 D_1 + b_2 D_2 + \dots + b_n D_n$, where $b_i$ are the regression coefficients, $D_i$ the descriptors, and $n$ the number of descriptors.
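
A linear model of this form can be fitted by ordinary least squares. The sketch below assumes NumPy; the function names `fit_mlr` and `predict_mlr` and the synthetic, noise-free data are illustrative only and do not come from the slides.

```python
import numpy as np

def fit_mlr(D, x):
    """Least-squares estimate of [b0, b1, ..., bn] in x = b0 + sum(bi * Di)."""
    D = np.asarray(D, dtype=float)
    x = np.asarray(x, dtype=float)
    A = np.column_stack([np.ones(len(D)), D])  # leading column of ones gives b0
    b, *_ = np.linalg.lstsq(A, x, rcond=None)
    return b

def predict_mlr(b, D):
    """Evaluate the fitted model for a matrix of descriptor values."""
    return b[0] + np.asarray(D, dtype=float) @ b[1:]

# Synthetic data: 20 "molecules", 2 descriptors, known coefficients.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 2))
x = 1.5 + 2.0 * D[:, 0] - 0.5 * D[:, 1]

b = fit_mlr(D, x)
print(np.round(b, 3))
```

Because the synthetic response contains no noise, the fit should recover the coefficients used to generate it.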

History

Early Examples Hammett (1930s-1940s)

Hammett (cont.) Now suppose we have a related series: $\sigma$ reflects sensitivity to the substituent, and $\rho$ reflects sensitivity to a different system.
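
For reference, the Hammett relation is usually written in the textbook form below. It is given here as background and was not recovered from the slide itself. $K_X$ and $K_H$ are the equilibrium (or rate) constants of the substituted and unsubstituted compound, $\sigma_X$ is the substituent constant, and $\rho$ is the reaction constant of the system under study.

```latex
\log \frac{K_X}{K_H} = \rho \, \sigma_X
```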

Free-Wilson Analysis: $\log(1/C) = \sum a_i + \mu$, where $C$ = predicted activity, $a_i$ = contribution per group, and $\mu$ = activity of the reference.

Free-Wilson example (activity of analogs): Log 1/C = -0.30 [m-F] + 0.21 [m-Cl] + 0.43 [m-Br] + 0.58 [m-I] + 0.45 [m-Me] + 0.34 [p-F] + 0.77 [p-Cl] + 1.02 [p-Br] + 1.43 [p-I] + 1.26 [p-Me] + 7.82. Problems: at least two substituent positions are necessary, and the model can only predict new combinations of the substituents used in the analysis.
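
The example equation can be evaluated directly. In this sketch the coefficients and the reference value 7.82 are copied from the slide, while the function name `free_wilson` and the chosen analog (m-Cl, p-Br) are illustrative assumptions.

```python
# Group contributions a_i and reference activity, as given on the slide.
CONTRIB = {
    "m-F": -0.30, "m-Cl": 0.21, "m-Br": 0.43, "m-I": 0.58, "m-Me": 0.45,
    "p-F": 0.34, "p-Cl": 0.77, "p-Br": 1.02, "p-I": 1.43, "p-Me": 1.26,
}
REFERENCE = 7.82

def free_wilson(substituents):
    """Predicted log(1/C): reference activity plus the sum of group contributions."""
    # A substituent absent from CONTRIB raises KeyError: the model cannot
    # extrapolate beyond the groups used in the analysis.
    return REFERENCE + sum(CONTRIB[s] for s in substituents)

pred = free_wilson(["m-Cl", "p-Br"])
print(round(pred, 2))  # 9.05
```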

Hansch Analysis: $\log(1/C) = a\,\pi + b\,\sigma + c$, where $\pi(X) = \log P_{RX} - \log P_{RH}$ and $\log P$ is the logarithm of the octanol/water partition coefficient. This is also a linear free energy relation.
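
The two definitions on this slide translate into two one-line functions. The log P values below for benzene (2.13) and chlorobenzene (2.84) are commonly quoted literature values used purely as an illustration; the function names are hypothetical, and the coefficients a, b, c would have to come from a regression that the slides do not give.

```python
def pi_constant(logp_rx, logp_rh):
    """Hydrophobic substituent constant: pi(X) = log P(RX) - log P(RH)."""
    return logp_rx - logp_rh

def hansch(pi, sigma, a, b, c):
    """Hansch equation: log(1/C) = a*pi + b*sigma + c."""
    return a * pi + b * sigma + c

# Illustration: chlorine on a benzene ring (literature log P values, assumed).
pi_cl = pi_constant(logp_rx=2.84, logp_rh=2.13)
print(round(pi_cl, 2))  # 0.71
```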

Applications of QSAR 1-Drug design 2-Prediction of Chemical toxicity 3-Prediction of environmental activity 4-Prediction of molecular properties 5-Investigation of retention mechanism

Steps in QSPR/QSAR 1- Structure entry & molecular modeling 2- Descriptor generation 3- Feature selection 4- Construct model (MLRA or CNN) 5- Model validation

Data set selection 1- Structural similarity of the studied molecules 2- Data collected under the same conditions 3- Data set should be as large as possible

Introduction to Molecular Descriptors. Molecular descriptors are numerical values that characterize properties of molecules. They encode structural features of molecules as numbers, and they vary in the complexity of the encoded information and in compute time. Examples: physicochemical properties (empirical); values from algorithms, such as 2D fingerprints.

Classical Classification of Molecular Descriptors: constitutional and topological (2-D structural formula); geometrical (3-D shape and structure); quantum chemical; physicochemical; hybrid descriptors.

Topological Indexes. Example: the Wiener index counts the number of bonds between pairs of atoms and sums the distances between all pairs. Molecular connectivity indexes: the Randić branching index defines the “degree” of an atom as the number of adjacent non-hydrogen atoms. The bond connectivity value is the reciprocal of the square root of the product of the degrees of the two atoms in the bond. The branching index is the sum of the bond connectivities over all bonds in the molecule. Chi indexes introduce valence values to encode sigma, pi, and lone-pair electrons.
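
Both indexes can be computed from a hydrogen-suppressed molecular graph. The sketch below represents a molecule as an adjacency dictionary of heavy atoms (an assumed representation, not one named in the slides) and uses the carbon skeletons of n-butane and isobutane as examples.

```python
from collections import deque
from math import sqrt

def wiener_index(adj):
    """Sum of shortest-path bond counts over all unordered pairs of atoms."""
    total = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2                     # every pair was counted from both ends

def randic_index(adj):
    """Sum over bonds of 1 / sqrt(degree(u) * degree(v))."""
    total = 0.0
    for u in adj:
        for v in adj[u]:
            if u < v:                     # visit each bond once
                total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
    return total

n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}       # C-C-C-C
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}      # central C with 3 methyls

print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
print(round(randic_index(n_butane), 4), round(randic_index(isobutane), 4))
```

The branched isomer has the smaller value for both indexes, which is the sense in which they encode branching.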

Electronic descriptors. Electronic interactions play a very important role in controlling molecular properties. Electronic descriptors are calculated to encode aspects of the structure that are related to the electrons. Electronic interaction is a function of the charge distribution on a molecule.

Physicochemical Properties Used in this QSAR: liquid solubility $S_{w,L}$ (in mg/L and mmol/m³); octanol-water partition coefficient $K_{ow}$; liquid vapor pressure $P_{v,L}$ (in Pa); Henry’s law constant $H_c$ (in Pa·m³/mol); boiling point.

Feature Selection. Many of our methods in molecular similarity are taken from psychology or computer science. For example, comparing faces first requires the identification of key features. How do we identify these? The second step of comparing items involves the selection of features. In this example of face recognition, comparing every pixel (the number of which runs into tens of thousands) would introduce much noise. Instead, 20 characteristic points are selected, which retain much of the information while discarding much of the noise. The same applies to molecules: the same step can be employed in their comparison.

Objective feature selection. After descriptors have been calculated for each compound, this set must be reduced to a set of descriptors that is as information-rich but as small as possible: 1- Deleting constant or near-constant descriptors 2- Pair-correlation cut-off selection 3- Cluster analysis 4- Principal component analysis 5- K correlation analysis
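
The first two reduction steps can be sketched with NumPy as follows. The thresholds `min_std` and `r_max`, the function names, and the synthetic descriptor matrix are all assumptions made for illustration; the slides do not give cut-off values.

```python
import numpy as np

def drop_near_constant(X, min_std=1e-8):
    """Indices of descriptor columns whose standard deviation exceeds min_std."""
    return [int(j) for j in np.flatnonzero(np.asarray(X).std(axis=0) > min_std)]

def drop_correlated(X, cols, r_max=0.9):
    """Greedy pair-correlation filter: keep a column only if |r| < r_max
    against every column already kept."""
    kept = []
    for j in cols:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < r_max for k in kept):
            kept.append(j)
    return kept

# Synthetic descriptor matrix: 30 molecules, 4 descriptors.
rng = np.random.default_rng(0)
X = np.empty((30, 4))
X[:, 0] = rng.normal(size=30)
X[:, 1] = 1.0                 # constant descriptor
X[:, 2] = 2.0 * X[:, 0]       # perfectly correlated with descriptor 0
X[:, 3] = rng.normal(size=30)

varying = drop_near_constant(X)
kept = drop_correlated(X, varying)
print(varying, kept)
```

The greedy filter keeps the first member of each correlated pair, so the result depends on column order; other tie-breaking rules are equally defensible.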

Variable reduction Principal Component Analysis

Principal Component: $PC_1 = a_{1,1}x_1 + a_{1,2}x_2 + \dots + a_{1,n}x_n$. Keep only those components that possess the largest variation. PCs are orthogonal to each other.
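
One common way to obtain the coefficients $a_{k,j}$ is a singular value decomposition of the mean-centred descriptor matrix. The sketch below assumes NumPy and synthetic data; the function name `pca` is illustrative, and descriptors on very different scales would normally be standardized first.

```python
import numpy as np

def pca(X, n_components):
    """Return scores, loadings (rows are a_{k,1..n}) and component variances."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre each descriptor
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]                  # orthonormal rows
    scores = Xc @ loadings.T                      # PC_k value for each molecule
    variances = (s ** 2) / (len(X) - 1)           # sorted in descending order
    return scores, loadings, variances[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
scores, loadings, variances = pca(X, n_components=2)

# The loading vectors are orthogonal to each other.
print(np.round(loadings @ loadings.T, 6))
```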

Subjective Feature Selection. The aim is to reach the optimal model: 1- Search all possible models (best MLR) 2- Forward, backward & stepwise methods 3- Genetic algorithm 4- Mutation and selection uncover models 5- Cluster significance analysis 6- Leaps & bounds regression

Feature Selection: ACS. Most existing feature selection algorithms consist of: a starting point in the feature space; a search procedure; an evaluation function; a criterion for stopping the search.

Feature Selection: ACS. Starting point in the feature space: no features; all features; a random subset of features.

Forward Selection 1- Variables are sequentially entered into the model. The first variable considered for entry into the equation is the one with the largest positive or negative correlation with the dependent variable. This variable is entered into the equation only if it satisfies the criterion for entry. 2- If the first variable is entered, the independent variable not in the equation that has the largest partial correlation is considered next. 3- The procedure stops when there are no variables that meet the entry criterion.
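
The procedure above can be sketched with NumPy as follows. Two assumptions are made here that the slide does not state: the entry criterion is a partial F statistic compared with a threshold `f_to_enter` (4.0 is an arbitrary illustrative value), and the candidate with the largest partial correlation is found as the one giving the smallest residual sum of squares, which is equivalent.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit with intercept on columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, f_to_enter=4.0):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    selected, current = [], rss(X, y, [])
    while len(selected) < p:
        trial = {j: rss(X, y, selected + [j]) for j in range(p) if j not in selected}
        best = min(trial, key=trial.get)          # largest partial correlation
        dof = n - len(selected) - 2               # residual d.o.f. after entry
        if dof <= 0:
            break
        f_stat = (current - trial[best]) / (trial[best] / dof)
        if f_stat < f_to_enter:                   # entry criterion not met: stop
            break
        selected.append(best)
        current = trial[best]
    return selected

# Synthetic data: only descriptors 0 and 2 influence the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=40)
selected = forward_selection(X, y)
print(selected)
```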

Forward Selection example

Backward Elimination 1- All variables are entered into the equation and then sequentially removed. 2-The variable with the smallest partial correlation with the dependent variable is considered first for removal. If it meets the criterion for elimination, it is removed. 3- After the first variable is removed, the variable remaining in the equation with the smallest partial correlation is considered next. 4-The procedure stops when there are no variables in the equation that satisfy the removal criteria.
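
Backward elimination can be sketched in the same way. As before, the removal criterion is assumed to be a partial F statistic compared with a threshold `f_to_remove` (an illustrative value, not one from the slides), and the sketch assumes more molecules than descriptors plus one so that the full model can be fitted.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit with intercept on columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def backward_elimination(X, y, f_to_remove=4.0):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    selected = list(range(p))                     # start with all variables
    while selected:
        full = rss(X, y, selected)
        dof = n - len(selected) - 1
        # Partial F for dropping each variable on its own; the smallest F
        # belongs to the variable with the smallest partial correlation.
        f_stats = {
            j: (rss(X, y, [k for k in selected if k != j]) - full) / (full / dof)
            for j in selected
        }
        weakest = min(f_stats, key=f_stats.get)
        if f_stats[weakest] >= f_to_remove:       # nothing meets removal criterion
            break
        selected.remove(weakest)
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=40)
remaining = backward_elimination(X, y)
print(remaining)
```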

Stepwise. At each step, the independent variable not in the equation that has the smallest probability of F is entered, if that probability is sufficiently small. Variables already in the regression equation are removed if their probability of F becomes sufficiently large. The method terminates when no more variables are eligible for inclusion or removal.

Stepwise Example

Forward, Backward & Stepwise variable selection methods. Advantages: fast and simple; can be done with many software packages. Limitation: risk of local minima.

Genetic Algorithm

Search Space

Definition: A genetic algorithm is a general-purpose search and optimization method, based on genetic principles and Darwin’s law, that is applicable to a wide variety of problems.

Darwin’s rules: survival of the fittest individuals; recombination; mutation

Biological background Chromosome Gene Reproduction Mutation Fitness

GA basic operation Population generation (chromosome ) Selection (according to fitness ) Recombination and mutation (offspring) Repetition

GA flow chart. Initialize: population generation. Evaluate: compute fitness for each chromosome. Exploit: perform natural selection. Explore: recombination & mutation operation.

Binary Encoding: every chromosome is a string of bits, 0 or 1. Chromosome A: 1 0 1 1 0 0 1 1 1 0 0 0 0 1. Chromosome B: 0 0 1 0 0 1 1 1 0 1 0 0 1 1.

Selection: the best chromosomes should survive and create new offspring. Methods: roulette wheel selection; rank selection; steady-state selection.

Roulette wheel selection. Fitness: 1 > 2 > 3 > 4
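
In roulette wheel selection each chromosome is picked with probability proportional to its fitness. The sketch below is one simple way to write it with Python's standard `random` module; it assumes non-negative fitness values with a positive sum, and the four fitness values are made up to mirror the ordering 1 > 2 > 3 > 4.

```python
import random

def roulette_wheel(fitness, rng=random):
    """Return the index of one chromosome, chosen proportionally to fitness."""
    pick = rng.uniform(0, sum(fitness))   # where the wheel stops
    running = 0.0
    for i, f in enumerate(fitness):
        running += f
        if pick <= running:
            return i
    return len(fitness) - 1               # guard against floating-point round-off

rng = random.Random(0)
fitness = [4.0, 3.0, 2.0, 1.0]            # illustrative values only
counts = [0] * len(fitness)
for _ in range(10000):
    counts[roulette_wheel(fitness, rng)] += 1
print(counts)
```

Over many spins the selection counts follow the fitness ordering, but weaker chromosomes are still chosen occasionally, which preserves diversity in the population.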

Crossover (binary encoding). Single-point crossover: 11001011 + 11011111 = 11001111. Two-point crossover: 11001011 + 11011111 = 11011111.
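
Both crossover operators are simple string splices. The cut positions used below (5 for single-point, 2 and 6 for two-point) are assumptions chosen because they reproduce the offspring shown on the slide; the slide itself does not say where the cuts were made.

```python
def single_point(a, b, point):
    """Head of parent a up to `point`, tail of parent b after it."""
    return a[:point] + b[point:]

def two_point(a, b, start, end):
    """Parent a with the segment [start, end) replaced by parent b's segment."""
    return a[:start] + b[start:end] + a[end:]

child1 = single_point("11001011", "11011111", 5)
child2 = two_point("11001011", "11011111", 2, 6)
print(child1, child2)  # 11001111 11011111
```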

Mutation. Bit inversion (binary encoding): 11001001 => 10001001. Ordering change (permutation encoding): (1 2 3 4 5 6 8 9 7) => (1 8 3 4 5 6 2 9 7).
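
The two mutation operators can be written as below. The positions are passed in explicitly so that the slide's examples are reproduced exactly; in a real GA they would be drawn at random according to the mutation rate. The function names are illustrative.

```python
def flip_bit(chromosome, index):
    """Bit inversion: flip the bit at `index` (0-based) of a binary string."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]

def swap_positions(order, i, j):
    """Ordering change: exchange the elements at positions i and j (0-based)."""
    new_order = list(order)
    new_order[i], new_order[j] = new_order[j], new_order[i]
    return new_order

mutated_bits = flip_bit("11001001", 1)
mutated_order = swap_positions([1, 2, 3, 4, 5, 6, 8, 9, 7], 1, 6)
print(mutated_bits)   # 10001001
print(mutated_order)  # [1, 8, 3, 4, 5, 6, 2, 9, 7]
```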

GA flow chart: Start; Population generation; Fitness; Selection; Replace; Crossover; Mutation; Test; End.
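
The flow chart can be turned into a short loop. This is a minimal sketch under several assumptions not found in the slides: the stopping test is a fixed number of generations, the best chromosome is carried over unchanged (elitism), and the fitness is a toy function that counts 1-bits. In descriptor selection, each bit would instead mark whether a descriptor is included and the fitness would score the resulting model.

```python
import random

def run_ga(fitness, n_bits, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Population generation: random binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):                      # Test: fixed generation count
        # Fitness evaluation (small offset keeps roulette weights positive).
        weights = [fitness(c) + 1e-9 for c in pop]
        new_pop = [best[:]]                           # elitism: keep the best so far
        while len(new_pop) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)   # roulette selection
            if rng.random() < crossover_rate:                 # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_pop.append(child)
        pop = new_pop                                 # Replace the old population
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

def count_ones(chromosome):
    """Toy fitness (an assumption): number of 1-bits in the chromosome."""
    return sum(chromosome)

best, history = run_ga(count_ones, n_bits=12)
print(best, history[0], history[-1])
```

Because of the elitism step, the best fitness recorded in `history` can never decrease from one generation to the next.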

Parameters of GA Crossover rate Mutation rate Population size Selection type Encoding Crossover and mutation type

Advantages of GA: parallelism; provides a group of potential solutions; easy to implement; can locate global optima (less prone to being trapped in local optima, though this is not guaranteed).

How many descriptors can be used in a QSAR model? Rule of thumb: there must be at least 5 data points (molecules) per descriptor in the model. Otherwise the possibility of finding a coincidental correlation is too high.
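
The rule of thumb amounts to an integer division. In this sketch the function name and the example data set size of 32 molecules are illustrative assumptions.

```python
def max_descriptors(n_molecules, points_per_descriptor=5):
    """Largest number of descriptors allowed by the 5-points-per-descriptor rule."""
    return n_molecules // points_per_descriptor

limit = max_descriptors(32)
print(limit)  # 6
```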

Questions?