Leiden University. The university to discover.

Enhancing Search Space Diversity in Multi-Objective Evolutionary Drug Molecule Design using Niching

A. Aleman (1), A.P. IJzerman (2), E. van der Horst (2), M.T.M. Emmerich (1), T. Bäck (1,3), J.W. Kruisselbrink (1), A. Bender (2)

1. Leiden Institute of Advanced Computer Science (LIACS)
2. Leiden/Amsterdam Center for Drug Research (LACDR)
3. NuTech Solutions, Inc.

Scope: drug design and development
-Search for molecular structures with specific pharmacological or biological activity that influence the behavior of certain targeted cells
-Objectives: maximization of the potency of the drug (and minimization of side-effects)
-Constraints: stability, synthesizability, drug-likeness, etc.
-A huge search space: drug-like molecules
-Aim: provide the medicinal chemist with a set of molecular structures that can be promising candidates for further research

Molecule Evolution
-‘Normal’ evolution cycle:
    Initialize population P_0 (fragments extracted from drug databases)
    While not terminate do
        Generate offspring O from P_t
        Evaluate O
        P_{t+1} = select from (P_t ∪ O)
-Graph-based mutation and recombination operators
-Deterministic elitist (μ+λ) parent selection (NSGA-II)

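The cycle above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the genome is a toy integer instead of a molecular graph, and the selection step keeps the mu best individuals by a single score as a stand-in for the NSGA-II ranking used in the talk. The names `evolve`, `toy_fitness` and `toy_mutate` are hypothetical.

```python
import random


def evolve(init_population, mutate, evaluate, mu, lam, generations, rng):
    """Generic (mu + lambda) loop: offspring compete with their parents."""
    population = [(evaluate(x), x) for x in init_population]
    for _ in range(generations):
        # Generate and evaluate lam offspring from randomly chosen parents.
        offspring = []
        for _ in range(lam):
            _, parent = rng.choice(population)
            child = mutate(parent, rng)
            offspring.append((evaluate(child), child))
        # Deterministic elitist selection from the union of parents and offspring.
        # Stand-in only: NSGA-II would rank by non-domination and crowding.
        merged = population + offspring
        merged.sort(key=lambda pair: pair[0], reverse=True)
        population = merged[:mu]
    return population


def toy_fitness(x):
    """Toy objective (assumption): maximal at x = 7."""
    return -(x - 7) ** 2


def toy_mutate(x, rng):
    """Toy mutation (assumption): step one unit left or right."""
    return x + rng.choice((-1, 1))
```

Because parents survive into the selection pool, the best score in the population can never get worse from one generation to the next, which is the elitist property named on the slide.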
Fitness
Objectives:
-Activity predictors based on support vector machines:
  -f_1: activity predictor based on ECFP6 fingerprints
  -f_2: activity predictor based on AlogP2 Estate Counts
  -f_3: activity predictor based on MDL
Constraints:
-A fuzzy constraint score based on Lipinski’s rule of five and bounds on the minimal energy conformation

Desirability indexes for modeling fuzzy constraints
-Constraints can be modeled in the form of desirability values
-The degree of satisfaction can be measured on a scale between 0 (not satisfied) and 1 (fully satisfied)

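As an illustration of such a desirability value, the sketch below maps an upper-bound constraint to a score in [0, 1] and combines several scores into one. The piecewise-linear shape, the geometric-mean aggregation, and the soft limit of 600 in the usage note are assumptions made for this example; the talk does not state which functions were used.

```python
def desirability_upper(value, target, limit):
    """Return 1.0 while value <= target, falling linearly to 0.0 at value >= limit."""
    if value <= target:
        return 1.0
    if value >= limit:
        return 0.0
    return (limit - value) / (limit - target)


def aggregate(desirabilities):
    """Geometric mean (assumed aggregation): one fully violated constraint gives 0."""
    product = 1.0
    for d in desirabilities:
        product *= d
    return product ** (1.0 / len(desirabilities))
```

For example, with Lipinski's molecular-weight bound of 500 as the target and a hypothetical soft limit of 600, `desirability_upper(550, 500, 600)` gives a score halfway between satisfied and violated.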
Diversity for Molecule Evolution
-A ‘normal’ search yields very similar molecular structures
-Aim for a set of diverse candidate structures because:
  -Vague objective functions may result in finding structures that fail in practice
  -The chemist desires a set of promising structures rather than only one single solution
-Explicit methods are required to enforce diversity in the search space, i.e. niching

Typical output of a ‘normal’ evolutionary search
All molecules are variations of the same theme!

Niching in Multi-Objective EA
-Explicitly aim for diversity in the decision space
-This is different from aiming for diversity in the objective space
-Points that lie far apart in the objective space do not necessarily lie far apart in the decision space

Niching-based NSGA-II
A niching-based NSGA-II algorithm as proposed by Shir et al.

Dynamic Niche Identification
Figure labels: peak individuals (q=3); individuals that do not belong to a niche
Reference: B.L. Miller, M.J. Shaw: Genetic algorithms with dynamic niche sharing for multimodal function optimization, Proceedings of IEEE International Conference on EC, May 1996

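A minimal sketch of greedy peak identification in the spirit of the cited dynamic niche sharing scheme is given below. It is an illustration rather than the implementation used in the talk: the function name `identify_niches`, its return format, and the tie-breaking rule (an individual joins the fittest peak within the radius) are assumptions.

```python
def identify_niches(population, fitness, distance, q, radius):
    """Greedy niche identification.

    Returns (peaks, members, unassigned), all as indices into population:
    peaks      - up to q peak individuals, fittest first
    members    - dict mapping each peak to the individuals in its niche
    unassigned - individuals that do not belong to any niche
    """
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)

    # Walk through individuals from fittest to least fit; an individual becomes
    # a peak if it lies at least `radius` away from every peak found so far.
    peaks = []
    for i in order:
        if len(peaks) == q:
            break
        if all(distance(population[i], population[p]) >= radius for p in peaks):
            peaks.append(i)

    # Assign each individual to the first (fittest) peak within the radius.
    members = {p: [] for p in peaks}
    unassigned = []
    for i in order:
        for p in peaks:
            if distance(population[i], population[p]) < radius:
                members[p].append(i)
                break
        else:
            unassigned.append(i)
    return peaks, members, unassigned
```

Any distance function can be plugged in, so the same routine works for real-valued toy problems and for fingerprint-based molecular distances.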
Similarity in Molecular Spaces
-How to define a similarity measure for the graph-like molecular structures?
-Idea: use molecular fingerprints
-Molecules are represented by bitstrings identifying certain structural properties
-A ‘1’ at position i denotes the presence of property i in the molecule, and a ‘0’ at position i denotes its absence

Distance based on fingerprints
-The distance between two molecules A and B can be based on four terms:
  -a: the number of properties only present in A
  -b: the number of properties only present in B
  -c: the number of properties present in both A and B
  -d: the number of properties present in neither A nor B
-One possible distance measure can be created using the Jaccard coefficient (also known as the Tanimoto coefficient):
    d_J(A, B) = 1 - c / (a + b + c) = (a + b) / (a + b + c)
-The Jaccard distance fulfills the triangle inequality, as opposed to, for example, the cosine distance!

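The counts a, b and c translate directly into code. The sketch below assumes fingerprints are given as equal-length sequences of 0/1 values; the function name and the convention that two all-zero fingerprints have distance 0 are choices made for this example.

```python
def jaccard_distance(fp_a, fp_b):
    """Jaccard (Tanimoto) distance between two equal-length 0/1 fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for x, y in zip(fp_a, fp_b) if x == 1 and y == 0)  # only in A
    b = sum(1 for x, y in zip(fp_a, fp_b) if x == 0 and y == 1)  # only in B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == 1 and y == 1)  # in both
    if a + b + c == 0:
        # Both fingerprints are all zeros; treated as identical (assumption).
        return 0.0
    return (a + b) / (a + b + c)
```

Note that d, the number of properties absent from both molecules, does not enter the formula, so padding fingerprints with unused bits leaves the distance unchanged.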
Triangle inequality
Why do we want a dissimilarity (distance) measure that obeys the triangle inequality, d(B, C) <= d(A, B) + d(A, C)? If molecule A is similar to B and A is also similar to C, then we want to be able to conclude that B is similar to C. The triangle inequality guarantees this: when d(A, B) and d(A, C) are both small, d(B, C) cannot be large.

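A small numeric check makes the difference concrete. The sketch below takes "cosine distance" to mean 1 minus the cosine similarity (an assumption about the intended definition) and compares it with the Jaccard distance on three tiny bit vectors chosen for this example.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def jaccard_distance(u, v):
    """Jaccard distance between two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    if either == 0:
        return 0.0
    return 1.0 - both / either


def satisfies_triangle(dist, a, b, c, tol=1e-12):
    """Check d(a, c) <= d(a, b) + d(b, c) for one triple of points."""
    return dist(a, c) <= dist(a, b) + dist(b, c) + tol


A, B, C = (1, 0), (1, 1), (0, 1)
```

For this triple the cosine distance from A to C is 1, while the detour through B costs only about 0.59, so the inequality fails; the Jaccard distances are 1 for the direct route and 0.5 + 0.5 for the detour, so it holds with equality.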
Molecule Evolution with Niching

Experiments
Aim: compare the niching-based NSGA-II method with the normal NSGA-II method
Two test-cases:
-Find ligands for the Neuropeptide Y2 receptor (NPY2)
-Find inhibitors for the Lipoxygenase (LOX)
Two objectives:
-Aggregated fitness score based on activity predictors
-Aggregated constraints score function

Experimental setup
-5 runs for each method on each test-case
-... generations per run
-Normal NSGA-II:
  -50 parents
  -150 offspring
-Niching-based NSGA-II:
  -10 niches
  -5 parents per niche (50 parents in total, the same as for the normal NSGA-II)
  -150 offspring
  -niche radius 0.85 (set empirically)

Average Pareto Fronts (panels: NPY2, LOX)

Average distance between the individuals in the final populations (panels: NPY2, LOX)

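One way to compute such an average is the mean distance over all pairs of individuals in a population. The sketch below assumes that definition and represents each fingerprint as a set of "on" bit positions; whether the talk's plots used exactly this statistic is not stated, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance_sets(s, t):
    """Jaccard distance between two fingerprints given as sets of on-bit indices."""
    union = s | t
    if not union:
        return 0.0
    return 1.0 - len(s & t) / len(union)


def mean_pairwise_distance(population, distance):
    """Average distance over all unordered pairs; 0.0 for fewer than two individuals."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(distance(x, y) for x, y in pairs) / len(pairs)
```

A population of near-identical molecules yields a value close to 0, while a population spread over several niches yields a larger value, which is the effect the comparison is meant to show.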
Output sets of an NPY2 run without and with niching

Output sets of a LOX run without and with niching

Multi-dimensional Scaling Plots (panels: No Niching, Niching)

The chemist’s view on the output
Regarding the niching:
-The molecules found with the niching method are clearly more diverse than the molecules found by the non-niching approach
In general:
-The molecules look reasonable overall, but:
  -Most molecules still contain unstable and/or toxic features, or features that are not easy to synthesize in practice
  -Similar types of uncommon features seem to appear

Conclusions and Outlook
Conclusions:
-Applying niching using the Jaccard distance based on molecular fingerprints is a way to enhance search space diversity in molecule evolution
-It yields more diverse sets of molecules than a normal evolutionary algorithm for molecule evolution
Future research:
-Applying these methods to other (more sophisticated) models as well
-In vitro testing of selected molecules found using this method
-Incorporating more sophisticated measures for testing the synthesizability of candidate molecules

Thank you!
Alexander Aleman
Natural Computing Group, LIACS, Universiteit Leiden