
1 Chemical Diversity Qualify and/or quantify the extent of variety within a set of compounds. Try to define the extent of chemical space. In combinatorial chemistry, we are interested in the diversity of a library.

2 Example 1: Here we are looking at compounds that can possess up to 2 functional groups. How do we define libraries that have different numbers of these cells occupied? How do we quantify those that have duplicates within cells?

3 Chemical Diversity based on properties
Example 2: We can try to define the diversity based on properties of the compounds. For example, we could look at the naturally occurring amino acids and span the space defined by their pI. This gives a poor spread, so try pI and MW. We could go to higher dimensions by also looking at the number of H-bonds they make, the number of OH groups, their dipole moment, etc.

4 Why is Diversity Important?
- Similar Property Principle: structurally similar compounds will exhibit similar physicochemical and biological properties
  - Test only representative compounds, eliminate redundancies
  - For lead discovery, want a diverse space to locate all possible hits (actives): called a diverse library
  - For refining a lead into a drug (lead optimization), want to survey a range of similar compounds: called a focused library
- Diversity hypothesis: diverse reactants will lead to diverse products
  - Potentially useful for library design
- Quantify whether a library can be supplemented by additions of other compounds or other libraries
Beno, Drug Discovery Today, 2001, 6; Brown, JCICS, 1996, 36, 572; Gillet, JCICS, 1997, 37, 731

5 Types of Diversity
- A library with members that sample chemical space evenly: an ideal situation for lead discovery
- A library that covers the same chemical space, but the compounds cluster and leave large holes
- A library with even sampling of space, but only with limited diversity: useful for modification of a lead
From Rose, Drug Discovery Today, 2002, 7, 133.

6 Quantifying Diversity
- Need to define how similar (or dissimilar) two compounds are to each other
  - Similarity indices
- Then need to determine the spread of the compounds throughout space
  - Distance-based
  - Cell-based partitioning
  - Clustering
Agrafiotis, Mol. Diversity, 1999, 4, 1

7 Defining Similarity
- Descriptors
  - Property-based
  - Structure-based: 2D, 3D, pharmacophore
- Structural keys
- Fingerprints
- Similarity/distance coefficients
Beno, Drug Discovery Today, 2001, 6, 251; Willett, Curr. Opin. Biotechnology, 2000, 11, 85; Willett, JCICS, 1998, 38, 983; Daylight

8 Structural Keys
- A Boolean array expressing whether a pattern is present (TRUE) or not (FALSE) within a molecule
- This array is usually represented as a string of 1s (TRUE) or 0s (FALSE): a bitmap
- So create a list of structural features and then set the corresponding bit to 1 if the feature is present
Martin, J. Med. Chem., 1995, 38, 1431; Flower, JCICS, 1998, 38, 379
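A minimal sketch of the bitmap idea in Python. The key list and the hand-supplied feature set are hypothetical; a real system would detect each feature by substructure search rather than take it as input.

```python
# Hypothetical key list: each position in the bitmap has a fixed, assigned meaning.
STRUCTURAL_KEYS = ["carbonyl", "hydroxyl", "amine", "aromatic ring", "halogen"]


def structural_key_bitmap(features_present, keys=STRUCTURAL_KEYS):
    """Return a string of 1s and 0s, one bit per structural key."""
    return "".join("1" if key in features_present else "0" for key in keys)


# Features are listed by hand here purely for illustration.
phenol_features = {"hydroxyl", "aromatic ring"}
print(structural_key_bitmap(phenol_features))  # prints 01010
```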

9 Fingerprints
Problems with structural keys
- Lack of generality
- Choice of structural keys is arbitrary and may not be appropriate for the search or question at hand
- List of structural keys can be very long and unwieldy to generate and test
Solution: the fingerprint
- Also a bitmap, but with NO assigned meaning to any particular bit. Your fingerprint is characteristic of you, but there is no meaning to any particular fragment of it.
- Generate patterns from the molecule itself, such as a pattern for
  - each atom
  - each atom with its nearest neighbors
  - each group of atoms and bonds connected by paths up to 2 bonds long
  - continuing with paths up to 3, 4, 5, 6, and 7 bonds long (seven seems to be the longest typically employed)
- This list of patterns is exhaustive, meaning all are generated for every molecule

10 Fingerprints. II.
- Since the number of patterns is huge, it is not possible to assign a particular bit to each pattern
- Instead, each pattern is the input to a hash function that creates a small set of bits (typically 4-5 bits). These bits are then added (with logical OR) to the fingerprint.
- Note that the bit sets for different patterns may have some bits in common
  - This conflict is not a problem, since every bit set by some pattern (substructure) will be set in the molecule's fingerprint
  - Each pattern (substructure) generates its particular set of bits, and it is unlikely that another pattern will set those exact same bits
  - So a search for that substructure simply means looking to see whether those bits have been set
Fingerprint advantages
- No predefined set of patterns (structural keys)
- Structural keys are usually quite sparse; fingerprints are much more dense
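The hashing scheme can be sketched roughly as follows. This is an illustration under stated assumptions, not any particular toolkit's algorithm: molecules are given as a list of atom symbols plus a list of bonded index pairs, bond orders are ignored, SHA-256 stands in for the hash function, and the 64-bit fingerprint length and all function names are hypothetical.

```python
import hashlib


def enumerate_paths(atoms, bonds, max_bonds=7):
    """Yield every linear path of 0..max_bonds bonds as a string of atom symbols."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse are the same pattern; keep one spelling.
        yield min(forward, backward)
        if len(path) - 1 < max_bonds:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    yield from walk(path + [nxt])

    for start in range(len(atoms)):
        yield from walk([start])


def pattern_bits(pattern, n_bits=64, bits_per_pattern=4):
    """Hash one pattern to a small set of bit positions (up to bits_per_pattern)."""
    digest = hashlib.sha256(pattern.encode()).digest()
    return {
        int.from_bytes(digest[2 * k:2 * k + 2], "big") % n_bits
        for k in range(bits_per_pattern)
    }


def fingerprint(atoms, bonds, n_bits=64):
    """OR the bits of every pattern into one integer used as a bitmap."""
    fp = 0
    for pattern in set(enumerate_paths(atoms, bonds)):
        for bit in pattern_bits(pattern, n_bits):
            fp |= 1 << bit
    return fp


def may_contain(molecule_fp, substructure_fp):
    """True if every bit set by the substructure is also set in the molecule."""
    return (molecule_fp & substructure_fp) == substructure_fp


# Heavy-atom skeletons only: ethanol (C-C-O) and the C-O fragment.
ethanol = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
c_o = fingerprint(["C", "O"], [(0, 1)])
print(may_contain(ethanol, c_o))  # prints True
```

Because different patterns can share bits, a positive screen only says the substructure may be present; a negative screen rules it out.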

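11 Similarity Coefficients
For molecules A and B described by variables $x_{jA}$ and $x_{jB}$, $j = 1, \dots, n$:
- $a = \sum_j x_{jA}$, the number of "on" bits in A
- $b = \sum_j x_{jB}$, the number of "on" bits in B
- $c = \sum_j x_{jA} x_{jB}$, the number of "on" bits in both A and B
$D(A,B)$ is the coefficient for A and B using bits; $S(A,B)$ is the coefficient using continuous variables.
Euclidean distance
- $D(A,B) = [a + b - 2c]^{1/2}$, range 0 to $\sqrt{n}$ for $n$ bits
- $S(A,B) = [\sum_j (x_{jA} - x_{jB})^2]^{1/2}$, range 0 to infinity
Tanimoto coefficient
- $D(A,B) = c / [a + b - c]$, range 0 to 1
- $S(A,B) = \sum_j x_{jA} x_{jB} / [\sum_j x_{jA}^2 + \sum_j x_{jB}^2 - \sum_j x_{jA} x_{jB}]$, range $-1/3$ to 1
Cosine coefficient
- $D(A,B) = c / [ab]^{1/2}$, range 0 to 1
- $S(A,B) = \sum_j x_{jA} x_{jB} / [\sum_j x_{jA}^2 \sum_j x_{jB}^2]^{1/2}$, range $-1$ to 1
Willett, JCICS, 1998, 38, 983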

12 Example
- Bitmap for 2,2-dimethylbutane: 1111011000000, a = 6
- Ethylcyclobutane: b = 9
- c = 5
- Euclidean distance = (6 + 9 - 10)^(1/2) = 2.24
- Tanimoto coefficient = 5/(6 + 9 - 5) = 0.5
- Cosine coefficient = 5/(6*9)^(1/2) = 0.68
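The arithmetic in this example can be checked with a few lines of Python. The function names are illustrative only; because the ethylcyclobutane bitmap is not given in the transcript, the check uses the counts a = 6, b = 9, c = 5 directly.

```python
import math


def bit_counts(bits_a, bits_b):
    """Return (a, b, c) for two equal-length strings of 1s and 0s."""
    a = bits_a.count("1")
    b = bits_b.count("1")
    c = sum(x == y == "1" for x, y in zip(bits_a, bits_b))
    return a, b, c


def euclidean_distance(a, b, c):
    return math.sqrt(a + b - 2 * c)


def tanimoto(a, b, c):
    return c / (a + b - c)


def cosine(a, b, c):
    return c / math.sqrt(a * b)


a, b, c = 6, 9, 5  # counts quoted on the slide
print(round(euclidean_distance(a, b, c), 2))  # 2.24
print(round(tanimoto(a, b, c), 2))            # 0.5
print(round(cosine(a, b, c), 2))              # 0.68
```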

13 Problems with Tanimoto and related similarity indices
Flower, JCICS, 1998, 38, 379

14 Quantifying Diversity
Rules for a diversity function
- Adding redundant molecules does not change the value of the diversity
- Adding non-redundant molecules always increases the value of the diversity
- Space-filling behavior should be preferred
- Perfect filling of space gives a finite value of the diversity
- As the dissimilarity of a pair of compounds increases, the diversity should increase asymptotically
Waldman, J. Mol. Graph. Model., 2000, 18, 412

15 Diversity definition 1
Here SIM(J,K) is some similarity measurement between compounds J and K. We can use this to build up a compound selection procedure for creating the sublibrary with maximal diversity:
- Find the similarities of all compounds in the library
- Select the compound that is most dissimilar from all others
- Select the 2nd compound that is most dissimilar from the first
- Select the 3rd compound that is most dissimilar from the first 2
- Continue until you have selected as many compounds as you desire
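The selection procedure above might be sketched like this. Two details are assumptions, since the slide does not fix them: "most dissimilar from all others" is taken as the lowest total similarity, and "most dissimilar from those already selected" is taken as the candidate whose highest similarity to any selected compound is lowest (a MaxMin-style rule). The function name and the toy similarity matrix are hypothetical.

```python
def select_diverse(sim, n_select):
    """Greedy dissimilarity selection on a symmetric similarity matrix.

    sim[i][j] is the similarity of compounds i and j (higher = more alike).
    Returns the indices of the selected compounds in selection order.
    """
    n = len(sim)
    # Step 1: the compound with the lowest total similarity to all others.
    first = min(range(n), key=lambda i: sum(sim[i][j] for j in range(n) if j != i))
    selected = [first]
    while len(selected) < min(n_select, n):
        remaining = [i for i in range(n) if i not in selected]
        # Assumed rule: minimize the worst-case similarity to anything already chosen.
        nxt = min(remaining, key=lambda i: max(sim[i][s] for s in selected))
        selected.append(nxt)
    return selected


# Toy data: compounds 0 and 1 are alike, and so are 2 and 3.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
print(select_diverse(sim, 3))  # [3, 0, 2]
```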

16 Cell-based Partitioning
- Divide each dimension into a number of parts
- These divisions are called cells or bins
- Place each compound into the appropriate bin based on the values of its properties and/or descriptors
- Can now create a sublibrary by choosing one compound from each bin, usually the one nearest the center of the bin
Schematic representation of different sampling of diversity space: (a) maximize Euclidean distance to create maximum diversity; (b) cell-based selection, choosing the compound nearest the center of each cell. From Rose, Drug Discovery Today, 2002, 7, 133
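A minimal sketch of cell-based selection, assuming equal-width bins, property values that lie within the stated bounds, and Python 3.8+ for `math.dist`. The function name, bounds, and sample points are hypothetical.

```python
import math


def cell_based_selection(points, lower, upper, bins_per_dim):
    """Pick, for each occupied cell, the compound nearest the cell center.

    points: list of property tuples, one per compound.
    lower, upper: per-dimension bounds of the property space.
    Returns {cell index tuple: index of chosen compound}.
    """
    widths = [(hi - lo) / bins_per_dim for lo, hi in zip(lower, upper)]
    best = {}
    for idx, point in enumerate(points):
        # Clamp so a value exactly on the upper bound falls in the last bin.
        cell = tuple(
            min(int((x - lo) / w), bins_per_dim - 1)
            for x, lo, w in zip(point, lower, widths)
        )
        center = [lo + (k + 0.5) * w for k, lo, w in zip(cell, lower, widths)]
        dist = math.dist(point, center)
        if cell not in best or dist < best[cell][0]:
            best[cell] = (dist, idx)
    return {cell: idx for cell, (dist, idx) in best.items()}


# Five compounds in a 2-D property space, 2 bins per dimension.
points = [(1, 1), (2.4, 2.6), (4, 4.5), (7, 8), (9, 1)]
print(cell_based_selection(points, lower=(0, 0), upper=(10, 10), bins_per_dim=2))
```

Three of the five compounds share cell (0, 0); only the one closest to that cell's center is kept, and the empty cell contributes nothing.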

17 Diversity definition 2 and 3
Suppose 10 molecules are divided into 2 cells.
- Distribution 1: (5,5), $D_{\chi^2} = 0$
- Distribution 2: (7,3), $D_{\chi^2} = -8$
So the more even distribution is scored as being more diverse. But this may actually go too far: $D_{\chi^2}(2,2,2) > D_{\chi^2}(4,1,1) = D_{\chi^2}(3,3,0)$. This makes the last two equivalent, although (4,1,1) appears intuitively more diverse. The entropy-like definition ranks the three sets $D_{entropy}(2,2,2) > D_{entropy}(4,1,1) > D_{entropy}(3,3,0)$.
Waldman, J. Mol. Graph. Model., 2000, 18, 412
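The slide's own formulas did not survive in the transcript, so the functional forms below are assumptions: a chi-squared-like score, minus the summed squared deviation of each cell count from the mean count, and a Shannon entropy of the cell occupancies. They are offered only because they reproduce the values and rankings quoted above.

```python
import math


def chi_squared_diversity(counts):
    """Assumed form: -sum_i (n_i - N/M)^2 for M cells holding N molecules."""
    expected = sum(counts) / len(counts)
    return -sum((n - expected) ** 2 for n in counts)


def entropy_diversity(counts):
    """Assumed form: -sum_i p_i ln p_i with p_i = n_i / N (empty cells add 0)."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Values quoted on the slide for two cells.
print(chi_squared_diversity((7, 3)) == -8)
# The chi-squared form cannot separate (4,1,1) from (3,3,0) ...
print(chi_squared_diversity((4, 1, 1)) == chi_squared_diversity((3, 3, 0)))
# ... while the entropy form ranks (2,2,2) > (4,1,1) > (3,3,0).
print(entropy_diversity((2, 2, 2)) > entropy_diversity((4, 1, 1)) > entropy_diversity((3, 3, 0)))
```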

18 Clustering. I. Hierarchical clusters
- Small clusters within larger clusters
- Typically some relationship between clusters
Two procedures
Agglomerative: start with singletons and move upwards
  (a) Calculate the similarities of all pairs
  (b) Merge the two most similar into a cluster
  (c) Continue until only one cluster remains
Divisive: start with one cluster and break it into smaller clusters
  (a) Calculate the dissimilarities of all pairs
  (b) Take the pair of most dissimilar structures and assign all other structures to the least dissimilar of these initial cluster centers
  (c) Recursively select the cluster with the largest diameter and partition it into two such that the largest resulting cluster has the smallest diameter
  (d) Repeat step (c) for a maximum of n-1 times
Brown, JCICS, 1996, 36, 572
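The agglomerative procedure can be illustrated with a short sketch. The slide does not say how the similarity between two multi-member clusters is defined; single linkage (the similarity of the closest pair) is assumed here, and the function name and toy matrix are hypothetical.

```python
def agglomerative(sim):
    """Merge clusters pairwise until one remains; return the merge history.

    sim: symmetric similarity matrix (higher = more alike).
    Each history entry is (members of cluster 1, members of cluster 2).
    """
    clusters = [frozenset([i]) for i in range(len(sim))]
    merges = []

    def cluster_sim(c1, c2):
        # Single linkage (assumption): similarity of the closest cross-cluster pair.
        return max(sim[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        pairs = ((a, b) for k, a in enumerate(clusters) for b in clusters[k + 1:])
        a, b = max(pairs, key=lambda pair: cluster_sim(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b)))
    return merges


sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
for step in agglomerative(sim):
    print(step)
```

The merge history is the hierarchy: cutting it after any step gives the clusters at that level.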

19 Clustering. II. Nonhierarchical clusters
- No relation between clusters
Jarvis-Patrick method
- Calculate the similarities of all pairs
- Record the top K most similar structures to each structure (nearest-neighbor list)
- Assign compounds to clusters. A and B are in the same cluster if:
  - A is in the top K nearest-neighbor list of B
  - B is in the top K nearest-neighbor list of A
  - A and B have at least Kmin of their top K nearest neighbors in common
- Tends to produce lots of small clusters (singletons) under strict conditions, or a few very large clusters under less strict conditions
Brown, JCICS, 1996, 36, 572
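A rough sketch of the Jarvis-Patrick rule follows. It assumes that pairs passing the three tests are merged transitively (via a small union-find), and the function name and toy similarity matrix are hypothetical.

```python
def jarvis_patrick(sim, k, k_min):
    """Cluster compounds with the Jarvis-Patrick shared-neighbor rule.

    sim: symmetric similarity matrix; k: nearest-neighbor list length;
    k_min: minimum number of shared neighbors. Returns sorted clusters.
    """
    n = len(sim)
    # Nearest-neighbor list: the k most similar other compounds.
    nn = [
        set(sorted((j for j in range(n) if j != i),
                   key=lambda j: sim[i][j], reverse=True)[:k])
        for i in range(n)
    ]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(n):
        for b in range(a + 1, n):
            if a in nn[b] and b in nn[a] and len(nn[a] & nn[b]) >= k_min:
                parent[find(a)] = find(b)  # same cluster; merge transitively

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Two tight groups of three: similarity 0.9 within a group, 0.1 between groups.
sim = [[1.0 if i == j else (0.9 if i // 3 == j // 3 else 0.1) for j in range(6)]
       for i in range(6)]
print(jarvis_patrick(sim, k=2, k_min=1))  # [[0, 1, 2], [3, 4, 5]]
print(jarvis_patrick(sim, k=2, k_min=2))  # six singletons under the stricter rule
```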

20 Goals for Diversity Metrics
- Ensure that exploratory libraries are broad enough to locate active molecules
- Ensure that focused (directed) libraries are both broad enough to sample space and compact enough to maintain activity
- Need to keep libraries small enough to manage readily, so want to ensure that sublibraries separate actives from inactives

21 Other Diversity Comments
Krchnak, Mol. Diversity, 1996, 1, 193
- General comments on combinatorial methods and diversity
Good, JCICS, 1997, 40, 3926
- Use of 3D pharmacophores demands selection of products, not reagents, since they are not additive
Martin, J. Comb. Chem., 1999, 1, 32
- Beyond diversity, library construction should include MW, lipophilicity, ease of synthesis, pharmacophore features, reagent cost, solubility, and complementarity to other libraries
- Distance measures assess redundancy; coverage of space is better assessed with maps or binning procedures
- Diversity functions often overweight edges
Oprea, J. Comb. Chem., 2001, 3, 157
- Big numbers (lots of compounds) and serendipity are not enough
Martin, J. Comb. Chem., 2001, 3, 231
- Chemical similarity is not always a good predictor of bioproperties
- Unlikely that a few thousand compounds can span all of chemical space
- Just how much diversity is enough?

