• Predicting interactions between small molecules and proteins › Vital to the drug discovery process › Key to understanding biological processes • 3 classes.


2
• Predicting interactions between small molecules and proteins
  › Vital to the drug discovery process
  › Key to understanding biological processes
• 3 classes of drug targets
  › G-protein-coupled receptors (GPCRs)
  › Enzymes
  › Ion channels

3
• Consider each target independently from other proteins
• Ligand-based approach
  › Compares candidate molecules to known ligands of the target
  › Requires knowledge about other ligands of a given target
• Structure-based or docking approaches
  › Use the 3D structure of the target to determine how well a ligand can bind
  › Require the 3D structure of the target
  › Very time-consuming
• Cannot be applied if no ligand or 3D structure is known for a given target

4
• Chemical space:
  › Set of all small molecules
• Biological space:
  › Set of all proteins or protein families
• Mine the entire chemical space for interactions with the biological space
• Knowledge of some ligands for a target can help to predict ligands for similar targets

5
• Ligand-based chemogenomics
  › Look at families or subfamilies of proteins
  › Model ligands at the level of a family
• Target-based chemogenomics
  › Cluster receptors based on ligand binding site similarity
  › Use known ligands for each cluster to infer shared ligands
• Target-ligand approach
  › Use binding information for targets to predict ligands for another target in a single step

6
• Bock and Gough (2005)
  › Describe ligand-receptor complexes by merging ligand and target descriptors
  › Use machine learning methods to predict whether a ligand-receptor pair forms a complex
• Erhan et al. (2006)
  › Merge a set of ligand descriptors with a set of receptor descriptors in a framework of neural networks and support vector machines
  › Offers great flexibility in the choice of descriptors

7
• Investigates different types of descriptors
• Builds upon recent developments in kernel methods
  › In bio- and cheminformatics
• Tests different methods for prediction of ligands
  › For 3 major classes of targets
• Shows that the choice of representation greatly affects accuracy
• New kernel based on hierarchies of receptors outperforms all other descriptors
  › Performs especially well for targets with few or no known ligands

8
• Given n target/molecule pairs $(t_1, c_1), \dots, (t_n, c_n)$ known to form complexes or not
  › Each pair is represented by a vector $\Phi(t,c)$
• Estimate a linear function
  › $f(t,c) = w^\top \Phi(t,c)$
• Whose sign is used to predict if a chemical c can bind to a target t
• The vector w is estimated from the training set
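The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes the joint vector for a pair has already been computed, and the function names, weights, and descriptor values are hypothetical, not taken from the slides.

```python
from typing import Sequence


def decision_value(w: Sequence[float], phi: Sequence[float]) -> float:
    """Return f(t, c) = w^T Phi(t, c) for one target/molecule pair."""
    if len(w) != len(phi):
        raise ValueError("w and phi must have the same length")
    return sum(wi * xi for wi, xi in zip(w, phi))


def predict_binding(w: Sequence[float], phi: Sequence[float]) -> bool:
    """Predict 'binds' when the sign of the decision value is positive."""
    return decision_value(w, phi) > 0.0


# Hypothetical weight vector and pair descriptor, for illustration only.
w = [0.5, -1.0, 0.25]
phi_pair = [1.0, 0.0, 2.0]
print(decision_value(w, phi_pair), predict_binding(w, phi_pair))
```

In practice w would come from a trained classifier such as an SVM rather than being set by hand.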

9
• Represent a molecule c by a vector $\Phi_{lig}(c) \in \mathbb{R}^{d_c}$
  › Encode physicochemical and structural properties
  › Model interactions between small molecules and a single target
• Represent a protein t by a vector $\Phi_{tar}(t) \in \mathbb{R}^{d_t}$
  › Capture properties of the protein's sequence or structure
  › Infer models that predict the structural or functional class of a protein
• Need to represent a pair (c,t) in a single vector
  › Capture interactions between features of the molecule and protein that can be useful predictors
  › Multiply a descriptor of c with a descriptor of t

10
• $\Phi(c,t) = \Phi_{lig}(c) \otimes \Phi_{tar}(t)$
• Represents the set of all possible products of features of c and t
• A vector of dimension $d_c \times d_t$
  › The (i,j)-th entry is the product of the i-th entry of $\Phi_{lig}(c)$ by the j-th entry of $\Phi_{tar}(t)$
• Size may be prohibitively large
• Use kernel methods

11
• Can process large- or infinite-dimensional patterns if the inner product between any two patterns can be computed
• Can factorize the inner product between two tensor product vectors
  › $(\Phi_{lig}(c) \otimes \Phi_{tar}(t))^\top (\Phi_{lig}(c') \otimes \Phi_{tar}(t')) = \Phi_{lig}(c)^\top \Phi_{lig}(c') \times \Phi_{tar}(t)^\top \Phi_{tar}(t')$
• Obtain the inner product between two tensor products
  › $K((c,t),(c',t')) = K_{ligand}(c,c') \times K_{target}(t,t')$
• $K_{ligand}(c,c') = \Phi_{lig}(c)^\top \Phi_{lig}(c')$
• $K_{target}(t,t') = \Phi_{tar}(t)^\top \Phi_{tar}(t')$
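The factorization can be checked numerically. The sketch below uses small made-up descriptor vectors (all values are assumptions for illustration). It builds the explicit tensor product and compares its inner product with the product of the ligand and target inner products.

```python
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def tensor_product(phi_lig, phi_tar):
    """Flattened tensor product: entry (i, j) is phi_lig[i] * phi_tar[j]."""
    return [a * b for a in phi_lig for b in phi_tar]


def pair_kernel(lig_c, tar_t, lig_c2, tar_t2):
    """K((c,t),(c',t')) = K_ligand(c,c') * K_target(t,t'), without building the product vector."""
    return dot(lig_c, lig_c2) * dot(tar_t, tar_t2)


# Hypothetical descriptors: d_c = 3 for ligands, d_t = 2 for targets.
lig_c, lig_c2 = [1.0, 0.0, 2.0], [0.5, 1.0, 1.0]
tar_t, tar_t2 = [1.0, 3.0], [2.0, -1.0]

explicit = dot(tensor_product(lig_c, tar_t), tensor_product(lig_c2, tar_t2))
factorized = pair_kernel(lig_c, tar_t, lig_c2, tar_t2)
print(explicit, factorized)
```

The explicit route needs a vector with d_c × d_t entries per pair, while the factorized route needs only two inner products. This is why the kernel form scales to large descriptors.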

12
• There have been impressive advances in the use of SVMs in chemoinformatics
• Kernels have been designed using:
  › Physicochemical properties of molecules
  › 2D or 3D fingerprints
  › Comparison of 2D and 3D structures of molecules
    • Detection of common substructures in 2D graphs
    • Encoding various properties of 3D structures
• Used in single-target virtual screening and prediction of pharmacokinetics and toxicity

13
• Classical choice
• State-of-the-art performance
• $K_{ligand}(c,c') = \dfrac{\Phi_{lig}(c)^\top \Phi_{lig}(c')}{\Phi_{lig}(c)^\top \Phi_{lig}(c) + \Phi_{lig}(c')^\top \Phi_{lig}(c') - \Phi_{lig}(c)^\top \Phi_{lig}(c')}$
• $\Phi_{lig}(c)$ is a binary vector
• Each bit indicates whether a given linear path of length l or less occurs as a subgraph of the 2D structure of c
  › Choose l = 8
• Computed with the ChemCPP software
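For binary fingerprints, this ratio (commonly known as the Tanimoto kernel) reduces to counting shared on-bits. The sketch below is a stand-alone illustration and does not use ChemCPP. The fingerprints are hypothetical sets of on-bit indices, and returning 0.0 for two empty fingerprints is an assumption made only to avoid a division by zero.

```python
def tanimoto_kernel(bits_c, bits_c2):
    """Tanimoto kernel between two binary fingerprints given as sets of on-bit indices."""
    a, b = set(bits_c), set(bits_c2)
    common = len(a & b)               # Phi(c)^T Phi(c') for binary vectors
    denom = len(a) + len(b) - common  # Phi(c)^T Phi(c) + Phi(c')^T Phi(c') - Phi(c)^T Phi(c')
    # Assumption: two all-zero fingerprints get similarity 0.0.
    return common / denom if denom else 0.0


# Hypothetical fingerprints: each index stands for one linear path pattern.
fp_c = {0, 3, 7, 12}
fp_c2 = {3, 7, 9}
print(tanimoto_kernel(fp_c, fp_c2))
```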

14
• SVMs and kernel methods are widely used in bioinformatics
• Various kernels have been proposed based on:
  › Amino-acid sequence of proteins
  › 3D structures of proteins
  › Pattern of occurrences of proteins in multiple sequenced genomes
• Used for various tasks related to structural or functional classification of proteins

15
• $K_{Dirac}(t,t') = 1$ if $t = t'$, and $0$ otherwise
• Represents different targets as orthonormal vectors
• Orthogonality between two proteins t and t' implies orthogonality between all pairs (c,t) and (c',t') for any two molecules c and c'
  › Learning is performed independently for each target protein
  › Does not share any information about known ligands between different targets

16
• $K_{multitask}(t,t') = 1 + K_{Dirac}(t,t')$
• Removes the orthogonality
• Combines target-specific properties of the ligands and general properties across all targets
• Allows sharing of information during learning
• Preserves the specificities of the ligands for each target
• Does not weight the contribution of known interactions by how similar the targets are
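A small sketch makes the difference between the two target kernels concrete. The target names are hypothetical and the Gram-matrix helper is an illustration, not part of the original method description.

```python
def dirac_kernel(t, t2):
    """1 if both arguments are the same target, 0 otherwise."""
    return 1.0 if t == t2 else 0.0


def multitask_kernel(t, t2):
    """Dirac kernel plus a constant 1 shared by every pair of targets."""
    return 1.0 + dirac_kernel(t, t2)


def gram_matrix(targets, kernel):
    """Kernel values for every ordered pair of targets."""
    return [[kernel(a, b) for b in targets] for a in targets]


targets = ["T1", "T2", "T3"]  # hypothetical target identifiers
print(gram_matrix(targets, dirac_kernel))      # identity matrix: no sharing across targets
print(gram_matrix(targets, multitask_kernel))  # off-diagonal 1s: every target shares equally
```

The off-diagonal entries are all equal under the multitask kernel, so every other target contributes with the same weight regardless of how related it is.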

17
• Empirical observations suggest that molecules that bind to t are likely to bind to t' only if t and t' are similar in terms of structure or evolutionary history
  › Such similarity can be detected by comparing protein sequences
• Mismatch kernel:
  › Compares short sequences of amino acids up to some number of mismatches
  › Choose 3-mers with a maximum of one mismatch
• Local alignment kernel:
  › Uses the alignment score between the primary sequences of proteins to measure their similarity
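A minimal sketch of a mismatch kernel for 3-mers with at most one mismatch is given here as an illustration. It assumes the standard 20-letter amino-acid alphabet, uses toy sequences, and returns an unnormalized value. Whether and how the original work normalizes the kernel is not stated in the slides.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20-letter alphabet (assumed)


def neighbors(kmer):
    """All k-mers within Hamming distance at most 1 of kmer."""
    result = {kmer}
    for i in range(len(kmer)):
        for aa in AMINO_ACIDS:
            result.add(kmer[:i] + aa + kmer[i + 1:])
    return result


def mismatch_features(seq, k=3):
    """Count, for every k-mer, how many k-mers of seq match it with at most one mismatch."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        for nb in neighbors(seq[i:i + k]):
            counts[nb] += 1
    return counts


def mismatch_kernel(seq1, seq2, k=3):
    """Unnormalized inner product of the two mismatch feature maps."""
    f1, f2 = mismatch_features(seq1, k), mismatch_features(seq2, k)
    return sum(v * f2[key] for key, v in f1.items())


# Toy sequences, for illustration only.
print(mismatch_kernel("ACDEFG", "ACDKFG"))
```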

18
• $K_{hierarchy}(t,t') = \langle \Phi_h(t), \Phi_h(t') \rangle$
• $\Phi_h(t)$ has a feature for each node in the hierarchy
  › Is set to 1 if the node is part of t's hierarchy
  › Is set to 0 otherwise
  › Plus one feature that is constantly set to 1
• Uses data from the target itself and, with smaller weight, data from other targets
• Performed the best in the experiments
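One way to read this feature map is sketched here, assuming each target is identified by a dotted path through the hierarchy (the labels are hypothetical). Each prefix of the path is one node feature, and a `<root>` entry plays the role of the constant feature. The inner product then counts the shared nodes.

```python
def hierarchy_nodes(label):
    """Nodes on the path of a dotted label, plus a constant '<root>' feature."""
    parts = label.split(".")
    nodes = {".".join(parts[:i]) for i in range(1, len(parts) + 1)}
    nodes.add("<root>")  # the feature that is constantly set to 1
    return nodes


def hierarchy_kernel(t, t2):
    """Inner product of the binary feature maps = number of shared nodes."""
    return len(hierarchy_nodes(t) & hierarchy_nodes(t2))


# Hypothetical four-level labels, for illustration only.
print(hierarchy_kernel("1.2.1.3", "1.2.1.3"))   # same target
print(hierarchy_kernel("1.2.1.3", "1.2.1.4"))   # siblings
print(hierarchy_kernel("1.2.1.3", "3.4.21.5"))  # unrelated branch
```

Targets that are close in the hierarchy share more nodes and therefore get a larger kernel value, while unrelated targets still share the constant feature.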

19
• Enzyme Commission numbers
  › International Union of Biochemistry and Molecular Biology (1992)
  › Classify enzymes by the chemical reaction they catalyze
  › Four-level hierarchy
• For example,
  › EC 1 includes oxidoreductases
  › EC 1.2 includes oxidoreductases that act on the aldehyde or oxo group of donors
  › EC 1.2.2 has NAD+ or NADP+ as an acceptor
  › At the fourth level, a full EC number identifies enzymes that catalyze the oxidation of formate to bicarbonate
• Enzymes that are close in the hierarchy should have similar ligands

20
• GPCRs are grouped into four classes
  › Group A: rhodopsin family
  › Group B: secretin family
  › Group C: metabotropic family
  › Group D: regroups more diverse receptors
• KEGG database subdivides the rhodopsin family into three subgroups
  › Amine receptors
  › Peptide receptors
  › Other receptors
• And adds a second level of classification based on the type of ligands or known subdivisions

21
• The KEGG database divides ion channels into 8 classes
  › Cys-loop superfamily
  › Glutamate-gated cation channels
  › Epithelial and related Na+ channels
  › Voltage-gated cation channels
  › Related to voltage-gated cation channels
  › Related to inward rectifier K+ channels
  › Chloride channels
  › Related to ATPase-linked transporters
• Each class is further subdivided
  › By, for example, the type of ligands or type of ion passing through the channel

22
• Extracted compound interaction data from KEGG BRITE database
  › Known compounds for each target
  › Type of interaction
    • Enzymes: inhibitor, cofactor, effector
    • GPCR: antagonist, full/partial agonist
    • Ion channels: pore blocker, positive/negative allosteric modulator, agonist, antagonist
• Did not take into account
  › Orthologs of targets
  › Enzymes with same EC number
  › Compounds with no molecular descriptor
    • Primarily peptides
  › Targets with no known compounds

23
• Generated as many negative ligand-target pairs as known ligand-target pairs
  › Randomly chose ligands
  › Random selection can produce false negatives
  › Experimentally confirmed negative pairs would be needed to avoid this
• 2436 data points for enzymes
  › 675 enzymes, 524 compounds
• 798 data points for GPCRs
  › 100 receptors, 219 compounds
• 2230 data points for ion channels
  › 114 channels, 462 compounds
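The negative-pair generation can be sketched as follows. The slides only say that ligands were chosen at random, so the details here are assumptions: one random compound is drawn per known pair from the pool of all compounds in the dataset, pairs already known to interact are excluded, and duplicates are avoided. The pair data is invented for illustration.

```python
import random


def sample_negatives(positives, seed=0):
    """Draw one random unobserved (target, compound) pair per known pair."""
    rng = random.Random(seed)
    known = set(positives)
    compounds = sorted({c for _, c in positives})
    chosen = set()
    for target, _ in positives:
        candidates = [c for c in compounds
                      if (target, c) not in known and (target, c) not in chosen]
        if candidates:  # skipped when no unused compound is left for this target
            chosen.add((target, rng.choice(candidates)))
    return sorted(chosen)


# Hypothetical known ligand-target pairs.
positives = [("T1", "c1"), ("T1", "c2"), ("T2", "c3"), ("T3", "c4")]
negatives = sample_negatives(positives)
print(negatives)
```

A pair sampled this way is only presumed negative: an untested compound may in fact bind, which is the false-negative risk noted on the slide.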

24
Distribution of the number of known ligands per target for enzymes, GPCR, and ion channel datasets
• Each bar indicates the proportion of targets for which a given number of training points are available
• Few compounds are known for most targets
Jacob, L. et al. Bioinformatics 2008 24:2149-2156; doi:10.1093/bioinformatics/btn409

25
• Experiment 1
  › Trained an SVM classifier on
    • all points involving other targets of the family
    • plus a fraction of points involving t
  › Tested on the remaining data points for t
  › Assesses the accuracy for a given target when using ligands for other targets for training
• Experiment 2
  › Trained an SVM classifier using only interactions that did not involve t
  › Tested on data points that did involve t
  › Simulated making predictions for targets with no known ligands
• Measured performance using the area under the ROC curve (AUC)
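The Experiment 2 split and the AUC metric can be sketched without any machine-learning library. The data layout (target, compound, label) and the scores are hypothetical. AUC is computed here as the fraction of positive/negative score pairs that are ranked correctly, with ties counting one half.

```python
def split_without_target(pairs, held_out):
    """Experiment 2 style split: train on all other targets, test on the held-out one."""
    train = [p for p in pairs if p[0] != held_out]
    test = [p for p in pairs if p[0] == held_out]
    return train, test


def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking of positives against negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (target, compound, label) triples; label 1 = known pair, 0 = sampled negative.
pairs = [("T1", "c1", 1), ("T1", "c3", 0), ("T2", "c2", 1), ("T2", "c1", 0)]
train, test = split_without_target(pairs, "T1")
print(len(train), len(test))

# Hypothetical classifier scores for four test pairs.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A scorer that returns the same value for every pair gets an AUC of 0.5 under this definition, which is the baseline for random behavior.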

26
Mean AUC on each dataset with various target kernels
• Hierarchy kernel shows significant improvements
  › Sharing information for known ligands of different targets
  › Incorporating prior information into the kernels

K_tar \ Target     Enzymes        GPCR           Channels
Dirac              0.646±0.009    0.750±0.023    0.770±0.020
Multitask          0.931±0.006    0.749±0.022    0.873±0.015
Hierarchy          0.955±0.005    0.926±0.015    0.925±0.012
Mismatch           0.725±0.009    0.805±0.023    0.875±0.015
Local alignment    0.676±0.009    0.824±0.021    0.901±0.013

27
Target kernel Gram matrices (K_tar) for ion channels with multitask, hierarchy, and local alignment kernels
• Hierarchy kernel adds structure information
• Local alignment kernel retains some substructures
• For GPCR and enzymes, almost no structure is found by the sequence kernels
Jacob, L. et al. Bioinformatics 2008 24:2149-2156; doi:10.1093/bioinformatics/btn409

28
Relative improvement of the hierarchy kernel against the Dirac kernel as a function of the number of known ligands for enzymes, GPCR, and ion channel datasets
• Strong improvement when few ligands are known
• Decreases when enough training points become available
• After a certain point, performance is impaired
Jacob, L. et al. Bioinformatics 2008 24:2149-2156; doi:10.1093/bioinformatics/btn409

29
Mean AUC on each dataset with various target kernels
• Dirac kernel showed random behavior
  › Learning with no training data
• Hierarchy kernel still gives reasonable results
  › 1.7%, 5.1%, 7.2% loss for enzymes, GPCR, and ion channels compared to the first experiment

K_tar \ Target     Enzymes        GPCR           Channels
Dirac              0.500±0.000
Multitask          0.902±0.008    0.576±0.026    0.704±0.026
Hierarchy          0.938±0.006    0.875±0.020    0.853±0.019
Mismatch           0.602±0.008    0.703±0.027    0.729±0.024
Local alignment    0.535±0.005    0.751±0.025    0.772±0.023

30
1. Rognan D: Chemogenomic approaches to rational drug design. Br J Pharmacol 2007, 152:38-52.
2. Kanehisa M, Goto S, Kawashima S, Nakaya A: The KEGG databases at GenomeNet. Nucl. Acids Res. 2002, 30:42-46.
3. Jacob L, Vert J: Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 2008, 24:2149-2156.
4. Erhan D, L'Heureux P, Yue SY, Bengio Y: Collaborative Filtering on a Family of Biological Targets. Journal of Chemical Information and Modeling 2006, 46:626-635.
5. Bock JR, Gough DA: Virtual Screen for Ligands of Orphan G Protein-Coupled Receptors. Journal of Chemical Information and Modeling 2005, 45:1402-1414.
