Takeda Pharmaceutical Inc.



1 Takeda Pharmaceutical Inc.
Integrating in Vitro Drug Sensitivity and Genomics Data for Identification of Novel Drug Pathway Associations
Cong Li and Ray Liu
Yale University and Takeda Pharmaceutical Inc.
May 19, 2015
Presented at MBSW, Muncie, IN, USA

2 Introduction
Interests: indication selection; patient selection.
Experiments: cell-line drug response assay; microarray assay.
Data: IC50 and microarray gene expression.
Current analysis practice: stepwise and test-based.
Our goal: develop a method that analyzes the available data jointly and incorporates biological information.
[Diagram labels: Gene, Cell line, Change in gene expression; Drug, Cell line, IC50]

3 Microarrays

4

5 Questions Often Asked
Design issues
Which genes are differentially expressed between the conditions?
Which genes can be used to classify/predict? How?
Can biological networks be inferred from these data?
What are the biological stories in the data?

6 Drug Pathway Questions
The current drug development framework typically considers the effect of a compound on a single target.
Pathway-based approaches for drug discovery consider the therapeutic effects of compounds in the global physiological environment.
For many compounds, the target pathways and mechanism of action are still unknown.
How to infer the target pathways of drugs?

7 Motivating Data Sets
Gene expression data: Affymetrix U133+2 arrays, mapped to ~19,000 genes across over 1000 cancer cell lines; among them, 480 cell lines have available drug response data. Use genes included in two lists: (1) 766 cancer-related genes (Chen et al., 2008); (2) 8919 genes from the Integrated Druggable Genome Database (IDGD) Project (Hopkins and Groom, 2002; Russ and Lampel, 2005).
Pathway association information: retrieved from the KEGG MEDICUS database (Kanehisa et al., 2010). 58 pathways that are either known to be related to cancer or have drug targets. Among the genes selected in step (1), 1863 genes are covered by these 58 pathways and constitute the final list of genes in our real data analysis.
Drug response data: 24 drugs annotated in the CancerResource database (Ahmed et al., 2011), measured as log(Activity Area). 22 drugs have known targets covered by the 58 pathways.

8 Overview of the 22 drugs

9 Activity Area (shaded area)
Activity area is a combined measure of both drug potency and drug efficacy, whereas GI50 only measures drug potency.

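10 Data Format
Drug sensitivity values (e.g. Activity Area or GI50): rows are cell lines (Cell line 1, Cell line 2, Cell line 3, ...); columns are drugs (drug1, drug2, drug3, ...).
Basal gene expression levels (before drug treatment): rows are cell lines (Cell line 1, Cell line 2, Cell line 3, ...); columns are genes (gene1, gene2, gene3, gene4, ...).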

11 Model Description
A spike-and-slab mixture prior (West, 2003) is placed on the factor loading matrices W1 and W2 to impose sparsity and to utilize prior knowledge on gene-pathway and drug-pathway associations (matrices L1 and L2).

12

13 Instead of adopting a full Bayesian treatment, we use the following integrative Penalized Matrix Decomposition (iPaD) framework
Note the notation differences from iFad:
Y(1) is the drug response profile matrix
Y(2) is the gene expression profile matrix
X is the pathway activity level matrix
B(1) and B(2) are the pathway loading matrices for drug responses and gene expressions, respectively
The indices of the non-zero elements in B(2) are known and denoted by Γ
The major interest is to find the non-zero elements in B(1)
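The objective itself does not survive in this transcript. One form consistent with the notation on this slide is sketched below. The 1/2 scaling, the use of squared Frobenius loss, and the unit-norm constraint on the columns x_k of X are assumptions introduced here (the constraint is a common way to fix the scale of X), not details taken from the slide; λ is the sparsity parameter mentioned later in the talk.

```latex
\min_{X,\,B^{(1)},\,B^{(2)}}\;
\tfrac{1}{2}\bigl\lVert Y^{(1)} - X B^{(1)} \bigr\rVert_F^2
+ \tfrac{1}{2}\bigl\lVert Y^{(2)} - X B^{(2)} \bigr\rVert_F^2
+ \lambda \bigl\lVert B^{(1)} \bigr\rVert_1
\quad \text{s.t.} \quad
B^{(2)}_{kj} = 0 \;\; \forall\,(k,j) \notin \Gamma,
\qquad
\lVert x_k \rVert_2 \le 1 \;\; \forall\,k
```

Under this form the problem is convex in (B(1), B(2)) for fixed X and convex in X for fixed (B(1), B(2)), which matches the bi-convex structure the talk relies on.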

14 The algorithm
The optimization problem in iPaD is a bi-convex problem, motivating the following block-wise optimization strategy:
Step 1. Optimize over B(1) and B(2) while keeping X fixed
Step 2. Optimize over X while keeping B(1) and B(2) fixed
Step 3. Iterate between Steps 1 and 2 until convergence

15 The algorithm
When X is fixed, optimizing each column of B(1) is a LASSO problem
When X is fixed, optimizing each column of B(2) is an ordinary least squares (OLS) problem
When B(1) and B(2) are fixed, X can be optimized using an iterative projected gradient descent algorithm
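The block-wise strategy can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes a squared-error loss with an L1 penalty on B(1), a unit-norm constraint on the columns of X, a fixed number of iterations instead of a convergence check, and a boolean K x g mask `gamma` standing in for Γ. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_b 0.5*||y - X b||^2 + lam*||b||_1."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()  # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]  # remove coordinate j from the fit
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * b[j]  # add the updated coordinate back
    return b

def ols_on_support(X, y, support):
    """Least squares restricted to the pathways allowed by the mask."""
    b = np.zeros(X.shape[1])
    if support.any():
        b[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return b

def project_columns(X):
    """Project each column of X onto the unit L2 ball (assumed constraint)."""
    return X / np.maximum(np.linalg.norm(X, axis=0), 1.0)

def update_X(X, Y1, Y2, B1, B2, n_steps=50):
    """Projected gradient descent on X with B(1) and B(2) fixed."""
    B = np.hstack([B1, B2])
    Y = np.hstack([Y1, Y2])
    L = np.linalg.norm(B @ B.T, 2)  # Lipschitz constant of the gradient
    if L == 0.0:
        return X
    for _ in range(n_steps):
        grad = (X @ B - Y) @ B.T
        X = project_columns(X - grad / L)
    return X

def ipad_sketch(Y1, Y2, gamma, lam, n_outer=20, seed=0):
    """Alternate Step 1 (B update) and Step 2 (X update) n_outer times."""
    rng = np.random.default_rng(seed)
    n, K = Y1.shape[0], gamma.shape[0]
    X = project_columns(rng.standard_normal((n, K)))
    for _ in range(n_outer):
        B1 = np.column_stack([lasso_cd(X, Y1[:, j], lam)
                              for j in range(Y1.shape[1])])
        B2 = np.column_stack([ols_on_support(X, Y2[:, j], gamma[:, j])
                              for j in range(Y2.shape[1])])
        X = update_X(X, Y1, Y2, B1, B2)
    return X, B1, B2
```

Each column of B(1) and B(2) is fitted independently, which is what makes Step 1 cheap; the X update is the only step that couples all drugs and genes.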

16 Dealing with missing values
A gene/drug or cell line that is completely missing can be excluded.
However, partially missing genes/drugs or cell lines should be kept in the analysis.
In our block-wise algorithm, B(1) and B(2) can be optimized column by column with the missing values excluded.
However, optimizing X is less straightforward because neither its rows nor its columns can be optimized separately.

17 We use the following soft-impute algorithm to optimize X in the presence of missing values
Ω indexes the observed elements in a matrix and PΩ(·) is an operator that projects a matrix onto the space of its observed elements.
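The update shown on the slide is not in the transcript. The sketch below illustrates only the fill-in idea behind soft-impute style methods: keep the observed entries, replace the missing ones with the current fit X B, then take an ordinary gradient step on the completed matrix. The function names, the step size 1/L, and the column-norm projection are assumptions made for illustration; here Y stacks the response matrices column-wise and B stacks the loadings to match.

```python
import numpy as np

def fill_missing(Y, observed, X, B):
    """Keep observed entries of Y; replace the rest with the current fit X @ B."""
    return np.where(observed, Y, X @ B)

def update_X_with_missing(X, Y, observed, B, n_steps=50):
    """Projected gradient steps on X, re-filling the missing entries each step."""
    L = np.linalg.norm(B @ B.T, 2)  # Lipschitz constant of the gradient
    if L == 0.0:
        return X
    for _ in range(n_steps):
        Y_filled = fill_missing(Y, observed, X, B)
        X = X - ((X @ B - Y_filled) @ B.T) / L
        # assumed constraint: columns of X stay inside the unit L2 ball
        X = X / np.maximum(np.linalg.norm(X, axis=0), 1.0)
    return X
```

Because the filled-in entries equal the current fit, the residual X B - Y_filled is zero wherever data are missing, so each step uses the gradient of the loss over observed entries only.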

18 Parameter tuning
There is a parameter λ that controls the sparsity of B(1).
One way to use the method is to apply a decreasing sequence of λ values to obtain a sequence of solutions for B(1).
We can also perform cross-validation on the drug response profile matrix Y(1).
[Figure: cross-validation split of Y(1). Green: training data; black: testing data.]
Significance test
After finding an appropriate λ value, we can perform permutation tests to establish the significance of the identified drug-pathway associations.
Permute the cell lines (rows) in Y(1) while keeping Y(2) unchanged.
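The permutation scheme can be sketched as below. `fit_b1` is a hypothetical callable standing in for a full iPaD fit at the chosen λ that returns the estimated B(1). The plain empirical p-value used here is a simplification chosen for illustration and is not necessarily how the authors summarize the null distribution.

```python
import numpy as np

def permutation_pvalues(Y1, Y2, fit_b1, n_perm=200, seed=0):
    """Empirical p-values for each entry of B(1) by permuting rows of Y1."""
    rng = np.random.default_rng(seed)
    observed = np.abs(fit_b1(Y1, Y2))
    exceed = np.zeros_like(observed, dtype=float)
    for _ in range(n_perm):
        # shuffle cell lines in the drug response matrix only;
        # Y2 is left unchanged, breaking the drug-expression link
        perm = rng.permutation(Y1.shape[0])
        null = np.abs(fit_b1(Y1[perm, :], Y2))
        exceed += (null >= observed)
    # add-one correction keeps p-values strictly positive
    return (1.0 + exceed) / (1.0 + n_perm)
```

Permuting only Y(1) preserves the correlation structure within the expression data and within the drug responses, so the null distribution reflects what the method reports when drug response is unrelated to pathway activity.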

19 Simulations
We performed the following four sets of simulations (the 58 pathways in the real data were used; the number of drugs d = 22).
[Table: simulation settings, with columns N, η, SNR1, SNR2. Recoverable entries: Sample Size: N = 120, 240, 360, 480 (the values 0.1 and 0.5 also appear in this row); Sparsity of B(1): 0.02, 0.05, 0.2; Signal-to-Noise Ratio: 0.25, 1; Unbalanced Signal-to-Noise Ratio: values not recoverable.]
The simulated data sets were analyzed by both iFad and iPaD. Their performance was evaluated by the area under the ROC curve (AUC).
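For reference, AUC can be computed directly from its rank interpretation: the probability that a randomly chosen true association scores higher than a randomly chosen non-association, with ties counted as one half. The sketch below assumes the scores are something like the absolute estimated entries of B(1) and the labels mark the entries that are truly non-zero in the simulation; that pairing is an assumption made for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of entries, which is fine for a 58 x 22 loading matrix.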

20 The performances of the two methods are similar

21 The performances of the two methods are similar (cont.)
However, iPaD is much faster:
1000 iterations in iFad cost 4~5 days
Solving a sequence of λ values takes only ~6 minutes

22 Real Data Analysis
We analyzed the CCLE data set described earlier with both iFad and iPaD.
iFad: 2,000 MCMC iterations; iPaD: 10-fold CV followed by 2,000 permutations (the null distribution was approximated using a mixture of a normal distribution and a point mass at zero).
We call a drug-pathway association validated if the pathway contains at least one protein targeted by the drug.
Among the 58 × 22 = 1276 drug-pathway pairs, 195 pairs are validated associations (195/1276 = 15.3%).
Considering the randomness in the algorithms, we ran five repeats.
Among the top 50 drug-pathway association pairs identified by iFad, 7.0 pairs (averaged over five repeats) were validated; for iPaD, 16.6 pairs were validated.
The top associations identified by iPaD were relatively consistent over the five repeats; those identified by iFad were not (iFad probably did not converge).
Running time: 2,000 MCMC iterations cost ~230 hours on a standard laptop computer (2.4 GHz dual-core CPU with 8 GB memory running Mac OS X 10.9); 2,000 permutations cost ~6 hours for iPaD.
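To put the validation counts in context, the short calculation below works out how many validated pairs a top-50 list would contain if pairs were picked uniformly at random. The random-selection baseline is an assumption introduced here for comparison; all input numbers come from the slide.

```python
n_pairs = 58 * 22             # pathways x drugs
n_validated = 195             # pairs with a known drug target in the pathway
base_rate = n_validated / n_pairs

top_k = 50
expected_by_chance = top_k * base_rate

ifad_validated = 7.0          # average over five repeats
ipad_validated = 16.6         # average over five repeats
ifad_fold = ifad_validated / expected_by_chance
ipad_fold = ipad_validated / expected_by_chance

speedup = 230 / 6             # ~230 h (iFad) vs ~6 h (iPaD)

print(f"base rate: {base_rate:.1%}")
print(f"expected validated in top {top_k} by chance: {expected_by_chance:.1f}")
print(f"iFad fold over chance: {ifad_fold:.1f}")
print(f"iPaD fold over chance: {ipad_fold:.1f}")
print(f"running-time ratio: {speedup:.0f}x")
```

Under this baseline about 7.6 validated pairs would be expected in a random top-50 list, so iFad's 7.0 is close to chance while iPaD's 16.6 is roughly 2.2 times higher, with a running time about 38 times shorter.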

23 The Chronic Myeloid Leukemia Pathway

24 The ErbB Signaling Pathway

25 Limitations/Future Work
Relatively simple additive models
Limited and unreliable information on pathways
Pathway network topology not considered
Other sources of information
Tradeoff between model simplicity, computational feasibility, and real biological complexity

26 Thank you!

