
Variable Selection for Tailoring Treatment



Presentation on theme: "Variable Selection for Tailoring Treatment"— Presentation transcript:

1 Variable Selection for Tailoring Treatment
L. Gunter, J. Zhu & S.A. Murphy, ASA, Nov 11, 2008
VARIABLE SELECTION FOR TAILORING TREATMENT. Susan A. Murphy, Lacey Gunter, Ji Zhu; University of Michigan, Ann Arbor, Michigan, United States.
In order to tailor treatment to individuals, we should collect pretreatment variables that are useful in deciding which treatment to provide to whom. To decide which variables are most likely to be useful in the future, we might use a combination of theory and statistical variable selection methods with presently available data. However, most current variable selection methods focus on finding risk and protective variables. While these variables may be useful in predicting whether an individual needs treatment, they do not necessarily tell us which treatment to provide. We discuss the necessary characteristics of variables that are useful for tailoring treatment. Our method searches through the pretreatment variables to find those that satisfy these characteristics. We apply this method to ascertain which variables collected in a clinical trial of two depression treatments might be useful in tailoring the type of treatment to the individual.

2 Outline
Motivation
Need for Variable Selection
Characteristics of a Tailoring Variable
A New Technique for Finding Tailoring Variables
Comparisons
Discussion

3 Motivating Example

4 Simple Example: Nefazodone - CBASP Trial
Randomization: Nefazodone vs. Nefazodone + Cognitive Behavioral Analysis System of Psychotherapy (CBASP)
50+ baseline covariates, both categorical and continuous
From Wikipedia: Nefazodone is not considered to be an SSRI, MAOI or tricyclic antidepressant. It is not chemically related to either bupropion/amfebutamone or venlafaxine. Nefazodone hydrochloride (trade name Serzone) is an antidepressant drug marketed by Bristol-Myers Squibb. Its sale was discontinued in 2003 in some countries due to the small possibility of hepatic (liver) injury, which could lead to the need for a liver transplant, or even death. The incidence of severe liver damage is approximately one in 250,000 to 300,000 patient-years. On May 20, 2004, Bristol-Myers Squibb discontinued the sale of Serzone in the United States. Several generic formulations of nefazodone are still available.

5 Simple Example: Nefazodone - CBASP Trial
Which variables in X are important for tailoring the treatment?
X: patient's medical history, severity of depression, current symptoms, etc.
A: Nefazodone OR Nefazodone + CBASP
R: depression symptoms post treatment (R is inverse-coded HAMD, so high R is good)

6 Optimization We want to select the treatment that “optimizes” R
The optimal choice of treatment may depend on X

7 Optimization
The optimal treatment(s) is given by d*(x) = argmax_a E[R | X = x, A = a].
The Value of a decision rule d is V(d) = E[ E[R | X, A = d(X)] ], the mean response obtained when treatment is assigned according to d.

8 Need for Variable Selection
In clinical trials, many pretreatment variables are collected to improve understanding and inform future treatment.
Yet in clinical practice, only the most informative variables for tailoring treatment can be collected: cost in monetary terms and burden to clinical staff and patients mean that only a few variables can be used in tailoring.
A combination of theory, clinical experience and statistical variable selection methods can be used to determine which variables are important.

9 Current Statistical Variable Selection Methods
Current statistical variable selection methods focus on finding good predictors of the response.
We also need variables that help determine which treatment is best for which types of patients, i.e. tailoring variables.
Experts typically have knowledge of which variables are good predictors, but intuition about tailoring variables is often lacking.
Tailoring variables are also called prescriptive variables.

10 What is a Tailoring Variable?
Tailoring variables help us determine which treatment is best.
Tailoring variables qualitatively interact with the treatment: different values of the tailoring variable result in different best treatments.
[Figure: three panels illustrating No Interaction, Non-qualitative Interaction and Qualitative Interaction. High R is good.]
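For intuition, here is a minimal sketch (not from the talk; the coefficient names b0-b3 and the toy values are illustrative) of the three cases under a linear model R = b0 + b1*X + b2*A + b3*X*A + noise with A in {0, 1}: the interaction is qualitative exactly when the sign of the treatment effect b2 + b3*x changes over the range of x.

```python
import numpy as np

def best_treatment(x, b2, b3):
    """Best treatment at covariate value x: choose A = 1 iff its added effect b2 + b3*x is positive."""
    return (b2 + b3 * x > 0).astype(int)

x = np.linspace(-2, 2, 5)
print(best_treatment(x, b2=1.0, b3=0.0))  # no interaction: same best treatment everywhere
print(best_treatment(x, b2=1.0, b3=0.3))  # non-qualitative interaction: effect size varies, best treatment does not
print(best_treatment(x, b2=0.2, b3=1.0))  # qualitative interaction: best treatment flips as x changes
```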

11 Qualitative Interactions
Qualitative interactions have been discussed by many within the statistics literature (e.g. Byar & Corle, 1977; Peto, 1982; Shuster & Van Eys, 1983; Gail & Simon, 1985; Yusuf et al., 1991; Senn, 2001; Lagakos, 2001).
Many express skepticism concerning the validity of qualitative interactions found in studies.
Our approach for finding qualitative interactions should be robust to finding spurious results.
The skepticism is due to:
the rarity of qualitative interactions, especially in drug trials with a very homogeneous sample (at least in a single time point setting; they may be less rare in multiple time point settings when gathering intermediate outcomes); indeed, the sample does not represent a well-defined population;
data fishing without controlling the family-wise error rate;
the tendency of journals to publish only significant results.

12 Qualitative Interactions
We focus on two important factors (a hedged computational sketch follows below):
the magnitude of the interaction between the variable and the treatment indicator;
the proportion of patients for whom the best choice of treatment changes given knowledge of the variable.
[Figure: example panels contrasting big vs. small interaction and big vs. small proportion; green curves represent the variable's distribution.]
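A hedged sketch of how these two factors might be computed for one candidate variable (an assumed form, not necessarily the authors' exact estimators; the function name and toy data are illustrative): fit E[R | X_j, A] by least squares on (1, X_j, A, X_j*A), take the interaction magnitude as the fitted interaction coefficient, and take the proportion as the fraction of patients whose fitted best treatment differs from the overall best single treatment.

```python
import numpy as np

def interaction_factors(x, a, r):
    """Return (D, P) for one candidate tailoring variable x, with treatment a in {0, 1} and response r."""
    Z = np.column_stack([np.ones_like(x), x, a, x * a])
    beta, *_ = np.linalg.lstsq(Z, r, rcond=None)            # least-squares fit of r on (1, x, a, x*a)
    D = abs(beta[3])                                         # magnitude of the x-by-treatment interaction
    effect = beta[2] + beta[3] * x                           # fitted treatment effect at each patient's x
    best_overall = int(effect.mean() > 0)                    # best single treatment, ignoring x
    P = np.mean((effect > 0).astype(int) != best_overall)    # share whose best treatment differs from it
    return D, P

rng = np.random.default_rng(0)
x = rng.normal(size=200)
a = rng.integers(0, 2, size=200).astype(float)
r = 0.5 * x + 0.8 * a * x + rng.normal(size=200)             # toy data: best treatment depends on the sign of x
print(interaction_factors(x, a, r))
```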

13 Ranking Score S
Ranking Score for candidate variable X_j:
$$S_j = \frac{1}{n}\sum_{i=1}^n \hat{E}\big[R \mid X_j = x_{ji}, A = \hat{d}_j(x_{ji})\big] \;-\; \max_a \frac{1}{n}\sum_{i=1}^n \hat{E}\big[R \mid X_j = x_{ji}, A = a\big],$$
where $\hat{d}_j(x) = \arg\max_a \hat{E}[R \mid X_j = x, A = a]$ and $\hat{E}[R \mid X_j, A]$ is a linear regression of R on $(1, X_j, A, X_j A)$.
S estimates the quantity described by Parmigiani (2002) as the value of information: the expected gain in response from tailoring treatment on X_j over giving everyone the best single treatment.
(In the slide figure, green ticks represent observations and the yellow shaded area represents the quantity the S score is estimating.)
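A minimal sketch of this score under the assumptions above (the specific estimator form is my reading of the slide note, not verbatim from the paper): fit the single-variable linear regression and compare the mean fitted response when each patient follows the better of the two fitted treatment means against the mean fitted response of the best single treatment.

```python
import numpy as np

def ranking_score_S(x, a, r):
    """Value-of-information ranking score for one candidate tailoring variable x."""
    Z = np.column_stack([np.ones_like(x), x, a, x * a])
    beta, *_ = np.linalg.lstsq(Z, r, rcond=None)     # linear regression of r on (1, x, a, x*a)
    mu0 = beta[0] + beta[1] * x                      # fitted mean response under A = 0
    mu1 = mu0 + beta[2] + beta[3] * x                # fitted mean response under A = 1
    value_tailored = np.maximum(mu0, mu1).mean()     # each patient follows the better fitted treatment
    value_single = max(mu0.mean(), mu1.mean())       # best one-size-fits-all treatment
    return value_tailored - value_single             # nonnegative; larger means stronger evidence
```

Applied to each pretreatment variable in turn, this gives the one-variable-at-a-time ranking referred to in step 2 of the algorithm.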

14 Ranking Score S
Higher S scores correspond to stronger evidence of a qualitative interaction between X and A.
We use this ranking in a variable selection algorithm to select important tailoring variables, in order to:
avoid over-fitting due to the large number of X variables;
consider variables jointly.
You can't use the ranking score by itself because it treats each variable in isolation: X1 may only be a useful tailoring variable if tailoring variable X2 is not collected. Also, there are so many pretreatment variables that we will overfit the model. Overfitting means we get a very good fit on a particular data set but cannot replicate the result, because much of the model is actually fit to noise specific to that data set; this problem arises often when one has many covariates. We don't want to declare a variable a potential tailoring variable if S is high just due to noise in the data rather than underlying structure. The following algorithm helps us avoid overfitting.
(The note also records, in LaTeX, a score combining the two factors from slide 12:
$$U_j = \left(\frac{D_j - \min_{1\leq k\leq p} D_k}{\max_{1\leq k\leq p} D_k - \min_{1\leq k\leq p} D_k}\right)\left(\frac{P_j - \min_{1\leq k\leq p} P_k}{\max_{1\leq k\leq p} P_k - \min_{1\leq k\leq p} P_k}\right).)$$

15 Variable Selection Algorithm
1. Select important predictors of R from (X, X*A) using the Lasso.
   -- Select the tuning parameter using BIC.
2. Select all X*A variables with nonzero S.
   -- Use the predictors from step 1 to form the linear regression estimator of E[R | X, A] used to compute S.
For step 1 we used the Lasso with the penalty parameter chosen by the Bayesian Information Criterion (Zou, Hastie and Tibshirani, 2007) because of its conservative nature, which ensures only strong predictors enter the model. We used BIC over cross-validated Lasso because CV Lasso selects too many variables; the estimated Value, or more precisely the estimated optimal policy, appears to be very sensitive to the inclusion of spurious interactions. CV Lasso tends to include several spurious interactions which, when used in step 1, caused the U and S scores' performance to suffer noticeably. But we still wanted to include interactions when selecting predictive variables, so we needed a method in step 1 that would be far more conservative in its selection of interaction variables.
For step 2 we included the variables selected in step 1 to decrease the variability of the estimates. S is calculated one interaction variable at a time; we look for qualitative interactions individually, using an approach that rates each variable in X based on its potential for a qualitative interaction with the action (using linear models).

16 Lasso
Lasso on (X, A, X*A) (Tibshirani, 1996)
Lasso minimization criterion:
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}\big(R_i - Z_i^{T}\beta\big)^2 + \lambda\sum_{j}|\beta_j|,$$
where Z_i is the vector of predictors (X_i, A_i, X_i A_i) for patient i and λ is a penalty parameter.
The coefficient for A is not penalized (it is excluded from the penalty sum).
The value of λ is chosen by the Bayesian Information Criterion (BIC) (Zou, Hastie & Tibshirani, 2007).
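As an illustration of step 1, here is a hedged sketch using scikit-learn's LassoLarsIC with the BIC criterion on a toy (X, A, X*A) design matrix. Unlike the slide's version, this penalizes every coefficient, including A; leaving A unpenalized would require a custom solver. The toy data and variable names are illustrative, not from the trial.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(1)
n, p = 440, 10
X = rng.normal(size=(n, p))                        # pretreatment variables
A = rng.integers(0, 2, size=(n, 1)).astype(float)  # randomized treatment indicator
R = X[:, 0] + 0.5 * A[:, 0] + 0.4 * A[:, 0] * X[:, 1] + rng.normal(size=n)

Z = np.hstack([X, A, X * A])                       # predictors (X, A, X*A)
fit = LassoLarsIC(criterion="bic").fit(Z, R)
selected = np.flatnonzero(fit.coef_)               # columns kept by the BIC-tuned Lasso
print(selected)
```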

17 Variable Selection Algorithm
3. Rank-order the (X, X*A) variables selected in steps 1 & 2 using a weighted Lasso.
   -- The weight is 1 if the variable is not an interaction.
   -- Otherwise the weight for the kth interaction is w_k = 1 - S_k / (max_j S_j + ε), where ε is a small positive number (a small sketch of these weights follows below).
   -- This produces a combined ranking of the selected (X, X*A) variables (say p variables).
In experimentation we found that setting ε equal to the number of interactions with a nonzero S score (i.e. all variables that indicate different subjects should get different treatments) divided by the sample size works well.
For step 3 we used a weighted Lasso to obtain a ranking over the variables from steps 1 and 2 and to create nested subsets. The weighting scheme gave main effects a weight of 1 and gave interactions a weight between 0 and 1 that is a non-increasing function of the ranking score S. We need step 3 because step 2 only looks at interaction variables individually: it may be that once variable j is used, we no longer need variable k. Step 3 forms a combined ranking that takes this into account. In LaTeX: $w = 1 - \frac{S}{\max(S) + \frac{H}{n}}$.
For step 4 our criterion was the AGV criterion:
$$AGV_k = \frac{(\hat{V}_k - \hat{V}_0)/k}{(\hat{V}_m - \hat{V}_0)/m}, \qquad k = 1,\ldots,m,$$
where $m = \arg\max_k (\hat{V}_k - \hat{V}_0)$ and $\hat{V}_0$ is the estimated Value of the policy $\hat{\pi}^*_0 = \arg\max_a \hat{E}[R \mid A = a]$ (the best single treatment, ignoring X).
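A small sketch of the step-3 interaction weights as described in the note (my reading; the function and argument names are illustrative): main effects get weight 1, and the kth interaction gets w_k = 1 - S_k/(max_j S_j + ε) with ε = H/n, where H is the number of interactions with a nonzero S score and n the sample size.

```python
import numpy as np

def weighted_lasso_weights(S, n_main, n):
    """Weights: 1 for main effects, 1 - S_k/(max S + H/n) for the candidate interactions."""
    S = np.asarray(S, dtype=float)                  # one S score per candidate interaction
    eps = np.count_nonzero(S) / n                   # epsilon = H / n
    w_interactions = 1.0 - S / (S.max() + eps)      # larger S -> smaller weight -> less shrinkage
    return np.concatenate([np.ones(n_main), w_interactions])

print(weighted_lasso_weights(S=[0.0, 0.1, 0.4], n_main=5, n=440))
```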

18 Variable Selection Algorithm
4. Choose between variable subsets using a criterion that trades off maximal value of information against complexity.
   -- The ordering of the p variables creates p nested subsets of variables. Estimate the value of information for each of the p subsets.
   -- Select the subset k with the largest criterion value, trading off the complexity and the observed mean response (unconditional mean response) of each model.
The criterion is similar in idea to the adjusted R² value. The model with m = argmax_k V̂_k variables is akin to a saturated model, because adding more variables does not improve its Value. Thus the denominator is the observed maximum gain in Value among the different variable subsets, divided by m, an estimate of the degrees of freedom used to achieve that gain in Value. The numerator measures the gain in Value of the intermediate model, the model with k variables, divided by k, the estimated degrees of freedom needed to achieve that gain.

19 Simulations
Data were simulated under a wide variety of realistic decision-making scenarios (with and without qualitative interactions).
We used X from the CBASP study and generated new A and R.
Compared:
New method: S with the variable selection algorithm
Standard method: BIC Lasso on (X, A, X*A)
1000 simulated data sets: we recorded the percentage of times each variable's interaction with treatment was selected by each method.
Data generation: we randomly selected rows with replacement from the observation matrix of the Nefazodone CBASP trial data, then generated new actions and new responses. For each generative model, we used main-effect coefficients for X estimated in an analysis of the real data set. Interaction variables were randomly selected. The treatment, qualitative-interaction and non-qualitative-interaction coefficients were set using a variant of Cohen's D:
$$D = \frac{\beta \sqrt{Var(R \mid X, A)}}{\sqrt{Var(X_j)}}.$$
We maintained the definitions of 'small' and 'moderate' effect sizes suggested by Cohen, D = 0.2 and D = 0.5 respectively.

20 Simulation Results

Generative Model                                   | Ave # of spurious interactions selected over BIC Lasso | Ave % increase in Value over BIC Lasso*
No Interactions                                    | 0.5 | -0.03
Non-qualitative Interactions Only                  | 0.1 |  0.00
Qualitative Interaction Only                       | 1.1 |  0.23
Both Qualitative and Non-qualitative Interactions  | 0.2 |  0.39

* Over the total possible increase; 1000 simulated data sets, each of size n = 440 like the Nefazodone CBASP data set.
The interaction effect is always small as defined by Cohen's D (β = 0.2 × std(X_j) / residual std). The rows correspond to generative models 2, 3, 5 and 6:
(2) Main effects of X, moderate treatment effect, no interactions with treatment.
(3) Main effects of X, moderate treatment effect, multiple medium to small non-qualitative interactions with treatment, no qualitative interaction with treatment.
(5) Main effects of X, small treatment effect, small qualitative interaction with a continuous variable, no non-qualitative interactions.
(6) Main effects of X, small treatment effect, multiple moderate to small non-qualitative interactions with treatment, small to moderate qualitative interaction with a binary variable and treatment.
When there are both qualitative and non-qualitative interactions, BIC Lasso picks up the non-qualitative interactions. All "Ave % increase in E[R] over BIC Lasso" values are significant except for the non-qualitative-interactions-only model.
The criterion for selecting a tailoring variable depends on the method: for the Lasso it is the variables with nonzero coefficients at the chosen penalty parameter; for the new method S it is the variables in the subset chosen by the AGV criterion. A chosen interaction is deemed spurious if it was not in the generative model. The average differences in spurious variables selected were not tested for significance (though they are likely all significant).

21 Simulation Results
Pros: when the model contained qualitative interactions, the new method gave significant increases in expected response over BIC Lasso.
Cons: the new method resulted in a slight increase in the number of spurious interactions over BIC Lasso.

22 Nefazodone - CBASP Trial
Aim of the Nefazodone CBASP trial: to compare the efficacy of three alternative treatments for major depressive disorder (MDD):
Nefazodone
Cognitive Behavioral Analysis System of Psychotherapy (CBASP)
Nefazodone + CBASP
Which variables might help tailor the depression treatment to each patient?

23 Nefazodone - CBASP Trial
For our analysis we used data from 440 patients, with
X: 61 baseline variables
A: Nefazodone vs. Nefazodone + CBASP
R: Hamilton Rating Scale for Depression (HAMD) score, post treatment (R = 34 - HAMD, so high R is good)

24 Method Application and Confidence Measures
When applying the new method to real data, it is desirable to have a measure of reliability and to control the family-wise error rate.
We used bootstrap sampling to assess reliability. On each of 1000 bootstrap samples:
Run the variable selection method
Record the interaction variables selected
Then calculate selection percentages over the bootstrap samples.

25 Error Rate Thresholds
To help control the family-wise error rate, compute the following inclusion thresholds for the selection percentages (a schematic sketch follows below):
Repeat 100 times:
  Permute the interactions to remove their effects from the data
  Run the method on 1000 bootstrap samples of the permuted data
  Calculate selection percentages over the bootstrap samples
  Record the largest selection percentage over the p interactions
Threshold: the (1 - α)th percentile of the 100 maximum selection percentages.
Select all interactions with a selection percentage greater than the threshold.
For the threshold we permuted X*A within the (X, A, X*A) data matrix. Only interaction variables with selection percentages above the thresholds should be selected.
The latest threshold simulations for the new method S: 26.5% of the simulations had a variable over the 70% threshold, 20% over the 80% threshold, and 10% over the 90% threshold. For BIC Lasso: 25% of the simulations had a variable over the 70% threshold and 10.5% over the 90% threshold. This is from 102 simulated runs.
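A schematic sketch of the bootstrap selection percentages and the permutation threshold. Everything named here is an assumption for illustration: `select_interactions(Z, R)` stands in for the full variable-selection method, taking the design matrix Z = (X, A, X*A) and returning the indices of the selected interaction columns, and the permutation shuffles the rows of the X*A block, per the note.

```python
import numpy as np

def selection_percentages(Z, R, p, select_interactions, B=1000, rng=None):
    """Percentage of B bootstrap samples in which each of the p interactions is selected."""
    rng = rng or np.random.default_rng(0)
    n = len(R)
    counts = np.zeros(p)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)               # bootstrap resample of patients
        for j in select_interactions(Z[idx], R[idx]):
            counts[j] += 1
    return 100.0 * counts / B

def permutation_threshold(X, A, R, select_interactions, n_perm=100, alpha=0.2, rng=None):
    """(1 - alpha)th percentile of the max selection percentage under permuted interactions."""
    rng = rng or np.random.default_rng(1)
    XA = X * A[:, None]
    max_pcts = []
    for _ in range(n_perm):
        XA_perm = XA[rng.permutation(len(XA))]         # permute X*A rows: removes interaction effects
        Z_perm = np.hstack([X, A[:, None], XA_perm])
        pcts = selection_percentages(Z_perm, R, X.shape[1], select_interactions, rng=rng)
        max_pcts.append(pcts.max())                     # largest spurious selection percentage
    return np.percentile(max_pcts, 100 * (1 - alpha))   # e.g. alpha = 0.2 gives the 80% threshold
```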

26 Error Rate Thresholds
When tested in simulations with the new method, the error rate threshold effectively controlled the family-wise error rate.
This combination of bootstrap sampling and thresholding was also tested with BIC Lasso and effectively controlled the family-wise error rate in simulations.

27 Nefazodone - CBASP Trial
[Figure: adjusted selection percentages for the interactions; labeled variables include OCD and ALC. The blue solid line is the 90% threshold, the green dotted line the 80% threshold (i.e. 80% of the time no spurious variable has a selection percentage this high).]
The selection percentages shown are actually the adjusted selection percentages: the absolute value of the number of times an interaction is selected with a positive coefficient minus the number of times it is selected with a negative coefficient.
BIC Lasso selected Obsessive Compulsive Disorder when using the 80% threshold; the new method selected past Alcohol Abuse when using the 80% threshold.
Regression analyses illustrate the potential qualitative interaction with the variable "past alcohol dependence", but not with the variable that ranks second highest under the standard method, "past OCD".

28 Interaction Plot Green bars show density of subjects.
Here R is coded so that high is good (R = 34 - HAMD). It is not clear why people with an alcohol abuse/dependence history would do better on medication than people without this history (the red line slopes up to the right). This may be due to how people were enrolled in the trial: of the people with an alcohol abuse/dependence history, only the really motivated enrolled, whereas of the people with no such history, both motivated and unmotivated people enrolled. Also note from the error bars that the response to medication is about the same for people with and without a prior alcohol abuse/dependence history (the blue line is flat).

29 Interaction Plot Green bars show density of subjects.
Here R (= 34 - HAMD) is coded so that high is good.

30 Discussion
This method provides a list of potential tailoring variables while reducing the number of false leads.
Replication is required to confirm the usefulness of a tailoring variable.
Our long-term goal is to generalize this method so that it can be used with data from Sequential Multiple Assignment Randomized Trials, as illustrated by STAR*D.

31 Email Susan Murphy at samurphy@umich.edu for more information!
This seminar can be found at ASA ppt
Support: NIDA P50 DA10075, NIMH R01 MH and NSF DMS
Thanks for technical and data support go to:
A. John Rush, MD, Betty Jo Hay Chair in Mental Health at the University of Texas Southwestern Medical Center, Dallas
Martin Keller and the investigators who conducted the trial 'A Comparison of Nefazodone, the Cognitive Behavioral-Analysis System of Psychotherapy, and Their Combination for the Treatment of Chronic Depression'

32 Interaction Plot Green bars show density of subjects.
Here R is coded so that high is good (R = 34 - HAMD). It is not clear why people with an alcohol abuse/dependence history would do better on medication than people without this history. This may be due to how people were enrolled in the trial: of the people with an alcohol abuse/dependence history, only the really motivated enrolled, whereas of the people with no such history, both motivated and unmotivated people enrolled. Also note from the error bars that the response to medication is about the same for people with and without a prior alcohol abuse/dependence history.

33 Interaction Plot Green bars show density of subjects.
Here R (= 34 - HAMD) is coded so that high is good.

34 Lasso Weighting Scheme
The Lasso minimization criterion is equivalent to
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}\big(R_i - Z_i^{T}\beta\big)^2 + \lambda\sum_{j} w_j|\beta_j|,$$
so a smaller w_j means greater importance (less shrinkage) for variable j.
Weights: w_j = v_j, with
v_j = 1 for predictive variables,
v_j = 1 - S_j / (max_k S_k + ε) for prescriptive variables, where ε = H/n.
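A hedged sketch (not the authors' code) of one standard way to fit such a weighted Lasso with an off-the-shelf solver: dividing column j of the design matrix by w_j and running an ordinary Lasso penalizes β_j by w_j|β_j|, and the fitted coefficients are mapped back to the original scale by dividing by w_j.

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(Z, R, w, alpha=0.1):
    """Weighted Lasso via column rescaling; w gives per-coefficient penalty weights."""
    w = np.asarray(w, dtype=float)
    w_safe = np.where(w > 0, w, 1e-8)               # a weight of 0 would mean "essentially unpenalized"
    fit = Lasso(alpha=alpha, fit_intercept=True).fit(Z / w_safe, R)
    return fit.coef_ / w_safe                        # coefficients on the original scale
```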

35 AGV Criterion
For a subset of k variables, X_{k}, the Average Gain in Value (AGV) criterion is
$$AGV_k = \frac{(\hat{V}_k - \hat{V}_0)/k}{(\hat{V}_m - \hat{V}_0)/m}, \qquad k = 1,\ldots,m,$$
where $m = \arg\max_k (\hat{V}_k - \hat{V}_0)$, $\hat{V}_k$ is the estimated Value using the first k ranked variables, and $\hat{V}_0$ is the estimated Value of the best single treatment $a^* = \arg\max_a E[R \mid A = a]$.
The criterion selects the subset of variables with the maximum proportional increase in E[R] per variable.
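A direct transcription of this criterion into code, under the assumption (illustrative) that the estimated Values V̂_1, ..., V̂_p of the nested subsets and the no-tailoring Value V̂_0 have already been computed; the function name and toy numbers are mine.

```python
import numpy as np

def agv_choose_subset(V_hat, V0_hat):
    """Return the chosen subset size k and the AGV_k values, k = 1..m."""
    gains = np.asarray(V_hat, dtype=float) - V0_hat   # gain in Value over the best single treatment
    m = int(np.argmax(gains)) + 1                     # size of the "saturated" subset
    agv = (gains[:m] / np.arange(1, m + 1)) / (gains[m - 1] / m)
    return int(np.argmax(agv)) + 1, agv

# Toy numbers: a single variable captures most of the achievable gain in Value, so k = 1 is chosen.
print(agv_choose_subset(V_hat=[1.0, 1.6, 1.7, 1.7], V0_hat=0.2))
```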

36 Simulation Results (S-score)
[Figure: plots of the selection percentages for the interaction variables across the 1000 samples; markers distinguish qualitative, non-qualitative and spurious interactions.]
Top model: main effects of X, small treatment effect, multiple moderate to small non-qualitative interactions with treatment, small to moderate qualitative interaction with a binary variable and treatment.
Bottom model: main effects of X, small treatment effect, multiple moderate to small non-qualitative interactions with treatment, small to moderate qualitative interaction with a continuous variable and treatment.
The selection percentages shown are actually the adjusted selection percentages: the absolute value of the number of times an interaction is selected with a positive coefficient minus the number of times it is selected with a negative coefficient.

