Variable Selection for Optimal Decision Making Susan Murphy & Lacey Gunter University of Michigan Statistics Department Artificial Intelligence Seminar.

Presentation transcript:

Variable Selection for Optimal Decision Making
Susan Murphy & Lacey Gunter
University of Michigan Statistics Department
Artificial Intelligence Seminar
Joint work with Ji Zhu

Simple Motivating Example: Nefazodone - CBASP Trial
Randomization (R) to one of two arms: Nefazodone, or Nefazodone + Cognitive Behavioral-analysis System of Psychotherapy (CBASP)
50+ baseline covariates, both categorical and continuous

Complex Motivating Example

Outline
Framework and notation for decision making
Need for variable selection
Variables that are important to decision making
Introduce a new technique
Simulated and real data results
Future work

Optimal Decision Making
3 components: observations, $X = (X_1, X_2, \ldots, X_p)$, action, $A$, and reward, $R$
A policy, $\pi$, maps observations, $X$, to actions, $A$
Policies are compared via expected mean reward, $V^\pi = E_\pi[R]$, called the Value of $\pi$ (Sutton & Barto, 1998)
Long Term Goal: find a policy, $\pi^*$, for which $V^{\pi^*} = \max_\pi V^\pi$

Some Working Assumptions
Data collection is difficult and expensive
  limited number of trajectories (<1000)
  training set with randomized actions
  many observations
Finite horizon (only 1-4 time points)
  we will initially work with just one time point
Noisy data with little knowledge about underlying system dynamics
Little knowledge about which variables are most important for decision making

Simple Example
A clinical trial to test two alternative drug treatments
The goal: to discover which treatment is optimal for any given future patient
Components
  $X$: baseline variables such as patient's background, medical history, current symptoms, etc.
  $A$: assigned treatment
  $R$: patient's condition and symptoms post treatment

Variable Selection
Multiple reasons for variable selection in decision making, for example:
  Better performance: avoid inclusion of spurious variables that lead to bad policies
  Limited resources: only a small number of variables can be collected when enacting policies in a real world setting
  Interpretability: policies with fewer variables are easier to understand

What are people currently using?
Variable selection for reinforcement learning in medical settings is predominantly guided by expert opinion
Predictive selection techniques, such as Lasso (Loth et al., 2006) and decision trees (Ernst et al., 2005), have been proposed
Good predictive variables are useful in decision making, but are only a small part of the puzzle
Need variables that help determine optimal actions: variables that qualitatively interact with the action

Qualitative Interactions
What is a qualitative interaction?
$X$ qualitatively interacts with $A$ if at least two distinct, non-empty sets exist within the space of $X$ for which the optimal action is different (Peto, 1982)
[Figure: three panels titled No Interaction, Non-qualitative Interaction, Qualitative Interaction]
Qualitative interactions tell us which actions are optimal
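
The three cases named on this slide can be illustrated with a small Python sketch. It is not taken from the talk: the function name `classify_interaction`, the tolerance, and the toy reward curves are all assumptions made for illustration. It takes estimated mean rewards under two actions over a grid of values of one variable and looks at whether the treatment effect changes, and whether it changes sign.

```python
def classify_interaction(reward_a0, reward_a1, tol=1e-9):
    """Classify how a variable interacts with a binary action.

    reward_a0[k] and reward_a1[k] are (hypothetical) estimated mean rewards
    under actions 0 and 1 at the k-th grid value of the variable.
    """
    # Treatment effect of action 1 versus action 0 at each grid value.
    effects = [r1 - r0 for r0, r1 in zip(reward_a0, reward_a1)]
    if max(effects) - min(effects) <= tol:
        return "no interaction"            # effect does not depend on x
    if min(effects) < -tol and max(effects) > tol:
        return "qualitative interaction"   # optimal action differs across x
    return "non-qualitative interaction"   # effect varies, same optimal action


x_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
parallel = classify_interaction([x for x in x_grid], [x + 0.5 for x in x_grid])
fanning = classify_interaction([x for x in x_grid], [2 * x + 0.5 for x in x_grid])
crossing = classify_interaction([x for x in x_grid], [1 - x for x in x_grid])
print(parallel, fanning, crossing, sep="\n")
```

Only the crossing case changes which action is best, which is why it is the one that matters for choosing a policy.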

Qualitative Interactions
We focus on two important factors:
  The magnitude of the interaction between the variable and the action
  The proportion of patients whose optimal choice of action changes given knowledge of the variable
[Figure: three panels labeled big interaction, big proportion; small interaction, big proportion; big interaction, small proportion]

Variable Ranking for Qualitative Interactions
We propose ranking the variables in $X$ based on potential for a qualitative interaction with $A$
We give a score for ranking the variables
Given data on $i = 1, \ldots, n$ subjects with $j = 1, \ldots, p$ variables in $X$, along with an action, $A$, and a reward, $R$, for each subject
For $\hat{E}[R \mid A=a]$ an estimator of $E[R \mid A=a]$, define

Variable Ranking Components
Ranking score based on 2 usefulness factors
Interaction Factor:
  max = 1 - 0 = 1
  min = 0.3 - 0.7 = -0.4
  $D_j$ = 1 - (-0.4) = 1.4
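
The arithmetic on this slide can be reproduced with a short Python sketch. It assumes, based only on the worked numbers above, that the interaction factor is the largest estimated treatment effect across subjects minus the smallest. The function name `interaction_factor` and the middle subject's values are made up for illustration.

```python
def interaction_factor(est_a1, est_a0):
    """Spread of the estimated treatment effect across subjects.

    est_a1[i] and est_a0[i] are (hypothetical) estimated rewards for
    subject i under action 1 and action 0, given the variable X_j.
    """
    diffs = [r1 - r0 for r1, r0 in zip(est_a1, est_a0)]
    return max(diffs) - min(diffs)


# Subject 1 gives the slide's maximum (1 - 0) and subject 3 its
# minimum (0.3 - 0.7); subject 2 is an invented in-between value.
est_a1 = [1.0, 0.6, 0.3]
est_a0 = [0.0, 0.5, 0.7]
d_j = interaction_factor(est_a1, est_a0)
print(round(d_j, 6))
```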

Variable Ranking Components
Proportion Factor: 2 out of 7 subjects would change choice of optimal action given $X_j$
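
A minimal Python sketch of the proportion factor follows. It assumes a binary action and reads the factor as the share of subjects whose best action given the variable differs from the best action chosen without it. The function name `proportion_factor` and all numbers are hypothetical, chosen so that 2 of 7 subjects switch.

```python
def proportion_factor(est_a1_given_x, est_a0_given_x, overall_a1, overall_a0):
    """Share of subjects whose optimal action changes once X_j is known."""
    # Best action when the variable is ignored.
    overall_best = 1 if overall_a1 > overall_a0 else 0
    changed = 0
    for r1, r0 in zip(est_a1_given_x, est_a0_given_x):
        best_given_x = 1 if r1 > r0 else 0
        if best_given_x != overall_best:
            changed += 1
    return changed / len(est_a1_given_x)


# Seven invented subjects: action 1 looks best overall (0.6 vs 0.5),
# but the last two subjects do better under action 0 given X_j.
est_a1 = [0.9, 0.8, 0.7, 0.6, 0.6, 0.4, 0.3]
est_a0 = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
p_j = proportion_factor(est_a1, est_a0, overall_a1=0.6, overall_a0=0.5)
print(p_j)
```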

Ranking Score
Ranking Score:
Score, $U_j$, $j = 1, \ldots, p$, can be used to rank the $p$ variables in $X$ based on their potential for a qualitative interaction with $A$

Variable Selection Algorithm
1. Select important main effects of $X$ on $R$ using some predictive variable selection method
   a. Choose tuning parameter value that gives best predictive model
2. Rank variables in $X$ using score $U_j$; select top $k$ in rank
3. Again use a predictive variable selection method, this time selecting among main effects of $X$ from step 1, main effect of $A$, and ranked interactions from step 2
   a. Choose tuning parameter value such that the total subset of variables selected leads to a policy with the highest estimated Value

Simulation
Data simulated under a wide variety of scenarios (with and without qualitative interactions)
Used observation matrix, $X$, and actions, $A$, from a real data set
Generated new rewards, $R$, based on several different realistic models
Compared new ranking method $U_j$ versus a standard method
1000 repetitions: recorded percentage of time each interaction was selected for each method

Methods Used in Simulation
Standard Method: Lasso on (X, A, X*A) (Tibshirani, 1996)
The standard Lasso minimization criterion is $\sum_{i=1}^{n} (R_i - Z_i^\top \beta)^2 + \lambda \sum_{j \neq p+1} |\beta_j|$, where $Z_i$ is the vector of predictors for observation $i$ and $\lambda$ is a penalty parameter
Coefficient for $A$, $\beta_{p+1}$, not included in penalty term
Value of $\lambda$ chosen by cross-validation on the prediction error
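
To make the role of the unpenalized action coefficient concrete, here is a small pure-Python coordinate descent sketch. It minimizes the residual sum of squares plus $\lambda$ times the absolute values of all coefficients except those listed as unpenalized. It is not the solver used in the talk: the function names, the absence of an intercept (the data are assumed centered), and the toy data are all assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(Z, r, lam, unpenalized=(), n_sweeps=200):
    """Minimize sum_i (r_i - Z_i . beta)^2 + lam * sum_{j penalized} |beta_j|."""
    n, m = len(Z), len(Z[0])
    beta = [0.0] * m
    residual = list(r)  # equals r - Z beta while beta is all zeros
    col_norm = [sum(Z[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_sweeps):
        for j in range(m):
            if col_norm[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(Z[i][j] * residual[i] for i in range(n)) + col_norm[j] * beta[j]
            if j in unpenalized:
                new = rho / col_norm[j]  # plain least-squares update
            else:
                new = soft_threshold(rho, lam / 2.0) / col_norm[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Z[i][j] * delta
                beta[j] = new
    return beta


# Toy data with p = 1: columns are (x, a, x*a), actions coded -1/+1,
# and the reward is 0.5*a + 2*x*a with no noise.
Z, r = [], []
for x in (-1.0, -0.5, 0.5, 1.0):
    for a in (-1.0, 1.0):
        Z.append([x, a, x * a])
        r.append(0.5 * a + 2.0 * x * a)

beta = lasso_coordinate_descent(Z, r, lam=4.0, unpenalized=(1,))
print([round(b, 6) for b in beta])
```

In this toy design the interaction coefficient is shrunk from 2 toward zero while the coefficient for the action stays at its least-squares value, which is the effect of leaving it out of the penalty.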

Methods Used in Simulation
New Method:
1. Select important main effects of $X$ on $R$ using Lasso
   a. Choose $\lambda$ value by cross-validation on prediction error
2. Rank variables in $X$ using score $U_j$; select top $k$ in rank
3. Use Lasso to select among main effects of $X$ chosen in step 1, main effect of $A$, and interactions chosen in step 2
   a. Choose $\lambda$ value such that the total subset of variables selected leads to a policy with the highest estimated Value

Simulation Results
[Figure: selection-percentage plots. Legends: Continuous Qualitative Interaction, Spurious Interaction; Binary Qualitative Interaction, Spurious Interaction]

Simulation Results
[Figure: selection-percentage plots. Legends: Binary Qualitative Interaction, Non-qualitative Interaction, Spurious Interaction; Continuous Qualitative Interaction, Non-qualitative Interaction, Spurious Interaction]

Depression Study Analysis
Data from a randomized controlled trial comparing treatments for chronic depression (Keller et al., 2000)
$n = 440$ patients, $p = 64$ observation variables in $X$
Actions: $A$ = Nefazodone or $A$ = Nefazodone + Cognitive psychotherapy (CBASP)
Reward: $R$ = Hamilton Rating Scale for Depression score

Depression Study Results
Ran both methods on 1000 bootstrap samples
Resulting selection percentages:
[Figure: selection percentages for each method; labeled variables: ALC2, ALC1, Som Anx, OCD, ALC2]
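
The bootstrap step can be sketched in Python as below. The selection method is passed in as a callable, and `toy_select` is only a hypothetical stand-in for either method in the talk. The data layout `(x, a, r)`, the cutoff, and the toy data are likewise assumptions.

```python
import random


def bootstrap_selection_percentages(rows, select, n_vars, n_boot=1000, seed=0):
    """Percentage of bootstrap samples in which each variable is selected.

    rows is a list of (x, a, r) tuples; select(sample) returns the indices
    of the variables whose interaction with the action it keeps.
    """
    rng = random.Random(seed)
    counts = [0] * n_vars
    n = len(rows)
    for _ in range(n_boot):
        # Resample subjects with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        for j in select(sample):
            counts[j] += 1
    return [100.0 * c / n_boot for c in counts]


def toy_select(sample, cutoff=0.5):
    """Hypothetical selector: keep x_j if |mean(x_j * a * r)| exceeds cutoff."""
    n_vars = len(sample[0][0])
    kept = []
    for j in range(n_vars):
        stat = sum(x[j] * a * r for x, a, r in sample) / len(sample)
        if abs(stat) > cutoff:
            kept.append(j)
    return kept


# Toy data: variable 0 interacts with the action, variable 1 is noise.
rows = []
for i in range(40):
    x0 = 1.0 if i % 2 == 0 else -1.0
    x1 = 1.0 if (i // 2) % 2 == 0 else -1.0
    a = 1.0 if (i // 4) % 2 == 0 else -1.0
    rows.append(((x0, x1), a, 2.0 * x0 * a))

percentages = bootstrap_selection_percentages(rows, toy_select, n_vars=2, n_boot=200)
print(percentages)
```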

Inclusion Thresholds
Based on previous plots, which variables should we select?
Need inclusion thresholds
Idea: remove effect of $X$ on $R$ from data, then run algorithm to determine maximum percentage of selections
  this tells us the noise threshold
  variables with percentages above this threshold are selected

Inclusion Thresholds
Do 100 times:
  Randomly assign the observed rewards to different subjects given a particular action
  Run the methods on new data
  Record the variables that were selected by each method
Threshold: largest percentage of time a variable was selected over the 100 iterations
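
The permutation procedure can be sketched in Python as follows. It reads the slide as: shuffle rewards among subjects who received the same action, run a selection method on the shuffled data, and take the largest per-variable selection percentage over the iterations. That reading, the function names, `toy_select`, and the toy data are assumptions, not the authors' code.

```python
import random


def permute_rewards_within_action(rows, rng):
    """Shuffle rewards among subjects who received the same action."""
    by_action = {}
    for idx, (_, a, _) in enumerate(rows):
        by_action.setdefault(a, []).append(idx)
    permuted = list(rows)
    for indices in by_action.values():
        rewards = [rows[i][2] for i in indices]
        rng.shuffle(rewards)
        for i, r in zip(indices, rewards):
            x, a, _ = rows[i]
            permuted[i] = (x, a, r)  # x and a stay with the subject
    return permuted


def inclusion_threshold(rows, select, n_vars, n_iter=100, seed=0):
    """Largest selection percentage of any variable on permuted data."""
    rng = random.Random(seed)
    counts = [0] * n_vars
    for _ in range(n_iter):
        for j in select(permute_rewards_within_action(rows, rng)):
            counts[j] += 1
    return max(100.0 * c / n_iter for c in counts)


def toy_select(sample, cutoff=0.5):
    """Hypothetical selector: keep x_j if |mean(x_j * a * r)| exceeds cutoff."""
    n_vars = len(sample[0][0])
    return [j for j in range(n_vars)
            if abs(sum(x[j] * a * r for x, a, r in sample) / len(sample)) > cutoff]


rows = []
for i in range(40):
    x0 = 1.0 if i % 2 == 0 else -1.0
    x1 = 1.0 if (i // 2) % 2 == 0 else -1.0
    a = 1.0 if (i // 4) % 2 == 0 else -1.0
    rows.append(((x0, x1), a, 2.0 * x0 * a + 0.05 * i))

threshold = inclusion_threshold(rows, toy_select, n_vars=2)
print(threshold)
```

Shuffling within each action keeps the main effect of the action but breaks any link between the observations and the reward, so whatever is still selected reflects noise.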

Thresholds for Depression Study
We should disregard any interactions selected 6% of the time or less when using either method

Threshold on Results
New method $U$ includes 2 indicator variables for Alcohol problems and Somatic Anxiety Score
Standard Lasso includes 39 variables!
[Figure: selection percentages with threshold; labeled variables: ALC2, ALC1, Som Anx]

Future Work
Extend algorithm to select variables for multiple time points
How best to do this?
  What rewards to use at each time point?
  Do we need to adjust the distribution of our $X$ based on prior actions?
  In what order should variable selection be done?

Other Issues To Think About
Do we need to account for variability in our estimate of $E[R \mid X_j, A=a]$ over different $X_j$?
Can we reasonably estimate the value of a derived policy from a fixed data set collected under random actions when the number of time points gets larger?
Any other issues?

References & Acknowledgements
For more information see: L. Gunter, J. Zhu, S.A. Murphy (2007). Variable Selection for Optimal Decision Making. Technical Report, Department of Statistics, University of Michigan.
This work was partially supported by NIH grants: R21 DA019800, K02 DA15674, P50 DA10075
Technical and data support:
  A. John Rush, MD, Betty Jo Hay Chair in Mental Health at the University of Texas Southwestern Medical Center, Dallas
  Martin Keller and the investigators who conducted the trial "A Comparison of Nefazodone, the Cognitive Behavioral-analysis System of Psychotherapy, and Their Combination for Treatment of Chronic Depression"

Addressing Concerns
Much of the biostatistics literature discourages looking for qualitative interactions and is very skeptical when new interactions are found. Why is this?
  Qualitative interactions are hard to find and have small effects
  Too many people fishing without disclosing
  Strict entry criteria for most clinical trials, thus small variability in $X$ precludes looking at interesting subgroups
How are we addressing these concerns?
  Testing new algorithms in multiple settings where no qualitative interactions exist

No Interaction: What can we expect?
No Qualitative Interactions
  No relationship between (X, A, X*A) and $R$
  Main effects of $X$ only
  Main effects of $X$ & moderate effect of $A$ only
  Everything but qualitative interactions

Estimating the Value
1. Fit selected variables into chosen estimator, $\hat{E}$
2. Estimate optimal policy:
3. Estimate Value of the estimated policy by:
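
The formulas for these steps did not survive in the transcript, so the Python sketch below does not reproduce the authors' estimator. It shows one standard way to estimate the Value of a given policy from data with randomized actions: a normalized inverse-probability-weighted average of rewards over subjects whose assigned action agrees with the policy. The function name, the data layout `(x, a, r)`, and the known randomization probability of 0.5 are assumptions.

```python
def estimate_value(rows, policy, prob_action):
    """Normalized inverse-probability-weighted estimate of a policy's Value.

    rows is a list of (x, a, r) tuples from a trial with randomized actions;
    policy(x) returns the action the policy would choose;
    prob_action(a, x) is the known randomization probability of action a.
    """
    weighted_reward = 0.0
    total_weight = 0.0
    for x, a, r in rows:
        if policy(x) == a:  # only subjects treated as the policy recommends
            weight = 1.0 / prob_action(a, x)
            weighted_reward += weight * r
            total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no subject received the action the policy recommends")
    return weighted_reward / total_weight


# Invented trial with equal randomization between actions 0 and 1.
rows = [(1.0, 1, 3.0), (1.0, 0, 1.0), (-1.0, 1, 0.0), (-1.0, 0, 2.0)]


def equal_randomization(a, x):
    return 0.5


def tailored(x):
    return 1 if x > 0 else 0


def always_treat(x):
    return 1


value_tailored = estimate_value(rows, tailored, equal_randomization)
value_always = estimate_value(rows, always_treat, equal_randomization)
print(value_tailored, value_always)
```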

Estimating the Value (2 time points)
1. Estimate of the optimal policy:
2. Estimate Value of the estimated policy by: