Dynamic Treatment Regimes: Advances and Open Problems. S.A. Murphy, ICSPRAR-2008.

Presentation transcript:

1 Dynamic Treatment Regimes Advances and Open Problems S.A. Murphy ICSPRAR-2008

2 Outline
–Dynamic Treatment Regimes
–Advances
–Inferential Challenges
  Incomplete, primitive, mechanistic models
  Measures of Confidence

3

4 Dynamic treatment regimes (also called policies) are individually tailored treatments, with treatment type and dosage changing according to patient outcomes.
k stages for one individual
O_j: observation available at the j-th stage
A_j: action at the j-th stage (usually a treatment)

5 k Stages
H_j: history available at the j-th decision
“Reward” following the j-th decision point (r_j is a known function)
Primary Outcome:

6 Goal : Construct decision rules that input information in the history at each stage and output a recommended action; these decision rules should lead to a maximal mean Y. The dynamic treatment regime (policy) is the sequence of decision rules. In future one selects actions as:
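The idea that a regime is just a sequence of decision rules, each mapping a history to an action, can be sketched in code. This is only an illustration: the history fields (`severity`, `responded`), the thresholds, and the two rules are hypothetical and do not come from the talk.

```python
from typing import Callable, Dict, List

# Hypothetical container for the history H_j available at a decision point.
History = Dict[str, float]
# A decision rule inputs the history and outputs a recommended action.
DecisionRule = Callable[[History], int]


def rule_stage1(h: History) -> int:
    # Hypothetical rule: recommend action 1 when baseline severity is high.
    return 1 if h["severity"] > 0.5 else 0


def rule_stage2(h: History) -> int:
    # Hypothetical rule: recommend action 1 for nonresponders, else action 0.
    return 1 if h["responded"] == 0 else 0


# The dynamic treatment regime (policy) is the sequence of decision rules.
regime: List[DecisionRule] = [rule_stage1, rule_stage2]


def recommend(regime: List[DecisionRule], histories: List[History]) -> List[int]:
    # Apply the j-th decision rule to the history available at stage j.
    return [rule(h) for rule, h in zip(regime, histories)]


actions = recommend(
    regime,
    [{"severity": 0.8}, {"severity": 0.8, "responded": 0.0}],
)
print(actions)
```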

7 Types of Data
Large Observational Data Sets
–Noise in data
–Actions are not manipulated by scientist (causal inference methods required)
–Actions are measured with error
–Moderate to small number of variables
Small Randomized Clinical Trials
–Actions are manipulated by scientist
–Unknown causes require causal inference methods
–Moderate to large number of variables

8 Reality

9 Constructing Dynamic Treatment Regimes –Why is this more than a standard control problem? High quality mechanistic models are often unavailable. (Unknown, complex, system dynamics) Even when such models are available often they do not adequately simulate the interrelationships between observations and how the actions might impact the observations because there are strong behavioral and contextual influences.

10 Advances Methods for Constructing Dynamic Treatment Regimes Likelihood-based (model conditional distribution of observations in each state given past history) Late stage cancer (Thall et al. 2000, 2002, 2007) Some HIV/AIDS (Davidian & colleagues, 2007)

11 Constructing Dynamic Treatment Regimes Why is this more than a standard reinforcement learning problem? Unknown causes of observations in system dynamics (violates POMDP assumptions) Large data sets in which actions are manipulated are unavailable. POMDP: Partially Observed Markov Decision Process (used in “medical decision making.”)

12 Advances Methods for Constructing Dynamic Treatment Regimes Q-Learning (Watkins, 1989) (a popular method from reinforcement learning) ---generalization of regression ---may be misleading when actions are not randomized or there are unknown causes

13 Advances Methods for Constructing Dynamic Treatment Regimes (Deal with some causal inference issues in large observational data sets) A-Learning (Murphy, 2003; Robins, 2004) ---regression on a mean zero space Weighting (Murphy, et al., 2002; Tsiatis & coauthors, 2004, 2006; Hernan et al. 2006) ---weighted mean

14 Advances Experimental Design (Very applied, rather primitive) Adaptive Trial Design (Thall & colleagues, 2000, 2002) Sequential, Multiple Assignment, Randomized Trials (Murphy & colleagues, 2005, 2006, 2007) General Trial Design Issues (Lavori & Dawson, 1998, 2003, 2004)

15 STAR*D No statistical expertise was available at the time the trial was designed. This trial is over, and one can apply for access to the data. One goal of the trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.

16

17 ExTENd Ongoing study at U. Pennsylvania. Goal is to learn how best to help alcohol-dependent individuals reduce alcohol consumption.

18 Oslin ExTENd (design diagram)
Random assignment:
–Early Trigger for Nonresponse
–Late Trigger for Nonresponse
In each arm, at 8 wks:
–Response: Random assignment to TDM + Naltrexone or Naltrexone
–Nonresponse: Random assignment to CBI + Naltrexone or CBI

19 Measures of Confidence
We would like measures of confidence for the following:
–Aid in dynamic treatment regime construction
  To assess if there is sufficient evidence that a particular observation (e.g. output of a biological test) should be part of the dynamic treatment regime.
  To assess if there is sufficient evidence that a group of actions leads to equivalent outcomes for a given observation.
–To compare the mean outcome of two estimated dynamic treatment regimes (both estimated using the same data).

20 Challenges Measures of Confidence –Measures of confidence are essential Need to know when a subset of actions are equivalent –that is, when there is no or little evidence that one of the actions leads to a better outcome. It is important to minimize the number of observations that must be collected in the future clinical setting. –Randomized Clinical Trials

21 Measures of Confidence Traditional methods for constructing measures of confidence require some form of differentiability (if frequentist properties are desired). Non-differentiable operations are used to construct dynamic treatment regimes. The mean of the outcome Y following use of a dynamic treatment regime is a non-differentiable function of the regime.

22 Example: Q-learning Generalization of regression to multistage decisions. Move backward through time as in dynamic programming. H_j is the history available at stage j. k = 2 stages in the following.

23 (k=2) Q-learning

24 Q-learning with Data Assume actions, A_j, are randomized. S_j is a summary of the information available at and prior to stage j. Binary actions in the following.

25 Approximate A Simple Version of Q-Learning –binary actions Stage 2 regression: Use least squares with outcome, Y, and covariates to obtain Set Stage 1 regression: Use least squares with outcome, and covariates to obtain
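The two regressions can be sketched as follows. The transcript does not preserve the covariates used on the slide, so this sketch assumes a linear working model with an intercept, the summary S_j, the action A_j, and their interaction, and it assumes actions coded {0, 1}. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_least_squares(X, y):
    """Ordinary least squares coefficients for y regressed on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def q_learning_two_stage(S1, A1, S2, A2, Y):
    """Two-stage Q-learning with binary actions coded {0, 1} (assumed coding)."""
    ones = np.ones(len(Y))
    # Stage 2 regression: outcome Y on (1, S2, A2, A2 * S2).
    X2 = np.column_stack([ones, S2, A2, A2 * S2])
    theta2 = fit_least_squares(X2, Y)
    # Pseudo-outcome: fitted stage 2 Q-function maximized over a2 in {0, 1}.
    main2 = theta2[0] + theta2[1] * S2
    effect2 = theta2[2] + theta2[3] * S2
    Y_tilde = main2 + np.maximum(effect2, 0.0)
    # Stage 1 regression: pseudo-outcome on (1, S1, A1, A1 * S1).
    X1 = np.column_stack([ones, S1, A1, A1 * S1])
    theta1 = fit_least_squares(X1, Y_tilde)
    return theta1, theta2


def decision_rule(theta, S):
    """Recommend action 1 when the fitted effect of the action is positive."""
    S = np.asarray(S, dtype=float)
    return (theta[2] + theta[3] * S > 0).astype(int)


# Synthetic randomized data (hypothetical generative model for illustration).
rng = np.random.default_rng(0)
n = 2000
S1 = rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
S2 = 0.5 * S1 + rng.normal(size=n)
A2 = rng.integers(0, 2, size=n)
Y = 1.0 + S2 + A2 * (0.5 - S2) + rng.normal(size=n)

theta1, theta2 = q_learning_two_stage(S1, A1, S2, A2, Y)
print("stage 2 coefficients:", theta2)
print("stage 2 recommendations at S2 = 0 and S2 = 1:", decision_rule(theta2, [0.0, 1.0]))
```

The `np.maximum` step is the non-differentiable operation that makes inference on the stage 1 coefficients non-regular.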

26 Decision Rules:

27 Inference is a non-regular problem

28 When do we have non-regularity?

29 Non-regularity (Bootstrap & Taylor series-based estimators of standard errors & Bayesian methods have poor frequentist properties......)

30 Simulation Example
Generative Model: Y = 1 + γ_1 A_1 + γ_2 A_2 + γ_3 A_1 A_2 + N(0, 1); A_1, A_2 coded {0, 1}; no S_j's
Parameter of interest: β_1 in E[max_{a_2} E[Y | A_1, A_2 = a_2]] = α_1 + β_1 A_1
β_1 = γ_1 + (γ_2 + γ_3)^+ - (γ_2)^+
simulated data sets
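As a check on the expression for β_1, the sketch below computes it in closed form and again by directly maximizing the conditional mean over a_2 for each value of A_1. The function names are hypothetical; the generative model is the one stated on the slide.

```python
import math


def positive_part(x: float) -> float:
    # (x)^+ = max(x, 0)
    return max(x, 0.0)


def beta1_closed_form(g1: float, g2: float, g3: float) -> float:
    # beta_1 = gamma_1 + (gamma_2 + gamma_3)^+ - (gamma_2)^+
    return g1 + positive_part(g2 + g3) - positive_part(g2)


def beta1_by_enumeration(g1: float, g2: float, g3: float) -> float:
    # E[Y | A1 = a1, A2 = a2] under the slide's generative model.
    def mean_y(a1: int, a2: int) -> float:
        return 1.0 + g1 * a1 + g2 * a2 + g3 * a1 * a2

    # Maximize over a2 for each a1; beta_1 is the difference between a1 = 1 and a1 = 0.
    best = {a1: max(mean_y(a1, 0), mean_y(a1, 1)) for a1 in (0, 1)}
    return best[1] - best[0]


for gammas in [(0.0, 0.0, 0.0), (0.5, -1.0, 2.0), (1.0, 0.3, -0.7)]:
    closed = beta1_closed_form(*gammas)
    brute = beta1_by_enumeration(*gammas)
    assert math.isclose(closed, brute, abs_tol=1e-12)
    print(gammas, closed)
```

The positive part has a kink at zero, so β_1 is not a differentiable function of the γ's when γ_2 = 0 or γ_2 + γ_3 = 0.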

31 Confidence Rate when γ_1 = γ_2 = γ_3 = 0, so β_1 = 0. Confidence rate should be .95.
[Table columns: Sample Size; Asymptotic Normality; Percentile Bootstrap; Bayesian. Table values not recoverable.]

32 A Challenge! The goal is to conduct inference on parameters in the dynamic treatment regime. I’ve worked on this problem for 3 years (!) and every solution I’ve formulated has unsatisfactory drawbacks. Can you produce a good solution; a solution that can be used in REAL LIFE to analyze clinical trial data?!

33 Measures of Confidence
We would like measures of confidence for the following:
–Aid in Policy Construction
  To assess if there is sufficient evidence that a particular observation (e.g. output of a biological test) should be part of the policy.
  To assess if there is sufficient evidence that a subset of the actions leads to better rewards for a given observation than the remaining actions.
–To compare the mean outcomes of two estimated policies (both estimated using the same data).

34 Single Stage (k=1) Find a prediction interval for the mean outcome if a particular estimated policy (here one decision rule) is employed. Action A is binary in {-1,1}. Suppose the decision rule is of the form sign(β^T O_1). We do not assume the Bayes decision boundary is linear.

35 Single Stage (k=1) Mean outcome following this policy is is the randomization probability
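The slide's formula is not preserved in the transcript. A common way to write the mean outcome of a policy using randomized data is the inverse-probability-weighted form E[1{A = d(O_1)} Y / p(A | O_1)], where p is the randomization probability. The sketch below assumes that form; the function name and the toy data are hypothetical.

```python
import numpy as np


def ipw_value(Y, A, recommended, p):
    """Weighted-mean estimate of the mean outcome under a decision rule.

    Y: observed outcomes
    A: randomized actions actually received, coded {-1, 1}
    recommended: actions the decision rule would have chosen
    p: randomization probability of the received action (scalar or array)
    """
    Y = np.asarray(Y, dtype=float)
    A = np.asarray(A)
    recommended = np.asarray(recommended)
    p = np.asarray(p, dtype=float)
    # Keep only subjects whose received action matches the rule, reweighted by 1 / p.
    follows_rule = (A == recommended).astype(float)
    return float(np.mean(follows_rule * Y / p))


# Toy data: four subjects randomized with probability 0.5 to each action.
Y = [1.0, 2.0, 3.0, 4.0]
A = [1, -1, 1, -1]
recommended = [1, 1, -1, -1]
print(ipw_value(Y, A, recommended, 0.5))
```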

36 Classification Misclassification rate for a given decision rule (classifier) where V is defined by (A is the {-1,1} classification; O_1 is the observation; β^T O_1 is a linear classification boundary)

37 Prediction Interval for Two problems V(β) is not necessarily smooth in β. We don’t know V so V must be estimated as well. Data set is small so overfitting is a problem.

38 Simulation Example Population: Ionosphere data from the UCI repository, 351 samples, O is composed of 9 covariates, A is binary Use least squares to form classification rule
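Forming a classification rule by least squares and computing its misclassification rate can be sketched as below. The Ionosphere data are not used here; the four-point data set and the function names are hypothetical, and the first column of O is an assumed intercept.

```python
import numpy as np


def fit_linear_classifier(O, A):
    """Least squares fit of the {-1, 1} labels A on the covariates O."""
    beta, *_ = np.linalg.lstsq(O, np.asarray(A, dtype=float), rcond=None)
    return beta


def classify(beta, O):
    """Linear rule: classify as 1 when beta^T O > 0, otherwise -1."""
    return np.where(O @ beta > 0, 1, -1)


def misclassification_rate(beta, O, A):
    """Fraction of cases where the rule disagrees with the label."""
    return float(np.mean(classify(beta, O) != np.asarray(A)))


# Toy data: intercept column plus one covariate.
O = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
A = np.array([-1, -1, 1, 1])

beta = fit_linear_classifier(O, A)
print("beta:", beta)
print("training misclassification rate:", misclassification_rate(beta, O, A))
```

The rate printed here is computed on the same data used to fit β, so it is optimistic for small data sets, which is the overfitting concern raised on the previous slide.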

39 “95% Prediction Intervals” Confidence rate should be .95.
[Table columns: Sample Size; Percentile Bootstrap; Adjusted Bootstrap; Naïve Binomial; Our Method. Table values not recoverable.]

40 Prediction Interval for Our method Obtains a prediction interval for a smooth upper bound on Our method is generally too conservative

41 A Challenge! Statistical methods for constructing the policy/classifier and providing an evaluation of the policy/classifier should use the same small data set. Can you make an advance?

42 Discussion These are real problems and the need for advances in statistical methods is great.

43 Discussion: Further Open Problems These are real problems and the need for advances in statistical methods is great. High level of interest in clinical medicine research. Developing methods for variable selection in decision making (in addition to variable selection for prediction)

44 Discussion: Further Open Problems Model selection when goal is constructing good policies. Feature Construction Methods for producing composite outcomes (Y) –High quality elicitation of functionality

45 This seminar can be found at: seminars/ICSPRAR01.08Plenary.ppt Email me with questions or if you would like a copy.

46 Studies under review
–H. Jones study of drug-addicted pregnant women (goal is to reduce cocaine/heroin use during pregnancy and thereby improve neonatal outcomes)
–J. Sacks study of parolees with substance abuse disorders (goal is to reduce recidivism and substance use)

47 Jones’ Study for Drug-Addicted Pregnant Women rRBT 2 wks Response rRBT tRBT Random assignment: rRBT Nonresponse tRBT Random assignment: aRBT 2 wks Response Random assignment: eRBT tRBT rRBT Nonresponse

48 Sacks’ Study of Adaptive Transitional Case Management Standard Services Standard TCM Random assignment: 4 wks Response Standard TCM Augmented TCM Standard TCM Nonresponse

49 Adaptive Treatment for ADHD Ongoing study at the State U. of NY at Buffalo (B. Pelham) Goal is to learn how best to help children with ADHD improve functioning at home and school.

50 ADHD Study
Random assignment:
A. Begin low-intensity behavior modification; 8 weeks; Assess: Adequate response?
  Yes: A1. Continue, reassess monthly; randomize if deteriorate
  No: Random assignment:
    A2. Add medication; bemod remains stable but medication dose may vary
    A3. Increase intensity of bemod with adaptive modifications based on impairment
B. Begin low dose medication; 8 weeks; Assess: Adequate response?
  Yes: B1. Continue, reassess monthly; randomize if deteriorate
  No: Random assignment:
    B2. Increase dose of medication with monthly changes as needed
    B3. Add behavioral treatment; medication dose remains stable but intensity of bemod may increase with adaptive modifications based on impairment

51 A class of “solutions”

52 Soft-max F is a distribution function (e.g. logistic) and λ is a tuning parameter. The choice of a data based tuning parameter λ is difficult.
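The slide's soft-max formula is not preserved in the transcript. One way such a smoothing can be written, assumed here for illustration only, replaces max(a, b) by a weighted average with weight F(λ(b - a)) on b, where F is the logistic distribution function. The function names are hypothetical.

```python
import math


def logistic_cdf(x: float) -> float:
    # Numerically stable logistic distribution function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_max(a: float, b: float, lam: float, F=logistic_cdf) -> float:
    # Assumed smoothing: weight w = F(lam * (b - a)) on b, 1 - w on a.
    # Large lam approaches the hard max; lam = 0 gives the plain average.
    w = F(lam * (b - a))
    return (1.0 - w) * a + w * b


for lam in (0.0, 1.0, 10.0, 1000.0):
    print(lam, soft_max(0.0, 1.0, lam))
```

Small λ gives a smooth but biased stand-in for the max, while large λ recovers the max along with its non-smoothness, which is one reason a data-based choice of λ is hard.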