11 Confidence Intervals, Q-Learning and Dynamic Treatment Regimes S.A. Murphy Time for Causality – Bristol April, 2012 TexPoint fonts used in EMF. Read.

Slides:



Advertisements
Similar presentations
Assessing the Effects of Time-varying Predictors or Treatments: A Conceptual Discussion Daniel Almirall VA Medical Center, HSRD Duke Medical Center, Dept.
Advertisements

Piloting and Sizing Sequential Multiple Assignment Randomized Trials in Dynamic Treatment Regime Development 2012 Atlantic Causal Inference Conference.
Treatment Effect Heterogeneity & Dynamic Treatment Regime Development S.A. Murphy.
Experimental Trials. Oslin ExTENd Late Trigger for Nonresponse 8 wks Response TDM + Naltrexone CBI Random assignment: CBI +Naltrexone Nonresponse Early.
Using Clinical Trial Data to Construct Policies for Guiding Clinical Decision Making S. Murphy & J. Pineau American Control Conference Special Session.
Experimenting to Improve Clinical Practice S.A. Murphy AAAS, 02/15/13 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.:
1 Dynamic Treatment Regimes Advances and Open Problems S.A. Murphy ICSPRAR-2008.
Methodology for Adaptive Treatment Strategies for Chronic Disorders: Focus on Pain S.A. Murphy NIH Pain Consortium 5 th Annual Symposium on Advances in.
Experiments and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan JSM: August, 2005.
SMART Designs for Constructing Adaptive Treatment Strategies S.A. Murphy 15th Annual Duke Nicotine Research Conference September, 2009.
Dynamic Treatment Regimes, STAR*D & Voting D. Lizotte, E. Laber & S. Murphy LSU ---- Geaux Tigers! April 2009.
Substance Abuse, Multi-Stage Decisions, Generalization Error How are they connected?! S.A. Murphy Univ. of Michigan CMU, Nov., 2004.
An Experimental Paradigm for Developing Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan March, 2004.
Constructing Dynamic Treatment Regimes & STAR*D S.A. Murphy ICSA June 2008.
Screening Experiments for Developing Dynamic Treatment Regimes S.A. Murphy At ICSPRAR January, 2008.
1 Dynamic Treatment Regimens S.A. Murphy PolMeth XXV July 10, 2008.
SMART Designs for Developing Adaptive Treatment Strategies S.A. Murphy K. Lynch, J. McKay, D. Oslin & T.Ten Have CPDD June, 2005.
Dynamic Treatment Regimes: Challenges in Data Analysis S.A. Murphy Survey Research Center January, 2009.
Q-Learning and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan IMS/Bernoulli: July, 2004.
1 A Prediction Interval for the Misclassification Rate E.B. Laber & S.A. Murphy.
Sizing a Trial for the Development of Adaptive Treatment Strategies Alena I. Oetting The Society for Clinical Trials, 29th Annual Meeting St. Louis, MO.
Screening Experiments for Dynamic Treatment Regimes S.A. Murphy At ENAR March, 2008.
Experiments and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan Florida: January, 2006.
SMART Experimental Designs for Developing Adaptive Treatment Strategies S.A. Murphy NIDA DESPR February, 2007.
Hypothesis Testing and Dynamic Treatment Regimes S.A. Murphy Schering-Plough Workshop May 2007 TexPoint fonts used in EMF. Read the TexPoint manual before.
An Experimental Paradigm for Developing Adaptive Treatment Strategies S.A. Murphy Univ. of Michigan UNC: November, 2003.
1 A Confidence Interval for the Misclassification Rate S.A. Murphy & E.B. Laber.
Experiments and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan PSU, October, 2005 In Honor of Clifford C. Clogg.
Planning Survival Analysis Studies of Dynamic Treatment Regimes Z. Li & S.A. Murphy UNC October, 2009.
Statistical Issues in Developing Adaptive Treatment Strategies for Chronic Disorders S.A. Murphy Univ. of Michigan CDC/ATSDR: March, 2005.
SMART Experimental Designs for Developing Adaptive Treatment Strategies S.A. Murphy RWJ Clinical Scholars Program, UMich April, 2007.
Hypothesis Testing and Dynamic Treatment Regimes S.A. Murphy, L. Gunter & B. Chakraborty ENAR March 2007.
1 SMART Designs for Developing Adaptive Treatment Strategies S.A. Murphy K. Lynch, J. McKay, D. Oslin & T.Ten Have UMichSpline February, 2006.
Dynamic Treatment Regimes, STAR*D & Voting D. Lizotte, E. Laber & S. Murphy ENAR March 2009.
A Finite Sample Upper Bound on the Generalization Error for Q-Learning S.A. Murphy Univ. of Michigan CALD: February, 2005.
Methodology for Adaptive Treatment Strategies R21 DA S.A. Murphy For MCATS Oct. 8, 2009.
An Experimental Paradigm for Developing Adaptive Treatment Strategies S.A. Murphy Univ. of Michigan ACSIR, July, 2003.
Dynamic Treatment Regimes, STAR*D & Voting D. Lizotte, E. Laber & S. Murphy Psychiatric Biostatistics Symposium May 2009.
An Experimental Paradigm for Developing Adaptive Treatment Strategies S.A. Murphy Univ. of Michigan February, 2004.
Experiments and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan Yale: November, 2005.
Methods for Estimating the Decision Rules in Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan IBC/ASC: July, 2004.
1 Possible Roles for Reinforcement Learning in Clinical Research S.A. Murphy November 14, 2007.
Experiments and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan April, 2006.
SMART Designs for Developing Dynamic Treatment Regimes S.A. Murphy MD Anderson December 2006.
Exploratory Analyses Aimed at Generating Proposals for Individualizing and Adapting Treatment S.A. Murphy BPRU, Hopkins September 22, 2009.
SMART Experimental Designs for Developing Adaptive Treatment Strategies S.A. Murphy ISCTM, 2007.
1 A Prediction Interval for the Misclassification Rate E.B. Laber & S.A. Murphy.
Experiments and Adaptive Treatment Strategies S.A. Murphy Univ. of Michigan Chicago: May, 2005.
Susan Murphy, PI University of Michigan Acknowledgements: MCAT network and NIH The Goal To facilitate methodological collaborations necessary for producing.
1 Dynamic Treatment Regimes: Interventions for Chronic Conditions (such as Poverty or Criminality?) S.A. Murphy Univ. of Michigan In Honor of Clifford.
SMART Designs for Developing Dynamic Treatment Regimes S.A. Murphy Symposium on Causal Inference Johns Hopkins, January, 2006.
Experiments and Dynamic Treatment Regimes S.A. Murphy At NIAID, BRB December, 2007.
1 Machine/Reinforcement Learning in Clinical Research S.A. Murphy May 19, 2008.
Adaptive Treatment Strategies S.A. Murphy CCNIA Proposal Meeting 2008.
Adaptive Treatment Strategies S.A. Murphy Workshop on Adaptive Treatment Strategies Convergence, 2008.
Practical Application of Adaptive Treatment Strategies in Trial Design and Analysis S.A. Murphy Center for Clinical Trials Network Classroom Series April.
Experiments and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan January, 2006.
Hypothesis Testing and Adaptive Treatment Strategies S.A. Murphy SCT May 2007.
Adaptive Treatment Design and Analysis S.A. Murphy TRC, UPenn April, 2007.
Adaptive Treatment Strategies: Challenges in Data Analysis S.A. Murphy NY State Psychiatric Institute February, 2009.
Sequential, Multiple Assignment, Randomized Trials and Treatment Policies S.A. Murphy UAlberta, 09/28/12 TexPoint fonts used in EMF. Read the TexPoint.
Overview of Adaptive Treatment Regimes Sachiko Miyahara Dr. Abdus Wahed.
SMART Case Studies Module 3—Day 1 Getting SMART About Developing Individualized Adaptive Health Interventions Methods Work, Chicago, Illinois, June
Sequential, Multiple Assignment, Randomized Trials and Treatment Policies S.A. Murphy MUCMD, 08/10/12 TexPoint fonts used in EMF. Read the TexPoint manual.
1 CS546: Machine Learning and Natural Language Discriminative vs Generative Classifiers This lecture is based on (Ng & Jordan, 02) paper and some slides.
1 SMART Designs for Developing Adaptive Treatment Strategies S.A. Murphy K. Lynch, J. McKay, D. Oslin & T.Ten Have NDRI April, 2006.
Motivation Using SMART research designs to improve individualized treatments Alena Scott 1, Janet Levy 3, and Susan Murphy 1,2 Institute for Social Research.
An Experimental Paradigm for Developing Adaptive Treatment Strategies S.A. Murphy NIDA Meeting on Treatment and Recovery Processes January, 2004.
SMART Trials for Developing Adaptive Treatment Strategies S.A. Murphy Workshop on Adaptive Treatment Designs NCDEU, 2006.
Secondary Aims Using Data Arising from a SMART Module 6—Day 2 Getting SMART About Developing Individualized Adaptive Health Interventions Methods Work,
Presentation transcript:

11 Confidence Intervals, Q-Learning and Dynamic Treatment Regimes S.A. Murphy Time for Causality – Bristol April, 2012 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAAA

2 Outline Dynamic Treatment Regimes Example Experimental Designs & Challenges Q-Learning & Challenges

3 Dynamic treatment regimes are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. Operationalize clinical practice. k Stages for one individual Observation available at j th stage Action at j th stage (usually a treatment)

4 Example of a Dynamic Treatment Regime Adaptive Drug Court Program for drug abusing offenders. Goal is to minimize recidivism and drug use. Marlowe et al. (2008, 2011)

5 Adaptive Drug Court Program

6 Goal : Construct decision rules that input information available at each stage and output a recommended decision; these decision rules should lead to a maximal mean Y where Y is a function of The dynamic treatment regime is a sequence of two decision rules: k=2 Stages

7 Outline Dynamic Treatment Regimes Example Experimental Designs & Challenges Q-Learning & Challenges

8 Data for Constructing the Dynamic Treatment Regime: Subject data from sequential, multiple assignment, randomized trials. At each stage subjects are randomized among alternative options. A j is a randomized action with known randomization probability. binary actions with P[A j =1]=P[A j =-1]=.5

9 Pelham’s ADHD Study B. Begin low dose medication 8 weeks Assess- Adequate response? B1. Continue, reassess monthly; randomize if deteriorate B2. Increase dose of medication with monthly changes as needed Random assignment: B3. Add behavioral treatment; medication dose remains stable but intensity of bemod may increase with adaptive modifications based on impairment No A. Begin low-intensity behavior modification 8 weeks Assess- Adequate response? A1. Continue, reassess monthly; randomize if deteriorate A2. Add medication; bemod remains stable but medication dose may vary Random assignment: A3. Increase intensity of bemod with adaptive modifi- cations based on impairment Yes No Random assignment:

10 Oslin’s ExTENd Study Late Trigger for Nonresponse 8 wks Response TDM + Naltrexone CBI Random assignment: CBI +Naltrexone Nonresponse Early Trigger for Nonresponse Random assignment: Naltrexone 8 wks Response Random assignment: CBI +Naltrexone CBI TDM + Naltrexone Naltrexone Nonresponse

11 Kasari Autism Study B. JAE + AAC 12 weeks Assess- Adequate response? B!. JAE+AAC B2. JAE +AAC ++ No A. JAE+ EMT 12 weeks Assess- Adequate response? JAE+EMT JAE+EMT+++ Random assignment: JAE+AAC Yes No Random assignment: Yes

Jones’ Study for Drug-Addicted Pregnant Women rRBT 2 wks Response rRBT tRBT Random assignment: rRBT Nonresponse tRBT Random assignment: aRBT 2 wks Response Random assignment: eRBT tRBT rRBT Nonresponse

13 Challenges Goal of trial may differ –Experimental designs for settings involving new drugs (cancer, ulcerative colitis) –Experimental designs for settings involving already approved drugs/treatments Choice of Primary Hypothesis/Analysis & Secondary Hypothesis/Analysis Longitudinal/Survival Primary Outcomes Sample Size Formulae

14 Outline Dynamic Treatment Regimes Example Experimental Designs & Challenges Q-Learning & Challenges

15 Secondary Analysis: Q-Learning Q-Learning (Watkins, 1989; Ernst et al., 2005; Murphy, 2005) (a popular method from computer science) Optimal nested structural mean model (Murphy, 2003; Robins, 2004) The first method is an inefficient version of the second method when (a) linear models are used, (b) each stages’ covariates include the prior stages’ covariates and (c) the treatment variables are coded to have conditional mean zero.

16 Goal : Construct for which is maximal. k=2 Stages is called the value and the maximal value is denoted by

17 Idea behind Q-Learning

18 There is a regression for each stage. Simple Version of Q-Learning – Stage 2 regression: Regress Y on to obtain Stage 1 regression: Regress on to obtain

19 for subjects entering stage 2: is a predictor of is the predicted end of stage 2 response when the stage 2 treatment is equal to the “best” treatment. is the dependent variable in the stage 1 regression for patients moving to stage 2

20 A Simple Version of Q-Learning – Stage 2 regression, (using Y as dependent variable) yields Arg-max over a 2 yields

21 A Simple Version of Q-Learning – Stage 1 regression, (using as dependent variable) yields Arg-max over a 1 yields

22 Confidence Intervals

23 Non-regularity Limiting distribution of is non-regular (Robins, 2004) Problematic area in parameter space is around for which Standard asymptotic approaches invalid without modification (see Shao, 1994; Andrews, 2000).

24 Non-regularity – Problematic term in is where This term is well-behaved if is bounded away from zero. We want to form an adaptive confidence interval.

25 Idea from Econometrics In nonregular settings in Econometrics there are a fixed number of easily identified “bad” parameter values at which you have nonregular behavior of the estimator Use a pretest (e.g. an hypothesis test) to test if you are near a “bad” parameter value; if the pretest rejects, use standard critical values to form confidence interval; if the pretest accepts, use the maximal critical value over all possible local alternatives. (Andrews & Soares, 2007; Andrews & Guggenberger, 2009)

Construct smooth upper and lower bounds so that for all n The upper/lower bounds use a pretest: Embed in the formula for a pretest of based on 26 Our Approach

27 Non-regularity The upper bound adds to :

28 Confidence Interval Let and be the bootstrap analogues of and is the probability with respect to the bootstrap weights. is the quantile of ; is the quantile of. Theorem: Assume moment conditions, invertible var- covariance matrices and, then

29 Adaptation Theorem: Assuming finite moment conditions and invertible var-covariance matrices are invertible,, then each converge, in distribution, to the same limiting distribution (the last two in probability).

30 Some Competing Methods Soft-thresholding (ST) Chakraborty et al. (2009) Centered percentile bootstrap (CPB) Plug-in pretesting estimator (PPE) Generative Models Nonregular (NR): Nearly Nonregular (NNR): Regular (R): n=150, 1000 Monte Carlo Reps, 1000 Bootstrap Samples Empirical Study

31 Example –Two Stages, two treatments per stage TypeCPBSTPPEACI NR.93*.95 (.34).93*.99(.50) R.94(.45).92*.93*.95(.48) NR.93*.76*.90*.96(.49) NNR.93*.76*.90*.97(.49) Size(width)

32 Example –Two Stages, three treatments in stage 2 TypeCPBPPEACI NR.93* 1.0(.72) R.94(.56).92*.96(.63) NR.89*.86*.97(.67) NNR.90*.86*.97(.67) Size(width)

33 Pelham’s ADHD Study B. Begin low dose medication 8 weeks Assess- Adequate response? B1. Continue, reassess monthly; randomize if deteriorate B2. Increase dose of medication with monthly changes as needed Random assignment: B3. Add behavioral treatment; medication dose remains stable but intensity of bemod may increase with adaptive modifications based on impairment No A. Begin low-intensity behavior modification 8 weeks Assess- Adequate response? A1. Continue, reassess monthly; randomize if deteriorate A2. Add medication; bemod remains stable but medication dose may vary Random assignment: A3. Increase intensity of bemod with adaptive modifi- cations based on impairment Yes No Random assignment:

34 (X 1, A 1, R 1, X 2, A 2, Y) –Y = end of year school performance –R 1 =1 if responder; =0 if non-responder –X 2 includes the month of non-response, M 2, and a measure of adherence in stage 1 (S 2 ) –S 2 =1 if adherent in stage 1; =0, if non-adherent –X 1 includes baseline school performance, Y 0, whether medicated in prior year (S 1 ), ODD (O 1 ) –S 1 =1 if medicated in prior year; =0, otherwise. ADHD Example

35 Stage 2 regression for Y: Stage 1 outcome: ADHD Example

36 Stage 1 regression for Interesting stage 1 contrast: is it important to know whether the child was medicated in the prior year (S 1 =1) to determine the best initial treatment in the sequence? ADHD Example

37 Stage 1 treatment effect when S 1 =1: Stage 1 treatment effect when S 1 =0: ADHD Example 90% ACI (-0.48, 0.16) (-0.05, 0.39)

38 IF medication was not used in the prior year THEN begin with BMOD; ELSE select either BMOD or MED. IF the child is nonresponsive and was non- adherent, THEN augment present treatment; ELSE IF the child is nonresponsive and was adherent, THEN select intensification of current treatment. Dynamic Treatment Regime Proposal

39 There are multiple ways to form ; what are the pros and cons? Improve adaptation by a pretest of High dimensional data; investigators want to collect real time data Feature construction & Feature selection Many stages or infinite horizon Challenges

40 This seminar can be found at: seminars/Bristol ppt Eric Laber or me for questions: or