Sequential, Multiple Assignment, Randomized Trials and Treatment Policies S.A. Murphy UAlberta, 09/28/12

Presentation transcript:

Sequential, Multiple Assignment, Randomized Trials and Treatment Policies S.A. Murphy UAlberta, 09/28/12

2 Outline Treatment Policies Data Sources Q-Learning Confidence Intervals

3 Treatment Policies are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. They operationalize sequential decisions in clinical practice. k stages for each individual. $X_j$: observations available at the $j$th stage. $A_j$: action at the $j$th stage (usually a treatment).

4 Example of a Treatment Policy Adaptive Drug Court Program for drug abusing offenders. Goal is to minimize recidivism and drug use. Marlowe et al. (2008, 2009, 2011)

5 Adaptive Drug Court Program

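6 k=2 Stages. The treatment policy is a sequence of two decision rules: $d_1(X_1)$, which selects the stage 1 action, and $d_2(X_1, A_1, X_2)$, which selects the stage 2 action. Goal: Use a data set of n trajectories, each of the form $(X_1, A_1, X_2, A_2, Y)$ (a trajectory per subject), to construct a treatment policy. The treatment policy should produce maximal average reward, that is, the largest mean of $Y$ when actions follow the policy.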

7 Why should a Machine Learning Researcher be interested in Treatment Policies? The dimensionality of the data available for constructing decision rules accumulates at an exponential rate with the stage. Need both feature construction as well as feature selection.

8 Outline Treatment Policies Data Sources Q-Learning Confidence Intervals

9 Experimental Data. Data from sequential, multiple assignment, randomized trials: n subjects, each yielding a trajectory. For 2 stages, the trajectory for each subject is of the form $(X_1, A_1, X_2, A_2, Y)$. (Exploration, no exploitation.) $A_j$ is a randomized treatment action with known randomization probability. Here the actions are binary, with $P[A_j = 1] = P[A_j = -1] = 0.5$.
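As a minimal sketch of what such trial data look like, the snippet below simulates two-stage trajectories with both actions randomized at probability 0.5. The function name `simulate_smart` and the outcome model are hypothetical, chosen only for illustration; they are not taken from any of the trials in this talk.

```python
import random

def simulate_smart(n, seed=0):
    """Simulate n two-stage trajectories (X1, A1, X2, A2, Y).

    The data-generating model is an illustrative assumption.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)                        # baseline observation
        a1 = rng.choice([-1, 1])                        # stage 1 action, P = 0.5 each
        x2 = 0.5 * x1 + 0.3 * a1 + rng.gauss(0.0, 1.0)  # intermediate observation
        a2 = rng.choice([-1, 1])                        # stage 2 action, P = 0.5 each
        y = x1 + a2 * (0.5 - x2) + rng.gauss(0.0, 1.0)  # terminal reward
        data.append((x1, a1, x2, a2, y))
    return data

trajectories = simulate_smart(1000)
share_a1_plus = sum(1 for t in trajectories if t[1] == 1) / len(trajectories)
```

Because the randomization probabilities are known by design, every action is explored regardless of the observations that precede it.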

10 Pelham’s ADHD Study
Random assignment:
A. Begin low-intensity behavior modification. 8 weeks. Assess: adequate response?
–Yes: A1. Continue, reassess monthly; randomize if deteriorate
–No: Random assignment: A2. Augment with other treatment, or A3. Increase intensity of present treatment
B. Begin low dose medication. 8 weeks. Assess: adequate response?
–Yes: B1. Continue, reassess monthly; randomize if deteriorate
–No: Random assignment: B2. Increase intensity of present treatment, or B3. Augment with other treatment

11 Oslin’s ExTENd Study
Random assignment: Early Trigger for Nonresponse, or Late Trigger for Nonresponse
In each arm: Naltrexone, 8 wks
–Response: Random assignment: TDM + Naltrexone, or Naltrexone
–Nonresponse: Random assignment: CBI, or CBI + Naltrexone

12 Jones’ Study for Drug-Addicted Pregnant Women
[Design diagram: random assignment at baseline and again after a 2-week assessment of response or nonresponse; the treatment options shown are rRBT, tRBT, aRBT and eRBT.]

13 Kasari Autism Study
Random assignment:
A. JAE+EMT. 12 weeks. Assess: adequate response?
–Yes: JAE+EMT
–No: Random assignment: JAE+EMT+++, or JAE+AAC
B. JAE+AAC. 12 weeks. Assess: adequate response?
–Yes: B1. JAE+AAC
–No: B2. JAE+AAC++

14 Newer Experimental Designs
Using smartphones to collect data, the $X_i$’s, in real time and to provide treatments, the $A_i$’s, in real time to n subjects. The treatments, the $A_i$’s, are randomized among a feasible set of treatment options.
–The number of treatment stages is very large, so we want a Markovian property
–Feature construction of states in the Markov process

15 Observational Data: longitudinal studies, patient registries, electronic medical record data.

16 Outline Treatment Policies Data Sources Q-Learning/ Fitted Q-Iteration Confidence Intervals

17 Secondary Data Analysis: Q-Learning. Q-Learning, Fitted Q-Iteration, Approximate Dynamic Programming (Watkins, 1989; Ernst et al., 2005; Murphy, 2003; Robins, 2004). This results in a proposal for an optimal treatment policy. A subsequent randomized trial would evaluate the proposed treatment policy.

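18 Goal: Use data to construct decision rules $d_1(X_1)$, $d_2(X_1, A_1, X_2)$ for which the average value, $E^{d_1, d_2}[Y]$, is maximal. 2 Stages, Terminal Reward $Y$. The maximal average value is $V^{opt} = \max_{d_1, d_2} E^{d_1, d_2}[Y]$.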

19 Idea behind Q-Learning/Fitted Q
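A sketch of the standard backward-induction definitions for the two-stage case with terminal reward $Y$ and trajectory $(X_1, A_1, X_2, A_2, Y)$. The symbols $Q_1$, $Q_2$ and $V^{opt}$ (the maximal average value) are assumed notation here, written in the usual textbook form rather than copied from the slide.

```latex
\begin{align*}
Q_2(X_1, A_1, X_2, A_2) &= E\big[\, Y \mid X_1, A_1, X_2, A_2 \,\big] \\
Q_1(X_1, A_1) &= E\big[\, \max_{a_2} Q_2(X_1, A_1, X_2, a_2) \mid X_1, A_1 \,\big] \\
V^{opt} &= E\big[\, \max_{a_1} Q_1(X_1, a_1) \,\big]
\end{align*}
```

The stage 2 Q-function is an ordinary conditional mean, while the stage 1 Q-function already assumes that the best stage 2 action will be taken, which is why estimation proceeds from the last stage backwards.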

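20 Optimal Treatment Policy. The optimal treatment policy is $(d_1, d_2)$, where $d_2(X_1, A_1, X_2) = \arg\max_{a_2} Q_2(X_1, A_1, X_2, a_2)$ and $d_1(X_1) = \arg\max_{a_1} Q_1(X_1, a_1)$, with $Q_1$ and $Q_2$ denoting the stage 1 and stage 2 Q-functions.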

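21 Use regression at each stage to approximate the Q-function. Simple Version of Fitted Q-iteration – Stage 2 regression: Regress $Y$ on $(X_1, A_1, X_2, A_2)$ to obtain the fitted stage 2 Q-function $\hat Q_2(X_1, A_1, X_2, A_2)$. Arg-max over $a_2$ yields the estimated stage 2 decision rule, $\hat d_2(X_1, A_1, X_2) = \arg\max_{a_2} \hat Q_2(X_1, A_1, X_2, a_2)$.

22 Value for subjects entering stage 2: $\tilde Y = \max_{a_2} \hat Q_2(X_1, A_1, X_2, a_2)$. $\tilde Y$ is a predictor of $\max_{a_2} Q_2(X_1, A_1, X_2, a_2)$. $\tilde Y$ is the dependent variable in the stage 1 regression for patients who moved to stage 2.

23 Simple Version of Fitted Q-iteration – Stage 1 regression: Regress $\tilde Y$ on $(X_1, A_1)$ to obtain the fitted stage 1 Q-function $\hat Q_1(X_1, A_1)$. Arg-max over $a_1$ yields the estimated stage 1 decision rule, $\hat d_1(X_1) = \arg\max_{a_1} \hat Q_1(X_1, a_1)$.

24 Decision Rules: $\hat d_1(X_1) = \arg\max_{a_1} \hat Q_1(X_1, a_1)$ and $\hat d_2(X_1, A_1, X_2) = \arg\max_{a_2} \hat Q_2(X_1, A_1, X_2, a_2)$.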
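The two regressions and the resulting decision rules can be sketched in a few lines of plain Python. Everything specific here is an assumption made for illustration: the simulated data, the linear working models (stage 2 features `1, X1, A1, X2, A2, A2*X2`; stage 1 features `1, X1, A1, A1*X1`), and the helper names `ols`, `simulate` and `q_learning`. It is not the analysis used for the ADHD data.

```python
import random

def ols(rows, targets):
    """Least-squares coefficients from the normal equations (Gauss-Jordan elimination)."""
    p = len(rows[0])
    # Augmented system [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(m[k][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for k in range(p):
            if k != c:
                f = m[k][c] / m[c][c]
                m[k] = [u - f * v for u, v in zip(m[k], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]

def simulate(n, seed=1):
    """Hypothetical two-stage trial; both actions randomized with probability 0.5."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        a1 = rng.choice([-1, 1])
        x2 = 0.5 * x1 + 0.3 * a1 + rng.gauss(0.0, 1.0)
        a2 = rng.choice([-1, 1])
        y = x1 + a1 + a2 * (0.5 - x2) + rng.gauss(0.0, 0.5)
        data.append((x1, a1, x2, a2, y))
    return data

def q_learning(data):
    # Stage 2 regression: Y on (1, X1, A1, X2, A2, A2*X2).
    f2 = [[1.0, x1, a1, x2, a2, a2 * x2] for x1, a1, x2, a2, _ in data]
    beta = ols(f2, [row[4] for row in data])
    # Pseudo-outcome: fitted stage 2 Q-function maximized over a2 in {-1, 1}.
    y_tilde = [beta[0] + beta[1] * x1 + beta[2] * a1 + beta[3] * x2
               + abs(beta[4] + beta[5] * x2)
               for x1, a1, x2, _, _ in data]
    # Stage 1 regression: pseudo-outcome on (1, X1, A1, A1*X1).
    f1 = [[1.0, x1, a1, a1 * x1] for x1, a1, _, _, _ in data]
    alpha = ols(f1, y_tilde)

    def d2(x2):
        """Estimated stage 2 rule: sign of the fitted stage 2 treatment effect."""
        return 1 if beta[4] + beta[5] * x2 > 0 else -1

    def d1(x1):
        """Estimated stage 1 rule: sign of the fitted stage 1 treatment effect."""
        return 1 if alpha[2] + alpha[3] * x1 > 0 else -1

    return beta, alpha, d1, d2

beta, alpha, d1, d2 = q_learning(simulate(2000))
```

With actions coded as -1 and 1 and a Q-function that is linear in the action, the arg-max is simply the sign of the fitted treatment effect, and the maximized value contains an absolute value of that effect.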

25 Pelham’s ADHD Study
Random assignment:
A. Begin low-intensity behavior modification. 8 weeks. Assess: adequate response?
–Yes: A1. Continue, reassess monthly; randomize if deteriorate
–No: Random assignment: A2. Augment with other treatment, or A3. Increase intensity of present treatment
B. Begin low dose medication. 8 weeks. Assess: adequate response?
–Yes: B1. Continue, reassess monthly; randomize if deteriorate
–No: Random assignment: B2. Increase intensity of present treatment, or B3. Augment with other treatment

26 ADHD: trajectories of the form $(X_1, A_1, R_1, X_2, A_2, Y)$
$X_1$ includes baseline school performance ($Y_0$), whether medicated in prior year ($S_1$), and ODD ($O_1$)
–$S_1 = 1$ if medicated in prior year; $= 0$ otherwise
$R_1 = 1$ if responder; $= 0$ if non-responder
$X_2$ includes the month of non-response ($M_2$) and a measure of adherence in stage 1 ($S_2$)
–$S_2 = 1$ if adherent in stage 1; $= 0$ if non-adherent
$Y$ = end-of-year school performance

27 Q-Learning using data on children with ADHD. Stage 2 regression for $Y$. Estimated decision rule is “if child is non-responding then intensify initial treatment if the estimated treatment effect is positive, otherwise augment”.

28 Q-Learning using data on children with ADHD. Decision rule is “if child is non-responding then intensify initial treatment if the estimated treatment effect is positive, otherwise augment”.
Decision Rule for Non-responding Children (Initial Treatment = BMOD or Initial Treatment = MED)
Adherent: Intensify
Not Adherent: Augment

29 ADHD Example. Stage 1 regression for the estimated stage 2 value. Decision rule is “Begin with BMOD if the estimated treatment effect is positive, otherwise begin with MED”.

30 Q-Learning using data on children with ADHD. Decision rule is “Begin with BMOD if the estimated treatment effect is positive, otherwise begin with MED”.
Initial Decision Rule (Initial Treatment)
Prior MEDS: MEDS
No Prior MEDS: BMOD

31 ADHD Example. The treatment policy is quite decisive. We developed this treatment policy using a trial on only 138 children. Is there sufficient evidence in the data to warrant this level of decisiveness? Would a similar trial obtain similar results? There are strong opinions regarding how to treat ADHD. One solution: use confidence intervals.

32 Outline Treatment Policies Data Sources Q-Learning Confidence Intervals

33 ADHD Example. Treatment Decision for Non-responders. Positive Treatment Effect → Intensify
90% Confidence Interval
Adherent to BMOD: (-0.08, 0.69)
Adherent to MED: (-0.18, 0.62)
Non-adherent to BMOD: (-1.10, -0.28)
Non-adherent to MED: (-1.25, -0.29)

34 ADHD Example. Initial Treatment Decision. Positive Treatment Effect → BMOD
90% Confidence Interval
Prior MEDS: (-0.48, 0.16)
No Prior MEDS: (-0.05, 0.39)

35 Proposal for Treatment Policy
IF medication was not used in the prior year THEN begin with BMOD; ELSE select either BMOD or MED.
IF the child is nonresponsive THEN
–IF child was non-adherent, THEN augment present treatment;
–ELSE IF child was adherent, THEN select either intensification or augmentation of current treatment.
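The proposed policy translates directly into two small functions that return the set of admissible options at each stage. The function names and the string labels are hypothetical; the logic follows the IF/THEN statement above, and nothing is assumed about children who respond, since the proposal does not cover them.

```python
def initial_options(prior_meds):
    """Stage 1: admissible initial treatments given prior-year medication use."""
    return {"BMOD", "MED"} if prior_meds else {"BMOD"}

def nonresponder_options(adherent):
    """Stage 2: admissible actions for a non-responding child."""
    return {"INTENSIFY", "AUGMENT"} if adherent else {"AUGMENT"}

# Example: no medication in the prior year, later non-responsive and non-adherent.
first = initial_options(prior_meds=False)
second = nonresponder_options(adherent=False)
```

Returning a set rather than a single action reflects the cases where the proposal deliberately leaves the choice open.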

36 Confidence Intervals. Constructing confidence intervals concerning treatment effects at stage 2 and stage 1. Stage 2 is classical regression (at least if the set of regressors is low dimensional); constructing confidence intervals is standard. Constructing confidence intervals for the treatment effects at stage 1 is challenging.
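For the standard stage 2 case, a simplified sketch: within a single subgroup of non-responders, compare mean outcomes between the two randomized actions and form a 90% interval. This uses a difference in means with a normal approximation, which is an assumption made for brevity rather than the regression-based intervals reported in this talk; the outcome values and the name `effect_ci` are made up.

```python
import math
from statistics import NormalDist, mean, variance

def effect_ci(y_plus, y_minus, level=0.90):
    """Normal-approximation interval for mean(y_plus) - mean(y_minus)."""
    diff = mean(y_plus) - mean(y_minus)
    se = math.sqrt(variance(y_plus) / len(y_plus) + variance(y_minus) / len(y_minus))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff - z * se, diff + z * se

# Hypothetical outcomes for non-responders randomized to intensify or to augment.
lo, hi = effect_ci([1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.0, 3.0, 4.0])

# An interval that covers zero leaves the choice open.
decision = "intensify" if lo > 0 else "augment" if hi < 0 else "either"
```

Reading the interval this way mirrors the tables above: a clearly negative interval points to augmenting, while an interval that covers zero supports either option.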

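37 Confidence Intervals for Stage 1 Treatment Effects. Challenge: the stage 2 estimated value, $\tilde Y = \max_{a_2} \hat Q_2(X_1, A_1, X_2, a_2)$ with $\hat Q_2$ the fitted stage 2 Q-function, is non-smooth in the estimators from the stage 2 regression, due to the non-differentiability of the maximization.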
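A small Monte Carlo sketch of why the maximization causes trouble. With actions coded -1 and 1 and a Q-function linear in the action, the maximum over the action involves the absolute value of the estimated treatment effect, which is not differentiable at zero. The setup is an illustrative assumption: a single scalar effect `theta` whose estimator is exactly normal with variance 1/n.

```python
import math
import random

def scaled_error_mean(theta, n=400, reps=20000, seed=0):
    """Monte Carlo mean of sqrt(n) * (|theta_hat| - |theta|), theta_hat ~ N(theta, 1/n)."""
    rng = random.Random(seed)
    root_n = math.sqrt(n)
    total = 0.0
    for _ in range(reps):
        theta_hat = rng.gauss(theta, 1.0 / root_n)
        total += root_n * (abs(theta_hat) - abs(theta))
    return total / reps

at_zero = scaled_error_mean(0.0)  # limit is half-normal, with mean sqrt(2/pi)
away = scaled_error_mean(1.0)     # limit is centered normal, with mean 0
```

The limiting distribution of the scaled error changes shape depending on whether the true effect is zero, so an approximation that is accurate away from zero is biased at zero.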

38 Non-regularity. The estimated policy can change abruptly from training set to training set. Standard approximations used to construct confidence intervals perform poorly (Shao, 1994; Andrews, 2000). The problematic area in parameter space is around parameter values for which the stage 2 treatment effect is zero for a non-negligible fraction of subjects. We combined a local generalization-type error bound with a standard statistical confidence interval to produce a valid confidence interval.

39 Why is this non-smoothness, and the resulting inferential problems, relevant to high dimensional machine learning research? Sparsity assumptions in high dimensional data analysis → thresholding → nonsmoothness at important parameter values.

40 Where are we going? Increasing use of wearable computers (e.g., smartphones) to both collect real time data and provide real time treatment. We are working on clinical trial designs involving randomization (soft-max or epsilon-greedy choice of actions) so as to develop and continually improve treatment policies. Need confidence measures for infinite horizon problems.
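The two randomization schemes named above can be sketched as functions that turn current action-value estimates into randomization probabilities. The function names, the example values and the `temperature` parameter are illustrative assumptions, not part of any trial design described here.

```python
import math
import random

def epsilon_greedy_probs(q_values, epsilon=0.1):
    """The best action gets 1 - epsilon plus its share of the uniform epsilon mass."""
    k = len(q_values)
    best = max(range(k), key=lambda i: q_values[i])
    return [epsilon / k + (1.0 - epsilon if i == best else 0.0) for i in range(k)]

def softmax_probs(q_values, temperature=1.0):
    """Probabilities proportional to exp(q / temperature)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def draw_action(probs, rng):
    """Randomize an action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

eg = epsilon_greedy_probs([0.2, 0.5], epsilon=0.1)
sm = softmax_probs([0.0, 0.0])
action = draw_action(eg, random.Random(0))
```

Both schemes keep every action's probability strictly positive and known, so the collected data remain usable for later policy estimation.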

41 This seminar can be found at: seminars/UAlberta pdf This seminar is based on work with many collaborators, some of whom are: L. Collins, E. Laber, M. Qian, D. Almirall, K. Lynch, J. McKay, D. Oslin, T. Ten Have, I. Nahum-Shani & B. Pelham. with questions or if you would like a copy: