Published by Whitney Cooper. Modified over 9 years ago.
1
Sequential, Multiple Assignment, Randomized Trials and Treatment Policies
S.A. Murphy
MUCMD, 08/10/12
2
Outline
- Treatment Policies
- Sequential Multiple Assignment Randomized Trials ("SMART Studies")
- Q-Learning / Fitted Q Iteration
- Where we are going
3
Treatment Policies are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. They operationalize sequential decisions in clinical practice.
- $k$ stages for each individual
- $X_j$: observation available at the $j$th stage
- $A_j$: action at the $j$th stage (usually a treatment)
4
Example of a Treatment Policy: Adaptive Drug Court Program for drug-abusing offenders. The goal is to minimize recidivism and drug use (Marlowe et al., 2008, 2009, 2011).
5
Adaptive Drug Court Program
6
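Usually $k = 2$ stages (finite horizon = 2). Goal: use a training set of $n$ trajectories, each of the form $(X_1, A_1, R_1, X_2, A_2, R_2)$ (one trajectory per subject), where $R_j$ is the reward following the $j$th action, to construct a treatment policy that outputs the actions $a_1, a_2$. The treatment policy should maximize the total reward, $R_1 + R_2$. The treatment policy is a sequence of two decision rules: $d_1(X_1)$ and $d_2(X_1, A_1, R_1, X_2)$.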
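A minimal Python sketch of a two-stage treatment policy as a pair of decision rules. The dictionary-based history, the ±1 action coding and the two example rules are hypothetical, chosen only to show that the stage 2 rule sees everything observed before the stage 2 action.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical representation: the history observed so far, as named values
History = Dict[str, float]


@dataclass
class TwoStagePolicy:
    d1: Callable[[History], int]  # stage 1 rule, uses X1
    d2: Callable[[History], int]  # stage 2 rule, uses (X1, A1, R1, X2)

    def act(self, history: History, stage: int) -> int:
        """Return the action (coded -1 or +1) recommended at the given stage."""
        rule = self.d1 if stage == 1 else self.d2
        return rule(history)


# Made-up rules for illustration: start with +1 if x1 > 0; switch actions if r1 <= 0
policy = TwoStagePolicy(
    d1=lambda h: 1 if h["x1"] > 0 else -1,
    d2=lambda h: h["a1"] if h["r1"] > 0 else -h["a1"],
)
history = {"x1": 0.3}
a1 = policy.act(history, stage=1)
history.update(a1=a1, r1=-0.2, x2=1.5)  # outcomes observed after stage 1
a2 = policy.act(history, stage=2)
```

Here `a1` is 1 and, because the hypothetical reward `r1` is not positive, the stage 2 rule switches to `a2 = -1`.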
7
Randomized Trials: What is a sequential, multiple assignment, randomized trial (SMART)? Each subject proceeds through multiple stages of treatment, and randomization takes place at each stage. Exploration, no exploitation. Usually 2 to 3 treatment stages.
8
Pelham’s ADHD Study
Random assignment:
A. Begin low-intensity behavior modification (8 weeks). Assess: adequate response?
   Yes: A1. Continue, reassess monthly; randomize if deteriorate.
   No: Random assignment:
      A2. Augment with other treatment
      A3. Increase intensity of present treatment
B. Begin low-dose medication (8 weeks). Assess: adequate response?
   Yes: B1. Continue, reassess monthly; randomize if deteriorate.
   No: Random assignment:
      B2. Increase intensity of present treatment
      B3. Augment with other treatment
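The design above can be simulated in a few lines of Python to show where the two randomizations happen. The response probabilities are made up, and the later re-randomization of responders who deteriorate is not modeled; this is a sketch, not the study's protocol.

```python
import random

# Labels follow the design; the response model below is hypothetical.
CONTINUE = {"A": "A1. Continue, reassess monthly", "B": "B1. Continue, reassess monthly"}
SECOND_STAGE = {
    "A": ("A2. Augment with other treatment", "A3. Increase intensity of present treatment"),
    "B": ("B2. Increase intensity of present treatment", "B3. Augment with other treatment"),
}


def simulate_child(rng, p_response):
    """Return (initial arm, responded?, stage 2 treatment) for one simulated child."""
    first = rng.choice(("A", "B"))                 # first randomization, P = .5 each
    responded = rng.random() < p_response[first]   # hypothetical response after 8 weeks
    if responded:
        return first, True, CONTINUE[first]
    return first, False, rng.choice(SECOND_STAGE[first])  # second randomization, P = .5 each


rng = random.Random(42)
# Made-up response probabilities, for illustration only
children = [simulate_child(rng, {"A": 0.4, "B": 0.5}) for _ in range(1000)]
```

Only non-responders reach the second random assignment, which is why the design embeds four initial-treatment by tactic combinations.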
9
Oslin’s ExTENd Study
Random assignment:
   Early Trigger for Nonresponse: Naltrexone (8 wks)
   Late Trigger for Nonresponse: Naltrexone (8 wks)
In each arm:
   Response: Random assignment: TDM + Naltrexone, or Naltrexone
   Nonresponse: Random assignment: CBI + Naltrexone, or CBI
10
Jones’ Study for Drug-Addicted Pregnant Women [flowchart: treatment arms rRBT, tRBT, aRBT and eRBT; response assessed at 2 wks; random assignment of responders and non-responders]
11
Usually 2 stages (finite horizon = 2). Goal: use a training set of $n$ trajectories, each of the form $(X_1, A_1, R_1, X_2, A_2, R_2)$ (one trajectory per subject), to construct a treatment policy. The treatment policy should maximize the total reward. $A_j$ is a randomized action with known randomization probability; here the actions are binary, with $P[A_j = 1] = P[A_j = -1] = 0.5$.
12
Secondary Data Analysis: Q-Learning. Q-Learning, Fitted Q Iteration, Approximate Dynamic Programming (Watkins, 1989; Ernst et al., 2005; Murphy, 2003; Robins, 2004). This results in a proposal for an optimal treatment policy; a subsequent randomized trial would evaluate the proposed treatment policy.
13
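Goal: use the training set to construct decision rules $d_1(X_1)$ and $d_2(X_1, A_1, X_2)$ for which the average value, $E_{d_1, d_2}[Y]$, is maximal.
2 Stages, Terminal Reward $Y$. The maximal average value is
$$\max_{d_1, d_2} E_{d_1, d_2}[Y] = E\left[\max_{a_1} E\left[\max_{a_2} E[Y \mid X_1, A_1 = a_1, X_2, A_2 = a_2] \,\middle|\, X_1, A_1 = a_1\right]\right]$$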
14
Idea behind Q-Learning/Fitted Q
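The slide's figure is not in this transcript. As a hedged sketch in the notation of the two-stage, terminal-reward setup, the idea is usually written with Q-functions defined backwards from the last stage; Q-learning then replaces each conditional expectation with a regression fitted to the training set:

```latex
\begin{aligned}
Q_2(X_1, A_1, X_2, A_2) &= E\left[ Y \mid X_1, A_1, X_2, A_2 \right] \\
Q_1(X_1, A_1) &= E\left[ \max_{a_2} Q_2(X_1, A_1, X_2, a_2) \,\middle|\, X_1, A_1 \right] \\
d_2(X_1, A_1, X_2) &= \arg\max_{a_2} Q_2(X_1, A_1, X_2, a_2) \\
d_1(X_1) &= \arg\max_{a_1} Q_1(X_1, a_1)
\end{aligned}
```

Under the SMART randomization these rules attain the maximal average value, $E[\max_{a_1} Q_1(X_1, a_1)]$.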
15
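Use regression at each stage to approximate the Q-function.
Simple version of fitted Q-iteration, stage 2 regression: regress $Y$ on (features of) $(X_1, A_1, X_2, A_2)$ to obtain $\hat{Q}_2(X_1, A_1, X_2, A_2)$. Arg-max over $a_2$ yields $\hat{d}_2(X_1, A_1, X_2) = \arg\max_{a_2} \hat{Q}_2(X_1, A_1, X_2, a_2)$.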
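Assuming, for illustration, a stage 2 working model that is linear in $a_2$, with feature vectors $Z_{20}$ and $Z_2$ summarizing $(X_1, A_1, X_2)$ and actions coded $\pm 1$ as in the SMART design (the names $Z_{20}$, $Z_2$, $\hat\beta_2$, $\hat\alpha_2$ are ours, and the slide's fitted form is not in this transcript), the arg-max has a closed form:

```latex
\begin{aligned}
\hat{Q}_2(X_1, A_1, X_2, a_2) &= \hat{\beta}_2^\top Z_{20} + (\hat{\alpha}_2^\top Z_2)\, a_2, \qquad a_2 \in \{-1, 1\} \\
\hat{d}_2(X_1, A_1, X_2) &= \operatorname{sign}(\hat{\alpha}_2^\top Z_2) \\
\max_{a_2} \hat{Q}_2(X_1, A_1, X_2, a_2) &= \hat{\beta}_2^\top Z_{20} + \lvert \hat{\alpha}_2^\top Z_2 \rvert
\end{aligned}
```

The last line is the stage 2 value that is carried back to the stage 1 regression.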
16
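Value for subjects entering stage 2: $\tilde{Y} = \max_{a_2} \hat{Q}_2(X_1, A_1, X_2, a_2)$, where $\hat{Q}_2$ is the fitted stage 2 regression. $\tilde{Y}$ is a predictor of $\max_{a_2} Q_2(X_1, A_1, X_2, a_2)$, with $Q_2(X_1, A_1, X_2, A_2) = E[Y \mid X_1, A_1, X_2, A_2]$, and $\tilde{Y}$ is the dependent variable in the stage 1 regression for patients moving to stage 2.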
17
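Simple version of fitted Q-iteration, stage 1 regression: regress $\tilde{Y} = \max_{a_2} \hat{Q}_2(X_1, A_1, X_2, a_2)$ on (features of) $(X_1, A_1)$ to obtain $\hat{Q}_1(X_1, A_1)$. Arg-max over $a_1$ yields $\hat{d}_1(X_1) = \arg\max_{a_1} \hat{Q}_1(X_1, a_1)$.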
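The two regressions can be sketched end to end in plain Python. Everything here is hypothetical: a simulated SMART with binary features `s1` (in $X_1$) and `s2` (in $X_2$), ±1 actions randomized with probability .5, a terminal reward, and a linear working model at each stage. The helper `ols` is our own least-squares routine, not a library call.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(m[k][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for k in range(p):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [a - f * b for a, b in zip(m[k], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def features2(s1, a1, s2, a2):
    # Stage 2 working model: history main effects plus an A2 x S2 interaction
    return (1.0, s1, a1, s1 * a1, s2, a2, s2 * a2)


def features1(s1, a1):
    # Stage 1 working model: main effect of S1 plus an A1 x S1 interaction
    return (1.0, s1, a1, s1 * a1)


# Simulated SMART data (hypothetical generative model)
rng = random.Random(1)
data = []
for _ in range(1000):
    s1 = rng.randint(0, 1)        # binary feature in X1
    a1 = rng.choice((-1, 1))      # stage 1 randomization, P = .5 each
    s2 = rng.randint(0, 1)        # binary feature in X2
    a2 = rng.choice((-1, 1))      # stage 2 randomization, P = .5 each
    y = (0.5 * s1 + (0.3 - 0.6 * s1) * a1
         + 0.5 * s2 + (-0.4 + 0.8 * s2) * a2 + rng.gauss(0, 1))
    data.append((s1, a1, s2, a2, y))

# Stage 2: regress Y on the stage 2 features
b2 = ols([features2(s1, a1, s2, a2) for s1, a1, s2, a2, _ in data],
         [row[4] for row in data])


def q2_hat(s1, a1, s2, a2):
    return sum(c * v for c, v in zip(b2, features2(s1, a1, s2, a2)))


def d2_hat(s1, a1, s2):
    return max((-1, 1), key=lambda a: q2_hat(s1, a1, s2, a))


# Value for subjects entering stage 2: Y~ = max over a2 of Q2_hat
y_tilde = [max(q2_hat(s1, a1, s2, a) for a in (-1, 1)) for s1, a1, s2, _, _ in data]

# Stage 1: regress Y~ on the stage 1 features
b1 = ols([features1(s1, a1) for s1, a1, _, _, _ in data], y_tilde)


def q1_hat(s1, a1):
    return sum(c * v for c, v in zip(b1, features1(s1, a1)))


def d1_hat(s1):
    return max((-1, 1), key=lambda a: q1_hat(s1, a))
```

With 1,000 simulated subjects the estimated rules should recover the generating ones up to sampling noise: $\hat{d}_1 = 1$ when `s1 = 0` and $-1$ when `s1 = 1`; $\hat{d}_2 = -1$ when `s2 = 0` and $1$ when `s2 = 1`.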
18
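Decision Rules: $\hat{d}_1(X_1) = \arg\max_{a_1} \hat{Q}_1(X_1, a_1)$ and $\hat{d}_2(X_1, A_1, X_2) = \arg\max_{a_2} \hat{Q}_2(X_1, A_1, X_2, a_2)$, where $\hat{Q}_1$ and $\hat{Q}_2$ are the fitted stage 1 and stage 2 regressions.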
19
Pelham’s ADHD Study (same design as slide 8)
20
ADHD: 138 trajectories of the form $(X_1, A_1, R_1, X_2, A_2, Y)$:
- $Y$ = end-of-year school performance
- $R_1 = 1$ if responder; $= 0$ if non-responder
- $X_2$ includes the month of non-response, $M_2$, and a measure of adherence in stage 1, $S_2$: $S_2 = 1$ if adherent in stage 1; $= 0$ if non-adherent
- $X_1$ includes baseline school performance, $Y_0$, whether medicated in the prior year ($S_1$), and ODD ($O_1$): $S_1 = 1$ if medicated in the prior year; $= 0$ otherwise
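One way to hold a single child's record in Python, as a sketch: the field names mirror the symbols above, while the numeric codings for $A_1$, $A_2$ and $O_1$ are assumptions, since the slides do not give them. Fields with defaults come last only because Python dataclasses require it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ADHDTrajectory:
    """One child's record (X1, A1, R1, X2, A2, Y); action codings are hypothetical."""
    y0: float                 # X1: baseline school performance, Y0
    s1: int                   # X1: 1 if medicated in prior year, else 0
    o1: int                   # X1: ODD (O1); 0/1 coding assumed
    a1: int                   # initial treatment; assumed coding 1 = BMOD, -1 = MED
    r1: int                   # 1 if responder, 0 if non-responder
    y: float                  # end-of-year school performance, Y
    m2: Optional[int] = None  # X2: month of non-response, M2 (None if never non-responding)
    s2: Optional[int] = None  # X2: 1 if adherent in stage 1, else 0
    a2: Optional[int] = None  # stage 2 action; assumed coding 1 = intensify, -1 = augment

    @property
    def enters_stage2(self) -> bool:
        return self.r1 == 0


# Made-up values, not from the study
child = ADHDTrajectory(y0=3.0, s1=1, o1=0, a1=-1, r1=0, y=3.5, m2=4, s2=1, a2=1)
```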
21
Q-Learning using data on children with ADHD
Stage 2 regression for $Y$:
Decision rule is “if the child is non-responding, then intensify initial treatment if the child was adherent in stage 1 ($S_2 = 1$), otherwise augment”.
22
Q-Learning using data on children with ADHD
Decision rule is “if the child is non-responding, then intensify initial treatment if the child was adherent in stage 1 ($S_2 = 1$), otherwise augment”.
Decision Rule for Non-responding Children
                 Initial Treatment = BMOD    Initial Treatment = MED
Adherent         Intensify                   Intensify
Not Adherent     Augment                     Augment
23
ADHD Example
Stage 1 regression:
Decision rule is “Begin with BMOD if the child was not medicated in the prior year ($S_1 = 0$), otherwise begin with MED”.
24
Q-Learning using data on children with ADHD
Decision rule is “Begin with BMOD if the child was not medicated in the prior year ($S_1 = 0$), otherwise begin with MED”.
Initial Decision Rule
                   Initial Treatment
Prior MEDS         MEDS
No Prior MEDS      BMOD
25
ADHD Example: The treatment policy is quite decisive, yet we developed it using a trial on only 138 children. Is there sufficient evidence in the data to warrant this level of decisiveness? Would a similar trial obtain similar results? There are strong opinions regarding how to treat ADHD. One solution: use confidence intervals.
26
ADHD Example: Treatment Decision for Non-responders (a positive treatment effect favors Intensify)
                            90% Confidence Interval
Adherent to BMOD            (-0.08, 0.69)
Adherent to MED             (-0.18, 0.62)
Non-adherent to BMOD        (-1.10, -0.28)
Non-adherent to MED         (-1.25, -0.29)
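One simple way to turn such intervals into a recommendation, as a sketch: act on the sign of the effect only when the interval excludes zero, and otherwise leave the choice open. The slides do not state the exact criterion used, so treat this rule (and the helper name `decide`) as an assumption.

```python
def decide(ci, if_positive, if_negative):
    """Read a CI for a treatment effect: commit only when the interval excludes zero."""
    lower, upper = ci
    if lower > 0:
        return if_positive
    if upper < 0:
        return if_negative
    return "either"  # the interval contains 0: the data do not settle the choice


# 90% intervals from the table above; a positive effect favors Intensify
nonresponder_cis = {
    "Adherent to BMOD": (-0.08, 0.69),
    "Adherent to MED": (-0.18, 0.62),
    "Non-adherent to BMOD": (-1.10, -0.28),
    "Non-adherent to MED": (-1.25, -0.29),
}
decisions = {k: decide(ci, "Intensify", "Augment") for k, ci in nonresponder_cis.items()}
```

For these four intervals the sketch returns "either" for adherent children and "Augment" for non-adherent children, which lines up with the non-responder part of the proposal for the treatment policy.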
27
ADHD Example: Initial Treatment Decision (a positive treatment effect favors BMOD)
                   90% Confidence Interval
Prior MEDS         (-0.48, 0.16)
No Prior MEDS      (-0.05, 0.39)
28
Proposal for Treatment Policy
IF medication was not used in the prior year, THEN begin with BMOD; ELSE select either BMOD or MED.
IF the child is nonresponsive and was non-adherent, THEN augment present treatment; ELSE IF the child is nonresponsive and was adherent, THEN select either intensification or augmentation of current treatment.
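The proposal can be encoded as functions that return the set of acceptable options rather than a single action. The sketch below also encodes the more decisive Q-learning rules from the ADHD example and checks that each decisive choice is one the proposal allows. The function names and string labels are hypothetical.

```python
def initial_options(prior_meds: bool) -> frozenset:
    """Acceptable initial treatments under the proposal."""
    return frozenset({"BMOD", "MED"}) if prior_meds else frozenset({"BMOD"})


def nonresponder_options(adherent: bool) -> frozenset:
    """Acceptable stage 2 actions for a non-responding child under the proposal."""
    return frozenset({"intensify", "augment"}) if adherent else frozenset({"augment"})


# The decisive Q-learning rules, for comparison
def decisive_initial(prior_meds: bool) -> str:
    return "MED" if prior_meds else "BMOD"


def decisive_nonresponder(adherent: bool) -> str:
    return "intensify" if adherent else "augment"


# Every decisive choice is one of the options the proposal allows
assert all(decisive_initial(p) in initial_options(p) for p in (False, True))
assert all(decisive_nonresponder(a) in nonresponder_options(a) for a in (False, True))
```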
29
Where are we going? Wearable computers (e.g., smart phones) are increasingly used both to collect real-time data and to provide real-time treatment. We are working on the design of studies involving randomization (soft-max or epsilon-greedy choice of actions) to develop and continually improve treatment policies. We need confidence measures for infinite-horizon problems.
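Soft-max and epsilon-greedy are two standard ways to keep every action's selection probability known and nonzero while favoring actions with higher estimated value. A minimal Python sketch, assuming estimated values `q_values` for the available actions; the `temperature` and `epsilon` defaults are arbitrary choices, not values from the talk.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Selection probabilities proportional to exp(q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    w = [math.exp((q - m) / temperature) for q in q_values]
    s = sum(w)
    return [x / s for x in w]


def epsilon_greedy_probs(q_values, epsilon=0.1):
    """With prob. epsilon pick uniformly at random, otherwise pick the best action."""
    k = len(q_values)
    best = max(range(k), key=lambda i: q_values[i])
    return [epsilon / k + (1 - epsilon if i == best else 0.0) for i in range(k)]


def choose(probs, rng):
    """Draw an action index; the probabilities used can be recorded with the data."""
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Because the probabilities are computed explicitly before each draw, they can be stored alongside the chosen action, which keeps the randomization probabilities known, as in a SMART.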
30
This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/MUCMD.08.10.12.pdf
This seminar is based on work with many collaborators, including L. Collins, E. Laber, M. Qian, D. Almirall, K. Lynch, J. McKay, D. Oslin, T. Ten Have, I. Nahum-Shani and B. Pelham. Email with questions or if you would like a copy: samurphy@umich.edu