
1 Sequential, Multiple Assignment, Randomized Trials and Treatment Policies. S.A. Murphy, UAlberta, 09/28/12

2 Outline
– Treatment Policies
– Data Sources
– Q-Learning
– Confidence Intervals

3 Treatment Policies are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. They operationalize sequential decisions in clinical practice.
– k stages for each individual
– X_j: observations available at the j-th stage
– A_j: action at the j-th stage (usually a treatment)

4 Example of a Treatment Policy: the Adaptive Drug Court Program for drug-abusing offenders. The goal is to minimize recidivism and drug use. Marlowe et al. (2008, 2009, 2011)

5 Adaptive Drug Court Program (flow diagram)

6 k=2 Stages
The treatment policy is a sequence of two decision rules: (d_1(X_1), d_2(X_1, A_1, X_2)).
Goal: use a data set of n trajectories, each of the form (X_1, A_1, X_2, A_2, Y) (one trajectory per subject), to construct a treatment policy. The treatment policy should produce maximal reward, E[Y].
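For concreteness, here is a minimal Python sketch, ours rather than the talk's, of how such trajectories and a two-stage policy might be represented; all names and types are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Trajectory:
    """One subject's data from a 2-stage trial: (X_1, A_1, X_2, A_2, Y)."""
    x1: float  # observations available at stage 1
    a1: int    # stage 1 action, coded +1 / -1
    x2: float  # observations available at stage 2
    a2: int    # stage 2 action, coded +1 / -1
    y: float   # terminal reward

# A treatment policy is a pair of decision rules:
# d1 maps X_1 to an action; d2 maps (X_1, A_1, X_2) to an action.
Policy = Tuple[Callable[[float], int], Callable[[float, int, float], int]]
```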

7 Why should a machine learning researcher be interested in treatment policies? The dimensionality of the data available for constructing decision rules accumulates at an exponential rate with the stage. We need both feature construction and feature selection.

8 Outline
– Treatment Policies
– Data Sources
– Q-Learning
– Confidence Intervals

9 Experimental Data
Data from sequential, multiple assignment, randomized trials: n subjects, each yielding a trajectory. For 2 stages, the trajectory for each subject is of the form (X_1, A_1, X_2, A_2, Y). (Exploration, no exploitation.) A_j is a randomized treatment action with known randomization probability; here, binary actions with P[A_j = 1] = P[A_j = -1] = 0.5.
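As an illustration only, the following Python snippet simulates trajectories of this form under a toy outcome model of our own choosing; the sample size matches the ADHD example later in the talk, but the generative model is entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 138  # matches the ADHD sample size used later in the talk

# Actions are randomized with P[A_j = 1] = P[A_j = -1] = 0.5, as on the slide.
x1 = rng.normal(size=n)                        # stage 1 observations
a1 = rng.choice([-1, 1], size=n)               # randomized stage 1 action
x2 = 0.5 * x1 + 0.3 * a1 + rng.normal(size=n)  # stage 2 observations
a2 = rng.choice([-1, 1], size=n)               # randomized stage 2 action
y = x2 + a2 * (0.4 * x2 - 0.2) + rng.normal(size=n)  # toy terminal reward
```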

10 Pelham's ADHD Study
Random assignment at entry:
– B. Begin low-dose medication. After 8 weeks, assess: adequate response?
  – Yes: B1. Continue, reassess monthly; re-randomize if the child deteriorates.
  – No: random assignment to B2 (increase intensity of present treatment) or B3 (augment with the other treatment).
– A. Begin low-intensity behavior modification. After 8 weeks, assess: adequate response?
  – Yes: A1. Continue, reassess monthly; re-randomize if the child deteriorates.
  – No: random assignment to A2 (augment with the other treatment) or A3 (increase intensity of present treatment).

11 Oslin's ExTENd Study
Random assignment at entry between two definitions of nonresponse, with Naltrexone given for 8 weeks in both arms:
– Late trigger for nonresponse:
  – Response: random assignment to TDM + Naltrexone or Naltrexone.
  – Nonresponse: random assignment to CBI or CBI + Naltrexone.
– Early trigger for nonresponse:
  – Response: random assignment to TDM + Naltrexone or Naltrexone.
  – Nonresponse: random assignment to CBI or CBI + Naltrexone.

12 Jones' Study for Drug-Addicted Pregnant Women
Random assignment at entry between two reinforcement-based treatment (RBT) variants; response is assessed after 2 weeks, and participants are then re-randomized among the RBT variants (tRBT, rRBT, aRBT, eRBT) according to their arm and response status.

13 Kasari Autism Study
Random assignment at entry:
– B. JAE + AAC for 12 weeks. Assess: adequate response?
  – Yes: B1. Continue JAE + AAC.
  – No: B2. JAE + AAC++ (intensified).
– A. JAE + EMT for 12 weeks. Assess: adequate response?
  – Yes: continue JAE + EMT.
  – No: random assignment to JAE + EMT++ (intensified) or JAE + AAC.

14 Newer Experimental Designs
Using smart phones to collect data, the X_i's, in real time and to provide treatments, the A_i's, in real time to n subjects. The treatments, the A_i's, are randomized among a feasible set of treatment options.
– The number of treatment stages is very large, so we want a Markovian property.
– Feature construction of states in the Markov process.

15 Observational Data
– Longitudinal studies
– Patient registries
– Electronic medical record data

16 Outline
– Treatment Policies
– Data Sources
– Q-Learning / Fitted Q-Iteration
– Confidence Intervals

17 Secondary Data Analysis: Q-Learning
Q-Learning, Fitted Q-Iteration, Approximate Dynamic Programming (Watkins, 1989; Ernst et al., 2005; Murphy, 2003; Robins, 2004). This results in a proposal for an optimal treatment policy. A subsequent randomized trial would evaluate the proposed treatment policy.

18 2 Stages, Terminal Reward Y
Goal: use data to construct decision rules (d_1, d_2) for which the average value, E_{(d_1, d_2)}[Y], is maximal. The maximal average value is V = max_{(d_1, d_2)} E_{(d_1, d_2)}[Y].

19 Idea behind Q-Learning/Fitted Q
Work backwards through the stages, as in dynamic programming. Define the stage 2 Q-function Q_2(X_1, A_1, X_2, a_2) = E[Y | X_1, A_1, X_2, A_2 = a_2], and the stage 1 Q-function Q_1(X_1, a_1) = E[max_{a_2} Q_2(X_1, A_1, X_2, a_2) | X_1, A_1 = a_1].

20 Optimal Treatment Policy
The optimal treatment policy is (d_1*, d_2*), where d_2*(X_1, A_1, X_2) = argmax_{a_2} Q_2(X_1, A_1, X_2, a_2) and d_1*(X_1) = argmax_{a_1} Q_1(X_1, a_1).

21 Simple Version of Fitted Q-Iteration
Use regression at each stage to approximate the Q-function.
Stage 2 regression: regress Y on (X_1, A_1, X_2, A_2) to obtain Q̂_2. Arg-max over a_2 yields d̂_2(X_1, A_1, X_2) = argmax_{a_2} Q̂_2(X_1, A_1, X_2, a_2).
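Continuing the simulated data above, here is a minimal sketch of the stage 2 step with a linear working model; the model form (main effects plus an action-by-covariate interaction) and all variable names are our illustrative choices, not the talk's.

```python
import numpy as np  # reuses n, x1, a1, x2, a2, y from the simulation above

# Stage 2 working model: Q2(h2, a2) = beta2' h2 + a2 * (psi2' h2),
# fit by ordinary least squares; h2 collects (1, x1, a1, x2).
h2 = np.column_stack([np.ones(n), x1, a1, x2])
design2 = np.column_stack([h2, a2[:, None] * h2])
coef2, *_ = np.linalg.lstsq(design2, y, rcond=None)
beta2, psi2 = coef2[:4], coef2[4:]

# With actions coded +1/-1, the contrast between a2 = +1 and a2 = -1 at
# history h2 is 2 * psi2' h2, so the estimated stage 2 rule is
# d2(h2) = sign(psi2' h2).
d2_hat = np.sign(h2 @ psi2)
```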

22 Value for subjects entering stage 2:
Ỹ = max_{a_2} Q̂_2(X_1, A_1, X_2, a_2).
Ỹ is a predictor of the value of treating optimally at stage 2; it is the dependent variable in the stage 1 regression for patients who moved to stage 2.

23 Simple Version of Fitted Q-Iteration
Stage 1 regression: regress Ỹ on (X_1, A_1) to obtain Q̂_1. Arg-max over a_1 yields d̂_1(X_1) = argmax_{a_1} Q̂_1(X_1, a_1).
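Again as an illustrative continuation of the sketch above: with a linear working model and actions in {-1, +1}, the maximized stage 2 prediction has a closed form with an absolute value, which also foreshadows the non-smoothness discussed later in the talk.

```python
# Pseudo-outcome: predicted value of acting optimally at stage 2. For a2 in
# {-1, +1}, max_{a2} [beta2' h2 + a2 * (psi2' h2)] = beta2' h2 + |psi2' h2|.
y_tilde = h2 @ beta2 + np.abs(h2 @ psi2)

# Stage 1 working model: Q1(h1, a1) = beta1' h1 + a1 * (psi1' h1),
# with h1 = (1, x1).
h1 = np.column_stack([np.ones(n), x1])
design1 = np.column_stack([h1, a1[:, None] * h1])
coef1, *_ = np.linalg.lstsq(design1, y_tilde, rcond=None)
beta1, psi1 = coef1[:2], coef1[2:]

# Estimated stage 1 rule: d1(h1) = sign(psi1' h1).
d1_hat = np.sign(h1 @ psi1)
```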

24 Decision Rules:
d̂_2(X_1, A_1, X_2) = argmax_{a_2} Q̂_2(X_1, A_1, X_2, a_2); d̂_1(X_1) = argmax_{a_1} Q̂_1(X_1, a_1).

25 Pelham's ADHD Study (design as shown on slide 10).

26 ADHD: 138 trajectories of the form (X_1, A_1, R_1, X_2, A_2, Y).
– X_1 includes baseline school performance Y_0, whether medicated in the prior year (S_1), and oppositional defiant disorder, ODD (O_1); S_1 = 1 if medicated in the prior year, 0 otherwise.
– R_1 = 1 if responder, 0 if non-responder.
– X_2 includes the month of non-response, M_2, and a measure of adherence in stage 1 (S_2); S_2 = 1 if adherent in stage 1, 0 if non-adherent.
– Y = end-of-year school performance.

27 Q-Learning using data on children with ADHD
Stage 2 regression for Y, with working model Q̂_2 = β̂_2' H_2 + A_2 (ψ̂_2' H_2), where H_2 collects the stage 2 tailoring variables (adherence and initial treatment). The estimated decision rule is: if the child is non-responding, then intensify the initial treatment if ψ̂_2' H_2 > 0, otherwise augment.

28 Q-Learning using data on children with ADHD
The decision rule is: if the child is non-responding, then intensify the initial treatment if ψ̂_2' H_2 > 0, otherwise augment.

Decision Rule for Non-responding Children:
                 Initial Treatment = BMOD   Initial Treatment = MED
  Adherent       Intensify                  Intensify
  Not adherent   Augment                    Augment

29 ADHD Example
Stage 1 regression for Ỹ, with working model Q̂_1 = β̂_1' H_1 + A_1 (ψ̂_1' H_1). The decision rule is: begin with BMOD if ψ̂_1' H_1 > 0, otherwise begin with MED.

30 Q-Learning using data on children with ADHD
The decision rule is: begin with BMOD if ψ̂_1' H_1 > 0, otherwise begin with MED.

Initial Decision Rule:
                  Initial Treatment
  Prior MEDS      MED
  No prior MEDS   BMOD

31 ADHD Example
The treatment policy is quite decisive. We developed this treatment policy using a trial on only 138 children. Is there sufficient evidence in the data to warrant this level of decisiveness? Would a similar trial obtain similar results? There are strong opinions regarding how to treat ADHD. One solution: use confidence intervals.

32 Outline
– Treatment Policies
– Data Sources
– Q-Learning
– Confidence Intervals

33 ADHD Example
Treatment decision for non-responders; a positive treatment effect favors Intensify.

90% confidence intervals:
  Adherent to BMOD       (-0.08, 0.69)
  Adherent to MED        (-0.18, 0.62)
  Non-adherent to BMOD   (-1.10, -0.28)
  Non-adherent to MED    (-1.25, -0.29)

34 ADHD Example
Initial treatment decision; a positive treatment effect favors BMOD.

90% confidence intervals:
  Prior MEDS      (-0.48, 0.16)
  No prior MEDS   (-0.05, 0.39)

35 Proposal for Treatment Policy
IF medication was not used in the prior year, THEN begin with BMOD; ELSE select either BMOD or MED.
IF the child is non-responsive, THEN:
  IF the child was non-adherent, THEN augment the present treatment;
  ELSE IF the child was adherent, THEN select either intensification or augmentation of the current treatment.

36 Confidence Intervals
Constructing confidence intervals concerning treatment effects at stage 2 and stage 1: the stage 2 analysis is classical regression (at least if H_2 is low dimensional), so constructing confidence intervals there is standard. Constructing confidence intervals for the treatment effects at stage 1 is challenging.

37 Confidence Intervals for Stage 1 Treatment Effects
Challenge: the stage 2 estimated value, Ỹ = max_{a_2} Q̂_2(X_1, A_1, X_2, a_2), is non-smooth in the estimators from the stage 2 regression, due to the non-differentiability of the maximization: max over a_2 in {-1, 1} of β̂_2' H_2 + a_2 (ψ̂_2' H_2) equals β̂_2' H_2 + |ψ̂_2' H_2|.

38 Non-regularity
The estimated policy can change abruptly from training set to training set. Standard approximations used to construct confidence intervals perform poorly (Shao, 1994; Andrews, 2000). The problematic area in parameter space is around ψ_2 for which P[ψ_2' H_2 = 0] > 0, i.e., where the stage 2 treatment effect is zero for a positive fraction of patients. We combined a local generalization-type error bound with a standard statistical confidence interval to produce a valid confidence interval.
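A small simulation, ours rather than the talk's, makes the non-regularity concrete: when the true stage 2 effect ψ is exactly zero, the plug-in maximized value |ψ̂| is biased upward and its distribution is non-normal, which is what breaks the standard confidence interval approximations.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_reps = 200, 2000
true_psi = 0.0  # the problematic point: a zero stage 2 treatment effect

# Estimate psi by a sample mean in each replication, then plug into the max.
psi_hat = true_psi + rng.normal(size=(n_reps, n_obs)).mean(axis=1)
plug_in_value = np.abs(psi_hat)  # max over a2 in {-1, +1} of a2 * psi_hat

# The truth is |psi| = 0, yet the plug-in estimate is strictly positive in
# every replication, and sqrt(n) * (|psi_hat| - |psi|) converges to a folded
# normal rather than a normal, so Wald-type intervals are miscalibrated here.
print("true maximized value:", abs(true_psi))
print("mean plug-in value:  ", plug_in_value.mean())
```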

39 Why are this non-smoothness, and the resulting inferential problems, relevant to high dimensional machine learning research? Sparsity assumptions in high dimensional data analysis lead to thresholding, and thresholding produces non-smoothness at important parameter values.

40 Where are we going?
Increasing use of wearable computers (e.g., smart phones) to both collect real-time data and provide real-time treatment. We are working on clinical trial designs involving randomization (soft-max or epsilon-greedy choice of actions) so as to develop and continually improve treatment policies. We need confidence measures for infinite horizon problems.
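For reference, here are illustrative sketches of the two randomization schemes the slide names; the function names and signatures are ours, not from any particular trial platform.

```python
import numpy as np

rng = np.random.default_rng(2)

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick the estimated-best action, but explore with probability epsilon.

    q_values: estimated Q-values for the feasible treatment options.
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform over options
    return int(np.argmax(q_values))              # exploit: current best option

def softmax_action(q_values, temperature=1.0):
    """Randomize among options with probabilities increasing in Q-value."""
    z = np.asarray(q_values, dtype=float) / temperature
    p = np.exp(z - z.max())  # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(q_values), p=p))
```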

41 This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/UAlberta.09.28.12.pdf
This seminar is based on work with many collaborators, some of whom are: L. Collins, E. Laber, M. Qian, D. Almirall, K. Lynch, J. McKay, D. Oslin, T. Ten Have, I. Nahum-Shani & B. Pelham. Email samurphy@umich.edu with questions or if you would like a copy.

