Inference for Clinical Decision Making Policies D. Lizotte, L. Gunter, S. Murphy INFORMS October 2008
2 Sequenced Treatment Alternatives to Relieve Depression (STAR*D): A goal of the clinical trial is to construct good treatment sequences for patients suffering from treatment-resistant depression. The goal is to achieve remission.
3 [Figure: STAR*D treatment flow] Level 1 (max 12 weeks): CIT. Level 2 (max 12 weeks): switch to SER, BUP, or VEN (preference to switch), or augment as CIT+BUS or CIT+BUP (preference to augment). Level 3 (max 12 weeks): switch to MIRT or NTP, or augment as L2+Li or L2+THY. Level 4 (12 weeks): TCP or MIRT+VEN. Patients with QIDS ≤ 5 exit to follow-up; patients with QIDS > 5 move to the next level.
4 STAR*D
Level 1 Observation: QIDS (Quick Inventory of Depressive Symptoms). 16 items. Score range: Self-reported. Preference for type of Level 2 treatment: Switch or Augment.
Level 2 Treatment Action: If Level 1 preference is Switch, then switch to either Ser, Bup, or Ven; if Level 1 preference is Augment, then augment with Bup or Bus.
Level 2 Observation: QIDS. Preference for type of Level 3 treatment: Switch or Augment.
Level 3 Treatment Action: If Level 2 preference is Switch, then switch to either Mirt or Ntp; if Level 2 preference is Augment, then augment with Li or Thy.
Level 3 Observation: QIDS.
Patients exit to follow-up if remission is achieved (QIDS ≤ 5).
5 Construct the policy to maximize the average sum of rewards. Reward: convert Level 2 and Level 3 QIDS scores to standardized percentiles → %QIDS. Reward: $R_j = 1 - (\%\mathrm{QIDS}_j - \%5)/100$ for $j = 2, 3$. If a patient remits in Level 2, $R_2 = 1 + 1$ and $R_3 = 0$. Construct the policy so as to maximize $E[R_2 + R_3]$.
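The reward definition on this slide can be sketched in a few lines of Python. The slides do not say how QIDS scores are standardized, so this sketch assumes an empirical percentile rank within a reference sample; the function names and the `reference_scores` argument are hypothetical.

```python
import numpy as np

def percentile_rank(score, reference_scores):
    """Percent of reference scores <= score (an assumed standardization)."""
    reference_scores = np.asarray(reference_scores, dtype=float)
    return 100.0 * np.mean(reference_scores <= score)

def reward(qids_j, reference_scores, remission_cutoff=5):
    """R_j = 1 - (%QIDS_j - %5) / 100, with %5 the percentile of the cutoff."""
    pct_score = percentile_rank(qids_j, reference_scores)
    pct_cutoff = percentile_rank(remission_cutoff, reference_scores)
    return 1.0 - (pct_score - pct_cutoff) / 100.0

def total_reward(qids_2, qids_3, reference_scores, remission_cutoff=5):
    """R_2 + R_3; a Level 2 remitter receives R_2 = 1 + 1 and R_3 = 0."""
    if qids_2 <= remission_cutoff:
        return 2.0
    return (reward(qids_2, reference_scores, remission_cutoff)
            + reward(qids_3, reference_scores, remission_cutoff))
```

Under this assumed standardization, a score exactly at the remission cutoff earns a reward of 1, and higher (worse) scores earn less.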
6 Batch version of Q-learning for finite horizon problems: Approximate $Q_3$ by regressing $R_3$ on Level 1 and Level 2 QIDS within each (present action, preference, past action) category. The best Level 3 action is $\arg\max_a Q_3(o, a)$ and the value is $V_3(o) = \max_a Q_3(o, a)$.
7 Batch version of Q-learning for finite horizon problems: Approximate $Q_2$ by regressing $R_2 + V_3$ on Level 1 QIDS within each (present action, preference) category. The best Level 2 action is $\arg\max_a Q_2(o, a)$ and the value is $V_2(o) = \max_a Q_2(o, a)$.
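The two backward regression steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes linear regressions fit by least squares, stratifies only by the present action (the preference and past-action strata are omitted for brevity), and sets $V_3 = 0$ for patients who never entered Level 3. All function and argument names are hypothetical.

```python
import numpy as np

def fit_by_action(X, y, actions):
    """Least-squares fit of y on X, separately for each action label."""
    models = {}
    for a in np.unique(actions):
        mask = actions == a
        coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        models[a] = coef
    return models

def q_values(models, X):
    """Predicted Q(o, a) for each row of X and each fitted action."""
    labels = sorted(models)
    Q = np.column_stack([X @ models[a] for a in labels])
    return labels, Q

def batch_q_learning(qids1, qids2, a2, a3, r2, r3, in_level3):
    ones = np.ones(len(qids1))
    # Stage 3: regress R_3 on Level 1 and Level 2 QIDS (Level 3 entrants only).
    X3 = np.column_stack([ones, qids1, qids2])
    q3 = fit_by_action(X3[in_level3], r3[in_level3], a3[in_level3])
    _, Q3 = q_values(q3, X3)
    # V_3 = max_a Q_3(o, a); assumed 0 when Level 3 was never entered.
    v3 = np.where(in_level3, Q3.max(axis=1), 0.0)
    # Stage 2: regress R_2 + V_3 on Level 1 QIDS within each Level 2 action.
    X2 = np.column_stack([ones, qids1])
    q2 = fit_by_action(X2, r2 + v3, a2)
    return q2, q3

def best_level2_action(q2, qids1_score):
    """Action maximizing the fitted Q_2 at a given Level 1 QIDS score."""
    labels, Q = q_values(q2, np.array([[1.0, qids1_score]]))
    return labels[int(np.argmax(Q[0]))]
```

Fitting the last stage first lets the Level 2 regression target include the value of acting optimally at Level 3, which is what makes the resulting Level 2 recommendation account for later treatment.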
8 Use voting across bootstrap samples to assess confidence: 100 bootstrap samples. Each sample produces a $Q_2$; for each Level 1 QIDS score we calculate the Level 2 action that maximizes $Q_2(o, a)$. This is a vote by this bootstrap sample for the action.
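The voting procedure can be sketched as below. The per-action linear fit stands in for whatever $Q_2$ estimator is actually used, and the names `fit_q2`, `bootstrap_votes`, and `score_grid` are hypothetical; the default of 100 resamples follows the slide.

```python
import numpy as np

def fit_q2(qids1, actions, target):
    """Per-action least-squares line: target ~ intercept + slope * qids1."""
    X = np.column_stack([np.ones(len(qids1)), qids1])
    return {a: np.linalg.lstsq(X[actions == a], target[actions == a], rcond=None)[0]
            for a in np.unique(actions)}

def bootstrap_votes(qids1, actions, target, score_grid, n_boot=100, seed=0):
    """Fraction of bootstrap samples voting for each action at each score."""
    rng = np.random.default_rng(seed)
    labels = sorted(np.unique(actions))
    grid_X = np.column_stack([np.ones(len(score_grid)), score_grid])
    votes = np.zeros((len(score_grid), len(labels)))
    n = len(qids1)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        q2 = fit_q2(qids1[idx], actions[idx], target[idx])
        # An action absent from this resample cannot win a vote.
        Q = np.column_stack([
            grid_X @ q2[a] if a in q2 else np.full(len(score_grid), -np.inf)
            for a in labels
        ])
        winners = Q.argmax(axis=1)  # the action this sample votes for
        votes[np.arange(len(score_grid)), winners] += 1
    return labels, votes / n_boot
```

A vote share near 1 for one action at a given score suggests a stable recommendation there, while split votes flag scores where several actions are plausibly best.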
10 Conclusion
If Level 1 QIDS is > 12, then Ven is the best treatment action at Level 2.
If Level 1 QIDS is < 11, then Ser is the best treatment action at Level 2.
If Level 1 QIDS is around 11 or 12, then Ven and Ser are the best treatment actions at Level 2.
11 The Problem: Many patients drop out of the study. [Table: columns Level 2 and Level 3; rows Remit, Move to next level, Dropout, Sum. Surviving cell text: Remit38336]
12 Two Approaches to Study Dropout
Complete case analysis: remove all patients with incomplete data from the analysis. This relies on gross assumptions about why people do or do not drop out. N=1201 → N=679.
Multiple imputation (a Bayesian method): multiply impute the missing data. Intuitively, an imputation model is used to group similar patients. Data from similar patients who remain in the study are used to construct the imputations for the missing data of dropouts.
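The contrast between the two approaches can be illustrated with a small sketch. The imputation step here is a simple nearest-neighbor donor draw, used only as a stand-in for the intuition of borrowing data from similar patients who stayed in the study; it is not the Bayesian imputation model used in the analysis, and all names and defaults (`k`, `n_imputations`) are hypothetical.

```python
import numpy as np

def complete_cases(baseline, outcome):
    """Complete case analysis: keep only patients with an observed outcome."""
    observed = ~np.isnan(outcome)
    return baseline[observed], outcome[observed]

def multiply_impute(baseline, outcome, n_imputations=5, k=5, seed=0):
    """Return n_imputations completed copies of outcome.

    Each missing outcome is filled with the observed outcome of a donor
    drawn at random from the k completers closest in baseline score.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(outcome)
    donors_x, donors_y = baseline[observed], outcome[observed]
    k = min(k, len(donors_y))
    completed = []
    for _ in range(n_imputations):
        filled = outcome.copy()
        for i in np.flatnonzero(~observed):
            nearest = np.argsort(np.abs(donors_x - baseline[i]))[:k]
            filled[i] = donors_y[rng.choice(nearest)]
        completed.append(filled)
    return completed
```

In use, the downstream analysis would be run once per completed dataset and the results combined, so that the variation across imputations reflects uncertainty about the missing values.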
15 Conclusion
If Level 1 QIDS is > 20, then Ven and Bup are the best treatment actions at Level 2.
If Level 1 QIDS is < 12, then Ven and Ser are the best treatment actions at Level 2.
If Level 1 QIDS is around 12 to 20, then Ven is the best treatment action at Level 2.
16 Discussion
If reinforcement learning and modern-day control methods are to be used with clinical trial data, then these methods must be combined with modern missing data methods and methods for assessing confidence.
The multiple imputation + bootstrap approach we used is likely conservative in its assessment of confidence. We are developing more principled methods of assessing confidence.
17 This seminar can be found at: seminars/INFORMS10.08.ppt. Email me with questions or if you would like a copy!