
1 Machine/Reinforcement Learning in Clinical Research S.A. Murphy May 19, 2008

2 Outline
Goal: Improving Clinical Decision Making Using Data
– Clinical Decision Making
– Types of Training Data
– Incomplete Mechanistic Models
– Clinical Trials
– Some Open Problems
– Example

5 Questions [figure: patient evaluation screen with MSE]

7 Policies are individually tailored treatments, with treatment type and dosage changing according to the patient's outcomes.
– $k$ stages for each patient
– Observation $O_j$ available at the $j$th stage
– Action $A_j$ at the $j$th stage (usually a treatment)

8
– $k$ stages
– History available at the $j$th stage: $H_j = (O_1, A_1, \ldots, A_{j-1}, O_j)$
– Reward following the $j$th stage: $R_j = r_j(H_j, A_j, O_{j+1})$ ($r_j$ is a known function)
– Primary outcome: $Y = \sum_{j=1}^{k} R_j$

9 Goal: Use training data to construct decision rules, $d_1, \ldots, d_k$, that input the information in the history at each stage and output a recommended action; these decision rules should lead to a maximal mean $Y$ (cumulative reward). The policy is the sequence of decision rules, $d_1, \ldots, d_k$. In implementation of the policy, the actions are set to $A_j = d_j(H_j)$.
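To make the notation concrete, here is a minimal Python sketch of a policy as a sequence of decision rules applied to the accumulating history. All names are illustrative, not from the talk; `observe` and `reward` stand in for the patient's unknown response mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A decision rule d_j maps the history H_j to an action A_j.
DecisionRule = Callable[[list], int]

@dataclass
class Trajectory:
    history: list = field(default_factory=list)        # interleaved O_1, A_1, O_2, ...
    rewards: List[float] = field(default_factory=list)

def run_policy(rules: List[DecisionRule], observe, reward) -> float:
    """Apply the policy d_1, ..., d_k and return Y = sum of the rewards R_j."""
    traj = Trajectory()
    for d_j in rules:
        traj.history.append(observe(traj.history))   # O_j arrives
        a_j = d_j(traj.history)                      # A_j = d_j(H_j)
        traj.history.append(a_j)
        traj.rewards.append(reward(traj.history))    # R_j, a known function of the history
    return sum(traj.rewards)                         # primary outcome Y
```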

10 Some Characteristics of a Clinical Decision Making Policy
– The learned policy should be decisive only when warranted.
– The learned policy should not require excessive data collection in order to implement.
– The learned policy should be justifiable to clinical scientists.

11 Types of Data
– Clinical trial data: actions are manipulated (randomized)
– Large databases or observational data sets: actions are not manipulated by the scientist
– Bench research on cells/animals/humans

12 Clinical Trial Data Sets
Experimental trials conducted for research purposes:
– Scientists decide proactively which data to collect and how to collect them
– Scientific knowledge is used to enhance the quality of the proxies for observation and reward
– Actions are manipulated (randomized) by the scientist
– Short horizon (fewer than 5 stages)
– Hundreds of subjects

13 Observational Data Sets
Observational data collected for research purposes:
– Scientific knowledge is used to pinpoint high-quality proxies for observation, action, and reward
– Scientists decide proactively which proxies to collect and how to collect them
– Actions are not manipulated by the scientist
– Moderate horizon
– Hundreds to thousands of subjects

14 Observational Data Sets
Clinical databases or registries (an example in the US would be the VA registries):
– Data were not collected for research purposes
– Only gross proxies are available to define observation, action, and reward
– Moderate to long horizon
– Thousands to millions of subjects

15 Mechanistic Models
In many areas of RL, scientists can use mechanistic theory, e.g., physical laws, to model or simulate how the observations interrelate and how the actions might impact the observations. Scientists know many of the causes of the observations (including the most important ones) and have a model for how the observations relate to one another.

16 Low Availability of Mechanistic Models
– Clinical scientists have recourse only to crude, qualitative models.
– Unknown causes create problems.
– Scientists who want to use observational data to construct policies must confront the fact that non-causal “associations” occur due to the unknown causes of the observations.

17 Conceptual Structure in the Clinical Sciences (observational data)

18 Unknown, Unobserved Causes (Incomplete Mechanistic Models)

19 Unknown, Unobserved Causes (Incomplete Mechanistic Models)
Problem: Non-causal associations between treatment (here counseling) and rewards are likely.
Solutions:
– Collect clinical trial data in which treatment actions are randomized. This breaks the non-causal associations yet permits causal associations.
– Participate in the observational data collection; proactively brainstorm with domain experts to ascertain and measure the main determinants of treatment selection. Then take advantage of causal inference methods designed to utilize this information.

20 Conceptual Structure in the Clinical Sciences (experimental trial data)

21 STAR*D
– The statistical expertise relevant for policy construction was unavailable at the time the trial was designed.
– This trial is over, and one can apply for access to the data.
– One goal of the trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.

23 ExTENd
– Ongoing study at the University of Pennsylvania
– Goal is to learn how best to help alcohol-dependent individuals reduce alcohol consumption.

24 Oslin ExTENd [SMART design diagram: participants are randomized between an early and a late trigger for nonresponse during 8 weeks on naltrexone; within each arm, responders and nonresponders are then re-randomized among the stage-two options shown (TDM + naltrexone, CBI, CBI + naltrexone, naltrexone).]

25 Clinical Trials
Data from the (short-horizon) clinical trials make excellent test beds for combinations of supervised/unsupervised and reinforcement learning methods:
– In the clinical trial, large amounts of data are collected at each stage of treatment
– Small number of finite-horizon patient trajectories
– The learned policy can vary greatly from one training set to another

26 Open Problems
1) Equivalent Actions
– Need to know when a subset of actions is equivalent, that is, when there is little or no evidence to contradict this equivalence.
2) Evaluation
– Need to assess the quality of the learned policy (or compare policies) using the training data

27 Open Problems
3) Variable Selection
– To reduce the large number of variables to those most useful for decision making
– Once a small number of variables is identified, we need to know if there is sufficient evidence that a particular variable (e.g., the output of a biological test) should be part of the policy.

28 Measures of Confidence
A statistician's approach: use measures of confidence to address these three challenges:
– Pinpointing equivalent actions
– Pinpointing necessary patient inputs to the policy
– Evaluating the quality of a learned policy

29 Evaluating the quality of a learned policy using the training data
– Traditional methods for constructing measures of confidence require differentiability (to assess the variation in the policy from training set to training set).
– The mean outcome following use of a policy (the value of the policy) is a non-differentiable function of the policy.

30 Example: Single Stage (k = 1)
– Find a prediction interval for the mean outcome if a particular estimated policy (here one decision rule) is employed.
– Action $A$ is binary in $\{-1, 1\}$.
– Suppose the decision rule is of the form $d(O_1) = \operatorname{sign}(\beta^T O_1)$.
– We do not assume the Bayes decision boundary is linear.

31 Single Stage (k = 1)
Mean outcome following this policy is
$$V(\beta) = E\!\left[\frac{\mathbf{1}\{A = \operatorname{sign}(\beta^T O_1)\}}{p(A \mid O_1)}\, Y\right],$$
where $p(A \mid O_1)$ is the randomization probability.
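A minimal sketch of the corresponding sample (inverse-probability-weighted) estimator of $V(\beta)$, assuming randomized trial data with known randomization probabilities; the variable names and array shapes are illustrative:

```python
import numpy as np

def value_estimate(beta, O, A, Y, p):
    """IPW estimate of V(beta) = E[ 1{A = sign(beta'O_1)} / p(A|O_1) * Y ].

    O: (n, d) observations O_1; A: (n,) actions in {-1, +1};
    Y: (n,) outcomes; p: (n,) randomization probabilities p(A_i | O_1i).
    """
    recommended = np.sign(O @ beta)            # action the rule beta would select
    agree = (A == recommended).astype(float)   # indicator 1{A = sign(beta'O_1)}
    return np.mean(agree / p * Y)
```

The indicator makes this estimate a step function of $\beta$, which is exactly the non-smoothness flagged on the next slide.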

32 Prediction Interval for $V(\hat\beta)$
Two problems:
– $V(\beta)$ is not necessarily smooth in $\beta$.
– We don't know $V$, so $V$ must be estimated as well; the data set is small, so overfitting is a problem.

33 Similar Problem in Classification
Misclassification rate for a given decision rule (classifier), where $V$ is defined by
$$V(\beta) = E\big[\mathbf{1}\{A \neq \operatorname{sign}(\beta^T O_1)\}\big]$$
($A$ is the $\{-1, 1\}$ classification; $O_1$ is the observation; $\beta^T O_1$ is a linear classification boundary).

34 Jittering: $\hat V$ is non-smooth.
Toy example: the unknown Bayes classifier has a quadratic decision boundary. We fit, by least squares, a linear decision boundary $f(o) = \operatorname{sign}(\beta_0 + \beta_1 o)$.
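A small simulation consistent with this toy example (the data-generating model and replication counts are my own choices, not from the talk) that makes the jittering visible: across repeated training sets, the training error of the least-squares linear rule jumps around on a discrete grid rather than varying smoothly.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw(n):
    """Data whose Bayes classifier has a quadratic decision boundary."""
    o = rng.uniform(-2.0, 2.0, n)
    a = np.where(o**2 - 1.0 + rng.normal(0.0, 0.5, n) > 0, 1, -1)
    return o, a

def training_error(n):
    o, a = draw(n)
    X = np.column_stack([np.ones(n), o])         # features (1, o)
    beta = np.linalg.lstsq(X, a, rcond=None)[0]  # least-squares fit of the label
    return np.mean(np.sign(X @ beta) != a)       # V_hat at the fitted beta

errors_30  = [training_error(30)  for _ in range(1000)]
errors_100 = [training_error(100) for _ in range(1000)]
# Histograms of errors_30 and errors_100 exhibit the non-smooth "jittering".
```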

35 [Figure: jittering of $\hat V$ across training sets, shown for N = 30 and N = 100.]

36 Simulation Example
– Data sets from the UCI repository
– Use squared error loss to form the classification rule
– Sample 30 examples from each data set; for each sample, construct a prediction interval
– Assess coverage using the remaining examples
– Repeat 1000 times
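A schematic version of this protocol, using an ordinary percentile bootstrap for the interval. Data loading is omitted, and `percentile_interval` is an illustrative helper for the naive method, not the adjusted method from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

def misclass_rate(X, A, beta):
    return np.mean(np.sign(X @ beta) != A)

def percentile_interval(X, A, n_boot=1000, alpha=0.05):
    """Naive percentile bootstrap interval for the misclassification rate."""
    n = len(A)
    rates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                          # resample with replacement
        bb = np.linalg.lstsq(X[idx], A[idx], rcond=None)[0]  # refit by squared error loss
        rates.append(misclass_rate(X[idx], A[idx], bb))
    return np.quantile(rates, [alpha / 2, 1 - alpha / 2])

# Protocol: draw 30 examples from a data set, form the interval, check whether
# the fitted rule's error on the held-out examples lies inside it, and repeat
# 1000 times to estimate the coverage rate.
```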

37 “95% Prediction Intervals” [table: coverage of the percentile bootstrap and adjusted bootstrap intervals on the Ionosphere, Heart, and Simulated data sets; the numeric entries did not survive transcription]
Confidence rate should be ≥ .95.

38 Prediction Interval for $V(\hat\beta)$
Our method obtains a prediction interval for a smooth upper bound on $V(\hat\beta)$; $\hat V(\hat\beta)$ is the training error.

39 Prediction Interval for $V(\hat\beta)$
– The smooth upper bound maximizes the training error over the set of $\beta$'s close to $\hat\beta$ in terms of squared error loss.
– Form a percentile bootstrap interval for this smooth upper bound.
– This method is generally too conservative.
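My schematic reading of this construction, as a sketch only (the precise definition of the neighborhood and the bound are in the Laber & Murphy work): maximize the training error over $\beta$'s whose squared error loss is within a tolerance of the loss at $\hat\beta$, then bootstrap that maximized statistic. The random-direction search and the `eps` tolerance below are assumptions for illustration:

```python
import numpy as np

def smoothed_upper_bound(X, A, beta_hat, eps=0.05, n_dirs=200, seed=2):
    """Max training error over betas close to beta_hat in squared error loss."""
    rng = np.random.default_rng(seed)
    sq_loss = lambda b: np.mean((A - X @ b) ** 2)
    err = lambda b: np.mean(np.sign(X @ b) != A)
    base_loss, best = sq_loss(beta_hat), err(beta_hat)
    for _ in range(n_dirs):
        b = beta_hat + rng.normal(0.0, 0.1, size=beta_hat.shape)
        if sq_loss(b) <= base_loss + eps:   # b is "close" in squared error loss
            best = max(best, err(b))
    return best

# A percentile bootstrap interval is then formed for this (smoother) statistic.
```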

40 “95% Prediction Intervals”

Data Set           | CUD       | CV        | Inverse Binomial
-------------------|-----------|-----------|-----------------
Ionosphere (width) | 1.00 (.4) | .99 (.5)  | .75 (.3)
Heart (width)      | 1.00 (.5) | 1.00 (.4) | .46 (.4)
Simulated (width)  | .99 (.5)  | .76 (.5)  | .95 (.4)

Confidence rate should be ≥ .95.

41 A Challenge!
Methods for constructing the policy (or classifier) and providing an evaluation of the policy (or classifier) must use the same small data set. How might you better address this problem?

42 Discussion
1) Equivalent Actions: Need to know when a subset of actions is equivalent, that is, when there is little or no evidence to contradict this equivalence.
2) Evaluating the usefulness of a particular variable in the learned policy.
3) Methods for producing composite rewards.
– High-quality elicitation of functionality
4) Feature construction for decision making in addition to prediction

43 This seminar can be found at: seminars/Benelearn08.ppt
Email me with questions or if you would like a copy.
