Q-Learning and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan IMS/Bernoulli: July, 2004

Outline
- Dynamic Treatment Regimes
- Optimal Q-functions and Q-learning
- The Problem & Goal
- Finite Sample Bounds
- Outline of Proof
- Shortcomings and Open Problems

Dynamic Treatment Regimes
- Multi-stage decision problems: repeated decisions are made over time on each patient.
- Used in the management of addictions, mental illnesses, HIV infection, and cancer.

k Decisions
- Observations made prior to the t-th decision: $O_t$
- Action at the t-th decision: $A_t$
- Primary outcome: $Y$, a known function of the trajectory $(O_1, A_1, \ldots, O_k, A_k, O_{k+1})$

A dynamic treatment regime is a vector of decision rules, one per decision: $d = (d_1, \ldots, d_k)$, where each $d_t$ maps the history $(o_1, a_1, \ldots, o_t)$ to an action. If the regime is implemented then $A_t = d_t(O_1, A_1, \ldots, O_t)$ for each $t$.
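As a toy illustration (my own, not from the talk), a regime can be represented as a list of decision-rule functions, one per decision; the thresholds, names, and signatures below are hypothetical:

```python
# A dynamic treatment regime as a vector of decision rules, one per decision.
# The rules here are illustrative placeholders only.
from typing import Callable, List

DecisionRule = Callable[..., int]  # maps the history so far to an action

regime: List[DecisionRule] = [
    lambda o1: 1 if o1 > 0 else -1,          # d1(o1)
    lambda o1, a1, o2: 1 if o2 > 0 else -1,  # d2(o1, a1, o2)
]

# If the regime is implemented, the action at time t is
# A_t = d_t(O_1, A_1, ..., O_t).
```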

Goal: Estimate the decision rules $d_1, \ldots, d_k$ that maximize the mean of $Y$. Data: a data set of n finite horizon trajectories, each with randomized actions; $p_t(a_t \mid o_1, a_1, \ldots, o_t)$ are the randomization probabilities.

Optimal Q-functions and Q-learning: Definition: $E_d$ denotes expectation when the actions are chosen according to the regime $d$.

Q-functions: The Q-functions for the optimal regime are given recursively, for $t = k, k-1, \ldots, 1$, by
$$Q_k(o_1, a_1, \ldots, o_k, a_k) = E[Y \mid O_1 = o_1, A_1 = a_1, \ldots, O_k = o_k, A_k = a_k]$$
and, for $t < k$,
$$Q_t(o_1, a_1, \ldots, o_t, a_t) = E\Big[\max_{a_{t+1}} Q_{t+1}(O_1, A_1, \ldots, O_{t+1}, a_{t+1}) \,\Big|\, O_1 = o_1, \ldots, A_t = a_t\Big].$$
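For concreteness (my unrolling of the recursion above, not a separate slide), with $k = 2$ this reads:
$$Q_2(o_1, a_1, o_2, a_2) = E[Y \mid o_1, a_1, o_2, a_2], \qquad Q_1(o_1, a_1) = E\big[\max_{a_2} Q_2(o_1, a_1, O_2, a_2) \,\big|\, o_1, a_1\big].$$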

Q-functions: The optimal regime is given by
$$d_t(o_1, a_1, \ldots, o_t) = \arg\max_{a_t} Q_t(o_1, a_1, \ldots, o_t, a_t), \quad t = 1, \ldots, k.$$

Q-learning: Given a model $Q_t(\cdot\,; \theta_t)$ for the Q-functions, minimize over $\theta_k$
$$E_n\big[\big(Y - Q_k(O_1, A_1, \ldots, O_k, A_k; \theta_k)\big)^2\big]$$
and set $\hat\theta_k$ equal to the minimizer ($E_n$ denotes the empirical average over the n trajectories).

Q-learning: For each $t = k-1, \ldots, 1$, minimize over $\theta_t$
$$E_n\Big[\Big(\max_{a_{t+1}} Q_{t+1}(O_1, A_1, \ldots, O_{t+1}, a_{t+1}; \hat\theta_{t+1}) - Q_t(O_1, A_1, \ldots, O_t, A_t; \theta_t)\Big)^2\Big]$$
and set $\hat\theta_t$ equal to the minimizer, and so on.

Q-Learning: The estimated regime is given by
$$\hat d_t(o_1, a_1, \ldots, o_t) = \arg\max_{a_t} Q_t(o_1, a_1, \ldots, o_t, a_t; \hat\theta_t), \quad t = 1, \ldots, k.$$
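A minimal, self-contained sketch of this backward recursion for $k = 2$, using linear working models fit by least squares on simulated data; the data-generating model, feature choices, and all names are my own illustration, not from the talk:

```python
# Two-stage Q-learning with linear working models, fit by least squares.
# Everything here (simulation, features, names) is an illustrative sketch.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# n finite-horizon trajectories with randomized binary actions in {-1, +1}
# (randomization probability 1/2 at each stage).
o1 = rng.normal(size=n)                     # observation before decision 1
a1 = rng.choice([-1, 1], size=n)            # randomized action 1
o2 = 0.5 * o1 + rng.normal(size=n)          # observation before decision 2
a2 = rng.choice([-1, 1], size=n)            # randomized action 2
y = o2 + 0.7 * a2 * o2 + 0.3 * a1 * o1 + rng.normal(size=n)  # primary outcome Y

def least_squares(X, target):
    """Minimize the empirical mean squared error over theta."""
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return theta

# Stage 2: model Q2 as linear in (1, o2, a2, a2*o2); regress Y on it.
X2 = np.column_stack([np.ones(n), o2, a2, a2 * o2])
theta2 = least_squares(X2, y)

# Pseudo-outcome: max over a2 in {-1, +1} of the fitted Q2.
def q2_hat(a):
    return theta2[0] + theta2[1] * o2 + theta2[2] * a + theta2[3] * a * o2

pseudo = np.maximum(q2_hat(+1), q2_hat(-1))

# Stage 1: model Q1 as linear in (1, o1, a1, a1*o1); regress the pseudo-outcome.
X1 = np.column_stack([np.ones(n), o1, a1, a1 * o1])
theta1 = least_squares(X1, pseudo)

# Estimated decision rules: the argmax over the action of each fitted Q-function.
d2_hat = lambda o2new: np.sign(theta2[2] + theta2[3] * o2new)
d1_hat = lambda o1new: np.sign(theta1[2] + theta1[3] * o1new)

print("d2_hat at o2 = 1.0:", d2_hat(1.0), "  d1_hat at o1 = 1.0:", d1_hat(1.0))
```

Note that the stage-1 regression target is the maximized fitted stage-2 Q-function, which is exactly the "and so on" recursion on the previous slide.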

The Problem & Goal:
- Most learning (e.g., estimation) methods utilize a model for all or parts of the multivariate distribution of the data; the model implicitly constrains the class of possible decision rules in the dynamic treatment regime: call this constrained class $\mathcal{D}$.
- Each observation $O_t$ is a vector with many components (high dimensional), thus the model is likely incorrect; view the model class $\mathcal{Q}$ and the rule class $\mathcal{D}$ as approximation classes.

Goal: Given a learning method and approximation classes, assess the ability of the learning method to produce the best decision rules in the class. Ideally, construct an upper bound for
$$\max_{d \in \mathcal{D}} E_d[Y] - E_{\hat d}[Y],$$
where $\hat d$ is the estimator of the regime and $E_{\hat d}$ denotes expectation when the actions are chosen according to the rule $\hat d$.

Goal: Given a learning method, model, and approximation class, construct a finite sample upper bound for $\max_{d \in \mathcal{D}} E_d[Y] - E_{\hat d}[Y]$. This upper bound should be composed of quantities that are minimized in the learning method. Here the learning method is Q-learning.

Finite Sample Bounds: Primary Assumptions: (1) $|Y| \le L$ with probability 1, for some $L > 1$. (2) The number of possible actions is finite.

Definition: $\|f\|^2 = E[f^2]$, where E, without a subscript, denotes expectation when the actions are randomized.

Results: Approximation Error: involves terms of the form $\min \|Q_t - Q_t^*\|$, where the minimum is over the approximation class $\mathcal{Q}_t$, with $Q_t^*$ the optimal Q-function.

Define the class of functions swept out by the Q-function models. The estimation error involves the complexity (e.g., the covering number) of this space.

Estimation Error: For any $\epsilon > 0$, with probability at least $1 - \delta$, the estimation error is at most $\epsilon$ for n satisfying a sample-size condition involving the covering number of the model class, $\epsilon$, $\delta$, and $L$.

If the model class $\mathcal{Q}$ is finite, then n needs only to satisfy a condition that grows logarithmically in $|\mathcal{Q}|$; that is, n of order $(1/\epsilon^2)\log(|\mathcal{Q}|/\delta)$.

Outline of Proof: The Q-functions for a regime $d$ are given by
$$Q_{d,t}(o_1, a_1, \ldots, o_t, a_t) = E_d[Y \mid O_1 = o_1, A_1 = a_1, \ldots, O_t = o_t, A_t = a_t],$$
where actions after time t are chosen according to $d$.

Proof Outline (1)

Proof Outline (2)

Proof Outline (3)

Shortcomings and Open Problems

Recall Estimation Error: For any $\epsilon > 0$, with probability at least $1 - \delta$, the estimation error is at most $\epsilon$ for n satisfying a sample-size condition involving the covering number of the model class, $\epsilon$, $\delta$, and $L$.

Open Problems
- Is there a learning method that can learn the best decision rule in an approximation class, given a data set of n finite horizon trajectories?
- Sieve estimators or regularized estimators?
- Dealing with high dimensional X: feature extraction, feature selection.

This seminar can be found at: ims_bernoulli_0704.ppt. The paper can be found at: Qlearning.pdf.
