A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits Negar Hassanpour and Russ Greiner Department of Computing.

Slides:



Advertisements
Similar presentations
METHODOLOGY FOR META- ANALYSIS OF TIME TO EVENT TYPE OUTCOMES TO INFORM ECONOMIC EVALUATIONS Nicola Cooper, Alex Sutton, Keith Abrams Department of Health.
Advertisements

Mywish K. Maredia Michigan State University
HSRP 734: Advanced Statistical Methods July 24, 2008.
Optimal Design Laboratory | University of Michigan, Ann Arbor 2011 Design Preference Elicitation Using Efficient Global Optimization Yi Ren Panos Y. Papalambros.
Recursive Partitioning Method on Survival Outcomes for Personalized Medicine 2nd International Conference on Predictive, Preventive and Personalized Medicine.
Modeling Pixel Process with Scale Invariant Local Patterns for Background Subtraction in Complex Scenes (CVPR’10) Shengcai Liao, Guoying Zhao, Vili Kellokumpu,
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Value of Information for Complex Economic Models Jeremy Oakley Department of Probability and Statistics, University of Sheffield. Paper available from.
1. 2 Implementing and Evaluating of an Evidence Based Nursing into Practice Prepared By Dr. Nahed Said El nagger Assistant Professor of Nursing H.
Variable Selection for Optimal Decision Making Lacey Gunter University of Michigan Statistics Department Michigan Student Symposium for Interdisciplinary.
Decision Analysis as a Basis for Estimating Cost- Effectiveness: The Experience of the National Institute for Health and Clinical Excellence in the UK.
Tomer Sagi and Avigdor Gal Technion - Israel Institute of Technology Non-binary Evaluation for Schema Matching ER 2012 October 2012, Florence.
Overview of Adaptive Treatment Regimes Sachiko Miyahara Dr. Abdus Wahed.
Dr Laura Bonnett Department of Biostatistics. UNDERSTANDING SURVIVAL ANALYSIS.
ROBUST RESOURCE ALLOCATION OF DAGS IN A HETEROGENEOUS MULTI-CORE SYSTEM Luis Diego Briceño, Jay Smith, H. J. Siegel, Anthony A. Maciejewski, Paul Maxwell,
Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction David PageSoumya Ray Department of Biostatistics and Medical Informatics Department.
© Nuffield Trust 22 June 2015 Matched Control Studies: Methods and case studies Cono Ariti
Validation / citations. Validation u Expert review of model structure u Expert review of basic code implementation u Reproduce original inputs u Correctly.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
1 Using dynamic path analysis to estimate direct and indirect effects of treatment and other fixed covariates in the presence of an internal time-dependent.
CS Statistical Machine learning Lecture 12 Yuan (Alan) Qi Purdue CS Oct
Javad Azimi, Ali Jalali, Xiaoli Fern Oregon State University University of Texas at Austin In NIPS 2011, Workshop in Bayesian optimization, experimental.
Institute of Statistics and Decision Sciences In Defense of a Dissertation Submitted for the Degree of Doctor of Philosophy 26 July 2005 Regression Model.
Confidential and Proprietary Business Information. For Internal Use Only. Statistical modeling of tumor regrowth experiment in xenograft studies May 18.
Research and Evaluation Methodology Program College of Education A comparison of methods for imputation of missing covariate data prior to propensity score.
Kelci J. Miclaus, PhD Advanced Analytics R&D Manager JMP Life Sciences
Bootstrap and Model Validation
Methods to Handle Noncompliance
for Overall Prognosis Workshop Cochrane Colloquium, Seoul
Mining Utility Functions based on user ratings
Analysis for Designs with Assignment of Both Clusters and Individuals
The comparative self-controlled case series (CSCCS)
Is High Placebo Response Really a Problem in Clinical Trials?
Lurking inferential monsters
Guillaume-Alexandre Bilodeau
Creation of synthetic microdata in 2021 Census Transformation Programme (proof of concept) Robert Rendell.
Sofus A. Macskassy Fetch Technologies
Zhipeng (Patrick) Luo December 6th, 2016
Applied Biostatistics: Lecture 2
Statistical Approaches to Support Device Innovation- FDA View
William Greene Stern School of Business New York University
Supervised Time Series Pattern Discovery through Local Importance
Boosting and Additive Trees
A Growth Curve Analysis Participant Baseline Characteristics
Importance Weighted Active Learning
A practical trial design for optimising treatment duration
Association between risk-of-bias assessments and results of randomized trials in Cochrane reviews: the ROBES study Jelena Savović1, Becky Turner2, David.
S1316 analysis details Garnet Anderson Katie Arnold
Use of Real-World Data in Clinical Drug Development
Common Problems in Writing Statistical Plan of Clinical Trial Protocol
Friends of Cancer Research
What is Regression Analysis?
MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks Jing Zhang+, Bo Chen+, Xianming Wang+, Fengmei Jin+, Hong Chen+, Cuiping.
Microeconometric Modeling
WHICH PSM METHOD TO USE? THE ASSOCIATION BETWEEN CHOSEN PROPENSITY SCORE METHOD AND OUTCOMES OF RETROSPECTIVE REAL-WORLD TREATMENT COMPARISIONS: EVALUATION.
George Bebis and Wenjing Li Computer Vision Laboratory
Intelligent Contextual Data Stream Monitoring
Model generalization Brief summary of methods
Improving Overlap Farrokh Alemi, Ph.D.
Multivariate Methods Berlin Chen
Decision trees MARIO REGIN.
Acupuncture for Chronic Pain
Education Modeling Collaboration: LANL Initial Model Implementation
Lecture 16. Classification (II): Practical Considerations
Kaplan-Meier survival curves and the log rank test
Considerations for the use of multiple imputation in a noninferiority trial setting Kimberly Walters, Jie Zhou, Janet Wittes, Lisa Weissfeld Joint Statistical.
Logical Inference on Treatment Efficacy When Subgroups Exist
Assessing Similarity to Support Pediatric Extrapolation
Introduction to Systematic Reviews
Presentation transcript:

A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits Negar Hassanpour and Russ Greiner Department of Computing Science University of Alberta

Problem Statement ID X Gender Age BMI ID X T Y1 Y0 Gender Age BMI ID X Mr. Smith Male 35 20 ID X T Y1 Y0 Gender Age BMI Mr. Smith Male 35 20 1 15 Mr. Green 22 32 Ms. Jones Female 23 31 . . . ID X T Gender Age BMI Mr. Smith Male 35 20 ID X T Y0 Y1 Gender Age BMI Mr. Smith Male 35 20 1 15 T=0 T=0 T=1 Great nitro! After setting T=0, have the T=1 blue pill fade – so emphasize only

Problem Statement Goal: Challenge: Find counterfactual outcomes ID X Gender Age BMI Mr. Smith Male 35 20 ID X T Y0 Y1 Gender Age BMI Mr. Smith Male 35 20 1 15 ID X T Gender Age BMI Mr. Smith Male 35 20 ID X T Y1 Y0 Gender Age BMI Mr. Smith Male 35 20 1 15 Mr. Green 22 32 Ms. Jones Female 23 31 . . . 25 29 18 Goal: Find counterfactual outcomes to calculate the individual treatment effects (ITE) … in order to facilitate finding the best personalized treatment option for a novel patient Challenge: In real-world datasets, the counterfactual outcomes are not observable i.e., no time-machine invented yet! Great nitro! After setting T=0, have the T=1 blue pill fade – so emphasize only

Types of Real-world Datasets Randomized Controlled Trial Observational Study Need Realistic Synthetic Datasets [Counterfactuals Available] t = 1 t = 0 x outcome t = 1 t = 0 x outcome Same dist’n of X for Red, and Blue Different dist’n of X for Red, and Blue

Outline Motivation Existing Approach Proposed Approach Experiments [Beygelzimer and Langford, 2009] [Swaminathan and Joachims, 2015a] Proposed Approach Experiments Conclusions

Pipeline multi-binary-label Supervised Dataset X T* x11 x12 x13 1 x21 x21 x22 x23 x31 x32 x33 . . . T 1 . . . Logging Policy h0(t|x) Sample new T 5% Jaccard Score X T Y x11 x12 x13 1 3 x21 x22 x23 x31 x32 x33 . . . Contextual Bandit Dataset

Shortcomings How to map a binary multi-label {0,1}k to a medical treatment? (?? drug interactions ??) Using the Jaccard score to define outcomes  equal importance to various treatment options Assumes known underlying mechanism of treatment selection (e.g., in ad-placement); i.e., h0(t|x) value is given as part of the dataset (input)

Outline Motivation Existing Approach Proposed Approach Experiments Conclusions

Synthesize bandit datasets Synthesize Bandit dataset Pipeline Original RCT Dataset (RCT*) ID X T Y0 Y1 Sex Age BMI Mr. Smith M 35 20 15 Mr. Green 22 32 Ms. Jones F 23 1 31 ID X Y0 Y1 Sex Age BMI Mr. Smith M 35 20 15 25 Mr. Green 22 32 29 Ms. Jones F 23 18 31 Completion (match ATE and R2) ID X T Y0 Y1 Sex Age BMI Mr. Smith M 35 20 1 25 Mr. Green 22 32 Ms. Jones F 23 18 ID X T Y0 Y1 Sex Age BMI Mr. Smith M 35 20 15 Mr. Green 22 32 1 29 Ms. Jones F 23 18 ID X T Y0 Y1 Sex Age BMI Mr. Smith M 35 20 1 25 Mr. Green 22 32 29 Ms. Jones F 23 18 α α Synthesize bandit datasets Synthesize Bandit dataset β β IPS-CRM DR-ERM IPS-ERM Logger Output Prediction SN-ERM α β Evaluation

RCT*’s Characteristics Goal: To generate synthetic bandit datasets as similar as possible to a real bandit dataset Match: □ Average Treatment Effect □ Coefficients of Determination

Derive Counterfactual Functions Fit Gaussian Process to each treatment arm: Set counterfactual functions to: Find kt and εt such that each new sampled RCT (i.e., synthetic) has ATE and R2 similar to those of RCT*

Note that the y0 and y1 dist’n for RCT* = Syn RCT Synthetic RCT Note that the y0 and y1 dist’n for RCT* = Syn RCT

Induce Selection Bias Logging policy hα,β(t|x) is a sigmoid with α and β parameters; z=z(x) induces the sample selection bias. z could be: any covariate such as age, gender, etc. First Principal Component (PC1) … Removed: is function of x that

Effect of α and β on Selection Bias α = 0 α = 1, β = 0 α = 1, β = 1 α = 1, β = -1

Outline Motivation Existing Approach Proposed Approach Experiments Conclusions

RCT* Datasets Acupuncture benefit of acupuncture (in addition to standard care) for treating chronic headache disorders 18 features, 295 subjects Hypericum acute efficacy of St. John’s Wort in treatment of patients with major depression disorder 278 features, 161 subjects Original RCT Synthesized RCTs ATE*= −6.15 ATE = −6.17(0.76) R20*= 0.68 R20 = 0.60(0.05) R21* = 0.33 R21 = 0.36(0.07) Original RCT Synthesized RCTs ATE*= −2.25 ATE = −2.65(0.97) R20* = 0.24 R20 = 0.15(0.10) R21* = 0.00 R21 = 0.04(0.09)

Methods Evaluated Off-policy learning methods in contextual bandits: Outcome Prediction (OP) Inverse Propensity Scoring (IPS) Doubly Robust (DR) [Dudík et al., 2011] Self-Normalized (SN) [Swaminathan and Joachims, 2015b] Each uses an objective function defined by either: Empirical Risk Minimization (ERM) principle OR Counterfactual Risk Minimization (CRM) principle [Swaminathan and Joachims, 2015c]

Results 1

Results 1

Results 2

Acupuncture

Hypericum

Outline Motivation Existing Approach Proposed Approach Experiments Conclusions

Future Directions Increase the pool size of possible treatments from |T| = 2 (i.e., t ∈ {0,1}). Cover combining several medications or other medical interventions Explore ways to extend this evaluation methodology for sequential observational studies. Not trivial since the space of all possible decisions grows exponentially as we progress through the course of treatment Create observational datasets that contain censored samples i.e., when only a lower bound of the survival time of some of the patients is available.  Can use such datasets to develop and evaluate survival prediction methods that can handle sample selection bias

Conclusion Our analyses show that different off-policy learning methods exhibit different competencies under different levels of sample selection bias Our proposed evaluation framework enables us to tease out these differences and select the appropriate method for each real-world application Such analysis was not possible with the existing evaluation methodology as it could not generate diverse observational datasets (in terms of selection bias)

Negar Hassanpour hassanpo@ualberta.ca Questions? Negar Hassanpour hassanpo@ualberta.ca

References [Beygelzimer and Langford, 2009] Beygelzimer, A., Langford, J.: The offset tree for learning with partial labels. In: Proceedings of the 15th ACM SIGKDD, ACM (2009) [Dudík et al., 2011] Dudík, M., Langford, J., Li, L.: Doubly robust policy evaluation and learning. International Conference on Machine Learning (2011) [Swaminathan and Joachims, 2015a] Swaminathan, A., Joachims, T.: Batch learning from logged bandit feedback through counterfactual risk minimization. JMLR 16 (2015) [Swaminathan and Joachims, 2015b] Swaminathan, A., Joachims, T.: The self-normalized estimator for counterfactual learning. In: Advances in Neural Information Processing Systems. (2015) [Swaminathan and Joachims, 2015c] Swaminathan, A., Joachims, T.: Counterfactual risk minimization: Learning from logged bandit feedback. In: International Conference on Machine Learning. (2015)