Treatment Forests: Identifying Subgroups of Enhanced Treatment Effect Using Random Forests
Padraic G. Neville, Fairport, NY; SAS, Cary, NC

Presentation transcript:

Two Separate Objectives
Select prospects for a sales promotion
◦ Model: rank prospects by utility of promotion
◦ Black-box prediction
Identify subgroups that a drug will help
◦ Model: plausible characterization
◦ Understandable (simple) description

Schism of Inference Rules
Discover and replicate
◦ No p-values
◦ Requires more data
Pre-specify hypothesis
◦ Multiple testing limits variety of ideas
◦ Requires less data

Modeling
Assume randomly assigned treatments
Separate models for treated, untreated
◦ Focus on response blurs differential response
◦ Differential response is a weak signal
Tree-based models most common
◦ Focus on differential response
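A minimal sketch of what "differential response" means under random assignment: within any subgroup, the treatment effect can be estimated as the response rate of treated cases minus the response rate of untreated cases. The function names, the row layout, and the toy data below are assumptions made for illustration; they are not taken from the talk.

```python
def response_rate(responses):
    """Share of cases that responded; NaN when the group is empty."""
    responses = list(responses)
    return sum(responses) / len(responses) if responses else float("nan")


def differential_response(rows, in_subgroup=lambda x: True):
    """Treated response rate minus untreated response rate in a subgroup.

    Each row is (covariates dict, treated flag 0/1, response 0/1).
    The difference is a fair effect estimate only if treatment was
    randomly assigned, as the slide assumes.
    """
    treated = [y for x, t, y in rows if t == 1 and in_subgroup(x)]
    untreated = [y for x, t, y in rows if t == 0 and in_subgroup(x)]
    return response_rate(treated) - response_rate(untreated)


# Hypothetical toy data: one binary covariate named "newbie".
toy_rows = [
    ({"newbie": 1}, 1, 1), ({"newbie": 1}, 1, 1),
    ({"newbie": 1}, 0, 0), ({"newbie": 1}, 0, 1),
    ({"newbie": 0}, 1, 0), ({"newbie": 0}, 1, 1),
    ({"newbie": 0}, 0, 1), ({"newbie": 0}, 0, 0),
]

overall_effect = differential_response(toy_rows)
newbie_effect = differential_response(toy_rows, lambda x: x["newbie"] == 1)
```

In this toy data the subgroup effect is larger than the overall effect, which is the kind of subgroup a split on differential response would try to isolate.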

TITANIC DECISION TREE
N=788, P=35%
  Female: N=288, P=67%
    1st & 2nd Class: N=156, P=89%
    3rd Class: N=132, P=41%
  Male: N=500, P=16%
    Age ≤ 10: N=12, P=75%
    Age > 10: N=488, P=15%

TITANIC DECISION TREE 2
N=801, P=34%
  Women: N=279, P=66%
    1st & 2nd Class: N=144, P=89%
    3rd Class: N=135, P=41%
  Men: N=522, P=17%
    1st Class: N=118, P=39%
    2nd & 3rd Class: N=404, P=11%

FICTITIOUS TITANIC DECISION TREE
Randomized Treatment: Life Jackets

Two Splitting Criteria

Simulation to Compare Criteria

MineThatData Data
Kevin Hillstrom’s 2008 challenge
Data (N=42,693):
◦ Customers who purchased within last year
Treatment (N=21,387):
◦ Promotion of Women’s merchandise
Response:
◦ Customer visited website in next two weeks
Challenge:
◦ Rank customers by effect of treatment

MineThatData Covariates

Random Forest
Average prediction over many trees
To create different trees:
◦ Use different samples
◦ Exclude variables from a split search
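The two sources of tree diversity named on this slide can be sketched with a deliberately tiny forest. This is an illustration under stated assumptions, not the talk's implementation: each "tree" is a single-split stump on binary features, "different samples" is taken to mean bootstrap resampling, and the helper names (`fit_stump`, `fit_forest`, `predict`) are hypothetical.

```python
import random


def fit_stump(rows, features):
    """Fit a one-split tree using only the allowed features.

    rows: list of (covariates dict with 0/1 values, numeric response).
    Chooses the feature whose split gives the lowest squared error.
    """
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    overall = mean([y for _, y in rows])
    best = None
    for f in features:
        left = [y for x, y in rows if x[f] == 0]
        right = [y for x, y in rows if x[f] == 1]
        if not left or not right:
            continue  # this feature cannot split the sample
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, f, mean(left), mean(right))
    if best is None:
        return lambda x: overall
    _, f, m0, m1 = best
    return lambda x: m1 if x[f] == 1 else m0


def fit_forest(rows, n_trees=25, n_candidate_features=1, seed=0):
    """Grow trees that differ by sample and by which variables they may use."""
    rng = random.Random(seed)
    all_features = sorted(rows[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # a different sample per tree
        candidates = rng.sample(all_features, n_candidate_features)  # others excluded
        trees.append(fit_stump(sample, candidates))
    return trees


def predict(trees, x):
    """Average the predictions of all trees."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical toy data: the response equals feature "a"; "b" is noise.
toy = [({"a": i % 2, "b": (i // 2) % 2}, i % 2) for i in range(40)]
forest = fit_forest(toy)
high = predict(forest, {"a": 1, "b": 0})
low = predict(forest, {"a": 0, "b": 0})
```

Because some trees are barred from splitting on `a`, no single tree dominates, yet the averaged prediction still ranks the `a = 1` case above the `a = 0` case.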

Data Roles
Data 100.0% N=42,693
◦ Model 50.0% N=21,347
  ▪ Train 25.0% N=10,673
  ▪ Out-Of-Bag 12.5% N=5,336
  ▪ Prune 12.5% N=5,337
◦ Test 50.0% N=21,346
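One way to produce a partition of this shape is to shuffle case indices once and slice them. The sketch below assumes the four roles are fixed, disjoint random subsets; the function name `split_roles` and the rounding rule are assumptions, so the counts can differ from the slide's by a case.

```python
import random


def split_roles(n_cases, seed=0):
    """Randomly assign case indices to train, out-of-bag, prune and test roles.

    Roughly 50% test, 25% train, 12.5% out-of-bag, 12.5% prune.
    """
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)

    n_model = (n_cases + 1) // 2          # model half, rounded up
    n_train = n_model // 2                # half of the model data
    n_oob = (n_model - n_train) // 2      # half of what remains

    model, test = indices[:n_model], indices[n_model:]
    return {
        "train": model[:n_train],
        "oob": model[n_train:n_train + n_oob],
        "prune": model[n_train + n_oob:],
        "test": test,
    }


roles = split_roles(42693)
sizes = {name: len(cases) for name, cases in roles.items()}
```

Keeping the test half untouched until the end is what lets the later lift and uplift charts serve as an honest check on the forest.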

Cumulative Lift
1. Use treatment test data
2. Use forest to predict treatment effect
3. Sort by predicted treatment effect
4. Cumulate count of responders
5. Plot count as proportion vs percent cases
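The five steps can be sketched as a short function. It assumes the predicted treatment effects are already available for the treated test cases, and it reads "count as proportion" as the share of all responders captured so far; both points, and the name `cumulative_lift`, are assumptions for illustration.

```python
def cumulative_lift(predicted_effect, responded):
    """Return (percent of cases, proportion of responders captured) points.

    predicted_effect: predicted treatment effect per treated test case.
    responded: 0/1 response per case, in the same order.
    """
    # Step 3: sort cases from highest to lowest predicted effect.
    order = sorted(range(len(responded)),
                   key=lambda i: predicted_effect[i], reverse=True)
    total = sum(responded)
    points = []
    running = 0
    for rank, i in enumerate(order, start=1):
        running += responded[i]                    # step 4: cumulate responders
        percent_cases = 100.0 * rank / len(order)
        proportion = running / total if total else 0.0
        points.append((percent_cases, proportion))  # step 5: points to plot
    return points


# Hypothetical toy example with four treated test cases.
curve = cumulative_lift([0.9, 0.1, 0.5, 0.3], [1, 0, 1, 0])
```

A curve that rises well above the diagonal early on means the forest places responders near the top of its ranking.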

Figure: Cumulative Lift of Treatment Test Cases. Annotated steps: 1 Predict Treatment Effect, 2 Sort by Prediction, 3 Cumulate Y. Horizontal axis: percent of treatment test cases, ordered from predicted good to predicted poor.

Figure: Cumulative Response and Uplift in Test Data. Curves: treated population, untreated population, uplift (difference), uplift from random prediction. Horizontal axis: percent of population.
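The four curves named on this slide can be computed along the following lines. This is a sketch under assumptions: both groups are ranked by the same predicted-effect score, cumulative response is expressed as a fraction of each group's size, and the random-prediction baseline is taken to be a straight line ending at the overall uplift. The function names are hypothetical.

```python
def cumulative_response(scores, responded, n_points=10):
    """Cumulative responders, as a fraction of the group, at equal steps."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    curve = []
    for k in range(1, n_points + 1):
        cutoff = round(n * k / n_points)  # cases in the top k/n_points share
        hits = sum(responded[i] for i in order[:cutoff])
        curve.append(hits / n)
    return curve


def uplift_curves(t_scores, t_resp, u_scores, u_resp, n_points=10):
    """Return treated, untreated, uplift and random-baseline curves."""
    treated = cumulative_response(t_scores, t_resp, n_points)
    untreated = cumulative_response(u_scores, u_resp, n_points)
    uplift = [a - b for a, b in zip(treated, untreated)]
    overall = uplift[-1]  # uplift when the whole population is included
    random_line = [overall * k / n_points for k in range(1, n_points + 1)]
    return treated, untreated, uplift, random_line


# Hypothetical toy data: four treated and four untreated test cases.
treated, untreated, uplift, random_line = uplift_curves(
    [0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0],
    [0.9, 0.8, 0.2, 0.1], [0, 0, 1, 0],
    n_points=2,
)
```

An uplift curve that sits above the random line indicates that the ranking concentrates the treatment effect among the cases it scores highest.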

Figure: Treatment Effect vs Cluster of Subgroups. 113 subgroups (leaves), 51 trees, 10 clusters. Labels: Treatment Effect, Overall, Cluster 3 (Train:, Test:), Cluster, OOB.

Thank you for your attention!