Published by Esteban Camm. Modified over 9 years ago.
1
Treatment Forests: Identifying Subgroups of Enhanced Treatment Effect Using Random Forests
Padraic G. Neville
Fairport, NY
SAS, Cary, NC
2
Two Separate Objectives
Select prospects for a sales promotion
  ◦ Model: rank prospects by utility of promotion
  ◦ Black-box prediction
Identify subgroups that a drug will help
  ◦ Model: plausible characterization
  ◦ Understandable (simple) description
3
Schism of Inference Rules
Discover and replicate
  ◦ No p-values
  ◦ Requires more data
Pre-specify hypothesis
  ◦ Multiple testing limits variety of ideas
  ◦ Requires less data
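Why pre-specification limits the variety of ideas can be made concrete with the Bonferroni correction, one common (and conservative) way to keep the chance of any false positive across several tests at or below a chosen level. It is used here purely as an illustration; the slide does not say which adjustment applies. With m pre-specified hypotheses at overall level α, each individual test must meet α/m. The function name is hypothetical.

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test significance level under a Bonferroni correction, which keeps
    the chance of any false positive among m tests at or below alpha."""
    if m < 1:
        raise ValueError("need at least one hypothesis")
    return alpha / m

# The more ideas pre-specified, the harder each one is to confirm.
for m in (1, 5, 20, 100):
    print(f"{m:>3} hypotheses -> per-test alpha = {bonferroni_alpha(0.05, m):.4f}")
```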
4
Modeling
Assume randomly assigned treatments
Separate models for treated, untreated
  ◦ Focus on response blurs differential response
  ◦ Differential response is a weak signal
Tree-based models most common
  ◦ Focus on differential response
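A minimal sketch of the "separate models" approach, assuming a 0/1 response and a single categorical covariate: each group gets its own model (here just a response rate per covariate value, standing in for any response model), and the predicted treatment effect is the difference of the two predictions. Because each model is fit to the response rather than to the difference, a weak differential signal can be swamped, which is the blurring noted above. Function names and data are hypothetical.

```python
from collections import defaultdict

def fit_rate_model(rows):
    """Cell-mean model: response rate for each covariate value.
    rows: iterable of (x, y) with x hashable and y in {0, 1}."""
    totals = defaultdict(lambda: [0, 0])  # x -> [responders, cases]
    for x, y in rows:
        totals[x][0] += y
        totals[x][1] += 1
    return {x: r / n for x, (r, n) in totals.items()}

def two_model_uplift(data):
    """Fit separate models to treated and untreated cases; the predicted
    treatment effect for x is treated prediction minus untreated prediction.
    data: iterable of (x, treated, y)."""
    model_t = fit_rate_model((x, y) for x, t, y in data if t)
    model_u = fit_rate_model((x, y) for x, t, y in data if not t)
    # Only covariate values seen in both groups get a prediction.
    return {x: model_t[x] - model_u[x] for x in model_t.keys() & model_u.keys()}

data = [("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1),
        ("B", 1, 0), ("B", 1, 1), ("B", 0, 0), ("B", 0, 1)]
print(two_model_uplift(data))
```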
5
TITANIC DECISION TREE
Root: N=788 P=35%
  Female: N=288 P=67%
    1st & 2nd Class: N=156 P=89%
    3rd Class: N=132 P=41%
  Male: N=500 P=16%
    Age ≤ 10: N=12 P=75%
    Age > 10: N=488 P=15%
6
TITANIC DECISION TREE 2
Root: N=801 P=34%
  Women: N=279 P=66%
    1st & 2nd Class: N=144 P=89%
    3rd Class: N=135 P=41%
  Men: N=522 P=17%
    1st Class: N=118 P=39%
    2nd & 3rd Class: N=404 P=11%
7
FICTITIOUS TITANIC DECISION TREE
Randomized Treatment: Life Jackets
8
Two Splitting Criteria
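The two criteria on this slide are not listed here. As a hypothetical example of a criterion that focuses on differential response rather than response, the sketch below scores a split `x[var] <= threshold` by the squared difference between the treatment effects (treated minus untreated response rate) estimated in the two child nodes, and searches every variable and observed threshold. It is not claimed to be either criterion from the talk; names and data are illustrative.

```python
def effect(rows):
    """Treated minus untreated response rate for rows of (treated, y).
    Returns None if either group is empty."""
    t = [y for tr, y in rows if tr]
    u = [y for tr, y in rows if not tr]
    if not t or not u:
        return None
    return sum(t) / len(t) - sum(u) / len(u)

def split_score(data, var, threshold):
    """Squared difference of child treatment effects for x[var] <= threshold.
    data: list of (x, treated, y) with x a dict of numeric covariates."""
    left = [(t, y) for x, t, y in data if x[var] <= threshold]
    right = [(t, y) for x, t, y in data if x[var] > threshold]
    tau_l, tau_r = effect(left), effect(right)
    if tau_l is None or tau_r is None:
        return None  # a child lacks treated or untreated cases
    return (tau_l - tau_r) ** 2

def best_split(data, variables):
    """Exhaustive search; returns (score, variable, threshold) or None."""
    best = None
    for var in variables:
        for threshold in sorted({x[var] for x, _, _ in data}):
            s = split_score(data, var, threshold)
            if s is not None and (best is None or s > best[0]):
                best = (s, var, threshold)
    return best

# Toy data: the treatment helps only when age <= 10; fare is uninformative.
data = [({"age": 5, "fare": 1}, 1, 1), ({"age": 5, "fare": 1}, 0, 0),
        ({"age": 8, "fare": 2}, 1, 1), ({"age": 8, "fare": 2}, 0, 0),
        ({"age": 30, "fare": 2}, 1, 0), ({"age": 30, "fare": 2}, 0, 0),
        ({"age": 40, "fare": 1}, 1, 1), ({"age": 40, "fare": 1}, 0, 1)]
print(best_split(data, ["age", "fare"]))
```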
10
Simulation to Compare Criteria
12
MineThatData Data
Kevin Hillstrom’s 2008 challenge
Data (N=42,693):
  ◦ Customers who purchased within the last year
Treatment (N=21,387):
  ◦ Promotion of women’s merchandise
Response:
  ◦ Customer visited the website in the next two weeks
Challenge:
  ◦ Rank customers by effect of treatment
13
MineThatData Covariates
14
Random Forest
Average prediction over many trees
To create different trees:
  ◦ Use different samples
  ◦ Exclude variables from a split search
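A toy sketch of the two sources of diversity listed above, assuming one-split trees (stumps), a squared-difference-of-child-effects split score as a stand-in criterion, and subsampling without replacement (bootstrap sampling is the more common random forest choice). Each tree sees a different random sample of cases and may split only on a random subset of variables; the forest averages the trees' predicted treatment effects. All names and defaults are hypothetical.

```python
import random

def effect(rows):
    """Treated minus untreated response rate for rows of (x, treated, y);
    returns None if either group is empty."""
    t = [y for _, tr, y in rows if tr]
    u = [y for _, tr, y in rows if not tr]
    if not t or not u:
        return None
    return sum(t) / len(t) - sum(u) / len(u)

def fit_stump(rows, variables):
    """One-split tree: pick the split with the largest squared difference
    between child treatment effects; each leaf predicts its own effect."""
    best = None
    for var in variables:
        for thr in sorted({x[var] for x, _, _ in rows}):
            tau_l = effect([r for r in rows if r[0][var] <= thr])
            tau_r = effect([r for r in rows if r[0][var] > thr])
            if tau_l is None or tau_r is None:
                continue  # a child lacks treated or untreated cases
            score = (tau_l - tau_r) ** 2
            if best is None or score > best[0]:
                best = (score, var, thr, tau_l, tau_r)
    if best is None:
        # No usable split: predict the node's overall effect (0.0 if undefined).
        root = effect(rows)
        root = 0.0 if root is None else root
        return lambda x: root
    _, var, thr, tau_l, tau_r = best
    return lambda x: tau_l if x[var] <= thr else tau_r

def fit_forest(rows, variables, n_trees=50, sample_frac=0.5, n_vars=1, seed=0):
    """Grow trees on different random samples, each restricted to n_vars
    randomly chosen variables; predict by averaging the trees."""
    rng = random.Random(seed)
    k = max(1, int(sample_frac * len(rows)))
    trees = []
    for _ in range(n_trees):
        sample = rng.sample(rows, k)              # different cases per tree
        allowed = rng.sample(variables, n_vars)   # other variables excluded
        trees.append(fit_stump(sample, allowed))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)

# Toy data: treatment helps only when age <= 10; fare carries no effect.
rows = [({"age": a, "fare": a % 3}, t, int(t == 1 and a <= 10))
        for a in range(1, 41) for t in (0, 1)]
forest = fit_forest(rows, ["age", "fare"])
print(forest({"age": 5, "fare": 0}), forest({"age": 30, "fare": 0}))
```

Averaging many stumps built on different samples and variable subsets smooths out the instability of any single tree.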
15
Data Roles
Data: 100.0%, N=42,693
  ◦ Model: 50.0%, N=21,347
      ◦ Train: 25.0%, N=10,673
      ◦ Out-Of-Bag: 12.5%, N=5,336
      ◦ Prune: 12.5%, N=5,337
  ◦ Test: 50.0%, N=21,346
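The role assignment can be sketched as a random partition of case indices, assuming the percentages are fractions of the full data and that Test receives whatever is left over. `assign_roles` is a hypothetical helper; because of rounding, its counts can differ by one from the slide's (here 5,337 rather than 5,336 Out-Of-Bag).

```python
import random

def assign_roles(n, seed=0):
    """Randomly split case indices 0..n-1 into Train (25%), Out-Of-Bag (12.5%)
    and Prune (12.5%), which together form the Model data, and Test (the rest)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    roles, start = {}, 0
    for name, frac in (("train", 0.25), ("oob", 0.125), ("prune", 0.125)):
        size = round(frac * n)
        roles[name] = idx[start:start + size]
        start += size
    roles["test"] = idx[start:]  # remainder, about 50%
    return roles

roles = assign_roles(42_693)
print({name: len(cases) for name, cases in roles.items()})
```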
16
Cumulative Lift
1. Use treatment test data
2. Use forest to predict treatment effect
3. Sort by predicted treatment effect
4. Cumulate count of responders
5. Plot cumulative count as a proportion vs. percent of cases
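Steps 3 to 5 can be sketched as follows, assuming the forest's predicted treatment effects for the treated test cases (steps 1 and 2) are already available as a list and responses are coded 0/1. `cumulative_lift` is a hypothetical name.

```python
def cumulative_lift(predicted_effect, responded):
    """Return (percent_of_cases, proportion_of_responders) points.
    predicted_effect: predicted treatment effect per treated test case.
    responded: 1 if that case responded, else 0."""
    total = sum(responded)
    if total == 0:
        raise ValueError("no responders to cumulate")
    # Step 3: order cases from highest to lowest predicted effect.
    order = sorted(range(len(predicted_effect)),
                   key=lambda i: predicted_effect[i], reverse=True)
    points, cum = [], 0
    for rank, i in enumerate(order, start=1):
        cum += responded[i]                                       # step 4
        points.append((100.0 * rank / len(order), cum / total))  # step 5
    return points

print(cumulative_lift([0.9, 0.1, 0.5, 0.3], [1, 0, 1, 0]))
```

A good forest puts responders early, so the curve rises steeply at the left.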
17
[Figure: Cumulative Lift of Treatment Test Cases. Steps: (1) predict treatment effect, (2) sort by prediction, (3) cumulate Y. x-axis: percent of treatment test cases, from predicted good to predicted poor.]
18
[Figure: Cumulative Response and Uplift in Test Data. Curves: treated population, untreated population, uplift (difference), uplift from random prediction. x-axis: percent of population.]
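One plausible way to compute such curves, assuming each group's cumulative response is the number of responders among its top p% of cases (ranked by predicted effect) divided by the group's size, so the treated and untreated curves are comparable. The uplift is their difference, and the random-prediction baseline is taken to be a straight line from 0 to the overall effect. These normalization choices and the function names are assumptions, not taken from the slide.

```python
import math

def cumulative_response(pred, y, pct):
    """Responders among the top pct% of cases (ranked by predicted effect),
    divided by the number of cases in the group."""
    order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
    k = math.ceil(len(order) * pct / 100)
    return sum(y[i] for i in order[:k]) / len(order)

def uplift_curve(treated, untreated, percents=range(10, 101, 10)):
    """treated, untreated: (predictions, responses) for each test group.
    Returns rows of (pct, treated, untreated, uplift, random_uplift)."""
    overall = (cumulative_response(*treated, 100)
               - cumulative_response(*untreated, 100))
    rows = []
    for p in percents:
        t = cumulative_response(*treated, p)
        u = cumulative_response(*untreated, p)
        # Random ranking accrues the overall effect linearly in p.
        rows.append((p, t, u, t - u, overall * p / 100))
    return rows

treated = ([0.9, 0.7, 0.2, 0.1], [1, 1, 0, 0])
untreated = ([0.8, 0.6, 0.3, 0.0], [0, 0, 1, 0])
for row in uplift_curve(treated, untreated, percents=(50, 100)):
    print(row)
```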
19
113 subgroups (leaves) from 51 trees, grouped into 10 clusters
Treatment effect   Overall   Cluster 3
Train              0.046     0.075
Test               0.044     0.071
[Figure: cluster OOB treatment effect vs. cluster of subgroups]
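The table compares the overall treatment effect with that of cluster 3. A minimal sketch of such a comparison, assuming a 0/1 response and that each case already carries a cluster label (how the 113 leaves were clustered is not shown here): the effect in a group is its treated response rate minus its untreated response rate. `effects_by_group` and the toy labels are hypothetical.

```python
from collections import defaultdict

def effects_by_group(cases):
    """Treated minus untreated response rate for each group and overall.
    cases: iterable of (group, treated, y) with y in {0, 1}."""
    # key -> [treated responders, treated n, untreated responders, untreated n]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for group, treated, y in cases:
        for key in (group, "overall"):
            c = counts[key]
            offset = 0 if treated else 2
            c[offset] += y
            c[offset + 1] += 1
    # Groups missing either arm have no estimable effect and are skipped.
    return {key: c[0] / c[1] - c[2] / c[3]
            for key, c in counts.items() if c[1] and c[3]}

cases = [("c3", 1, 1), ("c3", 1, 1), ("c3", 0, 1), ("c3", 0, 0),
         ("c1", 1, 0), ("c1", 1, 1), ("c1", 0, 1), ("c1", 0, 0)]
print(effects_by_group(cases))
```

Running the same function separately on training and test cases gives the two rows of a table like the one above.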
20
Thank you for your attention! Padraic.Neville@SAS.com