Treatment Forests: Identifying Subgroups of Enhanced Treatment Effect Using Random Forests
Padraic G. Neville, Fairport, NY; SAS, Cary, NC


2 Two Separate Objectives
Select prospects for a sales promotion
  ◦ Model: rank prospects by utility of promotion
  ◦ Black-box prediction
Identify subgroups that a drug will help
  ◦ Model: plausible characterization
  ◦ Understandable (simple) description

3 Schism of Inference Rules
Discover and replicate
  ◦ No p-values
  ◦ Requires more data
Pre-specify hypothesis
  ◦ Multiple testing limits the variety of ideas
  ◦ Requires less data

4 Modeling
Assume randomly assigned treatments
Separate models for treated, untreated
  ◦ Focus on response blurs differential response
  ◦ Differential response is a weak signal
Tree-based models most common
  ◦ Focus on differential response
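A minimal sketch of the "separate models" approach, assuming each arm's model is simply the observed response rate within a covariate cell (a hypothetical stand-in for any fitted classifier). The predicted treatment effect for a cell is the treated model's prediction minus the untreated model's prediction. The function names and the cell encoding are illustrative only.

```python
from collections import defaultdict


def cell_rates(rows):
    """Response rate per covariate cell for one arm (a deliberately trivial 'model').

    rows: (cell, y) pairs with y = 1 for a responder, 0 otherwise.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [responders, cases]
    for cell, y in rows:
        counts[cell][0] += y
        counts[cell][1] += 1
    return {cell: r / n for cell, (r, n) in counts.items()}


def two_model_uplift(treated, untreated):
    """Estimated effect per cell: P(y | treated, cell) - P(y | untreated, cell)."""
    p_treated = cell_rates(treated)
    p_untreated = cell_rates(untreated)
    return {c: p_treated[c] - p_untreated[c] for c in p_treated if c in p_untreated}


effects = two_model_uplift(
    treated=[("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0)],
    untreated=[("A", 0), ("A", 1), ("A", 0), ("B", 1), ("B", 0)],
)
print(effects)  # cell A is about 0.333, cell B is 0.0
```

Because each model is fit to the response rather than to the difference, the estimated effect carries the noise of both fits, which is one way to see why a weak differential signal gets blurred.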

5 Titanic Decision Tree
N=788 P=35%
  Female: N=288 P=67%
    1st & 2nd Class: N=156 P=89%
    3rd Class: N=132 P=41%
  Male: N=500 P=16%
    Age ≤ 10: N=12 P=75%
    Age > 10: N=488 P=15%

6 Titanic Decision Tree 2
N=801 P=34%
  Women: N=279 P=66%
    1st & 2nd Class: N=144 P=89%
    3rd Class: N=135 P=41%
  Men: N=522 P=17%
    1st Class: N=118 P=39%
    2nd & 3rd Class: N=404 P=11%

7 Fictitious Titanic Decision Tree
Randomized treatment: life jackets

8 Two Splitting Criteria


10 Simulation to Compare Criteria
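The transcript does not preserve which two criteria the talk compares. As a hypothetical stand-in, this sketch simulates randomized data in which x1 drives the response and only x2 changes the treatment effect, then asks two split criteria to choose a variable: a response criterion that scores the difference in mean response between child nodes (ignoring treatment), and a differential criterion that scores the difference in treatment effect. The data-generating rule, the decile thresholds, and the n_left * n_right weighting are all assumptions.

```python
import random


def simulate(n, seed=1):
    """Hypothetical randomized trial: x1 shifts the baseline response, and
    treatment adds 0.2 to the response probability only when x2 > 0.5."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        w = rng.random() < 0.5  # randomly assigned treatment
        p = 0.2 + (0.5 if x1 > 0.5 else 0.0) + (0.2 if w and x2 > 0.5 else 0.0)
        rows.append({"x1": x1, "x2": x2, "w": w, "y": int(rng.random() < p)})
    return rows


def mean(values):
    return sum(values) / len(values)


def effect(rows):
    """Treated response rate minus untreated response rate."""
    return (mean([r["y"] for r in rows if r["w"]])
            - mean([r["y"] for r in rows if not r["w"]]))


def response_score(left, right):
    """Proportional to the drop in squared error of y; ignores treatment."""
    d = mean([r["y"] for r in left]) - mean([r["y"] for r in right])
    return len(left) * len(right) * d ** 2


def differential_score(left, right):
    """Weighted squared difference in treatment effect between the children."""
    d = effect(left) - effect(right)
    return len(left) * len(right) * d ** 2


def best_variable(rows, score, variables=("x1", "x2")):
    """Variable whose best decile threshold maximizes the given score."""
    best_v, best_s = None, float("-inf")
    for v in variables:
        for thr in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
            left = [r for r in rows if r[v] <= thr]
            right = [r for r in rows if r[v] > thr]
            s = score(left, right)
            if s > best_s:
                best_v, best_s = v, s
    return best_v


rows = simulate(4000)
by_response = best_variable(rows, response_score)
by_difference = best_variable(rows, differential_score)
print(by_response, by_difference)
```

With this setup the response criterion is expected to pick x1 and the differential criterion x2, which illustrates how a split search focused on response can miss the variable that actually modifies the treatment effect.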


12 MineThatData Data
Kevin Hillstrom’s 2008 challenge
Data (N=42,693):
  ◦ Customers who purchased within the last year
Treatment (N=21,387):
  ◦ Promotion of women’s merchandise
Response:
  ◦ Customer visited the website in the next two weeks
Challenge:
  ◦ Rank customers by effect of treatment

13 MineThatData Covariates

14 Random Forest
Average prediction over many trees
To create different trees:
  ◦ Use different samples
  ◦ Exclude variables from a split search
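A minimal sketch of a treatment forest under simplifying assumptions: each tree is a single split (a stump) chosen by a differential-response score, each tree is grown on its own bootstrap sample, and each tree's split search is restricted to a random subset of the variables. The forest's prediction for a case is the average of the leaf treatment effects across trees. The simulated data, the fixed candidate thresholds, and every name here are hypothetical, not the presentation's implementation. (Standard random forests sample variables at every split; with one-split trees that is the same as sampling per tree.)

```python
import random


def simulate(n, seed=0):
    """Hypothetical trial: treatment adds 0.2 to response probability only when x2 > 0.5.
    x1 has no effect at all."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        w = rng.random() < 0.5
        p = 0.3 + (0.2 if w and x2 > 0.5 else 0.0)
        rows.append({"x1": x1, "x2": x2, "w": w, "y": int(rng.random() < p)})
    return rows


def effect(rows):
    """Treated response rate minus untreated response rate (None if an arm is empty)."""
    t = [r["y"] for r in rows if r["w"]]
    c = [r["y"] for r in rows if not r["w"]]
    if not t or not c:
        return None
    return sum(t) / len(t) - sum(c) / len(c)


def fit_stump(rows, variables, min_leaf=50):
    """One-split tree chosen by weighted squared difference in treatment effect."""
    best = None
    for v in variables:
        for thr in (0.25, 0.5, 0.75):  # hypothetical fixed candidate thresholds
            left = [r for r in rows if r[v] <= thr]
            right = [r for r in rows if r[v] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            e_left, e_right = effect(left), effect(right)
            if e_left is None or e_right is None:
                continue
            score = len(left) * len(right) * (e_left - e_right) ** 2
            if best is None or score > best[0]:
                best = (score, v, thr, e_left, e_right)
    return best


def fit_forest(rows, variables, n_trees=30, m_try=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # a different bootstrap sample per tree
        allowed = rng.sample(variables, m_try)      # other variables excluded from the search
        stump = fit_stump(sample, allowed)
        if stump is not None:
            forest.append(stump[1:])                # keep (variable, threshold, leaf effects)
    return forest


def predict(forest, x):
    """Average, over trees, of the treatment effect in the leaf that x falls into."""
    effects = [e_l if x[v] <= thr else e_r for v, thr, e_l, e_r in forest]
    return sum(effects) / len(effects)


forest = fit_forest(simulate(2000), ["x1", "x2"])
high = predict(forest, {"x1": 0.5, "x2": 0.9})
low = predict(forest, {"x1": 0.5, "x2": 0.1})
print(round(high, 3), round(low, 3))
```

Trees that happen to split on the irrelevant x1 send both query cases to the same leaf, so only the x2 trees separate them; averaging over many trees keeps any single noisy split from dominating.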

15 Data Roles
Data 100.0% N=42,693
  ◦ Model 50.0% N=21,347
      - Train 25.0% N=10,673
      - Out-Of-Bag 12.5% N=5,336
      - Prune 12.5% N=5,337
  ◦ Test 50.0% N=21,346
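A sketch of how cases might be randomly assigned to these roles. The helper name is hypothetical, and the rule for rounding odd counts is an assumption, so individual counts can differ by one case from the slide.

```python
import random


def assign_roles(n, seed=0):
    """Hypothetical helper: shuffle case indices, then cut them into the slide's roles."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = n // 2                   # Test: 50% of cases
    n_model = n - n_test              # Model: the remaining 50%
    n_train = n_model // 2            # Train: half of Model (25% overall)
    n_oob = (n_model - n_train) // 2  # Out-Of-Bag: 12.5% overall
    model, test = idx[:n_model], idx[n_model:]
    return {
        "train": model[:n_train],
        "out_of_bag": model[n_train:n_train + n_oob],
        "prune": model[n_train + n_oob:],  # Prune: the rest of Model (12.5% overall)
        "test": test,
    }


roles = assign_roles(42693)
print({role: len(cases) for role, cases in roles.items()})
# {'train': 10673, 'out_of_bag': 5337, 'prune': 5337, 'test': 21346}
```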

16 Cumulative Lift
1. Use treatment test data
2. Use forest to predict treatment effect
3. Sort by predicted treatment effect
4. Cumulate count of responders
5. Plot count as proportion vs. percent of cases
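The five steps can be sketched as follows, assuming the forest's predictions for the treated test cases are already available (step 2) and reading "count as proportion" as the share of all responders captured so far. Dividing by the number of cases so far (a running response rate) would be another reasonable reading. The function name is hypothetical.

```python
def cumulative_lift(pred_effect, y):
    """Cumulative lift points for treated test cases.

    pred_effect: predicted treatment effect per treated test case (step 2).
    y: observed response (1 = responder, 0 = not) for the same cases.
    Returns (proportion of cases, proportion of all responders) pairs.
    """
    total = sum(y)
    if total == 0:
        raise ValueError("no responders to cumulate")
    # Step 3: sort cases from highest to lowest predicted treatment effect.
    order = sorted(range(len(y)), key=lambda i: pred_effect[i], reverse=True)
    points, cum = [], 0
    for k, i in enumerate(order, start=1):
        cum += y[i]                               # Step 4: cumulate responders
        points.append((k / len(y), cum / total))  # Step 5: both axes as proportions
    return points


points = cumulative_lift([0.3, 0.1, 0.2, 0.0], [1, 0, 1, 0])
print(points)  # [(0.25, 0.5), (0.5, 1.0), (0.75, 1.0), (1.0, 1.0)]
```

A curve that rises steeply at the left means the cases the forest ranks as most helped by treatment are indeed concentrated among responders.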

17 Cumulative Lift of Treatment Test Cases
[Figure: (1) predict treatment effect, (2) sort by prediction, (3) cumulate Y; x-axis: percent of treatment test cases, ordered from predicted good to predicted poor]

18 Cumulative Response and Uplift in Test Data
[Figure: curves for treated population, untreated population, uplift (difference), and uplift from random prediction; x-axis: percent of population]
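A sketch of how such curves could be computed, assuming each curve plots the response rate within the top fraction of its population when cases are ranked by the forest's predicted treatment effect; whether the original figure plots rates or counts is not recoverable from the transcript. Under that assumption a random ranking would be expected to give, at every fraction, the overall difference in response rates, which equals the uplift at fraction 1.0. Function names are hypothetical.

```python
def top_rate(scores, y, frac):
    """Response rate among the top `frac` of cases ranked by predicted effect."""
    k = max(1, round(frac * len(y)))
    top = sorted(range(len(y)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(y[i] for i in top) / k


def uplift_curve(treated, untreated, fracs=(0.25, 0.5, 0.75, 1.0)):
    """treated, untreated: (scores, y) pairs for the two test populations.

    Returns (frac, treated rate, untreated rate, uplift) rows.
    """
    rows = []
    for f in fracs:
        rate_t = top_rate(*treated, f)
        rate_u = top_rate(*untreated, f)
        rows.append((f, rate_t, rate_u, rate_t - rate_u))
    return rows


curve = uplift_curve(
    treated=([0.9, 0.8, 0.1, 0.0], [1, 1, 0, 0]),
    untreated=([0.9, 0.8, 0.1, 0.0], [0, 1, 0, 1]),
)
for row in curve:
    print(row)
```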

19 OOB Treatment Effect vs Cluster of Subgroups
113 subgroups (leaves), 51 trees, 10 clusters
[Figure: OOB treatment effect by cluster]

Treatment Effect | Overall | Cluster 3
Train            | 0.046   | 0.075
Test             | 0.044   | 0.071
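A sketch of the comparison in the table, assuming "treatment effect" means the response rate of treated cases minus that of untreated cases, computed once over all cases and once within each cluster. The transcript does not say how the 113 leaves were grouped into clusters, so cluster labels are taken as given; the function names are hypothetical.

```python
from collections import defaultdict


def treatment_effect(cases):
    """Response rate of treated cases minus response rate of untreated cases."""
    treated = [y for w, y in cases if w]
    untreated = [y for w, y in cases if not w]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def effect_by_cluster(rows):
    """rows: (cluster, treated_flag, y) triples. Returns (overall, {cluster: effect})."""
    groups = defaultdict(list)
    for cluster, w, y in rows:
        groups[cluster].append((w, y))
    overall = treatment_effect([(w, y) for _, w, y in rows])
    return overall, {c: treatment_effect(cases) for c, cases in groups.items()}


overall, by_cluster = effect_by_cluster([
    (1, True, 1), (1, True, 1), (1, False, 0), (1, False, 1),
    (2, True, 0), (2, True, 1), (2, False, 0), (2, False, 1),
])
print(overall, by_cluster)  # 0.25 {1: 0.5, 2: 0.0}
```

Running the same computation on training and test cases separately gives the two rows of the table; a cluster whose effect stays well above the overall effect on test data, as Cluster 3 does, is a candidate subgroup of enhanced treatment effect.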

20 Thank you for your attention! Padraic.Neville@SAS.com

