Treatment Forests: Identifying Subgroups of Enhanced Treatment Effect Using Random Forests
Padraic G. Neville, Fairport, NY
SAS, Cary, NC

Two Separate Objectives
Select prospects for a sales promotion
  ◦ Model: rank prospects by utility of promotion
  ◦ Black-box prediction
Identify subgroups that a drug will help
  ◦ Model: plausible characterization
  ◦ Understandable (simple) description

Schism of Inference Rules
Discover and replicate
  ◦ No p-values
  ◦ Requires more data
Pre-specify hypothesis
  ◦ Multiple-testing limits variety of ideas
  ◦ Requires less data

Modeling
Assume randomly assigned treatments
Separate models for treated, untreated
  ◦ Focus on response blurs differential response
  ◦ Differential response is a weak signal
Tree-based models most common
  ◦ Focus on differential response
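
A minimal sketch of what "differential response" means inside one subgroup, assuming randomly assigned treatments as the slide does: the response rate of treated cases minus the response rate of untreated cases. The function name, the record layout, and the example counts are illustrative assumptions, not taken from the talk.

```python
def treatment_effect(records):
    """Differential response in a subgroup.

    records: iterable of (treated, responded) pairs, each 0 or 1.
    Returns the treated response rate minus the untreated response rate.
    """
    treated = [y for t, y in records if t == 1]
    untreated = [y for t, y in records if t == 0]
    if not treated or not untreated:
        raise ValueError("need at least one treated and one untreated case")
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


# Hypothetical subgroup: 4 treated cases (3 respond), 4 untreated (1 responds).
subgroup = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
effect = treatment_effect(subgroup)  # 0.75 - 0.25
```

A model fit to response alone can rank this subgroup highly just because its members respond often; the difference above isolates the part of the response attributable to the treatment.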

TITANIC DECISION TREE
Root: N=788, P=35%
  Female: N=288, P=67%
    1st & 2nd Class: N=156, P=89%
    3rd Class: N=132, P=41%
  Male: N=500, P=16%
    Age ≤ 10: N=12, P=75%
    Age > 10: N=488, P=15%

TITANIC DECISION TREE 2
Root: N=801, P=34%
  Women: N=279, P=66%
    1st & 2nd Class: N=144, P=89%
    3rd Class: N=135, P=41%
  Men: N=522, P=17%
    1st Class: N=118, P=39%
    2nd & 3rd Class: N=404, P=11%

FICTITIOUS TITANIC DECISION TREE
Randomized Treatment: Life Jackets

Two Splitting Criteria

Simulation to Compare Criteria

MineThatData Data
Kevin Hillstrom’s 2008 challenge
Data (N=42,693):
  ◦ Customers who purchased within last year
Treatment (N=21,387):
  ◦ Promotion of Women’s merchandise
Response:
  ◦ Customer visited website in next two weeks
Challenge:
  ◦ Rank customers by effect of treatment

MineThatData Covariates

Random Forest
Average prediction over many trees
To create different trees:
  ◦ Use different samples
  ◦ Exclude variables from a split search
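
The two randomization devices and the averaging step can be sketched as follows. This is a toy illustration, not the procedure used in the talk: each "tree" is a single split on a binary covariate, and the split criterion (largest gap in treatment effect between the two sides) is a stand-in chosen for brevity. All names and the synthetic data are assumptions.

```python
import random


def effect(rows):
    """Treated minus untreated response rate; 0.0 if either arm is empty."""
    treated = [y for _, t, y in rows if t == 1]
    untreated = [y for _, t, y in rows if t == 0]
    if not treated or not untreated:
        return 0.0
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def fit_stump(rows, allowed):
    """One-split 'tree' that searches only the allowed variables."""
    best = None
    for v in allowed:
        e0 = effect([r for r in rows if r[0][v] == 0])
        e1 = effect([r for r in rows if r[0][v] == 1])
        gap = abs(e1 - e0)  # stand-in criterion (an assumption)
        if best is None or gap > best[0]:
            best = (gap, v, e0, e1)
    _, v, e0, e1 = best
    return lambda x: e1 if x[v] == 1 else e0


def fit_forest(rows, n_vars, n_trees, vars_per_split, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]             # different sample
        allowed = rng.sample(range(n_vars), vars_per_split)   # excluded variables
        trees.append(fit_stump(sample, allowed))
    return trees


def predict(trees, x):
    """Forest prediction: the average of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)


# Synthetic rows (x, treated, responded): the treatment helps only when x[0] == 1.
rows = [((x0, x1, x2), t, 1 if (x0 == 1 and t == 1) else 0)
        for x0 in (0, 1) for x1 in (0, 1) for x2 in (0, 1)
        for t in (0, 1) for _ in range(5)]
trees = fit_forest(rows, n_vars=3, n_trees=25, vars_per_split=2)
high = predict(trees, (1, 0, 0))
low = predict(trees, (0, 0, 0))
```

In this synthetic setup `high` should come out larger than `low`, because any tree allowed to split on the first variable separates the helped cases from the rest.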

Data Roles
Data 100.0% N=42,693
  ◦ Model 50.0% N=21,347
    ▪ Train 25.0% N=10,673
    ▪ Out-Of-Bag 12.5% N=5,336
    ▪ Prune 12.5% N=5,337
  ◦ Test 50.0% N=21,346
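
A partition in roughly these proportions could be produced as sketched below. The function name, the seed, and the rounding rule are assumptions, so the exact counts may differ by one from those on the slide.

```python
import random


def partition_roles(n_cases, seed=0):
    """Randomly assign case indices to test, train, out-of-bag and prune roles."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_test = n_cases // 2                      # about 50% held out for testing
    test, model = indices[:n_test], indices[n_test:]
    n_train = len(model) // 2                  # about 25% of all cases
    n_oob = (len(model) - n_train) // 2        # about 12.5% of all cases
    return {
        "train": model[:n_train],
        "oob": model[n_train:n_train + n_oob],
        "prune": model[n_train + n_oob:],
        "test": test,
    }


roles = partition_roles(42693)
sizes = {name: len(ids) for name, ids in roles.items()}
```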

Cumulative Lift
1. Use treatment test data
2. Use forest to predict treatment effect
3. Sort by predicted treatment effect
4. Cumulate count of responders
5. Plot count as proportion vs percent cases
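
The steps above can be sketched as follows, assuming the predictions have already been computed for the treated test cases and that "proportion" means the share of all responders captured so far. The function name and the example values are hypothetical.

```python
def cumulative_lift(predicted_effect, responded):
    """Return (percent of cases, proportion of responders) points for plotting.

    predicted_effect: predicted treatment effect per treated test case.
    responded: 0/1 response per case, in the same order.
    """
    total = sum(responded)
    if total == 0:
        raise ValueError("no responders to cumulate")
    # Step 3: sort cases from highest to lowest predicted treatment effect.
    order = sorted(range(len(responded)),
                   key=lambda i: predicted_effect[i], reverse=True)
    points, running = [], 0
    for rank, i in enumerate(order, start=1):
        running += responded[i]                # step 4: cumulate responders
        points.append((100.0 * rank / len(order), running / total))  # step 5
    return points


# Hypothetical predictions and responses for four treated test cases.
curve = cumulative_lift([0.9, 0.1, 0.5, 0.3], [1, 0, 1, 0])
```

Here both responders have the two highest predictions, so the curve reaches 1.0 at 50 percent of cases; a ranking no better than random would track the diagonal instead.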

Cumulative Lift of Treatment Test Cases
[Figure: cumulative lift chart annotated with the steps Predict Treatment Effect, Sort by Prediction, Cumulate Y. X-axis: percent of treatment test cases, ordered from Predicted Good to Predicted Poor.]

Cumulative Response and Uplift in Test Data
[Figure: curves for treated population, untreated population, uplift (difference), and uplift from random prediction. X-axis: percent of population.]
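
One common way to compute curves like these is sketched below: rank each population by predicted treatment effect, take the cumulative responders in the top fraction as a share of that population, and subtract the untreated curve from the treated one. Whether the chart in the talk used exactly this normalization is an assumption, as are the function names and the example data.

```python
def cumulative_response(scores, responded, fractions):
    """Responders among the top-scored fraction, as a share of all cases."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    return [sum(responded[i] for i in order[:round(f * n)]) / n
            for f in fractions]


def uplift_curves(t_scores, t_resp, u_scores, u_resp, fractions):
    """Treated curve, untreated curve, their difference, and a random baseline."""
    treated = cumulative_response(t_scores, t_resp, fractions)
    untreated = cumulative_response(u_scores, u_resp, fractions)
    uplift = [t - u for t, u in zip(treated, untreated)]
    # A random ranking captures the overall uplift in proportion to the fraction.
    overall = sum(t_resp) / len(t_resp) - sum(u_resp) / len(u_resp)
    baseline = [f * overall for f in fractions]
    return treated, untreated, uplift, baseline


# Hypothetical scores and responses for four treated and four untreated cases.
treated, untreated, uplift, baseline = uplift_curves(
    [0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0],
    [0.9, 0.8, 0.2, 0.1], [0, 0, 1, 0],
    fractions=[0.5, 1.0],
)
```

In this example the uplift at 50 percent of the population (0.5) exceeds the random baseline (0.125), which is the pattern a useful ranking should show; both meet at 100 percent.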

OOB Treatment Effect vs Cluster of Subgroups
113 Subgroups (leaves), 51 Trees, 10 Clusters

Treatment Effect   Overall   Cluster 3
Train:             0.046     0.075
Test:              0.044     0.071

[Figure: OOB treatment effect by cluster. X-axis: Cluster.]

Thank you for your attention! Padraic.Neville@SAS.com
