Mailing Campaign Model Nan Yang University of Central Florida 04/11/2008.

1 Mailing Campaign Model Nan Yang University of Central Florida 04/11/2008

2 Overview
- Data Visualization
- Data Preparation
- Model Building
  - Variable Selection
  - Interaction
- Model Assessment
  - ROC

3 Data Visualization
- 63 variables
- Target is binary, with 1 indicating that the person responded to the mailing campaign
- Target is very unbalanced
- Target rate is 1.13% for the training set

4 Data Visualization
- Categorical variables
  - Variables with many levels
    - x2: 57 levels
    - DATE variables (x10 & x11): over 100 levels
  - Missing values
    - DATE variables: 30%-70% missing
    - Some variables code missing values as “Unknown” or “Uncoded”, e.g., x20
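The profiling described on this slide (level counts and missing rates, with placeholder codes treated as missing) could be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `profile_categorical`, the `MISSING_CODES` set, and the sample values for `x20` are all hypothetical.

```python
from collections import Counter

# Assumption: these placeholder strings stand in for a missing value.
MISSING_CODES = {"", "Unknown", "Uncoded"}


def profile_categorical(values):
    """Return (number of distinct non-missing levels, fraction missing)."""
    is_missing = [v is None or str(v).strip() in MISSING_CODES for v in values]
    levels = Counter(v for v, m in zip(values, is_missing) if not m)
    missing_rate = sum(is_missing) / len(values) if values else 0.0
    return len(levels), missing_rate


# Made-up sample column; the real x20 values are not given in the slides.
x20 = ["A", "B", "Unknown", "A", None, "Uncoded", "C", "A"]
n_levels, missing_rate = profile_categorical(x20)
print(n_levels, missing_rate)
```

Counting placeholder codes as missing keeps the missing rate comparable across variables that use different conventions.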

5 Data Visualization
- Interval variables
  - Skewness

6 Data Preparation
- Missing Value Indicator (MVI)
  - Created for variables with > 5% missing values
  - Binary
  - Captures the missing-value information
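A missing value indicator of this kind can be sketched in a few lines. The example assumes the data is a list of dictionaries with `None` marking a missing value; the function name `add_missing_indicators` and the `M_` column prefix are hypothetical, and only the 5% threshold comes from the slide.

```python
def add_missing_indicators(rows, threshold=0.05):
    """Add a binary 'M_<name>' column for each variable whose fraction of
    missing (None) values exceeds `threshold`. Returns new row dicts."""
    if not rows:
        return []
    n = len(rows)
    # Variables whose missing fraction is above the threshold.
    flagged = [
        name for name in rows[0]
        if sum(r[name] is None for r in rows) / n > threshold
    ]
    # 1 where the original value is missing, 0 otherwise.
    return [
        {**r, **{f"M_{name}": int(r[name] is None) for name in flagged}}
        for r in rows
    ]


# Made-up rows: x10 is missing in half of them, x1 is never missing.
rows = [
    {"x1": 3.0, "x10": None},
    {"x1": 1.5, "x10": "2007-03"},
    {"x1": 2.0, "x10": None},
    {"x1": 4.2, "x10": "2006-11"},
]
for row in add_missing_indicators(rows):
    print(row)
```

Creating the indicator before imputation preserves the fact that a value was missing, which imputation alone would erase.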

7 Data Preparation
- Imputation
  - Unconditional imputation
  - Categorical variables: Tree / Tree Surrogate
  - Interval variables: Cluster

8 Data Preparation
- Transformation
  - Right skewed: Log or Square Root transformation
  - Left skewed: Square transformation
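The transformation rule above can be illustrated with a small sketch. The skewness cutoff of 1.0, the use of `log1p` (log of 1 + x, so that zeros are allowed), and the function names are assumptions made for this example; the slide does not state which cutoff or log variant was used.

```python
import math


def skewness(xs):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def transform(xs, cutoff=1.0):
    """Log-transform right-skewed data, square left-skewed data,
    and leave roughly symmetric data unchanged."""
    s = skewness(xs)
    if abs(s) <= cutoff:
        return list(xs)
    if min(xs) < 0:
        # Both transforms below are only order-preserving for x >= 0.
        raise ValueError("expected non-negative values")
    if s > cutoff:
        return [math.log1p(x) for x in xs]  # assumption: log(1 + x)
    return [x * x for x in xs]


right_skewed = [1, 1, 2, 2, 3, 100]
left_skewed = [0, 9, 9, 10, 10, 10]
print(transform(right_skewed))
print(transform(left_skewed))
```

A square root (`math.sqrt`) could replace `log1p` for milder right skew, as the slide lists both options.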

9 Model Building
- Variable selection
  - Individual predictive power
  - Logistic backward elimination
    - Keep the potential interaction terms
  - Logistic stepwise selection
  - Tree
    - Different criteria
  - 21 variables selected

10 Model Building
- Interactions
  - SAS EMiner Regression node
  - 11 interaction terms selected
- Model
  - Ensemble of different logistic models
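The slides do not say how the logistic models were combined. One common approach, shown here purely as an assumption, is to average the predicted probabilities of the individual models. The coefficients below are invented for illustration, and the third input is a hypothetical interaction term (the product of the first two).

```python
import math


def logistic_predict(coefs, intercept, x):
    """Predicted probability from one logistic model."""
    z = intercept + sum(c * v for c, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_predict(models, x):
    """Average the predicted probabilities of several logistic models.
    `models` is a list of (coefficients, intercept) pairs."""
    probs = [logistic_predict(coefs, b, x) for coefs, b in models]
    return sum(probs) / len(probs)


# Invented coefficients; the fitted models are not given in the slides.
models = [
    ([0.8, -0.3, 0.0], -4.5),  # main effects only
    ([0.6, -0.2, 0.4], -4.4),  # with an interaction coefficient
]
x1, x2 = 1.2, 0.5
features = [x1, x2, x1 * x2]  # last entry is the interaction x1 * x2
print(ensemble_predict(models, features))
```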

11 Model Assessment
- AUC = 0.66
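For reference, AUC (the area under the ROC curve) equals the probability that a randomly chosen responder is scored higher than a randomly chosen non-responder, with ties counted as one half. A minimal sketch of that definition follows; the labels and scores are made up, and the pairwise loop is meant for clarity rather than speed.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 2 responders, 3 non-responders.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.66 indicates modest discrimination. AUC depends only on the ordering of scores, which makes it a more informative measure than accuracy for a target rate near 1%.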

12 Acknowledgement
- UCF Statistics Dept
- BlueCross BlueShield of FL

