Mailing Campaign Model Nan Yang University of Central Florida 04/11/2008.


Overview
- Data Visualization
- Data Preparation
- Model Building
  - Variable Selection
  - Interaction
- Model Assessment
  - ROC

Data Visualization
- 63 variables
- Target is binary, with 1 indicating people who responded to the mailing campaign
- Target is very unbalanced
  - Target rate is 1.13% for the training set

Data Visualization
- Categorical variables
  - High-level variables
    - x2: ~57 levels
    - DATE variables (x10 & x11): over 100 levels
  - Missing values
    - DATE variables: 30%-70% missing
    - Some variables have missing values coded as "Unknown" or "Uncoded", e.g. x20

Data Visualization
- Interval variables
  - Skewness

Data Preparation
- Missing Value Indicator (MVI)
  - Created for variables with > 5% missing
  - Binary
  - Captures the missing value information
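
The missing value indicator step can be sketched in plain Python. This is only an illustration under stated assumptions, not the author's SAS Enterprise Miner workflow: the function name `add_missing_indicators`, the `M_` prefix, and the use of `None` to mark a missing value are all hypothetical choices. Only the 5% cutoff comes from the slide.

```python
def add_missing_indicators(rows, threshold=0.05):
    """Return copies of the rows with a 0/1 flag for each sparsely filled column.

    rows      -- list of dicts, one per record; None marks a missing value (assumption)
    threshold -- flag only columns whose missing fraction exceeds this (5% on the slide)
    """
    if not rows:
        return []
    n = len(rows)
    columns = list(rows[0].keys())
    # Columns whose share of missing values is above the threshold.
    flagged = [
        col for col in columns
        if sum(1 for r in rows if r.get(col) is None) / n > threshold
    ]
    out = []
    for r in rows:
        new_row = dict(r)
        for col in flagged:
            # "M_" prefix is a hypothetical naming convention for the indicator.
            new_row["M_" + col] = 1 if r.get(col) is None else 0
        out.append(new_row)
    return out


# Toy records; the column names echo the slides, the values are made up.
rows = [
    {"x10": None, "x3": 1.0},
    {"x10": "2007-01", "x3": 2.0},
]
with_flags = add_missing_indicators(rows)
```

The indicator is added before imputation so that the fact that a value was missing remains available to the model after the gap is filled.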

Data Preparation
- Imputation
  - Unconditional imputation
  - Categorical variables
    - Tree / Tree Surrogate
  - Interval variables
    - Cluster

Data Preparation
- Transformation
  - Right skewed
    - Log or square root transformation
  - Left skewed
    - Square transformation
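
The skew-driven choice of transformation can be illustrated with a short sketch. The assumptions are mine, not the presentation's: the moment-based skewness estimate, the cutoff of 0.5, the function names, and the use of `log1p` (log of 1 + x, which tolerates zeros) in place of a plain log. Inputs are assumed to be non-negative.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2^1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def reduce_skew(values, cutoff=0.5):
    """Pick a transformation from the sign of the skewness.

    cutoff is a hypothetical threshold; the slides do not state one.
    Returns (name_of_transformation, transformed_values).
    """
    s = skewness(values)
    if s > cutoff:
        # Right skewed: compress the long upper tail.
        return "log", [math.log1p(v) for v in values]
    if s < -cutoff:
        # Left skewed: stretch the upper end.
        return "square", [v ** 2 for v in values]
    return "none", list(values)


# Made-up right-skewed sample: one large value dominates.
name, transformed = reduce_skew([1, 2, 3, 4, 100])
```

A square root could be swapped in for `log1p` in the right-skewed branch; it is the milder of the two options named on the slide.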

Model Building
- Variable selection
  - Individual predictive power
  - Logistic backward elimination
    - Keep the potential interaction terms
  - Logistic stepwise selection
  - Tree
  - Different criteria
  - 21 variables selected

Model Building
- Interactions
  - SAS EMiner Regression node
  - 11 interaction terms selected
- Model
  - Ensemble of different logistic models
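
The slide does not say how the logistic models were combined. One common option, shown here purely as an assumption, is to average the predicted probabilities of the individual models. The model representation (an intercept plus a dict of coefficients) and all names and numbers below are hypothetical.

```python
import math


def predict_logistic(model, record):
    """Predicted probability from one logistic model.

    model  -- (intercept, {variable_name: coefficient})
    record -- {variable_name: value}
    """
    intercept, coefs = model
    z = intercept + sum(c * record[name] for name, c in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def predict_ensemble(models, record):
    """Average the predicted probabilities of several logistic models.

    Simple averaging is an assumption; the presentation does not state
    the combination rule that was actually used.
    """
    probs = [predict_logistic(m, record) for m in models]
    return sum(probs) / len(probs)


# Two made-up models over a made-up input named "x".
models = [
    (0.0, {"x": 1.0}),
    (0.0, {"x": -1.0}),
]
p = predict_ensemble(models, {"x": 2.0})
```

Averaging probabilities keeps the output on the 0 to 1 scale, so the combined score can be ranked and assessed the same way as a single model's score.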

Model Assessment
- AUC = 0.66
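
For reference, the area under the ROC curve equals the probability that a randomly chosen responder is scored higher than a randomly chosen non-responder, with ties counted as one half. The sketch below computes it by direct pair counting. The function name and the toy data are illustrative only and unrelated to the campaign data set.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels -- 0/1 target values (1 = responded)
    scores -- model scores, higher meaning more likely to respond
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # responder ranked above non-responder
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 responder/non-responder pairs are ordered correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. Because it depends only on the ordering of scores, it is unaffected by the 1.13% target rate in the way raw accuracy would be.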

Acknowledgement
- UCF Statistics Dept
- BlueCross BlueShield of FL