Published by Brendan Snow. Modified over 9 years ago.
1
Lecture 9 MARK2039 Summer 2006 George Brown College Wednesday 9-12
2
2 Assignment 7
An acquisition campaign with no targeting was conducted in January. The available information is as follows:
– Mail files containing name and address
– Responder files containing name and address
– 2001 Stats Can Census data available at the enumeration area level
– A conversion table which maps enumeration areas to postal codes
How would you use the above information to better target prospects to become new customers? Describe how the analytical file would be created.
– Answered in last class and is in the worknotes
3
3 Types of Predictive Models - Assignment 7
You have been asked to create programs that better target existing customers for insurance products. You have the following info (shown on the original slide):
What would you do, and how would you create the analytical file?
– In last lecture’s studynotes
4
4 Types of Predictive Models
You have been asked to target customers who will not only purchase insurance but will also purchase the largest premiums. What type of model would be built here?
– In last lecture’s studynotes
5
5 Creating the Analytical File - Geocoding
Geocoding is the process that assigns a latitude-longitude coordinate to an address. Once a latitude-longitude coordinate is assigned, the address can be displayed on a map or used in a spatial search. Data miners often use these coordinates to calculate such things as “distance to the nearest store”.
6
6 [Figure: GeoProfile demographic analysis showing population count, age distribution, average age, and store location]
7
7 Creating the Analytical File - What is Geocoding?
Let’s look at a sample of what some data might look like. How do we use this data to create meaningful variables?
– Use the latitude and longitude metrics, and then use the Pythagorean theorem to calculate the distance between the two postal codes.
Ex: distance between A1A5A2 and B5V1A2:
Distance = square root of [(7-5)**2 + (20-10)**2] = square root of 104 ≈ 10.20 degrees
The above number then has to be converted to kilometres or miles.
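The distance calculation on this slide can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the coordinates (7, 20) and (5, 10) are the ones used in the slide's example (which value is latitude and which is longitude is assumed here), and the factor of about 111 km per degree is a common approximation that holds for latitude only. Longitude degrees shrink away from the equator, so the kilometre figure is a rough illustration rather than a true ground distance.

```python
import math

# Assumption: roughly 111 km per degree of latitude (not exact for longitude).
KM_PER_DEGREE = 111.0


def degree_distance(lat1, lon1, lat2, lon2):
    """Straight-line (Pythagorean) distance between two points, in degrees."""
    return math.sqrt((lat1 - lat2) ** 2 + (lon1 - lon2) ** 2)


def degrees_to_km(degrees):
    """Rough conversion from degrees to kilometres."""
    return degrees * KM_PER_DEGREE


# Coordinates taken from the slide's example for A1A5A2 and B5V1A2.
d = degree_distance(7, 20, 5, 10)
print(round(d, 2))          # square root of 104, about 10.2 degrees
print(round(degrees_to_km(d)))
```

For real postal-code data, a great-circle formula such as haversine would normally be preferred over the flat Pythagorean approximation.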
8
8 Creating the Analytical File - What is Geocoding?
Example:
– A retailer has the following information:
   Name and address of its customers
   Address of its stores
   Stats Can information
– As a marketer, how would you intelligently use this information?
9
9 Correlation Coefficient
10
10 Correlation Coefficient
11
11 Correlation Analysis The male gender variable has a perfect correlation of +1. The female gender variable has a perfect correlation of -1. Household size has no correlation with response, hence the correlation coefficient is 0.
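The three cases described here (+1, -1, and 0) can be reproduced with a small Pearson correlation function. The data below is invented toy data for illustration only; it is not the data behind the slide, and `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: four customers, 1 = responded.
response = [1, 1, 0, 0]
male = [1, 1, 0, 0]            # moves exactly with response
female = [0, 0, 1, 1]          # moves exactly against response
household_size = [2, 4, 2, 4]  # unrelated to response

print(pearson(male, response))
print(pearson(female, response))
print(pearson(household_size, response))
```

In real campaign data, correlations with response are usually far smaller in magnitude than these perfect cases.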
12
12 Correlation Results
Show the level of confidence in the relationship between a given variable and the modelled behaviour, i.e. response. Each variable is reported with:
– Correlation coefficient
– Confidence interval
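The slides do not say how the confidence figure is computed. One common way to attach a confidence level to a correlation coefficient is the Fisher z-transformation, sketched below as an assumption rather than as the method used in the course. The function name and the sample size of 1,000 are hypothetical.

```python
import math


def correlation_confidence(r, n):
    """Approximate two-sided confidence (1 - p-value) that the true
    correlation is non-zero, using the Fisher z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return 1.0 - p_value


# Hypothetical sample size of 1,000 records.
print(correlation_confidence(0.11, 1000))  # small r, but high confidence
print(correlation_confidence(0.01, 1000))  # near-zero r, low confidence
```

The point of the sketch is that a small coefficient can still be statistically significant when the file is large, which is why the report shows both columns.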
13
13 Examples - Correlation - Response Model
Listed below is an example of a correlation matrix. Answer the following:
– Is each variable relevant?
– What is the relationship or impact of each variable with response?
– What is the strongest variable and what is the weakest variable?
Answers:
– Income: relevant and positive impact
– Age: relevant and negative impact
– Product spend in last 12 months: relevant and negative
– Live in Quebec: not relevant and negative
– Tenure: relevant and negative
– # in household: not relevant and positive
– # of months since last promoted: relevant and negative
– # of months since last purchase: not relevant and positive
– Pay with credit card: relevant and positive
– Gender is male: relevant and negative
– Strongest variable: # of months since last promoted
– Weakest variable: live in Quebec
14
14 Exploratory Data Analysis Reports (EDA)
After looking at the correlation reports, we also need to create EDA reports, which help to better understand the relationship of a given variable with the desired marketing behaviour. They help business people and marketers get inside the so-called black box of modelling.
15
15 Exploratory Data Analysis Reports(EDA)
16
16 Exploratory Data Analysis Reports (EDA)
Let’s take a look at an example of a binary variable. On the next page are some examples of EDA reports of variables that are not statistically significant according to the correlation matrix.

Male      # of Observations   Response Rate
Yes       50000               2.00%
No        50000               2.60%
Average   100000              2.30%
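An EDA report of this kind can be produced with a few lines of Python. The sketch below uses the observation counts from the slide; the responder counts (1,000 and 1,300) are derived from the stated response rates, and the index column (group rate relative to the overall rate, times 100) is an assumed convention added for illustration.

```python
def eda_report(groups):
    """Build a response-rate report.

    groups maps a label to a (observations, responders) tuple.
    Returns the overall response rate and per-group rate and index.
    """
    total_obs = sum(obs for obs, _ in groups.values())
    total_resp = sum(resp for _, resp in groups.values())
    overall_rate = total_resp / total_obs
    report = {}
    for label, (obs, resp) in groups.items():
        rate = resp / obs
        report[label] = {"rate": rate, "index": 100 * rate / overall_rate}
    return overall_rate, report


# Responders derived from the slide: 2.00% and 2.60% of 50,000.
overall, report = eda_report({
    "Male=Yes": (50000, 1000),
    "Male=No": (50000, 1300),
})
print(f"Overall: {overall:.2%}")
for label, row in report.items():
    print(f"{label}: {row['rate']:.2%} (index {row['index']:.0f})")
```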
17
17 Exploratory Data Analysis Reports (EDA)
EDAs of variables that are not statistically significant
18
18 More examples of correlation
Previous analysis has indicated the following trends. Would the correlations be closer to 1, -1, or 0 here for both variables?
– Closer to 0 here
– Closer to –1 here
19
19 More examples of correlation
Would the correlations be closer to 1, -1, or 0 here for both variables? What is the learning here vs. the previous slide?
– Closer to +1 here
– Closer to –1 here
20
20 Exploratory Data Analysis Reports
What does this tell us?
– Younger customers are more likely to respond
What does this tell us?
– No trend exists here
21
21 Exploratory Data Analysis Reports
What does this mean?
– Clearly, there is more of a binary than a linear relationship here. Would create a binary variable on income >= 40K.
– Not quite binary, but not perfectly linear either. Would create index variables here.
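Creating the binary variable on income is a one-line transformation. A minimal sketch follows; the threshold comes from the slide, while the function name and the sample incomes are hypothetical.

```python
INCOME_THRESHOLD = 40_000  # from the slide: income >= 40K


def income_flag(income):
    """Return 1 when income is at or above the threshold, else 0."""
    return 1 if income >= INCOME_THRESHOLD else 0


# Hypothetical incomes, chosen to show the boundary behaviour.
incomes = [25_000, 39_999, 40_000, 85_000]
flags = [income_flag(x) for x in incomes]
print(flags)  # [0, 0, 1, 1]
```

The derived flag then replaces the raw income value as a candidate model variable.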
22
22 Creating the Final Model
Why couldn’t we just use the results of correlation to create the model, and create index values for each significant variable?
– Age
– Tenure
– # of products purchased
– # of promotions since last purchase
Think statistics here. Independent or predictor variables interact with one another; this is known as multicollinearity, and it must be accounted for when building any model. The interaction between model variables (independent variables) will have an impact on the actual variable weight or coefficient within any model equation that is parametric (i.e. there are weights or coefficients associated with each parameter).
23
23 Creating the Final Model
Need to account for interaction here. Let’s take a look at some equations.
24
24 The Data Mining Process: Application of Data Mining Techniques - Creating the Final Model
Problems with Multicollinearity
Example: Years of Education and Income on Response Rate
Regression equation is: Response = .50 + .00001*income - .03*yrs. of education

Correlation with Response   Income   Years of Education
Correlation Coefficient     0.11     0.12
Confidence Interval         99%      99.50%

What is the problem here and what do you do?
Income and education are highly correlated, causing education to flip its sign within the model equation: its correlation with response is positive, but its coefficient is negative. I would either replace it with some other variable that does not reduce the model power too much, or I would create an interaction variable between the two (i.e. Education X Income).
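The sign flip can be reproduced on a tiny example. The sketch below fits a two-predictor least-squares regression in plain Python. The data is invented toy data (not the income and education figures from the slide), built so that the second predictor tracks the first closely, and all function names are hypothetical.

```python
def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def correlation(a, b):
    """Pearson correlation coefficient."""
    da, db = centered(a), centered(b)
    return cross(da, db) / (cross(da, da) * cross(db, db)) ** 0.5


def two_predictor_ols(x1, x2, y):
    """Least-squares slopes (b1, b2) for y = a + b1*x1 + b2*x2."""
    d1, d2, dy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = cross(d1, d1), cross(d2, d2), cross(d1, d2)
    s1y, s2y = cross(d1, dy), cross(d2, dy)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical toy data: x2 closely tracks x1, and y = 2*x1 - x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.0]
y = [2 * a - b for a, b in zip(x1, x2)]

b1, b2 = two_predictor_ols(x1, x2, y)
print("corr(x1, x2):", round(correlation(x1, x2), 3))
print("corr(x2, y): ", round(correlation(x2, y), 3))
print("coefficient on x2:", round(b2, 3))
```

On its own, x2 is positively correlated with y, yet its regression coefficient is negative once x1 is in the equation. That is the same pattern the slide describes for education.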
25
25 Continuing to build the model
Multivariate analytical techniques such as multiple regression, logistic regression, etc. may be employed to produce the final model.
Final equation: Predicted Response Rate = A - B1*Age + B2*Tenure
Corr. coeff. is +.5 for age and +.55 for tenure.
What is the problem here?
– Age has flipped its sign.
What other diagnostics would you undertake to better understand the situation?
– Examine the correlation coefficient between these two variables and compare this result to other independent variable correlations. The magnitude of the age and tenure correlation should be much greater than that of other independent variable correlations.
26
26 Continuing to build the model

Variable          Correlation
Spend             0.6
Live in Ontario   0.5
Number in House   -0.3

Response = A + (.05 X Spend) - (.03 X Live in Ontario) - (.01 X Number in House)

Variable        Correlation
# of products   0.6
Credit Score    0.4
Tenure          -0.2

Response = A - (.03 X Number of Products) + (.08 X Credit Score) - (.01 X Tenure)
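A quick way to work through these two examples is to compare the sign of each correlation with the sign of the matching coefficient. The sketch below uses the values from this slide; the function name is hypothetical.

```python
def sign_flips(variables):
    """Return the names of variables whose model coefficient has the
    opposite sign to their correlation with response.

    variables is a list of (name, correlation, coefficient) tuples.
    """
    return [name for name, corr, coef in variables if corr * coef < 0]


# Values taken from the two examples on the slide.
model_1 = [
    ("Spend", 0.6, 0.05),
    ("Live in Ontario", 0.5, -0.03),
    ("Number in House", -0.3, -0.01),
]
model_2 = [
    ("# of products", 0.6, -0.03),
    ("Credit Score", 0.4, 0.08),
    ("Tenure", -0.2, -0.01),
]

print(sign_flips(model_1))  # ['Live in Ontario']
print(sign_flips(model_2))  # ['# of products']
```

A flagged variable is a candidate for the multicollinearity checks discussed on the earlier slides.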
27
27 Continuing to build the model
After observing correlation results and EDAs, what can we begin to do at this point?
– Derive new variables: EDAs
– Derive new variables: multicollinearity
– Derive new variables: Factor Analysis
– Derive new variables: CHAID (will explore later)
Reference Material:
– Factor Analysis: look up in any Statistics Handbook
– Regression: look up in textbook under Regression and Statistics Regression
28
28 Continuing to build the model
Running further statistical routines, we are able to develop a final model. The marketer or business person should receive a report that looks as follows:
For those of you who have statistics training, how is the % Contribution to model calculated?
– Look at the partial R² of each variable and calculate as follows: % contribution = partial R² / total R²
29
29 Continuing to Build the Model

Variable Entered   Partial R-Square   Model R-Square
var 4              0.0036
var 3              0.0034             0.007
var 1              0.0016             0.0086
var 2              0.0007             0.0092
var 6              0.0009             0.0102
var 5              0.0003             0.0105
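Applying the previous slide's rule (% contribution = partial R-square divided by total R-square) to this table can be sketched as follows. The partial values are copied from the table; the function name is hypothetical, and the total is taken as the sum of the partials, which matches the final model R-square of 0.0105.

```python
# Partial R-square values from the table, in order of entry.
partial_r2 = {
    "var 4": 0.0036,
    "var 3": 0.0034,
    "var 1": 0.0016,
    "var 2": 0.0007,
    "var 6": 0.0009,
    "var 5": 0.0003,
}


def percent_contribution(partials):
    """Each variable's share of the total R-square, as a percentage."""
    total = sum(partials.values())
    return {name: 100 * value / total for name, value in partials.items()}


contribution = percent_contribution(partial_r2)
for name, pct in contribution.items():
    print(f"{name}: {pct:.1f}%")
```

Here var 4 and var 3 together account for about two thirds of the model's explanatory power.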
30
30 Continuing to Build the Model
What would be the final equation in terms of the sign?
– The equation should have the same signs as seen above in the impact column.
31
31 Continuing to build the model
What would you do here?
– I would conduct upfront segmentation, as live in Quebec is overwhelmingly strong and essentially indicates that we have a one-variable model. Create two segments, live in Quebec and Rest of Canada, and perhaps develop models for each of these segments.
32
32 Continuing to build the model
Suppose we have the following equation:
Response = +.09 + .05 X Income + .06 X Tenure + .08 X Product Spend - .04 X Male
What is the problem here?
– The problem is tenure: its sign is inconsistent between the report and the actual equation. Double-check the actual equation and the coefficient signs.