Lecture 9 MARK2039 Summer 2006 George Brown College Wednesday 9-12.

Presentation transcript:

Lecture 9 MARK2039 Summer 2006 George Brown College Wednesday 9-12

2 Assignment 7 An acquisition campaign with no targeting was conducted in January. The available information is as follows:
– Mail files containing name and address
– Responder files containing name and address
– 2001 Stats Can Census data available at the enumeration area level
– A conversion table which maps enumeration areas to postal codes
How would you use the above information to better target prospects to become new customers? Describe how the analytical file would be created.
– Answered in last class and is in the work notes

3 Types of Predictive Models-Assignment 7 You have been asked to create programs that better target existing customers for insurance products. You have the following info: What would you do and how would you create the analytical file? Answered in last lecture's study notes.

4 Types of Predictive Models You have been asked to target customers who will not only purchase insurance but will also purchase the largest premiums. What type of model would be built here? Answered in last lecture's study notes.

5 Creating the Analytical File-Geocoding Geocoding is the process that assigns a latitude-longitude coordinate to an address. Once a latitude-longitude coordinate is assigned, the address can be displayed on a map or used in a spatial search. Data miners often use these coordinates to calculate such things as "distance to the nearest store".

6 Demographic Analysis (diagram labels: Population Count, Age Distribution, Average Age, Store Location, GeoProfile)

7 Creating the Analytical File-What is Geocoding? Let's look at a sample of what some data might look like. How do we use this data to create meaningful variables?
- Use the latitude and longitude values and apply the Pythagorean theorem to calculate the distance between the two postal codes.
Ex: distance between A1A5A2 and B5V1A2:
Distance = square root of [(7-5)**2 + (20-10)**2] = square root of 104, or about 10.20 degrees
This number then has to be converted to kilometres or miles.
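The distance arithmetic on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's own code: the function names, the coordinate-pair layout and the 6371 km mean Earth radius are all assumptions. The second function is a swapped-in standard great-circle (haversine) formula that returns kilometres directly; it is shown as one possible way to handle the conversion step, not as the slide's method.

```python
import math

def planar_distance_degrees(p, q):
    """Pythagorean distance between two coordinate pairs, in degrees."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres (radius_km is an assumed mean Earth radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Slide example: coordinate pairs (7, 20) and (5, 10)
print(round(planar_distance_degrees((7, 20), (5, 10)), 2))  # about 10.2 degrees
```

The planar version treats one degree of longitude as equal to one degree of latitude, which is only a rough approximation away from the equator; that is the reason a great-circle formula is often preferred when real "distance to nearest store" variables are built.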

8 Creating the Analytical File-What is Geocoding? Example:
– A retailer has the following information:
  Name and address of its customers
  Address of its stores
  Stats Can information
– As a marketer, how would you intelligently use this information?

9 Correlation Coefficient

10 Correlation Coefficient

11 Correlation Analysis The male gender variable has a perfect correlation of +1. The female gender variable has a perfect correlation of -1. Household size has no correlation with response, hence the correlation coefficient is 0.
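The three cases on this slide (+1, -1 and 0) can be reproduced with a small Pearson correlation function. The four-record data set below is hypothetical and was chosen only so that each case comes out exactly; the function and variable names are likewise assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical records: every male responded, no female responded
response = [1, 1, 0, 0]
male = [1, 1, 0, 0]
female = [1 - m for m in male]
household_size = [1, 3, 1, 3]  # unrelated to response by construction

print(pearson(male, response))            # 1.0
print(pearson(female, response))          # -1.0
print(pearson(household_size, response))  # 0.0
```

Because the female flag is simply one minus the male flag, the two correlations are always equal in size and opposite in sign, so only one of the two flags carries information.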

12 Correlation Results Show the strength of the relationship between a given variable and the modelled behaviour (i.e. response), together with the level of confidence in that relationship. Reported for each variable: Correlation coefficient, Confidence Interval

13 Examples-Correlation-Response Model Listed below is an example of a correlation matrix. Answer the following:
- Is each variable relevant?
- What is the relationship or impact of each variable with response?
- What is the strongest variable and what is the weakest variable?
Answers:
- Income: relevant and positive impact
- Age: relevant and negative impact
- Product spend in last 12 months: relevant and negative
- Live in Quebec: not relevant and negative
- Tenure: relevant and negative
- # in household: not relevant and positive
- # of months since last promoted: relevant and negative
- # of months since last purchase: not relevant and positive
- Pay with credit card: relevant and positive
- Gender is male: relevant and negative
Strongest variable: # of months since last promoted
Weakest variable: live in Quebec

14 Exploratory Data Analysis Reports(EDA) After looking at the correlation reports, we also need to create EDA reports which help to better understand the relationship of a given variable with the desired marketing behaviour. It helps the business people and marketers to get inside the so-called black box of modelling.

15 Exploratory Data Analysis Reports(EDA)

16 Exploratory Data Analysis Reports (EDA) Let's take a look at an example of a binary variable. Report columns: Male, # of Observations, Response Rate (%); report rows: Yes, No, Average. On the next page are some examples of EDA reports of variables that are not statistically significant according to the correlation matrix.

17 Exploratory Data Analysis Reports (EDA) EDAs of variables that are not statistically significant

18 More examples of correlation Previous analysis has indicated the following trends. Would the correlations be closer to 1, -1, or 0 here for both variables? First variable: closer to 0. Second variable: closer to -1.

19 More examples of correlation Would the correlations be closer to 1, -1, or 0 here for both variables? What is the learning here vs. the previous slide? First variable: closer to +1. Second variable: closer to -1.

20 Exploratory Data Analysis Reports First report: what does this tell us? Younger customers are more likely to respond. Second report: what does this tell us? No trend exists here.

21 Exploratory Data Analysis Reports What does this mean? First report: clearly, there is more of a binary than a linear relationship here, so we would create a binary variable on income >= 40K. Second report: the relationship is not quite binary but not perfectly linear either, so we would create index variables here.
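Deriving the binary income variable and checking it against response can be sketched as follows. The five customer records are hypothetical, the 40K threshold comes from the slide, and the helper names are assumptions for illustration only.

```python
def income_flag(income, threshold=40_000):
    """Binary variable: 1 when income is at or above the threshold, else 0."""
    return 1 if income >= threshold else 0

def response_rate_by_group(records, key):
    """Response rate for each group returned by key(record)."""
    totals = {}
    for rec in records:
        group = key(rec)
        count, responders = totals.get(group, (0, 0))
        totals[group] = (count + 1, responders + rec["responded"])
    return {group: responders / count for group, (count, responders) in totals.items()}

# Hypothetical customer records
customers = [
    {"income": 25_000, "responded": 0},
    {"income": 38_000, "responded": 0},
    {"income": 40_000, "responded": 1},
    {"income": 62_000, "responded": 1},
    {"income": 90_000, "responded": 0},
]

rates = response_rate_by_group(customers, lambda rec: income_flag(rec["income"]))
print(rates)
```

The same grouping helper can produce an EDA-style report for any banded variable by passing a different key function, for example one that maps age into ranges.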

22 Creating the Final Model Why couldn't we just use the results of the correlation analysis to create the model, and create index values for each significant variable?
– Age
– Tenure
– # of products purchased
– # of promotions since last purchase
Think statistics here. Independent (predictor) variables interact with one another, that is, they are correlated among themselves. This is known as multicollinearity, and it must be accounted for when building any model. The interaction between model variables (independent variables) will have an impact on the actual variable weight or coefficient within any model equation that is parametric (i.e. there are weights or coefficients associated with each parameter).

23 Creating the Final Model Need to account for interaction here Let’s take a look at some equations

24 The Data Mining Process: Application of Data Mining Techniques-Creating the Final Model
Problems with Multicollinearity. Example: Years of Education and Income on Response Rate
Regression equation is: Response= *income -.03*yrs. of education
Table of correlations with Response, columns Income and Years of Education: Correlation Coefficient (values not shown); Confidence Interval 99% and 99.50%
What is the problem here and what do you do? Income and education are highly correlated, causing education to flip its sign within the model equation. I would either replace it with some other variable that does not reduce the model power too much, or I would create an interaction variable between the two (i.e. Education X Income).

25 Continuing to build the model Multivariate analytical techniques such as multiple regression, logistic regression, etc. may be employed to produce the final model. Final equation: Predicted Response Rate = A - B1*Age + B2*Tenure. The correlation coefficient with response is +.5 for age and +.55 for tenure. What is the problem here? Age has flipped: its correlation with response is positive, but its coefficient in the equation is negative. What other diagnostics would you undertake to better understand the situation? Examine the correlation coefficient between these two variables and compare it to the correlations among the other independent variables. The magnitude of the age and tenure correlation should be much greater than that of the other independent variable correlations.
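One way to see why age can flip is the standard two-predictor formula for standardized regression coefficients, which depends on the correlation between the two predictors as well as on each predictor's correlation with response. The sketch below uses the slide's +.5 and +.55; the age-tenure correlations of 0.95 and 0.30 are assumed values chosen for illustration, and the function name is hypothetical.

```python
def standardized_betas(r1y, r2y, r12):
    """Standardized coefficients for a regression of y on two predictors.

    r1y, r2y: correlation of each predictor with the response.
    r12: correlation between the two predictors.
    """
    denom = 1.0 - r12 ** 2
    beta1 = (r1y - r12 * r2y) / denom
    beta2 = (r2y - r12 * r1y) / denom
    return beta1, beta2

# Slide values: age +.5, tenure +.55. Age-tenure correlation is an assumption.
beta_age, beta_tenure = standardized_betas(0.5, 0.55, r12=0.95)
print(round(beta_age, 3), round(beta_tenure, 3))   # age negative, tenure positive

beta_age_low, beta_tenure_low = standardized_betas(0.5, 0.55, r12=0.30)
print(round(beta_age_low, 3), round(beta_tenure_low, 3))  # both positive
```

With a high age-tenure correlation the age coefficient turns negative even though age is positively correlated with response; with a modest correlation both signs match, which is why the predictor-to-predictor correlation is the first diagnostic to examine.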

26 Continuing to build the model
Example 1
Variable            Correlation
Spend               0.6
Live in Ontario     0.5
Number in House     -0.3
Response = A (+.05 X Spend) (-.03 X Live in Ontario) (-.01 X Number in House)
Example 2
Variable            Correlation
# of products       0.6
Credit Score        0.4
Tenure              -0.2
Response = A (-.03 X # of products) (+.08 X Credit Score) (-.01 X Tenure)
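A quick way to read these two examples is to compare the sign of each variable's correlation with the sign of its coefficient in the equation. The helper below is a hypothetical sketch of that check, using the numbers from this slide.

```python
def sign_flips(correlations, coefficients):
    """Names of variables whose coefficient sign disagrees with their correlation sign."""
    return [name for name, r in correlations.items() if r * coefficients[name] < 0]

example_1 = sign_flips(
    {"Spend": 0.6, "Live in Ontario": 0.5, "Number in House": -0.3},
    {"Spend": 0.05, "Live in Ontario": -0.03, "Number in House": -0.01},
)
example_2 = sign_flips(
    {"# of products": 0.6, "Credit Score": 0.4, "Tenure": -0.2},
    {"# of products": -0.03, "Credit Score": 0.08, "Tenure": -0.01},
)
print(example_1)  # ['Live in Ontario']
print(example_2)  # ['# of products']
```

In each example one variable has a positive correlation with response but a negative coefficient, which is the flipped-sign symptom of multicollinearity described on the earlier slides.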

27 Continuing to build the model After observing correlation results and EDAs, what can we begin to do at this point?
– Derive new variables: EDAs
– Derive new variables: multicollinearity
– Derive new variables: Factor Analysis
– Derive new variables: CHAID (will explore later)
Reference Material:
Factor Analysis: look up in any statistics handbook
Regression: look up in textbook under Regression and Statistics Regression

28 Continuing to build the model Running further statistical routines, we are able to develop a final model. The marketer or business person should receive a report that looks as follows: For those of you that have statistics training, how is the % Contribution to model derived? Look at the partial R-square of each variable and calculate as follows: % contribution = partial R-square / total R-square
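The % contribution formula can be sketched in a few lines of Python. The partial R-square values below are hypothetical, and the sketch assumes that the total R-square is the sum of the partial R-squares, as in a stepwise report where the model R-square accumulates as each variable enters.

```python
def percent_contribution(partial_r2):
    """Each variable's partial R-square as a percentage of the total R-square.

    Assumes total R-square equals the sum of the partial R-squares.
    """
    total = sum(partial_r2.values())
    return {name: 100.0 * value / total for name, value in partial_r2.items()}

# Hypothetical stepwise results
contributions = percent_contribution({"var1": 0.05, "var2": 0.03, "var3": 0.02})
for name, pct in contributions.items():
    print(f"{name}: {pct:.0f}%")
```

By construction the percentages sum to 100, so the report shows each variable's relative share of the model's explanatory power rather than its absolute strength.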

29 Continuing to Build the Model Report columns: Variable Entered, Partial R-Square, Model R-Square. Rows: var, var, var, var, var, var

30 Continuing to Build the Model What would be the final equation in terms of the sign? The equation should have the same signs as seen above from the impact column

31 Continuing to build the model What would you do here? I would conduct upfront segmentation, as live in Quebec is overwhelmingly strong and essentially indicates that we have a one-variable model. Create two segments, live in Quebec and Rest of Canada, and perhaps develop a model for each of these segments.

32 Continuing to build the model Suppose we have the following equation: Response= X Income +.06 X Tenure +.08 X Product Spend -.04 X Male What is the problem here? The problem is tenure: its sign is inconsistent between the report and the actual equation. Double-check the actual equation and the coefficient signs.