Multicollinearity: an introductory example

A high-tech business wants to measure the effect of advertising on sales and would like to distinguish between traditional advertising (TV and newspapers) and advertising on the internet.
– Y: sales in $m
– x1: traditional advertising in $m
– x2: internet advertising in $m
Data: Sales3.sav
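Sales3.sav is an SPSS data file that is not reproduced here. As a minimal sketch, assuming nothing beyond the structure described above, the hypothetical Python snippet below simulates a small data set of the same shape (two strongly correlated advertising variables driving sales), so the later steps can also be tried outside SPSS; all names and numbers are illustrative, not the course data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 12                                   # the slides' ANOVA (total df = 11) implies 12 observations
x1 = rng.uniform(1, 10, n)               # traditional advertising, $m
x2 = 0.9 * x1 + rng.normal(0, 0.3, n)    # internet spend tracks x1 closely, hence collinear
y = 1 + 1.2 * x1 + 1.1 * x2 + rng.normal(0, 1, n)  # sales, $m

sales = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
print(sales.head())
```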

A matrix scatter plot of the data

Cor(y, x1) = 0.983
Cor(y, x2) = 0.986
Cor(x1, x2) = 0.990

x1 and x2 are strongly correlated, i.e. they carry a substantial amount of common information: approximately, x1 = α0 + α1x2 + ε.
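Given a DataFrame such as the `sales` sketch above, the slide's correlation table is one line of pandas (the exact values will differ, since the real Sales3.sav data are not available):

```python
# Pairwise correlations; by construction cor(x1, x2) is close to 1.
print(sales.corr().round(3))
```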

Regression output, using x1 only (equivalent results are obtained using x2 only):

Model summary: R = 0.983, R² = 0.965, Adjusted R² = 0.962, Std. Error = 0.9764

ANOVA
  Regression: SS = 265.466, df = 1,  MS = 265.466, F = 278.438, Sig. = 0.000
  Residual:   SS = 9.534,   df = 10, MS = 0.953
  Total:      SS = 275.000, df = 11

Coefficients
  (Constant):                           B = 0.885, SE = 0.696, t = 1.272,  Sig. = 0.232
  x1 (traditional advertising in $m):   B = 2.254, SE = 0.135, t = 16.686, Sig. = 0.000
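A hedged sketch of the same one-predictor fit with statsmodels, continuing from the simulated `sales` data (coefficients will differ from the SPSS output above):

```python
import statsmodels.api as sm

# Regress y on x1 alone; add_constant supplies the intercept term.
X1 = sm.add_constant(sales[["x1"]])
fit_x1 = sm.OLS(sales["y"], X1).fit()
print(fit_x1.summary())   # x1 is highly significant on its own
```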

Regression output, using x1 and x2:

Model summary: R = 0.987, R² = 0.974, Adjusted R² = 0.968, Std. Error = 0.8916

ANOVA
  Regression: SS = 267.846, df = 2,  MS = 133.923, F = 168.483, Sig. = 0.000
  Residual:   SS = 7.154,   df = 9,  MS = 0.795
  Total:      SS = 275.000, df = 11

Coefficients
  (Constant):                           B = 1.992, SE = 0.902, t = 2.210, Sig. = 0.054
  x1 (traditional advertising in $m):   B = 0.767, SE = 0.868, t = 0.884, Sig. = 0.400
  x2 (internet advertising in $m):      B = 1.275, SE = 0.737, t = 1.730, Sig. = 0.118

x1 and x2 are no longer individually significant.
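The same phenomenon appears on the simulated data when both predictors enter: the overall F test stays significant while the individual t tests collapse, because collinearity inflates the standard errors of the β estimates. A sketch:

```python
# Regress y on both collinear predictors.
X12 = sm.add_constant(sales[["x1", "x2"]])
fit_x12 = sm.OLS(sales["y"], X12).fit()
print(round(fit_x12.fvalue, 1), fit_x12.f_pvalue)  # overall model still highly significant
print(fit_x12.pvalues.round(3))                    # individual t tests typically not significant
print(fit_x12.bse.round(3))                        # note the inflated standard errors
```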

Multicollinearity

Multicollinearity exists when two or more of the independent variables are moderately or highly correlated with each other. In practice, if the independent variables are (highly) correlated they contribute largely redundant information, which prevents isolating the effect of any single independent variable on y; confusion is often the result.

High levels of multicollinearity:
a) inflate the variance of the β estimates;
b) can make the regression results misleading and confusing.

In the extreme case, where one independent variable is an exact linear combination of some of the others,

  xi = α0 + α1xj + … + αpxj+p,  with j + p < k and i ∉ {j, j+1, …, j+p},

the OLS estimates cannot be computed at all.
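The extreme case is easy to see numerically: with an exact linear dependence among the columns of the design matrix, X'X is singular, so the OLS formula (X'X)⁻¹X'y has no unique solution. A tiny illustration with hypothetical numbers:

```python
import numpy as np

# The last column is an exact linear combination of x1 and x2:
# perfect multicollinearity.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
X = np.column_stack([np.ones(5), x1, x2, 2 * x1 + 3 * x2])

print(np.linalg.matrix_rank(X))          # 3, not 4: the design matrix is rank deficient
print(round(np.linalg.det(X.T @ X), 6))  # ~0: X'X cannot be inverted
```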

Detecting multicollinearity

The following are indicators of multicollinearity:
1. Significant correlations between pairs of independent variables in the model (sufficient but not necessary).
2. Nonsignificant t-tests for all (or nearly all) of the individual β parameters even though the F test of overall model adequacy, H0: β1 = β2 = … = βk = 0, is significant.
3. Estimated parameters whose signs are opposite to those expected.
4. A variance inflation factor (VIF) for a β parameter greater than 10 (see the Python sketch below).

The VIFs can be calculated in SPSS by selecting "Collinearity diagnostics" under the "Statistics" options in the "Regression" dialog box.
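Outside SPSS, the same diagnostic can be computed directly; statsmodels ships a VIF helper. A sketch on the simulated `sales` data:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(sales[["x1", "x2"]])
# VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing x_j on the other predictors.
for j in range(1, X.shape[1]):           # skip the constant column
    print(X.columns[j], round(variance_inflation_factor(X.values, j), 2))
```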

A typical situation

Multicollinearity can arise when transforming variables, e.g. when both x1 and x1² are used in the regression equation and the range of values of x1 is limited. In the example plotted on the slide, Cor(x, x²) = 0.987.
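The role of the limited range is easy to check numerically (illustrative numbers; the 0.987 on the slide comes from its own data):

```python
import numpy as np

x = np.linspace(10, 12, 50)                            # narrow range, far from zero
print(round(np.corrcoef(x, x**2)[0, 1], 4))            # ~1: x and x² nearly collinear

x_wide = np.linspace(-5, 5, 50)                        # wide range, symmetric about zero
print(round(np.corrcoef(x_wide, x_wide**2)[0, 1], 4))  # ~0 by symmetry
```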

Remember: if multicollinearity is present but not excessive (no high correlations, no VIFs above 10), you can ignore it; each variable then provides enough independent information, and its contribution can be assessed. If your main goal is explaining relationships, multicollinearity may be a problem, because the measured effects can be misleading. If your main goal is prediction (using the available explanatory variables to predict the response), you can safely ignore multicollinearity.

Some solutions to multicollinearity

– Get more data if you can.
– Drop one or more of the correlated independent variables from the final model. A screening procedure such as stepwise regression may be helpful in deciding which variable to drop; see the sketch after this list.
– If you keep all the independent variables, be cautious in interpreting the parameter values, and keep prediction within the range of your data.
– Use ridge regression (not covered in this course).
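As a sketch of the "drop a variable" remedy on the simulated `sales` data (which of the pair to keep is a screening question, e.g. for stepwise regression):

```python
import statsmodels.api as sm

# Compare the two-predictor fit with the fits that drop one collinear variable.
for keep in (["x1", "x2"], ["x1"], ["x2"]):
    fit = sm.OLS(sales["y"], sm.add_constant(sales[keep])).fit()
    print(keep, fit.pvalues[keep].round(3).to_dict(),
          "adj R² =", round(fit.rsquared_adj, 3))
```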

Some solutions to multicollinearity (continued)

If the multicollinearity is introduced by the use of higher-order terms (e.g. x and x², or x1, x2 and x1x2), use the independent variables as deviations from their means. Example: suppose multicollinearity is present in

  E(Y) = β0 + β1x + β2x²

1) Compute x* = x - mean(x).
2) Run the regression E(Y) = β0 + β1x* + β2(x*)².

In most cases the multicollinearity is greatly reduced. Clearly, the β parameters of the new regression will have different values and a different meaning.
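A quick numerical check of the centering trick, reusing the narrow-range example above:

```python
import numpy as np

x = np.linspace(10, 12, 50)
xc = x - x.mean()                               # x* = x - mean(x)

print(round(np.corrcoef(x, x**2)[0, 1], 3))     # near 1: raw x vs x²
print(round(np.corrcoef(xc, xc**2)[0, 1], 3))   # near 0: centered x* vs (x*)²
```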

Example: shipping costs (continued)

A company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance. Since nonlinear effects are suspected, let us analyze the model

Model 1: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

– Y: cost of shipment in dollars
– x1: package weight in pounds
– x2: distance shipped in miles
Data: Express.sav
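Express.sav is likewise not reproduced here; as a hedged sketch, Model 1 can be fit to hypothetical shipping data with statsmodels' formula interface (all names and numbers below are assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 20                                   # the slide's correlation table reports N = 20
weight = rng.uniform(1, 8, n)            # pounds
distance = rng.uniform(50, 300, n)       # miles
cost = (1 + 0.5 * weight + 0.01 * distance + 0.006 * weight * distance
        + 0.08 * weight**2 + rng.normal(0, 0.5, n))

ship = pd.DataFrame({"cost": cost, "weight": weight, "distance": distance})

# Model 1: weight * distance expands to weight + distance + weight:distance,
# and I(...) adds the two quadratic terms.
model1 = smf.ols("cost ~ weight * distance + I(weight**2) + I(distance**2)",
                 data=ship).fit()
print(model1.summary())
```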

Matrix scatter plot

A matrix scatter plot shows all the bivariate scatter plots for the selected variables at once; use it as a preliminary screening. In SPSS, choose the "Matrix" option from the "Scatter/Dot" graph dialog and input the variables of interest. Note the obvious quadratic relation for some of the pairs, very close to linearity. The matrix is symmetric, so it is enough to look at the lower triangle.
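A rough Python counterpart of SPSS's matrix scatter plot, continuing with the hypothetical `ship` data:

```python
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

# All bivariate scatter plots at once, as a preliminary screening.
scatter_matrix(ship[["weight", "distance", "cost"]], figsize=(6, 6), diagonal="hist")
plt.show()
```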

Correlation matrix (N = 20; ** marks correlations significant at the 0.01 level, 2-tailed):

                   Weight   Distance   Cost     Weight²   Distance²   Weight*Distance
Weight             1        .182       .774**   .967**    .151        .820**
Distance           .182     1          .695**   .202      .980**      .633**
Cost               .774**   .695**     1        .799**    .652**      .989**
Weight²            .967**   .202       .799**   1         .160        .821**
Distance²          .151     .980**     .652**   .160      1           .590**
Weight*Distance    .820**   .633**     .989**   .821**    .590**      1

Weight, distance, and their transformations are each individually strongly related to Y (cost).

Model 1: VIF statistics

A VIF statistic larger than 10 is usually considered an indicator of substantial collinearity. The VIFs can be calculated in SPSS by selecting "Collinearity diagnostics" under the "Statistics" options in the "Regression" dialog box.

Coefficients (Model 1)
  (Constant):        B = 0.827,     SE = 0.702, t = 1.178,  Sig. = 0.259
  Weight (lbs):      B = -0.609,    SE = 0.180, t = -3.386, Sig. = 0.004, VIF = 20.031
  Distance:          B = 0.004,     SE = 0.008, t = 0.503,  Sig. = 0.623, VIF = 35.526
  Weight²:           B = 0.090,     SE = 0.020, t = 4.442,  Sig. = 0.001, VIF = 17.027
  Distance²:         B = 1.507E-5,  SE = 0.000, t = 0.672,  Sig. = 0.513, VIF = 28.921
  Weight*Distance:   B = 0.007,     SE = 0.001, t = 11.495, Sig. = 0.000, VIF = 12.618

All five VIFs exceed 10: substantial multicollinearity.

Model 2: using the independent variables as deviations from their means

Coefficients (Model 2)
  (Constant):   B = 5.467,     SE = 0.216, t = 25.252, Sig. = 0.000
  x1*:          B = 1.263,     SE = 0.042, t = 30.128, Sig. = 0.000, VIF = 1.087
  x2*:          B = 0.038,     SE = 0.001, t = 27.563, Sig. = 0.000, VIF = 1.081
  x1*x2*:       B = 0.007,     SE = 0.001, t = 11.495, Sig. = 0.000, VIF = 1.095
  (x1*)²:       B = 0.090,     SE = 0.020, t = 4.442,  Sig. = 0.001, VIF = 1.113
  (x2*)²:       B = 1.507E-5,  SE = 0.000, t = 0.672,  Sig. = 0.513, VIF = 1.120

Note: the problems of multicollinearity have disappeared. The (x2*)² term remains nonsignificant (Sig. = 0.513) and seems irrelevant, so it can be dropped. Note also that the adjusted R², the ANOVA table, and the predictions are the same for the two models (check).
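A sketch comparing the VIFs of Model 1 and the centered Model 2 on the hypothetical `ship` data; the pattern (large VIFs collapsing towards 1 after centering) should mirror the SPSS tables, even though the exact values differ:

```python
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vifs(formula, data):
    # Build the design matrix the model actually uses, then compute each VIF.
    model = smf.ols(formula, data=data)
    return {name: round(variance_inflation_factor(model.exog, j), 2)
            for j, name in enumerate(model.exog_names) if name != "Intercept"}

ship_c = ship.assign(w=ship.weight - ship.weight.mean(),
                     d=ship.distance - ship.distance.mean())

print(vifs("cost ~ weight * distance + I(weight**2) + I(distance**2)", ship))  # large VIFs
print(vifs("cost ~ w * d + I(w**2) + I(d**2)", ship_c))                        # much closer to 1
```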