
Multicollinearity
Presented by: Shahram Arsang
Isfahan University of Medical Sciences
April 2014

FOCUS
1. Definition of multicollinearity
2. Diagnosis of multicollinearity
3. Remedial measures for multicollinearity
4. Example

Multicollinearity
Definition: the predictor variables are highly correlated among themselves.
Example: body fat data.
Potential harm of collinearity: it becomes difficult to infer the separate influence of such explanatory variables on the response variable.

Problems with Multicollinearity
1. Adding or deleting a predictor variable can change the regression coefficients.
2. The extra sum of squares associated with a predictor varies, depending on which other predictor variables are already included in the model.
3. The estimated standard deviations of the regression coefficients become large.
4. The estimated regression coefficients individually may not be statistically significant even though a definite statistical relation exists between the response variable and the set of predictor variables.


Diagnosis
Diagnosis consists of two related but separate elements:
1. Detecting the presence of collinear relationships among the predictors.
2. Assessing the extent to which these relationships have degraded the estimated parameters.

Informal diagnostics for multicollinearity
1. Large changes in the estimated regression coefficients when a predictor variable is added or deleted, or when an observation is altered or deleted.
2. Nonsignificant results in individual tests on the regression coefficients of important predictor variables.
3. Estimated regression coefficients with an algebraic sign that is the opposite of that expected from theoretical considerations or prior experience.
4. Large coefficients of simple correlation between pairs of predictor variables in the correlation matrix $r_{XX}$.
5. Wide confidence intervals for the regression coefficients of important predictor variables.

Limitations of the informal diagnostics
1. They provide only qualitative measures of multicollinearity.
2. The observed behavior can sometimes occur without multicollinearity being present.

Multicollinearity diagnostic methods
1. Correlation matrix $r_{XX}$ of the X's (the absence of high pairwise correlations cannot be viewed as evidence of no problem).
2. Variance Inflation Factor (VIF). Weaknesses: (a) unable to reveal the presence of several coexisting near-dependencies among the explanatory variables; (b) no meaningful boundary to distinguish acceptable from problematic VIF values.
3. The technique of Farrar and Glauber (partial correlations).

The technique of Farrar and Glauber
Assumes the n×p data matrix X is a sample of size n from a p-variate Gaussian (normal) distribution. The partial correlation between $X_i$ and $X_j$, adjusted for all other X variables, is then used to investigate the patterns of interdependence in greater detail.
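The partial correlations Farrar and Glauber examine can be read off the inverse of the predictor correlation matrix. A minimal numpy sketch of this building block; the function name and synthetic data are my own, not from the slides:

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation between each pair of columns of X, adjusted
    for all remaining columns, from the inverse correlation matrix:
    r_ij.rest = -P_ij / sqrt(P_ii * P_jj)."""
    P = np.linalg.inv(np.corrcoef(X, rowvar=False))
    d = np.sqrt(np.diag(P))
    pc = -P / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)   # set the diagonal to 1 by convention
    return pc

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.2, size=200)   # strongly tied to x1
x3 = rng.normal(size=200)
print(partial_correlations(np.column_stack([x1, x2, x3])).round(2))
```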

Variance inflation factor (VIF)
VIF measures how much the variances of the estimated coefficients are inflated compared with the case where the $X_i$'s are not linearly related: $(\mathrm{VIF})_k = (1 - R_k^2)^{-1}$, where $R_k^2$ is the coefficient of determination when $X_k$ is regressed on the other X variables.
Variance-covariance matrices of $b$ and $b^*$: $\sigma^2\{b\} = \sigma^2 (X'X)^{-1}$ and, for the standardized coefficients, $\sigma^2\{b^*\} = (\sigma^*)^2\, r_{XX}^{-1}$; $(\mathrm{VIF})_k$ is the $k$th diagonal element of $r_{XX}^{-1}$.
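In practice the VIFs are computed directly from the design matrix. A short sketch using statsmodels' variance_inflation_factor; the synthetic data are illustrative only, not the body fat data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly a copy of x1
x3 = rng.normal(size=100)

# design matrix with an intercept column, as statsmodels expects
X = sm.add_constant(np.column_stack([x1, x2, x3]))
# index 0 is the intercept, so compute VIF for columns 1..p-1 only
vifs = [variance_inflation_factor(X, k) for k in range(1, X.shape[1])]
print(np.round(vifs, 1))   # expect large VIFs for x1 and x2, near 1 for x3
```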

Diagnostic uses
Severity of multicollinearity:
1. A large individual VIF value: $(\mathrm{VIF})_k > 10$ is the usual warning level.
2. The mean of the VIF values: indicates how far the estimated standardized regression coefficients $b_k^*$ are from the true values $\beta_k^*$. It can be shown that the expected value of the sum of the squared errors $(b_k^* - \beta_k^*)^2$ is given by $E\{\sum_k (b_k^* - \beta_k^*)^2\} = (\sigma^*)^2 \sum_k (\mathrm{VIF})_k$.

When no X variable is linearly related to the others in the regression model, every $(\mathrm{VIF})_k \equiv 1$, so $\sum_k (\mathrm{VIF})_k = p-1$. This provides useful information about the effect of multicollinearity on the sum of the squared errors, as sketched below.
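Combining the two results above gives the ratio the slide alludes to; a reconstruction of the standard textbook argument, not copied verbatim from the slides:

```latex
% Expected total squared estimation error of the standardized coefficients:
%   E{sum_k (b_k^* - beta_k^*)^2} = (sigma^*)^2 sum_k (VIF)_k.
% With uncorrelated X's every (VIF)_k = 1, so the sum reduces to p - 1.
\[
\frac{E\left\{\sum_k \left(b_k^* - \beta_k^*\right)^2\right\}}
     {\left.E\left\{\sum_k \left(b_k^* - \beta_k^*\right)^2\right\}\right|_{\text{uncorrelated}}}
  = \frac{(\sigma^*)^2 \sum_k (\mathrm{VIF})_k}{(\sigma^*)^2\,(p-1)}
  = \overline{\mathrm{VIF}}
\]
```

That is, the inflation ratio equals exactly the mean VIF introduced on the next slide.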

The mean of the VIF values, denoted $\overline{\mathrm{VIF}} = \sum_k (\mathrm{VIF})_k / (p-1)$: values considerably greater than 1 are indicative of serious multicollinearity problems.

Body fat example
The expected sum of the squared errors in the least squares standardized regression coefficients is nearly 460 times as large as it would be if the X variables were uncorrelated. Is there a multicollinearity problem?

Comments
1. The reciprocal of the VIF, $1/(\mathrm{VIF})_k = 1 - R_k^2$ (the tolerance), measures the proportion of variability in $X_k$ not explained by the other X variables.
2. Limitation of the VIF: it cannot distinguish between several simultaneous multicollinearities.

Remedial measures
1. If the model is used only for making predictions within the region of the observed data, multicollinearity is not a serious problem.
2. Center the X data (see the sketch below).
3. Drop one or more predictors.
4. Add some cases that may break the pattern of multicollinearity.
5. Use different data sets to estimate different coefficients.
6. Principal component analysis.
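To see why centering (item 2) helps, consider a predictor measured far from zero together with its square; a toy sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(10, 20, size=200)       # predictor measured far from zero
print(np.corrcoef(x, x ** 2)[0, 1])     # ~0.999: x and x^2 nearly collinear

xc = x - x.mean()                       # center the predictor
print(np.corrcoef(xc, xc ** 2)[0, 1])   # near 0: the collinearity is gone
```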

Ridge regression
Modifies the method of least squares to allow biased estimators of the regression coefficients.
Ridge estimators are obtained via the correlation transformation: the idea is to introduce a small biasing constant c and solve $(r_{XX} + cI)\, b^R = r_{YX}$ for the standardized ridge coefficients $b^R$.

Ridge regression
$b^R$: the standardized ridge coefficients.
c: the amount of bias in the estimator. c = 0 gives OLS in standardized form; for c > 0 the $b^R$'s are biased but more stable than OLS.
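A minimal sketch of the estimator under the correlation transformation described above; the function name and demo data are my own, not from the slides:

```python
import numpy as np

def ridge_standardized(X, y, c):
    """Standardized ridge coefficients b_R = (r_XX + c*I)^(-1) r_YX,
    computed through the correlation transformation of X and y."""
    n = len(y)
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
    ys = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))
    r_xx = Xs.T @ Xs            # correlation matrix of the predictors
    r_yx = Xs.T @ ys            # correlations between y and each predictor
    return np.linalg.solve(r_xx + c * np.eye(X.shape[1]), r_yx)

# two highly correlated predictors make the OLS solution unstable
x = np.linspace(1, 2, 50)
X = np.column_stack([x, x ** 2])
y = X @ np.array([1.0, -0.5]) + np.random.default_rng(3).normal(0, 0.05, 50)
print(ridge_standardized(X, y, 0.0))    # c = 0: OLS in standardized form
print(ridge_standardized(X, y, 0.02))   # c > 0: biased but more stable
```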

Ridge regression results:
1. As c increases, the bias increases and the variance of the estimated coefficients decreases.
2. There always exists a c for which the total MSE of ridge regression is smaller than that of OLS.
3. There is no hard and fast way of finding c.

Choice of the biasing constant c:
1. Ridge trace: a simultaneous plot of the (p-1) estimated ridge standardized regression coefficients for different values of c between 0 and 1.
2. The VIF values, recomputed for each c.

Choice of the biasing constant c
Choose the smallest value of c at which it is deemed that:
1. the regression coefficients have steadied themselves (the ridge trace has flattened), and
2. the VIF values have become sufficiently small.
A sketch of both computations follows.
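A sketch of both criteria, reusing ridge_standardized and the demo X, y from the sketch above. The ridge VIF formula used here, the diagonal of $(r_{XX} + cI)^{-1} r_{XX} (r_{XX} + cI)^{-1}$, is the standard textbook form and is an assumption on my part rather than something stated on the slides:

```python
import numpy as np
# reuses ridge_standardized and the demo X, y defined in the sketch above

def ridge_vif(X, c):
    """Assumed textbook form of the ridge VIFs: the diagonal of
    (r_XX + cI)^(-1) r_XX (r_XX + cI)^(-1)."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
    r_xx = Xs.T @ Xs
    G = np.linalg.inv(r_xx + c * np.eye(X.shape[1]))
    return np.diag(G @ r_xx @ G)

cs = np.linspace(0.0, 0.5, 51)                            # grid of c values
trace = np.array([ridge_standardized(X, y, c) for c in cs])
vifs = np.array([ridge_vif(X, c) for c in cs])
# ridge trace: plot each column of `trace` against cs and look for the
# smallest c at which the curves flatten; check that `vifs` is near 1 there
```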

Comments
Limitations of ridge regression:
1. The precision of the ridge regression coefficients is not easily assessed; bootstrap methods can be used.
2. The choice of c is subjective.

Using ridge regression to reduce the number of predictor variables; patterns in the ridge trace that mark a variable as a candidate for dropping:
1. An unstable ridge trace with the coefficient tending toward zero.
2. A ridge trace that is stable but at a very small value.
3. An unstable ridge trace that does not tend toward zero.

VIF in SPSS [screenshot not reproduced in the transcript]

SPSS output [screenshot not reproduced in the transcript]

Example 2 – VIF values and remedial measures