Collinearity: The Problem of Large Correlations Among the Independent Variables

Skill Set
What is collinearity?
Why is it a problem?
How do I know if I've got it?
What can I do about it?

Collinearity Defined
Within the set of IVs, one or more IVs are (nearly) totally predicted by the other IVs. In such a case, the b or beta weights are poorly estimated: small changes in the data can produce large changes in the estimates. This is the problem of the "bouncing betas," illustrated by the simulation sketch below.
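
To make the instability concrete, here is a minimal simulation sketch in Python with NumPy. The sample size, correlation, and true weights are illustrative assumptions, not values from the slides: two IVs correlating about .95 are drawn repeatedly, and the spread of the fitted b weights across samples shows how much they bounce.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_betas(n=50, r=0.95):
    """Draw one sample with two highly correlated IVs; return the two b weights."""
    x1 = rng.normal(size=n)
    x2 = r * x1 + np.sqrt(1 - r**2) * rng.normal(size=n)  # corr(x1, x2) is about r
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)          # true b1 = b2 = 1
    X = np.column_stack([np.ones(n), x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]       # drop the intercept

betas = np.array([fit_betas() for _ in range(1000)])
print("SD of b1 across samples:", betas[:, 0].std())
```

With r near .95, the standard deviation of b1 across samples is roughly three times what it would be for uncorrelated IVs, which is exactly the square root of the VIF inflation described next.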

Diagnostics
1. Variance Inflation Factor (VIF). Sampling variance of the b weight with 2 IVs:

$$s_{b_1}^2 = \frac{s_{Y.12}^2}{\sum (X_1 - \bar{X}_1)^2 \,(1 - r_{12}^2)}$$

The factor $1/(1 - r_{12}^2)$ is the VIF: how much the sampling variance of the b weight is inflated by the correlation between the IVs.

VIF (2)
Sampling variance of $b_j$ with k predictors:

$$s_{b_j}^2 = \frac{s_{Y.12 \ldots k}^2}{\sum (X_j - \bar{X}_j)^2 \,(1 - R_j^2)}$$

where $R_j^2$ is the squared multiple correlation from regressing $X_j$ on the other predictors, so $VIF_j = 1/(1 - R_j^2)$. Large values of VIF are trouble. Some say values > 10 are high.
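
A minimal sketch of computing VIF (and tolerance, covered next) in Python, assuming statsmodels is available; the data here are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.2, size=n)   # nearly redundant with x1
x3 = rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF_j = 1 / (1 - R_j^2); tolerance is its reciprocal. Skip the constant column.
for j in range(1, X.shape[1]):
    vif = variance_inflation_factor(X, j)
    print(f"X{j}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")
```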

Tolerance
Tolerance is the reciprocal of the VIF:

$$\text{Tolerance}_j = 1 - R_j^2 = 1 / VIF_j$$

Small values are trouble. Maybe .10? A tolerance below .10 corresponds to a VIF above 10, so the two rules of thumb agree: if $R_j^2 = .90$, tolerance is .10 and VIF is 10.

Condition Index
Lambda ($\lambda$) is an eigenvalue; the condition index for dimension i is $\sqrt{\lambda_{max} / \lambda_i}$.

Number   Eigenvalue   Condition Index   Variance Proportions
                                        Constant   X1     X2     X3
  1        3.771          1.00            .004      .006   .006   .008
  2         .106          5.969           .003      .029   .268   .774
  3         .079          6.90            .000      .749   .397   .066
  4         .039          9.946           .993      .215   .329   .152

Number refers to a linear combination of the predictors; the eigenvalue is the variance of that combination. Collinearity is spotted by finding 2 or more variables that have large proportions of variance (.50 or more) on a dimension with a large condition index. A rule of thumb is to label as large those condition indices in the range of 30 or larger. No apparent problem here.
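
The same diagnostics can be computed by hand. This is a minimal sketch with simulated data: it scales each column of the design matrix to unit length (the convention behind SPSS and SAS collinearity diagnostics) and takes the eigenvalues of the resulting X'X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.2, size=n)   # a nearly collinear pair
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])   # constant column included

Xs = X / np.linalg.norm(X, axis=0)              # scale columns to unit length
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)[::-1]   # eigenvalues, descending
cond_index = np.sqrt(eigvals[0] / eigvals)
print(np.round(cond_index, 3))                  # indices near 30+ are worrisome
```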

Condition Index (2)

Number   Eigenvalue   Condition Index   Variance Proportions
                                        Constant   X1     X2     X3
  1        3.819          1.00            .004      .006   .000   .002
  2         .117          5.707           .043      .384   .041   .087
  3         .047          9.025           .876      .608   .001   .042
  4         .017         15.128           .077      .002   .967   .868

The last condition index (15.128) is highly associated with both X2 (.967) and X3 (.868). The b weights for X2 and X3 are probably not well estimated.

Dealing with Collinearity
Lump it: admit the ambiguity, report the standard errors of the b weights, and refer also to the zero-order correlations.
Select or combine variables, for example by factor analyzing the set of IVs.
Use another type of analysis (e.g., path analysis).
Use another type of regression, such as ridge regression (a sketch follows).
Use unit weights (no longer regression).
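
Ridge regression stabilizes the b weights by shrinking them toward zero. Here is a minimal sketch in plain NumPy; the function name, the choice to standardize, and the ridge constant k are illustrative assumptions, not a prescribed implementation:

```python
import numpy as np

def ridge_betas(X, y, k):
    """Ridge estimate b = (X'X + kI)^(-1) X'y on standardized data (sketch)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize the predictors
    ys = y - y.mean()                           # center the criterion
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ ys)
```

Setting k = 0 reproduces ordinary least squares; increasing k trades a small bias for a large reduction in the variance of the b weights when the IVs are collinear.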

Review
What is collinearity?
Why is collinearity a problem?
What is the VIF?
What is Tolerance?
What is a condition index?
What are some things you can do to deal with collinearity?