
Collinearity

Symptoms of collinearity

- Collinearity between independent variables: high r² among predictors
- High VIF for variables in the model
- Variables significant in simple regression, but not in multiple regression
- Variables not significant in multiple regression, even though the multiple regression model as a whole is significant
- Large changes in coefficient estimates between full and reduced models
- Large standard errors in multiple regression models despite high power
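The VIF check above can be sketched with plain numpy. This is an illustration, not part of the original slides: the `vif` helper and the simulated predictors are hypothetical names, and OLS is done with `np.linalg.lstsq` rather than a statistics package.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the design matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the other columns (plus an intercept).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid.var() / target.var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 200)
x2 = x1 + rng.normal(0, 0.5, 200)      # strongly collinear with x1
print(vif(np.column_stack([x1, x2])))  # both VIFs well above the usual cutoff of 10
```

A VIF above roughly 10 is a common rule-of-thumb flag for problematic collinearity.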

Collinearity and confounding independent variables

Two independent variables are correlated with each other, and both influence the response.

Methods

Truth: y = x1 + 3x2 + N(0, 2), where x1 ~ U[0, 10] and x2 = x1 + N(0, z) with z ~ U[0.5, 20]

- Run a simple regression of y on x1
- Run a multiple regression of y on x1 and x2
- No interactions!
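A minimal numpy sketch of this simulation, under stated simplifications: one fixed value of z rather than a draw from U[0.5, 20], and a hypothetical `ols` helper built on `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(0, 10, n)
z = 2.0                                # one fixed value instead of z ~ U[0.5, 20]
x2 = x1 + rng.normal(0, z, n)          # collinear with x1
y = x1 + 3 * x2 + rng.normal(0, 2, n)  # truth: both predictors matter

def ols(y, *xs):
    """Least-squares coefficients [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_simple = ols(y, x1)     # slope near 4: x1's slope absorbs x2's effect (1 + 3)
b_multi = ols(y, x1, x2)  # slopes near the true values 1 and 3
print(b_simple, b_multi)
```

The simple regression is biased by the omitted confounder, while the multiple regression recovers both true coefficients.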

Simple regression: y ~ x1

Multiple regression: y ~ x1 + x2

Collinearity and redundant independent variables

Two independent variables are correlated with each other, but only one influences the response, although we don't know which one.

Methods

Truth: y = x1 + N(0, 2), where x1 ~ U[0, 10] and x2 = x1 + N(0, z) with z ~ U[0.5, 20]

- Run simple regressions of y on x1 and of y on x2
- Run a multiple regression of y on x1 and x2
- No interactions!
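The redundant-variable case can be sketched the same way (again a fixed noise level rather than z ~ U[0.5, 20], and an illustrative `ols` helper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.uniform(0, 10, n)
x2 = x1 + rng.normal(0, 1.0, n)  # correlated with x1, but has no effect on y
y = x1 + rng.normal(0, 2, n)     # truth: only x1 drives the response

def ols(y, *xs):
    """Least-squares coefficients [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

print(ols(y, x1))      # slope on x1 near the true value 1
print(ols(y, x2))      # x2 also "works" alone, because it tracks x1
print(ols(y, x1, x2))  # x1 near 1, x2 near 0, with inflated standard errors
```

Both simple regressions look significant; only the multiple regression reveals that x2 adds nothing once x1 is in the model.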

Simple regression: y ~ x1

Simple regression: y ~ x2

Multiple regression: y ~ x1 + x2

What to do?

- Calculate collinearity and VIF among the independent variables before you start your analysis
- Pay attention to how coefficient estimates and variable significance change as variables are removed or added
- Identify potentially confounding variables prior to data collection

Is a variable redundant or confounding? Think!

- Extreme collinearity: redundant
- Large changes in the coefficient estimates of both variables between full and reduced models: confounding
- Large changes in the coefficient estimate of one variable between full and reduced models: redundant (its full-model estimate is close to zero)
- Uncertain? Assume confounding: multiple regression produces unbiased estimates (on average) regardless of the type of collinearity

What to do? Confounding variables

- Sample in a manner that eliminates collinearity; the observed collinearity may be real or a sampling artifact
- Use multiple regression, which may have large standard errors under strong collinearity
- Include confounding variables even if they are non-significant
- Get more data, which decreases standard errors even when VIF is high
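The "get more data" point can be illustrated by computing the standard error of the x1 coefficient at two sample sizes. A sketch using the confounded design from the simulation above, with a hypothetical `slope_se` helper and the textbook OLS covariance formula σ²(XᵀX)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(3)

def slope_se(n):
    """Standard error of the x1 coefficient in y ~ x1 + x2 at sample size n."""
    x1 = rng.uniform(0, 10, n)
    x2 = x1 + rng.normal(0, 1.0, n)        # collinear predictor
    y = x1 + 3 * x2 + rng.normal(0, 2, n)
    X = np.column_stack([np.ones(n), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - 3)       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # OLS coefficient covariance
    return np.sqrt(np.diag(cov))[1]

print(slope_se(50), slope_se(5000))  # SE shrinks roughly as 1/sqrt(n)
```

Collinearity inflates the standard errors at any sample size, but a larger sample shrinks them back down.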

What to do? Redundant variables

- Determine which variable explains the response best, using P-values from the regression and changes in coefficient estimates as variables are added and removed
- Do not include the redundant variable in the final model; this reduces VIF
- Try a variable reduction technique such as PCA
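The PCA suggestion can be sketched with numpy's SVD. This is an illustrative recipe, not from the slides: with two nearly redundant predictors, the first principal component captures almost all of their shared variance, and the response can be regressed on that single component instead.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x1 = rng.uniform(0, 10, n)
x2 = x1 + rng.normal(0, 0.5, n)  # nearly redundant with x1
y = x1 + rng.normal(0, 2, n)

# Center the predictors, then get principal components via SVD.
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                  # PC scores for each observation
var_explained = s**2 / np.sum(s**2)
print(var_explained)                # first PC carries almost all the variance

# Regress y on the first PC instead of on two collinear predictors.
pc1 = scores[:, 0]
A = np.column_stack([np.ones(n), pc1])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)
```

The trade-off is interpretability: the PC coefficient mixes x1 and x2, so this suits prediction better than inference about either variable.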