Items to consider - 3 Multicollinearity

Multicollinearity

Multicollinearity is a relationship among the IVs: it occurs when IVs are highly correlated with one another.

What to do:
- Examine the correlation matrix of all IVs and the DV to detect any multicollinearity
- Look for r's between IVs in excess of 0.70
- If detected, it is generally best (or at least simplest) to re-run the MLR with one of the offending IVs removed from the model (see model reduction, later)
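This check can be sketched in a few lines; the example below uses pandas on made-up data (the variable names echo the GPA example on the following slides and are purely illustrative):

```python
import numpy as np
import pandas as pd

# Made-up data: sat is constructed to be strongly related to hs_gpa,
# while attitude is unrelated to both
rng = np.random.default_rng(0)
n = 200
hs_gpa = rng.normal(3.0, 0.5, n)
sat = 200 * hs_gpa + rng.normal(0, 60, n)
attitude = rng.normal(50, 10, n)

ivs = pd.DataFrame({"hs_gpa": hs_gpa, "sat": sat, "attitude": attitude})
corr = ivs.corr()

# Flag IV pairs whose |r| exceeds the 0.70 rule of thumb
cols = list(corr.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if abs(corr.loc[a, b]) > 0.70:
            print(f"possible multicollinearity: {a} & {b} (r = {corr.loc[a, b]:.2f})")
```

Only the hs_gpa/sat pair is flagged here; in practice the same loop would be run over whatever IVs are in the candidate model.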

Multicollinearity – what is it?

It concerns the unique and shared variance of the IVs with the criterion (DV) and with one another. We must establish what unique variance in each predictor (IV) is related to variance in the criterion (DV).

Example 1 (graphical):
y – freshman college GPA
predictor 1 – high school GPA
predictor 2 – SAT total score
predictor 3 – attitude toward education

Multicollinearity – what is it?

Circle = variance for a variable; overlap = shared variance (only 2 predictors shown here).

[Venn diagram of y, x1, and x2, with two regions highlighted:]
- variance in y accounted for by predictor 2 after the effect of predictor 1 has been partialled out
- common variance in y that both predictors 1 and 2 account for

Multicollinearity – what is it?

Circle = variance for a variable; overlap = shared variance (only 2 predictors shown here).

[Venn diagram of y, x1, and x2] Total R² = .66, or 66%

Multicollinearity – what is it?

Circle = variance for a variable; overlap = shared variance (only 2 predictors shown here).

[Venn diagram of y, x1, and x2] Total R² = .33, or 33%

Multicollinearity – what is it?

Example 2 (in words):
y – freshman college GPA
predictor 1 – high school GPA
predictor 2 – SAT total score
predictor 3 – attitude toward education

Multicollinearity – what is it?

[Venn diagram regions from the example:]
- variance in college GPA predictable from variance in high school GPA
- residual variance in SAT related to variance in college GPA
- residual variance in attitude related to variance in college GPA
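The "residual variance" regions above are squared semipartial correlations, which can be computed as increments in R². A minimal sketch on simulated data (the coefficients and variable roles are invented stand-ins for the GPA example):

```python
import numpy as np

# Simulated stand-ins: y = college GPA, x1 = high school GPA, x2 = SAT
rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)   # x2 overlaps with x1
y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

def r2(X, y):
    """R-squared of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_x1 = r2(x1[:, None], y)                  # y ~ x1
r2_both = r2(np.column_stack([x1, x2]), y)  # y ~ x1 + x2

# Variance in y accounted for by x2 after x1 has been partialled out
print("unique share of x2:", round(r2_both - r2_x1, 3))
```

Because x2 overlaps heavily with x1 here, its unique share of y's variance is far smaller than its simple correlation with y would suggest.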

Multicollinearity – what is it?

Consider these three correlation patterns:

A: correlations with Y of .2, .1, .3; intercorrelations among the X's of .5, .4, .6
B: correlations with Y of .6, .5, .7; intercorrelations among the X's of .2 and .3
C: intercorrelations among the X's of .6, .7, .8

Which would we expect to have the largest overall R², and which would we expect to have the smallest?

Multicollinearity – what is it?

- R will be at least .7 for B and C, but only at least .3 for A (R is always at least as large as the largest single correlation with Y)
- There is no chance of R for A getting much larger, because the intercorrelations among the X's are as large for A as for B and C

Multicollinearity – what is it?

- R will probably be largest for B: its predictors are correlated with Y, and there is not much redundancy among the predictors
- R is probably greater in B than in C, as C has considerable redundancy among its predictors
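These expectations can be checked numerically with the standard formula R² = r_xy′ R_xx⁻¹ r_xy. Note that the matrices below have to fill in values the slides do not give: B's third intercorrelation (.25) and C's correlations with Y (set equal to B's) are assumptions made purely for illustration.

```python
import numpy as np

def multiple_R(R_xx, r_xy):
    """Multiple correlation of Y with the X's, from the predictor
    intercorrelation matrix R_xx and the validities r_xy:
    R^2 = r_xy' R_xx^{-1} r_xy."""
    return float(np.sqrt(r_xy @ np.linalg.solve(R_xx, r_xy)))

# A: weak correlations with Y, moderate intercorrelations
R_A = multiple_R(np.array([[1, .5, .4], [.5, 1, .6], [.4, .6, 1]]),
                 np.array([.2, .1, .3]))

# B: strong correlations with Y, small intercorrelations
# (the third intercorrelation, .25, is an assumed value)
R_B = multiple_R(np.array([[1, .2, .3], [.2, 1, .25], [.3, .25, 1]]),
                 np.array([.6, .5, .7]))

# C: large intercorrelations; correlations with Y assumed equal to B's
R_C = multiple_R(np.array([[1, .6, .7], [.6, 1, .8], [.7, .8, 1]]),
                 np.array([.6, .5, .7]))

print(f"R_A = {R_A:.2f}, R_B = {R_B:.2f}, R_C = {R_C:.2f}")  # B > C > A
```

Under these assumptions R_B ≈ .86, R_C ≈ .73, and R_A ≈ .34, matching the ranking argued above.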

What effect does the big M have?

- It can inflate the standard errors of the regression coefficients (those involved in the multicollinearity)
- This can lead to insignificant findings for those coefficients, so predictors that are significant when used in isolation may not be significant when used together
- It can also lead to imprecision in the regression coefficients (mistakes in estimating the change in Y for a unit change in an IV)
- A model with multicollinearity can therefore be misleading, and can contain redundancy among the predictors
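A small simulation illustrates the standard-error inflation; everything here (sample size, coefficients, the .95 correlation) is invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # r(x1, x2) ≈ .95
y = x1 + x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients and their standard errors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, se

b_joint, se_joint = ols(np.column_stack([x1, x2]), y)
b_alone, se_alone = ols(x1[:, None], y)

# The SE of the x1 slope balloons once the collinear x2 enters the model
print("SE(b1) alone:", round(se_alone[1], 3), "| joint:", round(se_joint[1], 3))
```

With r = .95 the variance inflation factor is about 10, so each slope's SE in the joint model is roughly √10 ≈ 3 times what it would be with uncorrelated predictors.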

What do we do about the big M?

There are many opinions, e.g. O'Brien (2007), "A Caution Regarding Rules of Thumb for Variance Inflation Factors," Quality & Quantity, 41(5), 673-690.

- We can use the VIF (variance inflation factor) and tolerance values in SPSS ("problem" variables are commonly taken to be those with VIF > 4)
- We can painstakingly examine all possible versions of the model (putting each predictor in first)
- Here, we will simply signal multicollinearity when r > .70 between IVs and require removal of at least one of the variables, and signal possible multicollinearity when r is between .5 and .7 and suggest examining the model with and without one of the variables
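VIF and tolerance are easy to reproduce outside SPSS: regress each IV on the remaining IVs, then take VIF_k = 1 / (1 − R²_k) and tolerance = 1 / VIF_k. A sketch with numpy (the data are invented; x1 and x2 are built to be collinear):

```python
import numpy as np

def vif(X):
    """VIF for each column of X: VIF_k = 1 / (1 - R^2_k), where R^2_k
    comes from regressing column k on the other columns (with intercept)."""
    X = np.asarray(X, dtype=float)
    out = []
    for k in range(X.shape[1]):
        A = np.column_stack([np.ones(len(X)), np.delete(X, k, axis=1)])
        resid = X[:, k] - A @ np.linalg.lstsq(A, X[:, k], rcond=None)[0]
        r2 = 1 - resid @ resid / np.sum((X[:, k] - X[:, k].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Invented data: x1 and x2 nearly duplicate each other, x3 is independent
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)
x3 = rng.normal(size=n)

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIFs:", np.round(vifs, 2), "| tolerances:", np.round(1 / vifs, 2))
```

Here x1 and x2 come out with VIFs near 10 (tolerance ≈ .10), flagged under either the VIF > 4 or the stricter VIF > 10 rule of thumb, while x3 sits near 1.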

The Goal of MLR

The big picture: we are trying to build a model predicting a DV that explains as much of the variance in that DV as possible, while at the same time:
- meeting the assumptions of MLR
- managing the other issues as well as possible: sample size, number of predictors, outliers, multicollinearity, each IV's r with the dependent variable, significance in the model
- being parsimonious (which can be very important)