
Conducting Social Research: Multiple Collinearity, Serial Correlation, and Influential Cases
Roger B. Hammer, Assistant Professor, Department of Sociology, Oregon State University

Multicollinearity
1. Multicollinearity means “excessive” intercorrelations among the X variables.
2. The degree of collinearity for a variable X_k can be defined as the R² of a model that regresses X_k on all the other X variables, that is, the proportion of the variance of X_k explained by the other X’s.
3. The tolerance of X_k is the proportion of its variance not shared with the other X’s, 1 - R².
4. Bivariate correlation coefficients are not effective measures, because multicollinearity can involve more than two variables at once.

Dominant Variable
An independent variable that is definitionally related to the dependent variable, in effect a variable that appears on both sides of the equation. It masks the effects of the other variables.

Perfect Multicollinearity
Rare. The problem is obvious. The solution is obvious.

Multicollinearity Consequences
1. Estimates remain unbiased.
2. Variances and standard errors of the estimates increase.
3. t-statistics will decrease and p-values will increase.
4. Estimates for the multicollinear variables become very sensitive to changes in specification.
5. The overall fit of the equation remains unaffected.
6. Coefficients of the non-multicollinear variables remain largely unaffected.

Multicollinearity Detection: Variance Inflation Factor
1. Measures the extent to which an independent variable can be explained by all the other independent variables.
2. Calculated for each independent variable.
3. An index of the increase in the variance of an estimated coefficient due to multicollinearity.
4. A high index means a lower t-statistic.

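Variance Inflation Factor: Regression Model and Auxiliary Regression Model
The auxiliary regression model regresses one independent variable X_k on all the other independent variables of the regression model. With R_k² denoting the R² of that auxiliary regression, VIF(X_k) = 1 / (1 - R_k²).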

VIF Critical Value
1. There is no critical value.
2. If the R² of the auxiliary regression is 0.8, then the VIF is 1 / (1 - 0.8) = 5.
3. In that case, 80% of the variance in the independent variable is explained by the other independent variables.
4. The denominator of the VIF, 1 - R², is the tolerance (TOL).
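The auxiliary-regression definition of the VIF can be sketched in a few lines of NumPy. This is a minimal illustration, not the slides' own code: the helper names `r_squared` and `vif` and the simulated variables `x1`, `x2`, `x3` are assumptions made for the example.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS regression of y on X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def vif(X):
    """VIF of each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        # Auxiliary regression: predictor j on an intercept and all other predictors.
        aux_X = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        tolerance = 1.0 - r_squared(X[:, j], aux_X)
        out.append(1.0 / tolerance)
    return np.array(out)

# Arithmetic from the slide: an auxiliary R^2 of 0.8 gives tolerance 0.2 and VIF 5.
vif_example = 1.0 / (1.0 - 0.8)

# Hypothetical data: x2 is built to be strongly collinear with x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.3 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
```

In this simulated setup the first two VIFs should be large and the third close to 1, which matches the reading of the VIF as an index of how much the other predictors explain each variable.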

Multicollinearity Solutions: No Perfect Solution
1. Do nothing.
2. Drop a redundant variable.
3. Form an index or other combination of the multicollinear variables.
4. Transform the variables into first differences, that is, the change since a previous period.
5. Increase the sample size.

Serial Correlation (Autocorrelation)
1. The error terms of the observations are correlated with one another.
2. Can exist in any model in which the order of the observations is meaningful (temporal or spatial).
   1. Temporal: the value of the error term is correlated with the error term for the same observation from a different time period.
   2. Spatial: the value of the error term is correlated with the error term for neighboring observations.


Serial Correlation Consequences
1. Estimates remain unbiased if the model is correctly specified.
2. OLS is no longer the minimum variance estimator.
3. Causes bias in the standard errors of the coefficients.
4. With positive serial correlation, the standard errors are typically biased downward, so t-statistics increase and p-values decrease.

Serial Correlation Detection
1. Assumptive.
2. Correlation of lagged residuals.
3. Durbin-Watson d statistic.
4. Moran’s I and other global and local spatial autocorrelation statistics.
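Two of the detection tools listed here, the correlation of lagged residuals and the Durbin-Watson d statistic, are simple to compute from a residual vector. The sketch below assumes time-ordered residuals; the simulated series `ar1` and `shocks` are hypothetical stand-ins for residuals from a fitted model.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson d: sum of squared successive differences over the sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

def lag1_correlation(resid):
    """Correlation between each residual and the residual one period earlier."""
    resid = np.asarray(resid, dtype=float)
    return float(np.corrcoef(resid[1:], resid[:-1])[0, 1])

# Hypothetical residual series: AR(1) errors with coefficient 0.8 versus white noise.
rng = np.random.default_rng(1)
n = 500
shocks = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + shocks[t]

d_ar1 = durbin_watson(ar1)       # well below 2 under positive serial correlation
d_white = durbin_watson(shocks)  # near 2 when there is no serial correlation
rho_ar1 = lag1_correlation(ar1)
```

Because d is approximately 2(1 - r), where r is the lag-1 residual correlation, values near 2 suggest no first-order serial correlation and values toward 0 suggest positive serial correlation.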

Serial Correlation Solutions
1. Include a lagged dependent variable in the model.
2. Generalized least squares.
3. Spatial autoregressive models.

Influential Cases: Outliers, Not Necessarily
A case is influential if its deletion substantially changes the regression results. Not all outliers are influential. In bivariate regression, a scatterplot may identify influential cases and outliers. In multivariate regression, influence results from a particular combination of values on all the variables in the regression.

Influential Cases
1. Problems with outliers in the y-direction (the response direction).
2. Problems with multivariate outliers in the x-space (that is, outliers in the covariate space, which are also referred to as leverage points).
3. Problems with outliers in both the y-direction and the x-space.

dfbetas
Measures the influence of the ith case on the kth regression coefficient.

Assessment Criteria
1. External scaling.
   Absolute: |dfbetas_ik| > 2
   Size adjusted: |dfbetas_ik| > 2/sqrt(n)
2. Internal scaling: univariate outlier detection.
3. Gaps.
4. Plots.
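As a rough illustration of how dfbetas and the size-adjusted cutoff work, the sketch below deletes each case in turn and scales the resulting change in each coefficient. It assumes the standard definition, in which the change is divided by the deleted-case residual standard error times the square root of the matching diagonal element of (X'X)^-1. The function name and the simulated data, including the contaminated case 0, are hypothetical.

```python
import numpy as np

def dfbetas(X, y):
    """Brute-force DFBETAS: scaled change in each coefficient when case i is deleted."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b_full = xtx_inv @ X.T @ y
    coef_scale = np.sqrt(np.diag(xtx_inv))
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        b_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        resid_i = yi - Xi @ b_i
        # Residual standard error estimated without case i.
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))
        out[i] = (b_full - b_i) / (s_i * coef_scale)
    return out

# Hypothetical data: y = 1 + 2x + noise, with case 0 moved far off the line.
rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0], y[0] = 5.0, -10.0
X = np.column_stack([np.ones(n), x])

d = dfbetas(X, y)
flagged = np.abs(d) > 2 / np.sqrt(n)   # size-adjusted criterion from the slide
```

Row i, column k of `d` is the influence of case i on coefficient k, so `flagged[0, 1]` reports whether the contaminated case is influential for the slope.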

Influential Cases
Influential cases “bedevil” regression and many other statistical methods. Influence statistics and leverage plots lessen the risk of overlooking influence problems. Plots can detect influential clusters that influence statistics have difficulty detecting. Once detected, influential cases should not necessarily be thrown out.

Alternatives
1. Identify and include omitted variable(s).
2. Report results both with and without the influential cases (footnote). “This is simple and honest.”
3. Examine influential cases closely and, if they reflect measurement error or belong to another population, correct or delete them.
4. If the influential cases come from a leptokurtic (fat-tailed) distribution, transform the variable to reduce the thickness of the tails.
5. Try robust regression, which is less susceptible to influence.

Robust Regression: An Unpopular Alternative
1. Statisticians believe that classical methods (e.g. OLS) are robust.
2. Robust methods are said to particularly benefit “unsophisticated researchers” (Hamilton).
3. The theoretical expositions are less accessible.
4. There are several competing methods.
5. There were many methods that were tried and failed.
6. Robust methods are much more computing-intensive than OLS.

OLS and Robust Regression
OLS is the best linear unbiased estimator (BLUE) under the Gauss-Markov assumptions and, given normally distributed errors, the best unbiased estimator overall. The best estimator for non-Gaussian error distributions is unclear. Non-normality takes countless forms and therefore cannot be accounted for by one estimator.

Robust Regression Objectives
1. Produce consistent and reasonably efficient estimates when the assumed model is true.
2. Produce only slightly impaired estimates under small departures from the model.
3. Not be drastically affected by “somewhat larger” departures from the model.

Resistant and Robust Estimates
An estimator is resistant if its value is not “much” affected by small changes in the sample data. An estimator is robust if it performs well even when there are small violations of the underlying assumptions. Most resistant estimators are also distributionally robust.


Regression and Matrix Notation
y = Xβ + ε, where y is the n×1 vector of responses, X is the n×p design matrix (rows are observations and columns are explanatory variables), β is the p×1 vector of unknown parameters, and ε is the n×1 vector of unknown errors.

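The Projection Matrix (The Hat Matrix)
The projection matrix, also called the hat matrix, is H = X(X'X)^-1 X'. It maps the observed responses to the fitted values, ŷ = Hy, and its diagonal elements h_i measure the leverage of each case.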
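A short NumPy sketch of the hat matrix H = X(X'X)^-1 X' and the leverage values on its diagonal follows. The simulated data, including the extreme x value given to case 0, are assumptions made for illustration.

```python
import numpy as np

def hat_matrix(X):
    """Projection ("hat") matrix H = X (X'X)^-1 X' for design matrix X."""
    return X @ np.linalg.solve(X.T @ X, X.T)

# Hypothetical data: one predictor plus an intercept, with case 0 far out in x-space.
rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
x[0] = 8.0
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

H = hat_matrix(X)
leverage = np.diag(H)   # h_i: how strongly y_i pulls on its own fitted value
fitted = H @ y          # H "puts the hat on y"
```

The leverage values sum to p, the number of columns of X, so their average is p/n; a case whose h_i is far above that average sits in an unusual region of the x-space.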

dffits
A scaled measure of the change in the predicted value for the ith case when that case is deleted.

Assessment Criteria
External scaling.
   Absolute: |dffits_i| > 2
   Size adjusted: |dffits_i| > 2*sqrt(k/n)
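The dffits statistic can be computed without refitting the model n times, using the leverage values and the externally studentized residuals. The sketch below assumes the standard closed form, dffits_i = t_i * sqrt(h_i / (1 - h_i)), and takes k in the size-adjusted cutoff to be the number of estimated coefficients (intercept included). Both are assumptions, as is the simulated data.

```python
import numpy as np

def dffits(X, y):
    """DFFITS via leverage and externally studentized residuals (no refitting)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y
    sse = e @ e
    # Residual variance with case i deleted, from the full-sample fit.
    s2_deleted = (sse - e ** 2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_deleted * (1 - h))   # externally studentized residuals
    return t * np.sqrt(h / (1 - h))

# Hypothetical data: y = 1 + 2x + noise, with case 0 moved far off the line.
rng = np.random.default_rng(4)
x = rng.normal(size=40)
y = 1.0 + 2.0 * x + rng.normal(size=40)
x[0], y[0] = 5.0, -10.0
X = np.column_stack([np.ones(40), x])
n, p = X.shape

values = dffits(X, y)
flagged = np.abs(values) > 2 * np.sqrt(p / n)   # size-adjusted criterion
```

The closed form is algebraically equivalent to deleting case i, refitting, and dividing the change in its predicted value by the deleted-case standard error times sqrt(h_i).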

Robust Regression and SAS: M Estimation
M estimation was introduced by Huber (1973) and is the simplest approach both computationally and theoretically. Although it is not robust with respect to leverage points, it is still used extensively in analyzing data for which it can be assumed that the contamination is mainly in the response direction.
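To make the idea of M estimation concrete, here is a minimal Python sketch that uses Huber weights and iteratively reweighted least squares. It is not the SAS implementation the slides refer to. The tuning constant 1.345, the MAD-based scale estimate, and the simulated data with contamination in the response direction only are all assumptions chosen for illustration.

```python
import numpy as np

def huber_m_estimate(X, y, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate of regression coefficients via iteratively reweighted least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # start from the OLS fit
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust scale: median absolute deviation, rescaled to match sigma under normality.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, c/u for large ones.
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical data: y = 1 + 2x + noise, with 10% of the responses shifted upward.
rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)
y[:10] += 20.0
X = np.column_stack([np.ones(n), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_huber = huber_m_estimate(X, y)
```

In this setup the OLS intercept is pulled well above 1 by the shifted responses, while the Huber estimate stays close to the generating values, because large residuals are down-weighted instead of squared.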

Robust Regression and SAS: Least Trimmed Squares
Least Trimmed Squares (LTS) estimation is a high breakdown value method introduced by Rousseeuw (1984). The breakdown value is a measure of the proportion of contamination that an estimation method can withstand and still maintain its robustness.
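A rough sketch of the LTS idea follows: find the coefficients that minimize the sum of the h smallest squared residuals. The search strategy, random two-point starts followed by repeated refitting on the h best-fitting cases, is a simplification chosen for illustration and is not the SAS algorithm. The function name, the default h, and the simulated leverage-point contamination are assumptions.

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=100, n_steps=20, seed=0):
    """Approximate least trimmed squares: minimize the sum of the h smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2   # roughly half the cases, for a high breakdown value
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        # Start from an exact fit to p randomly chosen cases.
        subset = rng.choice(n, size=p, replace=False)
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        for _ in range(n_steps):
            # Refit OLS on the h cases with the smallest squared residuals.
            subset = np.argsort((y - X @ beta) ** 2)[:h]
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj

# Hypothetical data: y = 1 + 2x + noise, with 20% bad leverage points near (10, 0).
rng = np.random.default_rng(6)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)
x[:20] = 10.0 + 0.1 * rng.normal(size=20)
y[:20] = 0.5 * rng.normal(size=20)
X = np.column_stack([np.ones(n), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_lts, lts_objective = lts_fit(X, y)
```

Here the contamination is in the x-space, the situation in which M estimation is described as not robust. The trimmed criterion ignores the worst-fitting cases entirely, so the LTS slope should stay near 2 while the OLS slope is dragged toward the leverage points.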

Robust Regression and SAS: S Estimation
S estimation is a high breakdown value method introduced by Rousseeuw and Yohai (1984). With the same breakdown value, it has a higher statistical efficiency than LTS estimation.

Robust Regression and SAS: MM Estimation
MM estimation, introduced by Yohai (1987), combines high breakdown value estimation and M estimation. It has both the high breakdown property and a higher statistical efficiency than S estimation.