Published by Candace York. Modified over 9 years ago.
1
Conducting Social Research: Multiple Collinearity, Serial Correlation, and Influential Cases
Roger B. Hammer, Assistant Professor, Department of Sociology, Oregon State University
2
Multicollinearity
1. “Excessive” intercorrelations among the X variables.
2. The degree of collinearity can be defined as the R² from a model that regresses X_k on all the other X variables: the proportion of the variance of X_k explained by the other X’s.
3. The tolerance of X_k is the proportion of its variance not shared with the other X’s, 1 - R².
4. Bivariate correlation coefficients are not effective measures, because the problem is multicollinearity, not “bicollinearity”: X_k can be nearly a linear combination of several other X’s without being strongly correlated with any one of them.
3
Dominant Variable
An independent variable that is definitionally related to the dependent variable.
A variable that appears on both sides of the equation.
Masks the effects of the other variables.
4
Perfect Multicollinearity
Rare.
Problem is obvious.
Solution is obvious.
5
Multicollinearity Consequences
1. Estimates remain unbiased.
2. Variances and standard errors of the estimates increase.
3. t-statistics will decrease and p-values will increase.
4. Estimates for the multicollinear variables will become very sensitive to changes in specification.
5. Overall fit of the equation will remain unaffected.
6. Coefficients of non-multicollinear variables will remain largely unaffected.
6
Multicollinearity Detection: Variance Inflation Factor
1. The extent to which an independent variable can be explained by all the other independent variables.
2. Calculated for each independent variable.
3. An index of the increase in the variance of an estimated coefficient due to multicollinearity.
4. A high index means a lower t-statistic.
7
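Variance Inflation Factor
Regression Model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$
Auxiliary Regression Model (shown for X_1; repeated for each X)
$$X_1 = \alpha_0 + \alpha_2 X_2 + \dots + \alpha_k X_k + v$$
$$VIF(\hat{\beta}_1) = \frac{1}{1 - R_1^2}$$
where R_1² is the R² of the auxiliary regression.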
8
VIF Critical Value
1. There is no critical value.
2. If the R² of the auxiliary regression is 0.8, then the VIF is 1/(1 - 0.8) = 5.0.
3. In that case, 80% of the variance in the independent variable is explained by the other independent variables.
4. The denominator of the VIF, 1 - R², is the tolerance (TOL).
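The auxiliary-regression definition of tolerance and VIF can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `r_squared` and `vif` and the simulated variables `x1`, `x2`, `x3` are hypothetical and do not come from the slides.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS regression of y on X (an intercept is added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss

def vif(X):
    """Return (R^2, tolerance, VIF) for each column of X.

    Each column is regressed on all the other columns (the auxiliary
    regression); tolerance = 1 - R^2 and VIF = 1 / tolerance.
    """
    X = np.asarray(X, dtype=float)
    results = []
    for k in range(X.shape[1]):
        others = np.delete(X, k, axis=1)
        r2 = r_squared(X[:, k], others)
        tol = 1.0 - r2
        results.append((r2, tol, 1.0 / tol))
    return results

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.5 * rng.normal(size=200)  # built to be strongly related to x1
x3 = rng.normal(size=200)             # unrelated to the others

for name, (r2, tol, v) in zip(["x1", "x2", "x3"], vif(np.column_stack([x1, x2, x3]))):
    print(f"{name}: R2={r2:.2f}  TOL={tol:.2f}  VIF={v:.2f}")
```

In this setup x1 and x2 should show clearly inflated VIFs while x3 should stay close to 1; the exact values depend on the simulated draw.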
9
Multicollinearity Solutions: No Perfect Solution
1. Do nothing.
2. Drop a redundant variable.
3. Form an index or other combination of the multicollinear variables.
4. Transform the variables into first differences, that is, the change since a previous period.
5. Increase the sample size.
10
Serial Correlation (Autocorrelation)
1. The error terms of the observations are correlated with one another.
2. Can exist in any model in which the order of the observations is meaningful (temporal or spatial).
   1. Temporal: the value of the error term is correlated with the error term for the same observation from a different time period.
   2. Spatial: the value of the error term is correlated with the error term for neighboring observations.
12
Serial Correlation Consequences
1. Estimates remain unbiased if the model is correctly specified.
2. OLS is no longer the minimum variance estimator.
3. Causes bias in the standard errors of the coefficients.
4. In positive serial correlation, the standard errors are typically underestimated, so t-statistics will increase and p-values will decrease.
13
Serial Correlation Detection
1. Assumptive.
2. Correlation of lagged residuals.
3. Durbin-Watson d statistic.
4. Moran’s I and other global and local spatial autocorrelation statistics.
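Two of the detection tools listed above, the correlation of lagged residuals and the Durbin-Watson d statistic, can be computed directly from OLS residuals. The sketch below is a minimal illustration; the helper names and the simulated AR(1) errors (with an assumed coefficient of 0.7) are hypothetical and not from the slides.

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals from an OLS regression of y on X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef

def lag1_correlation(e):
    """Correlation between each residual and the previous one."""
    return np.corrcoef(e[1:], e[:-1])[0, 1]

def durbin_watson(e):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Simulated time-ordered data with positively autocorrelated errors (assumption).
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
shocks = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + shocks[t]
y = 1.0 + 2.0 * x + u

e = ols_residuals(y, x)
r = lag1_correlation(e)
d = durbin_watson(e)
print(f"lag-1 residual correlation: {r:.2f}")
print(f"Durbin-Watson d: {d:.2f}  (approximately 2 * (1 - r))")
```

The d statistic ranges from 0 to 4; values near 2 suggest no first-order serial correlation, and values well below 2 suggest positive serial correlation.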
14
Serial Correlation Solutions
1. Include a lagged dependent variable in the model.
2. Generalized Least Squares.
3. Spatial Autoregressive Models.
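As one concrete instance of the Generalized Least Squares option, the sketch below uses the Cochrane-Orcutt procedure, a feasible GLS method for first-order (AR(1)) serial correlation. The slides do not say which GLS variant is intended, so the choice of Cochrane-Orcutt, the function names, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def ols(y, A):
    """OLS coefficients for design matrix A (A already contains the intercept)."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def cochrane_orcutt(y, x, n_iter=10):
    """Iterative Cochrane-Orcutt estimation under an assumed AR(1) error.

    Alternates between estimating rho from the residuals and re-estimating
    the coefficients on quasi-differenced data (the first case is dropped).
    """
    A = np.column_stack([np.ones(len(y)), x])
    beta = ols(y, A)
    rho = 0.0
    for _ in range(n_iter):
        e = y - A @ beta
        rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
        y_star = y[1:] - rho * y[:-1]
        A_star = A[1:] - rho * A[:-1]  # intercept column becomes (1 - rho)
        beta = ols(y_star, A_star)
    return beta, rho

# Simulated data with AR(1) errors, true rho = 0.7 (assumption).
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
shocks = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + shocks[t]
y = 1.0 + 2.0 * x + u

beta, rho = cochrane_orcutt(y, x)
print("estimated rho:", round(float(rho), 2))
print("estimated intercept and slope:", np.round(beta, 2))
```

Because the intercept column is quasi-differenced along with the other columns, the returned coefficients are on the scale of the original model.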
15
Influential Cases: Outliers, Not Necessarily
A case is influential if its deletion substantially changes the regression results.
Not all outliers are influential.
In bivariate regression a scatterplot may identify influential cases/outliers.
In multivariate regression influence results from a particular combination of values on all variables in the regression.
16
Influential Cases
Problems with outliers in the y-direction (response direction).
Problems with multivariate outliers in the x-space (i.e., outliers in the covariate space, which are also referred to as leverage points).
Problems with outliers in both the y-direction and the x-space.
17
dfbetas
Measure the influence of the ith case on the kth regression coefficient.
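The slide's formula did not survive extraction. For reference, the standard textbook definition is given below, where b_k is the full-sample estimate of the kth coefficient, b_{k(i)} is the estimate with case i deleted, s_{(i)} is the residual standard deviation from the regression without case i, and X is the design matrix. This is the usual definition, not necessarily the exact notation used on the original slide.

```latex
\mathrm{DFBETAS}_{ik} = \frac{b_k - b_{k(i)}}{s_{(i)} \sqrt{\left[(X'X)^{-1}\right]_{kk}}}
```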
18
Assessment Criteria
External scaling.
Absolute: |dfbetas_ik| > 2
Size adjusted: |dfbetas_ik| > 2/sqrt(n)
Internal scaling: univariate outlier detection.
Gaps.
Plots.
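The size-adjusted criterion can be applied with a brute-force leave-one-out computation. The sketch below assumes the standard definition of DFBETAS (change in the coefficient divided by its standard error estimated without the case); the function name, the simulated data, and the planted influential case are hypothetical.

```python
import numpy as np

def dfbetas(y, X):
    """DFBETAS for every case and coefficient by refitting without each case."""
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    xtx_inv_diag = np.diag(np.linalg.inv(A.T @ A))
    b_full, *_ = np.linalg.lstsq(A, y, rcond=None)
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        b_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        resid_i = y[keep] - A[keep] @ b_i
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))  # s with case i deleted
        out[i] = (b_full - b_i) / (s_i * np.sqrt(xtx_inv_diag))
    return out

# Simulated data with one planted influential case (assumption).
rng = np.random.default_rng(5)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0], y[0] = 4.0, -10.0  # far out in x and far off the line

d = dfbetas(y, x)
cutoff = 2.0 / np.sqrt(n)                      # size-adjusted criterion
flagged = np.where(np.abs(d[:, 1]) > cutoff)[0]  # column 1 is the slope
print("size-adjusted cutoff:", round(cutoff, 3))
print("cases flagged for the slope:", flagged)
```

Refitting n times is wasteful for large samples, but it makes the "deletion" logic explicit; closed-form versions based on the hat matrix give the same values.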
19
Influential Cases
Influential cases “bedevil” regression and many other statistical methods.
Influence statistics and leverage plots lessen the risk of overlooking influence problems.
Plots can detect influential clusters that influence statistics have difficulty detecting.
Once detected, influential cases should not necessarily be thrown out.
20
Alternatives
Identify and include omitted variable(s).
Report results both with and without the influential cases (footnote). “This is simple and honest.”
Examine influential cases closely and, if they reflect measurement error or belong to another population, correct or delete them.
If the influential cases come from a leptokurtic (fat-tailed) distribution, transform the variable to reduce the thickness of the tails.
Try robust regression, which is less susceptible to influence.
21
Robust Regression: An Unpopular Alternative
1. Statisticians believe that classical methods (e.g. OLS) are robust.
2. Robust methods are said to particularly benefit “unsophisticated researchers” (Hamilton).
3. The theoretical expositions are less accessible.
4. There are several competing methods.
5. There were many methods that were tried and failed.
6. Robust methods are much more computing-intensive than OLS.
22
OLS and Robust Regression
OLS is the best linear unbiased estimator (BLUE) whenever the errors are uncorrelated with constant variance; given normally distributed errors it is also the most efficient of all unbiased estimators.
For non-Gaussian error distributions the most efficient estimator is unclear.
Non-normality takes countless forms and therefore cannot be accounted for by one estimator.
23
Robust Regression Objectives
1. Produce consistent and reasonably efficient estimates when the assumed model is true.
2. Produce only slightly impaired estimates due to small departures from the model.
3. Not be drastically affected by “somewhat larger” departures from the model.
24
Resistant and Robust Estimates
An estimator is resistant if its value is not “much” affected by small changes in the sample data.
An estimator is robust if it performs well even when there are small violations of the underlying assumptions.
Most resistant estimators are also distributionally robust.
26
Regression and Matrix Notation
$$y = X\beta + \varepsilon$$
where y is the n×1 vector of responses, X is the n×p design matrix (rows are observations and columns are explanatory variables), β is the p×1 vector of unknown parameters, and ε is the n×1 vector of unknown errors.
27
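The Projection Matrix (The Hat Matrix)
$$H = X(X'X)^{-1}X', \qquad \hat{y} = Hy$$
H projects y onto the column space of X; its diagonal elements h_ii are the leverages of the individual cases.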
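A short NumPy sketch can make the hat matrix concrete. It assumes the usual definition H = X(X'X)^{-1}X' with an intercept column included in X; the function name and the simulated data, including one deliberately extreme x value, are hypothetical.

```python
import numpy as np

def hat_matrix(X):
    """Hat (projection) matrix for a design with an intercept added here."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ np.linalg.solve(A.T @ A, A.T)

# Simulated data with one case far out in the x-space (assumption).
rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
x[0] = 6.0
y = 1.0 + 2.0 * x + rng.normal(size=n)

H = hat_matrix(x)
y_hat = H @ y            # H "puts the hat on y": fitted values
leverage = np.diag(H)    # h_ii, the leverage of each case

print("trace(H):", round(float(np.trace(H)), 6))
print("H is idempotent:", np.allclose(H @ H, H))
print("case with highest leverage:", int(np.argmax(leverage)))
```

The trace of H equals the number of columns in the design matrix, so the average leverage is that number divided by n; cases with leverage far above the average are candidates for closer inspection.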
28
dffits
A scaled measure of the change in the predicted value for the ith case when that case is deleted.
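The slide's formula did not survive extraction. The standard textbook definition is shown below, where ŷ_{i(i)} is the prediction for case i from the regression fitted without it, s_{(i)} is the residual standard deviation from that regression, h_ii is the leverage of case i, and t_i is the externally studentized residual. This is the usual definition, not necessarily the slide's own notation.

```latex
\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{s_{(i)}\sqrt{h_{ii}}}
                  = t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}}
```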
29
Assessment Criteria
External scaling.
Absolute: |dffits_i| > 2
Size adjusted: |dffits_i| > 2*sqrt(k/n)
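The size-adjusted criterion can be checked with a closed-form computation that avoids refitting the model for every case. The sketch assumes the standard definition of DFFITS (studentized residual times sqrt(h/(1-h))) and takes k to be the number of estimated coefficients including the intercept, which the slide does not state; the function name and the simulated data are hypothetical.

```python
import numpy as np

def dffits(y, X):
    """DFFITS for every case, using leverages instead of n separate refits."""
    A = np.column_stack([np.ones(len(y)), X])
    n, k = A.shape
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ b
    # Diagonal of the hat matrix A (A'A)^{-1} A'.
    h = np.einsum("ij,ji->i", A, np.linalg.solve(A.T @ A, A.T))
    # Residual variance with case i deleted.
    s2_del = (e @ e - e ** 2 / (1.0 - h)) / (n - k - 1)
    t = e / np.sqrt(s2_del * (1.0 - h))  # externally studentized residuals
    return t * np.sqrt(h / (1.0 - h)), k

# Simulated data with one planted influential case (assumption).
rng = np.random.default_rng(6)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0], y[0] = 4.0, -10.0

vals, k = dffits(y, x)
cutoff = 2.0 * np.sqrt(k / n)  # size-adjusted criterion
flagged = np.where(np.abs(vals) > cutoff)[0]
print("size-adjusted cutoff:", round(float(cutoff), 3))
print("flagged cases:", flagged)
```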
30
Robust Regression and SAS: M Estimation
M estimation was introduced by Huber (1973) and is the simplest approach both computationally and theoretically.
Although it is not robust with respect to leverage points, it is still used extensively in analyzing data for which it can be assumed that the contamination is mainly in the response direction.
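To make the idea of M estimation concrete, here is a minimal sketch using Huber weights and iteratively reweighted least squares. It is not the SAS implementation: the tuning constant c = 1.345, the MAD-based scale estimate, the function name, and the simulated contamination in the response direction are all assumptions chosen for illustration.

```python
import numpy as np

def huber_irls(y, X, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients via IRLS (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # start from OLS
    for _ in range(n_iter):
        r = y - A @ beta
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0:
            break
        u = np.abs(r / scale)
        # Huber weights: 1 for small residuals, c/|u| for large ones.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Simulated data with 10% of responses shifted upward (assumption).
rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[:20] += 15.0  # contamination in the y-direction only

A = np.column_stack([np.ones(n), x])
b_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
b_hub = huber_irls(y, x)
print("OLS   intercept, slope:", np.round(b_ols, 2))
print("Huber intercept, slope:", np.round(b_hub, 2))
```

Because the weights depend only on the size of the residual, a high-leverage case that pulls the fit toward itself can still receive full weight, which is why M estimation is not robust to leverage points.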
31
Robust Regression and SAS: Least Trimmed Squares
Least Trimmed Squares (LTS) estimation is a high breakdown value method introduced by Rousseeuw (1984).
The breakdown value is a measure of the proportion of contamination that an estimation method can withstand and still maintain its robustness.
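For reference, the LTS criterion in its usual textbook form is given below. Here r²_{(1)} ≤ ... ≤ r²_{(n)} are the ordered squared residuals for a candidate β, and h is the number of cases retained, chosen between n/2 and n. The notation is an assumption; the slide itself gives no formula.

```latex
\hat{\beta}_{\mathrm{LTS}} = \arg\min_{\beta} \sum_{i=1}^{h} r^{2}_{(i)}(\beta),
\qquad r^{2}_{(1)}(\beta) \le r^{2}_{(2)}(\beta) \le \dots \le r^{2}_{(n)}(\beta)
```

Because the n - h largest squared residuals are ignored, up to roughly that many contaminated cases cannot dominate the fit, which is the source of the high breakdown value.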
32
Robust Regression and SAS: S Estimation
S estimation is a high breakdown value method introduced by Rousseeuw and Yohai (1984).
With the same breakdown value, it has a higher statistical efficiency than LTS estimation.
33
Robust Regression and SAS: MM Estimation
MM estimation, introduced by Yohai (1987), combines high breakdown value estimation and M estimation.
It has both the high breakdown property and a higher statistical efficiency than S estimation.