Applied Quantitative Analysis and Practices Lecture #31 By Dr. Osman Sadiq Paracha
Previous Lecture Summary Outliers and Residuals Example of Model analysis for multiple regression
Outliers and Residuals The normal or unstandardized residuals are measured in the same units as the outcome variable and so are difficult to interpret across different models. Because of this, we cannot define a universal cut-off point for what constitutes a large residual. Instead, we use standardized residuals, which are the residuals divided by an estimate of their standard deviation.
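The point about units can be illustrated with a minimal sketch, assuming a simple one-predictor regression fitted by ordinary least squares. The data and the helper names (fit_line, standardized_residuals) are hypothetical and not from the lecture. The standard deviation estimate used here is the residual standard deviation, sqrt(SSE / (n - 2)).

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standardized_residuals(x, y):
    """Return (raw residuals, residuals divided by sqrt(SSE / (n - 2)))."""
    b0, b1 = fit_line(x, y)
    raw = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in raw) / (len(x) - 2))
    return raw, [e / s for e in raw]

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 30.0]

raw, z = standardized_residuals(x, y)

# Re-expressing the outcome in different units (multiplying by 10) changes the
# raw residuals but leaves the standardized residuals unchanged.
raw_scaled, z_scaled = standardized_residuals(x, [10 * yi for yi in y])
```

Rescaling the outcome multiplies every raw residual by the same factor, while the standardized residuals stay the same, which is why cut-offs can be stated for the standardized version only.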
Outliers and Residuals In a normally distributed sample, 95% of z-scores lie between -1.96 and +1.96, 99% lie between -2.58 and +2.58, and 99.9% lie between -3.29 and +3.29. Some general rules for standardized residuals are derived from these facts: (1) standardized residuals with an absolute value greater than 3.29 (we can use 3 as an approximation) are cause for concern because in an average sample a value this high is unlikely to happen by chance; (2) if more than 1% of our sample cases have standardized residuals with an absolute value greater than 2.58 (we usually just say 2.5), there is evidence that the level of error within our model is unacceptable (the model is a fairly poor fit of the sample data);
Outliers and Residuals (3) if more than 5% of cases have standardized residuals with an absolute value greater than 1.96 (we can use 2 for convenience), then there is also evidence that the model is a poor representation of the actual data. A related measure is the Studentized residual, which is the unstandardized residual divided by an estimate of its standard deviation that varies point by point. These residuals have the same properties as the standardized residuals but usually provide a more precise estimate of the error variance of a specific case.
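The three rules of thumb can be sketched as a small check, assuming the standardized residuals have already been computed. The function name residual_diagnostics and the example values are hypothetical, made up purely to exercise the rules.

```python
def residual_diagnostics(z):
    """Apply the three rule-of-thumb checks to a list of standardized residuals."""
    n = len(z)
    abs_z = [abs(v) for v in z]
    share_over_196 = sum(v > 1.96 for v in abs_z) / n
    share_over_258 = sum(v > 2.58 for v in abs_z) / n
    return {
        # Rule 1: indices of individual cases with |z| > 3.29.
        "extreme_cases": [i for i, v in enumerate(abs_z) if v > 3.29],
        # Rule 2: more than 1% of cases with |z| > 2.58.
        "more_than_1pct_over_2.58": share_over_258 > 0.01,
        # Rule 3: more than 5% of cases with |z| > 1.96.
        "more_than_5pct_over_1.96": share_over_196 > 0.05,
    }

# Hypothetical standardized residuals for 100 cases, invented for illustration.
z = [0.5] * 94 + [2.1, -2.2, 2.0, -2.7, 2.0, 3.5]
report = residual_diagnostics(z)

# A sample in which no residual is large trips none of the rules.
clean_report = residual_diagnostics([0.5] * 100)
```

The exact cut-offs (1.96, 2.58, 3.29) are used here; the rounded values 2, 2.5 and 3 could be substituted without changing the logic.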
Influential Cases There are several residual statistics that can be used to assess the influence of a particular case. One is the adjusted predicted value, which is the predicted value for a case when that case is excluded from the analysis. The computer calculates a new model without a particular case and then uses this new model to predict the value of the outcome variable for the case that was excluded. If a case does not exert a large influence over the model, then we would expect the adjusted predicted value to be very similar to the predicted value when the case is included.
Influential Cases The difference between the adjusted predicted value and the original predicted value is known as DFFit. We can also look at the residual based on the adjusted predicted value: that is, the difference between the adjusted predicted value and the original observed value. This is the deleted residual. The deleted residual can be divided by an estimate of its standard deviation (its standard error) to give a standardized value known as the Studentized deleted residual. The deleted residuals are very useful for assessing the influence of a case on the ability of the model to predict that case.
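The leave-one-out idea behind the adjusted predicted value, DFFit and the deleted residual can be sketched as follows, assuming a simple one-predictor regression. The data and helper names are hypothetical. The sign conventions (DFFit as predicted minus adjusted predicted, deleted residual as observed minus adjusted predicted) are an assumption, since the lecture only describes each quantity as a difference.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def deletion_diagnostics(x, y, i):
    """Refit the model without case i and compare the predictions for that case."""
    b0, b1 = fit_line(x, y)
    predicted = b0 + b1 * x[i]

    # New model estimated without case i, then used to predict case i.
    a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    adjusted = a0 + a1 * x[i]

    return {
        "predicted": predicted,
        "adjusted_predicted": adjusted,
        "dffit": predicted - adjusted,        # sign convention is an assumption
        "residual": y[i] - predicted,
        "deleted_residual": y[i] - adjusted,  # sign convention is an assumption
    }

# Hypothetical data; the last case is deliberately far from the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 30.0]

last_case = deletion_diagnostics(x, y, 7)
```

For a case with little influence, predicted and adjusted_predicted are nearly equal, so dffit is close to zero and the deleted residual is close to the ordinary residual.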
Influential Cases One statistic that does consider the effect of a single case on the model as a whole is Cook’s distance. Cook’s distance is a measure of the overall influence of a case on the model. Cook and Weisberg (1982) have suggested that values greater than 1 may be cause for concern.
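A minimal sketch of Cook's distance, assuming a simple one-predictor regression (so p = 2 estimated parameters) and the usual definition: the summed squared change in all fitted values when a case is deleted, divided by p times the mean squared error of the full model. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def cooks_distances(x, y):
    """Cook's distance for every case, computed by refitting without that case."""
    n, p = len(x), 2  # p = intercept + one slope
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)

    distances = []
    for i in range(n):
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        refitted = [a0 + a1 * xi for xi in x]
        # How far all fitted values move when case i is removed.
        shift = sum((f - r) ** 2 for f, r in zip(fitted, refitted))
        distances.append(shift / (p * mse))
    return distances

# Hypothetical data; the last case is deliberately far from the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 30.0]

d = cooks_distances(x, y)
flagged = [i for i, di in enumerate(d) if di > 1]  # cut-off of 1 from the lecture
```

Unlike the deleted residual, which looks only at the prediction for the deleted case, this measure sums the change over every case, which is what makes it a statistic about the model as a whole.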
Mediation Mediation refers to a situation in which the relationship between a predictor variable and an outcome variable can be explained by their relationship to a third variable (the mediator).
The Statistical Model
Baron & Kenny (1986) Mediation is tested through three regression models: 1. Predicting the outcome from the predictor variable. 2. Predicting the mediator from the predictor variable. 3. Predicting the outcome from both the predictor variable and the mediator.
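The three regression models can be written out as equations. The notation is an assumption, not taken from the slides: X is the predictor, M the mediator, Y the outcome, i_1 to i_3 are intercepts, e_1 to e_3 are error terms, and c, a, b and c' are the regression coefficients.

```latex
\begin{align}
  Y &= i_1 + cX + e_1        && \text{(model 1: outcome from predictor)} \\
  M &= i_2 + aX + e_2        && \text{(model 2: mediator from predictor)} \\
  Y &= i_3 + c'X + bM + e_3  && \text{(model 3: outcome from predictor and mediator)}
\end{align}
```

In this notation, c is the coefficient for the predictor when the mediator is left out of the model, and c' is the coefficient for the same predictor once the mediator is included.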
Baron & Kenny (1986) Four conditions of mediation: 1. The predictor must significantly predict the outcome variable. 2. The predictor must significantly predict the mediator. 3. The mediator must significantly predict the outcome variable. 4. The predictor variable must predict the outcome variable less strongly in model 3 than in model 1.
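The coefficients involved in these conditions can be sketched in code, assuming ordinary least squares and a single predictor and mediator. The data, variable names (c, a, b, c_prime) and helper functions are hypothetical. Only the coefficients are computed; the significance tests required by conditions 1 to 3 are not carried out here and would normally come from statistical software such as SPSS.

```python
def cross_sum(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))

def slope(x, y):
    """Slope from regressing y on the single predictor x."""
    return cross_sum(x, y) / cross_sum(x, x)

def two_predictor_slopes(x, m, y):
    """Slopes for x and m from regressing y on both (closed-form normal equations)."""
    sxx, smm, sxm = cross_sum(x, x), cross_sum(m, m), cross_sum(x, m)
    sxy, smy = cross_sum(x, y), cross_sum(m, y)
    det = sxx * smm - sxm ** 2
    slope_x = (sxy * smm - smy * sxm) / det
    slope_m = (smy * sxx - sxy * sxm) / det
    return slope_x, slope_m

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]                        # predictor
m = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.6, 8.2]        # mediator
y = [2.0, 2.9, 4.1, 5.2, 5.8, 7.1, 7.9, 9.3]        # outcome

c = slope(x, y)                             # model 1: outcome from predictor
a = slope(x, m)                             # model 2: mediator from predictor
c_prime, b = two_predictor_slopes(x, m, y)  # model 3: outcome from both

# Condition 4 compares the predictor's coefficient in model 1 and model 3.
reduction = c - c_prime
```

The quantity reduction is what condition 4 examines: a positive value (for a positive c) means the predictor predicts the outcome less strongly once the mediator is in the model.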
Limitations of Baron & Kenny’s (1986) Approach How much of a reduction in the relationship between the predictor and outcome is necessary to infer mediation? People tend to look for a change in significance, which can lead to the ‘all or nothing’ thinking that p-values encourage.
Lecture Summary Mediation through multiple regression Example in SPSS