Roger B. Hammer
Assistant Professor
Department of Sociology
Oregon State University
Conducting Social Research
Specification: Choosing the Independent Variables
Common Assumptions
1. Errors are independent: they are uncorrelated with X and with the errors of other cases.
2. Errors are identically distributed, with mean zero and equal variance (homoscedasticity) for every value of X. Together with assumption 1, this makes the errors iid.
3. Errors are normally distributed.
Violation of Assumptions
  Nonconstant error variance
  Correlation among errors
  Nonnormal errors

Other Problems
  Omitted variables
  Nonlinear relationships
  Influential cases
Omitted Variables
An important variable that is left out of the model.
Causes
  Unmeasured variable
  Forgotten variable
Consequences
  No coefficient for the omitted variable
  Bias in the other coefficients (when the omitted variable is correlated with the included variables)
  No longer minimum variance
Omitted Variable Bias (Specification Bias)
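A small simulation can make the bias concrete. The sketch below assumes a true model with two regressors, y = b0 + b1*x1 + b2*x2 + e, and compares the full regression with one that omits x2. In that setting the coefficient on x1 in the short regression equals the full-model coefficient plus b2 times the slope from regressing x2 on x1. The variable names, coefficient values, sample size, and seed are all hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 20_000
beta1, beta2 = 2.0, 3.0                    # hypothetical true coefficients

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)         # x2 is correlated with x1
y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

ones = np.ones(n)
full = ols(np.column_stack([ones, x1, x2]), y)     # correctly specified model
short = ols(np.column_stack([ones, x1]), y)        # x2 omitted
alpha1 = ols(np.column_stack([ones, x1]), x2)[1]   # slope of x2 on x1

print("coefficient on x1, full model: ", round(full[1], 3))
print("coefficient on x1, x2 omitted: ", round(short[1], 3))
print("implied bias, b2 * alpha1:     ", round(full[2] * alpha1, 3))
```

If x2 were uncorrelated with x1 (alpha1 near zero), the short-model coefficient would stay close to the true value even though x2 is missing.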
Ramsey’s Regression Specification Error Test
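One common form of the test can be sketched as follows: fit the model, add powers of the fitted values as extra regressors, and compute an F statistic for the added terms. This is a minimal illustration under stated assumptions, not necessarily the exact variant intended on the slide. The function names `sse` and `reset_f`, the choice of squared and cubed fitted values, and the simulated data are all hypothetical. The statistic would be compared with an F critical value on (df1, df2) degrees of freedom.

```python
import numpy as np

def sse(X, y):
    """Sum of squared residuals from an OLS fit of y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def reset_f(X, y, powers=(2, 3)):
    """F statistic for adding powers of the fitted values to the model.

    X must already contain a constant column. Returns (F, df1, df2).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    X_aug = np.column_stack([X] + [fitted ** p for p in powers])
    sse_restricted, sse_unrestricted = sse(X, y), sse(X_aug, y)
    df1 = len(powers)
    df2 = len(y) - X_aug.shape[1]
    f_stat = ((sse_restricted - sse_unrestricted) / df1) / (sse_unrestricted / df2)
    return f_stat, df1, df2

# Simulated example: a linear model fitted to linear and to curved data.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_linear = 1.0 + 2.0 * x + rng.normal(size=n)
y_curved = 1.0 + 2.0 * x + 1.5 * x ** 2 + rng.normal(size=n)

f_linear, df1, df2 = reset_f(X, y_linear)
f_curved, _, _ = reset_f(X, y_curved)
print("F, correctly specified:", round(f_linear, 2), "on", (df1, df2), "df")
print("F, squared term omitted:", round(f_curved, 2))
```

A large F suggests that the fitted values carry nonlinear information the model has missed, which is a sign of misspecification. The test does not say which variable or functional form is missing.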
Irrelevant Variables
An unimportant variable that is included in the model.
Causes
  Error in theory
  Measurement error
  Improper variable selection procedure
Consequence
  No longer minimum variance
Irrelevant Variable (Not biased)
Irrelevant Variable (Nonminimum Variance)
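Both properties, no bias but larger variance, can be checked with a small Monte Carlo sketch. It assumes a true model with one regressor x1 and an irrelevant variable z that is correlated with x1 but has no effect on y. The sample size, number of replications, coefficient values, and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 2000
beta1 = 2.0                                     # hypothetical true coefficient on x1

b_correct, b_extra = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    z = 0.8 * x1 + 0.6 * rng.normal(size=n)     # irrelevant, but correlated with x1
    y = 1.0 + beta1 * x1 + rng.normal(size=n)   # z has no effect on y
    ones = np.ones(n)
    X_correct = np.column_stack([ones, x1])
    X_extra = np.column_stack([ones, x1, z])
    b_correct.append(np.linalg.lstsq(X_correct, y, rcond=None)[0][1])
    b_extra.append(np.linalg.lstsq(X_extra, y, rcond=None)[0][1])

b_correct, b_extra = np.array(b_correct), np.array(b_extra)
print("mean of b1, correct model:      ", round(b_correct.mean(), 3))
print("mean of b1, with irrelevant z:  ", round(b_extra.mean(), 3))
print("variance of b1, correct model:  ", round(b_correct.var(), 4))
print("variance of b1, with z included:", round(b_extra.var(), 4))
```

Both averages should sit near the true value, while the spread of the estimates is wider when z is included. The more strongly z is correlated with x1, the larger the increase in variance.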
Model Selection Principles
1. Theory: Is the variable’s place in the equation unambiguous and theoretically sound?
2. t-Test: Is the variable’s estimated coefficient significant in the expected direction?
3. Adjusted coefficient of determination: Does the overall fit of the equation improve when the variable is added?
4. Bias: Do other variables’ coefficients change significantly when the variable is added to the equation?
Model Selection Principles
When the four criteria do not agree, use careful judgment and do not rely on a single criterion.
Model Selection Principles
1. Rely on theory rather than statistical fit as much as possible.
2. Minimize the number of equations estimated.
3. Reveal “all” alternative specifications estimated.
Model Selection-Searching
1. Searching tends to exploit chance patterns in the sample.
2. If variables are correlated, searching will often exclude one or more of them, making the excluded variables appear unimportant.
3. If variables are positively correlated but their effects on the dependent variable are opposite, one or both may be excluded.
Model Selection-Searching
1. Forward inclusion: Start with no variables in the model, then add variables one at a time, each time choosing the variable that produces the largest increase in R².
2. Backward elimination: Start with all possible variables in the model, then delete variables one at a time, each time choosing the variable whose removal produces the smallest decrease in R².
3. Stepwise selection: Iteratively include and eliminate variables based on R², F-tests, or t-tests, with a stopping rule.
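The first procedure can be sketched in a few lines. This is an illustration of the mechanics only, not an endorsement of the approach. The function names, the `min_gain` stopping rule (stop when no candidate raises R² by at least 0.01), and the simulated data are hypothetical choices.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X (X includes a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    centered = y - y.mean()
    return float(1.0 - (resid @ resid) / (centered @ centered))

def forward_inclusion(X, y, names, min_gain=0.01):
    """Greedy forward inclusion: repeatedly add the variable with the largest
    R^2 gain, stopping when the best gain falls below min_gain."""
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))
    current = 0.0                               # R^2 of the intercept-only model
    while remaining:
        scores = {}
        for j in remaining:
            cols = [np.ones(n)] + [X[:, k] for k in selected + [j]]
            scores[j] = r_squared(np.column_stack(cols), y)
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return [names[j] for j in selected], current

# Simulated data: only x0 and x2 affect y; x1 and x3 are pure noise.
rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)

chosen, final_r2 = forward_inclusion(X, y, ["x0", "x1", "x2", "x3"])
print("selected, in order of entry:", chosen)
print("final R^2:", round(final_r2, 3))
```

The regressors here are independent and the sample is large, so the search recovers the right variables. With correlated regressors or small samples, the same loop can exclude relevant variables or pick up chance patterns.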
Model Selection-Searching
1. Statistical significance is overstated because the tests ignore the models estimated earlier in the search.
2. The researcher’s criteria for selecting among the various results are not disclosed.
Sensitivity Analysis
1. Purposely estimate a number of alternative specifications to determine the robustness of the model.
2. Attempt to find results that are significant in a number of specifications.
3. This is the opposite of selection searching, which attempts to find one specification that is different from the other specifications.
Model Validation (Data Mining with Dual Samples)
1. Develop a model using one sample.
2. Validate the model using a second sample.
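When only one data set is available, the two samples are often obtained by splitting it at random. The sketch below assumes that approach: coefficients are estimated on a development half and then used, unchanged, to predict the validation half. The data, the 50/50 split, and the use of R² as the comparison measure are hypothetical choices.

```python
import numpy as np

def r2(y_true, y_pred):
    """Proportion of the variance of y_true accounted for by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * x[:, 0] - 1.0 * x[:, 1] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Randomly split the cases into a development half and a validation half.
order = rng.permutation(n)
dev, val = order[: n // 2], order[n // 2:]

# Estimate the coefficients on the development sample only.
beta = np.linalg.lstsq(X[dev], y[dev], rcond=None)[0]

r2_dev = r2(y[dev], X[dev] @ beta)   # in-sample fit
r2_val = r2(y[val], X[val] @ beta)   # out-of-sample fit
print("R^2, development sample:", round(r2_dev, 3))
print("R^2, validation sample: ", round(r2_val, 3))
```

A validation fit far below the development fit suggests that the model has exploited chance patterns in the first sample.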
Information Criteria
1. Information criteria are measures of goodness of fit or uncertainty for the range of values of the data.
2. Information criteria measure the difference between a given model and the “true” underlying model.
3. AIC is a function of the number of observations n, the SSE, and the number of parameters p.
4. AIC has two parts: a lack-of-fit term and a parameter penalty term.
Akaike Information Criterion
AIC
1. As the number of parameters in the model increases, the lack-of-fit term decreases while the penalty term increases.
2. The model with the smallest AIC is deemed the “best” model, since it minimizes the difference between the given model and the “true” model.
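The trade-off between the two terms can be illustrated numerically. The sketch below assumes a common least-squares form of the criterion, AIC = n ln(SSE/n) + 2p, which drops additive constants and may differ from the form used in a given textbook or software package. The candidate models and their SSE values are invented for illustration.

```python
import math

def aic(sse, n, p):
    """AIC for a least-squares fit, assuming the form n*ln(SSE/n) + 2p."""
    lack_of_fit = n * math.log(sse / n)   # falls as the fit improves
    penalty = 2 * p                       # rises with each added parameter
    return lack_of_fit + penalty

n = 100
# Hypothetical candidates: name -> (SSE, number of parameters)
candidates = {"A": (120.0, 2), "B": (100.0, 3), "C": (99.5, 4)}

scores = {name: aic(sse, n, p) for name, (sse, p) in candidates.items()}
best = min(scores, key=scores.get)

for name, value in scores.items():
    print(name, round(value, 2))
print("smallest AIC:", best)
```

Moving from A to B, the drop in SSE outweighs the extra parameter. Moving from B to C, the small gain in fit does not cover the added penalty, so B has the smallest AIC.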
Bayesian Information Criterion (Schwarz Criterion)
BIC
1. The BIC is an increasing function of the residual sum of squares (RSS) and an increasing function of the number of parameters k.
2. Unexplained variation in the dependent variable and the number of explanatory variables both increase the value of BIC.
3. A lower BIC implies fewer explanatory variables, better fit, or both.
4. The BIC penalizes free parameters more strongly than the Akaike information criterion does.
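The stronger penalty can change which model is preferred. The sketch below assumes common least-squares forms of both criteria, BIC = n ln(RSS/n) + k ln(n) and AIC = n ln(RSS/n) + 2k, with additive constants dropped. Under these forms each parameter costs ln(n) in BIC and 2 in AIC, so the BIC penalty is the larger one whenever n is at least 8. The two candidate models are invented for illustration.

```python
import math

def bic(rss, n, k):
    """BIC for a least-squares fit, assuming the form n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)

def aic(rss, n, k):
    """AIC in the matching form n*ln(RSS/n) + 2k, for comparison."""
    return n * math.log(rss / n) + 2 * k

n = 100
small = {"rss": 100.0, "k": 3}   # hypothetical smaller model
large = {"rss": 96.0, "k": 4}    # one more variable, slightly better fit

aic_prefers = "large" if aic(n=n, **large) < aic(n=n, **small) else "small"
bic_prefers = "large" if bic(n=n, **large) < bic(n=n, **small) else "small"

print("AIC prefers the", aic_prefers, "model")
print("BIC prefers the", bic_prefers, "model")
```

In this example the extra variable reduces RSS enough to pay the AIC penalty of 2 but not the BIC penalty of ln(100), so the two criteria disagree.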
Multiple Collinearity
1. “Excessive” intercorrelations among the X variables.
2. The degree of collinearity can be defined as the R² from a model that regresses X_k on all the other X variables: the proportion of the variance of X_k explained by the other X’s.
3. The tolerance of X_k is the proportion of its variance not shared with the other X’s: 1 - R².
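The definition of tolerance translates directly into code: regress each X variable on the others and take 1 - R². The sketch below does this with NumPy. The function name `tolerance` and the simulated variables are hypothetical.

```python
import numpy as np

def tolerance(X):
    """Tolerance (1 - R^2) of each column of X regressed on the other columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        # Regress column j on a constant plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ beta
        centered = X[:, j] - X[:, j].mean()
        # SSE / SST equals 1 - R^2, the share of variance not explained.
        out.append((resid @ resid) / (centered @ centered))
    return np.array(out)

rng = np.random.default_rng(11)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)              # unrelated to the others

tol = tolerance(np.column_stack([x1, x2, x3]))
for name, value in zip(["x1", "x2", "x3"], tol):
    print(name, "tolerance:", round(value, 3))
```

Tolerance near 1 means a variable shares little variance with the others. Tolerance near 0, as for x1 and x2 here, signals strong collinearity.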