Sociology 601 Class 24: November 19, 2009 (partial)

Review
  - regression results for spurious & intervening effects
  - care with sample sizes for comparing models
Dummy variables
F-tests comparing models
Example from ASR

Review: Types of 3-variable Causal Models

Spurious
  - x2 causes both x1 and y
  - e.g., age causes both marital status and earnings
Intervening
  - x1 causes x2, which causes y
  - e.g., marital status causes more hours worked, which raises annual earnings
No statistical difference between these models.
Statistical interaction effects
  - The relationship between x1 and y depends on the value of another variable, x2
  - e.g., the relationship between marital status and earnings is different for men and women.

Review: Regression models using Stata
see:

Review: Regression models with Earnings
Marital status, Age, and Hours worked.

                Model 0       Model 1       Model 2       (unlabeled)
Married         10,383.4***   8,243.1***    7,328.5***    7,465.1***
Age                             702.1***      631.6***      640.2***
Hours worked                                  281.3***      278.3***
Constant        35,065.3***   8,836.3*       -232.1 n.s.          n.s.
N
R-square

Regression with Dummy Variables

Agresti and Finlay 12.3 (skim on analysis of variance)
Example: marital status, 5 categories
  - married
  - widowed
  - divorced
  - separated
  - never married

Regression with Dummy Variables: example

Example: marital status, 5 categories

. tab marital

      marital |
       status |      Freq.     Percent        Cum.
--------------+-----------------------------------
      married |
      widowed |
     divorced |
    separated |
never married |
--------------+-----------------------------------
        Total |      1,

Dummy Variables: Stata programming

* create 5 dummy variables from marital status:
gen byte married=0 if marital<.
replace married=1 if marital==1
gen byte widow=0 if marital<.
replace widow=1 if marital==2
gen byte divorced=0 if marital<.
replace divorced=1 if marital==3
gen byte separated=0 if marital<.
replace separated=1 if marital==4
gen byte nevermar=0 if marital<.
replace nevermar=1 if marital==5

* check marital dummies (maritalcheck should =1 for all nonmissing cases)
egen byte maritalcheck=rowtotal(married widow divorced separated nevermar)
tab marital maritalcheck, missing

* shortcut method:
tab marital, gen(mar)
describe mar*

* check new mar dummies (marcheck should =1 for all nonmissing cases)
egen byte marcheck=rowtotal(mar1-mar5)
tab marital marcheck, missing

Regression with Dummy Variables: example

. regress conrinc mar1-mar4 if sex==1

[Stata regression output: F(4, 720) = 9.78; coefficient rows for mar1, mar2, mar3, mar4, and _cons; the remaining values are not recoverable.]

Omitted category = never married (mar5)
b1 = 14111; currently married men earn on average $14,111 more than never married men.
t = 6.09; p < .001; so, statistically significant (married men earn more than single men).
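
The choice of omitted category can also be left to Stata. The lines below are a sketch only, assuming Stata 11 or later (factor-variable notation is not used in these slides) and assuming marital is coded 1-5 in the order shown in the tab output. The ib5. prefix should reproduce the mar1-mar4 regression with never married as the omitted category. The ib1. version fits the same model but reports each coefficient as a difference from married men.

```stata
* Sketch, assuming Stata 11+ and marital coded 1 = married ... 5 = never married

* Never married (category 5) as the omitted category
regress conrinc ib5.marital if sex==1

* Same model, married (category 1) as the omitted category
regress conrinc ib1.marital if sex==1
```

Changing the omitted category changes the individual coefficients and their t-tests, but not the overall fit (R-squared and the overall F-test).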

Regression with Dummy Variables: example

Same regression (. regress conrinc mar1-mar4 if sex==1); omitted category = never married (mar5)
b2 = 11331; currently widowed men earn on average $11,331 more than never married men.
t = 1.59; p = .11; so, not statistically significant.
So, we cannot conclude that widowed men and never married men differ in earnings.

Regression with Dummy Variables: example

Same regression (. regress conrinc mar1-mar4 if sex==1); omitted category = never married (mar5)
b3 = 6710; currently divorced men earn on average $6,710 more than never married men.
t = 2.26; p < .05; so, statistically significant (divorced men earn more than single men).
Note that b3 < b2, but b3 is statistically significant even though b2 is not.
The standard error of b2 is high because the sample has few widowed men aged 25-54.
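
The comparison of b2 and b3 can be tested directly rather than read off the two separate t-tests. This is a minimal sketch, not part of the original slides; it assumes the mar1-mar5 dummies created by tab marital, gen(mar) are in memory. Both test and lincom are standard Stata post-estimation commands.

```stata
* Refit the dummy-variable model for men
regress conrinc mar1-mar4 if sex==1

* H0: the widowed and divorced coefficients are equal (b2 = b3)
test mar2 = mar3

* Estimated difference b2 - b3 with its standard error and confidence interval
lincom mar2 - mar3
```

The t-tests in the regression output compare each category only with the omitted category (never married); comparing any two included categories needs a test like this one.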

Inferences: F-tests Comparing Models

Comparing Regression Models, Agresti & Finlay, p 409:

\[
F = \frac{(R_c^2 - R_r^2)/(k - g)}{(1 - R_c^2)/[N - (k + 1)]}
\]

with df1 = k - g and df2 = N - (k + 1).

Where:
  R_c^2 = R-square for complete model,
  R_r^2 = R-square for reduced model,
  k = number of explanatory variables in complete model,
  g = number of explanatory variables in reduced model, and
  N = number of cases.
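
One way to carry out this comparison in Stata is sketched below, under stated assumptions: age and hours are hypothetical variable names for the age and hours-worked measures, married is the dummy created earlier, and the scalar names are illustrative. The reduced model is restricted to the cases used in the complete model, so both R-square values come from the same sample.

```stata
* Sketch only: age and hours are assumed (hypothetical) variable names

* Complete model: married, age, hours (k = 3)
regress conrinc married age hours
scalar r2_c = e(r2)
scalar k_c  = e(df_m)
scalar n_c  = e(N)
generate byte insample = e(sample)

* Reduced model: married only (g = 1), fitted on the same cases
regress conrinc married if insample
scalar r2_r = e(r2)
scalar g_r  = e(df_m)

* F = [(R2c - R2r)/(k - g)] / [(1 - R2c)/(N - (k + 1))]
scalar df1   = k_c - g_r
scalar df2   = n_c - (k_c + 1)
scalar Fstat = ((r2_c - r2_r)/df1) / ((1 - r2_c)/df2)
display "F(" df1 ", " df2 ") = " Fstat
display "p = " Ftail(df1, df2, Fstat)

* The same test from Stata's built-in command
quietly regress conrinc married age hours
test age hours
```

The hand-computed F and the test result should agree; if they do not, the usual cause is that the two models were fitted on different numbers of cases.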

Next: Regression with Interaction Effects

Examples with earnings:
  - age x gender
  - marital status x gender