Sociology 601 Class 23: November 17, 2009
Homework #8
Review
– spurious, intervening, & interaction effects
– Stata regression commands & output
F-tests and inferences (A&F 11.4)
Review: Types of 3-variable Causal Models
Spurious
– x2 causes both x1 and y
– e.g., age causes both marital status and earnings
Intervening
– x1 causes x2, which causes y
– e.g., marital status causes more hours worked, which raises annual earnings
No statistical difference between these models.
Statistical interaction effects
– The relationship between x1 and y depends on the value of another variable, x2
– e.g., the relationship between marital status and earnings is different for men and women.
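The spurious case can be illustrated with a small simulation. This is only a sketch on made-up data, not the GSS variables used in class: x2 is constructed to cause both x1 and y, and x1 has no effect of its own. The helper names `slope` and `residuals` are hypothetical, and the partial slope is obtained by residualizing x1 and y on x2, which is equivalent to including x2 in the regression.

```python
import random

def slope(x, y):
    """OLS slope of y on x (model includes an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

random.seed(601)  # fixed seed so the sketch is reproducible
n = 5000

# Simulated (hypothetical) data: x2 causes both x1 and y; x1 does not cause y.
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [z + random.gauss(0, 1) for z in x2]
y = [2 * z + random.gauss(0, 1) for z in x2]

# Bivariate slope of y on x1: clearly positive, although x1 has no causal effect.
bivariate = slope(x1, y)

# Partial slope of y on x1 controlling for x2: close to zero.
partial = slope(residuals(x2, x1), residuals(x2, y))

print("bivariate b(x1):", round(bivariate, 3))
print("partial b(x1 | x2):", round(partial, 3))
```

The same drop in b(x1) would appear if x2 were an intervening variable instead, which is why the regression alone cannot tell the two models apart; the causal ordering has to come from theory.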
Review: Causal Models with earnings & marital status
bivariate relationship:
1. married → earnings
spuriousness:
2. married ← age → earnings
intervening:
3. married → hours → earnings
interaction effect:
4. married → earnings, with gender changing the strength of the married → earnings relationship
Review: Stata Commands
– describe
– summarize
– tab
– tab xcat, sum(yvar)
– drop if / keep if
– gen / replace
– ttest
– regress
– predict / predict, residuals
– histogram / scattergram
– graph box yvar, over(xvar)
Review: Regression models using Stata see:
Review: Regression models with Earnings, Marital status and Age
bivariate relationship:
. * association of earnings and marital status:
. regress conrinc married
(output: F(1, 723); coefficient rows for married, _cons)
spuriousness (partial):
. * age makes the marriage-earnings relationship partly spurious:
. regress conrinc married age
(output: F(2, 722); coefficient rows for married, age, _cons)
Review: Regression models with Earnings, Marital status and Hours Worked
Intervening variable relationship (hours worked):
. * hours worked explains some of how marital status increases earnings:
. regress conrinc married age hrs1
(output: F(3, 660); coefficient rows for married, age, hrs1, _cons)
But: problem with N! Create new hours worked:
. gen hrs=hrs1
(101 missing values generated)
. replace hrs=hrs2 if hrs1>=.
(24 real changes made, 2 to missing)
. replace hrs=0 if hrs1>=. & wrkstat>=3
(101 real changes made)
Review: Regression models with Earnings, Marital status and Hours Worked
Intervening variable relationship (revised hours worked):
. regress conrinc married age hrs
(output: F(3, 721); coefficient rows for married, age, hrs, _cons)
b(married) reduced to 7,465.1 from 8,243.1 (N = 725 for both)
Review: Regression models with Earnings, Marital status, Age, and Hours worked.

                Model 0        Model 1       Model 2x       Model 2
Married         10,383.4***    8,243.1***    7,328.5***     7,465.1***
Age                            702.1***      631.6***       640.2***
Hours worked                                 281.3***       278.3***
Constant        35,065.3***    8,836.3*      -232.1 n.s.    n.s.
N
R-square
Review: Regression models with Earnings and Marital status, separately by Gender
Statistical Interaction Effect:
. * association of earnings and marital status for men:
. regress conrinc married if sex==1
(output: F(1, 723); coefficient rows for married, _cons)
. * association of earnings and marital status for women:
. regress conrinc married if sex==2
(output: F(1, 747) = 0.26; coefficient rows for married, _cons)
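The two separate regressions above can also be written as one model with a product term. The following is a sketch only: the dummy variable female (1 = women, 0 = men) and the coefficient labels are assumptions introduced for illustration and do not appear in the slides.

```latex
\[
\widehat{\text{conrinc}} = a + b_1\,\text{married} + b_2\,\text{female}
  + b_3\,(\text{married} \times \text{female})
\]
% Effect of marriage for men   (female = 0): b_1
% Effect of marriage for women (female = 1): b_1 + b_3
% A nonzero b_3 is the statistical interaction effect.
```

Under this parameterization, b_1 corresponds to the married coefficient in the men-only regression and b_1 + b_3 to the married coefficient in the women-only regression.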
Inferences: F-tests of global model
H0: β1 = β2 = ... = βk = 0
– α or β0?
F-tests of H0:
– Calculate new test statistic, F
– ratio of “explained variance” / “unexplained variance”
F-distribution:
– ratio of two chi-square variables, each divided by its df
– df1 (numerator); df2 (denominator)
– if df1 = 1, then F = t^2
– Table D, pages
Global F-test less useful (almost always significant unless you have a really bad model or very small N).
Base for F-test comparing regression models (later)
F-test: Method 1, STATA output
. regress conrinc married age hrs
(output: F(3, 721); coefficient rows for married, age, hrs, _cons)
df1 = 3 (= k = # of slope parameters: β(married), β(age), β(hrs))
df2 = 721 [= N – (k+1) = 725 – (3+1)]
Critical value: F(3, 721) = 2.60 (α = .05); observed F >> 2.60
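The degrees-of-freedom arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `f_test_df` is hypothetical, and the observed F of 36.27 is taken from the Method 3 slide on the assumption that it refers to this same model.

```python
def f_test_df(n_obs, k):
    """Return (df1, df2) for the global F-test with k predictors."""
    df1 = k                  # numerator df: number of slope parameters
    df2 = n_obs - (k + 1)    # denominator df: N minus slopes and intercept
    return df1, df2

df1, df2 = f_test_df(n_obs=725, k=3)
print(df1, df2)  # 3 721

critical_f = 2.60   # tabled critical value at alpha = .05, from the slide
observed_f = 36.27  # assumed to be the F for this model (Method 3 slide)
reject_h0 = observed_f > critical_f
print("reject H0:", reject_h0)
```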
F-test: Method 2, using R-square
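The standard textbook form of this method is F = (R²/k) / ((1 – R²)/(N – (k+1))). A minimal Python sketch follows; the function name `f_from_r2` is hypothetical, and the R-squared of 0.13 is an illustrative value, not the one from the class output.

```python
def f_from_r2(r2, k, n_obs):
    """Global F statistic from R-squared, k predictors, and n_obs cases."""
    df1 = k
    df2 = n_obs - (k + 1)
    return (r2 / df1) / ((1 - r2) / df2)

# Hypothetical R-squared (NOT taken from the slides); k = 3 and N = 725
# match the model with married, age, and hrs.
f_stat = f_from_r2(0.13, k=3, n_obs=725)
print(round(f_stat, 2))
```

Because only R², k, and N are needed, this version is handy when a published table reports R-square but not the sums of squares.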
F-test: Method 3, using SSE and Model SS
F = (Model SS / df1) / (SSE / df2) = Model MS / Residual MS = 36.27
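The same ratio can be computed directly from the sums of squares in the top-left panel of the Stata output. A minimal sketch, with a hypothetical function name `f_from_ss` and made-up sums of squares chosen so the arithmetic is easy to follow (they are not the values from the class output):

```python
def f_from_ss(model_ss, sse, k, n_obs):
    """Global F statistic from the model and residual sums of squares."""
    model_ms = model_ss / k                  # "explained variance"
    residual_ms = sse / (n_obs - (k + 1))    # "unexplained variance"
    return model_ms / residual_ms

# Made-up sums of squares for illustration only.
f_stat = f_from_ss(model_ss=300.0, sse=721.0, k=3, n_obs=725)
print(f_stat)  # 100.0
```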
Inferences: βi
H0: βi = 0
– what we are usually most interested in
test statistic:
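For this null hypothesis the usual test statistic is the t ratio, which corresponds to the Coef., Std. Err., and t columns of the Stata coefficient table. The notation below (b_i for the estimate of β_i) is an assumption chosen to match the b(married) shorthand used earlier.

```latex
\[
t = \frac{b_i}{\widehat{se}(b_i)}, \qquad df = N - (k + 1)
\]
% Example of the df: with N = 725 and k = 3 predictors, df = 725 - 4 = 721.
```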
Next: Regression with Dummy Variables
Agresti and Finlay 12.3 (skim on analysis of variance)
Example: marital status, 3 categories
– currently married
– never married
– widowed, separated, divorced
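As a preview, a three-category variable enters a regression as two 0/1 dummy variables, with the third category serving as the reference. The sketch below is an illustration under assumptions: the category labels, the `COLLAPSE` mapping, the `dummies` function, and the choice of "married" as the reference group are all hypothetical.

```python
# Hypothetical collapse of five marital statuses into three categories.
COLLAPSE = {
    "currently married": "married",
    "never married": "never_married",
    "widowed": "formerly_married",
    "separated": "formerly_married",
    "divorced": "formerly_married",
}

def dummies(status, reference="married"):
    """Return 0/1 dummies for every category except the reference."""
    category = COLLAPSE[status]
    names = sorted(set(COLLAPSE.values()) - {reference})
    return {name: int(category == name) for name in names}

for s in ["currently married", "never married", "divorced"]:
    print(s, dummies(s))
```

A respondent in the reference category scores 0 on both dummies, so each dummy coefficient would be read as a difference from currently married respondents.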