Sociology 601 Class 25: November 24, 2009

Homework 9
Review
– dummy variable example from ASR (finish)
– regression results for dummy variables
Quadratic effects
– example: earnings and age
– plotting
F-tests comparing models
Example from Sociology of Religion
Review: Regression with Dummy Variables

Why create dummy variables for age? Age is an interval variable, so what advantage is there in creating a series of dummies?

    gen byte age25=0 if age<.   /* new variable, age25, will be missing if age is missing */
    replace age25=1 if age>=25 & age<=29
    gen byte age30=0 if age<.
    replace age30=1 if age>=30 & age<=34
    gen byte age35=0 if age<.
    replace age35=1 if age>=35 & age<=39
    gen byte age40=0 if age<.
    replace age40=1 if age>=40 & age<=44
    gen byte age45=0 if age<.
    replace age45=1 if age>=45 & age<=49
    gen byte age50=0 if age<.
    replace age50=1 if age>=50 & age<=55
    * check age dummies (agecheck should =1 for all cases)
    egen byte agecheck=rowtotal(age25-age50)
    tab agecheck, missing
Stata Shortcut for Dummy Variables

    * restrict the sample to ages 25-54 first, so that exactly six categories remain
    drop if age<25 | age>54
    gen byte agecat= floor(age/5)*5
    tab agecat, gen(age)
    * floor() rounds down to the nearest integer:
    * e.g., at age 23: floor(23/5)*5 = floor(4.6)*5 = 4*5 = 20
    * check age dummies (agecheck should =1 for all cases)
    egen byte agecheck=rowtotal(age1-age6)
    tab agecheck, missing
Regression with Age Dummy Variables

. regress conrinc age2-age6 if sex==1

[Stata output: F(5, 719); coefficient rows for age2, age3, age4, age5, age6, and _cons. Numeric values not preserved.]

Same R-squared and overall F, but different b’s and t’s (although same relative order), because the omitted reference category changes from age1 to age6:

. regress conrinc age1-age5 if sex==1

[Stata output: F(5, 719); coefficient rows for age1, age2, age3, age4, age5, and _cons. Numeric values not preserved.]
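In Stata 11 and later, factor-variable notation can stand in for the hand-made dummies. The do-file sketch below is only an illustration on simulated stand-in data (the income equation, seed, and sample size are hypothetical, not the lecture's dataset). It fits the same model with two different base categories and shows that the R-squared is unchanged.

```stata
* Simulated stand-in data (hypothetical values, not the lecture's dataset)
clear
set seed 601
set obs 600
gen age = 25 + floor(30*runiform())        // integer ages 25-54
gen conrinc = 20000 + 800*(age - 25) + rnormal(0, 5000)
gen byte agecat = floor(age/5)*5           // 25, 30, ..., 50

* i.agecat omits the lowest category (25-29), like "age2-age6"
regress conrinc i.agecat
scalar r2_a = e(r2)

* ib50.agecat omits the 50-54 category, like "age1-age5"
regress conrinc ib50.agecat
scalar r2_b = e(r2)

display "R-squared with base 25-29: " r2_a "   with base 50-54: " r2_b
```

Only the coefficients and their t statistics change, since each one is now a contrast with a different reference group.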
Plot Earnings by Age

. tab age, sum(conrinc)

[Stata output: mean, standard deviation, and frequency of respondent income in constant dollars (conrinc) for each age of respondent, plus a Total row. Numeric values not preserved.]
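One possible way to turn such a table of means into a plot is to collapse the data to one mean per age and connect the points. The sketch below uses simulated stand-in data; the income equation, the noise level, and the name mean_inc are hypothetical. With the lecture's data in memory, only the collapse and twoway lines would be needed.

```stata
* Simulated stand-in data (hypothetical values, not the lecture's dataset)
clear
set seed 601
set obs 600
gen age = 25 + mod(_n, 30)                 // ages 25-54, 20 cases each
gen conrinc = 1000*age - 10*age^2 + rnormal(0, 3000)

* One row per age, holding the mean income at that age
collapse (mean) mean_inc = conrinc, by(age)

* Connected line plot of mean income by age
twoway connected mean_inc age, ytitle("Mean income") xtitle("Age")
```

Note that collapse replaces the data in memory, so run it on a copy (for example, between preserve and restore) when the full dataset is still needed.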
Regression Test for Curvilinearity

Test whether x has a curvilinear relationship with y. Testing for a quadratic relationship is the most common, but not the only, method of testing for curvilinearity.

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

Test whether $\beta_2 \neq 0$:
o if $\beta_2 > 0$, then U-shaped curve (or part of one)
o if $\beta_2 < 0$, then inverted-U curve (or part of one)
o if $\beta_2$ is not significantly different from 0, then revert to the linear equation by dropping $x^2$

$\beta_1$ is rather irrelevant in this test:
o if neither $\beta_2$ nor $\beta_1$ is significant in the quadratic model (p > .05 for both), that does not mean there is no linear relationship. $x$ and $x^2$ are typically highly correlated, so test $\beta_1$ in the linear model without $x^2$.
Curvilinear Regression Equation: $\beta_2$

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

$\beta_2$ (the quadratic coefficient) determines how steeply the curve accelerates:
$y = 2x^2$; $y = x^2$; $y = 0.5x^2$
Curvilinear Regression Equation: $\beta_2 < 0$

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

If $\beta_2$ (the quadratic coefficient) < 0, then the curve is an inverted U:
$y = -2x^2$; $y = -x^2$; $y = -0.5x^2$
Curvilinear Regression Equation: Inflexion Point = Maximum | Minimum

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

inflexion point = value of x at which y is a maximum or minimum = $-\beta_1 / (2\beta_2)$
(Strictly speaking, this is the turning point, or vertex, of the parabola.)

o $y = -20x^2 + 800x$: inflexion = $-800 / (-20 \times 2) = 20$ (i.e., below the observed x values)
o $y = -100x^2 + 8000x$: inflexion = $-8000 / (-100 \times 2) = 40$ (i.e., within the x range)
o $y = -20x^2 + 2400x$: inflexion = $-2400 / (-20 \times 2) = 60$ (i.e., above the observed values)
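The formula $-\beta_1 / (2\beta_2)$ follows from standard calculus: set the slope of the fitted curve to zero and solve for x. A short derivation, with $x^*$ introduced here as a label for the turning point:

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \beta_2 x^2 \\
\frac{dy}{dx} &= \beta_1 + 2\beta_2 x = 0
  \quad\Longrightarrow\quad x^* = -\frac{\beta_1}{2\beta_2} \\
\frac{d^2 y}{dx^2} &= 2\beta_2
  \quad (\text{maximum if } \beta_2 < 0,\ \text{minimum if } \beta_2 > 0)
\end{align*}
```

The intercept $\beta_0$ drops out, so it shifts the curve up or down without moving the turning point.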
Curvilinear Regression Equation: Inflexion Point = Maximum | Minimum

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

For completeness, when $\beta_2$ is positive:
inflexion point = value of x at which y is a maximum or minimum = $-\beta_1 / (2\beta_2)$

o $y = 20x^2 - 800x$: inflexion = $800 / (20 \times 2) = 20$ (i.e., below the observed x values)
o $y = 100x^2 - 8000x$: inflexion = $8000 / (100 \times 2) = 40$ (i.e., within the x range)
o $y = 20x^2 - 2400x$: inflexion = $2400 / (20 \times 2) = 60$ (i.e., above the observed values)
Example: Regression with Curvilinear Age

. gen int agesq=age*age
. summarize age agesq

[Stata output: Obs, Mean, Std. Dev., Min, and Max for age and agesq. Numeric values not preserved.]

. regress conrinc age agesq if sex==1

[Stata output: F(2, 722); coefficient rows for age, agesq, and _cons. Numeric values not preserved.]

o t for agesq = -3.52; p < .001, so: curvilinear
o b for agesq is negative, so: inverted U
o inflexion point = $-b_{age} / (2 \times b_{agesq})$ = 47.4, so earnings peak at about age 47
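The turning point can also be computed directly from the stored coefficients rather than by hand. The sketch below is an illustration on simulated stand-in data (the true curve, noise level, and the names y and peak are hypothetical); the simulated curve is built to peak at 1000 / (2 × 10) = 50, so the estimate should land close to that.

```stata
* Simulated stand-in data (hypothetical values, not the lecture's dataset)
clear
set seed 601
set obs 500
gen age = 25 + floor(30*runiform())        // integer ages 25-54
gen agesq = age*age
gen y = 1000*age - 10*agesq + rnormal(0, 500)

regress y age agesq

* Turning point from the stored coefficients: -b_age / (2 * b_agesq)
scalar peak = -_b[age] / (2*_b[agesq])
display "estimated turning point: " peak

* nlcom reports the same quantity with a delta-method standard error
nlcom (peak: -_b[age] / (2*_b[agesq]))
```

With the lecture's data, the same two lines after `regress conrinc age agesq if sex==1` would reproduce the hand calculation, and nlcom would add a confidence interval around it.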
Cubic Polynomials

Occasionally (actually, rarely), it is worthwhile to investigate whether a more complex polynomial would better describe the curvilinear relationship.

Add a cubic term ($x^3$) to the previous quadratic equation:

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + e_i$

Test $\beta_3 \neq 0$:
o if you can’t show $\beta_3 \neq 0$, then revert to the quadratic model
o if p > .05 for $\beta_3$, then don’t interpret $\beta_2$ and $\beta_1$ from the cubic model

If $\beta_3 \neq 0$, then the curve can have two bends (although not necessarily over the range of observed x’s).
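Where the bends of a cubic fall can be worked out the same way as for the quadratic, by setting the slope to zero. A brief derivation (standard calculus, not taken from the slides):

```latex
\begin{align*}
\frac{dy}{dx} &= \beta_1 + 2\beta_2 x + 3\beta_3 x^2 = 0 \\
x &= \frac{-\beta_2 \pm \sqrt{\beta_2^2 - 3\beta_1\beta_3}}{3\beta_3}
\end{align*}
```

The curve has two turning points only when $\beta_2^2 > 3\beta_1\beta_3$; otherwise it rises or falls throughout, with a change in steepness but no maximum or minimum.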
Cubic Polynomials: Earnings and Age Example

. regress conrinc age agesq agecu if sex==1

[Stata output: F(3, 721); coefficient rows for age, agesq, agecu, and _cons. Numeric values not preserved.]

Note: after age cubed is entered, none of the coefficients is statistically significant (even though age and age squared were significant in the quadratic model). So, since the coefficient on age cubed is not statistically significant, revert to the quadratic model. (DON’T conclude that age has no relationship with earnings!)
Cubic Polynomials: Actual Results

[Figure not preserved.]
Inferences: F-tests Comparing models

Comparing Regression Models, Agresti & Finlay, p. 409:

$F = \frac{(R_c^2 - R_r^2) / (k - g)}{(1 - R_c^2) / [N - (k + 1)]}$, with $df_1 = k - g$ and $df_2 = N - (k + 1)$

Where:
$R_c^2$ = R-squared for the complete model,
$R_r^2$ = R-squared for the reduced model,
k = number of explanatory variables in the complete model,
g = number of explanatory variables in the reduced model, and
N = number of cases.
Example: F-tests Comparing models

Complete model: men’s earnings on age, age squared, age cubed, education, and currently married dummy.
Reduced model: men’s earnings on education and currently married dummy.
The F-test comparing the models tests whether the age variables, as a group, have a significant relationship with earnings after controlling for education and marital status.
Example: F-tests Comparing models

Complete model: men’s earnings

. regress conrinc age agesq agecu educ married if sex==1

[Stata output: F(5, 719); coefficient rows for age, agesq, agecu, educ, married, and _cons. Numeric values not preserved.]

Note: none of the three age coefficients is, by itself, statistically significant.
$R_c^2 = .2387$; k = 5.
Example: F-tests Comparing models

Reduced model: men’s earnings

. regress conrinc educ married if sex==1

[Stata output: F(2, 722); coefficient rows for educ, married, and _cons. Numeric values not preserved.]

$R_r^2 = .1818$; g = 2.
Inferences: F-tests Comparing models

$F = \frac{(.2387 - .1818) / (5 - 2)}{(1 - .2387) / (725 - 6)} = \frac{.0569 / 3}{.7613 / 719} = 17.91$

$df_1 = 5 - 2 = 3$; $df_2 = 725 - 6 = 719$

F = 17.91, df = (3, 719), p < .001 (Agresti & Finlay, table D, page 673)
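The same comparison can be checked in Stata, because the joint test of the added variables after fitting the complete model yields the same F as the R-squared formula. The sketch below runs on simulated stand-in data (the earnings equation, seed, and the variable y are hypothetical, not the lecture's dataset), with N = 725, k = 5, and g = 2 chosen to mirror the example.

```stata
* Simulated stand-in data (hypothetical values, not the lecture's dataset)
clear
set seed 601
set obs 725
gen age = 25 + floor(30*runiform())        // integer ages 25-54
gen agesq = age^2
gen long agecu = age^3                     // long: age^3 exceeds the int range
gen educ = 8 + floor(13*runiform())        // years of schooling, 8-20
gen byte married = runiform() < 0.6
gen y = 2000*educ + 3000*married + 1000*age - 10*agesq + rnormal(0, 8000)

* Reduced model (g = 2)
regress y educ married
scalar r2_r = e(r2)

* Complete model (k = 5)
regress y age agesq agecu educ married
scalar r2_c = e(r2)

* F by the R-squared formula: df1 = k - g = 3, df2 = N - (k + 1)
scalar F = ((r2_c - r2_r)/(5 - 2)) / ((1 - r2_c)/(e(N) - 6))
display "F by hand = " F "   p = " Ftail(3, e(N) - 6, F)

* Joint test of the three age terms; reports the same F
test age agesq agecu
```

Using `test` avoids rounding the two R-squared values before dividing, which matters when the difference between them is small.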
Next: Regression with Interaction Effects

Examples with earnings:
– married x gender
– age x gender
– age x education
– marital status x gender