Regression Forced March Spring 2006
Regression quantifies how one variable can be described in terms of another
Black Elected Officials Example I
Stop a second: What is the correlation between beo & bpop? .72, .82, or .92?
The Linear Relationship between Two Variables
The Linear Relationship between African American Population & Black Legislators
How did we get that line? 1. Pick a representative value of $Y_i$
How did we get that line? 2. Decompose $Y_i$ into two parts
How did we get that line? 3. Label the points: $Y_i$ (observed value), $\hat{Y}_i$ (fitted value), and $\varepsilon_i = Y_i - \hat{Y}_i$ (the “residual”)
Stop a moment: What is $\varepsilon_i$?
–Vagueness of theory
–Poor proxies (i.e., measurement error)
–Wrong functional form
See the Utts & Heckard discussion of the difference between deterministic relationships and statistical relationships.
The Method of Least Squares [Figure: plot marking $Y_i$, $\hat{Y}_i$, and the residual $\varepsilon_i = Y_i - \hat{Y}_i$]
Solve for the intercept and slope of the least-squares line (Utts & Heckard, p. 164)
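The equations from this slide did not survive, so the following is a sketch of the standard least-squares derivation for a single predictor. It uses the $B_0$, $B_1$ notation that appears on the residuals slide, which may differ from the notation in Utts & Heckard. The line is chosen to minimize the sum of squared residuals, and setting the two partial derivatives to zero gives the slope and intercept:

```latex
\min_{B_0,\,B_1} \; \sum_{i=1}^{n} \left( Y_i - B_0 - B_1 X_i \right)^2

B_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
B_0 = \bar{Y} - B_1 \bar{X}
```

The second equation says the fitted line always passes through the point of means $(\bar{X}, \bar{Y})$.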
About the Functional Form
Linear in the variables vs. linear in the parameters
–$Y = a + bX + e$ (linear in both)
–$Y = a + bX + cX^2 + e$ (linear in parameters)
–$Y = a + X^b + e$ (linear in variables)
–$Y = a + \ln X^b / Z^c + e$ (linear in neither)
Utts & Heckard pp
Black Elected Officials
Log transformations
–$Y = a + bX + e$: $b = dY/dX$, or b = the unit change in Y given a unit change in X. Typical case.
–$Y = a + b \ln X + e$: $b = dY/(dX/X)$, or b = the unit change in Y given a % change in X. Cases where there’s a natural limit on growth.
–$\ln Y = a + bX + e$: $b = (dY/Y)/dX$, or b = the % change in Y given a unit change in X. Exponential growth.
–$\ln Y = a + b \ln X + e$: $b = (dY/Y)/(dX/X)$, or b = the % change in Y given a % change in X (elasticity). Economic production.
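The log-log case can be checked numerically. The sketch below is Python rather than STATA, and the data are hypothetical, generated exactly from Y = 3·X^0.5 with no error term. Regressing ln Y on ln X should then recover a slope of about 0.5 (the elasticity) and an intercept of about ln 3. The helper name `ols_fit` is an illustration, not part of any library.

```python
import math

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: Y = 3 * X^0.5 exactly (assumed for illustration only)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.5 for xi in x]

# Regress ln Y on ln X: the slope is the elasticity of Y with respect to X
a, b = ols_fit([math.log(xi) for xi in x], [math.log(yi) for yi in y])
print("elasticity (slope):", round(b, 6))
print("multiplicative constant exp(a):", round(math.exp(a), 6))
```

With real data the points would not fall exactly on the curve, so the slope would be an estimate of the elasticity rather than an exact recovery.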
How “good” is the fitted line?
Judging results
Substantive interpretation of coefficients
Technical judgment of regression
–Judgment of coefficients
–Judgment of overall fit
Determining Goodness of Fit I
Coefficients
–Standard error of a coefficient
–t-statistic: coeff./s.e.
Standard error of the regression picture [Figure: the residuals $\varepsilon_i = Y_i - \hat{Y}_i$; add these up after squaring]
Determining Goodness of Fit
Standard error of the regression, or standard error of estimate (root mean square error, “Root MSE”, in STATA)
d.f. = n-2
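As a rough illustration of these quantities, the Python sketch below computes the standard error of the regression (with n-2 degrees of freedom), the standard error of the slope, and the slope's t-statistic for a one-predictor regression. The five data points are hypothetical and the function name `ols_with_errors` is made up for this example. The formulas are the usual textbook ones for simple regression.

```python
import math

def ols_with_errors(x, y):
    """Fit y = b0 + b1*x by least squares and return basic fit diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals: "add these up after squaring"
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    # Standard error of the regression, using d.f. = n - 2
    root_mse = math.sqrt(ssr / (n - 2))
    # Standard error of the slope, and t-statistic = coeff. / s.e.
    se_b1 = root_mse / math.sqrt(sxx)
    return {"b0": b0, "b1": b1, "root_mse": root_mse,
            "se_b1": se_b1, "t_b1": b1 / se_b1}

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

fit = ols_with_errors(x, y)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```

The division by n-2 rather than n reflects the two parameters (intercept and slope) estimated from the data.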
$R^2$ picture [Figure: beo and fitted values plotted against bpop, marking the total deviation $(Y_i - \bar{Y})$, the explained deviation $(\hat{Y}_i - \bar{Y})$, and the residual $(Y_i - \hat{Y}_i)$]
Determining Goodness of Fit
R-squared: the “coefficient of determination”
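The formula from this slide is missing, so the following Python sketch shows one standard way to compute R-squared: split the total sum of squares into a model (explained) part and a residual part, then take the model share. The variable names echo the Model, Residual, and Total rows of STATA's output. The data and the function name `sums_of_squares` are hypothetical.

```python
def sums_of_squares(x, y):
    """Return (ss_model, ss_resid, ss_total) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_model = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    return ss_model, ss_resid, ss_total

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

ss_model, ss_resid, ss_total = sums_of_squares(x, y)
r2 = ss_model / ss_total
print("Model SS + Residual SS:", round(ss_model + ss_resid, 6))
print("Total SS:", round(ss_total, 6))
print("R-squared:", round(r2, 6))
```

Because the model and residual sums of squares add up to the total, `1 - ss_resid / ss_total` gives the same value.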
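Return to Black Elected Officials Example
. reg beo bpop
      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F( 1, 39)     =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =
------------------------------------------------------------------------------
         beo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bpop |
       _cons |
------------------------------------------------------------------------------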
Residuals: $e_i = Y_i - B_0 - B_1 X_i$
[Figure: plot with the points AL and IL labeled]
One important numerical property of residuals: the sum of the residuals is zero.
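This property holds for any least-squares line that includes an intercept, and it is easy to confirm numerically. The Python sketch below fits a line to hypothetical data and adds up the residuals $Y_i - B_0 - B_1 X_i$. The total should be zero up to floating-point rounding.

```python
def residuals(x, y):
    """Return the residuals from the least-squares line of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Hypothetical data, for illustration only
x = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

e = residuals(x, y)
total = sum(e)
print("sum of residuals is zero (to rounding):", abs(total) < 1e-9)
```

The result follows from the intercept formula: since the line passes through the point of means, positive and negative residuals cancel exactly.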
Regression Commands in STATA
–reg depvar indvars (fit the regression)
–predict newvar (store the fitted values in newvar)
–predict newvar, resid (store the residuals in newvar)
Why It’s Called Regression [Figure: plot of Height of Fathers and Height of Sons]
Some Regressions
Temperature and Latitude
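. reg jantemp latitude
      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F( 1, 18)     =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =
------------------------------------------------------------------------------
     jantemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    latitude |
       _cons |
------------------------------------------------------------------------------
. predict py
(option xb assumed; fitted values)
. predict ry, resid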
. gsort -ry
. list city jantemp py ry
     | city              jantemp   py   ry |
  1. | PortlandOR                          |
  2. | SanFranciscoCA                      |
  3. | LosAngelesCA                        |
  4. | PhoenixAZ                           |
  5. | NewYorkNY                           |
  6. | MiamiFL                             |
  7. | BostonMA                            |
  8. | NorfolkVA                           |
  9. | BaltimoreMD                         |
 10. | SyracuseNY                          |
 11. | MobileAL                            |
 12. | WashingtonDC                        |
 13. | MemphisTN                           |
 14. | ClevelandOH                         |
 15. | DallasTX                            |
 16. | HoustonTX                           |
 17. | KansasCityMO                        |
 18. | PittsburghPA                        |
 19. | MinneapolisMN                       |
 20. | DuluthMN                            |
Bush Vote and Southern Baptists
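. reg bush sbc_mpct
      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F( 1, 48)     =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =
------------------------------------------------------------------------------
        bush |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    sbc_mpct |
       _cons |
------------------------------------------------------------------------------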
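Weight by State Population
. reg bush sbc_mpct [aw=votes]
(sum of wgt is e+08)
      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F( 1, 48)     =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =
------------------------------------------------------------------------------
        bush |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    sbc_mpct |
       _cons |
------------------------------------------------------------------------------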
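For intuition about what the weights do to the fitted line, here is a minimal Python sketch of weighted least squares for one predictor, in which each observation's squared residual is multiplied by its weight. It is an illustration of the coefficient calculation only, not a reproduction of STATA's internals or its standard errors. The function name `wls_fit` and the three data points are hypothetical.

```python
def wls_fit(x, y, w):
    """Return (intercept, slope) minimizing sum of w_i * (y_i - b0 - b1*x_i)^2."""
    sw = sum(w)
    # Weighted means replace the ordinary means
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: the middle observation counts twice as much as the others
x = [1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0]
w = [1.0, 2.0, 1.0]

b0_w, b1_w = wls_fit(x, y, w)

# Same coefficients as an unweighted fit with the middle point entered twice
b0_rep, b1_rep = wls_fit([1.0, 2.0, 2.0, 3.0], [1.0, 3.0, 3.0, 2.0], [1.0] * 4)
print("weighted fit:  ", round(b0_w, 6), round(b1_w, 6))
print("replicated fit:", round(b0_rep, 6), round(b1_rep, 6))
```

Heavily weighted observations pull the line toward themselves, which is why the weighted and unweighted regressions can give different coefficients.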
Midterm loss & presidential popularity
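. reg loss gallup
      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F( 1, 15)     = 5.70
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =
------------------------------------------------------------------------------
        loss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gallup |
       _cons |
------------------------------------------------------------------------------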
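. reg loss gallup if year>1948
      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F( 1, 12)     =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =
------------------------------------------------------------------------------
        loss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gallup |
       _cons |
------------------------------------------------------------------------------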