The data as seen in R. Columns x, y, z, with x = population and y = city manager compensation; 21 rows, [1,] to [21,] (table values not recoverable).
Stem-and-leaf display of y. The decimal point is 2 digit(s) to the right of the |.
Stem-and-leaf display of log10(x). The decimal point is at the |.
(Stems and leaves not recoverable.)
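OLS. Choose A and B to minimize $S(A,B) = \sum_i (Y_i - A - BX_i)^2$. The minimum value is $\sum_i E_i^2$, the "sum of residuals-squared" (SSR).
Residuals. Stem-and-leaf display (stems and leaves not recoverable).
$S_E^2 = \mathrm{SSR}/(n-2) = 45.11^2$, i.e. $S_E = 45.11$.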
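The least-squares recipe above can be sketched in R. This is only an illustration: the vectors `x` and `y` are synthetic stand-ins generated here (the lecture's values are not reproduced), and the names `lx`, `SSR` and `S2E` are chosen for this sketch.

```r
# Synthetic stand-in data: NOT the lecture's values (assumption for illustration)
set.seed(1)
n <- 21
x <- round(10^runif(n, 4, 6))                 # hypothetical populations
y <- 100 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation

lx <- log10(x)

# Closed-form least-squares estimates for Y = A + B * log10(x) + E
B <- sum((lx - mean(lx)) * (y - mean(y))) / sum((lx - mean(lx))^2)
A <- mean(y) - B * mean(lx)

E   <- y - A - B * lx   # residuals
SSR <- sum(E^2)         # minimized sum of squared residuals
S2E <- SSR / (n - 2)    # residual variance estimate S_E^2

fit <- lm(y ~ lx)       # the same fit via lm()
print(c(A = A, B = B, S_E = sqrt(S2E)))
print(coef(fit))
```

The hand-computed `A` and `B` should agree with `coef(fit)`, and `sqrt(S2E)` with the "Residual standard error" that `summary(fit)` reports.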
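Total sum of squares (TSS): $\mathrm{TSS} = \sum_i (Y_i - \bar{Y})^2$
Fitted values: $\hat{Y}_i = A + BX_i$
Residual sum of squares (RSS): $\mathrm{RSS} = \sum_i (Y_i - \hat{Y}_i)^2$
Regression sum of squares (RegSS): $\mathrm{RegSS} = \sum_i (\hat{Y}_i - \bar{Y})^2 = 6173$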
Analysis of variance. TSS = RegSS + RSS
Analysis of Variance Table, via anova(lm(y ~ log10(x)))
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
log10(x)
Residuals
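A minimal sketch of the decomposition TSS = RegSS + RSS and of how it lines up with the `anova()` table. The data are synthetic stand-ins (an assumption of this sketch, not the lecture's values), so the numbers printed will not match the slides.

```r
# Synthetic stand-in data: NOT the lecture's values (assumption for illustration)
set.seed(1)
n <- 21
x <- round(10^runif(n, 4, 6))                 # hypothetical populations
y <- 100 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation

fit  <- lm(y ~ log10(x))
yhat <- fitted(fit)

TSS   <- sum((y - mean(y))^2)     # total sum of squares
RSS   <- sum((y - yhat)^2)        # residual sum of squares
RegSS <- sum((yhat - mean(y))^2)  # regression sum of squares

tab <- anova(fit)                 # rows: log10(x), Residuals
print(tab)
print(c(TSS = TSS, RegSS = RegSS, RSS = RSS))
```

The "Sum Sq" column of the table holds RegSS in the `log10(x)` row and RSS in the `Residuals` row, with 1 and n - 2 = 19 degrees of freedom.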
Call:
lm(formula = y ~ log10(x))

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
log10(x)
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: on 19 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 19 DF, p-value:

$r = S_{XY}/(S_X S_Y) = .371$
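The correlation formula $r = S_{XY}/(S_X S_Y)$ ties directly to the summary output: in simple regression the multiple R-squared is $r^2$ and the F-statistic is the square of the slope's t value. A hedged sketch on synthetic stand-in data (assumed for illustration only):

```r
# Synthetic stand-in data: NOT the lecture's values (assumption for illustration)
set.seed(1)
n <- 21
x <- round(10^runif(n, 4, 6))                 # hypothetical populations
y <- 100 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation

lx <- log10(x)

S_XY <- cov(lx, y)   # sample covariance
S_X  <- sd(lx)       # sample standard deviations
S_Y  <- sd(y)
r    <- S_XY / (S_X * S_Y)

s <- summary(lm(y ~ lx))
print(c(r = r, r_squared = r^2, R2_from_lm = s$r.squared))

# In simple regression, F equals the squared t value of the slope
t_slope <- s$coefficients["lx", "t value"]
F_stat  <- s$fstatistic[["value"]]
print(c(t_squared = t_slope^2, F = F_stat))
```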
Conclusion. Not much in the way of a linear relationship.
Might have taken log10(x) as the response and y as the explanatory. Why not?
"Correlation does not imply causation."
Perhaps there is an important missing explanatory variable.
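One way to see why the choice of response is not settled by the fit itself: swapping the roles of y and log10(x) changes the slope but leaves R-squared unchanged, since both equal $r^2$. A small sketch on synthetic stand-in data (assumed for illustration, not the lecture's values):

```r
# Synthetic stand-in data: NOT the lecture's values (assumption for illustration)
set.seed(1)
n <- 21
x <- round(10^runif(n, 4, 6))                 # hypothetical populations
y <- 100 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation

lx <- log10(x)

fit_yx <- lm(y ~ lx)   # y as response, log10(x) as explanatory
fit_xy <- lm(lx ~ y)   # roles swapped

r2_yx <- summary(fit_yx)$r.squared
r2_xy <- summary(fit_xy)$r.squared
print(c(r2_yx = r2_yx, r2_xy = r2_xy))

# The product of the two slopes is also r^2
slope_product <- unname(coef(fit_yx)[2] * coef(fit_xy)[2])
print(slope_product)
```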
Crime? FBI records violent crime values per thousand, z.
Data columns x, y, z; rows [1,] to [9,] shown (table values not recoverable).
Multiple regression. Choose $A$, $B_1$, $B_2$ to minimize $S(A, B_1, B_2) = \sum_i (Y_i - A - B_1 X_{i1} - B_2 X_{i2})^2$. The minimum value is $\sum_i E_i^2$, the "sum of residuals-squared".
Analysis of variance. TSS = RegSS + RSS
Multiple R-squared = RegSS/TSS
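Total sum of squares (TSS): $\mathrm{TSS} = \sum_i (Y_i - \bar{Y})^2$
Fitted values: $\hat{Y}_i = A + B_1 X_{i1} + B_2 X_{i2}$
Residual sum of squares (RSS): $\mathrm{RSS} = \sum_i (Y_i - \hat{Y}_i)^2$
Regression sum of squares (RegSS): $\mathrm{RegSS} = \sum_i (\hat{Y}_i - \bar{Y})^2 = 6173$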
Analysis of variance. TSS = RegSS + RSS
zz = z/x
ANOVA Table, via anova(lm(y ~ log10(x) + zz))
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
log10(x)
zz
Residuals
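A sketch of the two-regressor fit and its sums of squares. All data here are synthetic stand-ins; in particular `rate` and `z` are invented for illustration and do not come from FBI records. Note that `anova()` on a single `lm` fit reports sequential sums of squares, so the `log10(x)` and `zz` rows together add up to RegSS.

```r
# Synthetic stand-in data: NOT the lecture's values (assumption for illustration)
set.seed(1)
n <- 21
x <- round(10^runif(n, 4, 6))                 # hypothetical populations
y <- 100 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation
rate <- runif(n, 2, 20) / 1000                # hypothetical crime rate per person
z  <- rate * x                                # hypothetical violent crime values
zz <- z / x                                   # crime scaled by population, as on the slide

fit2 <- lm(y ~ log10(x) + zz)
yhat <- fitted(fit2)

TSS   <- sum((y - mean(y))^2)
RSS   <- sum((y - yhat)^2)
RegSS <- sum((yhat - mean(y))^2)
R2    <- RegSS / TSS              # multiple R-squared

tab2 <- anova(fit2)               # rows: log10(x), zz, Residuals
print(tab2)
print(c(TSS = TSS, RegSS = RegSS, RSS = RSS, R2 = R2))
```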
Call:
lm(formula = y ~ log10(x) + zz)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
log10(x)
zz

Residual standard error: on 18 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 2 and 18 DF, p-value:
Conclusion. Including violent crime does not appear to have helped much.
R-squared did rise, but it can never fall when an explanatory variable is added, so the rise alone is weak evidence.
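Whether the added variable "helped" can be examined by comparing the nested fits. The sketch below, on synthetic stand-in data (invented for illustration, including `rate` and `z`), contrasts R-squared with adjusted R-squared and runs the partial F test via `anova(fit1, fit2)`; with a single added regressor, that test has the same p-value as the t test for `zz`.

```r
# Synthetic stand-in data: NOT the lecture's values (assumption for illustration)
set.seed(1)
n <- 21
x <- round(10^runif(n, 4, 6))                 # hypothetical populations
y <- 100 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation
rate <- runif(n, 2, 20) / 1000                # hypothetical crime rate per person
z  <- rate * x                                # hypothetical violent crime values
zz <- z / x

fit1 <- lm(y ~ log10(x))
fit2 <- lm(y ~ log10(x) + zz)

r2  <- c(summary(fit1)$r.squared,     summary(fit2)$r.squared)
adj <- c(summary(fit1)$adj.r.squared, summary(fit2)$adj.r.squared)
print(rbind(r.squared = r2, adj.r.squared = adj))

cmp <- anova(fit1, fit2)   # partial F test for adding zz
print(cmp)

p_F <- cmp[["Pr(>F)"]][2]
p_t <- summary(fit2)$coefficients["zz", "Pr(>|t|)"]
```

Adjusted R-squared penalizes the extra parameter, so unlike R-squared it can go down when the new variable contributes little.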
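Normal equations. $Y_i = A + BX_i + E_i$
$S(A,B) = \sum_i (Y_i - A - BX_i)^2$
$\partial S/\partial A = (-2)\sum_i (Y_i - A - BX_i) = 0 \;\Rightarrow\; \sum_i E_i = 0$
$\partial S/\partial B = (-2)\sum_i X_i (Y_i - A - BX_i) = 0 \;\Rightarrow\; \sum_i X_i E_i = 0$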
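The two normal equations form a 2 x 2 linear system in A and B, which can be solved directly. A minimal sketch on synthetic stand-in data (assumed for illustration; `M` and `v` are names chosen here), checking that the resulting residuals satisfy both conditions:

```r
# Synthetic stand-in data: NOT the lecture's values (assumption for illustration)
set.seed(1)
n <- 21
x <- round(10^runif(n, 4, 6))                 # hypothetical populations
y <- 100 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation

X <- log10(x)

# Normal equations rearranged:
#   n * A      + sum(X) * B   = sum(Y)
#   sum(X) * A + sum(X^2) * B = sum(X * Y)
M <- matrix(c(n, sum(X), sum(X), sum(X^2)), nrow = 2)
v <- c(sum(y), sum(X * y))
AB <- solve(M, v)          # AB[1] = A, AB[2] = B

E <- y - AB[1] - AB[2] * X # residuals
print(c(sum_E = sum(E), sum_XE = sum(X * E)))  # both zero up to rounding
print(rbind(normal_eq = AB, lm = unname(coef(lm(y ~ X)))))
```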