2
The data as seen in R (x = population, y = city manager compensation)

            x       y    z
 [1,]   58035 354.559   46
 [2,]  120100 351.593  998
 [3,]  102743 339.815  615
 [4,]  117242 321.533  168
 [5,]  137538 311.839  169
 [6,]  101400 305.200 1095
 [7,] 1000007 304.206 2439
 [8,]   58047 292.977  204
 [9,]   74900 285.698  199
[10,]   72200 283.900  180
[11,]   81400 282.573  404
[12,]   56727 264.282  168
[13,]   76000 255.228  232
[14,]  744230 250.739 5957
[15,]   93700 242.903  338
[16,]   77000 237.704  272
[17,]   62000 229.780  108
[18,]  146027 222.851  741
[19,]   17500 213.515   45
[20,]   48527 209.557  177
[21,]   11150 195.987    9
4
y. The decimal point is 2 digit(s) to the right of the |

  2 | 0112344
  2 | 5668899
  3 | 01124
  3 | 55

log10(x). The decimal point is at the |

  4 | 02
  4 | 7888899999
  5 | 0001112
  5 | 9
  6 | 0
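The displays above can be regenerated along the following lines. This is a minimal sketch, assuming the 21 rows are typed in by hand from the data slide; the data frame name `cities` is illustrative and not from the slides.

```r
# Data transcribed from the slide: x = population,
# y = city manager compensation, z = third column of the slide.
cities <- data.frame(
  x = c(58035, 120100, 102743, 117242, 137538, 101400, 1000007,
        58047, 74900, 72200, 81400, 56727, 76000, 744230,
        93700, 77000, 62000, 146027, 17500, 48527, 11150),
  y = c(354.559, 351.593, 339.815, 321.533, 311.839, 305.200, 304.206,
        292.977, 285.698, 283.900, 282.573, 264.282, 255.228, 250.739,
        242.903, 237.704, 229.780, 222.851, 213.515, 209.557, 195.987),
  z = c(46, 998, 615, 168, 169, 1095, 2439,
        204, 199, 180, 404, 168, 232, 5957,
        338, 272, 108, 741, 45, 177, 9)
)

# Stem-and-leaf displays of the response and of log10(population).
stem(cities$y)
stem(log10(cities$x))
```

Taking log10 of x pulls in the two very large populations, which is presumably why the slides work on the log scale.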
5
OLS. Choose A and B to minimize

$S(A,B) = \sum_i (Y_i - A - BX_i)^2 = \sum_i E_i^2$, the "sum of residuals-squared" (SSR).

Residuals.

  -0 | 665
  -0 | 44333210
   0 | 1113334
   0 | 679

$S_E^2 = \mathrm{SSR}/(n-2) = 2034.5$, so $S_E = 45.11$
6
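Total sum of squares (TSS): $\mathrm{TSS} = \sum_i (Y_i - \bar{Y})^2 = 44829$

Fitted values: $\hat{Y}_i = A + BX_i$

Residual sum of squares (RSS): $\mathrm{RSS} = \sum_i (Y_i - \hat{Y}_i)^2 = 38656$

Regression sum of squares (RegSS): $\mathrm{RegSS} = \sum_i (\hat{Y}_i - \bar{Y})^2 = 6173$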
7
Analysis of variance. TSS = RegSS + RSS

Analysis of Variance Table, via anova(lm(y ~ log10(x)))

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
log10(x)   1   6173  6172.6  3.0339 0.0977 .
Residuals 19  38656  2034.5
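The decomposition can be checked numerically. The following is a sketch, assuming x and y are entered as on the data slide; the object names `fit` and `y_hat` are illustrative.

```r
# x = population, y = city manager compensation (from the data slide).
x <- c(58035, 120100, 102743, 117242, 137538, 101400, 1000007,
       58047, 74900, 72200, 81400, 56727, 76000, 744230,
       93700, 77000, 62000, 146027, 17500, 48527, 11150)
y <- c(354.559, 351.593, 339.815, 321.533, 311.839, 305.200, 304.206,
       292.977, 285.698, 283.900, 282.573, 264.282, 255.228, 250.739,
       242.903, 237.704, 229.780, 222.851, 213.515, 209.557, 195.987)

fit   <- lm(y ~ log10(x))   # simple regression on log10(population)
y_hat <- fitted(fit)        # fitted values A + B * log10(x_i)

TSS   <- sum((y - mean(y))^2)      # total sum of squares
RSS   <- sum((y - y_hat)^2)        # residual sum of squares
RegSS <- sum((y_hat - mean(y))^2)  # regression sum of squares

print(c(TSS = TSS, RegSS = RegSS, RSS = RSS))
print(anova(fit))           # the same quantities in table form
```

The identity TSS = RegSS + RSS holds for any least-squares fit that includes an intercept, so it is a useful sanity check on hand-entered data.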
8
Call:
lm(formula = y ~ log10(x))

Residuals:
    Min      1Q  Median      3Q     Max
-61.827 -34.134  -2.070  28.420  87.797

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    69.84     117.69   0.593   0.5599
log10(x)       41.34      23.73   1.742   0.0977 .

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 45.11 on 19 degrees of freedom
Multiple R-squared: 0.1377, Adjusted R-squared: 0.09231
F-statistic: 3.034 on 1 and 19 DF, p-value: 0.0977

Correlation: $r = S_{XY}/(S_X S_Y) = .371$
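Several numbers in this output are tied together by standard identities for simple regression. As a worked check using only the values printed on the slides (with n = 21):

$$
R^2 = \frac{\mathrm{RegSS}}{\mathrm{TSS}} = \frac{6173}{44829} \approx 0.1377,
\qquad
|r| = \sqrt{R^2} \approx 0.371
$$

$$
t^2 = 1.742^2 \approx 3.03 \approx F,
\qquad
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-2} = 1 - 0.8623 \cdot \frac{20}{19} \approx 0.0923
$$

The sign of r matches the sign of the slope, which is positive here. The small discrepancies in the last digit come from rounding in the printed output.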
9
Conclusion. Not much in the way of a linear relationship.

Might have taken log10(x) as the response and y as the explanatory. Why not?

"Correlation does not imply causation"

Perhaps there is an important missing explanatory variable.
10
Crime? FBI records violent crime values per thousand, z

           x       y    z
[1,]   58035 354.559   46
[2,]  120100 351.593  998
[3,]  102743 339.815  615
[4,]  117242 321.533  168
[5,]  137538 311.839  169
[6,]  101400 305.200 1095
[7,] 1000007 304.206 2439
[8,]   58047 292.977  204
[9,]   74900 285.698  199
11
Multiple regression. Choose A, B_1 and B_2 to minimize

$S(A, B_1, B_2) = \sum_i (Y_i - A - B_1 X_{i1} - B_2 X_{i2})^2 = \sum_i E_i^2$, the "sum of residuals-squared".

Analysis of variance: TSS = RegSS + RSS

multiple R-squared = RegSS/TSS
12
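Total sum of squares (TSS): $\mathrm{TSS} = \sum_i (Y_i - \bar{Y})^2 = 44829$

Fitted values: $\hat{Y}_i = A + B_1 X_{i1} + B_2 X_{i2}$

Residual sum of squares (RSS): $\mathrm{RSS} = \sum_i (Y_i - \hat{Y}_i)^2 = 38467$

Regression sum of squares (RegSS): $\mathrm{RegSS} = \sum_i (\hat{Y}_i - \bar{Y})^2 = 6362$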
13
Analysis of variance. TSS = RegSS + RSS

zz = z/x

ANOVA Table, via anova(lm(y ~ log10(x) + zz))

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
log10(x)   1   6173  6172.6  2.8883 0.1064
zz         1    189   188.6  0.0883 0.7698
Residuals 18  38467  2137.1
14
Call:
lm(formula = y ~ log10(x) + zz)

Residuals:
   Min     1Q Median     3Q    Max
-64.53 -33.96  -1.55  26.06  91.07

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   78.830    124.363   0.634    0.534
log10(x)      38.554     26.067   1.479    0.156
zz             1.257      4.232   0.297    0.770

Residual standard error: 46.23 on 18 degrees of freedom
Multiple R-squared: 0.1419, Adjusted R-squared: 0.04656
F-statistic: 1.488 on 2 and 18 DF, p-value: 0.2523
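The two-variable fit can be sketched in R as follows, assuming the data are entered as on the data slide and zz = z/x as stated on the ANOVA slide. The object names `fit1`, `fit2`, `rss` and `r2` are illustrative. Rescaling zz (for example to a rate per thousand residents, which is an assumption and not something the slides spell out) would change only the size of its coefficient and standard error, not the sums of squares, t value or R-squared.

```r
x <- c(58035, 120100, 102743, 117242, 137538, 101400, 1000007,
       58047, 74900, 72200, 81400, 56727, 76000, 744230,
       93700, 77000, 62000, 146027, 17500, 48527, 11150)
y <- c(354.559, 351.593, 339.815, 321.533, 311.839, 305.200, 304.206,
       292.977, 285.698, 283.900, 282.573, 264.282, 255.228, 250.739,
       242.903, 237.704, 229.780, 222.851, 213.515, 209.557, 195.987)
z <- c(46, 998, 615, 168, 169, 1095, 2439,
       204, 199, 180, 404, 168, 232, 5957,
       338, 272, 108, 741, 45, 177, 9)

zz <- z / x                        # crime relative to population

fit1 <- lm(y ~ log10(x))           # one explanatory variable
fit2 <- lm(y ~ log10(x) + zz)      # adds the crime rate

print(anova(fit2))                 # sequential sums of squares
print(summary(fit2))

# Compare residual sums of squares and (adjusted) R-squared.
rss <- c(simple = deviance(fit1), multiple = deviance(fit2))
r2  <- c(simple = summary(fit1)$r.squared,
         multiple = summary(fit2)$r.squared)
adj <- c(simple = summary(fit1)$adj.r.squared,
         multiple = summary(fit2)$adj.r.squared)
print(rbind(rss, r2, adj))
```

Because the first model is nested in the second, the residual sum of squares cannot increase and R-squared cannot decrease; adjusted R-squared carries a penalty for the extra term and can move either way.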
15
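Conclusion. Including violent crime does not appear to have helped much.

R-squared did rise (0.1377 to 0.1419), but it cannot fall when an explanatory is added; adjusted R-squared dropped from 0.09231 to 0.04656.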
16
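Normal equations.

Model: $Y_i = A + BX_i + E_i$

Criterion: $S(A,B) = \sum_i (Y_i - A - BX_i)^2$

Setting $\partial S/\partial A = 0$: $(-2)\sum_i (Y_i - A - BX_i) = 0$, that is, $\sum_i E_i = 0$

Setting $\partial S/\partial B = 0$: $(-2)\sum_i X_i (Y_i - A - BX_i) = 0$, that is, $\sum_i X_i E_i = 0$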
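Solving the two normal equations gives the familiar closed form for the least-squares coefficients. This is the standard derivation, sketched here on the assumption that the X_i are not all equal. The first equation gives

$$
nA + B\sum_i X_i = \sum_i Y_i
\quad\Longrightarrow\quad
A = \bar{Y} - B\bar{X}
$$

Substituting this A into the second equation and collecting terms gives

$$
B = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}
$$

The condition $\sum_i E_i = 0$ also shows that the fitted line passes through $(\bar{X}, \bar{Y})$.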