Lecture 9: ANOVA Tables and F-tests
BMTRY 701 Biostatistical Methods II
ANOVA: Analysis of Variance
Similar in derivation to the ANOVA that generalizes the two-sample t-test.
Partitions the variation in Y into two parts:
- that due to the ‘model’: SSR
- that due to ‘error’: SSE
The sum of the two parts is the total sum of squares: SST = SSR + SSE.
Total Deviations: Yi − Ȳ, the deviation of each observed value from the overall mean of Y.
Regression Deviations: Ŷi − Ȳ, the deviation of each fitted value from the overall mean.
Error Deviations: Yi − Ŷi, the deviation of each observed value from its fitted value (the residual).
Definitions
SST = Σ(Yi − Ȳ)²  (total sum of squares)
SSR = Σ(Ŷi − Ȳ)²  (regression sum of squares)
SSE = Σ(Yi − Ŷi)²  (error sum of squares)
SST = SSR + SSE
Example: logLOS ~ BEDS
> ybar <- mean(data$logLOS)
> yhati <- reg$fitted.values
> sst <- sum((data$logLOS - ybar)^2)
> ssr <- sum((yhati - ybar)^2)
> sse <- sum((data$logLOS - yhati)^2)
> sst
[1] 3.547454
> ssr
[1] 0.6401715
> sse
[1] 2.907282
> sse + ssr
[1] 3.547454
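A quick follow-up using the quantities above (not on the original slide): the ratio SSR/SST is the R-squared of the regression, the fraction of the total variation in logLOS explained by BEDS.

> ssr/sst   # ~0.18: BEDS explains about 18% of the variation in logLOS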
Degrees of Freedom
- SST: n − 1. One df is lost because it is used to estimate the mean of Y.
- SSR: 1. Only one df because all estimates are based on the same fitted regression line.
- SSE: n − 2. Two df are lost to estimating the regression line (slope and intercept).
Mean Squares
A “scaled” version of a sum of squares: Mean Square = SS/df.
- MSR = SSR/1
- MSE = SSE/(n − 2)
Notes: mean squares are not additive; that is, MSR + MSE ≠ SST/(n − 1). MSE is the same quantity we saw previously.
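A minimal sketch continuing the example above, assuming ssr, sse, and data from the earlier slide are still in the workspace:

n <- nrow(data)
msr <- ssr/1        # mean square for regression
mse <- sse/(n - 2)  # ~0.026, the same sigma-hat-squared as before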
Standard ANOVA Table

Source       SS    df      MS
Regression   SSR   1       MSR
Error        SSE   n − 2   MSE
Total        SST   n − 1
ANOVA for logLOS ~ BEDS
> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619
Inference?
What is of interest and how do we interpret it? We would like to know whether BEDS is related to logLOS. How do we do that using the ANOVA table? We need to know the expected values of the MSR and MSE:
E(MSE) = σ²
E(MSR) = σ² + β1² Σ(Xi − X̄)²
Implications
- The mean of the sampling distribution of MSE is σ² regardless of whether or not β1 = 0.
- If β1 = 0, E(MSE) = E(MSR).
- If β1 ≠ 0, E(MSE) < E(MSR).
To test the significance of β1, we can test whether MSR and MSE are of the same magnitude.
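A small simulation sketch (not from the slides) illustrating the point: when β1 = 0, MSR and MSE have the same mean, σ².

set.seed(1)
sims <- replicate(2000, {
  x <- rnorm(50)
  y <- 5 + 0*x + rnorm(50, sd = 2)   # true beta1 = 0, sigma^2 = 4
  a <- anova(lm(y ~ x))
  c(msr = a["x", "Mean Sq"], mse = a["Residuals", "Mean Sq"])
})
rowMeans(sims)   # both averages should be near sigma^2 = 4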
F-test
Derived naturally from the arguments just made.
Hypotheses: H0: β1 = 0 vs. H1: β1 ≠ 0
Test statistic: F* = MSR/MSE
Based on the earlier argument, we expect F* > 1 if H1 is true; this implies a one-sided test.
F-test
The distribution of F* under the null has two sets of degrees of freedom (df):
- numerator degrees of freedom
- denominator degrees of freedom
These correspond to the df shown in the ANOVA table: numerator df = 1, denominator df = n − 2. The test is based on F* ~ F(1, n − 2) under H0.
Implementing the F-test
The decision rule:
- If F* > F(1 − α; 1, n − 2), reject H0.
- If F* ≤ F(1 − α; 1, n − 2), fail to reject H0.
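A minimal sketch of the decision rule in R, assuming msr, mse, and n from the sketch above:

fstar <- msr/mse                 # ~24.4 for logLOS ~ BEDS
fcrit <- qf(1 - 0.05, 1, n - 2)  # F(0.95; 1, n-2)
fstar > fcrit                    # TRUE here, so reject H0 at alpha = 0.05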
F-distributions
ANOVA for logLOS ~ BEDS
> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619

> qf(0.95, 1, 111)
[1] 3.926607
> 1 - pf(24.44, 1, 111)
[1] 2.739016e-06
More interesting: MLR
In multiple linear regression, you can test that several coefficients are zero at the same time. Otherwise, the F-test gives the same result as a t-test. That is, for testing the significance of ONE covariate in a linear regression model, an F-test and a t-test give the same result:
H0: β1 = 0 vs. H1: β1 ≠ 0
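A hedged check of this equivalence, assuming reg is the logLOS ~ BEDS fit from earlier: the squared t statistic for BEDS equals the F statistic.

tval <- summary(reg)$coefficients["BEDS", "t value"]
tval^2                          # ~24.44
anova(reg)["BEDS", "F value"]   # the same value: t^2 = F for one coefficient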
General F-testing approach
The previous test seems simple. It is in this case, but it can be generalized to be more useful. Imagine a more general test:
H0: small model vs. Ha: large model
Constraint: the small model must be ‘nested’ in the large model. That is, the small model must be a ‘subset’ of the large model.
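For reference, the standard form of this general F statistic (implied but not written out on the slide) compares the increase in error sum of squares to the MSE of the large model:

F* = [ (SSE(small) − SSE(large)) / (df_small − df_large) ] / [ SSE(large) / df_large ]

where df denotes error degrees of freedom; under H0, F* follows an F distribution with (df_small − df_large, df_large) degrees of freedom. This is what R's anova(model1, model2) computes, as shown on the later slides.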
Example of ‘nested’ models
Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Model 2: Y = β0 + β1X1 + β3X3 + β4X4 + ε
Model 3: Y = β0 + β1X1 + β2X2 + ε
Models 2 and 3 are nested in Model 1. Model 2 is not nested in Model 3, and Model 3 is not nested in Model 2.
Testing: models must be nested!
Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Model 2: Y = β0 + β1X1 + β3X3 + β4X4 + ε
To test Model 1 vs. Model 2, we test whether β2 = 0:
H0: β2 = 0 vs. Ha: β2 ≠ 0
If we fail to reject the null hypothesis, we conclude that β2 = 0 and the smaller Model 2 is adequate; if we reject it, we conclude that β2 ≠ 0 and Model 1 is preferred.
R
reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data = data)
reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data = data)
reg3 <- lm(LOS ~ INFRISK + ms, data = data)

> anova(reg1)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.4043 8.115e-10 ***
ms          1  12.897  12.897  5.0288   0.02697 *
NURSE       1   1.097   1.097  0.4277   0.51449
nurse2      1   1.789   1.789  0.6976   0.40543
Residuals 108 276.981   2.565
---
R
> anova(reg2)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 44.8865 9.507e-10 ***
NURSE       1   8.212   8.212  3.1653     0.078 .
nurse2      1   1.782   1.782  0.6870     0.409
Residuals 109 282.771   2.594
---

> anova(reg1, reg2)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + NURSE + nurse2
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    109 282.771 -1    -5.789 2.2574 0.1359
R
> summary(reg1)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.355e+00  5.266e-01  12.068  < 2e-16 ***
INFRISK      6.289e-01  1.339e-01   4.696 7.86e-06 ***
ms           7.829e-01  5.211e-01   1.502    0.136
NURSE        4.136e-03  4.093e-03   1.010    0.315
nurse2      -5.676e-06  6.796e-06  -0.835    0.405
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.601 on 108 degrees of freedom
Multiple R-squared: 0.3231, Adjusted R-squared: 0.2981
F-statistic: 12.89 on 4 and 108 DF, p-value: 1.298e-08
Testing more than one coefficient at a time
Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Model 3: Y = β0 + β1X1 + β2X2 + ε
To test Model 1 vs. Model 3, we test that β3 = 0 AND β4 = 0:
H0: β3 = β4 = 0 vs. Ha: β3 ≠ 0 or β4 ≠ 0
If we fail to reject the null hypothesis, we conclude that β3 = β4 = 0 and the smaller Model 3 is adequate; if we reject it, Model 1 is preferred.
R
> anova(reg3)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.7683 6.724e-10 ***
ms          1  12.897  12.897  5.0691   0.02634 *
Residuals 110 279.867   2.544
---

> anova(reg1, reg3)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + ms
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    110 279.867 -2    -2.886 0.5627 0.5713
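A hedged hand-check of the nested-model test above, recomputing the 2-df F statistic and its p-value from the printed RSS values:

fstar <- ((279.867 - 276.981)/2) / (276.981/108)
fstar                  # ~0.563, matching the table
1 - pf(fstar, 2, 108)  # ~0.571, matching Pr(>F)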
R
> summary(reg3)

Call:
lm(formula = LOS ~ INFRISK + ms, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.9037 -0.8739 -0.1142  0.5965  8.5568

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.4547     0.5146  12.542   <2e-16 ***
INFRISK       0.6998     0.1156   6.054    2e-08 ***
ms            0.9717     0.4316   2.251   0.0263 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.595 on 110 degrees of freedom
Multiple R-squared: 0.3161, Adjusted R-squared: 0.3036
F-statistic: 25.42 on 2 and 110 DF, p-value: 8.42e-10
Testing multiple coefficients simultaneously
Region is a ‘factor’ variable with 4 categories, so it enters the model as 3 dummy variables whose coefficients should be tested simultaneously.
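A minimal sketch of what such a test might look like in R, assuming the data contain a 4-level factor named REGION (the variable name is an assumption, not from the slides):

data$REGION <- factor(data$REGION)   # hypothetical 4-level factor
regSmall <- lm(LOS ~ INFRISK, data = data)
regFull  <- lm(LOS ~ INFRISK + REGION, data = data)
anova(regSmall, regFull)   # simultaneous F-test of the 3 REGION coefficients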