Lecture 9: ANOVA Tables and F-tests
BMTRY 701 Biostatistical Methods II
ANOVA: Analysis of Variance
Similar in derivation to the ANOVA that generalizes the two-sample t-test.
Partitions the variation in Y into two parts:
- that due to the ‘model’: SSR
- that due to ‘error’: SSE
The sum of the two parts is the total sum of squares: SST = SSR + SSE.
Total Deviations: Yi − Ȳ, the deviation of each observed value from the overall mean of Y.
Regression Deviations: Ŷi − Ȳ, the deviation of each fitted value from the overall mean.
Error Deviations: Yi − Ŷi, the deviation of each observed value from its fitted value (the residual).
Definitions
SST = Σ(Yi − Ȳ)²  (total sum of squares)
SSR = Σ(Ŷi − Ȳ)²  (regression sum of squares)
SSE = Σ(Yi − Ŷi)²  (error sum of squares)
SST = SSR + SSE
Example: logLOS ~ BEDS
> ybar <- mean(data$logLOS)
> yhati <- reg$fitted.values
> sst <- sum((data$logLOS - ybar)^2)
> ssr <- sum((yhati - ybar)^2)
> sse <- sum((data$logLOS - yhati)^2)
> sst
[1] 3.547454
> ssr
[1] 0.6401715
> sse
[1] 2.907282
> sse + ssr
[1] 3.547454
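A quick follow-up using the quantities above (not on the original slide): the ratio SSR/SST is the R-squared of the regression, the fraction of the total variation in logLOS explained by BEDS.

> ssr/sst   # ~0.18: BEDS explains about 18% of the variation in logLOS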
Degrees of Freedom
- SST: n − 1. One df is lost because it is used to estimate the mean of Y.
- SSR: 1. Only one df because all estimates are based on the same fitted regression line.
- SSE: n − 2. Two df are lost to estimating the regression line (slope and intercept).
Mean Squares
A “scaled” version of a sum of squares: Mean Square = SS/df.
- MSR = SSR/1
- MSE = SSE/(n − 2)
Notes: mean squares are not additive; that is, MSR + MSE ≠ SST/(n − 1). MSE is the same quantity we saw previously.
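A minimal sketch continuing the example above, assuming ssr, sse, and data from the earlier slide are still in the workspace:

n <- nrow(data)
msr <- ssr/1        # mean square for regression
mse <- sse/(n - 2)  # ~0.026, the same sigma-hat-squared as before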
Standard ANOVA Table

Source       SS    df      MS
Regression   SSR   1       MSR
Error        SSE   n − 2   MSE
Total        SST   n − 1
ANOVA for logLOS ~ BEDS
> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619
Inference?
What is of interest and how do we interpret it? We would like to know whether BEDS is related to logLOS. How do we do that using the ANOVA table? We need to know the expected values of the MSR and MSE:
E(MSE) = σ²
E(MSR) = σ² + β1² Σ(Xi − X̄)²
Implications
- The mean of the sampling distribution of MSE is σ² regardless of whether or not β1 = 0.
- If β1 = 0, E(MSE) = E(MSR).
- If β1 ≠ 0, E(MSE) < E(MSR).
To test the significance of β1, we can test whether MSR and MSE are of the same magnitude.
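A small simulation sketch (not from the slides) illustrating the point: when β1 = 0, MSR and MSE have the same mean, σ².

set.seed(1)
sims <- replicate(2000, {
  x <- rnorm(50)
  y <- 5 + 0*x + rnorm(50, sd = 2)   # true beta1 = 0, sigma^2 = 4
  a <- anova(lm(y ~ x))
  c(msr = a["x", "Mean Sq"], mse = a["Residuals", "Mean Sq"])
})
rowMeans(sims)   # both averages should be near sigma^2 = 4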
F-test
Derived naturally from the arguments just made.
Hypotheses: H0: β1 = 0 vs. H1: β1 ≠ 0
Test statistic: F* = MSR/MSE
Based on the earlier argument, we expect F* > 1 if H1 is true; this implies a one-sided test.
F-test
The distribution of F* under the null has two sets of degrees of freedom (df):
- numerator degrees of freedom
- denominator degrees of freedom
These correspond to the df shown in the ANOVA table: numerator df = 1, denominator df = n − 2. The test is based on F* ~ F(1, n − 2) under H0.
Implementing the F-test
The decision rule:
- If F* > F(1 − α; 1, n − 2), reject H0.
- If F* ≤ F(1 − α; 1, n − 2), fail to reject H0.
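A minimal sketch of the decision rule in R, assuming msr, mse, and n from the sketch above:

fstar <- msr/mse                 # ~24.4 for logLOS ~ BEDS
fcrit <- qf(1 - 0.05, 1, n - 2)  # F(0.95; 1, n-2)
fstar > fcrit                    # TRUE here, so reject H0 at alpha = 0.05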
F-distributions
ANOVA for logLOS ~ BEDS
> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619

> qf(0.95, 1, 111)
[1] 3.926607
> 1 - pf(24.44, 1, 111)
[1] 2.739016e-06
More interesting: MLR
In multiple linear regression, you can test that several coefficients are zero at the same time. Otherwise, the F-test gives the same result as a t-test. That is, for testing the significance of ONE covariate in a linear regression model, an F-test and a t-test give the same result:
H0: β1 = 0 vs. H1: β1 ≠ 0
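A hedged check of this equivalence, assuming reg is the logLOS ~ BEDS fit from earlier: the squared t statistic for BEDS equals the F statistic.

tval <- summary(reg)$coefficients["BEDS", "t value"]
tval^2                          # ~24.44
anova(reg)["BEDS", "F value"]   # the same value: t^2 = F for one coefficient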
General F-testing approach
The previous test seems simple. It is in this case, but it can be generalized to be more useful. Imagine a more general test:
H0: small model vs. Ha: large model
Constraint: the small model must be ‘nested’ in the large model. That is, the small model must be a ‘subset’ of the large model.
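For reference, the standard form of this general F statistic (implied but not written out on the slide) compares the increase in error sum of squares to the MSE of the large model:

F* = [ (SSE(small) − SSE(large)) / (df_small − df_large) ] / [ SSE(large) / df_large ]

where df denotes error degrees of freedom; under H0, F* follows an F distribution with (df_small − df_large, df_large) degrees of freedom. This is what R's anova(model1, model2) computes, as shown on the later slides.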
Example of ‘nested’ models
Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Model 2: Y = β0 + β1X1 + β3X3 + β4X4 + ε
Model 3: Y = β0 + β1X1 + β2X2 + ε
Models 2 and 3 are nested in Model 1. Model 2 is not nested in Model 3, and Model 3 is not nested in Model 2.
Testing: models must be nested!
Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Model 2: Y = β0 + β1X1 + β3X3 + β4X4 + ε
To test Model 1 vs. Model 2, we test whether β2 = 0:
H0: β2 = 0 vs. Ha: β2 ≠ 0
If we fail to reject the null hypothesis, we conclude that β2 = 0 and the smaller Model 2 is adequate; if we reject it, we conclude that β2 ≠ 0 and Model 1 is preferred.
R
reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data = data)
reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data = data)
reg3 <- lm(LOS ~ INFRISK + ms, data = data)

> anova(reg1)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.4043 8.115e-10 ***
ms          1  12.897  12.897  5.0288   0.02697 *
NURSE       1   1.097   1.097  0.4277   0.51449
nurse2      1   1.789   1.789  0.6976   0.40543
Residuals 108 276.981   2.565
---
R
> anova(reg2)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 44.8865 9.507e-10 ***
NURSE       1   8.212   8.212  3.1653     0.078 .
nurse2      1   1.782   1.782  0.6870     0.409
Residuals 109 282.771   2.594
---

> anova(reg1, reg2)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + NURSE + nurse2
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    109 282.771 -1    -5.789 2.2574 0.1359
R
> summary(reg1)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.355e+00  5.266e-01  12.068  < 2e-16 ***
INFRISK      6.289e-01  1.339e-01   4.696 7.86e-06 ***
ms           7.829e-01  5.211e-01   1.502    0.136
NURSE        4.136e-03  4.093e-03   1.010    0.315
nurse2      -5.676e-06  6.796e-06  -0.835    0.405
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.601 on 108 degrees of freedom
Multiple R-squared: 0.3231, Adjusted R-squared: 0.2981
F-statistic: 12.89 on 4 and 108 DF, p-value: 1.298e-08
Testing more than one coefficient at a time
Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Model 3: Y = β0 + β1X1 + β2X2 + ε
To test Model 1 vs. Model 3, we test that β3 = 0 AND β4 = 0:
H0: β3 = β4 = 0 vs. Ha: β3 ≠ 0 or β4 ≠ 0
If we fail to reject the null hypothesis, we conclude that β3 = β4 = 0 and the smaller Model 3 is adequate; if we reject it, Model 1 is preferred.
R
> anova(reg3)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.7683 6.724e-10 ***
ms          1  12.897  12.897  5.0691   0.02634 *
Residuals 110 279.867   2.544
---

> anova(reg1, reg3)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + ms
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    110 279.867 -2    -2.886 0.5627 0.5713
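A hedged hand-check of the nested-model test above, recomputing the 2-df F statistic and its p-value from the printed RSS values:

fstar <- ((279.867 - 276.981)/2) / (276.981/108)
fstar                  # ~0.563, matching the table
1 - pf(fstar, 2, 108)  # ~0.571, matching Pr(>F)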
R
> summary(reg3)

Call:
lm(formula = LOS ~ INFRISK + ms, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.9037 -0.8739 -0.1142  0.5965  8.5568

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.4547     0.5146  12.542   <2e-16 ***
INFRISK       0.6998     0.1156   6.054    2e-08 ***
ms            0.9717     0.4316   2.251   0.0263 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.595 on 110 degrees of freedom
Multiple R-squared: 0.3161, Adjusted R-squared: 0.3036
F-statistic: 25.42 on 2 and 110 DF, p-value: 8.42e-10
Testing multiple coefficients simultaneously
Region is a ‘factor’ variable with 4 categories, so it enters the model as 3 dummy variables whose coefficients should be tested simultaneously.
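A minimal sketch of what such a test might look like in R, assuming the data contain a 4-level factor named REGION (the variable name is an assumption, not from the slides):

data$REGION <- factor(data$REGION)   # hypothetical 4-level factor
regSmall <- lm(LOS ~ INFRISK, data = data)
regFull  <- lm(LOS ~ INFRISK + REGION, data = data)
anova(regSmall, regFull)   # simultaneous F-test of the 3 REGION coefficients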