Lecture 9: ANOVA Tables and F-tests. BMTRY 701 Biostatistical Methods II.

Presentation transcript:

Lecture 9: ANOVA Tables and F-tests. BMTRY 701 Biostatistical Methods II

ANOVA  Analysis of Variance  Similar in derivation to ANOVA that is generalization of two-sample t-test  Partitioning of variance into several parts that due to the ‘model’: SSR that due to ‘error’: SSE  The sum of the two parts is the total sum of squares: SST

Total Deviations: Yi − Ȳ, each observation’s deviation from the overall mean; SST = Σ(Yi − Ȳ)²

Regression Deviations: Ŷi − Ȳ, each fitted value’s deviation from the overall mean; SSR = Σ(Ŷi − Ȳ)²

Error Deviations: Yi − Ŷi, each observation’s deviation from its fitted value; SSE = Σ(Yi − Ŷi)²

Definitions: the deviations partition, so Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)², i.e. SST = SSR + SSE

Example: logLOS ~ BEDS

> ybar <- mean(data$logLOS)
> yhati <- reg$fitted.values
> sst <- sum((data$logLOS - ybar)^2)
> ssr <- sum((yhati - ybar)^2)
> sse <- sum((data$logLOS - yhati)^2)
> sst
[1]
> ssr
[1]
> sse
[1]
> sse + ssr
[1]

Degrees of Freedom
- SST: n − 1 (one df is lost because it is used to estimate the mean of Y)
- SSR: 1 (only one df because all estimates are based on the same fitted regression line)
- SSE: n − 2 (two df are lost estimating the regression line: slope and intercept)

Mean Squares
- A “scaled” version of a Sum of Squares: Mean Square = SS/df
- MSR = SSR/1
- MSE = SSE/(n − 2)
- Note: mean squares are not additive! That is, MSR + MSE ≠ SST/(n − 1)
- MSE is the same as we saw previously

Standard ANOVA Table

Source       SS    df     MS
Regression   SSR   1      MSR
Error        SSE   n − 2  MSE
Total        SST   n − 1

ANOVA for logLOS ~ BEDS

> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df Sum Sq Mean Sq F value  Pr(>F)
BEDS        1                  24.44    e-06 ***
Residuals 111

Inference?
- What is of interest, and how do we interpret it?
- We’d like to know whether BEDS is related to logLOS.
- How do we do that using the ANOVA table?
- We need the expected values of the MSE and MSR:
  E(MSE) = σ²
  E(MSR) = σ² + β₁² Σ(Xi − X̄)²

Implications
- The mean of the sampling distribution of MSE is σ² regardless of whether or not β₁ = 0
- If β₁ = 0, E(MSE) = E(MSR)
- If β₁ ≠ 0, E(MSE) < E(MSR)
- To test the significance of β₁, we can test whether MSR and MSE are of the same magnitude

F-test
- Derived naturally from the arguments just made
- Hypotheses: H0: β₁ = 0 vs. H1: β₁ ≠ 0
- Test statistic: F* = MSR/MSE
- Based on the earlier argument, we expect F* > 1 if H1 is true
- This implies a one-sided test

F-test
- The distribution of F* under the null has two sets of degrees of freedom (df): numerator df and denominator df
- These correspond to the df shown in the ANOVA table: numerator df = 1, denominator df = n − 2
- The test is based on F* ~ F(1, n − 2) under H0

Implementing the F-test
- The decision rule:
  If F* > F(1 − α; 1, n − 2), then reject H0
  If F* ≤ F(1 − α; 1, n − 2), then fail to reject H0
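The decision rule can be sketched end to end. The following Python example (using scipy on simulated data; the sample size, coefficients, and variable names are illustrative, not the lecture's) forms F* = MSR/MSE and compares it to the critical value:

```python
import numpy as np
from scipy import stats

# Simulated data with a truly nonzero slope (illustrative only)
rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0, 10, n)
y = 1.0 + 0.3 * x + rng.normal(0, 1, n)

# Least-squares fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

# Mean squares and the F statistic
ssr = np.sum((yhat - y.mean()) ** 2)
sse = np.sum((y - yhat) ** 2)
msr, mse = ssr / 1, sse / (n - 2)
f_star = msr / mse

# Decision rule at alpha = 0.05
alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, 1, n - 2)  # F(1 - alpha; 1, n - 2)
p_value = stats.f.sf(f_star, 1, n - 2)     # upper-tail (one-sided) p-value
reject = f_star > f_crit
```

Because the true slope here is nonzero, F* lands well above the critical value and the test rejects H0.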

F-distributions

ANOVA for logLOS ~ BEDS

> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df Sum Sq Mean Sq F value  Pr(>F)
BEDS        1                  24.44    e-06 ***
Residuals 111

> qf(0.95, 1, 111)
[1]
> 1 - pf(24.44, 1, 111)
[1]  e-06
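For readers working outside R, the two tail calls above have direct scipy equivalents (a sketch; 24.44 is the F statistic from the table, and 111 the residual df):

```python
from scipy import stats

# Same as R's qf(0.95, 1, 111): the 95th percentile of F(1, 111)
f_crit = stats.f.ppf(0.95, 1, 111)

# Same as R's 1 - pf(24.44, 1, 111): the upper-tail p-value
p_val = stats.f.sf(24.44, 1, 111)
```

The survival function `sf` is numerically preferable to computing `1 - cdf` by hand when the tail probability is tiny, as it is here.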

More interesting: MLR
- You can test that several coefficients are zero at the same time
- Otherwise, the F-test gives the same result as a t-test
- That is, for testing the significance of ONE covariate in a linear regression model, an F-test and a t-test give the same result: H0: β₁ = 0 vs. H1: β₁ ≠ 0

general F testing approach
- The previous test seems simple
- It is in this case, but it can be generalized to be more useful
- Imagine a more general test: H0: small model vs. Ha: large model
- Constraint: the small model must be ‘nested’ in the large model
- That is, the small model must be a ‘subset’ of the large model

Example of ‘nested’ models
Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Model 2: Y = β0 + β1X1 + β3X3 + β4X4 + ε
Model 3: Y = β0 + β1X1 + β2X2 + ε
Models 2 and 3 are nested in Model 1
Model 2 is not nested in Model 3
Model 3 is not nested in Model 2

Testing: Models must be nested!
- To test Model 1 vs. Model 2, we test that β2 = 0
- H0: β2 = 0 vs. Ha: β2 ≠ 0
- If β2 = 0, we conclude that Model 2 is superior to Model 1; that is, we prefer the smaller model if we fail to reject the null hypothesis
- Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
- Model 2: Y = β0 + β1X1 + β3X3 + β4X4 + ε

R

reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data=data)
reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data=data)
reg3 <- lm(LOS ~ INFRISK + ms, data=data)

> anova(reg1)
Analysis of Variance Table

Response: LOS
           Df Sum Sq Mean Sq F value  Pr(>F)
INFRISK                                 e-10 ***
ms                                         *
NURSE
nurse2
Residuals

R

> anova(reg2)
Analysis of Variance Table

Response: LOS
           Df Sum Sq Mean Sq F value  Pr(>F)
INFRISK                                 e-10 ***
NURSE
nurse2
Residuals

> anova(reg1, reg2)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + NURSE + nurse2
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)

R

> summary(reg1)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.355e                       < 2e-16 ***
INFRISK      6.289e                         e-06 ***
ms           7.829e
NURSE        4.136e
nurse2
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: on 108 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 4 and 108 DF, p-value: 1.298e-08

Testing more than two covariates
- To test Model 1 vs. Model 3, we test that β3 = 0 AND β4 = 0
- H0: β3 = β4 = 0 vs. Ha: β3 ≠ 0 or β4 ≠ 0
- If β3 = β4 = 0, we conclude that Model 3 is superior to Model 1; that is, we prefer the smaller model if we fail to reject the null hypothesis
- Model 1: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
- Model 3: Y = β0 + β1X1 + β2X2 + ε
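The full-versus-reduced comparison that anova() performs between two nested fits can be sketched in Python (simulated data; the design, sample size, and coefficients are illustrative, not the lecture's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 113

# Reduced design: intercept + 2 covariates with real effects;
# the full design adds 2 pure-noise covariates (true coefficients zero)
X_red = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X_full = np.column_stack([X_red, rng.normal(size=(n, 2))])
y = X_red @ np.array([6.0, 0.6, 0.8]) + rng.normal(0, 1, n)

def sse(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

sse_r, sse_f = sse(X_red, y), sse(X_full, y)
df_r, df_f = n - X_red.shape[1], n - X_full.shape[1]

# F* = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / (SSE_full / df_full)
f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
p_value = stats.f.sf(f_star, df_r - df_f, df_f)
```

Since the reduced model is nested in the full model, SSE can only decrease when covariates are added; the F statistic asks whether that decrease is larger than chance would explain.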

R

> anova(reg3)
Analysis of Variance Table

Response: LOS
           Df Sum Sq Mean Sq F value  Pr(>F)
INFRISK                                 e-10 ***
ms                                         *
Residuals

> anova(reg1, reg3)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + ms
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)

R

> summary(reg3)

Call:
lm(formula = LOS ~ INFRISK + ms, data = data)

Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
INFRISK                                    e-08 ***
ms                                            *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: on 110 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 2 and 110 DF, p-value: 8.42e-10

Testing multiple coefficients simultaneously
- Region: a ‘factor’ variable with 4 categories