
Topic 7: Analysis of Variance

Outline
–Partitioning sums of squares
–Breakdown of degrees of freedom
–Expected mean squares (EMS)
–F test
–ANOVA table
–General linear test
–Pearson correlation / R²

Analysis of Variance
Organize results arithmetically
Total sum of squares in Y is SSTO = Σ(Y_i - Ȳ)²
Partition this into two sources
–Model (explained by regression)
–Error (unexplained / residual)
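
A short derivation of the partition (standard least-squares algebra, not shown on the original slide): write each deviation as (Y_i - Ȳ) = (Ŷ_i - Ȳ) + (Y_i - Ŷ_i) and square both sides:

\sum_i (Y_i - \bar{Y})^2
  = \underbrace{\sum_i (\hat{Y}_i - \bar{Y})^2}_{\mathrm{SSR}}
  + \underbrace{\sum_i (Y_i - \hat{Y}_i)^2}_{\mathrm{SSE}}
  + 2\sum_i (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)

The cross term is zero because the least-squares residuals e_i = Y_i - Ŷ_i satisfy Σe_i = 0 and Σe_iŶ_i = 0, leaving SSTO = SSR + SSE.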

Total Sum of Squares
SSTO = Σ(Y_i - Ȳ)² with df_T = n-1
MST = SSTO/(n-1) is the usual estimate of the variance of Y if there are no explanatory variables
SAS uses the term Corrected Total for this source
The uncorrected total is ΣY_i²
"Corrected" means that we subtract off the mean before squaring
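
Both versions can be read directly off PROC MEANS output; a minimal sketch, assuming the toluca data set used in the examples below:

proc means data=toluca n mean css uss;
  var hours;   /* CSS = corrected SS (squared deviations from the mean); USS = uncorrected SS (squared raw values) */
run;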

Model Sum of Squares
SSR = Σ(Ŷ_i - Ȳ)²
df_R = 1 (due to the addition of the slope)
MSR = SSR/df_R
KNNL uses "regression" for what SAS calls "model", so SSR (KNNL) is the same as SS Model

Error Sum of Squares
SSE = Σ(Y_i - Ŷ_i)²
df_E = n-2 (two df are lost because both slope and intercept are estimated)
MSE = SSE/df_E
MSE is an estimate of the variance of Y taking into account (or conditioning on) the explanatory variable(s)
MSE = s²

ANOVA Table

Source        df     SS      MS
Regression     1    SSR     MSR = SSR/df_R
Error        n-2    SSE     MSE = SSE/df_E
Total        n-1    SSTO

Expected Mean Squares
MSR and MSE are random variables
E(MSE) = σ²
E(MSR) = σ² + β1²Σ(X_i - X̄)²
When H0: β1 = 0 is true, E(MSR) = E(MSE) = σ²

F test
F* = MSR/MSE, which under H0: β1 = 0 follows the F(df_R, df_E) = F(1, n-2) distribution (see KNNL)
When H0: β1 = 0 is false, MSR tends to be larger than MSE, so we reject H0 when F* is large
Reject H0 if F* ≥ F(1-α; df_R, df_E) = F(0.95; 1, n-2)
In practice we use P-values
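
The cutoff and P-value can be computed directly in SAS; a minimal sketch using n = 25 and the observed F* = 105.88 from the Toluca example below:

data _null_;
  fcrit = quantile('F', 0.95, 1, 23);   /* critical value F(0.95; 1, n-2) for n = 25 */
  pval  = 1 - cdf('F', 105.88, 1, 23);  /* P-value for the observed F* */
  put fcrit= pval=;
run;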

F test
When H0: β1 = 0 is false, F* has a noncentral F distribution
This can be used to calculate power
Recall that t* = b1/s(b1) tests H0: β1 = 0
It can be shown that (t*)² = F* (pg 71)
The two approaches give the same P-value
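
A sketch of why (t*)² = F*, using two standard SLR facts: SSR = b1²Σ(X_i - X̄)² and s²(b1) = MSE/Σ(X_i - X̄)²:

(t^*)^2 = \frac{b_1^2}{s^2(b_1)}
        = \frac{b_1^2 \sum_i (X_i - \bar{X})^2}{\mathrm{MSE}}
        = \frac{\mathrm{SSR}}{\mathrm{MSE}}
        = \frac{\mathrm{MSR}}{\mathrm{MSE}} = F^*

where the last step uses df_R = 1, so MSR = SSR.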

ANOVA Table

Source      df     SS     MS     F          P
Model        1    SSM    MSM    MSM/MSE    0.##
Error      n-2    SSE    MSE
Total      n-1    SSTO

**Note: "Model" is used instead of "Regression" here to match SAS output

Examples
Tower of Pisa study (n = 13 cases):

proc reg data=a1;
  model lean=year;
run;

Toluca lot size study (n = 25 cases):

proc reg data=toluca;
  model hours=lotsize;
run;
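
As a side note (not on the original slide), PROC GLM prints the same ANOVA partition; a minimal sketch, assuming the same toluca data set:

proc glm data=toluca;
  model hours = lotsize;   /* same SS/MS/F* breakdown as PROC REG */
run;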

Pisa Output

Number of Observations Read    13
Number of Observations Used    13

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1         15804            15804          …      <.0001
Error             11           …                …
Corrected Total   12         15997

Pisa Output

Root MSE          …      R-Square    …
Dependent Mean    …      Adj R-Sq    …
Coeff Var         …

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1            …                    …              …         …
year          1            …                    …            30.07     <.0001

(30.07)² = 904.2, matching the F value up to rounding error

Toluca Output

Number of Observations Read    25
Number of Observations Used    25

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1        252378           252378       105.88    <.0001
Error             23         54825             2384
Corrected Total   24        307203

Toluca Output

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1            …                    …              …         …
lotsize       1            …                    …            10.29     <.0001

Root MSE          …      R-Square    …
Dependent Mean    …      Adj R-Sq    …
Coeff Var         …

(10.29)² = 105.88, the F value from the ANOVA table

General Linear Test
A different view of the same problem
We want to compare two models
–Y_i = β0 + β1X_i + e_i (full model)
–Y_i = β0 + e_i (reduced model)
The models are compared through their error sums of squares; the full model always fits at least as well, so the question is whether it reduces SSE by more than chance would explain

General Linear Test
Let SSE(F) = SSE for the full model and SSE(R) = SSE for the reduced model, with error degrees of freedom df_F and df_R
F* = [(SSE(R) - SSE(F))/(df_R - df_F)] / [SSE(F)/df_F]
Compare F* with F(1-α; df_R - df_F, df_F)

Simple Linear Regression
Here SSE(R) = SSTO, df_R = n-1, df_F = n-2, so df_R - df_F = 1
F* = [(SSTO - SSE)/1] / MSE = SSR/MSE = MSR/MSE
Same test as before; this approach is more general
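
In SAS the same full-versus-reduced comparison can be requested with the TEST statement in PROC REG; a minimal sketch using the Toluca model (it reproduces the F* from the ANOVA table):

proc reg data=toluca;
  model hours = lotsize;
  test lotsize = 0;   /* general linear test of the reduced model (beta1 = 0) */
run;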

Pearson Correlation
r is the usual correlation coefficient
It is a number between -1 and +1 that measures the strength of the linear relationship between two variables

Pearson Correlation
Notice that b1 = r(s_Y/s_X), so the fitted slope is zero exactly when r is zero
The test of H0: β1 = 0 is therefore equivalent to the test of H0: ρ = 0
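
PROC CORR reports r together with the P-value for H0: ρ = 0, which matches the regression t test; a minimal sketch, again assuming the toluca data set:

proc corr data=toluca pearson;
  var hours lotsize;   /* prints r and "Prob > |r| under H0: Rho=0" */
run;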

R² and r²
R² = SSR/SSTO = 1 - SSE/SSTO, the ratio of explained to total variation

R² and r²
We use R² when the number of explanatory variables is arbitrary (simple and multiple regression)
r² = R² only for simple linear regression
R² is often multiplied by 100 and expressed as a percent

R² and r²
R² never decreases when additional explanatory variables are added to the model
Adjusted R² = 1 - [(n-1)/(n-p)](SSE/SSTO) "penalizes" larger models (p = number of parameters)
Unlike R², it does not necessarily increase when variables are added
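
These formulas are easy to check by hand; a small SAS sketch using the Toluca sums of squares (SSTO = 307203 and SSR = 252378 are the textbook's reported values, an assumption here; n = 25, p = 2):

data _null_;
  n = 25; p = 2;
  ssto = 307203;                             /* KNNL Toluca value (assumption) */
  ssr  = 252378;                             /* KNNL Toluca value (assumption) */
  sse  = ssto - ssr;
  rsq     = ssr/ssto;                        /* R-square          */
  adj_rsq = 1 - ((n-1)/(n-p))*(sse/ssto);    /* adjusted R-square */
  put rsq= adj_rsq=;                         /* 0.8215 and 0.8138 */
run;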

Pisa Output

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1         15804            15804          …      <.0001
Error             11           …                …
Corrected Total   12         15997

R-Square (SAS) = SSM/SSTO = 15804/15997 = 0.9879

Toluca Output

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1        252378           252378       105.88    <.0001
Error             23         54825             2384
Corrected Total   24        307203

R-Square (SAS) = SSM/SSTO = 252378/307203 = 0.8215

Background Reading
You may find Sections 2.10 and 2.11 interesting
–2.10 provides cautionary remarks (we will discuss these as they arise)
–2.11 discusses the bivariate Normal distribution (similarities and differences, confidence interval for r)
Program topic7.sas has the code to generate the ANOVA output
Read Chapter 3