Published by Allan Barnett. Modified over 9 years ago.
Topic 7: Analysis of Variance
Outline
– Partitioning sums of squares
– Breakdown of degrees of freedom
– Expected mean squares (EMS)
– F test
– ANOVA table
– General linear test
– Pearson correlation / R²
Analysis of Variance
– Organize results arithmetically
– Total sum of squares in Y is SSTO = Σ(Yi − Ȳ)²
– Partition this into two sources
  – Model (explained by regression)
  – Error (unexplained / residual)
Total Sum of Squares
– MST is the usual estimate of the variance of Y if there are no explanatory variables
– SAS uses the term Corrected Total for this source
– The uncorrected total is ΣYi²; "corrected" means that we subtract off the mean before squaring
Model Sum of Squares
– dfR = 1 (due to the addition of the slope)
– MSR = SSR/dfR
– KNNL uses "regression" for what SAS calls "model", so SSR (KNNL) is the same as SS Model (SAS)
Error Sum of Squares
– dfE = n − 2 (two parameters estimated: the slope and the intercept)
– MSE = SSE/dfE
– MSE is an estimate of the variance of Y taking into account (or conditioning on) the explanatory variable(s)
– MSE = s²
ANOVA Table

Source        df       SS      MS
Regression    1        SSR     MSR = SSR/dfR
Error         n − 2    SSE     MSE = SSE/dfE
Total         n − 1    SSTO    MST = SSTO/dfT
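The partition behind this table can be sketched in a few lines of Python (an illustrative sketch only: the data and the function name `anova_partition` are hypothetical, not from the course):

```python
# Minimal sketch (hypothetical data) of the sums-of-squares partition
# SSTO = SSR + SSE for simple linear regression; stdlib only.

def anova_partition(x, y):
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    # Least-squares estimates of slope and intercept
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - ybar) ** 2 for yi in y)              # total,  df = n - 1
    ssr = sum((yh - ybar) ** 2 for yh in yhat)            # model,  df = 1
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error,  df = n - 2
    return ssr, sse, ssto

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ssr, sse, ssto = anova_partition(x, y)
assert abs(ssr + sse - ssto) < 1e-9  # the partition holds exactly for OLS
```

For least-squares fits the cross-product term vanishes, which is why SSR + SSE reproduces SSTO to floating-point precision.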
Expected Mean Squares
– MSR and MSE are random variables
– E(MSE) = σ² and E(MSR) = σ² + β1²Σ(Xi − X̄)²
– When H0: β1 = 0 is true, E(MSR) = E(MSE)
F test
– F* = MSR/MSE ~ F(dfR, dfE) = F(1, n − 2) when H0 holds (see KNNL pp. 69–71)
– When H0: β1 = 0 is false, MSR tends to be larger than MSE, so we reject H0 when F* is large
– Reject H0 if F* > F(1 − α; dfR, dfE) = F(0.95; 1, n − 2)
– In practice we use P-values
F test
– When H0: β1 = 0 is false, F* has a noncentral F distribution; this can be used to calculate power
– Recall that t* = b1/s(b1) tests H0: β1 = 0
– It can be shown that (t*)² = F* (KNNL pg 71), so the two approaches give the same P-value
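The identity (t*)² = F* can be checked numerically (hypothetical data; s(b1) = √(MSE/Sxx) as in the standard simple-regression formulas):

```python
import math

# Verify numerically that F* = MSR/MSE equals (t*)^2 with t* = b1 / s(b1).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.9, 4.1, 5.2, 5.8]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]
msr = sum((yh - ybar) ** 2 for yh in yhat) / 1                   # df_R = 1
mse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat)) / (n - 2)   # df_E = n - 2
f_star = msr / mse
t_star = b1 / math.sqrt(mse / sxx)  # s(b1) = sqrt(MSE / Sxx)
assert abs(t_star ** 2 - f_star) < 1e-8
```

Algebraically both equal b1²Sxx/MSE, which is why the t test and the F test give the same P-value.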
ANOVA Table

Source    df       SS     MS     F          P
Model     1        SSM    MSM    MSM/MSE    0.##
Error     n − 2    SSE    MSE
Total     n − 1    SSTO

Note: "Model" is used here instead of "Regression", which is more similar to SAS.
Examples
– Tower of Pisa study (n = 13 cases)

proc reg data=a1;
  model lean=year;
run;

– Toluca lot size study (n = 25 cases)

proc reg data=toluca;
  model hours=lotsize;
run;
Pisa Output

Number of Observations Read    13
Number of Observations Used    13

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         15804             15804      904.12    <.0001
Error              11     192.28571          17.48052
Corrected Total    12         15997
Pisa Output

Root MSE          4.18097    R-Square    0.9880
Dependent Mean  693.69231    Adj R-Sq    0.9869
Coeff Var         0.60271

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1        -61.12088             25.12982        -2.43      0.0333
year          1          9.31868              0.30991        30.07      <.0001

(30.07)² = 904.2 ≈ F = 904.12 (rounding error)
Toluca Output

Number of Observations Read    25
Number of Observations Used    25

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        252378            252378      105.88    <.0001
Error              23         54825        2383.71562
Corrected Total    24        307203
Toluca Output

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1         62.36586             26.17743         2.38      0.0259
lotsize       1          3.57020              0.34697        10.29      <.0001

Root MSE         48.82331    R-Square    0.8215
Dependent Mean  312.28000    Adj R-Sq    0.8138
Coeff Var        15.63447

(10.29)² = 105.88
General Linear Test
– A different view of the same problem
– We want to compare two models
  – Yi = β0 + β1Xi + ei (full model)
  – Yi = β0 + ei (reduced model)
– Compare the two models using their error sums of squares; the better model will have the smaller mean square error
General Linear Test
– Let SSE(F) = SSE for the full model and SSE(R) = SSE for the reduced model
– F* = [(SSE(R) − SSE(F)) / (dfR − dfF)] / [SSE(F) / dfF]
– Compare with F(1 − α; dfR − dfF, dfF)
Simple Linear Regression
– dfR = n − 1, dfF = n − 2, dfR − dfF = 1
– F* = (SSTO − SSE)/MSE = SSR/MSE
– Same test as before; this approach is more general
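A sketch of the general linear test for this case (hypothetical data; variable names are illustrative). The reduced, intercept-only model has fitted values equal to Ȳ, so its SSE is SSTO:

```python
# General linear test: full model (intercept + slope) vs. reduced (intercept only).
x = [10, 20, 30, 40, 50, 60, 70]
y = [15.0, 24.0, 31.5, 44.0, 52.0, 58.5, 71.0]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # df_F = n - 2
sse_red = sum((yi - ybar) ** 2 for yi in y)                         # df_R = n - 1 (= SSTO)

df_f, df_r = n - 2, n - 1
f_star = ((sse_red - sse_full) / (df_r - df_f)) / (sse_full / df_f)
# For simple regression this reduces to (SSTO - SSE)/MSE = SSR/MSE, the ANOVA F*.
```

Because the reduced model is nested inside the full model, SSE(R) ≥ SSE(F) always, so F* is nonnegative.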
Pearson Correlation
– r is the usual correlation coefficient: r = Σ(Xi − X̄)(Yi − Ȳ) / √(Σ(Xi − X̄)² Σ(Yi − Ȳ)²)
– It is a number between −1 and +1 that measures the strength of the linear relationship between two variables
Pearson Correlation
– Notice that b1 = r(sY/sX), so b1 = 0 exactly when r = 0
– Hence the test of H0: β1 = 0 is similar to the test of H0: ρ = 0
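The relation between the slope and the correlation can be verified directly (hypothetical data; the sample standard deviations share the same denominator, so only √(Syy/Sxx) matters):

```python
import math

# Check that b1 = r * sqrt(Syy/Sxx), i.e., b1 = r * (sY/sX).
x = [1, 3, 5, 7, 9]
y = [2.0, 2.9, 4.2, 5.1, 5.9]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
b1 = sxy / sxx                  # least-squares slope
assert abs(b1 - r * math.sqrt(syy / sxx)) < 1e-9
```

Since b1 is r rescaled by a positive constant, b1 and r are always zero together and share the same sign.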
R² and r²
– R² is the ratio of explained to total variation: R² = SSR/SSTO = 1 − SSE/SSTO
R² and r²
– We use R² when the number of explanatory variables is arbitrary (simple and multiple regression)
– r² = R² only for simple regression
– R² is often multiplied by 100 and thereby expressed as a percent
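That r² = R² in simple regression can be confirmed numerically (hypothetical data):

```python
import math

# For simple regression, squared Pearson correlation = R^2 = 1 - SSE/SSTO.
x = [2, 4, 6, 8, 10]
y = [3.1, 4.8, 7.2, 8.9, 11.3]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)  # SSTO
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
r = sxy / math.sqrt(sxx * syy)           # Pearson correlation

b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_square = 1 - sse / syy                 # R^2 = SSR/SSTO
assert abs(r ** 2 - r_square) < 1e-9
```

With more than one explanatory variable the Pearson r between Y and any single X no longer determines R², so the equality holds only in the simple-regression case.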
R² and r²
– R² always increases when additional explanatory variables are added to the model
– Adjusted R² "penalizes" larger models and does not necessarily increase
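The penalty can be made concrete with the standard adjusted-R² formula, 1 − (1 − R²)(n − 1)/(n − p) with p model parameters, checked here against the Toluca output reported later (n = 25, R² = 0.8215, p = 2):

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p), p = number of parameters.
# Numbers taken from the Toluca SAS output (n = 25, R-Square = 0.8215).
n, p, r2 = 25, 2, 0.8215
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
assert abs(adj_r2 - 0.8138) < 5e-4  # SAS reports Adj R-Sq = 0.8138
```

Because the factor (n − 1)/(n − p) grows with p, adding a weak predictor can lower adjusted R² even though plain R² rises.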
Pisa Output

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         15804             15804      904.12    <.0001
Error              11     192.28571          17.48052
Corrected Total    12         15997

R-Square 0.9880 (SAS) = SSM/SSTO = 15804/15997 = 0.9879 (rounding)
Toluca Output

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        252378            252378      105.88    <.0001
Error              23         54825        2383.71562
Corrected Total    24        307203

R-Square 0.8215 (SAS) = SSM/SSTO = 252378/307203 = 0.8215
Background Reading
– You may find Sections 2.10 and 2.11 interesting
– 2.10 provides cautionary remarks; we will discuss these as they arise
– 2.11 discusses the bivariate Normal distribution
  – Similarities and differences
  – Confidence interval for r
– Program topic7.sas has the code to generate the ANOVA output
– Read Chapter 3