Published by Allan Barnett. Modified over 9 years ago.
Topic 7: Analysis of Variance
Outline
– Partitioning sums of squares
– Breakdown of degrees of freedom
– Expected mean squares (EMS)
– F test
– ANOVA table
– General linear test
– Pearson correlation / R²
Analysis of Variance
– Organize results arithmetically
– Total sum of squares in Y is SSTO = Σ(Yi − Ȳ)²
– Partition this into two sources
  – Model (explained by regression)
  – Error (unexplained / residual)
Total Sum of Squares
– MST is the usual estimate of the variance of Y if there are no explanatory variables
– SAS uses the term Corrected Total for this source
– The uncorrected total is ΣYi²; "corrected" means that we subtract off the mean before squaring
Model Sum of Squares
– dfR = 1 (due to the addition of the slope)
– MSR = SSR/dfR
– KNNL uses "regression" for what SAS calls "model", so SSR (KNNL) is the same as SS Model (SAS)
Error Sum of Squares
– dfE = n − 2 (two parameters estimated: the slope and the intercept)
– MSE = SSE/dfE
– MSE is an estimate of the variance of Y taking into account (or conditioning on) the explanatory variable(s)
– MSE = s²
ANOVA Table

Source        df       SS      MS
Regression    1        SSR     MSR = SSR/dfR
Error         n − 2    SSE     MSE = SSE/dfE
Total         n − 1    SSTO    MST = SSTO/dfT
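The partition behind this table can be sketched in a few lines of Python (an illustrative sketch only: the data and the function name `anova_partition` are hypothetical, not from the course):

```python
# Minimal sketch (hypothetical data) of the sums-of-squares partition
# SSTO = SSR + SSE for simple linear regression; stdlib only.

def anova_partition(x, y):
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    # Least-squares estimates of slope and intercept
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - ybar) ** 2 for yi in y)              # total,  df = n - 1
    ssr = sum((yh - ybar) ** 2 for yh in yhat)            # model,  df = 1
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error,  df = n - 2
    return ssr, sse, ssto

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ssr, sse, ssto = anova_partition(x, y)
assert abs(ssr + sse - ssto) < 1e-9  # the partition holds exactly for OLS
```

For least-squares fits the cross-product term vanishes, which is why SSR + SSE reproduces SSTO to floating-point precision.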
Expected Mean Squares
– MSR and MSE are random variables
– E(MSE) = σ² and E(MSR) = σ² + β1²Σ(Xi − X̄)²
– When H0: β1 = 0 is true, E(MSR) = E(MSE)
F test
– F* = MSR/MSE ~ F(dfR, dfE) = F(1, n − 2) when H0 holds (see KNNL pp. 69–71)
– When H0: β1 = 0 is false, MSR tends to be larger than MSE, so we reject H0 when F* is large
– Reject H0 if F* > F(1 − α; dfR, dfE) = F(0.95; 1, n − 2)
– In practice we use P-values
F test
– When H0: β1 = 0 is false, F* has a noncentral F distribution; this can be used to calculate power
– Recall that t* = b1/s(b1) tests H0: β1 = 0
– It can be shown that (t*)² = F* (KNNL pg 71), so the two approaches give the same P-value
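The identity (t*)² = F* can be checked numerically (hypothetical data; s(b1) = √(MSE/Sxx) as in the standard simple-regression formulas):

```python
import math

# Verify numerically that F* = MSR/MSE equals (t*)^2 with t* = b1 / s(b1).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.9, 4.1, 5.2, 5.8]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]
msr = sum((yh - ybar) ** 2 for yh in yhat) / 1                   # df_R = 1
mse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat)) / (n - 2)   # df_E = n - 2
f_star = msr / mse
t_star = b1 / math.sqrt(mse / sxx)  # s(b1) = sqrt(MSE / Sxx)
assert abs(t_star ** 2 - f_star) < 1e-8
```

Algebraically both equal b1²Sxx/MSE, which is why the t test and the F test give the same P-value.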
ANOVA Table

Source    df       SS     MS     F          P
Model     1        SSM    MSM    MSM/MSE    0.##
Error     n − 2    SSE    MSE
Total     n − 1    SSTO

Note: "Model" is used here instead of "Regression", which is more similar to SAS.
Examples
– Tower of Pisa study (n = 13 cases)

proc reg data=a1;
  model lean=year;
run;

– Toluca lot size study (n = 25 cases)

proc reg data=toluca;
  model hours=lotsize;
run;
Pisa Output

Number of Observations Read    13
Number of Observations Used    13

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         15804             15804      904.12    <.0001
Error              11     192.28571          17.48052
Corrected Total    12         15997
Pisa Output

Root MSE          4.18097    R-Square    0.9880
Dependent Mean  693.69231    Adj R-Sq    0.9869
Coeff Var         0.60271

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1        -61.12088             25.12982        -2.43      0.0333
year          1          9.31868              0.30991        30.07      <.0001

(30.07)² = 904.2 ≈ F = 904.12 (rounding error)
Toluca Output

Number of Observations Read    25
Number of Observations Used    25

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        252378            252378      105.88    <.0001
Error              23         54825        2383.71562
Corrected Total    24        307203
Toluca Output

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1         62.36586             26.17743         2.38      0.0259
lotsize       1          3.57020              0.34697        10.29      <.0001

Root MSE         48.82331    R-Square    0.8215
Dependent Mean  312.28000    Adj R-Sq    0.8138
Coeff Var        15.63447

(10.29)² = 105.88
General Linear Test
– A different view of the same problem
– We want to compare two models
  – Yi = β0 + β1Xi + ei (full model)
  – Yi = β0 + ei (reduced model)
– Compare the two models using their error sums of squares; the better model will have the smaller mean square error
General Linear Test
– Let SSE(F) = SSE for the full model and SSE(R) = SSE for the reduced model
– F* = [(SSE(R) − SSE(F)) / (dfR − dfF)] / [SSE(F) / dfF]
– Compare with F(1 − α; dfR − dfF, dfF)
Simple Linear Regression
– dfR = n − 1, dfF = n − 2, dfR − dfF = 1
– F* = (SSTO − SSE)/MSE = SSR/MSE
– Same test as before; this approach is more general
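A sketch of the general linear test for this case (hypothetical data; variable names are illustrative). The reduced, intercept-only model has fitted values equal to Ȳ, so its SSE is SSTO:

```python
# General linear test: full model (intercept + slope) vs. reduced (intercept only).
x = [10, 20, 30, 40, 50, 60, 70]
y = [15.0, 24.0, 31.5, 44.0, 52.0, 58.5, 71.0]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # df_F = n - 2
sse_red = sum((yi - ybar) ** 2 for yi in y)                         # df_R = n - 1 (= SSTO)

df_f, df_r = n - 2, n - 1
f_star = ((sse_red - sse_full) / (df_r - df_f)) / (sse_full / df_f)
# For simple regression this reduces to (SSTO - SSE)/MSE = SSR/MSE, the ANOVA F*.
```

Because the reduced model is nested inside the full model, SSE(R) ≥ SSE(F) always, so F* is nonnegative.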
Pearson Correlation
– r is the usual correlation coefficient: r = Σ(Xi − X̄)(Yi − Ȳ) / √(Σ(Xi − X̄)² Σ(Yi − Ȳ)²)
– It is a number between −1 and +1 that measures the strength of the linear relationship between two variables
Pearson Correlation
– Notice that b1 = r(sY/sX), so b1 = 0 exactly when r = 0
– Hence the test of H0: β1 = 0 is similar to the test of H0: ρ = 0
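The relation between the slope and the correlation can be verified directly (hypothetical data; the sample standard deviations share the same denominator, so only √(Syy/Sxx) matters):

```python
import math

# Check that b1 = r * sqrt(Syy/Sxx), i.e., b1 = r * (sY/sX).
x = [1, 3, 5, 7, 9]
y = [2.0, 2.9, 4.2, 5.1, 5.9]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
b1 = sxy / sxx                  # least-squares slope
assert abs(b1 - r * math.sqrt(syy / sxx)) < 1e-9
```

Since b1 is r rescaled by a positive constant, b1 and r are always zero together and share the same sign.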
R² and r²
– R² is the ratio of explained to total variation: R² = SSR/SSTO = 1 − SSE/SSTO
R² and r²
– We use R² when the number of explanatory variables is arbitrary (simple and multiple regression)
– r² = R² only for simple regression
– R² is often multiplied by 100 and thereby expressed as a percent
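That r² = R² in simple regression can be confirmed numerically (hypothetical data):

```python
import math

# For simple regression, squared Pearson correlation = R^2 = 1 - SSE/SSTO.
x = [2, 4, 6, 8, 10]
y = [3.1, 4.8, 7.2, 8.9, 11.3]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)  # SSTO
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
r = sxy / math.sqrt(sxx * syy)           # Pearson correlation

b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_square = 1 - sse / syy                 # R^2 = SSR/SSTO
assert abs(r ** 2 - r_square) < 1e-9
```

With more than one explanatory variable the Pearson r between Y and any single X no longer determines R², so the equality holds only in the simple-regression case.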
R² and r²
– R² always increases when additional explanatory variables are added to the model
– Adjusted R² "penalizes" larger models and does not necessarily increase
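The penalty can be made concrete with the standard adjusted-R² formula, 1 − (1 − R²)(n − 1)/(n − p) with p model parameters, checked here against the Toluca output reported later (n = 25, R² = 0.8215, p = 2):

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p), p = number of parameters.
# Numbers taken from the Toluca SAS output (n = 25, R-Square = 0.8215).
n, p, r2 = 25, 2, 0.8215
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
assert abs(adj_r2 - 0.8138) < 5e-4  # SAS reports Adj R-Sq = 0.8138
```

Because the factor (n − 1)/(n − p) grows with p, adding a weak predictor can lower adjusted R² even though plain R² rises.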
Pisa Output

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         15804             15804      904.12    <.0001
Error              11     192.28571          17.48052
Corrected Total    12         15997

R-Square 0.9880 (SAS) = SSM/SSTO = 15804/15997 = 0.9879 (rounding)
Toluca Output

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        252378            252378      105.88    <.0001
Error              23         54825        2383.71562
Corrected Total    24        307203

R-Square 0.8215 (SAS) = SSM/SSTO = 252378/307203 = 0.8215
Background Reading
– You may find Sections 2.10 and 2.11 interesting
– 2.10 provides cautionary remarks; we will discuss these as they arise
– 2.11 discusses the bivariate Normal distribution
  – Similarities and differences
  – Confidence interval for r
– Program topic7.sas has the code to generate the ANOVA output
– Read Chapter 3