Topic 22: Inference

Outline
–Review: one-way ANOVA
–Inference for means
–Differences in cell means
–Contrasts

Cell Means Model
Y_ij = μ_i + ε_ij, where μ_i is the theoretical mean (expected value) of all observations at level i, and the ε_ij are iid N(0, σ²)
Equivalently, Y_ij ~ N(μ_i, σ²), independent

Parameters
The parameters of the model are
–μ_1, μ_2, …, μ_r
–σ²
Estimate μ_i by the mean of the observations at level i:
–μ̂_i = Ȳ_i. = (Σ_j Y_ij)/n_i (level i sample mean)
–s_i² = Σ_j (Y_ij − Ȳ_i.)²/(n_i − 1) (level i sample variance)
–s² = Σ_i (n_i − 1)s_i² / (n_T − r) (pooled estimate of σ²)
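
These estimates are easy to reproduce numerically. A minimal sketch in Python (used here purely for illustration; the data below are hypothetical, not the cereal data):

```python
from statistics import mean, variance

def pooled_variance(groups):
    """Pool the level-i sample variances into one estimate of sigma^2:
    s^2 = sum_i (n_i - 1) * s_i^2 / (n_T - r)."""
    num = sum((len(g) - 1) * variance(g) for g in groups)
    df = sum(len(g) for g in groups) - len(groups)   # n_T - r
    return num / df

# Hypothetical two-level example
groups = [[10, 12, 14], [20, 24]]
means = [mean(g) for g in groups]    # level means: 12 and 22
s2 = pooled_variance(groups)         # (2*4 + 1*8) / (5 - 2) = 16/3
```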

F test
F = MSM/MSE
H_0: μ_1 = μ_2 = … = μ_r (= μ)
H_1: not all of the μ_i are equal
Under H_0, F ~ F(r − 1, n_T − r)
Reject H_0 when F is large; report the P-value
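
For the cereal-package example that follows, the F statistic can be reproduced from summary numbers alone (level means 14.6, 13.4, 19.5, 27.2 with n = 5, 5, 4, 5, and MSE = 10.54667 on 15 df, all taken from the SAS output later in these slides). A quick Python check:

```python
means = [14.6, 13.4, 19.5, 27.2]   # level sample means (cases sold)
ns    = [5, 5, 4, 5]               # stores per design
mse   = 10.54667                   # pooled variance s^2 from the SAS output
r, n_t = len(ns), sum(ns)

grand = sum(n * m for n, m in zip(ns, means)) / n_t
ssm   = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
msm   = ssm / (r - 1)
f     = msm / mse    # about 18.6 on (3, 15) df: strong evidence the means differ
```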

KNNL Example
KNNL p 676, p 685
Y is the number of cases of cereal sold
X is the design of the cereal package
–there are 4 levels for X because there are 4 different package designs
i = 1 to 4 levels; j = 1 to n_i stores with design i

Plot the means
proc means data=a1;
  var cases;
  by design;
  output out=a2 mean=avcases;
run;
symbol1 v=circle i=join;
proc gplot data=a2;
  plot avcases*design;
run;

The means (plot of average cases sold for each package design)

Confidence intervals
Ȳ_i. ~ N(μ_i, σ²/n_i)
CI for μ_i is Ȳ_i. ± t_c s/√n_i, where t_c = t(1 − α/2; n_T − r)
Degrees of freedom are larger than n_i − 1 because we pool the level variances into one single estimate
This is an advantage of ANOVA when the equal-variance assumption is appropriate

Using PROC MEANS
proc means data=a1 mean std stderr clm maxdec=2;
  class design;
  var cases;
run;
Note: this does not use the pooled estimate of the variance.

Output
des   N Obs   Mean    Std Dev   Std Error
1     5       14.60   2.30      1.03
2     5       13.40   3.65      1.63
3     4       19.50   2.65      1.32
4     5       27.20   3.96      1.77
We can use this information to calculate SSE:
SSE = Σ_i (n_i − 1)s_i² = 4(2.30)² + 4(3.65)² + 3(2.65)² + 4(3.96)² = 158.24
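
A quick arithmetic check of that SSE, using only the standard deviations and sample sizes printed above (Python used for illustration; the small discrepancy versus SAS's MSE of 10.54667 comes from the rounded standard deviations):

```python
sds = [2.30, 3.65, 2.65, 3.96]   # level standard deviations from PROC MEANS
ns  = [5, 5, 4, 5]

# SSE pools the within-level sums of squares: sum_i (n_i - 1) * s_i^2
sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))   # about 158.24
mse = sse / (sum(ns) - len(ns))                        # 158.24 / 15, about 10.55
```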

Confidence Intervals
des   Lower 95% CL for Mean   Upper 95% CL for Mean
1     11.74                   17.46
2      8.87                   17.93
3     15.29                   23.71
4     22.28                   32.12
There is no pooling of error in computing these CIs; each interval uses a different variance estimate.

CIs using PROC GLM
proc glm data=a1;
  class design;
  model cases=design;
  means design / t clm;
run;

Output
The GLM Procedure
t Confidence Intervals for cases
Alpha                      0.05
Error Degrees of Freedom   15
Error Mean Square          10.54667
Critical Value of t        2.13145

CI Output
des   N   Mean     95% Confidence Limits
4     5   27.200   24.104   30.296
3     4   19.500   16.039   22.961
1     5   14.600   11.504   17.696
2     5   13.400   10.304   16.496
These CIs are often narrower because of the extra degrees of freedom (common variance estimate).
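
These limits follow directly from Ȳ_i. ± t_c s/√n_i with the pooled MSE and the critical t reported in the GLM output. A sketch for design 4 (Python for illustration):

```python
import math

mse, t_c = 10.54667, 2.13145   # Error Mean Square and t critical value from PROC GLM
mean4, n4 = 27.2, 5            # design 4 sample mean and size

half = t_c * math.sqrt(mse / n4)
lo, hi = mean4 - half, mean4 + half   # about (24.104, 30.296), matching the output
```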

Multiplicity Problem
We have constructed 4 (in general, r) 95% confidence intervals
The overall confidence level (the probability that all intervals simultaneously cover their means) is less than 95%
Many different kinds of adjustments have been proposed
We have previously discussed the Bonferroni adjustment (i.e., use α/r)

BON option
proc glm data=a1;
  class design;
  model cases=design;
  means design / bon clm;
run;

Output
Bonferroni t Confidence Intervals for cases
Alpha                      0.05
Error Degrees of Freedom   15
Error Mean Square          10.54667
Critical Value of t        2.83663
Note the bigger t*

Bonferroni CIs
des   N   Mean     Simultaneous 95% Confidence Limits
4     5   27.200   23.080   31.320
3     4   19.500   14.894   24.106
1     5   14.600   10.480   18.720
2     5   13.400    9.280   17.520
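
The Bonferroni intervals use the same formula with the larger critical value. A check for design 3 (Python for illustration, using the numbers printed above):

```python
import math

mse, t_bon = 10.54667, 2.83663   # MSE and Bonferroni-adjusted t from the output
mean3, n3 = 19.5, 4              # design 3 sample mean and size

half = t_bon * math.sqrt(mse / n3)
lo, hi = mean3 - half, mean3 + half   # about (14.894, 24.106), matching the output
```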

Using LSMEANS
The LSMEANS statement provides proper SEs for each individual treatment estimate AND confidence intervals as well
lsmeans design / cl;
However, the statement does not provide the ability to adjust these single-mean CIs for multiplicity

Hypothesis tests on individual means
Not usually done
Use the PROC MEANS options T and PROBT for a test of the null hypothesis H_0: μ_i = 0
To test H_0: μ_i = c, where c is an arbitrary constant, first use a DATA step to subtract c from all observations, then run PROC MEANS with options T and PROBT
Can also use GLM's MEANS statement with the CLM option (more df)

Differences in means
Ȳ_i. − Ȳ_k. ~ N(μ_i − μ_k, σ²/n_i + σ²/n_k)
CI for μ_i − μ_k is (Ȳ_i. − Ȳ_k.) ± t_c s(Ȳ_i. − Ȳ_k.), where s(Ȳ_i. − Ȳ_k.) = s √(1/n_i + 1/n_k)

Determining t_c
We deal with the multiplicity problem by adjusting t_c
Many different choices are available:
–Change the α level (e.g., Bonferroni)
–Use a different distribution

LSD
Least Significant Difference (LSD)
Simply ignores the multiplicity issue
Uses t(n_T − r) to determine the critical value
Called T or LSD in SAS

Bonferroni
Use the error budget idea
There are r(r − 1)/2 pairwise comparisons among r means
So, replace α by α/(r(r − 1)/2) and use t(n_T − r) to determine the critical value
Called BON in SAS

Tukey
Based on the studentized range distribution (maximum minus minimum, divided by the standard deviation)
t_c = q_c / √2, where q_c is determined from the studentized range distribution
Details are in KNNL Section 17.5 (p 746)
Called TUKEY in SAS

Scheffe
Based on the F distribution
t_c = √((r − 1) F_c), where F_c = F(1 − α; r − 1, n_T − r)
Takes care of multiplicity for all linear combinations of means
Protects against data snooping
Called SCHEFFE in SAS
See KNNL Section 17.6 (page 753)
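
The four critical values for this example can be placed side by side using the numbers the SAS output slides report below (α = 0.05, 15 error df, 6 pairwise comparisons); Python here is only doing the arithmetic:

```python
import math

t_lsd     = 2.13145                  # t(15), no adjustment
t_bon     = 2.83663                  # t(15) at alpha / (2 * 6)
t_tukey   = 4.07597 / math.sqrt(2)   # q(4, 15) / sqrt(2), about 2.88
t_scheffe = math.sqrt(3 * 3.28738)   # sqrt((r - 1) * F(3, 15)), about 3.14

# LSD < Bonferroni < Tukey < Scheffe for these pairwise comparisons,
# matching the liberal-to-conservative ordering discussed on the next slide
```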

Multiple Comparisons
LSD is too liberal (gives too many Type I errors)
Scheffe is too conservative (very low power)
Bonferroni is OK for small r
Tukey (HSD) is recommended

Example
proc glm data=a1;
  class design;
  model cases=design;
  means design / lsd tukey bon scheffe;
run;

LSD
t Tests (LSD) for cases
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.
Alpha                      0.05
Error Degrees of Freedom   15
Error Mean Square          10.54667
Critical Value of t        2.13145

LSD Intervals
design        Difference
Comparison    Between Means    95% Confidence Limits
4 - 3           7.700            3.057    12.343   ***
4 - 1          12.600            8.222    16.978   ***
4 - 2          13.800            9.422    18.178   ***
3 - 4          -7.700          -12.343    -3.057   ***
3 - 1           4.900            0.257     9.543   ***
3 - 2           6.100            1.457    10.743   ***
1 - 4         -12.600          -16.978    -8.222   ***
1 - 3          -4.900           -9.543    -0.257   ***
1 - 2           1.200           -3.178     5.578
2 - 4         -13.800          -18.178    -9.422   ***
2 - 3          -6.100          -10.743    -1.457   ***
2 - 1          -1.200           -5.578     3.178

Tukey
Tukey's Studentized Range (HSD) Test for cases
NOTE: This test controls the Type I experimentwise error rate.
Alpha                                 0.05
Error Degrees of Freedom              15
Error Mean Square                     10.54667
Critical Value of Studentized Range   4.07597
t_c = 4.07597/√2 = 2.88

Tukey Intervals
design        Difference       Simultaneous 95%
Comparison    Between Means    Confidence Limits
4 - 3           7.700            1.421    13.979   ***
4 - 1          12.600            6.680    18.520   ***
4 - 2          13.800            7.880    19.720   ***
3 - 4          -7.700          -13.979    -1.421   ***
3 - 1           4.900           -1.379    11.179
3 - 2           6.100           -0.179    12.379
1 - 4         -12.600          -18.520    -6.680   ***
1 - 3          -4.900          -11.179     1.379
1 - 2           1.200           -4.720     7.120
2 - 4         -13.800          -19.720    -7.880   ***
2 - 3          -6.100          -12.379     0.179
2 - 1          -1.200           -7.120     4.720
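
Any of these Tukey limits can be recomputed from the summary numbers; a sketch for the 4 − 3 comparison (unequal n, so the SE uses 1/n_4 + 1/n_3):

```python
import math

mse, q_c = 10.54667, 4.07597   # MSE and studentized range critical value from SAS
n4, n3 = 5, 4
diff = 27.2 - 19.5             # design 4 mean minus design 3 mean

se = math.sqrt(mse * (1 / n4 + 1 / n3))
half = (q_c / math.sqrt(2)) * se
lo, hi = diff - half, diff + half   # about (1.421, 13.979): excludes 0
```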

Scheffe
Scheffe's Test for cases
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than Tukey's for all pairwise comparisons.
Alpha                      0.05
Error Degrees of Freedom   15
Error Mean Square          10.54667
Critical Value of F        3.28738
t_c = sqrt(3 × 3.28738) = 3.14

Scheffe Intervals
design        Difference       Simultaneous 95%
Comparison    Between Means    Confidence Limits
4 - 3           7.700            0.859    14.541   ***
4 - 1          12.600            6.150    19.050   ***
4 - 2          13.800            7.350    20.250   ***
3 - 4          -7.700          -14.541    -0.859   ***
3 - 1           4.900           -1.941    11.741
3 - 2           6.100           -0.741    12.941
1 - 4         -12.600          -19.050    -6.150   ***
1 - 3          -4.900          -11.741     1.941
1 - 2           1.200           -5.250     7.650
2 - 4         -13.800          -20.250    -7.350   ***
2 - 3          -6.100          -12.941     0.741
2 - 1          -1.200           -7.650     5.250

Output (LINES option)
       Mean     N   design
A      27.200   5   4

B      19.500   4   3
B
B      14.600   5   1
B
B      13.400   5   2
(Scheffe grouping; means with the same letter are not significantly different)

Using LSMEANS
The LSMEANS statement provides proper SEs AND confidence intervals based on a multiplicity adjustment
lsmeans design / cl adjust=bon;
However, the statement does not have a LINES option
Can use PROC GLIMMIX to do this; the approach is shown in the sas file

Linear Combinations of Means
These combinations should come from research questions, not from an examination of the data
L = Σ_i c_i μ_i, estimated by L̂ = Σ_i c_i Ȳ_i. ~ N(L, Var(L̂))
Var(L̂) = Σ_i c_i² Var(Ȳ_i.) = σ² Σ_i (c_i²/n_i), estimated by s² Σ_i (c_i²/n_i)

Contrasts
A special case of a linear combination
Requires Σ_i c_i = 0
Example 1: μ_1 − μ_2
Example 2: μ_1 − (μ_2 + μ_3)/2
Example 3: (μ_1 + μ_2)/2 − (μ_3 + μ_4)/2
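
Applying Example 3 to the cereal data shows where the ESTIMATE output on the next slides comes from. A Python sketch using the level means, sample sizes, and pooled MSE reported earlier:

```python
import math

means = [14.6, 13.4, 19.5, 27.2]
ns    = [5, 5, 4, 5]
mse   = 10.54667
c     = [0.5, 0.5, -0.5, -0.5]   # contrast: (mu1 + mu2)/2 - (mu3 + mu4)/2

l_hat = sum(ci * m for ci, m in zip(c, means))                   # -9.35
se    = math.sqrt(mse * sum(ci**2 / n for ci, n in zip(c, ns)))  # about 1.50
t     = l_hat / se                                               # about -6.25
```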

Contrast and Estimate
proc glm data=a1;
  class design;
  model cases=design;
  contrast '1&2 v 3&4' design .5 .5 -.5 -.5;
  estimate '1&2 v 3&4' design .5 .5 -.5 -.5;
run;

Output
Contrast     DF   SS    MS    F       P
1&2 v 3&4    1    411   411   39.01   <.0001

Parameter    Estimate   Std Err   t       P
1&2 v 3&4    -9.35      1.49      -6.25   <.0001

CONTRAST performs an F test; ESTIMATE performs a t test and gives the estimate.

Multiple contrasts
We can simultaneously test a collection of contrasts (1 df for each contrast)
Example 1, H_0: μ_2 = μ_3 = μ_4
The F statistic for this test will have an F(2, n_T − r) distribution
Example 2, H_0: μ_1 = (μ_2 + μ_3 + μ_4)/3
The F statistic for this test will have an F(1, n_T − r) distribution

Example
proc glm data=a1;
  class design;
  model cases=design;
  contrast '1 v 2&3&4' design 1 -.3333 -.3333 -.3333;
  estimate '1 v 2&3&4' design 3 -1 -1 -1 / divisor=3;
  contrast '2 v 3 v 4' design 0 1 -1 0,
                       design 0 0 1 -1;
run;

Output
Contrast     DF   F       P
1 v 2&3&4    1    10.29   0.0059
2 v 3 v 4    2    22.66   <.0001

Parameter    Estimate   Std Err   t       P
1 v 2&3&4    -5.43      1.69      -3.21   0.0059
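
As a check on the output above: for a 1-df contrast, the F statistic equals the square of the t statistic. A sketch for '1 v 2&3&4' using the same summary numbers (Python for illustration):

```python
import math

means = [14.6, 13.4, 19.5, 27.2]
ns    = [5, 5, 4, 5]
mse   = 10.54667
c     = [1, -1/3, -1/3, -1/3]    # contrast '1 v 2&3&4'

l_hat = sum(ci * m for ci, m in zip(c, means))                   # about -5.43
se    = math.sqrt(mse * sum(ci**2 / n for ci, n in zip(c, ns)))  # about 1.69
t     = l_hat / se                                               # about -3.21
f     = t ** 2                                                   # about 10.29, the 1-df F
```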

Last slide
We covered a large part of Chapter 17
We used the program topic22.sas to generate the output for today