Published by Eugenia Gibson. Modified over 9 years ago.
1
CHAPTER 6 Statistical Inference & Hypothesis Testing
6.1 - One Sample: Mean μ, Variance σ², Proportion π
6.2 - Two Samples: Means, Variances, Proportions (μ₁ vs. μ₂, σ₁² vs. σ₂², π₁ vs. π₂)
6.3 - Multiple Samples: Means, Variances, Proportions (μ₁, …, μₖ; σ₁², …, σₖ²; π₁, …, πₖ)
3
Example: Y = “$ Cost of a certain medical service”
Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”).
Hospital: Y₁ ~ N(μ₁, σ₁); Clinic: Y₂ ~ N(μ₂, σ₂)
Null Hypothesis H₀: μ₁ = μ₂, i.e., μ₁ - μ₂ = 0 (“No difference exists.”)
2-sided test at significance level α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Analysis via T-test (if equivariance holds):
Point estimates
“Group Means”: ȳ₁ = 630, ȳ₂ = 546. NOTE: ȳ₁ - ȳ₂ = 84 > 0
“Group Variances”: s₁² = SS₁/df₁, s₂² = SS₂/df₂ (in general, s² = SS/df)
Pooled Variance: The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.
4
Example (continued): Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Analysis via T-test (if equivariance holds):
Point estimates: “Group Means” ȳ₁ = 630, ȳ₂ = 546 (ȳ₁ - ȳ₂ = 84 > 0); “Group Variances” s² = SS/df
Pooled Variance: a weighted average of the group variances, using the degrees of freedom as the weights. With SS Err = SS₁ + SS₂ = 6480 and df Err = 6, the pooled variance is 6480/6 = 1080.
Standard Error = √(1080 (1/5 + 1/3)) = 24, so the test statistic is t = 84/24 = 3.5 on 6 degrees of freedom.
p-value:
> 2 * (1 - pt(3.5, 6))
[1] 0.01282634
Reject H₀ at α = .05: statistically significant, Hospital > Clinic.
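The hand calculation on this slide can be retraced step by step in R. The following is only a sketch: the variable names (ss1, s2_pooled, se, t_stat) are illustrative choices, not names used in the presentation.

```r
# Pooled two-sample t-test computed by hand from the slide's data.
y1 <- c(667, 653, 614, 612, 604)   # Sample 1 (hospital)
y2 <- c(593, 525, 520)             # Sample 2 (clinic)

n1 <- length(y1); n2 <- length(y2)
df1 <- n1 - 1;    df2 <- n2 - 1

# Sums of squares about each group mean (SS1 and SS2 on the slide)
ss1 <- sum((y1 - mean(y1))^2)
ss2 <- sum((y2 - mean(y2))^2)

# Pooled variance: df-weighted average of the group variances = SS Err / df Err
s2_pooled <- (ss1 + ss2) / (df1 + df2)

# Standard error of the difference in means, then the t statistic
se     <- sqrt(s2_pooled * (1 / n1 + 1 / n2))
t_stat <- (mean(y1) - mean(y2)) / se

# Two-sided p-value, in the same form as on the slide
p_value <- 2 * (1 - pt(abs(t_stat), df1 + df2))

print(c(SS_Err = ss1 + ss2, s2_pooled = s2_pooled, SE = se, t = t_stat, p = p_value))
```

The printed values should agree with SS Err = 6480, df Err = 6, and t = 3.5 quoted on the slide.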
5
R code:
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> t.test(y1, y2, var.equal = TRUE)

        Two Sample t-test

data:  y1 and y2
t = 3.5, df = 6, p-value = 0.01283
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  25.27412 142.72588
sample estimates:
mean of x mean of y
      630       546

Formal Conclusion: p-value < α = .05. Reject H₀ at this level.
Interpretation: The samples provide evidence that the difference between mean costs is (moderately) statistically significant at the 5% level, with the hospital being higher than the clinic (by an average of $84).
6
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
Null Hypothesis? H₀: μ₁ = μ₂ = … = μₖ
H_A: “At least one ‘treatment mean’ μᵢ is significantly different from the others.”
“Total Variability” = “Variability between groups” + “Variability within groups”
7
ANOVA F-test
Example: Y = “$ Cost of a certain medical service”
Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”).
Hospital: Y₁ ~ N(μ₁, σ₁); Clinic: Y₂ ~ N(μ₂, σ₂)
Null Hypothesis H₀: μ₁ = μ₂, i.e., μ₁ - μ₂ = 0 (“No difference exists.”)
2-sided test at significance level α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates (if equivariance holds):
“Group Means”: ȳ₁ = 630, ȳ₂ = 546. NOTE: ȳ₁ - ȳ₂ = 84 > 0
“Grand Mean”: ȳ = [5 (630) + 3 (546)] / (5 + 3) = 598.5
The grand mean is a weighted average of the group means, using the sample sizes as the weights.
8
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
H_A: “At least one ‘treatment mean’ μᵢ is significantly different from the others.”
“Total Variability” = “Variability between groups” + “Variability within groups”
9
ANOVA F-test
Example (continued): Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates (if equivariance holds): “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Grand Mean” ȳ = 598.5
How far is the “total” sample from the grand mean?
10
ANOVA F-test
Example (continued): Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates (if equivariance holds): “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Grand Mean” ȳ = 598.5
SS Tot = 19710 (sum of squared deviations of all 8 observations from the grand mean)
df Tot = (5 + 3) - 1 = 7
11
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
“Total Variability” = “Variability between groups” + “Variability within groups”
“Variability between groups”: How can we measure this? Imagine zero variability within groups…
12
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
“Total Variability” = “Variability between groups” + “Variability within groups”
“Variability between groups”: How can we measure this? Imagine zero variability within groups…
13
ANOVA F-test
Example (continued): Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates (if equivariance holds): “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Grand Mean” ȳ = 598.5
SS Tot = 19710; df Tot = (5 + 3) - 1 = 7
“The Clonemaster”: replace every observation by its group mean, giving {630, 630, 630, 630, 630} and {546, 546, 546}.
SS Trt = 13230 (sum of squared deviations of the cloned values from the grand mean)
df Trt = (2) - 1 = 1
14
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
“Total Variability” = “Variability between groups” + “Variability within groups”
15
ANOVA F-test
Example (continued): Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates (if equivariance holds): “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Grand Mean” ȳ = 598.5
SS Tot = 19710; df Tot = (5 + 3) - 1 = 7
SS Trt = 13230; df Trt = (2) - 1 = 1
How far is each sample from its own group mean?
16
ANOVA F-test
Example (continued): Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates (if equivariance holds): “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Grand Mean” ȳ = 598.5
SS Tot = 19710; df Tot = (5 + 3) - 1 = 7
SS Trt = 13230; df Trt = (2) - 1 = 1
SS Err = ? BUT…
17
RECALL… Analysis via T-test (if equivariance holds):
Example: Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates: “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Group Variances” s₁² = SS₁/df₁, s₂² = SS₂/df₂ (s² = SS/df)
Pooled Variance: The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.
18
RECALL… Analysis via T-test (if equivariance holds):
Example: Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates: “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Group Variances” s² = SS/df
Pooled Variance: The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.
SS Err = SS₁ + SS₂ = 6480; df Err = 6
19
ANOVA F-test
Example (continued): Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates (if equivariance holds): “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Grand Mean” ȳ = 598.5
SS Tot = 19710; df Tot = (5 + 3) - 1 = 7
SS Trt = 13230; df Trt = (2) - 1 = 1
SS Err = ?
20
ANOVA F-test
Example (continued): Y = “$ Cost of a certain medical service”; k = 2 groups; H₀: μ₁ = μ₂; 2-sided test at α = .05
Data: Sample 1 = {667, 653, 614, 612, 604}; n₁ = 5. Sample 2 = {593, 525, 520}; n₂ = 3.
Point estimates (if equivariance holds): “Group Means” ȳ₁ = 630, ȳ₂ = 546; “Grand Mean” ȳ = 598.5
SS Tot = 19710; df Tot = (5 + 3) - 1 = 7
SS Trt = 13230; df Trt = (2) - 1 = 1
SS Err = 6480; df Err = (5 + 3) - 2 = 6
SS Tot = SS Trt + SS Err
df Tot = df Trt + df Err
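The three sums of squares and the partition identity can be checked numerically. This is a minimal sketch in R; the names ss_tot, ss_trt, ss_err and clones are illustrative, with clones standing in for the slide's “Clonemaster” idea of replacing each observation by its group mean.

```r
# Sums of squares for the two-group cost example, computed from their definitions.
y1 <- c(667, 653, 614, 612, 604)   # Sample 1 (hospital)
y2 <- c(593, 525, 520)             # Sample 2 (clinic)
y  <- c(y1, y2)

grand_mean <- mean(y)              # equals the n-weighted average of the group means

# Total: every observation against the grand mean
ss_tot <- sum((y - grand_mean)^2)

# Treatment: each observation replaced by its group mean
# ("zero variability within groups"), then compared with the grand mean
clones <- c(rep(mean(y1), length(y1)), rep(mean(y2), length(y2)))
ss_trt <- sum((clones - grand_mean)^2)

# Error: every observation against its own group mean
ss_err <- sum((y - clones)^2)

# Degrees of freedom for k = 2 groups
df_tot <- length(y) - 1
df_trt <- 2 - 1
df_err <- length(y) - 2

print(c(SS_Tot = ss_tot, SS_Trt = ss_trt, SS_Err = ss_err))
print(c(df_Tot = df_tot, df_Trt = df_trt, df_Err = df_err))
```

Both identities, SS Tot = SS Trt + SS Err and df Tot = df Trt + df Err, can be confirmed directly from the printed values.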
21
ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230
Error        6    6480    1080
Total        7   19710   -
SS Tot = SS Trt + SS Err
df Tot = df Trt + df Err
Note: MS Err = 1080 is also the pooled variance.
[Diagram: Tot partitioned into Trt and Err]
22
ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230   12.25     ????
Error        6    6480    1080
Total        7   19710   -
SS Tot = SS Trt + SS Err
df Tot = df Trt + df Err
Note: MS Err = 1080 is also the pooled variance.
[Diagram: Tot partitioned into Trt and Err]
23
Test Statistic: F = MS Trt / MS Err
Sampling Distribution = ?
24
ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230   12.25
Error        6    6480    1080
Total        7   19710   -
SS Tot = SS Trt + SS Err
df Tot = df Trt + df Err
Note: MS Err = 1080 is also the pooled variance.
[Figure: sampling distribution of the F-ratio; the p-value is the area to the right of 12.25]
25
Critical value of F with 1 and 6 degrees of freedom at α = .05: 5.99
26
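ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230   12.25
Error        6    6480    1080
Total        7   19710   -
SS Tot = SS Trt + SS Err
df Tot = df Trt + df Err
Note: MS Err = 1080 is also the pooled variance.
[Figure: sampling distribution of the F-ratio with 5.99 and 12.25 marked; the p-value is the area to the right of 12.25]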
27
ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230   12.25     p < .05
Error        6    6480    1080
Total        7   19710   -
SS Tot = SS Trt + SS Err
df Tot = df Trt + df Err
Note: MS Err = 1080 is also the pooled variance.
[Diagram: Tot partitioned into Trt and Err]
28
ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230   12.25     .01282634
Error        6    6480    1080
Total        7   19710   -
p-value computed in R as 1 - pf(12.25, 1, 6)
SS Tot = SS Trt + SS Err
df Tot = df Trt + df Err
Note: MS Err = 1080 is also the pooled variance.
[Diagram: Tot partitioned into Trt and Err]
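The last two columns of the table follow from the mean squares. The sketch below reuses the slide's pf() call and adds qf() for the critical value; the variable names are illustrative only.

```r
# F-ratio, p-value and critical value from the ANOVA table entries.
ms_trt <- 13230 / 1                # MS Trt = SS Trt / df Trt
ms_err <- 6480 / 6                 # MS Err = SS Err / df Err

f_ratio <- ms_trt / ms_err         # test statistic

# Right-tail area of the F distribution with (1, 6) degrees of freedom,
# written the same way as on the slide
p_value <- 1 - pf(f_ratio, 1, 6)

# Cutoff that leaves area .05 in the right tail
f_crit <- qf(0.95, 1, 6)

print(c(F = f_ratio, p = p_value, critical = f_crit))
```

Because the F-ratio exceeds the critical value, the p-value falls below .05, which is the comparison the slides draw on the F distribution.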
29
ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230   12.25     .01282634
Error        6    6480    1080
Total        7   19710   -
p-value computed in R as 1 - pf(12.25, 1, 6)
SS Tot = SS Trt + SS Err
df Tot = df Trt + df Err
30
R code:
# ANOVA FOR UNBALANCED DESIGN
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> Data = data.frame(
+   Y = c(y1, y2),
+   X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
+ )
>
> var.test(Y ~ X, data = Data)  # EQUIVARIANCE?

        F test to compare two variances

data:  Y by X
F = 0.4741, num df = 4, denom df = 2, p-value = 0.4738
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.01208057 5.04920249
sample estimates:
ratio of variances
         0.4741431
31
R code:
# ANOVA FOR UNBALANCED DESIGN
> out = aov(Y ~ X, data = Data)
> anova(out)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value  Pr(>F)
X          1  13230   13230   12.25 0.01283 *
Residuals  6   6480    1080
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note: Comparing the T-test with the F-test, the p-value is the same using either method (.01283), since the sample is unchanged! The square of the t-score on df degrees of freedom (3.5) is equal to the F-score on 1 and df degrees of freedom (12.25). (Recall that the square of the Z-score is equal to the χ²-score on 1 degree of freedom.)
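The equivalence stated in the note can be checked directly. This sketch rebuilds the data frame so that it runs on its own; the object names tt, tab, t_sq and f_val are illustrative.

```r
# Check that the pooled t-test and the one-way ANOVA agree for k = 2 groups.
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

tt  <- t.test(y1, y2, var.equal = TRUE)   # pooled two-sample t-test
tab <- anova(aov(Y ~ X, data = Data))     # one-way ANOVA table

t_sq  <- unname(tt$statistic)^2           # squared t-score
f_val <- tab[["F value"]][1]              # F-score for the treatment row
p_t   <- tt$p.value
p_f   <- tab[["Pr(>F)"]][1]

print(c(t_squared = t_sq, F = f_val, p_t = p_t, p_F = p_f))

# Same relationship for the normal and chi-square(1) distributions,
# shown here on the 5% two-sided cutoffs
z_sq  <- qnorm(0.975)^2
chi_q <- qchisq(0.95, df = 1)
print(c(z_squared = z_sq, chisq_1 = chi_q))
```

Each pair of printed values should agree up to rounding.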
34
Suppose this ANOVA “overall F-test” indicates that a significant difference exists among the treatment means (one or more differ from the others), at α = .05. How can we find out which one(s)?
35
H₀: μ₁ = μ₂ = … = μₖ
Idea: Test all possible pairwise comparisons (μ₁ vs. μ₂, μ₁ vs. μ₃, …etc…), each via a two-sample t-test.
Example: Suppose there are k = 5 treatment groups. There are 10 such comparisons (5 choose 2).
PROBLEM???
36
H₀: μ₁ = μ₂ = … = μₖ
Idea: Test all possible pairwise comparisons (μ₁ vs. μ₂, μ₁ vs. μ₃, …etc…), each via a two-sample t-test.
Example: Suppose there are k = 5 treatment groups. There are 10 such comparisons (5 choose 2).
PROBLEM??? With each comparison made at α = .05: SPURIOUS SIGNIFICANCE!!!
Instead use α* = .05/10.
37
H₀: μ₁ = μ₂ = … = μₖ
Idea: Test all possible pairwise comparisons (μ₁ vs. μ₂, μ₁ vs. μ₃, …etc…), each via a two-sample t-test.
Example: Suppose there are k = 5 treatment groups. There are 10 such comparisons (5 choose 2).
PROBLEM???
Make each comparison at level α* = α / 10.
38
H₀: μ₁ = μ₂ = … = μₖ
Idea: Test all possible pairwise comparisons (μ₁ vs. μ₂, μ₁ vs. μ₃, …etc…), each via a two-sample t-test.
Example: Suppose there are k = 5 treatment groups. There are 10 such comparisons (5 choose 2).
Make each comparison at level α* = α / 10.
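A short calculation shows why the per-comparison level is divided by 10. The sketch below makes one simplifying assumption that is not in the slides: the 10 tests are treated as independent, which pairwise t-tests on shared groups are not, so the error rates are rough illustrations only.

```r
# Number of pairwise comparisons and the Bonferroni-adjusted level.
k     <- 5                     # treatment groups
alpha <- 0.05                  # overall significance level

m          <- choose(k, 2)     # number of pairwise comparisons
alpha_star <- alpha / m        # per-comparison level

# ASSUMPTION (illustration only): the m tests are independent.
# Chance of at least one false rejection when every null hypothesis is true:
fwer_unadjusted <- 1 - (1 - alpha)^m        # each test at alpha
fwer_bonferroni <- 1 - (1 - alpha_star)^m   # each test at alpha / m

print(c(comparisons = m, alpha_star = alpha_star,
        fwer_unadjusted = fwer_unadjusted, fwer_bonferroni = fwer_bonferroni))
```

Under that assumption the unadjusted procedure makes at least one false rejection far more often than 5% of the time, while the adjusted one stays at or below α.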
39
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
40
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
Equivariance can be tested via an F-test very similar to the “two variances” F-test in 6.2.2 (but this is very sensitive to the normality assumption), or via other tests. If equivariance is violated, the Welch Test for two means can be extended to k groups.
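In R, one possible way to follow this advice uses functions from the base stats package. The choice of bartlett.test() as one of the “other” equivariance tests and of oneway.test() as the Welch-type extension is an assumption of this sketch, not something the presentation specifies.

```r
# Equivariance check and a Welch-type alternative for the cost example.
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

# One test of equal variances across groups (also sensitive to non-normality)
eq_var <- bartlett.test(Y ~ X, data = Data)

# One-way test that does not assume equal variances (Welch-type)
welch_f <- oneway.test(Y ~ X, data = Data, var.equal = FALSE)

# For k = 2 this agrees with the Welch two-sample t-test (F = t^2)
welch_t <- t.test(y1, y2, var.equal = FALSE)

print(eq_var)
print(welch_f)
print(unname(welch_t$statistic)^2)
```

With only two groups the Welch-type F statistic is the square of the Welch t statistic, mirroring the relationship between the pooled t-test and the ordinary ANOVA F-test.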
41
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
Normality can be tested via the usual methods. If it is violated, use the nonparametric Kruskal-Wallis Test.
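For the cost example this might look as follows in R. Using shapiro.test() on the ANOVA residuals as the “usual method” is an assumption of this sketch, and with only 8 observations any normality test has little power.

```r
# Normality check on residuals and the rank-based Kruskal-Wallis alternative.
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

fit <- aov(Y ~ X, data = Data)

# Shapiro-Wilk test applied to the within-group residuals
norm_check <- shapiro.test(residuals(fit))

# Kruskal-Wallis test: compares groups using ranks, no normality assumption
kw <- kruskal.test(Y ~ X, data = Data)

print(norm_check)
print(kw)
```

The Kruskal-Wallis test uses only the ranks of the observations, so it does not depend on the normality assumption.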
42
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
Extensions of ANOVA exist for data in matched “blocks” designs, repeated measures, multiple factor levels within groups, etc.
43
Alternate method ~
Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μₖ …
H₀: μ₁ = μ₂ = … = μₖ
How to identify significant group(s)? Pairwise testing, with correction (e.g., Bonferroni) for spurious significance.
Example: k = 5 groups result in 10 such tests, so let each α* = α / 10.
45
“spurious significance”