
1 CHAPTER 6: Statistical Inference & Hypothesis Testing
6.1 - One Sample: Mean μ, Variance σ², Proportion π
6.2 - Two Samples: Means, Variances, Proportions (μ₁ vs. μ₂, σ₁² vs. σ₂², π₁ vs. π₂)
6.3 - Multiple Samples: Means, Variances, Proportions (μ₁, …, μ_k; σ₁², …, σ_k²; π₁, …, π_k)


3 Example: Y = “$ Cost of a certain medical service”
Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”). Hospital: Y₁ ~ N(μ₁, σ₁). Clinic: Y₂ ~ N(μ₂, σ₂).
Data: Sample 1 = {667, 653, 614, 612, 604}, n₁ = 5; Sample 2 = {593, 525, 520}, n₂ = 3.
Null Hypothesis H₀: μ₁ = μ₂, i.e., μ₁ – μ₂ = 0 (“No difference exists.”). 2-sided test at significance level α = .05.
Analysis via T-test (if equivariance holds). Point estimates:
“Group Means”: ȳ₁ = 630, ȳ₂ = 546. NOTE: ȳ₁ – ȳ₂ = 84 > 0.
“Group Variances”: s² = SS/df, so s₁² = SS₁/(n₁ – 1) = 3154/4 = 788.5 and s₂² = SS₂/(n₂ – 1) = 3326/2 = 1663.
Pooled Variance: pooled s² = (SS₁ + SS₂)/(df₁ + df₂). The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.

4 Example (continued): Analysis via T-test (if equivariance holds).
Pooled Variance: s² = SS/df with SS Err = SS₁ + SS₂ = 3154 + 3326 = 6480 and df Err = 4 + 2 = 6, so pooled s² = 6480/6 = 1080.
Standard Error: SE = sqrt(1080 (1/5 + 1/3)) = sqrt(576) = 24.
Test statistic: t = (ȳ₁ – ȳ₂)/SE = (630 – 546)/24 = 3.5, with df Err = 6.
p-value:
> 2 * (1 - pt(3.5, 6))
[1] 0.01282634
Reject H₀ at α = .05: statistically significant, Hospital > Clinic.
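The hand calculation on this slide can be retraced step by step in R. This is a minimal sketch; the variable names (ss1, s2_pooled, t_stat, and so on) are illustrative and do not come from the slides.

```r
# Data from the example (y1 = hospital, y2 = clinic)
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)

# Sum of squared deviations of each sample about its own mean
ss1 <- sum((y1 - mean(y1))^2)
ss2 <- sum((y2 - mean(y2))^2)

# Pooled variance: group variances weighted by their degrees of freedom
df_err    <- (length(y1) - 1) + (length(y2) - 1)
s2_pooled <- (ss1 + ss2) / df_err

# Standard error of the difference in means, then the t statistic
se     <- sqrt(s2_pooled * (1 / length(y1) + 1 / length(y2)))
t_stat <- (mean(y1) - mean(y2)) / se

# Two-sided p-value from the t distribution with df_err degrees of freedom
p_value <- 2 * (1 - pt(abs(t_stat), df_err))

print(c(s2_pooled = s2_pooled, se = se, t = t_stat, p = p_value))
```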

5 R code:
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> t.test(y1, y2, var.equal = TRUE)
Two Sample t-test
data: y1 and y2
t = 3.5, df = 6, p-value = 0.01283
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 25.27412 142.72588
sample estimates:
mean of x mean of y
      630       546
Formal Conclusion: p-value < α = .05. Reject H₀ at this level.
Interpretation: The samples provide evidence that the difference between mean costs is (moderately) statistically significant, at the 5% level, with the hospital being higher than the clinic (by an average of $84).

6 Alternate method. Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” with means μ₁, μ₂, …, μ_k:
Null Hypothesis? H₀: μ₁ = μ₂ = … = μ_k
H_A: “At least one ‘treatment mean’ μ_i is significantly different from the others.”
“Total Variability” = “Variability between groups” + “Variability within groups”

7 Example (continued): ANOVA F-test (if equivariance holds). Point estimates:
“Group Means”: ȳ₁ = 630 (n₁ = 5), ȳ₂ = 546 (n₂ = 3). NOTE: ȳ₁ – ȳ₂ = 84 > 0.
“Grand Mean”: ȳ = [5 (630) + 3 (546)] / (5 + 3) = 4788/8 = 598.5. The grand mean is a weighted average of the group means, using the sample sizes as the weights.


9 Example (continued): ANOVA F-test (if equivariance holds). How far is the “total” sample (all n₁ + n₂ = 8 observations) from the grand mean?

10 Example (continued): ANOVA F-test (if equivariance holds). SS Tot = sum of squared deviations of all 8 observations from the grand mean = 19710. df Tot = (5 + 3) – 1 = 7.
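As a check on the grand mean and SS Tot, the arithmetic can be sketched in R. The variable names here are illustrative assumptions, not part of the original slides.

```r
# Data from the example (y1 = hospital, y2 = clinic)
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
y  <- c(y1, y2)

# Grand mean: group means weighted by the sample sizes
grand_mean <- (length(y1) * mean(y1) + length(y2) * mean(y2)) / length(y)

# Total sum of squares: every observation versus the grand mean
ss_tot <- sum((y - grand_mean)^2)
df_tot <- length(y) - 1

print(c(grand_mean = grand_mean, ss_tot = ss_tot, df_tot = df_tot))
```

The weighted average of the group means is the same number as the plain mean of all eight observations, which is why `mean(y)` would give the same result.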

11 Alternate method. Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups”, with H₀: μ₁ = μ₂ = … = μ_k:
“Total Variability” = “Variability between groups” + “Variability within groups”
How can we measure the variability between groups? Imagine zero variability within groups…


13 Example (continued): ANOVA F-test (if equivariance holds). “The Clonemaster”: replace every observation by its own group mean, giving {630, 630, 630, 630, 630} and {546, 546, 546}, then measure how far these are from the grand mean.
SS Trt = 13230; df Trt = (2) – 1 = 1.
So far: SS Tot = 19710; df Tot = (5 + 3) – 1 = 7.
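The “Clonemaster” idea translates directly into a few lines of R. This is a sketch under the assumption that SS Trt is the squared distance of the cloned group means from the grand mean; the names `clones` and `ss_trt` are illustrative.

```r
# Data from the example (y1 = hospital, y2 = clinic)
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)

# "Clone" each group: every observation is replaced by its group mean,
# so there is zero variability within groups
clones <- c(rep(mean(y1), length(y1)), rep(mean(y2), length(y2)))

# Between-group sum of squares: clones versus the grand mean
grand_mean <- mean(c(y1, y2))
ss_trt <- sum((clones - grand_mean)^2)
df_trt <- 2 - 1   # number of groups minus one

print(c(ss_trt = ss_trt, df_trt = df_trt))
```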


15 Example (continued): ANOVA F-test (if equivariance holds). SS Tot = 19710, df Tot = (5 + 3) – 1 = 7. SS Trt = 13230, df Trt = (2) – 1 = 1. How far is each sample from its own group mean?

16 Example (continued): ANOVA F-test (if equivariance holds). SS Tot = 19710, df Tot = 7. SS Trt = 13230, df Trt = 1. SS Err = ? BUT…

17 RECALL… Analysis via T-test (if equivariance holds): s² = SS/df. The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights, so it combines SS₁ and SS₂: pooled s² = (SS₁ + SS₂)/(df₁ + df₂).

18 RECALL… From the T-test: SS Err = SS₁ + SS₂ = 6480 and df Err = df₁ + df₂ = 6, so pooled s² = SS Err/df Err = 6480/6 = 1080.


20 Example (continued): ANOVA F-test (if equivariance holds).
SS Tot = 19710; df Tot = (5 + 3) – 1 = 7.
SS Trt = 13230; df Trt = (2) – 1 = 1.
SS Err = 6480; df Err = (5 + 3) – 2 = 6.
SS Tot = SS Trt + SS Err: 19710 = 13230 + 6480.
df Tot = df Trt + df Err: 7 = 1 + 6.
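The decomposition SS Tot = SS Trt + SS Err can be verified numerically. A minimal sketch follows; the group labels "hospital" and "clinic" and the variable names are illustrative choices, not taken from the slides.

```r
# Data from the example, with a grouping factor
y <- c(667, 653, 614, 612, 604, 593, 525, 520)
g <- factor(rep(c("hospital", "clinic"), times = c(5, 3)))

# ave() replaces each observation by the mean of its own group
group_means <- ave(y, g)
grand_mean  <- mean(y)

ss_tot <- sum((y - grand_mean)^2)            # total variability
ss_trt <- sum((group_means - grand_mean)^2)  # between groups
ss_err <- sum((y - group_means)^2)           # within groups

print(c(ss_tot = ss_tot, ss_trt = ss_trt, ss_err = ss_err))
```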

21 ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment   1    13230   13230
Error       6    6480    1080
Total       7    19710   –
SS Tot = SS Trt + SS Err; df Tot = df Trt + df Err.
Note: MS Err = 1080 is also the pooled variance from the T-test.
[Figure: diagram labeled Tot, Trt, Err.]

22 ANOVA Table (continued): Treatment row now has F-ratio = 12.25; p-value = ????. Error: df 6, SS 6480, MS 1080. Total: df 7, SS 19710.

23 Test Statistic: F = MS Trt / MS Err = 13230 / 1080 = 12.25. Sampling Distribution = ?

24 ANOVA Table (continued): F-ratio = 12.25. [Figure: sampling distribution of F; the p-value is the area to the right of the observed value 12.25.]

25 [Figure: critical value of F with 1 and 6 degrees of freedom at α = .05, marked at 5.99.]

26 ANOVA Table (continued): F-ratio = 12.25. [Figure: the observed value 12.25 lies beyond the critical value 5.99, so the p-value is smaller than .05.]

27 ANOVA Table (continued): F-ratio = 12.25, so p < .05.

28 ANOVA Table (continued): F-ratio = 12.25, with p-value = 1 – pf(12.25, 1, 6) = .01282634.

29 ANOVA Table
Source      df   SS      MS      F-ratio   p-value
Treatment   1    13230   13230   12.25     .01282634
Error       6    6480    1080
Total       7    19710   –
p-value = 1 – pf(12.25, 1, 6). SS Tot = SS Trt + SS Err; df Tot = df Trt + df Err.
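The last two columns of the table, and the critical value 5.99, can be reproduced from the sums of squares alone. The sketch below assumes the F statistic is MS Trt / MS Err, consistent with the table; variable names are illustrative.

```r
# Sums of squares and degrees of freedom from the ANOVA table
ss_trt <- 13230; df_trt <- 1
ss_err <- 6480;  df_err <- 6

# Mean squares and the F-ratio
ms_trt  <- ss_trt / df_trt
ms_err  <- ss_err / df_err
f_ratio <- ms_trt / ms_err

# Critical value at alpha = .05 and the p-value (upper tail of the F distribution)
f_crit  <- qf(0.95, df_trt, df_err)
p_value <- 1 - pf(f_ratio, df_trt, df_err)

print(c(f_ratio = f_ratio, f_crit = f_crit, p_value = p_value))
```

Since the observed F-ratio exceeds the critical value, the p-value is below .05, which matches the decision reached with the T-test.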

30 R code:
# ANOVA FOR UNBALANCED DESIGN
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> Data = data.frame(
+   Y = c(y1, y2),
+   X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
+ )
>
> var.test(Y ~ X, data = Data) # EQUIVARIANCE?
F test to compare two variances
data: Y by X
F = 0.4741, num df = 4, denom df = 2, p-value = 0.4738
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.01208057 5.04920249
sample estimates:
ratio of variances
         0.4741431

31 R code:
# ANOVA FOR UNBALANCED DESIGN
> out = aov(Y ~ X, data = Data)
> anova(out)
Analysis of Variance Table
Response: Y
          Df Sum Sq Mean Sq F value  Pr(>F)
X          1  13230   13230   12.25 0.01283 *
Residuals  6   6480    1080
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Note: Comparing the T-test with the F-test, the p-value is the same using either method (.01283), since the sample is unchanged! The square of the t-score on df degrees of freedom (3.5) is equal to the F-score on 1 and df degrees of freedom (12.25). (Recall that the square of the Z-score is equal to the χ²-score on 1 degree of freedom.)
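The relationship between the squared t-score and the F-score (and between the squared Z-score and the chi-square score) can be checked with R's distribution functions. This is a small sketch; the variable names are illustrative.

```r
df <- 6

# Two-sided t critical value squared equals the F critical value on (1, df)
t_crit <- qt(0.975, df)
f_crit <- qf(0.95, 1, df)

# Two-sided Z critical value squared equals the chi-square critical value on 1 df
z_crit     <- qnorm(0.975)
chisq_crit <- qchisq(0.95, 1)

# The same holds for the p-values of the observed statistics t = 3.5 and F = 3.5^2
p_t <- 2 * pt(3.5, df, lower.tail = FALSE)
p_f <- pf(3.5^2, 1, df, lower.tail = FALSE)

print(c(t_crit_sq = t_crit^2, f_crit = f_crit,
        z_crit_sq = z_crit^2, chisq_crit = chisq_crit,
        p_t = p_t, p_f = p_f))
```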


34 Suppose this ANOVA “overall F-test” indicates that a significant difference exists among the treatment means (i.e., at least one differs from the others), at α = .05. How can we find out which one(s)?

35 Idea: Test all possible pairwise comparisons, each via a two-sample t-test (H₀: μ₁ = μ₂, H₀: μ₁ = μ₃, …etc…). Example: Suppose there are k = 5 treatment groups. There are “5 choose 2” = 10 such comparisons. PROBLEM???

36 Idea: Test all possible pairwise comparisons, each via a two-sample t-test (H₀: μ₁ = μ₂, H₀: μ₁ = μ₃, …etc…). Example: Suppose there are k = 5 treatment groups. There are “5 choose 2” = 10 such comparisons. PROBLEM??? With each of the 10 tests run at α = .05: SPURIOUS SIGNIFICANCE!!! Instead use α* = .05/10.


38 Idea: Test all possible pairwise comparisons, each via a two-sample t-test (H₀: μ₁ = μ₂, H₀: μ₁ = μ₃, …etc…). Example: Suppose there are k = 5 treatment groups. There are “5 choose 2” = 10 such comparisons. Make each comparison at level α* = α / 10.
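To see why uncorrected pairwise testing invites spurious significance, the chance of at least one false rejection can be sketched in R. The calculation below assumes, purely for illustration, that the 10 tests are independent; pairwise comparisons on shared groups are not, so the numbers are a rough guide rather than an exact error rate.

```r
k     <- 5      # number of treatment groups
alpha <- 0.05   # nominal level of each test

# Number of pairwise comparisons
m <- choose(k, 2)

# Chance of at least one false rejection among m tests,
# ASSUMING independence (an illustrative simplification)
fwer_uncorrected <- 1 - (1 - alpha)^m

# Bonferroni: run each comparison at alpha / m instead
alpha_star      <- alpha / m
fwer_bonferroni <- 1 - (1 - alpha_star)^m

print(c(m = m, fwer_uncorrected = fwer_uncorrected,
        alpha_star = alpha_star, fwer_bonferroni = fwer_bonferroni))
```

Under that assumption, the uncorrected procedure makes a false rejection far more often than 5% of the time, while the Bonferroni level keeps the overall rate at or below α.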


40 Alternate method. Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups”, with H₀: μ₁ = μ₂ = … = μ_k.
Equivariance can be tested via an F-test very similar to the “two variances” F-test in 6.2.2 (but this is very sensitive to the normality assumption), or via other tests. If equivariance is violated, the Welch Test for two means can be extended to k groups.
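One possible way to carry out these checks in base R is sketched below on the cost data from the example. The choice of Bartlett's test is an assumption made here (the slides only say “or others”), and it shares the sensitivity to non-normality; `oneway.test` with `var.equal = FALSE` is R's Welch-type one-way test. The data frame name `cost` and the group labels are illustrative.

```r
# Cost data from the example, with illustrative group labels
cost <- data.frame(
  Y = c(667, 653, 614, 612, 604, 593, 525, 520),
  X = factor(rep(c("hospital", "clinic"), times = c(5, 3)))
)

# Equivariance check for k >= 2 groups (Bartlett's test: an assumed choice)
bartlett_out <- bartlett.test(Y ~ X, data = cost)

# Welch-type one-way test, which does not assume equal variances
welch_out <- oneway.test(Y ~ X, data = cost, var.equal = FALSE)

# Classical one-way ANOVA F-test, for comparison
pooled_out <- oneway.test(Y ~ X, data = cost, var.equal = TRUE)

print(bartlett_out)
print(welch_out)
print(pooled_out)
```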

41 Alternate method. Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups”, with H₀: μ₁ = μ₂ = … = μ_k.
Normality can be tested via the usual methods. If it is violated, use the nonparametric Kruskal-Wallis Test.
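A hedged sketch of both steps in base R, again on the cost data: the Shapiro-Wilk test on the ANOVA residuals stands in for “the usual methods” (an assumption made here, and with only 8 observations it has little power), followed by the Kruskal-Wallis test. The data frame name `cost` and the group labels are illustrative.

```r
# Cost data from the example, with illustrative group labels
cost <- data.frame(
  Y = c(667, 653, 614, 612, 604, 593, 525, 520),
  X = factor(rep(c("hospital", "clinic"), times = c(5, 3)))
)

# Normality check on the residuals of the one-way model
# (Shapiro-Wilk is one common choice, assumed here)
fit         <- aov(Y ~ X, data = cost)
shapiro_out <- shapiro.test(residuals(fit))

# Nonparametric alternative based on ranks
kw_out <- kruskal.test(Y ~ X, data = cost)

print(shapiro_out)
print(kw_out)
```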

42 Alternate method. Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups”, with H₀: μ₁ = μ₂ = … = μ_k.
There are extensions of ANOVA for data in matched “blocks” designs, repeated measures, multiple factor levels within groups, etc.

43 Alternate method. Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups”, with H₀: μ₁ = μ₂ = … = μ_k.
How to identify significant group(s)? Pairwise testing, with a correction (e.g., Bonferroni) for spurious significance. Example: k = 5 groups result in 10 such tests, so let each α* = α / 10.
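Because the cost example has only k = 2 groups, the sketch below illustrates Bonferroni-corrected pairwise testing on `PlantGrowth`, a three-group dataset that ships with R. That dataset is an assumption introduced here for illustration and is not part of the slides. R adjusts the p-values upward rather than lowering α, which leads to the same decisions.

```r
# Built-in dataset with k = 3 groups (ctrl, trt1, trt2); NOT from the slides
data(PlantGrowth)

# All pairwise two-sample t-tests with a pooled SD, Bonferroni-adjusted
pw_bonf <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "bonferroni")

# The same comparisons without any correction, for reference
pw_raw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                          p.adjust.method = "none")

# Number of comparisons for k groups
m <- choose(nlevels(PlantGrowth$group), 2)

print(pw_raw$p.value)    # unadjusted p-values
print(pw_bonf$p.value)   # each multiplied by m and capped at 1
```

Comparing an adjusted p-value with α is equivalent to comparing the raw p-value with α* = α / m.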


45 “spurious significance”

