Published by Harold Harvey. Modified over 8 years ago.
1
Two-Sample-Means-1 Two Independent Populations (Chapter 6)
Develop a confidence interval for the difference in means between two independent normal populations.
Develop a statistical test for the difference in means between two independent normal populations, assuming equal population variances.
Develop a statistical test for the difference in means between two independent normal populations, assuming unequal population variances.
Develop a nonparametric statistical test for the difference in means between two independent populations, assuming equal population variances.
2
Two-Sample-Means-2 Situation
Need to determine if the concentration of a contaminant at an old industrial site is greater than background levels from areas surrounding the site.
Take soil samples at random locations within the site and at random locations in areas outside the site.
Assume values in areas outside the site are unaffected by the site activities that lead to contamination.
3
Two-Sample-Means-3 Hypotheses of Interest
1. Background level of contaminant is greater than the no-detect level (-5.0) [on a natural log scale].
2. Site level of contaminant is greater than the no-detect level (-5.0).
3. Site level is different from Background level.
[Figure: positions of $\mu_1$ and $\mu_2$ relative to the no-detect level -5.0. Is the situation this or this?]
4
Two-Sample-Means-4 One-Sample Hypothesis Tests
Background level of contaminant is greater than the no-detect level (-5.0).
Site level of contaminant is greater than the no-detect level (-5.0).
One-sided t-tests, with i = 1 => Background areas (B) and i = 2 => Contamination site (S):
$H_0: \mu_i \le -5.0$
$H_A: \mu_i > -5.0$
T.S.: $t = \frac{\bar{y}_i - (-5.0)}{s_i / \sqrt{n_i}}$
R.R.: Reject $H_0$ if $t > t_{0.05,\,n_i - 1}$.
Critical Values
df    t_{0.05,df}
6     1.943
7     1.895
8     1.860
9     1.833
5
Two-Sample-Means-5 DATA and One-Sided T-tests
Y: -2.96 -1.09 -3.13 -2.12 -2.59 -4.31 -1.20
$H_0: \mu_B \le \mu_0 = -5.0$
$H_A: \mu_B > \mu_0 = -5.0$
T.S.: $t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}} = 5.90$
R.R.: Pr(Type I error) = $\alpha$ = 0.05. Reject $H_0$ if $t > t_{\alpha,\,n-1}$, where $t_{0.05,6} = 1.943$.
Conclusion: Since 5.90 > 1.943 we reject $H_0$ and conclude that the true average background level is above -5.0.
The same test could be performed for the contaminated site data.
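The one-sided test on the background data can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `one_sample_t` is an illustrative choice, and the critical value is copied from the slide rather than computed. With unrounded arithmetic the statistic comes out near 5.87, slightly below the slide's 5.90, presumably because the slide rounded intermediate values.

```python
import math
import statistics

# Background readings (natural-log scale) as listed on the slide.
background = [-2.96, -1.09, -3.13, -2.12, -2.59, -4.31, -1.20]

def one_sample_t(values, mu0):
    """t statistic for H0: mu <= mu0 versus HA: mu > mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1
    return (mean - mu0) / (sd / math.sqrt(n))

T_CRIT = 1.943  # t_{0.05,6}, taken from the slide's critical-value table

t_stat = one_sample_t(background, mu0=-5.0)
reject_h0 = t_stat > T_CRIT
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The same function applies unchanged to the site data if the second hypothesis is of interest.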
6
Two-Sample-Means-6 Comparison Hypothesis
$H_0$: Site level is not different from Background level ($\mu_B = \mu_S$).
$H_A$: Site level is different from Background level ($\mu_B \ne \mu_S$).
This requires comparing sample means from two “independent” samples, one from each population.
Obvious test statistic:
T.S.: $t = \frac{\bar{y}_B - \bar{y}_S}{\hat{\sigma}_{\bar{y}_B - \bar{y}_S}}$, where the denominator is the standard error of the difference of the two means.
7
Two-Sample-Means-7 Standard Error of the Difference of Two Means
If the true variances of the two populations are known, we use the property of independent random variables that says: Var(X - Y) = Var(X) + Var(Y).
From the sampling distribution of the T.S.:
$\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
8
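Two-Sample-Means-8
True standard error of the difference of two means: $\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Assume the two populations have the same, or nearly the same, true variance, $\sigma_p^2$.
Estimate of the common (pooled) standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
Estimate of the standard error of the mean differences: $\hat{\sigma}_{\bar{y}_1 - \bar{y}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$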
9
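Two-Sample-Means-9 Confidence Interval for difference
Assume a confidence level of $(1 - \alpha)100\%$:
$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\,n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$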
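As an illustration, the pooled-variance interval can be computed for the background and site readings that appear in the worked example of this deck. This is a sketch under stated assumptions: `pooled_ci` is a hypothetical helper name, and the critical value 2.179 (t with 12 degrees of freedom, upper 2.5%) is copied from the deck's worked test rather than computed.

```python
import math
import statistics

# Readings from the deck's worked example (natural-log scale).
background = [-2.96, -1.09, -3.13, -2.12, -2.59, -4.31, -1.20]
site = [-3.81, -5.83, -5.70, -4.11, -3.83, -5.01, -5.49]

def pooled_ci(x, y, t_crit):
    """Confidence interval for mean(x) - mean(y) using the pooled variance."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = statistics.mean(x) - statistics.mean(y)
    return diff - t_crit * se, diff + t_crit * se

# t_{0.025,12} = 2.179 is taken from the deck, not computed here.
lower, upper = pooled_ci(background, site, t_crit=2.179)
print(f"95% CI for mu_B - mu_S: ({lower:.2f}, {upper:.2f})")
```

An interval that excludes zero agrees with rejecting the two-sided null hypothesis at the same level.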
10
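Two-Sample-Means-10 Statistical test for difference of means
Pooled Variances T-test
Test Statistic: $t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, with $n_1 + n_2 - 2$ degrees of freedom.
Pr(Type I Error) = $\alpha$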
11
Two-Sample-Means-11
Background   Site
Y_B          Y_S
-2.96        -3.81
-1.09        -5.83
-3.13        -5.70
-2.12        -4.11
-2.59        -3.83
-4.31        -5.01
-1.20        -5.49
$H_0: \mu_B - \mu_S = 0$
$H_A: \mu_B - \mu_S \ne 0$
$H_A$: Average site level significantly different from background.
(continued)
12
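Two-Sample-Means-12
T.S.: $t = \frac{\bar{y}_B - \bar{y}_S}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 4.304$
R.R.: Pr(Type I Error) = $\alpha$ = 0.05. Reject $H_0$ if $|t| > t_{\alpha/2,\,n_1 + n_2 - 2} = t_{0.025,12} = 2.179$.
Conclusion: Since 4.304 > 2.179 we reject $H_0$ and conclude that site concentration levels are significantly different from background.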
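The pooled-variance test can be reproduced from the two data columns. The sketch below assumes the standard pooled t statistic; `pooled_t` is an illustrative name and the critical value is copied from the slide. Unrounded arithmetic gives a statistic near 4.29, a little below the reported 4.304, which is consistent with rounding of intermediate values on the slide.

```python
import math
import statistics

background = [-2.96, -1.09, -3.13, -2.12, -2.59, -4.31, -1.20]
site = [-3.81, -5.83, -5.70, -4.11, -3.83, -5.01, -5.49]

def pooled_t(x, y):
    """Pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, df

t_stat, df = pooled_t(background, site)
reject_h0 = abs(t_stat) > 2.179  # t_{0.025,12}, copied from the slide
print(f"t = {t_stat:.3f} on {df} df, reject H0: {reject_h0}")
```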
13
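Two-Sample-Means-13 Separate Variances CI and T-test
What if the two populations do not have the same variance?
New estimate of standard error of difference: $\hat{\sigma}_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
The test and CI are no longer exact; they use Satterthwaite’s approximate df value (df’).
T.S.: $t' = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
$df' = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Round df’ down to the nearest integer.
C.I.: $(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\,df'} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$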
14
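Two-Sample-Means-14 Redo Test
For the site contamination example, assume $\sigma_1 \ne \sigma_2$ and redo the test.
T.S.: $t' = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = 4.304$
R.R.: Reject $H_0$ if $|t'| > t_{\alpha/2,\,df'} = t_{0.025,11} = 2.201$.
Conclusion: Since 4.304 > 2.201 we reject $H_0$ and conclude that site concentration levels are significantly different from background.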
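The separate-variances version can be sketched the same way. The code below assumes the usual Welch-Satterthwaite form of the approximate degrees of freedom, rounded down as the deck instructs; `welch_t` is an illustrative name and 2.201 is the slide's critical value. Because the two samples have equal size, the statistic coincides with the pooled one; only the degrees of freedom change.

```python
import math
import statistics

background = [-2.96, -1.09, -3.13, -2.12, -2.59, -4.31, -1.20]
site = [-3.81, -5.83, -5.70, -4.11, -3.83, -5.01, -5.49]

def welch_t(x, y):
    """Separate-variances t statistic and Satterthwaite df, rounded down."""
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1  # s1^2 / n1
    v2 = statistics.variance(y) / n2  # s2^2 / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, math.floor(df)

t_prime, df_prime = welch_t(background, site)
reject_h0 = abs(t_prime) > 2.201  # critical value copied from the slide
print(f"t' = {t_prime:.3f}, df' = {df_prime}, reject H0: {reject_h0}")
```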
15
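Two-Sample-Means-15 Sample Size Determination (equal variances)
$\alpha$ = Pr(Type I Error), $\beta$ = Pr(Type II Error)
One-sided Test: $n = \frac{2\sigma^2 (z_\alpha + z_\beta)^2}{\Delta^2}$
Two-sided Test: $n = \frac{2\sigma^2 (z_{\alpha/2} + z_\beta)^2}{\Delta^2}$
Here $n = n_1 = n_2$ and $\Delta$ is the difference in means to be detected.
$(1 - \alpha)100\%$ CI for $\mu_1 - \mu_2$: $n = \frac{2 z_{\alpha/2}^2 \sigma^2}{E^2}$, where $E$ is the desired half-width of the interval.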
16
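Two-Sample-Means-16 Example of Sample Size Determination
In our sample, a $\Delta$ of 2.34 was observed. What if we had wanted to be sure that a $\Delta$ of, say, 1 unit would be declared significant with:
$\alpha$ = 0.05 = Pr(Type I Error)
$\beta$ = 0.10 = Pr(Type II Error)
Assume a common population variance of $\sigma^2 = 1$ and $n = n_1 = n_2$.
Two-sided test: $n = \frac{2\sigma^2 (z_{0.025} + z_{0.10})^2}{\Delta^2}$
One-sided test: $n = \frac{2\sigma^2 (z_{0.05} + z_{0.10})^2}{\Delta^2}$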
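The sample-size question can be worked numerically. This is a sketch assuming the usual normal-approximation formula, n = 2σ²(z + z_β)²/Δ² per group, with z = z_{α/2} for a two-sided test and z = z_α for a one-sided test; `n_per_group` is a hypothetical helper name. Normal quantiles come from the standard library's `NormalDist`.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma2, alpha, beta, two_sided=True):
    """Unrounded per-group sample size for detecting a mean difference delta."""
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_beta = std_normal.inv_cdf(1 - beta)
    return 2 * sigma2 * (z_alpha + z_beta) ** 2 / delta ** 2

n_two = n_per_group(delta=1.0, sigma2=1.0, alpha=0.05, beta=0.10, two_sided=True)
n_one = n_per_group(delta=1.0, sigma2=1.0, alpha=0.05, beta=0.10, two_sided=False)

# Round up, since a fractional observation cannot be collected.
print(f"two-sided: {n_two:.2f} -> {math.ceil(n_two)} per group")
print(f"one-sided: {n_one:.2f} -> {math.ceil(n_one)} per group")
```

The two-sided value falls only just above 21, so working from z values rounded to two decimals can give 21 per group instead of 22.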
17
Two-Sample-Means-17 Summary
These two-sample inferences require the assumption of independent, normal population distributions.
If we have reason to believe the two population variances are equal, then we should use the pooled variances method. This results in more powerful inferences.
We need not worry about the normality assumption when both sample sizes are large; all results are still approximately correct (CLT).
What to do when independence does not hold? Advanced! Partial solution next lecture (paired samples).
What to do when we have small samples and we don’t believe the data are normal? Next slide…
18
Two-Sample-Means-18 The Wilcoxon Rank Sum Test
Also known as the Mann-Whitney U Test.
A class of nonparametric tests. These:
Do not require data to have normal distributions.
Seek to make inferences about the median, a more appropriate representation of the center of the population for highly skewed and/or very heavy-tailed distributions.
Sign Test: the nonparametric equivalent of the one sample t-test.
In the Wilcoxon Rank Sum Test:
Population variances are assumed to be equal.
Measurements (observations) are assumed to be independent and from continuous distributions.
Interest is in whether the centers of the two population distributions are the same or not.
19
Two-Sample-Means-19 Idea behind the Wilcoxon Test
$H_0$: Populations are identical.
If $H_0$ is true, put the data from the two samples together and sort them from lowest to highest, i.e. rank them (lowest obs gets rank = 1, 2nd gets rank = 2, etc.; tied obs get the average of their ranks). The ranks of the observations from the two samples should then be fairly well intermingled. Thus, for equal sample sizes, the sum of the ranks from population 1 observations should be approximately equal to the sum of the ranks of population 2 observations.
$H_A$: Population 1 is shifted to the right of population 2.
If $H_A$ is true and we put the data from the two samples together and sort them from lowest to highest, the sum of the ranks of population 1 observations should be greater than the sum of the ranks of observations from population 2.
20
Two-Sample-Means-20 Wilcoxon Test Statistic
$H_0$: Populations are identical.
$H_A$:
1. Population 1 is shifted to the right of population 2.
2. Population 1 is shifted to the left of population 2.
3. Populations 1 and 2 have different location parameters.
Situation #1: If $n_1 \le 10$ and $n_2 \le 10$, use T as the test statistic and Table 6 in Ott & Longnecker for critical values.
T.S.: Let T denote the sum of the ranks of population 1 observations.
R.R.:
1. Reject $H_0$ if $T > T_U$.
2. Reject $H_0$ if $T < T_L$.
3. Reject $H_0$ if $T > T_U$ or $T < T_L$.
21
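Two-Sample-Means-21 Situation #2
If $n_1 > 10$ and $n_2 > 10$, we use a normal approximation to the distribution of the sum of the ranks.
Let T denote the sum of the ranks of population 1 observations.
T.S.: $z = \frac{T - \mu_T}{\sigma_T}$, where $\mu_T = \frac{n_1 (n_1 + n_2 + 1)}{2}$,
$\sigma_T^2 = \frac{n_1 n_2}{12} \left[ (n_1 + n_2 + 1) - \frac{\sum_j t_j (t_j^2 - 1)}{(n_1 + n_2)(n_1 + n_2 - 1)} \right]$,
and $t_j$ denotes the number of tied ranks in the $j$th group.
R.R. ($\alpha$ = Pr(Type I Error)):
1. Reject $H_0$ if $z > z_\alpha$.
2. Reject $H_0$ if $z < -z_\alpha$.
3. Reject $H_0$ if $|z| > z_{\alpha/2}$.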
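The large-sample approximation can be written as a small function. This is a sketch assuming the standard tie-corrected variance of the rank sum; the function name `rank_sum_z` and the inputs in the example (two samples of 12 with T = 180) are hypothetical and chosen only to show the arithmetic.

```python
import math

def rank_sum_z(t_sum, n1, n2, tie_sizes=()):
    """Normal-approximation z for the rank sum T of sample 1.

    tie_sizes lists the size t_j of each group of tied observations.
    """
    n = n1 + n2
    mu_t = n1 * (n + 1) / 2
    tie_term = sum(t * (t * t - 1) for t in tie_sizes) / (n * (n - 1))
    var_t = n1 * n2 / 12 * ((n + 1) - tie_term)
    return (t_sum - mu_t) / math.sqrt(var_t)

# Hypothetical inputs: n1 = n2 = 12 and a rank sum of 180 for sample 1.
z_no_ties = rank_sum_z(180, 12, 12)
z_ties = rank_sum_z(180, 12, 12, tie_sizes=(2, 2))  # two tied pairs
print(f"z without ties = {z_no_ties:.3f}, with ties = {z_ties:.3f}")
```

Ties shrink the variance slightly, so the tie-corrected z is a little larger in magnitude than the uncorrected one.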
22
Two-Sample-Means-22 Situation #1: $n_1 \le 10$ and $n_2 \le 10$
Group  Value  Rank  Pop 1 Ranks
1      -1.09  14    14
1      -1.20  13    13
1      -2.12  12    12
1      -2.59  11    11
1      -2.96  10    10
1      -3.13  9     9
2      -3.81  8
2      -3.83  7
2      -4.10  6
1      -4.31  5     5
2      -5.01  4
2      -5.41  2.5
2      -5.83  1
SUM                 74 = T
Two-sided alternative hypothesis.
R.R.: Reject $H_0$ if $T > T_U$ or $T < T_L$.
From Table 6 with $n_1 = n_2 = 7$: $T_L = 39$, $T_U = 66$.
Conclusion: Since 74 > 66, reject $H_0$.
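The rank sum can be recomputed directly. The sketch below uses the background and site readings from the deck's earlier data table (the rank table on this slide shows slightly different site values) and assigns average ranks to ties; `midranks` is an illustrative helper name and the limits 39 and 66 are copied from the slide.

```python
def midranks(values):
    """1-based ranks, lowest value first, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

background = [-2.96, -1.09, -3.13, -2.12, -2.59, -4.31, -1.20]  # population 1
site = [-3.81, -5.83, -5.70, -4.11, -3.83, -5.01, -5.49]        # population 2

ranks = midranks(background + site)
T = sum(ranks[:len(background)])  # rank sum of population 1

T_L, T_U = 39, 66                 # Table 6 limits quoted on the slide
reject_h0 = T > T_U or T < T_L
print(f"T = {T}, reject H0: {reject_h0}")
```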
23
Two-Sample-Means-23 Wilcoxon Critical Values Table 6
24
Two-Sample-Means-24 Paired Data Situation (§6.4-6.5)
In this set of slides we will:
Develop confidence intervals and tests for the difference between two means that have been measured on the same or highly related experimental units. The underlying population of differences is assumed to be normally distributed.
Discuss a nonparametric alternative that does not rely on normality (§6.5).
25
Two-Sample-Means-25 Situation
Two analysts, supposedly of identical abilities, each measure the parts per million of a certain type of chemical impurity in drinking water. It is claimed that analyst 1 tends to give higher readings than analyst 2. To test this theory, each of six water samples is divided and then analyzed by both analysts separately. The data are as follows:
[Table: readings from analyst 1 and analyst 2 for each of the six water samples.]
Data are paired, hence observations are not independent. Observations in the same row are more likely to be close to each other than are observations between rows.
26
Two-Sample-Means-26 Paired Data Considerations
Because of the dependence within rows we don’t use the difference of two means but instead use the mean of the individual differences, written as $\bar{d}$.
Let $y_{ij}$ represent the $i$th observation for the $j$th sample, $i = 1, \dots, n$, $j = 1, 2$.
Compute $d_i = y_{i1} - y_{i2}$, the difference in responses for the $i$th observation, and then proceed as in the one sample t-test situation, with
$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$, the sample mean of the differences, and
$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$, the sample standard deviation of the differences.
27
Two-Sample-Means-27 Importance of Independence/Dependence on Variance of Difference
If the two populations were independent, the variance of the difference would be computed using the probability rule:
Var(X - Y) = Var(X) + Var(Y)
But here the two populations are dependent and the above rule does not hold. Hence:
Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
In the above example the two calculations give very different results, i.e. we don’t use a pooled variance estimator. We use the variance of the differences.
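The effect of dependence on the variance of a difference can be seen numerically. The readings below are hypothetical (the deck's analyst table is not available in this transcript) and are chosen only so that the two columns move together; the sketch checks the general identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y) on sample quantities.

```python
import statistics

# Hypothetical paired readings, NOT the deck's data.
analyst1 = [10.2, 12.5, 9.8, 14.1, 11.0, 13.3]
analyst2 = [9.9, 12.0, 9.9, 13.2, 10.4, 12.9]

def sample_cov(x, y):
    """Sample covariance with divisor n - 1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

diffs = [a - b for a, b in zip(analyst1, analyst2)]
var_diff = statistics.variance(diffs)                 # variance of the differences
var_sum = statistics.variance(analyst1) + statistics.variance(analyst2)
var_identity = var_sum - 2 * sample_cov(analyst1, analyst2)

print(f"Var(d) = {var_diff:.3f}")
print(f"Var(X) + Var(Y) = {var_sum:.3f}  (independence rule, too large here)")
print(f"Var(X) + Var(Y) - 2 Cov(X, Y) = {var_identity:.3f}")
```

With positively correlated pairs the independence rule overstates the variance, which is why the paired analysis works from the differences.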
28
Two-Sample-Means-28 The paired t-test
$H_0: \mu_d = D_0$
$H_a$:
1. $\mu_d > D_0$
2. $\mu_d < D_0$
3. $\mu_d \ne D_0$
Test Statistic: $t = \frac{\bar{d} - D_0}{s_d / \sqrt{n}}$
Rejection Region:
1. Reject $H_0$ if $t > t_{\alpha,\,n-1}$.
2. Reject $H_0$ if $t < -t_{\alpha,\,n-1}$.
3. Reject $H_0$ if $|t| > t_{\alpha/2,\,n-1}$.
$(1 - \alpha)100\%$ confidence interval: $\bar{d} \pm t_{\alpha/2,\,n-1} \frac{s_d}{\sqrt{n}}$
For the Example, t = 1.853 and $t_{0.05,5}$ = 2.015, hence we do not reject $H_0: \mu_d = 0$ in testing situation 1, and conclude that there are no significant differences.
95% CI: 1.4 ± 1.94 or (-0.54, 3.34)
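The paired test can be sketched from the differences alone. The six differences below are an assumption: five values are listed with the deck's signed-rank illustration, and the sixth (a second 1.5) is inferred from the tied rank 3.5 and n = 6, so treat the data as a reconstruction. The critical values 2.015 and 2.571 (t with 5 degrees of freedom, upper 5% and 2.5%) are standard table values, not computed here. Unrounded arithmetic gives t near 1.83 and a mean near 1.38, where the slide reports 1.853 and 1.4.

```python
import math
import statistics

# Reconstructed differences d_i = analyst 1 - analyst 2 (sixth value assumed).
diffs = [-1.3, -0.1, 1.5, 1.5, 3.3, 3.4]

def paired_t(d, d0=0.0):
    """Mean difference, its standard error, and the paired t statistic."""
    n = len(d)
    d_bar = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(n)
    return d_bar, se, (d_bar - d0) / se

d_bar, se, t_stat = paired_t(diffs)
reject_h0 = t_stat > 2.015          # one-sided test, t_{0.05,5}
half_width = 2.571 * se             # 95% CI uses t_{0.025,5}
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"95% CI: {d_bar:.2f} +/- {half_width:.2f}")
```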
29
Two-Sample-Means-29 Wilcoxon Signed-Rank Test for Paired Data (§6.5)
A nonparametric alternative to the paired t-test when the population distribution of differences is not normal. (Requires symmetry about the population median.)
Test Construction:
1. Compute the differences in paired observations.
2. Discard any obs that are equal to $D_0$.
3. Let n be the resulting number of obs.
4. Rank the remaining obs from smallest to largest by their absolute distance from $D_0$.
5. Ranks of obs less than $D_0$ are prefaced with a negative sign.
6. Ranks of obs greater than $D_0$ are prefaced with a positive sign.
T+ = sum of the positive ranks (T+ = 0 if no positive ranks).
T- = sum of the negative ranks (T- = 0 if no negative ranks).
T = smaller of T+ and |T-|, i.e. ignoring their signs.
Small values of |T-| support differences > $D_0$, small values of T+ support differences < $D_0$, and small values of T support differences $\ne D_0$.
30
Two-Sample-Means-30 Test Statistics
$H_0$: The distribution of differences is symmetrical around $D_0$ (usually 0).
$H_A$:
1. The differences tend to be larger than $D_0$.
2. The differences tend to be smaller than $D_0$.
3. Either 1 or 2 above is true (two-sided alternative).
Situation #1 ($n \le 50$)
T.S.:
1. T = |T-|
2. T = T+
3. T = smaller(|T-|, T+)
R.R. (critical value from Table 7):
1. Reject $H_0$ if $T \le T_{\alpha,n}$.
2. Reject $H_0$ if $T \le T_{\alpha,n}$.
3. Reject $H_0$ if $T \le T_{\alpha/2,n}$.
Situation #2 ($n > 50$)
T.S.: $z = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$
R.R. (critical value from Table 1):
1. Reject $H_0$ if $z < -z_\alpha$.
2. Reject $H_0$ if $z < -z_\alpha$.
3. Reject $H_0$ if $z < -z_{\alpha/2}$.
31
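Two-Sample-Means-31 Problem
Differences  Signed Ranks
-0.1         -1
-1.3         -2
1.5          3.5
3.3          5
3.4          6
For $H_A$: Differences tend to be larger than 0, the test statistic is |T-| = 3 and the critical value is $T_{0.05,6}$ = 2. Since 3 is not less than or equal to 2, we do not reject $H_0$ and conclude that there is no significant difference.
Illustration of calculations for large n (not the case here):
$z = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} = \frac{3 - 10.5}{\sqrt{22.75}} \approx -1.57$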
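The signed-rank sums can be recomputed with a short sketch. As an assumption, the sixth difference is taken to be a second 1.5 (inferred from the tied rank 3.5 and n = 6); `signed_rank_sums` is an illustrative name. The last lines apply the large-sample z formula purely to show the arithmetic, since n = 6 is far too small for it.

```python
import math

# Differences from the slide; the second 1.5 is an assumed value.
diffs = [-1.3, -0.1, 1.5, 1.5, 3.3, 3.4]

def signed_rank_sums(d, d0=0.0):
    """Return (T+, |T-|, n) after dropping differences equal to d0."""
    shifted = [x - d0 for x in d if x != d0]
    order = sorted(range(len(shifted)), key=lambda i: abs(shifted[i]))
    ranks = [0.0] * len(shifted)
    i = 0
    while i < len(order):
        j = i
        # Group equal absolute values so they share the average rank.
        while j + 1 < len(order) and abs(shifted[order[j + 1]]) == abs(shifted[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    t_plus = sum(r for r, x in zip(ranks, shifted) if x > 0)
    t_minus = sum(r for r, x in zip(ranks, shifted) if x < 0)
    return t_plus, t_minus, len(shifted)

t_plus, t_minus, n = signed_rank_sums(diffs)
reject_h0 = t_minus <= 2  # H_A: differences > 0; T_{0.05,6} = 2 from the slide

# Large-sample formula, shown only for illustration at this n.
z = (t_minus - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
print(f"T+ = {t_plus}, |T-| = {t_minus}, reject H0: {reject_h0}, z = {z:.2f}")
```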
32
Two-Sample-Means-32 Concluding Comments
The paired measurements (samples) case can be easily extended to more than two measurements.
When repeated measurements are taken on an individual, we are in the same situation as with two paired samples; that is, the repeated measurements on an individual are expected to be more correlated than measurements among individuals.
Solutions to the multiple repeated measurements case cannot follow the simple solution for two dependent samples. The final solution involves specifying not only what we expect to happen in the means of the sampling “times” but also the structure of the correlations between sampling times.
It is advantageous to design a paired data experiment rather than an independent samples one. This helps to eliminate the confounding effect (masking of treatment differences) that sources of variation other than the treatments have on the experimental units.