Two-Sample-Means-1 Two Independent Populations (Chapter 6)
Develop a confidence interval for the difference in means between two independent normal populations.
Develop a statistical test for the difference in means between two independent normal populations, assuming equal population variances.
Develop a statistical test for the difference in means between two independent normal populations, assuming unequal population variances.
Develop a nonparametric statistical test for the difference in means between two independent populations, assuming equal population variances.

Two-Sample-Means-2 Situation
We need to determine whether the concentration of a contaminant at an old industrial site is greater than the background level in the areas surrounding the site.
Take soil samples at random locations within the site and at random locations in areas outside the site.
Assume values in areas outside the site are unaffected by the site activities that lead to contamination.

Two-Sample-Means-3 Hypotheses of Interest
1. Background level of contaminant is greater than the no-detect level (-5.0) [on a natural log scale].
2. Site level of contaminant is greater than the no-detect level (-5.0).
3. Site level is different from Background level.
[Figure: sketches of the two population distributions with means μ1 and μ2 — "Is the situation this or this?"]

Two-Sample-Means-4 One-Sample Hypothesis Tests
Background level of contaminant is greater than the no-detect level (-5.0).
Site level of contaminant is greater than the no-detect level (-5.0).
Let i = 1 denote the background areas (B) and i = 2 the contamination site (S).
H0: μi ≤ -5.0    HA: μi > -5.0
One-sided t-tests:
T.S.: t = (ȳi − (−5.0)) / (si / √ni)
R.R.: Reject H0 if t > t0.05, ni−1 (critical values from the t table).

Two-Sample-Means-5 DATA and One-Sided T-tests
[Data table: log-scale contaminant readings for the n = 7 background samples, with sample mean ȳB.]
H0: μB ≤ μ0 = -5.0    HA: μB > μ0 = -5.0
T.S.: t = (ȳB − μ0) / (sB / √n) = 5.90
R.R.: Pr(Type I error) = α = 0.05; reject H0 if t > tα,n−1 = t0.05,6 = 1.943.
Conclusion: Since 5.90 > 1.943, we reject H0 and conclude that the true average background level is above -5.0.
The same test could be performed for the contaminated-site data.
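As a quick illustration of the one-sided, one-sample test above, here is a minimal Python sketch with SciPy. The background readings, variable names, and the use of the alternative argument of scipy.stats.ttest_1samp (available in recent SciPy releases) are my own assumptions for illustration; the slide's actual data are not reproduced here.

```python
# Minimal sketch of the one-sided one-sample t-test: H0: mu <= -5.0 vs HA: mu > -5.0.
# The background readings below are hypothetical stand-ins for the slide's data.
import numpy as np
from scipy import stats

background = np.array([-3.2, -2.8, -4.1, -3.5, -2.9, -3.8, -3.0])  # hypothetical log-scale readings
mu0 = -5.0

t_stat, p_value = stats.ttest_1samp(background, popmean=mu0, alternative="greater")
t_crit = stats.t.ppf(0.95, df=len(background) - 1)  # t_{0.05, n-1}

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p-value = {p_value:.4f}")
# Reject H0 when t exceeds the critical value (equivalently, when p-value < 0.05).
```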

Two-Sample-Means-6 Comparison Hypothesis
H0: Site level is not different from Background level (μB = μS).
HA: Site level is different from Background level (μB ≠ μS).
This requires comparing sample means from two “independent” samples, one from each population.
Obvious test statistic: T.S. t = (ȳB − ȳS) / SE(ȳB − ȳS), where SE(ȳB − ȳS) is the standard error of the difference of the two means.

Two-Sample-Means-7 Standard Error of the Difference of Two Means
If the true variances of the two populations are known, we use the property of independent random variables that says: Var(X − Y) = Var(X) + Var(Y).
From the sampling distributions of the two sample means, Var(ȳ1 − ȳ2) = σ1²/n1 + σ2²/n2, so the true standard error of the difference is √(σ1²/n1 + σ2²/n2).

Two-Sample-Means-8 Assume the two populations have the same, or nearly the same, true variance σp².
True standard error of the difference of two means: σp √(1/n1 + 1/n2).
Estimate of the common (pooled) standard deviation: sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ].
Estimate of the standard error of the difference of the two means: sp √(1/n1 + 1/n2).

Two-Sample-Means-9 Confidence Interval for the Difference
Assume a confidence level of (1 − α)100%.
(ȳ1 − ȳ2) ± tα/2, n1+n2−2 · sp √(1/n1 + 1/n2)
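A minimal sketch of this pooled-variance confidence interval, computed directly from the formula above. The two samples and all variable names are hypothetical illustrations, not the data behind the slides.

```python
# Pooled-variance (1 - alpha)100% CI for mu1 - mu2, following the formula on this slide.
import numpy as np
from scipy import stats

y1 = np.array([-3.2, -2.8, -4.1, -3.5, -2.9, -3.8, -3.0])  # hypothetical sample 1
y2 = np.array([-1.1, -0.7, -1.9, -0.4, -1.5, -0.9, -1.2])  # hypothetical sample 2
alpha = 0.05

n1, n2 = len(y1), len(y2)
sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)  # pooled variance
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))                                          # SE of the difference
t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)

diff = y1.mean() - y2.mean()
print(f"{(1 - alpha) * 100:.0f}% CI: ({diff - t_crit * se:.3f}, {diff + t_crit * se:.3f})")
```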

Two-Sample-Means-10 Pooled Variances T-test: Statistical Test for the Difference of Means
H0: μ1 − μ2 = D0 (against a one- or two-sided alternative)
Test Statistic: t = (ȳ1 − ȳ2 − D0) / (sp √(1/n1 + 1/n2)), with df = n1 + n2 − 2.
Pr(Type I Error) = α.
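The same pooled-variance test is available in SciPy as ttest_ind with equal_var=True; the short sketch below uses the same hypothetical samples as above, not the slide's data.

```python
# Pooled-variances two-sample t-test (D0 = 0) via SciPy.
import numpy as np
from scipy import stats

y1 = np.array([-3.2, -2.8, -4.1, -3.5, -2.9, -3.8, -3.0])  # hypothetical sample 1
y2 = np.array([-1.1, -0.7, -1.9, -0.4, -1.5, -0.9, -1.2])  # hypothetical sample 2

t_stat, p_value = stats.ttest_ind(y1, y2, equal_var=True)  # pooled-variance test
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")  # reject H0 at level alpha if p-value < alpha
```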

Two-Sample-Means-11
[Data table: contaminant concentrations (CONT) for the Background and Site samples, with sample means ȳB and ȳS.]
H0: μB − μS = 0
HA: μB − μS ≠ 0 (average site level significantly different from background).

Two-Sample-Means-12 Pr(Type I Error) = α = 0.05
T.S.: t = (ȳB − ȳS) / (sp √(1/nB + 1/nS))
R.R.: Reject H0 if |t| > tα/2, nB+nS−2.
Conclusion: Since the computed |t| exceeds the critical value, we reject H0 and conclude that site concentration levels are significantly different from background.

Two-Sample-Means-13 Separate Variances CI and T-test
What if the two populations do not have the same variance?
New estimate of the standard error of the difference: √(s1²/n1 + s2²/n2).
The test and CI are no longer exact; they use Satterthwaite’s approximate degrees of freedom (df′):
df′ = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
Round df′ down to the nearest integer.
T.S.: t = (ȳ1 − ȳ2 − D0) / √(s1²/n1 + s2²/n2)
C.I.: (ȳ1 − ȳ2) ± tα/2, df′ √(s1²/n1 + s2²/n2)
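A sketch of the separate-variances procedure and Satterthwaite's df′ on the same hypothetical data used earlier; in practice scipy.stats.ttest_ind(..., equal_var=False) performs the same (Welch) test, though SciPy uses the unrounded df′.

```python
# Separate-variances t-test with Satterthwaite's approximate degrees of freedom df'.
import numpy as np
from scipy import stats

y1 = np.array([-3.2, -2.8, -4.1, -3.5, -2.9, -3.8, -3.0])  # hypothetical sample 1
y2 = np.array([-1.1, -0.7, -1.9, -0.4, -1.5, -0.9, -1.2])  # hypothetical sample 2

n1, n2 = len(y1), len(y2)
v1, v2 = y1.var(ddof=1) / n1, y2.var(ddof=1) / n2
se = np.sqrt(v1 + v2)                                   # separate-variances standard error
df_prime = int((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))  # rounded down, as on the slide

t_stat = (y1.mean() - y2.mean()) / se
t_crit = stats.t.ppf(0.975, df=df_prime)
print(f"t = {t_stat:.3f}, df' = {df_prime}, two-sided critical value = {t_crit:.3f}")
# Cross-check: stats.ttest_ind(y1, y2, equal_var=False) gives the Welch test directly.
```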

Two-Sample-Means-14 Redo Test
For the site contamination example, assume σ1 ≠ σ2 and redo the test.
T.S.: t = (ȳB − ȳS) / √(sB²/nB + sS²/nS)
R.R.: Reject H0 if |t| > tα/2, df′.
Conclusion: Since the computed |t| exceeds the critical value, we again reject H0 and conclude that site concentration levels are significantly different from background.

Two-Sample-Means-15 Sample Size Determination (equal variances)
α = Pr(Type I Error), β = Pr(Type II Error), Δ = difference to be detected, σ² = common population variance.
One-sided Test: n = n1 = n2 = 2σ²(zα + zβ)² / Δ²
Two-sided Test: n = n1 = n2 = 2σ²(zα/2 + zβ)² / Δ²
(1 − α)100% CI for μ1 − μ2 of half-width E: n = n1 = n2 = 2σ² z²α/2 / E²

Two-Sample-Means-16 Example of Sample Size Determination
In our sample, a Δ of 2.34 was observed. What if we had wanted to be sure that a Δ of, say, 1 unit would be declared significant with:
α = Pr(Type I Error) = 0.05
β = Pr(Type II Error) = 0.10
Assume a common population variance of σ² = 1.
Two-sided test: n = n1 = n2 = 2σ²(zα/2 + zβ)² / Δ²
One-sided test: n = n1 = n2 = 2σ²(zα + zβ)² / Δ²
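This question can be evaluated numerically; the sketch below plugs α = 0.05, β = 0.10, σ² = 1, and Δ = 1 into the formulas from the previous slide. The helper function name is my own, not from the text. Under these assumptions the formulas give roughly 22 observations per group for the two-sided test and 18 for the one-sided test (rounding up).

```python
# Per-group sample size n = 2*sigma^2*(z_alpha + z_beta)^2 / Delta^2, rounded up.
import math
from scipy import stats

def two_sample_n(sigma2, delta, alpha, beta, two_sided=True):
    """Per-group n needed to detect a difference of size delta with power 1 - beta."""
    z_a = stats.norm.ppf(1 - alpha / 2) if two_sided else stats.norm.ppf(1 - alpha)
    z_b = stats.norm.ppf(1 - beta)
    return math.ceil(2 * sigma2 * (z_a + z_b) ** 2 / delta**2)

print("two-sided:", two_sample_n(sigma2=1.0, delta=1.0, alpha=0.05, beta=0.10, two_sided=True))
print("one-sided:", two_sample_n(sigma2=1.0, delta=1.0, alpha=0.05, beta=0.10, two_sided=False))
```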

Two-Sample-Means-17 Summary
These two-sample inferences require the assumption of independent, normal population distributions.
If we have reason to believe the two population variances are equal, then we should use the pooled variances method; this results in more powerful inferences.
We need not worry about the normality assumption when both sample sizes are large; all results are then approximately correct (by the Central Limit Theorem).
What to do when independence does not hold? Advanced! A partial solution comes next lecture (paired samples).
What to do when we have small samples and we don’t believe the data are normal? Next slide…

Two-Sample-Means-18 The Wilcoxon Rank Sum Test
A class of nonparametric tests. These:
Do not require the data to have normal distributions.
Seek to make inferences about the median, a more appropriate representation of the center of the population for highly skewed and/or very heavy-tailed distributions.
Sign Test: the nonparametric equivalent of the one-sample t-test.
The Wilcoxon Rank Sum Test (also known as the Mann-Whitney U Test) is the two-independent-samples case. In the Wilcoxon Rank Sum Test:
Population variances are assumed to be equal.
Measurements (observations) are assumed to be independent and from continuous distributions.
Interest is in whether the centers of the two population distributions are the same or not.

Two-Sample-Means-19 Idea behind the Wilcoxon Test
H0: Populations are identical.
If H0 is true: when we put the data from the two samples together and sort them from lowest to highest, i.e. we rank them (the lowest observation gets rank 1, the second lowest gets rank 2, etc., and tied observations get the average of their ranks), the ranks of the observations from the two samples should be fairly well intermingled. Thus, the sum of the ranks of population 1 observations should be approximately equal to the sum of the ranks of population 2 observations.
HA: Population 1 is shifted to the right of population 2.
If HA is true: when we put the data from the two samples together and sort them lowest to highest, the sum of the ranks of population 1 observations should be greater than the sum of the ranks of observations from population 2.

Two-Sample-Means-20 Wilcoxon Test Statistic (Situation #1)
H0: Populations are identical.
HA: 1. Population 1 is shifted to the right of population 2.
2. Population 1 is shifted to the left of population 2.
3. Populations 1 and 2 have different location parameters.
T.S.: Let T denote the sum of the ranks of the population 1 observations.
If n1 ≤ 10 and n2 ≤ 10, use T as the test statistic and Table 6 in Ott & Longnecker for the critical values TL and TU.
R.R.: 1. Reject H0 if T > TU.
2. Reject H0 if T < TL.
3. Reject H0 if T > TU or T < TL.

Two-Sample-Means-21 If n 1 >10, n 2 >10 we use a normal approximation to the distribution of the sum of the ranks. T.S. and t j denotes the number of tied ranks in the j th group. 1.Reject H 0 if z > z  2.Reject H 0 if z < -z  3.Reject H 0 if |z|>z  /2  = Pr(Type I Error) R.R. Let T denote the sum of the ranks of population 1 observations. Situation #2

Two-Sample-Means-22 Situation #1 Example
[Data table: Group, Value, Rank, and the ranks of the population 1 observations; the sum of the population 1 ranks is T = 74.]
n1 ≤ 10 and n2 ≤ 10; two-sided alternative hypothesis.
R.R.: Reject H0 if T > TU or T < TL.
From Table 6 with n1 = n2 = 7: TL = 39, TU = 66.
Conclusion: Since T = 74 > TU = 66, reject H0.
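The slide's data table is not reproduced above, but the following sketch shows the same small-sample rank-sum bookkeeping on two hypothetical samples of size 7, together with SciPy's Mann-Whitney version (in recent SciPy versions the reported statistic is U for the first sample, related to the rank sum by U = T − n1(n1 + 1)/2).

```python
# Wilcoxon rank sum statistic T (sum of sample-1 ranks) on hypothetical data,
# plus SciPy's equivalent Mann-Whitney U test.
import numpy as np
from scipy import stats

x = np.array([8.2, 9.5, 7.8, 10.1, 9.9, 8.7, 9.0])   # hypothetical population-1 sample
y = np.array([6.1, 7.0, 5.9, 6.8, 7.4, 6.5, 7.2])    # hypothetical population-2 sample

ranks = stats.rankdata(np.concatenate([x, y]))  # ranks of the combined sample (ties averaged)
T = ranks[: len(x)].sum()                       # sum of the ranks of the population-1 observations
print(f"T = {T}")                               # compare with T_L, T_U from Table 6

u_stat, p_value = stats.mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {u_stat}, p-value = {p_value:.4f}")
```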

Two-Sample-Means-23 Wilcoxon Critical Values
[Table 6: critical values TL and TU for the Wilcoxon rank sum test.]

Two-Sample-Means-24 Paired Data Situation (§ )
In this set of slides we will:
Develop confidence intervals and tests for the difference between two means that have been measured on the same or highly related experimental units. The underlying population of differences is assumed to be normally distributed.
A nonparametric alternative that does not rely on normality will be discussed (§6.5).

Two-Sample-Means-25 Situation
Two analysts, supposedly of identical abilities, each measure the parts per million of a certain type of chemical impurity in drinking water. It is claimed that analyst 1 tends to give higher readings than analyst 2. To test this theory, each of six water samples is divided and then analyzed by both analysts separately. The data are as follows:
[Data table: readings by Analyst 1 and Analyst 2 for each of the six water samples.]
The data are paired, hence the observations are not independent: observations in the same row are more likely to be close to each other than are observations in different rows.

Two-Sample-Means-26 Paired Data Considerations
Because of the dependence within rows, we don’t use the difference of two means but instead use the mean of the individual differences.
Let yij represent the i-th observation for the j-th sample, i = 1, …, n, j = 1, 2.
Compute di = yi1 − yi2, the difference in responses for the i-th observation, and then proceed as in the one-sample t-test situation, with
d̄ = (1/n) Σ di, the sample mean of the differences, and
sd = √[ Σ (di − d̄)² / (n − 1) ], the sample standard deviation of the differences.

Two-Sample-Means-27 Importance of Independence/Dependence on the Variance of the Difference
If the two populations were independent, the variance of the difference would be computed using the probability rule: Var(X − Y) = Var(X) + Var(Y).
But here the two populations are dependent, and the above rule does not hold; in general, Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y).
Hence we don’t use a pooled variance estimator; we use the variance of the differences, sd².
In the example above, the two estimates are very different!

Two-Sample-Means-28 The Paired t-test
H0: μd = D0
Ha: 1. μd > D0   2. μd < D0   3. μd ≠ D0
Test Statistic: t = (d̄ − D0) / (sd / √n)
Rejection Region:
1. Reject H0 if t > tα,n−1
2. Reject H0 if t < −tα,n−1
3. Reject H0 if |t| > tα/2,n−1
100(1 − α)% confidence interval: d̄ ± tα/2,n−1 · sd / √n
For the example, the computed t is less than t0.05,5 = 2.015, hence we do not reject H0: μd = 0 in testing situation 1, and conclude that there are no significant differences.
95% CI: 1.4 ± 1.94, or (−0.54, 3.34).
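A sketch of the paired t-test in Python; the six paired readings below are hypothetical stand-ins for the two analysts' data, not the values behind the slide's numbers.

```python
# Paired t-test: work with the within-pair differences d_i = y_i1 - y_i2.
import numpy as np
from scipy import stats

analyst1 = np.array([12.1, 10.8, 13.4, 11.9, 12.6, 10.2])  # hypothetical readings, analyst 1
analyst2 = np.array([11.4, 10.5, 12.0, 11.1, 11.3, 10.0])  # hypothetical readings, analyst 2

d = analyst1 - analyst2
n = len(d)
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))             # (d_bar - D0)/(s_d/sqrt(n)) with D0 = 0
half_width = stats.t.ppf(0.975, df=n - 1) * d.std(ddof=1) / np.sqrt(n)

print(f"t = {t_stat:.3f}")
print(f"95% CI for mu_d: ({d.mean() - half_width:.3f}, {d.mean() + half_width:.3f})")
# Equivalent one-liner for the two-sided test: stats.ttest_rel(analyst1, analyst2)
```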

Two-Sample-Means-29 Wilcoxon Signed-Rank Test for Paired Data (§6.5)
A nonparametric alternative to the paired t-test when the population distribution of the differences is not normal. (Requires symmetry about the population median.)
Test Construction:
1. Compute the differences in the paired observations and subtract D0 from each.
2. Discard any differences that equal D0, and let n be the resulting number of differences.
3. Rank the absolute values of the remaining differences from smallest to largest (tied values get the average of their ranks).
4. Ranks of differences less than D0 are prefaced with a negative sign.
5. Ranks of differences greater than D0 are prefaced with a positive sign.
T+ = sum of the positive ranks (T+ = 0 if there are no positive ranks).
T− = sum of the negative ranks (T− = 0 if there are no negative ranks).
T = the smaller of T+ and T−, ignoring their signs.
Small values of |T−| support differences > D0, small values of T+ support differences < D0, and small values of T support differences ≠ D0.

Two-Sample-Means-30 Test Statistics
H0: The distribution of the differences is symmetrical around D0 (usually 0).
HA: 1. The differences tend to be larger than D0.
2. The differences tend to be smaller than D0.
3. Either 1 or 2 above is true (two-sided alternative).
Situation #1 (n ≤ 50):
T.S.: 1. T = |T−|   2. T = T+   3. T = smaller of |T−| and T+
R.R. (critical values from Table 7):
1. Reject H0 if T ≤ Tα,n
2. Reject H0 if T ≤ Tα,n
3. Reject H0 if T ≤ Tα/2,n
Situation #2 (n > 50): use the normal approximation
z = (T − n(n + 1)/4) / √( n(n + 1)(2n + 1)/24 )
R.R. (critical values from Table 1, the standard normal table):
1. Reject H0 if z < −zα
2. Reject H0 if z < −zα
3. Reject H0 if z < −zα/2

Two-Sample-Means-31 Problem
[Data table: the six paired differences and their signed ranks.]
For HA: the differences tend to be larger than 0, the test statistic is |T−| = 3 and the critical value is T0.05,6 = 2. Since 3 > 2, we do not reject H0 and conclude that there is no significant difference.
Illustration of the calculations for large n (not the case here): z = (T − n(n + 1)/4) / √( n(n + 1)(2n + 1)/24 ).
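A sketch of the signed-rank bookkeeping and the large-n normal approximation described above, on hypothetical differences (the slide's own differences are not reproduced); scipy.stats.wilcoxon provides the same test directly.

```python
# Wilcoxon signed-rank test: T+, |T-|, and the large-n normal approximation.
import numpy as np
from scipy import stats

d = np.array([1.2, 0.8, -0.3, 2.1, 0.6, 1.5])   # hypothetical paired differences (D0 = 0)

d = d[d != 0]                                    # discard zero differences
ranks = stats.rankdata(np.abs(d))                # rank the absolute differences (ties averaged)
t_plus = ranks[d > 0].sum()                      # sum of ranks of positive differences
t_minus = ranks[d < 0].sum()                     # sum of ranks of negative differences (unsigned)
print(f"T+ = {t_plus}, |T-| = {t_minus}")

# Large-n normal approximation (shown for illustration; n here is far below 50):
n = len(d)
T = t_minus                                      # statistic for HA: differences larger than 0
z = (T - n * (n + 1) / 4) / np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
print(f"z = {z:.3f}")
# Equivalent built-in test: stats.wilcoxon(d, alternative="greater")
```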

Two-Sample-Means-32 Concluding Comments
The paired measurements (samples) case can be easily extended to more than two measurements. When repeated measurements are taken on an individual, we are in the same situation as with two paired samples: the repeated measurements on an individual are expected to be more correlated than measurements among individuals.
Solutions to the multiple repeated measurements case cannot follow the simple solution for two dependent samples. The final solution involves specifying not only what we expect to happen in the means of the sampling “times” but also the structure of the correlations between sampling times.
It is advantageous to design a paired data experiment rather than an independent samples one. This helps to eliminate the confounding effect (masking of treatment differences) that sources of variation other than the treatments have on the experimental units.