Chapter 10: Inferences Involving Two Populations.



Independent and Dependent Samples: The object is to compare the means of two samples and draw conclusions about the difference in population means. There are two basic kinds of samples: independent and dependent (paired). Which kind you have depends on the sources of the two samples and how the data were collected.

Dependent Samples: If one observation is collected for each sample from the same source, the samples are dependent. This is often called “paired data” because you get a pair of observations from one individual or experimental unit. Examples include pretest and posttest scores, weight before and after a diet, and left-eye and right-eye acuity. There is a one-to-one correspondence between an observation in one sample and an observation in the other sample.

Independent Samples: Two samples are independent if there is no connection between an observation in one sample and any particular observation in the other. Also, there can be no connection in the sampling procedure: selecting an individual for one sample in no way affects the selection of any individual for the other, including by exclusion.

Dependent Sample Examples: The same test is given to all students at the beginning and end of a course to measure learning (one pair of scores per person). IQ tests are given to husband and wife pairs. A medical treatment is given to patients matched for condition, age, sex, race, weight, and other characteristics with patients in a control group.

Independent Sample Examples: The same test is given to all students in two classes (no pairing of scores occurs). IQ tests are given to men and women without consideration of any relationship between them. Subjects are randomly assigned to a treatment group and a control group to test a new drug. No attempt is made to “match” them.

Difference of Means for Paired Data: When dependent samples are involved, the data are paired data. Paired data result from:
–before-and-after studies,
–a common source, or
–matched pairs.
We will denote the random variables from the two samples by X1 and X2.

The meaning of X1 and X2: Two samples are being taken. For example, the pretest is one sample, and the posttest is another sample. Then X1 represents a pretest score and X2 represents a posttest score. However, there is an X1 and an X2 for each person taking the test.

Paired Data:

X1      X2
x_11    x_21
x_12    x_22
x_13    x_23
x_14    x_24
x_15    x_25
x_16    x_26
x_17    x_27
…       …

Here the random variable names are given by capital letters, and individual observations by small letters. The data appear side by side, each pair having the same second subscript. You cannot change the order of one column without destroying the relationship between the columns. That is what makes it “paired data.” There are the same number of observations, n, in each sample.

What do we want to know? In paired data studies, the parameter of interest is the mean difference between the groups. This is conceptually different from the difference between the means of the groups. In other words, the population of interest is actually the differences between X1 and X2. We define a new value, d = x1 − x2, as one observation taken from this population.

Why is this important? The mean difference between the groups and the difference between the means of the groups are the same number, but their sampling distributions are different! From the differences d, we calculate $\bar{d}$, the mean difference. Now, $\bar{d}$ will have a normal distribution if X1 and X2 are normal, or an approximately normal one if n > 30.
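The claim that the two quantities are the same number can be checked with a short sketch. The scores below are hypothetical (they do not come from the slides), and the variable names are illustrative only. The sketch also shows why pairing matters: the differences can vary far less than either sample does on its own.

```python
from statistics import mean, stdev

# Hypothetical paired scores: x1 = pretest, x2 = posttest, one pair per person.
x1 = [72, 65, 80, 58, 90, 77]
x2 = [78, 70, 79, 66, 94, 85]

# One difference per pair, using the slides' convention d = x1 - x2.
d = [a - b for a, b in zip(x1, x2)]

mean_of_differences = mean(d)              # d-bar
difference_of_means = mean(x1) - mean(x2)  # x1-bar minus x2-bar

# Spread of the differences versus spread of each sample separately.
sd_of_differences = stdev(d)
sd_x1, sd_x2 = stdev(x1), stdev(x2)

print(mean_of_differences, difference_of_means)
print(sd_of_differences, sd_x1, sd_x2)
```

With data like these, the two means agree, while the standard deviation of the d's is much smaller than that of either column, which is what makes the paired analysis more sensitive.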

Distribution of mean differences: If we know $\bar{d}$ is normally distributed, then we can use the same tests and confidence intervals that we learned for a single sample mean. We won’t bother with the “variance known” situation this time. We will calculate the variance from the sample and use the t distribution. In other words, treat the d’s as the sample. Find their mean and standard deviation.

Distribution of mean differences: There is a population parameter, $\mu_d$, that we are trying to estimate. The point estimate is $\bar{d}$, taken from a sample of n differences (d’s). The d’s have a standard deviation, $s_d$, which is calculated in the same way as s. The standard deviation of $\bar{d}$ is $s_d/\sqrt{n}$. This is no different from what we had before, except for symbols!

Confidence Interval for Paired Differences: A (1−α)100% CI for $\mu_d$ is given by $\bar{d} \pm t(df, \alpha/2)\, \dfrac{s_d}{\sqrt{n}}$, where df = n − 1.
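A minimal sketch of the paired confidence interval follows, assuming the usual form d-bar ± t(df, α/2)·s_d/√n with df = n − 1. The function name and the readings are hypothetical, and the critical value is passed in from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def paired_confidence_interval(x1, x2, t_crit):
    """CI for mu_d from paired samples.

    t_crit is t(n - 1, alpha/2), looked up in a t table by the caller.
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]      # d = x1 - x2 for each pair
    n = len(d)
    d_bar = mean(d)                          # point estimate of mu_d
    s_d = stdev(d)                           # sample standard deviation of the d's
    error_bound = t_crit * s_d / sqrt(n)     # maximum error of estimate
    return d_bar - error_bound, d_bar + error_bound

# Hypothetical before/after readings for 8 people (not the slide's data).
before = [95, 104, 88, 97, 101, 92, 90, 108]
after = [92, 101, 89, 93, 100, 92, 86, 103]

# 99% interval with df = 7, so t(7, 0.005) = 3.50 as in the example.
low, high = paired_confidence_interval(before, after, t_crit=3.50)
print(round(low, 3), round(high, 3))
```

Subtracting "before − after" makes a positive d mean a reduction, which is one way to answer the question of which way to subtract.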

Example: Salt-free diets are often prescribed for people with high blood pressure. The following data was obtained from an experiment designed to estimate the reduction in diastolic blood pressure as a result of following a salt-free diet for two weeks. Assume diastolic readings to be normally distributed. Find a 99% confidence interval for the mean reduction. Question: How do you decide which way to subtract?

Solution: Population parameter of interest: the mean reduction (difference) in diastolic blood pressure. Determine the distribution to use: Assumptions: both sample populations are assumed normal, σ unknown. Use t with df = 8 − 1 = 7. Confidence level: 1 − α = 0.99. Two-tailed situation: α/2 = 0.005, so t(df, α/2) = t(7, 0.005) = 3.50. Sample evidence: Sample information:

Calculate the error bound: A 99% confidence interval for $\mu_d$ is

Hypothesis Testing: When testing a null hypothesis about the mean difference, the test statistic is $t^* = \dfrac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$, where t* has a t distribution with df = n − 1. Example: The corrosive effects of various chemicals on normal and specially treated pipes were tested by using a dependent sampling plan. The data collected are summarized by n, $\bar{d}$, and $s_d$, where d is the amount of corrosion on the treated pipe subtracted from the amount of corrosion on the normal pipe.

Example (continued): Does this sample provide sufficient evidence to conclude the specially treated pipes are more resistant to corrosion? Use α = 0.05. a. Solve using the classical approach. b. Solve using the p-value approach. Solution: 1. State the hypotheses (you must say something about the direction of the difference): Test for the mean difference in corrosion, normal pipe − treated pipe. The null and alternative hypotheses: H0: $\mu_d = 0$ (did not lower corrosion); Ha: $\mu_d > 0$ (did lower corrosion).

2. Determine the appropriate type of test: Assumptions: assume corrosion measures are approximately normal, σ unknown. Use the t-test for paired differences. 3. Define the rejection region: a. Right-tailed test: reject H0 if t* > t(16, 0.05) = 1.75. b. Reject H0 if p < α = 0.05. 4. Calculate the value of the test statistic: t* = 4.896.

5. State the conclusion: a. Decision: Reject H0 because t* = 4.896 > 1.75. b. Decision: Reject H0 because p < 0.0001 < α = 0.05. Conclusion: The treated pipes do not corrode as much as the normal pipes when subjected to chemicals.
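The paired test can be sketched from summary statistics, assuming the standard statistic t* = (d-bar − μ_d)/(s_d/√n). The function names and the summary values below are hypothetical, since the slide's own summary values did not survive in this transcript. Only the last line reuses the example's reported t* and critical value.

```python
from math import sqrt

def paired_t_statistic(d_bar, s_d, n, mu_d0=0.0):
    """Return (t*, df) for a paired t-test from summary statistics."""
    if n < 2:
        raise ValueError("need at least two pairs")
    standard_error = s_d / sqrt(n)           # standard error of d-bar
    return (d_bar - mu_d0) / standard_error, n - 1

def reject_right_tailed(t_star, t_crit):
    """Classical approach for Ha: mu_d > mu_d0."""
    return t_star > t_crit

# Hypothetical summary statistics: mean difference 2.0, s_d 4.0, 16 pairs.
t_star, df = paired_t_statistic(d_bar=2.0, s_d=4.0, n=16)
decision = reject_right_tailed(t_star, t_crit=1.75)  # t(15, 0.05) is about 1.75

# The example's reported values: t* = 4.896 against t(16, 0.05) = 1.75.
example_decision = reject_right_tailed(4.896, 1.75)
print(t_star, df, decision, example_decision)
```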

Two Independent Samples: Compare the means of two populations. Parameter of interest: $(\mu_1 - \mu_2)$. Base inferences on $(\bar{x}_1 - \bar{x}_2)$. The parentheses indicate that we are thinking of the difference as one parameter. Consider the general confidence interval formula, P ± TS. We know what P is now. We need to know the distribution of $(\bar{x}_1 - \bar{x}_2)$ to find T and S.

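Distribution of $\bar{x}_1 - \bar{x}_2$: The sampling distribution of $\bar{x}_1 - \bar{x}_2$ has a mean, $\mu_1 - \mu_2$. The point estimate of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2$. The standard deviation of $\bar{x}_1 - \bar{x}_2$ is $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$. Since the variances are hardly ever known, we will have to estimate them.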

Sample Standard Deviation: The sample standard deviation of $\bar{x}_1 - \bar{x}_2$ is $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$. The following assumptions are needed to use the above formula:
–The samples are randomly selected from normally distributed populations
–The samples are independent
–There is no reason to believe σ1 = σ2
–The populations (not samples) are “large”

Distribution: The t distribution will be used. Degrees of freedom:
–If n1 = n2 = n, no problem: use df = n − 1.
–Otherwise, df may be calculated by a complicated formula. Statistical computer software will do this automatically.
–Alternatively, the smaller of n1 − 1 and n2 − 1 can be used as an approximation. This choice is conservative: the actual confidence level will be higher, and the actual p-value will be lower.
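The slides do not name the “complicated formula.” The sketch below assumes it is the Welch-Satterthwaite approximation, which is what statistical software commonly uses for unequal-variance t procedures, and compares it with the conservative choice. The function names and the summary values are hypothetical.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed formula)."""
    v1 = s1 ** 2 / n1                        # estimated variance of x1-bar
    v2 = s2 ** 2 / n2                        # estimated variance of x2-bar
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """Smaller of n1 - 1 and n2 - 1, the hand-calculation approximation."""
    return min(n1 - 1, n2 - 1)

# Hypothetical standard deviations with the sample sizes 18 and 12.
df_software = welch_df(s1=6.0, n1=18, s2=3.0, n2=12)
df_by_hand = conservative_df(18, 12)
print(round(df_software, 2), df_by_hand)
```

The approximate df is never smaller than the conservative value, so the hand method uses a larger critical value and gives a wider interval.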

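Confidence Interval: Now we have all the information we need. P = $\bar{x}_1 - \bar{x}_2$, T = t(df, α/2), S = $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$. A (1−α)100% confidence interval for $(\mu_1 - \mu_2)$ is given by $(\bar{x}_1 - \bar{x}_2) \pm t(df, \alpha/2)\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.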
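The P ± TS recipe can be sketched from summary statistics as follows. The function name and the means and standard deviations are hypothetical. The critical value is supplied by the caller from a t table, using whichever df rule was chosen.

```python
from math import sqrt

def two_sample_ci(x1_bar, s1, n1, x2_bar, s2, n2, t_crit):
    """CI for mu1 - mu2 from independent samples, variances not assumed equal."""
    point_estimate = x1_bar - x2_bar                  # P
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # S
    error_bound = t_crit * standard_error             # T * S
    return point_estimate - error_bound, point_estimate + error_bound

# Hypothetical summary statistics with n1 = 18 and n2 = 12.
# Conservative df = 11, so t(11, 0.025) = 2.20 for a 95% interval.
low, high = two_sample_ci(50.0, 6.0, 18, 46.0, 3.0, 12, t_crit=2.20)
print(round(low, 3), round(high, 3))
```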

Example: A recent study reported that the longest average workweeks for non-supervisory employees in private industry were for chefs and construction workers. Find a 95% confidence interval for the difference in mean length of workweek between chefs and construction workers. Assume normality for the sampled populations. Solution: Parameter of interest: $\mu_1 - \mu_2$, where $\mu_1$ is the mean hours/week for chefs and $\mu_2$ is the mean hours/week for construction workers.

df = 11, the smaller of: n1 − 1 = 18 − 1 = 17 and n2 − 1 = 12 − 1 = 11. α = 0.05, so t(df, α/2) = t(11, 0.025) = 2.20. A 95% confidence interval for $\mu_1 - \mu_2$ is

Note: 1. Using a calculator, the confidence interval is 0.55 to … 2. This confidence interval is narrower than the approximate interval computed on the previous slide. This illustrates the conservative (wider) nature of the confidence interval when approximating the degrees of freedom.

Hypothesis Tests: To test a null hypothesis about the difference between two population means, use the test statistic $t^* = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, where df is the smaller of df1 or df2 when computing t* without the aid of a computer. Note: The hypothesized difference between the two population means, $(\mu_1 - \mu_2)$ under H0, can be any specified value. The most common value is zero.
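A minimal sketch of this test statistic from summary statistics follows, assuming the unpooled standard error used for the confidence interval. The function name and the numbers are hypothetical.

```python
from math import sqrt

def two_sample_t(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0.0):
    """Return (t*, conservative df) for H0: mu1 - mu2 = hypothesized_diff."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t_star = ((x1_bar - x2_bar) - hypothesized_diff) / standard_error
    df = min(n1 - 1, n2 - 1)                 # hand-calculation rule from the slides
    return t_star, df

# Hypothetical summary statistics for two independent samples.
t_star, df = two_sample_t(10.0, 3.0, 9, 7.0, 4.0, 16)
print(round(t_star, 4), df)
```

The result is then compared with t(df, α) for a one-tailed test or t(df, α/2) for a two-tailed test, exactly as in the paired case.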

Example: A recent study compared a new drug to ease post-operative pain with the leading brand. Independent random samples were obtained, and the number of hours of pain relief for each patient was recorded. The summary statistics are given in the table below. Is there any evidence to suggest the new drug provides longer relief from post-operative pain? Use α = 0.05. a. Solve using the p-value approach. b. Solve using the classical approach.

Solution: 1. The hypotheses: H0: $\mu_1 - \mu_2 = 0$ (new drug relieves pain no longer); Ha: $\mu_1 - \mu_2 > 0$ (new drug works longer to relieve pain). 2. The appropriate test: Assumptions: both populations are assumed to be approximately normal. The samples were random and independently selected. Use t*, df = 9. 3. Rejection region: Reject H0 if t* > t(9, 0.05) = 1.83 or p < 0.05. 4. Calculations:

4.(cont’d) The p-value: 5.The Conclusion: Decision: Reject H 0. Conclusion: There is evidence to suggest that the new drug provides longer relief from post-operative pain.

Difference of Two Proportions: If independent samples of sizes n1 and n2 are drawn randomly from large populations with p1 = P1(success) and p2 = P2(success), respectively, then the sampling distribution of $p'_1 - p'_2$ has these properties: 1. a mean $\mu_{p'_1 - p'_2} = p_1 - p_2$; 2. a standard error $\sigma_{p'_1 - p'_2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$, where q = 1 − p; 3. an approximately normal distribution if n1 and n2 are sufficiently large.

Note: To ensure normality: 1. The sample sizes are both larger than 20. 2. The products n1p1, n1q1, n2p2, n2q2 are all larger than 5. Since p1 and p2 are unknown, these products are estimated by $n_1 p'_1$, $n_1 q'_1$, $n_2 p'_2$, $n_2 q'_2$. 3. The samples consist of less than 10% of their respective populations. Confidence Intervals: 1. A confidence interval for p1 − p2 is based on the unbiased sample statistic $p'_1 - p'_2$. 2. The confidence limits are found using the following formula: $(p'_1 - p'_2) \pm z(\alpha/2)\sqrt{\dfrac{p'_1 q'_1}{n_1} + \dfrac{p'_2 q'_2}{n_2}}$
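The confidence limits can be sketched as follows, assuming the unpooled standard error built from the two sample proportions. The function name and the counts are hypothetical. The z value comes from the standard library's normal distribution instead of a table.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.98):
    """CI for p1 - p2; x1 and x2 are the observed numbers of successes."""
    p1, p2 = x1 / n1, x2 / n2                # sample proportions p1', p2'
    q1, q2 = 1 - p1, 1 - p2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z(alpha/2)
    error_bound = z * sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - error_bound, diff + error_bound

# Hypothetical counts: 30 of 150 versus 20 of 200 units needing service.
low, high = two_proportion_ci(x1=30, n1=150, x2=20, n2=200, confidence=0.98)
print(round(low, 4), round(high, 4))
```

A table value such as z(0.01) = 2.33 gives nearly the same limits; the small difference is rounding in the table.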

Example: A consumer group compared the reliability of two similar microcomputers from two different manufacturers. The proportion requiring service within the first year after purchase was determined for samples from each of two manufacturers. Find a 98% confidence interval for p1 − p2, the difference in proportions needing service.

Solution: 1. Population parameter of interest: p1 − p2, where p1 is the proportion of computers needing service for manufacturer 1 and p2 is the proportion of computers needing service for manufacturer 2. 2. Check the assumptions: Sample sizes larger than 20. Products all larger than 5. $p'_1 - p'_2$ should have an approximately normal distribution. Use the z distribution.

3. The sample evidence: Sample information: Point estimate: 4. The confidence interval: a. Confidence coefficient: z(α/2) = z(0.01) = 2.33. b. Error bound:

c. Confidence limits: Hypothesis Tests for the Difference of Two Proportions: Look at what we’ve done before. All of our hypothesis tests have made use of this form: test statistic = (parameter estimate − hypothesized value) / (standard error).

Parts of the Test: If the null hypothesis is that there is no difference between proportions, and we can assume normality, a. the test statistic is z*; b. the parameter estimate is $p'_1 - p'_2$; c. the hypothesized value is 0; d. the standard error is … (more to come)

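Let’s consider how we can construct a standard error term. Now the standard deviation of $p'_1 - p'_2$ is actually $\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$. However, if the null hypothesis is true, p1 = p2 = p, so we can say the standard error is $\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$.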

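But we don’t know p and q! How can we estimate these from the sample? Under the null hypothesis, the two population proportions are the same. So simply take all of the data and pool it together to estimate the common proportion: $p'_p = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $q'_p = 1 - p'_p$, where x1 and x2 are the numbers of successes in the two samples. The test statistic becomes $z^* = \dfrac{p'_1 - p'_2}{\sqrt{p'_p\, q'_p \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$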
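The pooled test can be sketched as follows, assuming the pooled proportion is total successes over total sample size. The function name and the counts are hypothetical. The two-sided p-value is computed from the standard normal distribution in the standard library.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_proportion_z(x1, n1, x2, n2):
    """Return (z*, two-sided p-value) for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2                # sample proportions p1', p2'
    p_pooled = (x1 + x2) / (n1 + n2)         # pooled estimate of the common p
    q_pooled = 1 - p_pooled
    standard_error = sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    z_star = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z_star)))
    return z_star, p_value

# Hypothetical counts of defectives: 15 of 150 versus 30 of 150.
z_star, p_value = pooled_two_proportion_z(15, 150, 30, 150)
reject = p_value < 0.01                      # two-tailed test at alpha = 0.01
print(round(z_star, 4), round(p_value, 4), reject)
```

Pooling is appropriate only for testing a hypothesized difference of zero; the confidence interval keeps the two proportions separate.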

Example: The proportions of defective parts from two different suppliers were compared. The following data were collected. Is there any evidence to suggest the proportion of defectives is different for the two suppliers? Use α = 0.01.

1. The null and alternative hypotheses: H0: p1 − p2 = 0 (proportion of defectives the same); Ha: p1 − p2 ≠ 0 (proportion of defectives different). 2. The type of test: Difference of proportions. Samples are larger than 20. Products are larger than 5. Sampling distribution should be approximately normal. Use z* for difference of proportions. 3. Rejection region: Reject H0 if z* > z(0.005) = 2.58 or z* < −z(0.005) = −2.58. 4. Calculations:

4. Calculations (cont’d): 5. Conclusion: Do not reject H0. At α = 0.01 there is not sufficient evidence to suggest the proportion of defectives is different for the two suppliers.