Chapter 10: Inferences Involving Two Populations.



Chapter Goals:
1. Distinguish independent from dependent samples.
2. Compare two populations using: the mean of the paired differences, the difference between two means, and the difference between two proportions.

10.1: Independent and Dependent Samples
There are two basic kinds of samples: independent and dependent. The dependence or independence of two samples is determined by the sources used for the data.

Source: Can be a person, an object, or anything that yields a piece of data.
Dependent Sampling: The same set of sources, or related sets, are used to obtain the data representing both populations.
Independent Sampling: Two unrelated sets of sources are used, one set from each population.

Example: An experiment is designed to study the effect of classical music on mathematical ability. Thirty-five students are selected at random and given a basic mathematics skills test. The following day the same 35 students first listen to 45 minutes of classical music, and then take a similar mathematics skills test. This sampling plan illustrates dependent sampling. The sources used for both samples (without classical music and with classical music) are the same. Note: Typically, when both a pretest and a posttest are used, the same subjects are used in the study. Thus, this kind of sampling plan usually leads to dependent samples.

Example: A university would like to compare the math SAT scores of male and female first-year students. Fifty males and 40 females are selected at random and their math SAT scores are recorded. This sampling plan illustrates independent sampling. The sources (students) used for each sample (male and female) were selected separately.

Example: The number of items produced by two different assembly lines is to be compared. Twenty-five days are randomly selected for assembly line 1 and the number of items produced on each day is recorded. Forty days are randomly selected for assembly line 2 and the number of items produced on each day is recorded. This sampling plan illustrates independent sampling. The sources (assembly lines) used for each sample (line 1 and line 2) were selected separately.

10.2: Inferences Concerning the Mean Difference Using Two Dependent Samples
When dependent samples are involved, the data are paired data. Paired data result from before-and-after studies, from a common source, or from matched pairs.

Paired difference:
1. Using the differences removes the dependence in the data.
2. It also removes the effect of otherwise uncontrolled factors.
3. The difference between the two population means, when dependent samples are used, is equivalent to the mean of the paired differences.
4. An inference about the mean of the paired differences is an inference about the difference of two means.
5. The mean of the sample paired differences is used as a point estimate for these inferences.

Sampling Distribution of the Paired Difference:
1. When paired observations are randomly selected from normal populations, the paired difference, $d = x_1 - x_2$, will be approximately normally distributed about a mean $\mu_d$ with a standard deviation $\sigma_d$.
2. Use a t-test for one mean: make an inference about an unknown mean ($\mu_d$) where d has an approximately normal distribution with an unknown standard deviation ($\sigma_d$).
3. Inferences are based on a sample of n dependent pairs of data and the t distribution with $n - 1$ degrees of freedom.
The assumption for inferences about the mean of paired differences ($\mu_d$): The paired data are randomly selected from normally distributed populations.

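Confidence Interval: The $1 - \alpha$ confidence interval for estimating the mean difference $\mu_d$ is found using the formula:
$$\bar{d} \pm t(df, \alpha/2) \cdot \frac{s_d}{\sqrt{n}}, \qquad df = n - 1$$
where $\bar{d}$ is the mean of the sample differences:
$$\bar{d} = \frac{\sum d}{n}$$
and $s_d$ is the standard deviation of the sample differences:
$$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$$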
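The paired-difference interval (mean difference plus or minus the t critical value times $s_d/\sqrt{n}$) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name and the blood pressure readings are hypothetical, and the critical value is passed in by hand, as it would be when read from a t table.

```python
import math
import statistics

def paired_confidence_interval(first, second, t_crit):
    """Return (lower, upper) limits for the mean paired difference mu_d.

    t_crit is t(df, alpha/2) with df = n - 1, read from a t table.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]  # d = x1 - x2
    n = len(diffs)
    d_bar = statistics.mean(diffs)        # mean of the sample differences
    s_d = statistics.stdev(diffs)         # sample standard deviation (n - 1 divisor)
    max_error = t_crit * s_d / math.sqrt(n)
    return d_bar - max_error, d_bar + max_error

# Hypothetical before/after readings for 8 subjects (not the slide's data).
before = [90, 100, 95, 105, 98, 92, 101, 97]
after = [88, 97, 96, 100, 95, 92, 99, 93]

# t(7, 0.005) = 3.50 is the table value used for a 99% interval with df = 7.
low, high = paired_confidence_interval(before, after, t_crit=3.50)
print(f"99% CI for mu_d: {low:.2f} to {high:.2f}")
```

Passing the critical value explicitly keeps the sketch dependent only on the standard library; statistical software would look it up from the t distribution instead.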

Example: Salt-free diets are often prescribed for people with high blood pressure. The following data was obtained from an experiment designed to estimate the reduction in diastolic blood pressure as a result of following a salt-free diet for two weeks. Assume diastolic readings to be normally distributed. Find a 99% confidence interval for the mean reduction.

Solution:
1. Population Parameter of Interest: The mean reduction (difference) in diastolic blood pressure.
2. The Confidence Interval Criteria:
a. Assumptions: Both sample populations are assumed normal.
b. Test statistic: t with $df = 8 - 1 = 7$.
c. Confidence level: $1 - \alpha = 0.99$.
3. The Sample Evidence: Sample information: $n = 8$, with $\bar{d}$ and $s_d$ computed from the paired readings (values not preserved in this transcript).

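4. The Confidence Interval:
a. Confidence coefficients: Two-tailed situation, $\alpha/2 = 0.005$; $t(df, \alpha/2) = t(7, 0.005) = 3.50$.
b. Maximum error: $E = t(df, \alpha/2) \cdot \frac{s_d}{\sqrt{n}} = 3.50 \cdot \frac{s_d}{\sqrt{8}}$.
c. Confidence limits: $\bar{d} - E$ to $\bar{d} + E$.
5. The Results: The resulting interval (numerical limits not preserved in this transcript) is the 99% confidence interval for $\mu_d$.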

Hypothesis Testing: When testing a null hypothesis about the mean difference, the test statistic is
$$t^* = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$
where $t^*$ has a t distribution with $df = n - 1$.
Example: The corrosive effects of various chemicals on normal and specially treated pipes were tested by using a dependent sampling plan. The data collected are summarized by $n$, $\bar{d}$, and $s_d$ (values not preserved in this transcript), where d is the amount of corrosion on the treated pipe subtracted from the amount of corrosion on the normal pipe.

Example (continued): Does this sample provide sufficient evidence to conclude the specially treated pipes are more resistant to corrosion? Use $\alpha = 0.05$.
a. Solve using the p-value approach.
b. Solve using the classical approach.
Solution:
1. The Set-up:
a. Population parameter of concern: The mean difference in corrosion, normal pipe − treated pipe.
b. The null and alternative hypotheses:
$H_0: \mu_d = 0\ (\le)$ (did not lower corrosion)
$H_a: \mu_d > 0$ (did lower corrosion)

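2. The Hypothesis Test Criteria:
a. Assumptions: Assume corrosion measures are approximately normal.
b. Test statistic: $t^* = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$, with $df = n - 1$.
c. Level of significance: $\alpha = 0.05$.
3. The Sample Evidence:
a. Sample information: $n$, $\bar{d}$, and $s_d$ (values not preserved in this transcript).
b. Calculate the value of the test statistic, $t^*$ (value not preserved in this transcript).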

4. The Probability Distribution (Classical Approach):
a. Critical value: $t(16, 0.05) = 1.75$.
b. $t^*$ is in the critical region.
4. The Probability Distribution (p-Value Approach):
a. The p-value: $P = P(t > t^*)$ with $df = 16$ (value not preserved in this transcript).
b. The p-value is smaller than the level of significance, $\alpha$.
5. The Results:
a. Decision: Reject $H_0$.
b. Conclusion: At the 0.05 level of significance, there is evidence to suggest the treated pipes do not corrode as much as the normal pipes when subjected to chemicals.
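The classical approach for paired data can be sketched as follows: compute the mean difference divided by its estimated standard error, then compare the result with a critical value from the t table. The function name and the 17 differences below are hypothetical (the slide's summary values are not available); only the critical value $t(16, 0.05) = 1.75$ comes from the example.

```python
import math
import statistics

def paired_t_statistic(diffs, mu_d0=0.0):
    """Return (t_star, df) for H0: mu_d = mu_d0 using paired differences."""
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)                 # n - 1 divisor
    t_star = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return t_star, n - 1

# Hypothetical differences (normal pipe - treated pipe) for 17 pairs.
corrosion_diffs = [5, 8, -2, 10, 3, 7, 0, 6, 4, 9, -1, 5, 8, 2, 6, 3, 7]

t_star, df = paired_t_statistic(corrosion_diffs)
critical_value = 1.75                             # t(16, 0.05), one-tailed, from the table
reject_h0 = t_star > critical_value               # right-tailed test: Ha is mu_d > 0
print(f"t* = {t_star:.2f}, df = {df}, reject H0: {reject_h0}")
```

For a left-tailed or two-tailed alternative the comparison in `reject_h0` would change accordingly.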

10.3: Inferences Concerning the Difference Between Means Using Two Independent Samples
When comparing the means of two populations, consider the difference between their means: $\mu_1 - \mu_2$. Inferences are based on the difference between the observed sample means, $\bar{x}_1 - \bar{x}_2$.

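Distribution of $\bar{x}_1 - \bar{x}_2$: If independent samples of sizes $n_1$ and $n_2$ are drawn randomly from large populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, the sampling distribution of $\bar{x}_1 - \bar{x}_2$, the difference between the sample means, has
1. a mean, $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$,
2. a standard error, $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
If both populations have normal distributions, then the sampling distribution of $\bar{x}_1 - \bar{x}_2$ will also be normally distributed.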

Note:
1. The preceding statement is true for all sample sizes if the populations are normal and the population variances are known.
2. Population variances are usually unknown quantities.
3. Estimate the standard error by using the sample variances: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.
The assumptions for inferences about the difference between two means, $\mu_1 - \mu_2$: The samples are randomly selected from normally distributed populations, and the samples are selected in an independent manner.

No assumptions are made about the population variances. The t distribution is used for the test statistic.
Case 1: t distribution, df calculated. Used when a computer and statistical software compute the number of degrees of freedom. df is a function of both sample sizes and their relative sizes, and both sample variances and their relative sizes.
Case 2: t distribution, df approximated. Used when completing the inference without the aid of a computer or calculator. Use the t distribution with the smaller of $df_1 = n_1 - 1$ or $df_2 = n_2 - 1$ degrees of freedom. This will give conservative results: the true level of confidence for an interval will be slightly higher than reported, and the true p-value and true level of significance will be slightly less than reported.
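The two ways of obtaining degrees of freedom can be compared with a short sketch. It assumes that the "calculated" df of Case 1 is the Welch-Satterthwaite approximation, which is what statistical packages commonly use for the unpooled two-sample t procedure; the slides do not name the formula, so treat that as an assumption. The function names and summary values are hypothetical.

```python
def welch_satterthwaite_df(s1, n1, s2, n2):
    """Calculated df (Case 1), assuming the Welch-Satterthwaite approximation."""
    v1 = s1 ** 2 / n1          # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2          # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """Approximated df (Case 2): the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

# Hypothetical sample standard deviations with sample sizes 18 and 12.
df_calculated = welch_satterthwaite_df(s1=2.0, n1=18, s2=5.0, n2=12)
df_conservative = conservative_df(18, 12)
print(f"calculated df = {df_calculated:.2f}, conservative df = {df_conservative}")
```

Under this approximation the calculated df always lies between the conservative value and $n_1 + n_2 - 2$, so the hand method uses a critical value at least as large as the software's, which is why it is conservative.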

Note: When the difference between A and B is being discussed, it is customary to express the difference as larger minus smaller so that the resulting difference is positive, $A - B > 0$.
Confidence Intervals: The following formula is used for calculating the endpoints of the $1 - \alpha$ confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t(df, \alpha/2) \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where df is the smaller of $df_1$ or $df_2$ if no computer or calculator is used.

Example: A recent study reported the longest average workweeks for non-supervisory employees in private industry to be those of chefs and construction workers. Find a 95% confidence interval for the difference in mean length of workweek between chefs and construction workers. Assume normality for the sampled populations and that the samples were selected randomly.
Solution:
1. Parameter of interest: The difference between the mean hours/week for chefs and the mean hours/week for construction workers, $\mu_1 - \mu_2$.

2. The Confidence Interval Criteria:
a. Assumptions: Both populations are assumed normal and the samples were random and independently selected.
b. Test statistic: t with $df = 11$; the smaller of $n_1 - 1 = 18 - 1 = 17$ or $n_2 - 1 = 12 - 1 = 11$.
c. Confidence level: $1 - \alpha = 0.95$.
3. The Sample Evidence: Sample information given in the table (not preserved in this transcript). Point estimate for $\mu_1 - \mu_2$: $\bar{x}_1 - \bar{x}_2$.
4. The Confidence Interval:
a. Confidence coefficients: $t(df, \alpha/2) = t(11, 0.025) = 2.20$ (Table 6, Appendix B).

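b. Maximum error: $E = t(df, \alpha/2) \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.
c. Confidence limits: $(\bar{x}_1 - \bar{x}_2) \pm E$.
5. The Results: 0.33 to 7.87 is a 95% confidence interval for the difference in mean hours/week for chefs and construction workers. (These limits correspond to a point estimate of 4.10 and a maximum error of 3.77.)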

Note:
1. Using a calculator, the confidence interval has a lower limit of 0.55 (the upper limit is not preserved in this transcript).
2. This confidence interval is narrower than the approximate interval computed on the previous slide. This illustrates the conservative (wider) nature of the confidence interval when approximating the degrees of freedom.
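A confidence interval for the difference between two means (point estimate plus or minus the t critical value times the estimated standard error) can be sketched from summary statistics. The function name and the summary values below are hypothetical, since the table for the workweek example is not available; only $t(11, 0.025) = 2.20$ is taken from the example.

```python
import math

def two_mean_confidence_interval(x1_bar, s1, n1, x2_bar, s2, n2, t_crit):
    """Return (lower, upper) limits for mu1 - mu2 from summary statistics.

    t_crit is t(df, alpha/2); by hand, df is the smaller of n1 - 1 and n2 - 1.
    """
    point_estimate = x1_bar - x2_bar
    std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # unpooled standard error
    max_error = t_crit * std_error
    return point_estimate - max_error, point_estimate + max_error

# Hypothetical summary statistics for samples of sizes 18 and 12.
low, high = two_mean_confidence_interval(
    x1_bar=47.0, s1=6.0, n1=18,
    x2_bar=43.0, s2=3.0, n2=12,
    t_crit=2.20,            # t(11, 0.025) from the table
)
print(f"95% CI for mu1 - mu2: {low:.2f} to {high:.2f}")
```

Replacing `t_crit` with the critical value for the software-calculated df would give the narrower interval described in the note above.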

Hypothesis Tests: To test a null hypothesis about the difference between two population means, use the test statistic
$$t^* = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
where df is the smaller of $df_1$ or $df_2$ when computing $t^*$ without the aid of a computer or calculator.
Note: The hypothesized difference between the two population means, $\mu_1 - \mu_2$, can be any specified value. The most common value is zero.

Example: A recent study compared a new drug to ease post-operative pain with the leading brand. Independent random samples were obtained and the number of hours of pain relief for each patient was recorded. The summary statistics are given in a table (not preserved in this transcript). Is there any evidence to suggest the new drug provides longer relief from post-operative pain? Use $\alpha = 0.05$.
a. Solve using the p-value approach.
b. Solve using the classical approach.

Solution:
1. The Set-up:
a. Parameter of concern: The difference between the mean time of pain relief for the new drug and that for the leading brand.
b. The null and alternative hypotheses:
$H_0: \mu_1 - \mu_2 = 0$ (new drug relieves pain no longer)
$H_a: \mu_1 - \mu_2 > 0$ (new drug works longer to relieve pain)
2. The Hypothesis Test Criteria:
a. Assumptions: Both populations are assumed to be approximately normal. The samples were random and independently selected.
b. Test statistic: $t^*$ with $df = 9$, the smaller of $n_1 - 1 = 10 - 1 = 9$ or $n_2 - 1 = 17 - 1 = 16$.
c. Level of significance: $\alpha = 0.05$.

3. The Sample Evidence:
a. Sample information: Given in the table.
b. Test statistic: $t^*$ (value not preserved in this transcript).
4. The Probability Distribution (Classical Approach):
a. Critical value: $t(df, 0.05) = t(9, 0.05) = 1.83$.
b. $t^*$ is in the critical region.

4. The Probability Distribution (p-Value Approach):
a. The p-value: $P = P(t > t^*)$ with $df = 9$ (value not preserved in this transcript).
b. The p-value is smaller than the level of significance, $\alpha$.
5. The Results:
a. Decision: Reject $H_0$.
b. Conclusion: There is evidence to suggest that the new drug provides longer relief from post-operative pain.
Note: Here are the results using Minitab (the N, Mean, StDev, and SE Mean entries, the confidence level, and the P value are not preserved in this transcript).
Two sample T for New Drug vs Leading Brand
CI for mu New Drug - mu Leading: ( 0.03, 0.813)
T-Test mu New Drug = mu Leading (vs >): T = 2.39, DF = 10
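The test statistic for two independent means (observed difference minus hypothesized difference, divided by the unpooled standard error) can be sketched from summary statistics. The function name and the summary values below are hypothetical, because the table for the drug example is not available; only the sample sizes (10 and 17) and the critical value $t(9, 0.05) = 1.83$ come from the example.

```python
import math

def two_mean_t_statistic(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0.0):
    """Return t* for H0: mu1 - mu2 = hypothesized_diff (variances not pooled)."""
    std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((x1_bar - x2_bar) - hypothesized_diff) / std_error

# Hypothetical hours of pain relief for the new drug and the leading brand.
t_star = two_mean_t_statistic(
    x1_bar=4.5, s1=0.6, n1=10,
    x2_bar=4.0, s2=0.5, n2=17,
)
critical_value = 1.83                     # t(9, 0.05), using the smaller df
reject_h0 = t_star > critical_value       # right-tailed test: Ha is mu1 - mu2 > 0
print(f"t* = {t_star:.2f}, reject H0: {reject_h0}")
```

The statistic itself does not depend on how df is obtained; only the critical value and the p-value change between the hand method and software.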

10.4: Inferences Concerning the Difference Between Proportions Using Two Independent Samples
We are often interested in making statistical comparisons between two proportions, percentages, or probabilities associated with two populations.

Recall: The properties of a binomial experiment.
1. The observed probability is $p' = \frac{x}{n}$, where x is the number of observed successes in n trials.
2. p is the probability of success on an individual trial in a binomial probability experiment of n repeated independent trials, and $q = 1 - p$.
Goal: To compare two population proportions.
Point Estimate: The difference between the observed proportions: $p'_1 - p'_2$.

Sampling Distribution: If independent samples of sizes $n_1$ and $n_2$ are drawn randomly from large populations with $p_1 = P_1(\text{success})$ and $p_2 = P_2(\text{success})$, respectively, then the sampling distribution of $p'_1 - p'_2$ has these properties:
1. a mean, $\mu_{p'_1 - p'_2} = p_1 - p_2$,
2. a standard error, $\sigma_{p'_1 - p'_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$,
3. an approximately normal distribution if $n_1$ and $n_2$ are sufficiently large.

Note: To ensure normality:
1. The sample sizes are both larger than 20.
2. The products $n_1 p_1$, $n_1 q_1$, $n_2 p_2$, $n_2 q_2$ are all larger than 5. Since $p_1$ and $p_2$ are unknown, these products are estimated by $n_1 p'_1$, $n_1 q'_1$, $n_2 p'_2$, $n_2 q'_2$.
3. The samples consist of less than 10% of the respective populations.
The assumptions for inferences about the difference between two proportions, $p_1 - p_2$: The $n_1$ random observations and the $n_2$ random observations forming the two samples are selected independently from two populations that are not changing during the sampling.
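The first two normality guidelines (both sample sizes larger than 20, and all four estimated products larger than 5) are easy to check mechanically. The sketch below is illustrative only: the function name and the counts are hypothetical, and the 10% guideline is not checked because it requires the population sizes.

```python
def normal_approximation_ok(x1, n1, x2, n2, min_n=20, min_product=5):
    """Check the sample-size and product guidelines for two proportions.

    x1 and x2 are the observed numbers of successes, so the estimated
    products n*p' and n*q' are simply the success and failure counts.
    """
    products = [x1, n1 - x1, x2, n2 - x2]
    sizes_ok = n1 > min_n and n2 > min_n
    products_ok = all(count > min_product for count in products)
    return sizes_ok and products_ok

# Hypothetical counts: 15 of 150 and 6 of 120 units needing service.
print(normal_approximation_ok(15, 150, 6, 120))   # both guidelines satisfied
print(normal_approximation_ok(2, 30, 10, 30))     # first product is only 2
```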

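Confidence Intervals:
1. A confidence interval for $p_1 - p_2$ is based on the unbiased sample statistic $p'_1 - p'_2$.
2. The confidence limits are found using the following formula:
$$(p'_1 - p'_2) \pm z(\alpha/2) \cdot \sqrt{\frac{p'_1 q'_1}{n_1} + \frac{p'_2 q'_2}{n_2}}$$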

Example: A consumer group compared the reliability of two similar microcomputers from two different manufacturers. The proportion requiring service within the first year after purchase was determined for samples from each of two manufacturers. Find a 98% confidence interval for $p_1 - p_2$, the difference in proportions needing service.

Solution:
1. Population Parameter of Interest: The difference between the proportion of microcomputers needing service for manufacturer 1 and the proportion of microcomputers needing service for manufacturer 2.
2. The Confidence Interval Criteria:
a. Assumptions: Sample sizes are larger than 20. Products are all larger than 5. The sampling distribution of $p'_1 - p'_2$ should be approximately normal.
b. Test statistic: $z^*$.
c. Confidence level: $1 - \alpha = 0.98$.

3. The Sample Evidence: Sample information: $n_1$, $x_1$, $n_2$, $x_2$ (values not preserved in this transcript). Point estimate: $p'_1 - p'_2$.
4. The Confidence Interval:
a. Confidence coefficients: $z(\alpha/2) = z(0.01) = 2.33$.

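b. Maximum error: $E = z(\alpha/2) \cdot \sqrt{\frac{p'_1 q'_1}{n_1} + \frac{p'_2 q'_2}{n_2}}$.
c. Confidence limits: $(p'_1 - p'_2) \pm E$.
d. Results: The resulting interval (numerical limits not preserved in this transcript) is a 98% confidence interval for the difference in proportions.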
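The interval for a difference in proportions (observed difference plus or minus the z critical value times the estimated standard error) can be sketched as follows. The function name and the counts are hypothetical, since the sample data for the microcomputer example are not available; only $z(0.01) = 2.33$ is taken from the example.

```python
import math

def two_proportion_confidence_interval(x1, n1, x2, n2, z_crit):
    """Return (lower, upper) limits for p1 - p2 from success counts.

    z_crit is z(alpha/2), read from the standard normal table.
    """
    p1, p2 = x1 / n1, x2 / n2            # observed proportions p'
    q1, q2 = 1 - p1, 1 - p2
    std_error = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    max_error = z_crit * std_error
    diff = p1 - p2
    return diff - max_error, diff + max_error

# Hypothetical counts: 15 of 150 and 6 of 120 units needed service.
low, high = two_proportion_confidence_interval(15, 150, 6, 120, z_crit=2.33)
print(f"98% CI for p1 - p2: {low:.3f} to {high:.3f}")
```

If the resulting interval contains zero, the data are consistent with no difference between the two proportions at that confidence level.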

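Hypothesis Tests: If the null hypothesis is that there is no difference between proportions, the test statistic is
$$z^* = \frac{p'_1 - p'_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Note:
1. The null hypothesis is $p_1 = p_2$, or $p_1 - p_2 = 0$.
2. Nonzero differences between proportions are not discussed in this section.
3. The numerator of the formula above is $(p'_1 - p'_2) - (p_1 - p_2)$, which reduces to $p'_1 - p'_2$ because the hypothesized difference $p_1 - p_2$ is 0.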

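Note (continued):
4. Since the null hypothesis is $p_1 = p_2$, the standard error of the point estimate is $\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $p = p_1 = p_2$ and $q = 1 - p$.
5. Use a pooled estimate for the common proportion: $p'_p = \frac{x_1 + x_2}{n_1 + n_2}$ and $q'_p = 1 - p'_p$.
The test statistic becomes
$$z^* = \frac{p'_1 - p'_2}{\sqrt{p'_p\, q'_p \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$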

Example: The proportions of defective parts from two different suppliers were compared. The following data were collected (table not preserved in this transcript). Is there any evidence to suggest the proportion of defectives is different for the two suppliers? Use $\alpha = 0.01$.
a. Solve using the classical approach.
b. Solve using the p-value approach.
Solution:
1. The Set-up:
a. Population parameter of interest: The difference between the proportion of defectives for supplier 1 and the proportion of defectives for supplier 2.

b. The null and alternative hypotheses:
$H_0: p_1 - p_2 = 0$ (proportion of defectives the same)
$H_a: p_1 - p_2 \ne 0$ (proportion of defectives different)
2. The Hypothesis Test Criteria:
a. Assumptions: Populations are large (number of parts supplied). Samples are larger than 20. Products are larger than 5. The sampling distribution should be approximately normal.
b. Test statistic: $z^*$.
c. Level of significance: $\alpha = 0.01$.

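3. The Sample Evidence:
a. Sample information: $n_1$, $x_1$, $n_2$, $x_2$ (values not preserved in this transcript).
b. Test statistic: $z^*$ (value not preserved in this transcript).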

4. The Probability Distribution (Classical Approach):
a. Critical value: $z(\alpha/2) = z(0.005) = 2.58$.
b. $z^*$ is not in the critical region.
4. The Probability Distribution (p-Value Approach):
a. The p-value: $P = 2 \cdot P(z > |z^*|)$ (value not preserved in this transcript).
b. The p-value is larger than the level of significance, $\alpha$.
5. The Results:
a. Decision: Do not reject $H_0$.
b. Conclusion: There is no evidence to suggest the proportion of defectives is different for the two suppliers.
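The pooled test for equal proportions can be sketched with the standard library alone: pool the two samples to estimate the common proportion, form $z^*$, and obtain a two-tailed p-value from the standard normal distribution. The function name and the defect counts are hypothetical (the supplier data are not available); `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

def pooled_two_proportion_test(x1, n1, x2, n2):
    """Return (z_star, two_tailed_p) for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2                  # observed proportions p'
    p_pooled = (x1 + x2) / (n1 + n2)           # pooled estimate of the common p
    q_pooled = 1 - p_pooled
    std_error = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    z_star = (p1 - p2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z_star)))
    return z_star, p_value

# Hypothetical counts of defective parts: 10 of 200 and 14 of 250.
z_star, p_value = pooled_two_proportion_test(10, 200, 14, 250)
alpha = 0.01
print(f"z* = {z_star:.2f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

For a one-tailed alternative, the p-value would be the single tail area beyond $z^*$ rather than twice that area.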