Published by Hillary Oliver. Modified over 9 years ago.
1
Chapter 10: Inferences Involving Two Populations
2
Chapter Goals: Distinguish independent from dependent samples. Compare two populations using the mean of the paired differences, the difference between two means, and the difference between two proportions.
3
10.1: Independent and Dependent Samples
There are two basic kinds of samples: independent and dependent. Whether two samples are dependent or independent is determined by the sources used to obtain the data.
4
Source: Can be a person, an object, or anything that yields a piece of data. Dependent Sampling: The same set of sources or related sets are used to obtain the data representing both populations. Independent Sampling: Two unrelated sets or sources are used, one set from each population.
5
Example: An experiment is designed to study the effect of classical music on mathematical ability. Thirty-five students are selected at random and given a basic mathematics skills test. The following day the same 35 students first listen to 45 minutes of classical music, and then take a similar mathematics skills test. This sampling plan illustrates dependent sampling. The sources used for both samples (without classical music and with classical music) are the same. Note: Typically, when both a pretest and a posttest are used, the same subjects are used in the study. Thus, this kind of sampling plan usually leads to dependent samples.
6
Example: A university would like to compare the math SAT scores of male and female first-year students. Fifty males and 40 females are selected at random and their math SAT scores are recorded. This sampling plan illustrates independent sampling. The sources (students) used for each sample (male and female) were selected separately.
7
Example: The number of items produced by two different assembly lines is to be compared. Twenty-five days are randomly selected for assembly line 1 and the number of items produced on each day is recorded. Forty days are randomly selected for assembly line 2 and the number of items produced on each day is recorded. This sampling plan illustrates independent sampling. The sources (assembly lines) used for each sample (line 1 and 2) were selected separately.
8
10.2: Inferences Concerning the Mean Difference Using Two Dependent Samples
When dependent samples are involved, the data are paired data. Paired data result from before-and-after studies, from a common source, or from matched pairs.
9
Paired difference: $d = x_1 - x_2$
1. Using the differences removes the dependence in the data.
2. It also removes the effect of otherwise uncontrolled factors.
3. The difference between the two population means, when dependent samples are used, is equivalent to the mean of the paired differences.
4. An inference about the mean of the paired differences is an inference about the difference of two means.
5. The mean of the sample paired differences is used as a point estimate for these inferences.
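Point 3 can be checked numerically. The sketch below uses made-up before/after scores (hypothetical values, not data from the slides) and shows that the mean of the paired differences equals the difference between the two sample means.

```python
from statistics import mean

# Hypothetical before/after scores for the same five subjects (illustrative only).
before = [72, 65, 80, 58, 90]
after = [75, 70, 79, 64, 94]

# One paired difference per source (subject): d = x1 - x2.
differences = [a - b for a, b in zip(after, before)]

mean_of_differences = mean(differences)
difference_of_means = mean(after) - mean(before)

print(differences)
print(mean_of_differences, difference_of_means)
```

The two printed values agree up to floating-point rounding, which is why an inference about the mean paired difference is also an inference about the difference of the two means.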
10
Sampling Distribution of $\bar{d}$
1. When paired observations are randomly selected from normal populations, the paired difference, $d = x_1 - x_2$, will be approximately normally distributed about a mean $\mu_d$ with a standard deviation $\sigma_d$.
2. Use a t-test for one mean: make an inference about an unknown mean ($\mu_d$) where d has an approximately normal distribution with an unknown standard deviation ($\sigma_d$).
3. Inferences are based on a sample of n dependent pairs of data and the t distribution with $n - 1$ degrees of freedom.
The assumption for inferences about the mean of paired differences ($\mu_d$): The paired data are randomly selected from normally distributed populations.
11
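Confidence Interval: The $1 - \alpha$ confidence interval for estimating the mean difference $\mu_d$ is found using the formula:
$\bar{d} \pm t(df, \alpha/2) \cdot \frac{s_d}{\sqrt{n}}$, where $df = n - 1$,
where $\bar{d}$ is the mean of the sample differences:
$\bar{d} = \frac{\sum d}{n}$
and $s_d$ is the standard deviation of the sample differences:
$s_d = \sqrt{\frac{\sum d^2 - (\sum d)^2 / n}{n - 1}}$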
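As a minimal sketch of the interval $\bar{d} \pm t(df, \alpha/2) \cdot s_d / \sqrt{n}$, the function below computes the limits from a list of paired differences. The function name and the sample differences are hypothetical; the critical value is passed in from a t table, as the slides do, rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def paired_confidence_interval(differences, t_critical):
    """Return (lower, upper) limits for the mean paired difference.

    t_critical is t(df, alpha/2) with df = n - 1, looked up in a t table.
    """
    n = len(differences)
    d_bar = mean(differences)   # point estimate of mu_d
    s_d = stdev(differences)    # sample standard deviation (divides by n - 1)
    max_error = t_critical * s_d / sqrt(n)
    return d_bar - max_error, d_bar + max_error

# Hypothetical differences; t(4, 0.025) = 2.776 gives a 95% interval.
low, high = paired_confidence_interval([1, 2, 3, 4, 5], t_critical=2.776)
print(round(low, 3), round(high, 3))
```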
12
Example: Salt-free diets are often prescribed for people with high blood pressure. The following data was obtained from an experiment designed to estimate the reduction in diastolic blood pressure as a result of following a salt-free diet for two weeks. Assume diastolic readings to be normally distributed. Find a 99% confidence interval for the mean reduction.
13
Solution:
1. Population Parameter of Interest: The mean reduction (difference) in diastolic blood pressure.
2. The Confidence Interval Criteria:
a. Assumptions: Both sample populations are assumed normal.
b. Test statistic: t with $df = 8 - 1 = 7$.
c. Confidence level: $1 - \alpha = 0.99$.
3. Sample evidence: Sample information:
14
4. The Confidence Interval:
a. Confidence coefficients: Two-tailed situation, $\alpha/2 = 0.005$; $t(df, \alpha/2) = t(7, 0.005) = 3.50$.
b. Maximum error:
c. Confidence limits:
5. The Results: 1.957 to 3.957 is the 99% confidence interval for $\mu_d$.
15
Hypothesis Testing: When testing a null hypothesis about the mean difference, the test statistic is
$t^\star = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$
where $t^\star$ has a t distribution with $df = n - 1$.
Example: The corrosive effects of various chemicals on normal and specially treated pipes were tested by using a dependent sampling plan. The data collected is summarized by where d is the amount of corrosion on the treated pipe subtracted from the amount of corrosion on the normal pipe.
16
Example (continued): Does this sample provide sufficient evidence to conclude the specially treated pipes are more resistant to corrosion? Use $\alpha = 0.05$.
a. Solve using the p-value approach.
b. Solve using the classical approach.
Solution:
1. The Set-up:
a. Population parameter of concern: The mean difference in corrosion, normal pipe $-$ treated pipe.
b. The null and alternative hypotheses:
$H_0: \mu_d = 0$ ($\le$) (did not lower corrosion)
$H_a: \mu_d > 0$ (did lower corrosion)
17
2. The Hypothesis Test Criteria:
a. Assumptions: Assume corrosion measures are approximately normal.
b. Test statistic:
c. Level of significance: $\alpha = 0.05$.
3. The Sample Evidence:
a. Sample information:
b. Calculate the value of the test statistic:
18
4. The Probability Distribution (Classical Approach):
a. Critical value: $t(16, 0.05) = 1.75$.
b. $t^\star$ is in the critical region.
4. The Probability Distribution (p-Value Approach):
a. The p-value:
b. The p-value is smaller than the level of significance, $\alpha$.
5. The Results:
a. Decision: Reject $H_0$.
b. Conclusion: At the 0.05 level of significance, there is evidence to suggest the treated pipes do not corrode as much as the normal pipes when subjected to chemicals.
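The classical approach for a paired test, $t^\star = (\bar{d} - \mu_d) / (s_d / \sqrt{n})$ compared against a table value, can be sketched as follows. The differences are hypothetical (the pipe data are not reproduced in this transcript), and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(differences, hypothesized_mean=0.0):
    """Compute t* for H0: mu_d = hypothesized_mean from paired differences."""
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return (mean(differences) - hypothesized_mean) / standard_error

# Hypothetical paired differences (n = 6, so df = 5).
differences = [2, 4, 3, 5, 1, 3]
t_star = paired_t_statistic(differences)

# One-tailed test, Ha: mu_d > 0, alpha = 0.05; table value t(5, 0.05) = 2.02.
t_critical = 2.02
reject_h0 = t_star > t_critical
print(round(t_star, 3), reject_h0)
```

For a right-tailed alternative, the null hypothesis is rejected only when $t^\star$ exceeds the critical value.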
19
10.3: Inferences Concerning the Difference Between Means Using Two Independent Samples
When comparing the means of two populations, consider the difference between their means: $\mu_1 - \mu_2$. Inferences are based on the difference between the sample means, $\bar{x}_1 - \bar{x}_2$.
20
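Distribution of $\bar{x}_1 - \bar{x}_2$
If independent samples of sizes $n_1$ and $n_2$ are drawn randomly from large populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, the sampling distribution of $\bar{x}_1 - \bar{x}_2$, the difference between the sample means, has
1. a mean, $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$,
2. a standard error, $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
If both populations have normal distributions, then the sampling distribution of $\bar{x}_1 - \bar{x}_2$ will also be normally distributed.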
21
Note:
1. The preceding statement is true for all sample sizes if the populations are normal and the population variances are known.
2. Population variances are usually unknown quantities.
3. Estimate the standard error by using the sample variances.
The assumptions for inferences about the difference between two means, $\mu_1 - \mu_2$: The samples are randomly selected from normally distributed populations, and the samples are selected in an independent manner.
22
No assumptions are made about the population variances. The t distribution will be used as the test statistic.
Case 1: t distribution, df calculated. Used when a computer with statistical software computes the number of degrees of freedom. df is a function of both sample sizes and their relative sizes, and both sample variances and their relative sizes.
Case 2: t distribution, df approximated. Used when completing the inference without the aid of a computer or calculator. Use the t distribution with the smaller of $df_1 = n_1 - 1$ or $df_2 = n_2 - 1$ degrees of freedom. This will give conservative results. The true level of confidence for an interval will be slightly higher. The true p-value and true level of significance will be slightly less than reported.
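The two cases can be compared numerically. The sketch below assumes that "df calculated" refers to the Welch-Satterthwaite approximation, which statistical software commonly uses but which the slides do not name; the function names are illustrative. The sample sizes and standard deviations are the ones shown in the Minitab output for the pain-relief example later in this section.

```python
def conservative_df(n1, n2):
    """Case 2: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_satterthwaite_df(s1, n1, s2, n2):
    """Case 1 (assumed formula): Welch-Satterthwaite degrees of freedom."""
    v1 = s1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Summary statistics from the pain-relief example (New Drug vs Leading Brand).
df_case_1 = welch_satterthwaite_df(0.542, 10, 0.169, 17)
df_case_2 = conservative_df(10, 17)
print(round(df_case_1, 2), df_case_2)
```

Under this assumption the calculated value is slightly above 10, consistent with the DF = 10 reported in that output, while the approximation gives 9. A smaller df means a larger critical value, which is what makes Case 2 conservative.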
23
Note: When the difference between A and B is being discussed, it is customary to express the difference as larger subtract smaller so that the resulting difference is positive, $A - B > 0$.
Confidence Intervals: The following formula is used for calculating the endpoints of the $1 - \alpha$ confidence interval.
$(\bar{x}_1 - \bar{x}_2) \pm t(df, \alpha/2) \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
where df is the smaller of $df_1$ or $df_2$ if no computer or calculator is used.
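A minimal sketch of the interval $(\bar{x}_1 - \bar{x}_2) \pm t(df, \alpha/2) \sqrt{s_1^2/n_1 + s_2^2/n_2}$ from summary statistics follows. The function name is illustrative. The inputs are the summary statistics printed in the Minitab output of the pain-relief example later in this section, and the critical value t(10, 0.025) = 2.228 is a table value supplied by hand, chosen to match the DF = 10 reported there.

```python
from math import sqrt

def two_mean_confidence_interval(mean1, s1, n1, mean2, s2, n2, t_critical):
    """Return (lower, upper) limits for mu1 - mu2 using independent samples."""
    point_estimate = mean1 - mean2
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # estimated from sample variances
    max_error = t_critical * standard_error
    return point_estimate - max_error, point_estimate + max_error

# New Drug: n = 10, mean 4.350, sd 0.542; Leading Brand: n = 17, mean 3.929, sd 0.169.
low, high = two_mean_confidence_interval(4.350, 0.542, 10, 3.929, 0.169, 17,
                                         t_critical=2.228)
print(round(low, 2), round(high, 2))
```

Passing the larger table value for the conservative df (the smaller of $n_1 - 1$ and $n_2 - 1$) widens the interval.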
24
Example: A recent study reported that the longest average workweeks for non-supervisory employees in private industry are those of chefs and construction workers. Find a 95% confidence interval for the difference in mean length of workweek between chefs and construction workers. Assume normality for the sampled populations and that the samples were selected randomly.
Solution:
1. Parameter of interest: The difference between the mean hours/week for chefs and the mean hours/week for construction workers, $\mu_1 - \mu_2$.
25
2. The Confidence Interval Criteria:
a. Assumptions: Both populations are assumed normal and the samples were random and independently selected.
b. Test statistic: t with df = 11; the smaller of $n_1 - 1 = 18 - 1 = 17$ or $n_2 - 1 = 12 - 1 = 11$.
c. Confidence level: $1 - \alpha = 0.95$.
3. The Sample Evidence: Sample information given in the table. Point estimate for $\mu_1 - \mu_2$:
4. The Confidence Interval:
a. Confidence coefficients: $t(df, \alpha/2) = t(11, 0.025) = 2.20$ (Table 6, Appendix B)
26
b. Maximum error:
c. Confidence limits:
5. The Results: 0.33 to 7.87 is a 95% confidence interval for the difference in mean hours/week for chefs and construction workers.
27
Note:
1. Using a calculator, the confidence interval is 0.55 to 7.65.
2. This confidence interval is narrower than the approximate interval computed on the previous slide. This illustrates the conservative (wider) nature of the confidence interval when approximating the degrees of freedom.
28
Hypothesis Tests: To test a null hypothesis about the difference between two population means, use the test statistic
$t^\star = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
where df is the smaller of $df_1$ or $df_2$ when computing $t^\star$ without the aid of a computer or calculator.
Note: The hypothesized difference between the two population means, $\mu_1 - \mu_2$, can be any specified value. The most common value is zero.
29
Example: A recent study compared a new drug to ease post-operative pain with the leading brand. Independent random samples were obtained and the number of hours of pain relief for each patient was recorded. The summary statistics are given in the table below. Is there any evidence to suggest the new drug provides longer relief from post-operative pain? Use $\alpha = 0.05$.
a. Solve using the p-value approach.
b. Solve using the classical approach.
30
Solution:
1. The Set-up:
a. Parameter of concern: The difference between the mean time of pain relief for the new drug and that for the leading brand.
b. The null and alternative hypotheses:
$H_0: \mu_1 - \mu_2 = 0$ (new drug relieves pain no longer)
$H_a: \mu_1 - \mu_2 > 0$ (new drug works longer to relieve pain)
2. The Hypothesis Test Criteria:
a. Assumptions: Both populations are assumed to be approximately normal. The samples were random and independently selected.
b. Test statistic: $t^\star$ with df = 9, the smaller of $n_1 - 1 = 10 - 1 = 9$ or $n_2 - 1 = 17 - 1 = 16$.
c. Level of significance: $\alpha = 0.05$.
31
3. The Sample Evidence:
a. Sample information: Given in the table.
b. Test statistic:
4. The Probability Distribution (Classical Approach):
a. Critical value: $t(df, 0.05) = t(9, 0.05) = 1.83$.
b. $t^\star$ is in the critical region.
32
4. The Probability Distribution (p-Value Approach):
a. The p-value:
b. The p-value is smaller than the level of significance, $\alpha$.
5. The Results:
a. Decision: Reject $H_0$.
b. Conclusion: There is evidence to suggest that the new drug provides longer relief from post-operative pain.
Note: Here are the results using Minitab.
Two sample T for New Drug vs Leading Brand
            N    Mean   StDev  SE Mean
New Drug   10   4.350   0.542    0.17
Leading    17   3.929   0.169    0.041
95% CI for mu New Drug - mu Leading: (0.03, 0.813)
T-Test mu New Drug = mu Leading (vs >): T = 2.39  P = 0.019  DF = 10
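The test statistic for this example can be recomputed from the summary values printed in the Minitab output (N, Mean, StDev). The sketch below uses the formula $t^\star = ((\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$ and the classical critical value t(9, 0.05) = 1.83 from the solution; the function name is illustrative.

```python
from math import sqrt

def two_mean_t_statistic(mean1, s1, n1, mean2, s2, n2, hypothesized_difference=0.0):
    """Compute t* for H0: mu1 - mu2 = hypothesized_difference (independent samples)."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_difference) / standard_error

# New Drug: n = 10, mean 4.350, sd 0.542; Leading Brand: n = 17, mean 3.929, sd 0.169.
t_star = two_mean_t_statistic(4.350, 0.542, 10, 3.929, 0.169, 17)

# Right-tailed test at alpha = 0.05 with the conservative df = 9.
t_critical = 1.83
reject_h0 = t_star > t_critical
print(round(t_star, 2), reject_h0)
```

The rounded statistic agrees with the T = 2.39 reported by Minitab, and it falls in the critical region.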
33
10.4: Inferences Concerning the Difference Between Proportions Using Two Independent Samples
We are often interested in making statistical comparisons between two proportions, percentages, or probabilities associated with two populations.
34
Recall: The properties of a binomial experiment.
1. The observed probability is $p' = x/n$, where x is the number of observed successes in n trials.
2. $q = 1 - p$.
3. p is the probability of success on an individual trial in a binomial probability experiment of n repeated independent trials.
Goal: To compare two population proportions.
Point Estimate: The difference between the observed proportions: $p'_1 - p'_2$.
35
Sampling Distribution: If independent samples of sizes $n_1$ and $n_2$ are drawn randomly from large populations with $p_1 = P_1(\text{success})$ and $p_2 = P_2(\text{success})$, respectively, then the sampling distribution of $p'_1 - p'_2$ has these properties:
1. a mean, $\mu_{p'_1 - p'_2} = p_1 - p_2$,
2. a standard error, $\sigma_{p'_1 - p'_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$,
3. an approximately normal distribution if $n_1$ and $n_2$ are sufficiently large.
36
Note: To ensure normality:
1. The sample sizes are both larger than 20.
2. The products $n_1 p_1$, $n_1 q_1$, $n_2 p_2$, $n_2 q_2$ are all larger than 5. Since $p_1$ and $p_2$ are unknown, these products are estimated by $n_1 p'_1$, $n_1 q'_1$, $n_2 p'_2$, $n_2 q'_2$.
3. The samples consist of less than 10% of respective populations.
The assumptions for inferences about the difference between two proportions, $p_1 - p_2$: The $n_1$ random observations and the $n_2$ random observations forming the two samples are selected independently from two populations that are not changing during the sampling.
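The first two conditions are easy to check mechanically. The helper below is a hypothetical sketch (the name and the example counts are not from the slides). It uses the fact that the estimated products $n p'$ and $n q'$ are simply the observed numbers of successes and failures. The third condition depends on the population sizes and is not checked here.

```python
def normal_approximation_ok(x1, n1, x2, n2):
    """Check sample sizes > 20 and estimated products n*p', n*q' > 5.

    x1, x2 are observed successes; n1, n2 are sample sizes.
    Since p' = x/n, the product n*p' is x and n*q' is n - x.
    """
    sizes_ok = n1 > 20 and n2 > 20
    products = [x1, n1 - x1, x2, n2 - x2]
    products_ok = all(product > 5 for product in products)
    return sizes_ok and products_ok

# Hypothetical counts for illustration.
print(normal_approximation_ok(30, 100, 20, 100))  # both conditions hold
print(normal_approximation_ok(2, 25, 10, 100))    # only 2 successes in sample 1
```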
37
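Confidence Intervals:
1. A confidence interval for $p_1 - p_2$ is based on the unbiased sample statistic $p'_1 - p'_2$.
2. The confidence limits are found using the following formula:
$(p'_1 - p'_2) \pm z(\alpha/2) \cdot \sqrt{\frac{p'_1 q'_1}{n_1} + \frac{p'_2 q'_2}{n_2}}$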
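A minimal sketch of the interval $(p'_1 - p'_2) \pm z(\alpha/2) \sqrt{p'_1 q'_1 / n_1 + p'_2 q'_2 / n_2}$ is given below. The function name and the counts are hypothetical, not taken from the slides; the critical value is a table value passed in by the caller.

```python
from math import sqrt

def two_proportion_confidence_interval(x1, n1, x2, n2, z_critical):
    """Return (lower, upper) limits for p1 - p2 from observed success counts."""
    p1, p2 = x1 / n1, x2 / n2          # observed proportions p'
    q1, q2 = 1 - p1, 1 - p2
    standard_error = sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    max_error = z_critical * standard_error
    point_estimate = p1 - p2
    return point_estimate - max_error, point_estimate + max_error

# Hypothetical counts: 30 of 100 versus 20 of 100; z(0.01) = 2.33 for 98% confidence.
low, high = two_proportion_confidence_interval(30, 100, 20, 100, z_critical=2.33)
print(round(low, 4), round(high, 4))
```

With these made-up counts the interval contains zero, so at this confidence level the data would not distinguish the two proportions.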
38
Example: A consumer group compared the reliability of two similar microcomputers from two different manufacturers. The proportion requiring service within the first year after purchase was determined for samples from each of two manufacturers. Find a 98% confidence interval for $p_1 - p_2$, the difference in proportions needing service.
39
Solution:
1. Population Parameter of Interest: The difference between the proportion of microcomputers needing service for manufacturer 1 and the proportion of microcomputers needing service for manufacturer 2.
2. The Confidence Interval Criteria:
a. Assumptions: Sample sizes larger than 20. Products all larger than 5. The sampling distribution of $p'_1 - p'_2$ should be approximately normal.
b. Test statistic: $z^\star$.
c. Confidence level: $1 - \alpha = 0.98$.
40
3. The Sample Evidence: Sample information: Point estimate:
4. The Confidence Interval:
a. Confidence coefficients: $z(\alpha/2) = z(0.01) = 2.33$.
41
b. Maximum error:
c. Confidence limits:
d. Results: 0.0124 to 0.1324 is a 98% confidence interval for the difference in proportions.
42
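Hypothesis Tests: If the null hypothesis is that there is no difference between proportions, the test statistic is
$z^\star = \frac{p'_1 - p'_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Note:
1. The null hypothesis is $p_1 = p_2$, or $p_1 - p_2 = 0$.
2. Nonzero differences between proportions are not discussed in this section.
3. The numerator of the formula above: $(p'_1 - p'_2) - (p_1 - p_2) = p'_1 - p'_2$, since $p_1 - p_2 = 0$ under the null hypothesis.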
43
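Note (continued):
4. Since the null hypothesis is $p_1 = p_2$, the standard error of the point estimate is
$\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
where $p = p_1 = p_2$ and $q = 1 - p$.
5. Use a pooled estimate for the common proportion:
$p'_p = \frac{x_1 + x_2}{n_1 + n_2}$, $q'_p = 1 - p'_p$.
The test statistic becomes
$z^\star = \frac{p'_1 - p'_2}{\sqrt{p'_p \, q'_p \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$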
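The pooled test can be sketched as follows: pool the successes to estimate the common proportion, form $z^\star$, and obtain a two-tailed p-value from the standard normal distribution. The function name and the counts are hypothetical; `statistics.NormalDist` (Python 3.8+) stands in for a normal table.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_proportion_z(x1, n1, x2, n2):
    """Compute z* for H0: p1 = p2 using the pooled estimate of the common proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled_p = (x1 + x2) / (n1 + n2)   # pooled estimate of p
    pooled_q = 1 - pooled_p
    standard_error = sqrt(pooled_p * pooled_q * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# Hypothetical counts: 30 of 100 versus 20 of 100.
z_star = pooled_two_proportion_z(30, 100, 20, 100)

# Two-tailed p-value for Ha: p1 - p2 != 0.
p_value = 2 * (1 - NormalDist().cdf(abs(z_star)))
print(round(z_star, 2), round(p_value, 3))
```

For a one-tailed alternative, the factor of 2 would be dropped and the tail chosen to match the direction of the alternative hypothesis.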
44
Example: The proportions of defective parts from two different suppliers were compared. The following data was collected. Is there any evidence to suggest the proportion of defectives is different for the two suppliers? Use $\alpha = 0.01$.
a. Solve using the classical approach.
b. Solve using the p-value approach.
Solution:
1. The Set-up:
a. Population parameter of interest: The difference between the proportion of defectives for supplier 1 and the proportion of defectives for supplier 2.
45
b. The null and alternative hypotheses:
$H_0: p_1 - p_2 = 0$ (proportion of defectives the same)
$H_a: p_1 - p_2 \ne 0$ (proportion of defectives different)
2. The Hypothesis Test Criteria:
a. Assumptions: Populations are large (number of parts supplied). Samples are larger than 20. Products are larger than 5. Sampling distribution should be approximately normal.
b. Test statistic: $z^\star$.
c. Level of significance: $\alpha = 0.01$.
46
3. The Sample Evidence:
a. Sample information:
b. Test statistic:
47
4. The Probability Distribution (Classical Approach):
a. Critical value: $z(\alpha/2) = z(0.005) = 2.58$.
b. $z^\star$ is not in the critical region.
4. The Probability Distribution (p-Value Approach):
a. The p-value:
b. The p-value is larger than the level of significance, $\alpha$.
5. The Results:
a. Decision: Do not reject $H_0$.
b. Conclusion: There is no evidence to suggest the proportion of defectives is different for the two suppliers.