1
Business Statistics for Managerial Decision
Inference for proportions
2
Inference for Proportions
Some statistical studies concern variables measured on a scale of equal units, such as dollars or grams. We discussed inference about the mean of variables like these in our previous lectures. Other studies record categorical variables, such as the race or occupation of a person, the make of a car, or the type of complaint received from a customer. When we record categorical variables, our data consist of counts or of percents obtained from counts.
3
Inference for Proportions
The parameters we want to do inference about in these settings are population proportions. Just as in the case of inference about population means, we may be concerned with a single population or with comparing two populations. Inference about one or two proportions is very similar to inference about means and it is based on sampling distributions that are approximately Normal.
4
Example: Work stress and personal life
The human resources manager of a restaurant chain is concerned that work stress may be affecting the chain’s employees. She asks a random sample of 100 employees to respond Yes or No to the question “Does work stress have a negative impact on your personal life?” Of these, 68 say “Yes.”
5
Example: Work stress and personal life
The parameter of interest is the proportion of the chain’s employees who would answer “Yes” if asked. This is the population proportion, which we call p. The statistic used to estimate the unknown parameter is the sample proportion $\hat{p} = X/n = 68/100 = 0.68$.
6
Inference for a Single Proportion
The sample proportion $\hat{p}$ is a discrete random variable that can take the values 0, 1/100, 2/100, …, 99/100, or 1. The probability model for $\hat{p}$ can be based on the Binomial distributions for counts. If the sample size n is very small, we must base tests and confidence intervals for p on the discrete distribution of $\hat{p}$. When the sample size is large, we can approximate the distribution of $\hat{p}$ by a Normal distribution.
7
Sampling Distribution of a Sample Proportion
Choose an SRS of size n from a large population that contains population proportion p of “successes.” Let $\hat{p} = X/n$ be the sample proportion of successes, where X is the count of successes in the sample. Then:
- As the sample size increases, the sampling distribution of $\hat{p}$ becomes approximately Normal.
- The mean of the sampling distribution is p.
- The standard deviation of the sampling distribution is $\sqrt{p(1-p)/n}$.
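The mean and standard deviation stated above can be checked numerically. The following Python sketch computes the exact mean and standard deviation of the sample proportion from the Binomial distribution of the count and compares them with p and $\sqrt{p(1-p)/n}$. The function name and the values n = 100 and p = 0.68 are illustrative assumptions, not part of the lecture.

```python
import math

def exact_moments_of_sample_proportion(n, p):
    """Exact mean and standard deviation of p_hat = X/n for X ~ Binomial(n, p)."""
    # Binomial probabilities P(X = k) for k = 0, 1, ..., n
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * prob for k, prob in enumerate(pmf))
    variance = sum((k / n - mean) ** 2 * prob for k, prob in enumerate(pmf))
    return mean, math.sqrt(variance)

# Illustrative values (assumed): n = 100 employees, true proportion p = 0.68
n, p = 100, 0.68
mean, sd = exact_moments_of_sample_proportion(n, p)
formula_sd = math.sqrt(p * (1 - p) / n)

print("exact mean of p_hat:", round(mean, 4))
print("exact sd of p_hat:  ", round(sd, 4))
print("sqrt(p(1-p)/n):     ", round(formula_sd, 4))
```

The exact values agree with the formulas up to floating-point rounding; the Normal shape itself is only an approximation that improves as n grows.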
8
Sampling Distribution of a Sample Proportion
The sampling distribution of the sample proportion of successes has approximately a Normal distribution.
9
Confidence Interval for a Single Proportion
The sample proportion $\hat{p}$ is the natural estimator of the population proportion p. The traditional confidence interval for p is based on the Normal approximation to the distribution of $\hat{p}$. Unfortunately, confidence intervals based on this statistic can be quite inaccurate, even for large samples. We can do better by moving the sample proportion slightly away from 0 and 1. The following simple adjustment works very well in practice.
10
Confidence Interval for a Single Proportion
Wilson estimate: Assume we have 4 additional observations, 2 of which are successes and 2 of which are failures. The new sample size is n + 4 and the count of successes is X + 2. The estimator of the population proportion is $\tilde{p} = (X + 2)/(n + 4)$.
11
Confidence Interval for a Single Proportion
We base a confidence interval on the z statistic obtained by standardizing the Wilson estimate $\tilde{p}$. The distribution of $\tilde{p}$ is close to the Normal distribution with mean p and standard deviation $\sqrt{p(1-p)/(n+4)}$.
12
Confidence Interval for a Single Proportion
Choose an SRS of size n from a large population with unknown proportion p of successes.
The Wilson estimate of the population proportion is $\tilde{p} = (X + 2)/(n + 4)$.
The standard error of $\tilde{p}$ is $\mathrm{SE}_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$.
An approximate level C confidence interval for p is $\tilde{p} \pm z^* \, \mathrm{SE}_{\tilde{p}}$, where z* is the value for the standard Normal density curve with area C between -z* and z*.
Use this interval when the sample size is at least n = 5 and the confidence level is 90% or more.
13
Example: estimating the effect of work stress
The sample survey in the previous example found that 68 out of 100 employees agreed that work stress had a negative impact on their personal lives. The sample size is n = 100 and the count of successes is X = 68. The Wilson estimate of the proportion of all employees affected by work stress is $\tilde{p} = (68 + 2)/(100 + 4) = 70/104 = 0.673$. The standard error is $\mathrm{SE}_{\tilde{p}} = \sqrt{(0.673)(1 - 0.673)/104} = 0.046$.
14
Example: estimating the effect of work stress
The z critical value for 95% confidence is z* = 1.96, so the confidence interval is $\tilde{p} \pm z^* \, \mathrm{SE}_{\tilde{p}} = 0.673 \pm (1.96)(0.046) = 0.673 \pm 0.090 = (0.583, 0.763)$. We are 95% confident that between 58.3% and 76.3% of the restaurant chain’s employees feel that work stress is damaging their personal lives.
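The interval for the work stress example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `plus_four_interval` is a hypothetical helper, and the critical value comes from `statistics.NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist

def plus_four_interval(successes, n, confidence=0.95):
    """Confidence interval for p based on the Wilson (plus-four) estimate."""
    # z* leaves area `confidence` between -z* and z* under the standard Normal curve
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_tilde = (successes + 2) / (n + 4)                # Wilson estimate
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))  # standard error
    margin = z_star * se
    return p_tilde, se, (p_tilde - margin, p_tilde + margin)

# Work stress survey: 68 "Yes" answers among 100 employees
p_tilde, se, (low, high) = plus_four_interval(68, 100)
print(f"estimate = {p_tilde:.3f}, SE = {se:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With X = 68 and n = 100 the endpoints round to 0.583 and 0.763, matching the interval quoted above.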
15
Significance Test for a Single Proportion
The sample proportion $\hat{p} = X/n$ is approximately Normal with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. For confidence intervals we used the Wilson estimate and estimated the standard deviation from the data. When performing a significance test, the null hypothesis specifies a value for p, which we call $p_0$. We assume that the hypothesized value is true, substitute $p_0$ for p in the expression for $\sigma_{\hat{p}}$, and then standardize $\hat{p}$ to obtain the test statistic $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$.
16
Significance Test for a Single Proportion
17
Example: Work stress
A national survey of restaurant employees found that 75% said that work stress had a negative impact on their personal lives. A sample of 100 employees of a restaurant chain found that 68 answered “Yes” when asked, “Does work stress have a negative impact on your personal life?” Is this good reason to think that the proportion of all employees of this chain who say “Yes” differs from the national proportion $p_0 = 0.75$?
18
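Example: Work stress
To answer this question, we test
$H_0: p = 0.75$
$H_a: p \neq 0.75$
The expected numbers of “Yes” and “No” responses are 100 × 0.75 = 75 and 100 × 0.25 = 25. Both are greater than 10, so we can use the z test. The test statistic is $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n} = (0.68 - 0.75)/\sqrt{(0.75)(0.25)/100} = -1.62$.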
19
Example: Work stress
From Table A we find $P(Z \le -1.62) = 0.0526$. The P-value is the area in both tails, $P = 2 \times 0.0526 = 0.1052$.
We conclude that the chain restaurant data are compatible with the survey results.
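The test in this example can be sketched in Python as follows. The function name `one_proportion_z_test` is a hypothetical helper, and the Normal probabilities are computed with `statistics.NormalDist` instead of being read from a table, so the result uses the unrounded value of z.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: p = p0 for a single proportion."""
    p_hat = successes / n
    # Standard deviation of p_hat computed under the null hypothesis
    sd_under_h0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_under_h0
    # Two-sided P-value: area in both tails beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Work stress: 68 of 100 say "Yes"; national proportion p0 = 0.75
z, p_value = one_proportion_z_test(68, 100, 0.75)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.3f}")
```

The statistic rounds to z = -1.62 and the P-value is a little above 0.10, so the data give no strong evidence against $H_0$ at the usual 5% level. A table lookup with z rounded to two decimals gives nearly the same P-value.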
20
Choosing a Sample Size
We want to see how to choose the sample size n to obtain a confidence interval with a specified margin of error m for a population proportion. The margin of error for the confidence interval for a population proportion is $m = z^* \, \mathrm{SE}_{\tilde{p}} = z^* \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$. Choosing a confidence level C fixes the critical value z*.
21
Choosing a Sample Size
The margin of error also depends on the value of $\tilde{p}$ and the sample size n. We don’t know the value of $\tilde{p}$ until we gather data, so we must guess a value to use in the calculations. Call the guessed value p*. There are two ways to get p*:
- Use a sample estimate from a pilot study or from similar studies done earlier.
- Use p* = 0.5. Because the margin of error is largest when $\tilde{p} = 0.5$, this choice gives a sample size that is somewhat larger than we really need for the confidence level we choose. It is a safe choice no matter what the data later show.
Once we have chosen p* and the margin of error m that we want, we can find the n we need to achieve this margin of error.
22
Choosing a Sample Size
The level C confidence interval for a proportion p will have a margin of error approximately equal to a specified value m when the sample size satisfies $n + 4 = (z^*/m)^2 \, p^*(1-p^*)$. Here z* is the critical value for confidence C, and p* is a guessed value for the proportion of successes in the future sample. The margin of error will be less than or equal to m if p* is chosen to be 0.5. The sample size required is then given by $n + 4 = (z^*/(2m))^2$.
23
Example: Planning a sample of customers
Your company has received complaints about its customer support service. You intend to hire a consulting company to carry out a sample survey of customers. Before contacting the consultant, you want some idea of the sample size you will have to pay for. One critical question is the degree of satisfaction with your customer service, measured on a five-point scale. You want to estimate the proportion p of your customers who are satisfied (that is, who choose either “satisfied” or “very satisfied,” the two highest levels on the five-point scale).
24
Example: Planning a sample of customers
You want to estimate p with 95% confidence and a margin of error less than or equal to 3%. For planning purposes, you are willing to use p* = 0.5. The sample size required is $n + 4 = (1.96/((2)(0.03)))^2 = 1067.1$. Round up to get n + 4 = 1068, or n = 1064. (Always round up. Rounding down would give a margin of error slightly greater than 0.03.) Similarly, for a 2.5% margin of error we have, after rounding up, $n + 4 = (1.96/((2)(0.025)))^2 = 1537$, or n = 1533.
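The planning calculation can be written as a small Python function. This is a sketch under the assumptions of this section (plus-four interval, guessed value p*); the name `required_sample_size` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose plus-four interval has margin of error <= margin (approximately)."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Solve n + 4 = (z*/m)^2 p*(1 - p*) and always round up
    n_plus_4 = math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))
    return n_plus_4 - 4

n_three_percent = required_sample_size(0.03)     # 3% margin of error
n_two_half_percent = required_sample_size(0.025)  # 2.5% margin of error
print(n_three_percent, n_two_half_percent)
```

With the default guess p* = 0.5 the function returns n = 1064 for a 3% margin and n = 1533 for a 2.5% margin, so tightening the margin by half a percentage point costs roughly 470 additional respondents.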
25
Comparing Two Proportions
We often want to compare the proportions of two groups (such as men and women) that have some characteristic. We call the two groups being compared Population 1 and Population 2, and the two population proportions of “successes” $p_1$ and $p_2$. The data consist of two independent SRSs. The sample sizes are $n_1$ from Population 1 and $n_2$ from Population 2.
26
Comparing Two Proportions
The proportion of successes in each sample estimates the corresponding population proportion. Here is the notation we will use:
Population   Population proportion   Sample size   Count of successes   Sample proportion
1            $p_1$                   $n_1$         $X_1$                $\hat{p}_1 = X_1/n_1$
2            $p_2$                   $n_2$         $X_2$                $\hat{p}_2 = X_2/n_2$
27
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
Choose independent SRSs of sizes $n_1$ and $n_2$ from two populations with proportions $p_1$ and $p_2$ of successes. Let $D = \hat{p}_1 - \hat{p}_2$ be the difference between the two sample proportions of successes. Then:
- As both sample sizes increase, the sampling distribution of D becomes approximately Normal.
- The mean of the sampling distribution is $\mu_D = p_1 - p_2$.
- The standard deviation of the sampling distribution is $\sigma_D = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$.
Since the two samples are independent, we can apply the rules for means and variances of sums of random variables: the means subtract and the variances add.
28
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
The sampling distribution of the difference of two sample proportions is approximately Normal. The mean and standard deviation are found from the two population proportions of successes, $p_1$ and $p_2$.
29
Confidence Interval
Just as in the case of estimating a single proportion, a small modification of the sample proportions greatly improves the accuracy of confidence intervals. The Wilson estimates of the two population proportions are $\tilde{p}_1 = (X_1 + 1)/(n_1 + 2)$ and $\tilde{p}_2 = (X_2 + 1)/(n_2 + 2)$.
30
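Confidence Interval
The standard deviation of $\tilde{D} = \tilde{p}_1 - \tilde{p}_2$ is approximately $\sigma_{\tilde{D}} = \sqrt{p_1(1-p_1)/(n_1+2) + p_2(1-p_2)/(n_2+2)}$.
To obtain a confidence interval for $p_1 - p_2$, we replace the unknown parameters in the standard deviation by estimates to obtain an estimated standard deviation, or standard error: $\mathrm{SE}_{\tilde{D}} = \sqrt{\tilde{p}_1(1-\tilde{p}_1)/(n_1+2) + \tilde{p}_2(1-\tilde{p}_2)/(n_2+2)}$. An approximate level C confidence interval for $p_1 - p_2$ is then $\tilde{D} \pm z^* \, \mathrm{SE}_{\tilde{D}}$.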
31
Confidence Interval for Comparing Two Proportions
32
Example: “No Sweat” Garment Labels
Following complaints about the working conditions in some apparel factories both in the United States and abroad, a joint government and industry commission recommended in 1998 that companies that monitor and enforce proper standards be allowed to display a “No Sweat” label on their products. A survey of U.S. residents aged 18 or older asked a series of questions about how likely they would be to purchase a garment under various conditions.
33
Example: “No Sweat” Garment Labels
For some conditions, it was stated that the garment had a “No Sweat” label; for others, there was no mention of such a label. On the basis of the responses, each person was classified as a “label user” or a “label nonuser.” About 16.5% of those surveyed were label users. One purpose of the study was to describe the demographic characteristics of users and nonusers.
34
Example: “No Sweat” Garment Labels
The study suggested that there is a gender difference in the proportion of label users. Here is a summary of the data. Let X denote the number of label users.
Population   n     X
1 (women)    296   63
2 (men)      251   27
35
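Example: “No Sweat” Garment Labels
First calculate the standard error of the observed difference. With Wilson estimates $\tilde{p}_1 = 0.2148$ and $\tilde{p}_2 = 0.1107$, and adjusted sample sizes $n_1 + 2 = 298$ and $n_2 + 2 = 253$, the standard error is $\mathrm{SE}_{\tilde{D}} = \sqrt{(0.2148)(0.7852)/298 + (0.1107)(0.8893)/253} = 0.0309$. The 95% confidence interval is $(\tilde{p}_1 - \tilde{p}_2) \pm z^* \, \mathrm{SE}_{\tilde{D}} = 0.104 \pm (1.96)(0.0309) = 0.104 \pm 0.061 = (0.04, 0.16)$.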
36
Example: “No Sweat” Garment Labels
With 95% confidence we can say that the difference in the proportions is between 0.04 and 0.16. Alternatively, we can report that women are about 10 percentage points more likely to be label users than men, with a 95% margin of error of 6 percentage points. In this example we chose women to be the first population. Had we chosen men as the first population, the estimate of the difference would be negative (-0.104). Because it is easier to discuss positive numbers, we generally choose the first population to be the one with the higher proportion. The choice does not affect the substance of the analysis.
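The two-sample interval can be sketched in Python in the same style. The function name is a hypothetical helper. The counts used below (63 label users among 296 women, 27 among 251 men) are an assumption: they are consistent with the percentages and the interval quoted in this example, but should be checked against the original data summary.

```python
import math
from statistics import NormalDist

def two_proportion_plus_four_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using Wilson estimates (one success and one failure added per sample)."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    diff = p1 - p2
    # Standard error: variances of the two independent estimates add
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    margin = z_star * se
    return diff, se, (diff - margin, diff + margin)

# Assumed counts (see the note above): women 63 of 296, men 27 of 251
diff, se, (low, high) = two_proportion_plus_four_interval(63, 296, 27, 251)
print(f"difference = {diff:.3f}, SE = {se:.4f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Under these assumed counts the estimated difference is about 0.104 and the interval rounds to (0.04, 0.16). Swapping the two samples flips the signs of the estimate and of both endpoints, which is the point made above about the choice of the first population.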
37
Significance Tests
It is sometimes useful to test the null hypothesis that the two population proportions are the same. We standardize $D = \hat{p}_1 - \hat{p}_2$ by subtracting its mean $p_1 - p_2$ and then dividing by its standard deviation $\sigma_D = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$. If $n_1$ and $n_2$ are large, the standardized difference is approximately N(0, 1). To estimate $\sigma_D$ we take into account the null hypothesis that $p_1 = p_2$.
38
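Significance Tests
If these two proportions are equal, we can view all of the data as coming from a single population. Let p denote the common value of $p_1$ and $p_2$. The standard deviation of $D = \hat{p}_1 - \hat{p}_2$ is then $\sigma_{Dp} = \sqrt{p(1-p)/n_1 + p(1-p)/n_2} = \sqrt{p(1-p)(1/n_1 + 1/n_2)}$.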
39
Significance Tests
We estimate the common value of p by the overall proportion of successes in the two samples, $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$. This estimate of p is called the pooled estimate. To estimate the standard deviation of D, substitute $\hat{p}$ for p in the expression for $\sigma_{Dp}$. The result is a standard error for D under the condition that the null hypothesis $H_0: p_1 = p_2$ is true: $\mathrm{SE}_{Dp} = \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$. The test statistic uses this standard error to standardize the difference between the two sample proportions: $z = (\hat{p}_1 - \hat{p}_2)/\mathrm{SE}_{Dp}$.
40
Significance Tests for Comparing Two Proportions
41
Example: Men, women, and garment labels
The previous example presented the survey data on whether consumers are “label users” who pay attention to label details when buying a shirt. Are men and women equally likely to be label users? Here is the data summary:
Population   n     X
1 (women)    296   63
2 (men)      251   27
42
Example: Men, women, and garment labels
We compare the proportions of label users in the two populations (women and men) by testing the hypotheses
$H_0: p_1 = p_2$
$H_a: p_1 \neq p_2$
The pooled estimate of the common value of p is $\hat{p} = (X_1 + X_2)/(n_1 + n_2) = (63 + 27)/(296 + 251) = 90/547 = 0.1645$. This is the proportion of label users in the entire sample.
43
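Example: Men, women, and garment labels
The test statistic is calculated as follows: $\mathrm{SE}_{Dp} = \sqrt{(0.1645)(0.8355)(1/296 + 1/251)} = 0.0318$ and $z = (\hat{p}_1 - \hat{p}_2)/\mathrm{SE}_{Dp} = (0.2128 - 0.1076)/0.0318 = 3.31$. The observed difference is more than 3 standard deviations away from zero.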
44
Example: Men, women, and garment labels
The P-value is $P = 2P(Z \ge 3.31) = 2 \times (1 - 0.9995) = 0.0010$. Conclusion: 21% of women are label users versus only 11% of men; the difference is statistically significant.
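The pooled two-sample z test can be sketched in Python as follows. The function name is a hypothetical helper, and the counts (63 of 296 women, 27 of 251 men) are an assumption: they reproduce the 21% and 11% quoted in the conclusion, but should be checked against the original data summary.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled estimate of p."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall proportion of successes
    # Standard error of the difference under H0 (common proportion)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p_hat1 - p_hat2) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, pooled

# Assumed counts (see the note above): women 63 of 296, men 27 of 251
z, p_value, pooled = two_proportion_z_test(63, 296, 27, 251)
print(f"pooled = {pooled:.4f}, z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

Under these assumed counts z rounds to 3.31 and the P-value is about 0.001, in line with the conclusion that the difference is statistically significant. The pooled standard error is used only for the test; the confidence interval uses the unpooled standard error because it does not assume $p_1 = p_2$.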