1 Sampling Distributions. Presentation 2: Sampling Distribution of Sample Proportions; Sampling Distribution of Sample Means
2 Statistics vs. Parameters Statistic: a numerical value computed from a sample. Parameter: a numerical value associated with a population. We would like to know the parameter, but in most cases the population is too large to measure it directly. So we estimate the parameter with a suitable statistic computed from a sample.
3 Some Notation p = population proportion; p-hat = sample proportion; μ = population mean; x-bar = sample mean; σ = population standard deviation; s = sample standard deviation
4 A. Sampling Distribution of the Sample Proportion Situation 1: A survey is undertaken to determine the proportion of PSU students who engage in under-age drinking. The survey asks 200 randomly selected under-age students (assume no problems with bias). Suppose the true population proportion of those who drink is 60%. Thus, p = 0.6, and p-hat is the proportion in the sample who drink.
5 Repeated Samples Imagine repeating this survey many times, and each time we record the sample proportion of those who have engaged in under-age drinking. What would the sampling distribution of p-hat look like? [Table: sample number (each sample of size n = 200) against its sample proportion p-hat; the entries did not survive.] p-hat is a random variable: it assigns a value to each sample!
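The repeated-survey thought experiment can be imitated on a computer. The sketch below is only an illustration: the function name `simulate_sample_proportions`, the 2,000 repetitions, and the seed are assumptions made for this example; only n = 200 and p = 0.6 come from Situation 1.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size n; return the sample proportion of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(reps):
        # Each respondent answers "yes" independently with probability p.
        yes_count = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(yes_count / n)
    return proportions

# Hypothetical run: 2,000 repeated surveys of 200 students each.
p_hats = simulate_sample_proportions(p=0.6, n=200, reps=2000)
print("mean of p-hat:", round(statistics.mean(p_hats), 3))
print("std. dev. of p-hat:", round(statistics.stdev(p_hats), 3))
```

A histogram of `p_hats` should look roughly bell-shaped and centered near 0.6.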
6 Histogram of p-hat for the repeated samples. [Figure: histogram of the sample proportions.]
7 Sampling Distribution of p-hat Let X be the number of respondents who say they engage in under-age drinking. X is binomial with n = 200 and p = 0.6. So, we can calculate the probability of each possible value of X (0 to 200). The probability distribution is plotted below: [Figure: binomial probabilities for X = 0, ..., 200.]
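As a rough sketch of that calculation, the binomial probabilities for X ~ Bin(200, 0.6) can be computed with the Python standard library. The helper name `binomial_pmf` is an assumption made for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 200, 0.6
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over all outcomes 0..200 should add up to 1.
total = sum(pmf)
# The most likely count sits close to n * p = 120.
most_likely = max(range(n + 1), key=lambda k: pmf[k])
print("total probability:", round(total, 6))
print("most likely X:", most_likely)
```

Dividing each outcome k by n gives the corresponding value of p-hat, so the same probabilities describe the sampling distribution of p-hat.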
8 Sampling Distribution of p-hat Since X ~ Bin(n = 200, p = 0.6), the sampling distribution of p-hat = X/n is that of the binomial count divided by n. Therefore we have $E(\hat{p}) = p = 0.6$ and $\mathrm{SD}(\hat{p}) = \sqrt{p(1-p)/n} = \sqrt{0.6 \times 0.4 / 200} \approx 0.035$.
9 Sampling Distribution of p-hat (cont.) Using the Normal approximation to the binomial distribution, the sampling distribution of p-hat is approximately Normal with mean p and standard deviation $\sqrt{p(1-p)/n}$, i.e. $\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right)$. The conditions for this approximation to be valid are: 1. The sample selected from the population is random. 2. The sample is large enough: np and n(1-p) must both be greater than 5, and preferably greater than 10.
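The approximation and its sample-size conditions can be bundled into a small helper. This is a minimal sketch; the function name `phat_normal_approx` and the returned dictionary keys are assumptions made for this example.

```python
from math import sqrt

def phat_normal_approx(p, n):
    """Mean and std. dev. of the Normal approximation to p-hat, plus condition checks."""
    smaller_count = min(n * p, n * (1 - p))
    return {
        "mean": p,
        "sd": sqrt(p * (1 - p) / n),
        "meets_minimum": smaller_count > 5,     # np and n(1-p) both above 5
        "meets_preferred": smaller_count > 10,  # np and n(1-p) both above 10
    }

# Situation 1: n = 200 students, p = 0.6.
drinking = phat_normal_approx(p=0.6, n=200)
print(drinking)

# A case where the approximation should not be trusted: np = 2.
too_small = phat_normal_approx(p=0.02, n=100)
print(too_small["meets_minimum"])
```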
10 Example: Recent studies have shown that about 20% of American adults fit the medical definition of being obese. A large medical clinic would like to estimate what percent of their patients are obese, so they take a random sample of 100 patients and find that 18 percent are obese. Suppose that, in truth, the same percentage holds for the patients of the medical clinic as for the general population, 20%. Give the notation and the numerical value for each of the following.
11 Example (cont.) a. The population proportion of obese patients in the medical clinic: b. The proportion of obese patients in the sample of 100 patients: c. The mean of the sampling distribution of p-hat: d. The standard deviation of the sampling distribution of p-hat: e. The variance of the sampling distribution of p-hat:
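One way to fill in parts a through e, assuming the standard formulas for the mean and standard deviation of a sample proportion (here np = 20 and n(1-p) = 80, so the sample-size condition holds):

```latex
\begin{align*}
\text{a. } & p = 0.20 \\
\text{b. } & \hat{p} = 0.18 \\
\text{c. } & E(\hat{p}) = p = 0.20 \\
\text{d. } & \mathrm{SD}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
             = \sqrt{\frac{0.20 \times 0.80}{100}} = 0.04 \\
\text{e. } & \mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}
             = \frac{0.20 \times 0.80}{100} = 0.0016
\end{align*}
```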
12 B. Sampling Distribution of the Sample Mean Situation 2: The height of women aged 20 to 30, X, is normally distributed (bell-shaped) with a mean of 65 inches and a standard deviation of 3 inches, i.e. X ~ N(65, 9). A random sample of 200 women was taken and the sample mean recorded. Now IMAGINE taking MANY samples of size 200 from the population of women. For each sample we record the sample mean, x-bar. What is the sampling distribution of x-bar?
13 Histograms for the Distribution of X and X-bar [Figure, two panels. Original population of women: X = height of a random woman. Distribution of sample means: X-bar = mean of a random sample of size 200.]
14 Normal Data Consider a Normal random variable X with mean μ and standard deviation σ, X ~ N(μ, σ²). The sampling distribution of the sample mean of X for a sample of size n is Normal with mean μ and standard deviation $\sigma/\sqrt{n}$, i.e. $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
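Applied to the heights in Situation 2, this result can be sketched with `statistics.NormalDist` from the Python standard library. The half-inch margin is an arbitrary value chosen for illustration, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 65, 3, 200  # values from Situation 2

height = NormalDist(mu, sigma)                  # one randomly chosen woman
mean_height = NormalDist(mu, sigma / sqrt(n))   # mean of a sample of 200

# Chance of landing within half an inch of 65 (margin chosen for illustration).
margin = 0.5
single_within = height.cdf(mu + margin) - height.cdf(mu - margin)
mean_within = mean_height.cdf(mu + margin) - mean_height.cdf(mu - margin)

print("std. dev. of x-bar:", round(mean_height.stdev, 4))
print("P(single height within 0.5 in):", round(single_within, 3))
print("P(sample mean within 0.5 in):", round(mean_within, 3))
```

The sample mean is far more tightly concentrated around 65 inches than an individual height is, which is what the two histograms on the previous slide illustrate.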
15 Skewed or Non-Normal Data Situation 3: In a college survey, students were asked to report the number of CDs they own. The number of CDs owned is clearly right-skewed. Suppose our population looked something like this [Figure: right-skewed histogram of CDs owned]. Let us take repeated samples from this population and see what the distribution of the sample mean looks like.
16 Suppose we take repeated samples of size n = 4, 8, 16, 32. [Figure: four histograms of the sample means, one panel each for n = 4, n = 8, n = 16, and n = 32.]
17 Statistics From Skewed Data Using that CD sample as the population, µ = 87.6, σ = 87.8. The sample means from the previous slide had the following summary statistics: [Table: Sample Size | Mean of X-bar | Std. Dev. of X-bar, with one row each for n = 4, 8, 16, 32; the numerical entries did not survive.] Note that the mean remains constant, and the standard deviation decreases as the sample size increases!
18 Central Limit Theorem For non-normal data coming from a population with mean µ and standard deviation σ, the sampling distribution of the sample mean is approximately normal with mean µ and standard deviation $\sigma/\sqrt{n}$, i.e. $\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)$. Conditions: The above holds if the sample size is large enough; n > 30 is usually sufficient.
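The theorem can be checked by simulation. The sketch below does not use the CD data, which are not available here; it assumes an exponential population with mean 87.6 as a right-skewed stand-in (for an exponential the standard deviation equals the mean, close to the σ = 87.8 quoted earlier). The function name, number of repetitions, and seed are likewise assumptions.

```python
import random
import statistics
from math import sqrt

POP_MEAN = 87.6  # exponential stand-in: mean = std. dev. = 87.6 (assumption)

def sample_mean_summary(n, reps=4000, seed=1):
    """Simulate `reps` sample means of size n; return their mean and std. dev."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1 / POP_MEAN) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.stdev(means)

results = {n: sample_mean_summary(n) for n in (4, 8, 16, 32)}
for n, (mean_of_means, sd_of_means) in results.items():
    predicted_sd = POP_MEAN / sqrt(n)  # sigma / sqrt(n) from the theorem
    print(f"n={n:2d}  mean={mean_of_means:6.1f}  sd={sd_of_means:5.1f}  "
          f"predicted sd={predicted_sd:5.1f}")
```

The simulated means should stay near the population mean for every n, while the standard deviations shrink roughly in line with σ/√n.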
19 What next? We have shown that the sampling distribution of the sample proportion and the sampling distribution of the sample mean are both (approximately) normal under certain conditions. Now we can use what we know about normal distributions to make conclusions about p-hat and x-bar! In the following we will see how to use the values of the statistics (p-hat, x-bar) to make inferences about the parameters (p, µ).
20 Exercise 1 The population proportion is [value missing]. Consider the following questions. 1. Find the sampling distribution of p-hat for each of the following sample sizes: n = 100, n = 200, n = [value missing]. 2. What is the probability that a sample proportion will be within ±0.04 of the population proportion for each of these sample sizes? 3. What is the advantage of a larger sample size?
21 Exercise 2 A certain antibiotic is known to cure 85% of strep bacteria infections. A scientist wants to make sure the drug does not lose its potency over time. He treats 100 strep patients with a 1-year-old supply of the antibiotic. Let p-hat be the proportion of individuals who are cured. ASSUME the drug has NOT lost potency, and answer the following questions. 1. What is the sampling distribution of p-hat? Draw a picture. 2. If we repeated this study many times, we would expect 95% of the p-hat values to fall within what interval? 3. What is the probability that more than 90% in the sample are cured?
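One possible way to set up the calculations for this exercise is sketched below. It assumes the Normal approximation applies (np = 85 and n(1-p) = 15 are both above 10) and uses the "mean ± 2 standard deviations" rule of thumb for the middle 95%; the variable names are choices made for this example.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.85, 100  # cure rate and number of patients from the exercise

sd = sqrt(p * (1 - p) / n)      # std. dev. of p-hat
p_hat_dist = NormalDist(p, sd)  # approximate sampling distribution of p-hat

# Middle 95% using the 2-standard-deviation rule of thumb.
low, high = p - 2 * sd, p + 2 * sd

# Probability that more than 90% of the sample is cured.
prob_above_90 = 1 - p_hat_dist.cdf(0.90)

print("std. dev. of p-hat:", round(sd, 4))
print("approx. 95% interval:", (round(low, 3), round(high, 3)))
print("P(p-hat > 0.90):", round(prob_above_90, 3))
```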
22 Exercise 3 A newspaper conducts a poll to determine the proportion of adults who favor a certain candidate. They ask a random sample of 800 people whether or not they favor that candidate (assume no bias!). Suppose the true proportion of adults who favor the candidate is 58%. 1. The newspaper records the sample proportion who favor the candidate. What is the sampling distribution of the sample proportion? Draw a picture of its PDF (center it correctly and include the appropriate scale). 2. What is the probability that the newspaper would have recorded a sample proportion greater than 62%? 3. What is the probability that fewer than 50% of the newspaper respondents would support this candidate? 4. What is the probability that a randomly selected individual favors this candidate?
23 Exercise 4 Suppose the number of calories FIT students consume in a day is normally distributed with mean 2000 and standard deviation 300. 1. About 95% of these students have a daily caloric intake between what two values? 2. What is the probability that a randomly selected individual consumed between 1800 and 2100 calories yesterday? 3. Suppose I take a random sample of 36 students and record the number of calories each consumed on a given day. Describe the sampling distribution of the sample mean. 4. Draw a picture of the sampling distribution of the sample mean (center it correctly and include the appropriate scale). 5. If I take a sample of size 36 from the student body, what is the probability that the sample mean will be less than 2050?
24 Exercise 5 Assume the length of trout living in the Susquehanna River is normally distributed with a mean of 14 inches and a standard deviation of 2 inches. A random sample of 16 trout is taken from the river. 1. What is the sampling distribution of the average trout length (i) in a sample of size 16 and (ii) in a sample of size 100? 2. What happens to the sampling distribution of the sample mean as the sample size increases? (Draw a picture.) 3. What is the probability that a random sample of 16 trout will provide a sample mean within one inch of the population mean? 4. What is the probability that a random sample of 100 trout will provide a sample mean within one inch of the population mean? 5. What is the advantage of a larger sample size when one is attempting to estimate the population mean?
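A sketch of how questions 3 and 4 might be approached, using the fact that the sample mean of Normal data is Normal with standard deviation σ/√n. The helper name `prob_within` is an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(mu, sigma, n, margin):
    """P(|x-bar - mu| < margin) for the mean of n draws from N(mu, sigma^2)."""
    x_bar = NormalDist(mu, sigma / sqrt(n))
    return x_bar.cdf(mu + margin) - x_bar.cdf(mu - margin)

mu, sigma = 14, 2  # trout length in inches, from the exercise

p16 = prob_within(mu, sigma, n=16, margin=1)    # std. dev. of x-bar is 0.5
p100 = prob_within(mu, sigma, n=100, margin=1)  # std. dev. of x-bar is 0.2

print("n = 16 :", round(p16, 4))
print("n = 100:", round(p100, 6))
```

Comparing the two probabilities shows the advantage asked about in question 5: with the larger sample, the sample mean is much more likely to land close to the population mean.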