Sampling and Estimating Population Percentages and Averages Math 1680.

Overview
- Introduction
- Picking a Sample
- Chance Errors in Sampling
- Estimating the Population Percentage
- Estimating the Population Average
- Summary

Introduction
In real life, we often want to know about a population of individuals:
- Voters for an election
- Families who watch TV
- Students at a university
Most of the time, the population is far too big to question directly.

Introduction
Instead, we take a sample of the population, determine the statistics of the sample, and then use these to infer the parameters of the population at large.
- Sample characteristics take the form of statistics
  - Average height
  - Median exam score
  - Proportion of voters in favor of a candidate
- We can adjust these statistics to apply to the population
Which would better represent the population (all other things being equal), a large sample or a small sample?

Picking a Sample
Survey tools include
- Interview
- Mail
- Phone
- Internet
Sample types include
- Convenience
- Quota
- Simple Random

Picking a Sample
Generally, the best bet is to make the sample random.
- Allows the surveyor no choice in determining who will be interviewed
- Provides the best defense against bias
Simple random sampling is one way to do this.
- Draw randomly, as if from a box
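
Drawing "as if from a box" can be sketched in a few lines of Python. The roster of 10,000 student ID numbers and the function name `simple_random_sample` are hypothetical, introduced only for illustration. `random.sample` draws without replacement, so every group of the requested size is equally likely and the surveyor has no say in who is picked.

```python
import random

# Hypothetical population: ID numbers for 10,000 students.
population = list(range(1, 10_001))

def simple_random_sample(box, n, seed=None):
    """Draw n tickets from the box without replacement, all subsets equally likely."""
    rng = random.Random(seed)   # a seed makes the draw repeatable
    return rng.sample(box, n)

sample = simple_random_sample(population, 100, seed=1680)
print(len(sample), len(set(sample)))  # sample size, and number of distinct IDs drawn
```

Because the draw is made without replacement, nobody is interviewed twice, which is how surveys are usually run.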

Picking a Sample
Sources of bias
- People who respond to voluntary surveys tend to differ from people who don’t respond (volunteer bias)
- People tend to exaggerate or round off things such as their weight or age when asked, depending on the context of the question
- The wording of the questions or the nature of the interviewer can suggest certain types of answers (response bias)
- Phone surveys in particular are susceptible to volunteer bias and also tend to over-represent the middle class
  - In general, poor people and rich people are less likely to have listed numbers

Picking a Sample
Give at least two types of bias (with examples) that one encounters when trying to conduct a phone survey by looking up numbers randomly in a phone book.
Answer: Many people have unlisted numbers, so they can never be selected. Many people (especially those with Caller ID) will hang up or choose not to answer, creating a non-response bias.

Picking a Sample
In order to obtain a representative sample of 10,000 American citizens, a polling organization takes one morning to go door-to-door in the capital city of each state until they have enough people to fill that state’s share of the 10,000.
- Find at least two problems with this sampling procedure and justify why they are problems
Answer: The capital city will not represent the entire state, especially in rural states like Wyoming. Going in the morning will exclude anyone who is working that morning.

Picking a Sample
(Hypothetical) The teacher of a 400-person class wants to determine whether or not to curve his first exam. His exams are graded by four different teaching assistants (TAs). Since there are so many students, the teacher does not want to look through every exam himself. Instead, he pulls out the first 5 tests from each TA’s section and takes the average grade. He then makes a curve based on this average.
- Give at least two possible confounding factors in this process
Answer: The TAs may not have graded to the same standard, and some TAs may have sorted their tests, either alphabetically or by grade. Especially in the latter case, the average of the first five tests will not be representative of all the students.

Chance Errors in Sampling
Warning: The formulae and methods used in the rest of the lecture apply only to simple random samples.
- If the sample in question is not obtained by this method, using these formulae will give meaningless results!

Chance Errors in Sampling
Suppose we are dealing with a population of 10,000 students, 5,300 of whom are women and 4,700 of whom are men.
- If we were to sample 100 people at random, how many would we expect to be women?
- What percentage would we expect to be women?
Answer: 53 women, or 53%

Chance Errors in Sampling
Suppose we are dealing with a population of 10,000 students, 5,300 of whom are women and 4,700 of whom are men.
- If we were to sample 100 people, how far from our expected value are we likely to be?
- By what percentage are we likely to be off?
Answer: about 5 people, or about 5%

Chance Errors in Sampling
The expected value for a percentage (%EV) in a sample of size n is the probability of drawing a “1” from the corresponding box, expressed as a percentage:
%EV = (fraction of tickets marked “1” in the box) × 100%

Chance Errors in Sampling
The standard error for a percentage (%SE_n) in a sample of size n, if we draw with replacement, is the standard deviation of the box divided by the square root of the number of draws, expressed as a percentage:
%SE_n = (SD of box / √n) × 100%
Note that %SE_n depends on n, while %EV does not!
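
As a rough sketch of the two definitions above, the following Python computes %EV and %SE for the earlier box of 5,300 women ("1") and 4,700 men ("0") with 100 draws. The function name `percent_ev_se` is an assumption made for this illustration. It uses the fact that a 0-1 box with a fraction p of 1s has SD √(p(1 − p)).

```python
import math

def percent_ev_se(p, n):
    """%EV and %SE for n draws with replacement from a 0-1 box with a fraction p of 1s."""
    box_sd = math.sqrt(p * (1 - p))      # SD of a 0-1 box
    ev = 100 * p                         # %EV does not depend on n
    se = 100 * box_sd / math.sqrt(n)     # %SE shrinks like 1/sqrt(n)
    return ev, se

# Box from the earlier example: 5,300 women ("1") and 4,700 men ("0"), 100 draws.
ev, se = percent_ev_se(5300 / 10000, 100)
print(f"%EV = {ev:.1f}%, %SE = {se:.2f}%")
```

The %SE comes out just under 5%, in line with the "≈ 5%" quoted for this population.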

Chance Errors in Sampling
As sample size increases, %SE_n goes to 0%.
Also, dividing the SD by the square root of the sample size is exact only when the draws are made with replacement.
- Since in sampling we usually do not re-interview the people we sample, this formula is only an approximation
- If the number of people we sample is small relative to the population size, then this approximation is good enough
- If not, we need a correction factor

Chance Errors in Sampling
Exact and approximate %SE_n for the percentage of men in a sample of size n, for different sample sizes
- Population (N) is 10,000
- 5,000 are men

n        %SE_n (exact)    %SE_n (approximate)
100      4.98%            5%
900      1.59%            1.7%
1,000    1.5%             1.6%
6,400    0.38%            0.63%
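
The slides do not spell out the correction factor. The sketch below assumes the standard finite-population correction, √((N − n)/(N − 1)), and recomputes the exact and approximate %SE for the same population of 10,000 with 5,000 men. The function name `percent_se` is hypothetical.

```python
import math

def percent_se(p, n, N=None):
    """%SE for a sample of size n; pass the population size N to apply the correction factor."""
    se = 100 * math.sqrt(p * (1 - p) / n)         # with-replacement formula
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))        # assumed finite-population correction
    return se

N, p = 10_000, 0.5   # 5,000 men out of 10,000
for n in (100, 900, 1_000, 6_400):
    print(f"n={n:>5}  exact={percent_se(p, n, N):.2f}%  approx={percent_se(p, n):.2f}%")
```

The two columns agree closely when n is a small fraction of N and drift apart as the sample approaches the size of the population.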

Chance Errors in Sampling
Suppose we are dealing with a population of 10,000 students, 5,300 of whom are women and 4,700 of whom are men.
- If we were to sample 100 people at random, what is the probability that we get between 45% and 50% men?
Answer: ≈ 38%

Chance Errors in Sampling
To verify that simple random sampling is accurate, we need to see that there is a high probability of being “close” to the %EV.
We can use the normal curve to answer this kind of question:
- Find the expected value and standard error for the percentage
- Standardize the range in question according to the %EV and %SE_n
- Find the corresponding area under the curve by using the normal table
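
The three steps above can be sketched in Python, with `math.erf` standing in for the normal table. The helper names `normal_cdf` and `chance_percent_between` are assumptions made for this illustration, and the with-replacement %SE is used without a correction factor.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def chance_percent_between(p, n, low, high):
    """Normal-curve estimate of the chance that a sample percentage lands in [low, high]."""
    ev = 100 * p                                        # step 1: %EV ...
    se = 100 * math.sqrt(p * (1 - p) / n)               # ... and %SE
    z_low, z_high = (low - ev) / se, (high - ev) / se   # step 2: standardize
    return normal_cdf(z_high) - normal_cdf(z_low)       # step 3: area under the curve

# 4,700 men out of 10,000 students, sample of 100: chance of 45% to 50% men.
chance = chance_percent_between(0.47, 100, 45, 50)
print(f"{chance:.0%}")
```

For the student population this gives roughly 38%, the figure quoted for that question.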

Chance Errors in Sampling
According to the 2000 US Census, of Americans 25 years or older, 80.4% are high school graduates and 24.4% have at least a Bachelor’s degree. There were about 281 million Americans in 2000.
- If one were to take a simple random sample of 900 Americans age 25 or more, what is the probability that the sample would contain less than 79% high school graduates?
Answer: ≈ 14.5%

Chance Errors in Sampling
According to the 2000 US Census, of Americans 25 years or older, 80.4% are high school graduates and 24.4% have at least a Bachelor’s degree. There were about 281 million Americans in 2000.
- For the same simple random sample of 900 Americans age 25 or more, what is the probability that the sample would contain between 23% and 26% college graduates?
Answer: ≈ 70%

Chance Errors in Sampling
(Hypothetical) A polling organization wants to find out if a simple random sample really achieves the correct demographic proportions. They decide to conduct surveys in Dallas, where they know beforehand from the US Census that 50.4% of Dallas residents are male, and that 38.8% of families in Dallas have a married couple heading them. The population of Dallas at that time was 1,188,580.
- If they take a simple random sample of 5,000 people, estimate the probability that their sample would contain less than 49% males
Answer: ≈ 2.4%

Chance Errors in Sampling
(Hypothetical) A polling organization wants to find out if a simple random sample really achieves the correct demographic proportions. They decide to conduct surveys in Dallas, where they know beforehand from the US Census that 50.4% of Dallas residents are male, and that 38.8% of families in Dallas have a married couple heading them. The population of Dallas at that time was 1,188,580.
- If they take a simple random sample of 500,000 people, estimate the probability that their sample would contain less than 49% males
Answer: ≈ 0%

Chance Errors in Sampling
(Hypothetical) A polling organization wants to find out if a simple random sample really achieves the correct demographic proportions. They decide to conduct surveys in Dallas, where they know beforehand from the US Census that 50.4% of Dallas residents are male, and that 38.8% of families in Dallas have a married couple heading them. The population of Dallas at that time was 1,188,580.
- If they take a simple random sample of 10,000 people, estimate the probability that their sample would contain between 38.5% and 39.5% families headed by married couples
Answer: ≈ 66%
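
The three Dallas questions can be worked in one pass. This is a sketch under two assumptions: the correction factor takes the standard finite-population form √((N − n)/(N − 1)), which the slides do not state, and the helper names are hypothetical. The correction matters most for the sample of 500,000, which is a large share of the city.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percent_se(p, n, N):
    """%SE for a simple random sample of n from a population of N (assumed correction factor)."""
    return 100 * math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

N = 1_188_580                       # population of Dallas, from the example
p_male, p_married = 0.504, 0.388

# Chance of fewer than 49% males, for each sample size.
chances = {}
for n in (5_000, 500_000):
    z = (49 - 100 * p_male) / percent_se(p_male, n, N)
    chances[n] = normal_cdf(z)
    print(f"n={n}: chance of under 49% males is about {chances[n]:.1%}")

# Chance of 38.5% to 39.5% married-couple families in a sample of 10,000.
se = percent_se(p_married, 10_000, N)
chance_married = normal_cdf((39.5 - 38.8) / se) - normal_cdf((38.5 - 38.8) / se)
print(f"chance of 38.5% to 39.5% married-couple families is about {chance_married:.0%}")
```

The results land close to the quoted answers: about 2.4%, essentially 0%, and about 66%.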

Estimating the Population Percentage
Often, we are interested in estimating the percentage of a population that has some characteristic.
- Such characteristics can be seen as answers to a yes/no question
  - Do you favor this candidate?
  - Do you smoke?
To estimate the percentage, we start by taking a simple random sample and calculating its percentage and SD.

Estimating the Population Percentage
A simple random sample of 1,000 people is taken, of whom 543 are Democrats.
- What percentage of the population do we expect to be Democrats?
- How far off do we expect to be?
Answer: The expected percentage in the population is just the percentage in the sample, 54.3%.

Estimating the Population Percentage
Ideally, we would set up our box model for the entire population and calculate the SD, then divide by the square root of the sample size.
However, we don’t know the population SD.
- Instead, we just use the sample SD in its place, making the assumption that it should reflect the population SD
When estimating %SE_n for the expected percentage in the population, use the sample’s SD in your calculations.
- So the %SE in this case is √(0.543 × 0.457) / √1,000 × 100% ≈ 1.58%

Estimating the Population Percentage
In the previous example, we expected the population to be 54.3% Democrats, and we expected to be off by about 1.6%.
Our estimate of %EV, the sample percentage, is really a random variable.
- If the sample is large enough, we can assume the estimate is approximately normal and say that the interval 54.3% ± 1.6% is the 68% confidence interval for the percentage of Democrats in the population
- What would the 95% confidence interval be?
- What would the 99.7% confidence interval be?
Answer: 54.3% ± 3.2% (two SEs) and 54.3% ± 4.8% (three SEs)
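
A minimal sketch of the interval calculation for the sample of 543 Democrats out of 1,000 follows. The function name `percent_confidence_interval` is hypothetical. It assumes the usual normal-curve rule that 1, 2, and 3 SEs correspond to roughly 68%, 95%, and 99.7% confidence.

```python
import math

def percent_confidence_interval(successes, n, num_se):
    """Sample percentage plus or minus num_se standard errors, using the sample SD."""
    p_hat = successes / n
    se = 100 * math.sqrt(p_hat * (1 - p_hat) / n)   # sample SD stands in for the box SD
    return 100 * p_hat, num_se * se

for num_se, level in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    center, margin = percent_confidence_interval(543, 1000, num_se)
    print(f"{level} interval: {center:.1f}% ± {margin:.1f}%")
```

Working from the unrounded SE, the three-SE margin comes out near 4.7%. The 4.8% quoted in the example comes from tripling the rounded value of 1.6%.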

Estimating the Population Percentage
Warning: If the sample percentage is very close to 0% or 100%, a very large sample is needed to use the normal approximation!
- One way to check this is to calculate the sample SD
  - If the SD is close to 50%, a small sample will allow for a normal approximation
  - If the SD is well below 50%, very large samples are needed to use the normal approximation

Estimating the Population Percentage
At this point, we are tempted to say that there is a 95% chance that the true percentage of Democrats falls between 51.1% and 57.5%.
- This is not so
  - Remember that the true percentage of Democrats in the population is determined by the entire population; it is a fixed number, not a random one
  - Sample percentages are random numbers determined by the people we sample
- It is correct to say that if we were to take 100 samples and calculate 100 different 95% confidence intervals, then about 95 of them should encompass the true percentage of Democrats
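
The "about 95 out of 100 intervals" reading can be checked with a small simulation. Everything in this sketch is hypothetical: the true percentage is fixed at 54.3% only to echo the running example, the function name `covers_truth` is made up, and draws are made with replacement for simplicity.

```python
import math
import random

def covers_truth(true_p, n, rng):
    """Draw one sample of size n and report whether its 2-SE interval contains true_p."""
    hits = sum(rng.random() < true_p for _ in range(n))
    p_hat = hits / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # SE estimated from the sample itself
    return abs(p_hat - true_p) <= 2 * se

rng = random.Random(1680)     # fixed seed so the run is repeatable
true_p = 0.543                # hypothetical "true" share, fixed in advance
trials = 1000
coverage = sum(covers_truth(true_p, 1000, rng) for _ in range(trials)) / trials
print(f"{coverage:.1%} of the intervals contain the true percentage")
```

The true percentage never moves between trials. What varies is the interval, which is why the 95% describes the procedure rather than any single interval.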

Estimating the Population Percentage
Homer Simpson decides to run against Mayor Quimby for the leadership of Springfield.
- After a grueling campaign, Election Day finally arrives, and the early exit polls show that Homer has the votes of 58 out of the 100 people polled
  - Find the 95% confidence interval for the percentage of votes for Homer
- Can Homer break out the Duff champagne?
Answer: 58% ± 9.87%. No: the interval runs from about 48% to 68%, so it still includes values below 50%.

Estimating the Population Percentage
Homer Simpson decides to run against Mayor Quimby for the leadership of Springfield.
- A few hours later, the exit polls show a turn for the worse for Homer, with only 485 out of 1,000 sampled voting for him
  - Calculate the 95% confidence interval for the percentage of votes for Homer
- Is he out of the running?
Answer: 48.5% ± 3.16%. No: the interval runs from about 45.3% to 51.7%, so a majority is still possible.

Estimating the Population Percentage
Homer Simpson decides to run against Mayor Quimby for the leadership of Springfield.
- By midnight, 10,000 votes have been counted, and Homer has 5,204 of them
  - One last time, calculate the 95% confidence interval for the percentage of votes for Homer
- Has Quimby’s regime finally been toppled?
Answer: 52.04% ± 1.0%. It appears so: the whole interval, about 51.0% to 53.0%, lies above 50%.
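
The three Springfield polls can be compared side by side. In this sketch the function name `ci95` and the verdict wording are assumptions. It treats each count as a simple random sample and asks whether 50% falls inside the 2-SE interval.

```python
import math

def ci95(votes, n):
    """95% interval (sample percentage ± 2 SE) as (low, high), in percent."""
    p_hat = votes / n
    margin = 2 * 100 * math.sqrt(p_hat * (1 - p_hat) / n)
    return 100 * p_hat - margin, 100 * p_hat + margin

polls = [(58, 100), (485, 1_000), (5_204, 10_000)]
for votes, n in polls:
    low, high = ci95(votes, n)
    if low > 50:
        verdict = "Homer appears to be ahead"
    elif high < 50:
        verdict = "Homer appears to be behind"
    else:
        verdict = "too close to call"
    print(f"n={n}: ({low:.2f}%, {high:.2f}%) -> {verdict}")
```

Only the last interval sits entirely above 50%. The first two are too wide to settle the race, even though the first poll shows Homer at 58%.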

Estimating the Population Average
Often, we are interested in estimating a population’s average for some quantity:
- Height
- IQ
- Income
As before, to estimate the average, we start by taking a simple random sample and calculating its average and SD.

Estimating the Population Average
The expected value for the average (mEV) in a sample of size n is the average of the corresponding box:
mEV = average of box

Estimating the Population Average
The standard error for the average (mSE_n) in a sample of size n, if we draw with replacement, is the standard deviation of the box divided by the square root of the number of draws:
mSE_n = SD of box / √n
Note that mSE_n depends on n, while mEV does not!

Estimating the Population Average
Just as with percentages, we estimate the standard error for the population’s average by using the sample’s SD in place of the population SD.
If our sample is large enough, we can assume the distribution of the sample average is approximately normal.
- This allows us to obtain confidence intervals for the population average

Estimating the Population Average
(Hypothetical) A large company takes a simple random sample of 500 employees and asks them how long they have worked there.
- The employees averaged 4.2 years with an SD of 1.3 years
  - Give a 95% confidence interval for the average length of employment at the company
Answer: (4.08 years, 4.32 years)

Estimating the Population Average
(Hypothetical) Out of a county containing a college, a simple random sample of 1,000 people is taken.
- From the sample, the average level of education (years of school completed, not counting Kindergarten) is 14 years, with an SD of 2 years
  - Give an expected value and standard error for the average educational level of people in the county
  - Have 68% of the county’s residents completed 14 ± 0.063 years of schooling?
Answer: 14 ± 0.0632 years. No: the spread among individual residents is described by the SD, which is about 2 years, not by the SE of 0.0632 years. On top of this, we can’t determine whether the population’s education level is normally distributed.

Estimating the Population Average
(Hypothetical) As a way of measuring the quality of English education in one high school, the senior class is required to take the ACT. The English department wants to see an English ACT average of 25.
- Of the class, 125 tests are checked
- The average English ACT score was 25.4, with an SD of 2.2
  - Can you say with 99.7% confidence that the English department will get the average it desires?
Answer: No, because the 99.7% confidence interval is (24.81, 25.99), which extends below the goal of 25.
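
The interval calculations for averages all follow one pattern, sketched below. The function name `mean_interval` is hypothetical. It assumes the sample SD stands in for the box SD and that 2 and 3 SEs correspond to roughly 95% and 99.7% confidence.

```python
import math

def mean_interval(sample_mean, sample_sd, n, num_se):
    """Sample average ± num_se standard errors, with the sample SD in place of the box SD."""
    se = sample_sd / math.sqrt(n)
    return sample_mean - num_se * se, sample_mean + num_se * se

# Employment example: 500 employees, average 4.2 years, SD 1.3 years, 95% interval (2 SE).
emp_low, emp_high = mean_interval(4.2, 1.3, 500, 2)
print(f"employment: ({emp_low:.2f}, {emp_high:.2f}) years")

# ACT example: 125 tests, average 25.4, SD 2.2, 99.7% interval (3 SE).
act_low, act_high = mean_interval(25.4, 2.2, 125, 3)
print(f"ACT: ({act_low:.2f}, {act_high:.2f}); goal of 25 inside interval: {act_low <= 25 <= act_high}")
```

In the ACT case the goal of 25 falls inside the interval, so an average below 25 cannot be ruled out at that confidence level.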

Summary
When trying to determine characteristics of a large population, researchers use smaller samples to infer the population characteristics.
- Ideal samples are randomly selected
- Simple random samples can be modeled with a box and ticket model
A simple random sample will give an accurate representation of the population, provided that it is large enough.

Summary
We use a sample percentage/average to estimate the population percentage/average.
- We find the standard error by using the sample SD in place of the population SD
If the sample is large enough, we can assume the sample percentage/average, as a random variable, is approximately normal.
- We can calculate confidence intervals around our sample percentage/average to narrow down the location of the population percentage/average
