Sample Surveys Chapter 12.

Objectives Population, Sample, Sample survey, Bias, Randomization, Sample size, Census, Parameter, Statistic, Simple random sample, Sampling frame, Stratified random sample, Cluster sample, Multistage sample, Systematic sample, Pilot, Voluntary response bias, Convenience sample, Undercoverage, Nonresponse bias, Response bias, Sampling variability

Data Collection The quality of the results obtained from any statistical method is only as good as the data used. The reliability and accuracy of the data affect the validity of the results of a statistical analysis. The reliability and accuracy of the data depend on the method of collection. Conclusion: “Garbage in, garbage out.”

Background We have learned ways to display, describe, and summarize data, but have been limited to examining the particular batch of data we have. To make decisions, we need to go beyond the data at hand and to the world at large. Let’s investigate three major ideas that will allow us to make this stretch…

Idea 1: Examine a Part of the Whole
The first idea is to draw a sample. We’d like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible. We settle for examining a smaller group of individuals—a sample—selected from the population.

Population and Sample Population: The collection of all individuals or items under consideration in a statistical study. The population is determined by what we want to know. Sample: That part of the population from which information is obtained. The sample is determined by what is practical and should be representative of the population.

Example: Population vs. Sample
If we have data on all the individuals who have climbed Mt. Everest, then we have population data. On the other hand, if our data come from some of the climbers, we have sample data.

Idea 1: Examine a Part of the Whole
Sampling is a natural thing to do. Think about sampling something you are cooking—you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.

Idea 1: Examine Part of the Whole
Opinion polls are examples of sample surveys, designed to ask questions of a small group of people in the hope of learning something about the entire population. Professional pollsters work quite hard to ensure that the sample they take is representative of the population. If not, the sample can give misleading information about the population.

Sample Surveys Solicit information from individuals.
Types of sample surveys (opinion polls): personal interview, telephone interview, questionnaire by mail or internet.

Bias Definition: Any systematic failure of a sample to represent its population. Sampling methods that, by their nature, tend to over- or under- emphasize some characteristics of the population are said to be biased. Bias is the bane of sampling—the one thing above all to avoid. There is usually no way to fix a biased sample and no way to salvage useful information from it. The best way to avoid bias is to select individuals for the sample at random. The value of deliberately introducing randomness is one of the great insights of Statistics.

Types of Bias Undercoverage, Voluntary Response Bias, Convenience Sample, Nonresponse Bias, Response Bias

Undercoverage A sampling scheme that fails to sample part of the population or that gives a part of the population less representation than it has in the population suffers from undercoverage. A classic example of undercoverage is the Literary Digest voter survey, which predicted that Alfred Landon would beat Franklin Roosevelt in the 1936 presidential election. The survey sample suffered from undercoverage of low-income voters, who tended to be Democrats. Undercoverage is often a problem with convenience samples.

Example: Literary Digest Poll
1936 presidential election Literary Digest magazine poll. The survey team asked a sample of the voting population whether they would vote for Franklin D. Roosevelt, the Democratic candidate, or Alfred Landon, the Republican candidate. Based on the results, the magazine predicted an easy win for Landon.

Result When the actual results were in, Roosevelt won by a landslide.
What happened? The sample was obtained from among people who owned a car or had a telephone. In 1936, that group included mostly rich people, who historically voted Republican. The response rate was low: fewer than 25% of those polled responded. A disproportionate number of those responding were Landon supporters. Whatever the reason for the poll’s failure, the sample was not representative of the population.
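
The effect of an undercovering frame can be illustrated with a small simulation. This is only a sketch with made-up numbers: the 30% frame coverage, the 60% and 35% support rates, and the names `share_for_x`, `frame`, and `population` are assumptions for illustration, not figures from the 1936 poll.

```python
import random

rng = random.Random(1936)

# Made-up population: 30% are listed in the frame (say, car or phone owners).
# Listed voters favor candidate X 60% of the time, unlisted voters 35%.
population = []
for _ in range(100_000):
    listed = rng.random() < 0.30
    favors_x = rng.random() < (0.60 if listed else 0.35)
    population.append((listed, favors_x))

def share_for_x(people):
    """Fraction of the given people who favor candidate X."""
    return sum(favors for _, favors in people) / len(people)

true_share = share_for_x(population)
frame = [p for p in population if p[0]]            # leaves out the unlisted
frame_estimate = share_for_x(rng.sample(frame, 2000))
full_estimate = share_for_x(rng.sample(population, 2000))
print(true_share, frame_estimate, full_estimate)
```

Under these assumptions the random sample from the incomplete frame lands far from the true share, while a random sample of the same size from the whole population lands close to it. Randomizing within a bad frame does not remove the bias.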

Voluntary Response Bias
When choice rather than randomization is used to obtain a sample, the sample suffers from voluntary response bias. Voluntary response bias occurs when sample members are self-selected volunteers. An example would be call-in radio shows that solicit audience participation in surveys on controversial topics (abortion, affirmative action, gun control, etc.). The resulting sample tends to overrepresent individuals who have strong opinions.

Convenience Sample Is obtained exactly as its name suggests, by sampling individuals who are conveniently available. Convenience samples are often not representative of the population of interest because not every individual in the population is equally convenient to sample. The classic example of a convenience sample is standing at a shopping mall and selecting shoppers as they walk by to fill out a survey.

Nonresponse Bias Occurs in a sample design when individuals selected for the sample fail to respond, cannot be contacted, or decline to participate. A common problem with mail surveys. Response rate is often low (5% - 30%), making mail surveys vulnerable to nonresponse bias.

Response Bias Anything in a survey that influences responses falls under the heading of response bias. Examples are biased wording of survey questions, lack of privacy while being surveyed, and appearance of the interviewer. Both Question Bias and Interviewer Bias are examples of response bias.

Response Bias - Question Bias
Wording of the questions or the questions themselves lead to bias. People often don’t want to be perceived as having unpopular or unsavory views and so may not respond truthfully. Example: Given that the threat of nuclear war is higher now than it has ever been in human history, and the fact that a nuclear war poses a threat to the very existence of the human race, would you favor an all-out nuclear test ban? Question is biased in favor of a nuclear test ban.

Response Bias - Interviewer Bias
The sex, age, race, dress, attitude, or actions of the interviewer and how the interviewer asks the questions have an influence on the way a subject responds. Example: A male interviewer asking sex related questions to women. To prevent this, interviewers must be trained to remain neutral throughout the interview. They must also pay close attention to the way they ask each question. If an interviewer changes the way a question is worded, it may impact the respondent's answer.

Idea 2: Randomize Randomization can protect you against factors that you know are in the data. It can also help protect against factors you are not even aware of. Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about. Randomizing makes sure that on the average the sample looks like the rest of the population.

Randomizing Not only does randomizing protect us from bias, it actually makes it possible for us to draw inferences about the population when we see only a sample. Such inferences are among the most powerful things we can do with Statistics. But remember, it’s all made possible because we deliberately choose things randomly.

Idea 3: It’s the Sample Size
How large a random sample do we need for the sample to be reasonably representative of the population? It’s the size of the sample, not the size of the population, that makes the difference in sampling. The fraction of the population that you’ve sampled doesn’t matter; it’s the sample size itself that’s important. Exception: If the sample is more than 10% of the whole population, the population size can matter.

Sample Size
Is the number of individuals selected from our population. The size of the population does not dictate the size of the sample. A sample of size 100 may work equally well for a population of 1000 or 10,000 as long as it is a random sample of the population of interest. Example: A ladle of soup gives us the same information regarding the seasoning of the soup regardless of the size of the pot it is taken from, as long as the pot is well stirred (random sample). As a general rule, the sample should be no more than 10% of the population; beyond that, the population size starts to matter.
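
The soup-ladle idea can be checked by simulation. The sketch below is illustrative only: the population sizes, the 60% proportion, the number of trials, and the helper name `spread_of_sample_proportions` are all assumptions chosen for the demonstration.

```python
import random
import statistics

def spread_of_sample_proportions(pop_size, n, true_p=0.6, trials=2000, seed=1):
    """Standard deviation of the sample proportion over many SRSs of size n."""
    rng = random.Random(seed)
    # Build a population of 0s and 1s in which a fraction true_p are 1s.
    ones = round(pop_size * true_p)
    population = [1] * ones + [0] * (pop_size - ones)
    proportions = []
    for _ in range(trials):
        sample = rng.sample(population, n)   # SRS without replacement
        proportions.append(sum(sample) / n)
    return statistics.pstdev(proportions)

small_pop = spread_of_sample_proportions(pop_size=10_000, n=100)
large_pop = spread_of_sample_proportions(pop_size=1_000_000, n=100)
bigger_n = spread_of_sample_proportions(pop_size=1_000_000, n=400)
print(small_pop, large_pop, bigger_n)
```

With the same sample size, the spread of the estimates is about the same for the small and the large population. Quadrupling the sample size roughly halves the spread, whatever the population size.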

Does a Census Make Sense?
Why bother determining the right sample size? Wouldn’t it be better to just include everyone and “sample” the entire population? Such a special sample is called a census. Definition: A sample that consists of the entire population (tries to count every individual). A census often includes a collection of related demographic information (age, race, gender, occupation, income, etc.). Example: US census – an official, periodic (every 10 years) inventory of the entire population of the US.

Does a Census Make Sense?
There are problems with taking a census: It can be difficult to complete a census, because there always seem to be some individuals who are hard (or expensive) to locate or hard to measure. It may also be impractical: a food taster cannot taste every item produced. Populations rarely stand still. Even if you could take a census, the population changes while you work, so it’s never possible to get a perfect measure. Taking a census may be more complex than sampling.

Populations and Parameters
Models use mathematics to represent reality. Parameters are the key numbers in those models. A parameter that is part of a model for a population is called a population parameter. We rarely know the true value of a population parameter; we estimate it from sampled data. Any summary found from the data (sample) is a statistic. The statistics that estimate population parameters are called sample statistics.

Notation We typically use Greek letters to denote parameters and Latin letters to denote statistics.

Simple Random Samples We draw samples because we can’t work with the entire population. We need to be sure that the statistics we compute from the sample reflect the corresponding parameters accurately. A sample that does this is said to be representative.

Simple Random Samples We will insist that every possible sample of the size we plan to draw has an equal chance to be selected. Such samples also guarantee that each individual has an equal chance of being selected. With this method each combination of people has an equal chance of being selected as well. A sample drawn in this way is called a Simple Random Sample (SRS). An SRS is the standard against which we measure other sampling methods, and the sampling method on which the theory of working with sampled data is based.

Simple Random Samples Requirements for Simple Random Sample (SRS)
Every sample of size n from the population has an equal chance of being selected, and every member of the population has an equal chance of being included in the sample. The preferred method: an SRS is more likely than any other sampling method to be representative of the population, with the least chance of sample bias.
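
Both requirements can be seen by listing every possible sample from a tiny population. This is a sketch with a hypothetical five-person population; the names `population`, `all_samples`, and `inclusion_chance` are illustrative.

```python
from itertools import combinations
from collections import Counter

population = ["A", "B", "C", "D", "E"]   # hypothetical individuals
n = 2

# Every possible sample of size n; an SRS picks one of these with equal chance.
all_samples = list(combinations(population, n))
chance_per_sample = 1 / len(all_samples)

# Count how many of the possible samples each individual appears in.
appearances = Counter(person for sample in all_samples for person in sample)
inclusion_chance = {p: appearances[p] / len(all_samples) for p in population}

print(len(all_samples), chance_per_sample)
print(inclusion_chance)
```

There are 10 possible samples of size 2, so each has chance 1/10, and each individual appears in 4 of them, giving every individual the same inclusion chance of 4/10 = n/N.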

Simple Random Samples To select a sample at random, we first need to define where the sample will come from. The sampling frame is a list of individuals from which the sample is drawn. If the sampling frame is not equal to the population of interest and is different from the population in some way that may affect the response variable, the sample will be biased. Example: If we are interested in obtaining information about H.S. students in Florida but obtain our sample of students from a list of private schools, then our sampling frame is not reflective of the population of interest nor is our sample.

Methods of SRS Place names (population) in a hat and draw out a handful (sample). Use computer or TI-83 software. Use a table of random digits: a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties: (1) Each entry in the table is equally likely to be any of the ten digits 0 through 9. (2) The entries are independent of each other; that is, knowledge of one part of the table gives no information about any other part.

Choosing a SRS Once we have our sampling frame, the easiest way to choose an SRS is to assign a random number to each individual in the sampling frame. Label: Assign a numerical label to every individual in the population, using as few digits per label (digit group) as possible. Table: Enter the random digit table at any line, then read off digit groups to select the sample.

SRS Example Use a random digit table to pick a random sample of 30 cars from a population of 500 cars. Label - Assign each car a different number from 001 to 500 (3 digit group). Table – Enter Table B on line 108 (can begin anywhere) and regroup the digits in groups of 3 (because our labels have 3 digits). Then select the sample.

SRS Example Select the first 30 digit groups that are within the range of your labels to make up the SRS. SRS – 407, 202, 417, 249, 436, 179, 090, 336, 009, 193, 239, etc.
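
The label-and-table procedure can be sketched in code. This is a minimal illustration, not the textbook's Table B: the digit string is generated by Python as a stand-in for a printed table, and the function name `srs_from_digits` and its parameters are hypothetical.

```python
import random

def srs_from_digits(digits, label_width, max_label, sample_size):
    """Read fixed-width digit groups and keep the first distinct valid labels."""
    chosen = []
    for start in range(0, len(digits) - label_width + 1, label_width):
        label = int(digits[start:start + label_width])
        # Skip groups outside 1..max_label and labels already chosen.
        if 1 <= label <= max_label and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")

# Stand-in for a printed table: a long string of independent random digits.
rng = random.Random(108)
digit_string = "".join(str(rng.randrange(10)) for _ in range(2000))

cars = srs_from_digits(digit_string, label_width=3, max_label=500, sample_size=30)
print([f"{label:03d}" for label in cars])
```

Groups such as 609 or 000 fall outside the labels 001 to 500 and are skipped, as are repeats, which is the same rule applied when reading the table by hand.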

Your Turn: Suppose 80 students are taking an AP Statistics course and the teacher wants to randomly pick out a sample of 10 students to try out a practice exam. Select a SRS of 10 students. Solve – use the following Table B beginning at line 108

Solution Label – Assign the students numbers 01 – 80.
Table – select the first 10 digit groups between 01 and 80, ignoring repeats, to make up the sample. The sample – 60, 07, 20, 24, 17, 49, 43, 61, 79, 09

TI-83/84 Random Digits Use RANDINT function (MATH/PRB/5:RANDINT)
RANDINT(lower limit, upper limit, number of integers) RANDINT(0,9,5) – generates 5 random integers from 0 to 9. RANDINT(1,6,7) – generates 7 random integers from 1 to 6, simulating 7 rolls of a die. RANDINT(0,99,10) – generates 10 random integers from 0 to 99. Storing a number to RAND seeds the generator, so calculators seeded with the same number produce the same random digits.
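
For readers without a calculator, the same three calls can be approximated in Python. This is a rough analog rather than the TI-83 algorithm: the wrapper name `rand_int` and the seed values are assumptions, and Python's generator will not reproduce the calculator's digits.

```python
import random

def rand_int(lower, upper, count, rng):
    """Return `count` random integers between lower and upper, inclusive."""
    return [rng.randint(lower, upper) for _ in range(count)]

rng = random.Random(2024)           # fixed seed so the run can be repeated
digits = rand_int(0, 9, 5, rng)     # five random digits
rolls = rand_int(1, 6, 7, rng)      # seven simulated die rolls
labels = rand_int(0, 99, 10, rng)   # ten integers from 0 to 99

# Two generators seeded alike produce the same sequence.
same_a = rand_int(0, 9, 5, random.Random(0))
same_b = rand_int(0, 9, 5, random.Random(0))
print(digits, rolls, labels, same_a == same_b)
```

Seeding is useful in a classroom, where everyone should see the same digits, but a real survey should not reuse a seed that was chosen with the outcome in mind.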

Simple Random Samples Samples drawn at random generally differ from one another. Each draw of random numbers selects different people for our sample. These differences lead to different values for the variables we measure. We call these sample-to-sample differences sampling variability.

Sampling Variability
Is the natural tendency of randomly drawn samples to differ, one from another. Sampling variability is not an error, just the natural result of random sampling. Statistics attempts to minimize, control, and understand variability so that informed decisions can be drawn from the data despite their variation. Although samples vary, when we use chance to select them, they do not vary haphazardly but rather according to the laws of probability.

Example: Sample Variability
Each of four major news organizations surveys likely voters and separately reports that the percentage favoring the incumbent candidate is 53.5%, 54.1%, 52%, and 54.2%, respectively. What is the correct percentage? Did three or more of the news organizations make a mistake?

Solution There is no way of knowing the correct population percentage from the information given. The four surveys led to four statistics, each an estimate of the population parameter. No one made a mistake unless there was a bad survey. Sampling variation is natural.
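
A short simulation shows how four honest polls of the same population can report four different percentages. The true share of 53.5%, the sample size of 1000, and the function name `poll` are assumptions for illustration; in a real election the parameter is unknown.

```python
import random

def poll(true_share, n, rng):
    """Percentage of n randomly chosen voters who favor the incumbent."""
    favor = sum(rng.random() < true_share for _ in range(n))
    return 100 * favor / n

rng = random.Random(12)
TRUE_SHARE = 0.535   # assumed parameter; in practice it is never known
results = [poll(TRUE_SHARE, n=1000, rng=rng) for _ in range(4)]
print(results)
```

Each entry in `results` is a statistic estimating the same parameter. The values typically differ by a point or two from one another even though no polling organization made a mistake.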

Other Sampling Designs

Sampling Designs The sampling design is the method used to choose the sample. All statistical sampling designs incorporate the idea that chance (randomness), rather than choice, is used to select the sample. The value of deliberately introducing randomness is one of the great insights of Statistics. Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about. It does that by making sure that on the average the sample looks like the rest of the population.

Sampling Designs Stratified Sampling, Cluster Sampling, Multistage Sampling, Systematic Sampling

Stratified Sampling Simple random sampling is not the only fair way to sample. More complicated designs may save time or money or help avoid sampling problems. All statistical sampling designs have in common the idea that chance, rather than human choice, is used to select the sample.

Stratified Sampling Designs used to sample from large populations are often more complicated than simple random samples. Sometimes the population is first sliced into homogeneous groups, called strata, before the sample is selected. Then simple random sampling is used within each stratum before the results are combined. This common sampling design is called stratified random sampling.

Stratified Sampling Summary – Stratified Random Sampling
Is a sampling method in which the population is first broken up into homogeneous groups called strata. These strata are made up of individuals similar in some way that may affect the response variable. Simple random sampling is then used within each stratum before the results are combined.

Stratified Sampling With this procedure we can acquire information about: the whole population; each stratum; the relationships among strata. Examples of strata: Sex (male, female); Age (under 20, 20-30, 31-40, 41-50); Occupation (professional, clerical, blue-collar).

Stratified Sampling There are several ways to build a stratified sample. For example, keep the proportion of each stratum in the population.
Stratum   Income         Population proportion   Stratum size
1         under $15,…    …%                      …
2         …,000-29,…     …%                      …
3         …,000          30%                     300
4         over $50,…     …%                      50

Stratified Sampling Example: Suppose a TV station is interested in obtaining information from its viewers regarding the events they are most likely to watch during their coverage of the Olympics. Since men and women may differ significantly in their choice of events, a sample that stratifies by gender can help reduce variation in the results.
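
Keeping each stratum's share of the population can be sketched as follows. The viewer list (600 women, 400 men), the total sample size of 50, and the function name `stratified_sample` are hypothetical values chosen for illustration.

```python
import random

def stratified_sample(frame, stratum_of, total_n, rng):
    """Proportionate stratified sample: an SRS within each stratum."""
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = {}
    for name, members in strata.items():
        # Allocate sample slots in proportion to the stratum's share of the frame.
        # Rounding can make the total differ slightly from total_n in general.
        k = round(total_n * len(members) / len(frame))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical viewer list: 600 women and 400 men.
viewers = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
picked = stratified_sample(viewers, stratum_of=lambda v: v[0], total_n=50,
                           rng=random.Random(7))
print({name: len(members) for name, members in picked.items()})
```

Here the sample always contains 30 women and 20 men, so the gender mix cannot vary from sample to sample. That fixed mix is the source of the reduced variability that stratifying offers.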

Stratified Sampling The most important benefit of stratifying is that it can reduce the variability of our results. When we restrict by strata, additional samples are more like one another, so statistics calculated for the sampled values will vary less from one sample to another. Stratified random sampling can reduce bias. Stratified sampling can also help us notice important differences among groups.

Cluster Sampling Sometimes stratifying isn’t practical and simple random sampling is difficult. Splitting the population into similar parts or clusters can make sampling more practical. Then we could select one or a few clusters at random and perform a census within each of them. This sampling design is called cluster sampling. If each cluster fairly represents the full population, cluster sampling will give us an unbiased sample.

Cluster Sampling Summary – Cluster Sampling
Divide the population into heterogeneous groups called clusters. Take an SRS of some of the clusters. Every member of the cluster is included in the sample. Usually used to reduce the cost of obtaining a sample. Extensively used by government agencies and certain private research organizations.

Cluster Sampling Example: In conducting a survey of school children in a large city, we could first randomly select 5 schools and then include all the children from each selected school. Although cluster sampling can save time and money, it does have disadvantages. Ideally, each cluster should mirror the entire population. However, that is often not the case, as members of a cluster are frequently more homogeneous than the members of the population as a whole.
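
The school example can be sketched in a few lines. The city of 40 schools, the enrollment figures, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Pick whole clusters at random, then take everyone in them (a census)."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    members = [child for school in chosen for child in clusters[school]]
    return chosen, members

# Hypothetical city: 40 schools with 100 to 120 children each.
rng = random.Random(5)
schools = {f"school_{s:02d}": [f"s{s:02d}_child{c}" for c in range(100 + s % 21)]
           for s in range(40)}
chosen_schools, children = cluster_sample(schools, num_clusters=5, rng=rng)
print(chosen_schools, len(children))
```

Only the choice of schools is random. Once a school is chosen, every child in it is included, which is why the sample is cheap to collect but can be too homogeneous if schools differ from one another.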

Cluster Sampling Cluster sampling is not the same as stratified sampling. We stratify to ensure that our sample represents different groups in the population, and sample randomly within each stratum. Strata are internally homogeneous, but differ from one another. Clusters are more or less alike, each internally heterogeneous and resembling the overall population. We select clusters to make sampling more practical or affordable.

Multistage Sampling Sometimes we use a variety of sampling methods together. Sampling schemes that combine several methods are called multistage samples. Most surveys conducted by professional polling organizations use some combination of stratified and cluster sampling as well as simple random sampling.

Multistage Sampling Example: A national polling service may stratify the country by geographical regions, select a random sample of cities from each region, and then interview a cluster of residents in each city.
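
One way to combine the stages is sketched below. It assumes the final stage is a random sample of residents within each chosen city; the country layout (4 regions, 10 cities each, 500 residents per city) and the function name `multistage_sample` are hypothetical.

```python
import random

def multistage_sample(regions, cities_per_region, residents_per_city, rng):
    """Stage 1: every region (strata). Stage 2: random cities. Stage 3: random residents."""
    interviews = []
    for region, cities in regions.items():
        for city in rng.sample(sorted(cities), cities_per_region):
            residents = cities[city]
            for person in rng.sample(residents, residents_per_city):
                interviews.append((region, city, person))
    return interviews

# Hypothetical country: 4 regions, 10 cities each, 500 residents per city.
regions = {
    f"region_{r}": {f"city_{r}_{c}": list(range(500)) for c in range(10)}
    for r in range(4)
}
sample = multistage_sample(regions, cities_per_region=2, residents_per_city=25,
                           rng=random.Random(3))
print(len(sample))
```

With these settings the design yields 4 × 2 × 25 = 200 interviews, with every region represented and interviewers sent to only 8 cities.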

Systematic Samples Sometimes we draw a sample by selecting individuals systematically. For example, you might survey every 10th person on an alphabetical list of students. To make it random, you must still start the systematic selection from a randomly selected individual. When there is no reason to believe that the order of the list could be associated in any way with the responses sought, systematic sampling can give a representative sample.

Systematic Samples Method of sampling in which the sample is selected in some predetermined way. For example, we may obtain a list of our population of interest and from that list choose every fifth individual to be part of the sample. Although each individual has an equal chance of being chosen, this method is not a SRS because each possible sample of size n individuals does not have an equal chance of being chosen.

Systematic Samples Example: If we are choosing a sample of 30 students from the 300 students in the senior class by selecting every 10th student from the alphabetical directory, the first 30 students on the list will never all be chosen as the sample group. Easier to execute than SRS. Usually provides results comparable to SRS.
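
The senior-class example can be sketched directly. The directory contents and the function name `systematic_sample` are hypothetical; the step of 10 and the class size of 300 come from the example.

```python
import random

def systematic_sample(frame, step, rng):
    """Take every `step`-th individual, starting from a random position."""
    start = rng.randrange(step)      # random start among the first `step`
    return frame[start::step]

seniors = [f"student_{i:03d}" for i in range(1, 301)]   # hypothetical directory
sample = systematic_sample(seniors, step=10, rng=random.Random(1))
print(len(sample), sample[:3])

# Only `step` different samples are possible, one per starting position.
possible = {tuple(seniors[start::10]) for start in range(10)}
print(len(possible))
```

Each student has a 1-in-10 chance of selection, yet only 10 of the many possible groups of 30 can ever be drawn, and the first 30 names on the list are not among them. That is why a systematic sample is not an SRS.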

Systematic Samples Systematic sampling can be much less expensive than true random sampling. When you use a systematic sample, you need to justify the assumption that the systematic method is not associated with any of the measured variables.

Defining the “Who” The Who of a survey can refer to different groups, and the resulting ambiguity can tell you a lot about the success of a study. To start, think about the population of interest. Often, you’ll find that this is not really a well-defined group. Even if the population is clear, it may not be a practical group to study.

Defining the “Who” Second, you must specify the sampling frame.
Usually, the sampling frame is not the group you really want to know about. The sampling frame limits what your survey can find out.

Defining the “Who” Then there’s your target sample.
These are the individuals for whom you intend to measure responses. You’re not likely to get responses from all of them. Nonresponse is a problem in many surveys.

Defining the “Who” Finally, there is your sample: the actual respondents. These are the individuals about whom you do get data and can draw conclusions. Unfortunately, they might not be representative of the target sample, the sampling frame, or the population.

Defining the “Who” At each step, the group we can study may be constrained further. The Who keeps changing, and each constraint can introduce biases. A careful study should address the question of how well each group matches the population of interest.

Defining the “Who” One of the main benefits of simple random sampling is that it never loses its sense of who’s Who. The Who in a SRS is the population of interest from which we’ve drawn a representative sample. (That’s not always true for other kinds of samples.)

The Valid Survey It isn’t sufficient to just draw a sample and start asking questions. A valid survey yields the information we are seeking about the population we are interested in. Before you set out to survey, ask yourself: What do I want to know? Am I asking the right respondents? Am I asking the right questions? What would I do with the answers if I had them; would they address the things I want to know?

The Valid Survey These questions may sound obvious, but there are a number of pitfalls to avoid. Know what you want to know. Understand what you hope to learn and from whom you hope to learn it. Use the right frame. Be sure you have a suitable sampling frame. Tune your instrument. The survey instrument itself can be a source of errors; a survey that is too long yields fewer responses.

The Valid Survey Ask specific rather than general questions.
Ask for quantitative results when possible. Be careful in phrasing questions. A respondent may not understand the question or may understand the question differently than the way the researcher intended it. Even subtle differences in phrasing can make a difference.

The Valid Survey Be careful in phrasing answers.
It’s often a better idea to offer choices rather than inviting a free response.

The Valid Survey The best way to protect a survey from unanticipated measurement errors is to perform a pilot survey. A pilot is a trial run of a survey you eventually plan to give to a larger group.

What Can Go Wrong?—or, How to Sample Badly
Sample Badly with Volunteers: In a voluntary response sample, a large group of individuals is invited to respond, and all who do respond are counted. Voluntary response samples are almost always biased, and so conclusions drawn from them are almost always wrong. Voluntary response samples are often biased toward those with strong opinions or those who are strongly motivated. Since the sample is not representative, the resulting voluntary response bias invalidates the survey.

Sample Badly, but Conveniently: In convenience sampling, we simply include the individuals who are convenient. Unfortunately, this group may not be representative of the population. Convenience sampling is not only a problem for students or other beginning samplers. In fact, it is a widespread problem in the business world—the easiest people for a company to sample are its own customers.

Sample from a Bad Sampling Frame: An SRS from an incomplete sampling frame introduces bias because the individuals included may differ from the ones not in the frame. Undercoverage: Many of these bad survey designs suffer from undercoverage, in which some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population. Undercoverage can arise for a number of reasons, but it’s always a potential source of bias.

What Else Can Go Wrong? Watch out for nonrespondents.
A common and serious potential source of bias for most surveys is nonresponse bias. No survey succeeds in getting responses from everyone. The problem is that those who don’t respond may differ from those who do. And they may differ on just the variables we care about.

What Else Can Go Wrong? Don’t bore respondents with surveys that go on and on and on and on… Surveys that are too long are more likely to be refused, reducing the response rate and biasing all the results.

What Else Can Go Wrong? Work hard to avoid influencing responses. Response bias refers to anything in the survey design that influences the responses. For example, the wording of a question can influence the responses: Given the fact that those who understand Statistics are smarter and better looking than those who don’t, don’t you think it is important to take a course in Statistics?

How to Think About Biases
Look for biases in any survey you encounter, and look for them in your own survey before you collect the data: there’s no way to recover from a biased sample or a survey that asks biased questions. Spend your time and resources reducing biases. If you possibly can, pilot-test your survey. Always report your sampling methods in detail.

What have we learned? A representative sample can offer us important insights about populations. It’s the size of the sample, not its fraction of the larger population, that determines the precision of the statistics it yields. There are several ways to draw samples, all based on the power of randomness to make them representative of the population of interest: Simple Random Sample, Stratified Sample, Cluster Sample, Systematic Sample, Multistage Sample

What have we learned? Bias can destroy our ability to gain insights from our sample: Nonresponse bias can arise when sampled individuals will not or cannot respond. Response bias arises when respondents’ answers might be affected by external influences, such as question wording or interviewer behavior.

What have we learned? Bias can also arise from poor sampling methods:
Voluntary response samples are almost always biased and should be avoided and distrusted. Convenience samples are likely to be flawed for similar reasons. Even with a reasonable design, sampling frames may not be representative. Undercoverage occurs when individuals from a subgroup of the population are selected less often than they should be.

What have we learned? Finally, we must look for biases in any survey we find and be sure to report our methods whenever we perform a survey so that others can evaluate the fairness and accuracy of our results.

Assignment Pg 288 – 291: #1, 3, 7 – 17,
