Excursions in Modern Mathematics, 7e: 13.2 - 2Copyright © 2010 Pearson Education, Inc. 13 Collecting Statistical Data 13.1The Population 13.2Sampling.

Slides:



Advertisements
Similar presentations
Where do data come from and Why we don’t (always) trust statisticians.
Advertisements

Chapter 8: Mass Media and Public Opinion Section 2
By:Heather Gubser Objective 26i
Public Opinion Polling How Public Opinion is Measured (and Mismeasured)
Sampling.
Chapter 7: Data for Decisions Lesson Plan
Copyright © 2010 Pearson Education, Inc. Slide
Economics 105: Statistics Review #1 due next Tuesday in class Go over GH 8 No GH’s due until next Thur! GH 9 and 10 due next Thur. Do go to lab this week.
OCR Nationals Level 3 Unit 3. March 2012 M Morison Know what is meant by ‘bias’ in a study Understand that bias needs to be eliminated from a study so.
Drawing Samples in “Observational Studies” Sample vs. the Population How to Draw a Random Sample What Determines the “Margin of Error” of a Poll?
Literary Digest Poll 1936 election: Franklin Delano Roosevelt vs. Alf Landon Literary Digest had called the election since 1916 Sample size: 2.4 million!
Chapter 3 Producing Data 1. During most of this semester we go about statistics as if we already have data to work with. This is okay, but a little misleading.
§ Populations, Surveys and Random Sampling Kent: “ Mr. Simpson, how do you respond to the charges that petty vandalism such as graffiti is.
Chapter 12 Sample Surveys. At the end of this chapter, you should be able to Identify populations, samples, parameters and statistics for a given problem.
Chapter 12 Sample Surveys
Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. 13 Collecting Statistical Data 13.1The Population 13.2Sampling.
Sampling Methods.
LT 4.1—Sampling and Surveys Day 3 Notes--Bias
Copyright © 2011 Pearson Education, Inc. Samples and Surveys Chapter 13.
Statistical Inference: Which Statistical Test To Use? Pınar Ay, MD, MPH Marmara University School of Medicine Department of Public Health
SAMPLING:REQUIREMENTS OF A GOOD SAMPLE
C1, L2, S1 Design Method of Data Collection Surveys and Polls Experimentation Observational Studies.
4.2 Statistics Notes What are Good Ways and Bad Ways to Sample?
What is statistics? Statistics is the science of dealing with data.
Famous Quotes There are three kinds of lies: lies, damned lies and statistics. Benjamin Disraeli Figures don’t lie; liars figure. Mark Twain Statistics.
1 Excursions in Modern Mathematics Sixth Edition Peter Tannenbaum.
SAMPLING Nuances of sample size determination Brett Oppegaard, Washington State University Vancouver Language, Texts and Technology, Spring 2011.
Sampling Defined / The idea – Making inference about a larger population What is the population – Some particular value in the population estimating.
Copyright © 2009 Pearson Education, Inc. Publishing as Longman. The 1936 Literary Digest Presidential Election Poll Case Study: Special Topic Lecture Chapter.
 Sampling Design Unit 5. Do frog fairy tale p.89 Do frog fairy tale p.89.
7. Logic of Sampling Jin-Wan Seo, Professor Dept. of Public Administration, University of Incheon.
Homework Read pages Page 467: 1 – 16, 29 – 34, 37, 38, 59.
Chapter 1: The Nature of Statistics
American Government and Politics Today Chapter 6 Public Opinion and Political Socialization.
DATA COLLECTION METHODS Sampling
Designing Social Inquiry week 4 I36005 Soohyung Ahn Case Study 1936 PRESIDENTIAL ELECTION : Roosevelt VS Landon.
Pitfalls of Surveys. The Literary Digest Poll 1936 US Presidential Election Alf Landon (R) vs. Franklin D. Roosevelt (D)
Chapter 12 Sample Surveys *Sample *Bias *Randomizing *Sample Size.
Sampling Design Notes Pre-College Math.
Chapter 7: Data for Decisions Lesson Plan Sampling Bad Sampling Methods Simple Random Samples Cautions About Sample Surveys Experiments Thinking About.
Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. 13 Collecting Statistical Data 13.1The Population 13.2Sampling.
Chapter 12 Sample Surveys
Slide 12-1 Copyright © 2004 Pearson Education, Inc.
Chapter 2 Lesson 2.2a Collecting Data Sensibly 2.2: Sampling.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 1.1 Chapter Five Data Collection and Sampling.
STT 421 Day 7: September 28, 2015 September 28, 2015
Random Samples 12/5/2013. Readings Chapter 6 Foundations of Statistical Inference (Pollock) (pp )
Bias in Sampling. Definitions Bias = where the results of the sample are not representative of the population Three sources of Bias in Sampling –Sampling.
SECTION 4.1. INFERENCE The purpose of a sample is to give us information about a larger population. The process of drawing conclusions about a population.
Statistics – OR 155 Section 1 J. S. Marron, Professor Department of Statistics and Operations Research.
 Elections: The voice of the people. › Frequently interpreted as voters acceptance or rejection of a party platform. › Affected by many factors and give.
Chapter 3 Surveys and Sampling © 2010 Pearson Education 1.
Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. 13 Collecting Statistical Data 13.1The Population 13.2Sampling.
1 Data Collection and Sampling ST Methods of Collecting Data The reliability and accuracy of the data affect the validity of the results of a statistical.
Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. 13 Collecting Statistical Data 13.1The Population 13.2Sampling 13.3.
Bias in Survey Sampling. Bias Due to Unrepresentative Samples A good sample is representative. This means that each sample point represents the attributes.
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 13 Samples and Surveys.
The inference and accuracy We learned how to estimate the probability that the percentage of some subjects in the sample would be in a given interval by.
Sample Surveys. Terminologies Investigators usually want to generalize about a class of individuals. This class is called the population. For example,
Ten percent of U. S. households contain 5 or more people
THE EFFECT OF SAMPLING BIAS ON BIG DATA BY USING THE READERS DIGEST POLL OF THE 1936 ELECTION AS A CASE STUDY, WE EXAMINE HOW THE SAMPLE OF DATA USED AFFECTS.
Sources of Error In Sampling
Sample Surveys.
Sampling.
Sources of Bias 1. Voluntary response 2. Undercoverage 3. Nonresponse
Bias On-Level Statistics.
Inference for Sampling
Sampling.
COLLECTING STATISTICAL DATA
Presentation transcript:

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. 13 Collecting Statistical Data 13.1The Population 13.2Sampling 13.3 Random Sampling 13.4Sampling: Terminology and Key Concepts 13.5The Capture-Recapture Method 13.6Clinical Studies

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The practical alternative to a census is to collect data only from some members of the population and use that data to draw conclusions and make inferences about the entire population. Statisticians call this approach a survey (or a poll when the data collection is done by asking questions). The subgroup chosen to provide the data is called the sample, and the act of selecting a sample is called sampling. A Survey

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. Ideally, every member of the population should have an opportunity to be chosen as part of the sample, but this is possible only if we have a mechanism to identify each and every member of the population. In many situations this is impossible. Say we want to conduct a public opinion poll before an election. The population for the poll consists of all voters in the upcoming election, but how can we identify who is and is not going to vote ahead of time? We know who the registered voters are, but among this group there are still many nonvoters. A Survey

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The first important step in a survey is to distinguish the population for which the survey applies (the target population) and the actual subset of the population from which the sample will be drawn, called the sampling frame. The ideal scenario is when the sampling frame is the same as the target population–that would mean that every member of the target population is a candidate for the sample. When this is impossible (or impractical), an appropriate sampling frame must be chosen. A Survey

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. A CNN/USA Today/Gallup poll conducted right before the November 2, 2004, national election asked the following question: “If the election for Congress were being held today, which party’s candidate would you vote for in your congressional district, the Democratic Party’s candidate or the Republican Party’s candidate?” Example 13.5Sampling Frames Can Make a Difference

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. When the question was asked of 1866 registered voters nationwide, the results of the poll were 49% for the Democratic Party candidate, 47% for the Republican Party candidate, 4% undecided. When exactly the same question was asked of 1573 likely voters nationwide, the results of the poll were 50% for the Republican Party candidate, 47% for the Democratic Party candidate, 3% undecided. Example 13.5Sampling Frames Can Make a Difference

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. Clearly, one of the two polls had to be wrong, because in the first poll the Democrats beat out the Republicans, whereas in the second poll it was the other way around. The only significant difference between the two polls was the choice of the sampling frame–in the first poll the sampling frame used was all registered voters, and in the second poll the sampling frame used was all likely voters. Example 13.5Sampling Frames Can Make a Difference

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. Although neither one faithfully represents the target population of actual voters, using likely voters instead of registered voters for the sampling frame gives much more reliable data. (The second poll predicted very closely the average results of the 2004 congressional races across the nation.) So, why don’t all pre-election polls use likely voters as a sampling frame instead of registered voters? Example 13.5Sampling Frames Can Make a Difference

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The answer is economics. Registered voters are relatively easy to identify–every county registrar can produce an accurate list of registered voters. Not every registered voter votes, though, and it is much harder to identify those who are “likely” to vote. Typically, one has to look at demographic factors (age, ethnicity, etc.) as well as past voting behavior to figure out who is likely to vote and who isn’t. Doing that takes a lot more effort, time, and money. Example 13.5Sampling Frames Can Make a Difference

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The basic philosophy behind sampling is simple and well understood–if we have a sample that is “representative” of the entire population, then whatever we want to know about a population can be found out by getting the information from the sample. If we are to draw reliable data from a sample, we must (a) find a sample that is representative of the population, and (b) determine how big the sample should be. These two issues go hand in hand, and we will discuss them next. Sampling

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. Sometimes a very small sample can be used to get reliable information about a population, no matter how large the population is. This is the case when the population is highly homogeneous. The more heterogeneous a population gets, the more difficult it is to find a representative sample. The difficulties can be well illustrated by taking a look at the history of public opinion polls. Sampling

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The U.S. presidential election of 1936 pitted Alfred Landon, the Republican governor of Kansas, against the incumbent Democratic President, Franklin D. Roosevelt. At the time of the election, the nation had not yet emerged from the Great Depression, and economic issues such as unemployment and government spending were the dominant themes of the campaign. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The Literary Digest, one of the most respected magazines of the time, conducted a poll a couple of weeks before the election. The magazine had used polls to accurately predict the results of every presidential election since 1916, and their 1936 poll was the largest and most ambitious poll ever. The sampling frame for the Literary Digest poll consisted of an enormous list of names that included: CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. (1) every person listed in a telephone directory anywhere in the United States, (2) every person on a magazine subscription list, and (3) every person listed on the roster of a club or professional association. From this sampling frame a list of about 10 million names was created, and every name on this list was mailed a mock ballot and asked to mark it and return it to the magazine. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. Based on the poll results, the Literary Digest predicted a landslide victory for Landon with 57% of the vote, against Roosevelt’s 43%. Amazingly, the election turned out to be a landslide victory for Roosevelt with 62% of the vote, against 38% for Landon. The difference between the poll’s prediction and the actual election results was a whopping 19%, the largest error ever in a major public opinion poll. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. For the same election, a young pollster named George Gallup was able to predict accurately a victory for Roosevelt using a sample of “only” 50,000 people. In fact, Gallup also publicly predicted, to within 1%, the incorrect results that the Literary Digest would get using a sample of just 3000 people taken from the same sampling frame the magazine was using. What went wrong with the Literary Digest poll and why was Gallup able to do so much better? CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The first thing seriously wrong with the Literary Digest poll was the sampling frame, consisting of names taken from telephone directories, lists of magazine subscribers, rosters of club members, and so on. Telephones in 1936 were something of a luxury, and magazine subscriptions and club memberships even more so, at a time when 9 million people were unemployed. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. When it came to economic status the Literary Digest sample was far from being a representative cross section of the voters. This was a critical problem, because voters often vote on economic issues, and given the economic conditions of the time, this was especially true in CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. When the choice of the sample has a built-in tendency (whether intentional or not) to exclude a particular group or characteristic within the population, we say that a survey suffers from selection bias. It is obvious that selection bias must be avoided, but it is not always easy to detect it ahead of time. Even the most scrupulous attempts to eliminate selection bias can fall short. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The second serious problem with the Literary Digest poll was the issue of nonresponse bias. In a typical survey it is understood that not every individual is willing to respond to the survey request (and in a democracy we cannot force them to do so). Those individuals who do not respond to the survey request are called nonrespondents, and those who do are called respondents. The percentage of respondents out of the total sample is called the response rate. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. For the Literary Digest poll, out of a sample of 10 million people who were mailed a mock ballot only about 2.4 million mailed a ballot back, resulting in a 24% response rate. When the response rate to a survey is low, the survey is said to suffer from nonresponse bias. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. One of the significant problems with the Literary Digest poll was that the poll was conducted by mail. This approach is the most likely to magnify nonresponse bias, because people often consider a mailed questionnaire just another form of junk mail. Of course, given the size of their sample, the Literary Digest hardly had a choice. This illustrates another important point: Bigger is not better, and a big sample can be more of a liability than an asset. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The Literary Digest story has two morals: (1) You’ll do better with a well-chosen small sample than with a badly chosen large one, and (2) watch out for selection bias and nonresponse bias. CASE STUDY 2THE 1936 LITERARY DIGEST POLL

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. One commonly used short-cut in sampling is known as convenience sampling. In convenience sampling the selection of which individuals are in the sample is dictated by what is easiest or cheapest for the data collector, never mind trying to get a representative sample. A classic example of convenience sampling is when interviewers set up at a fixed location such as a mall or outside a supermarket and ask passersby to be part of a public opinion poll. Convenience Sampling

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. A different type of convenience sampling occurs when the sample is based on self- selection–the sample consists of those individuals who volunteer to be in it. Self- selection is the reason why many Area Code 800 polls are not to be trusted. Convenience sampling is not always bad–at times there is no other choice or the alternatives are so expensive that they have to be ruled out. Convenience Sampling

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. We should keep in mind, however, that data collected through convenience sampling are naturally tainted and should always be scrutinized (that’s why we always want to get to the details of how the data were collected). More often than not, convenience sampling gives us data that are too unreliable to be of any scientific value. With data, as with so many other things, you get what you pay for. Convenience Sampling

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. Quota sampling is a systematic effort to force the sample to be representative of a given population through the use of quotas– the sample should have so many women, so many men, so many blacks, so many whites, so many people living in urban areas, so many people living in rural areas, and so on. The proportions in each category in the sample should be the same as those in the population. Quota Sampling

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. If we can assume that every important characteristic of the population is taken into account when the quotas are set up, it is reasonable to expect that the sample will be representative of the population and produce reliable data. Quota Sampling

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. George Gallup had introduced quota sampling as early as 1935 and had used it successfully to predict the winner of the 1936, 1940, and 1944 presidential elections. Quota sampling thus acquired the reputation of being a “scientifically reliable” sampling method, and by the 1948 presidential election all three major national polls–the Gallup poll, the Roper poll, and the Crossley poll–used quota sampling to make their predictions. CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. For the 1948 election between Thomas Dewey and Harry Truman, Gallup conducted a poll with a sample of approximately 3250 people. Each individual in the sample was interviewed in person by a professional interviewer to minimize nonresponse bias, and each interviewer was given a very detailed set of quotas to meet–for example, 7 white males under 40 living in a rural area, 5 black males over 40 living in a rural area, 6 white females under 40 living in an urban area, and so on. CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. By the time all the interviewers met their quotas, the entire sample was expected to accurately represent the entire population in every respect: gender, race, age, and so on. Based on his sample, Gallup predicted that Dewey, the Republican candidate, would win the election with 49.5% of the vote to Truman’s 44.5% (with third-party candidates Strom Thurmond and Henry Wallace accounting for the remaining 6%). CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The Roper and Crossley polls also predicted an easy victory for Dewey. The actual results of the election turned out to be almost the exact reverse of Gallup’s prediction: Truman got 49.9% and Dewey 44.5% of the national vote. Truman’s victory was a great surprise to the nation as a whole. So convinced was the Chicago Daily Tribune of Dewey’s victory that it went to press on its early edition for November 4, 1948, with the headline “Dewey defeats Truman.” CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. The picture of Truman holding aloft a copy of the Tribune and his famous retort CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION “Ain’t the way I heard it” have become part of our national folklore.

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. To pollsters and statisticians, the erroneous predictions of the 1948 election had two lessons: (1) Poll until election day, and (2) quota sampling is intrinsically flawed. What’s wrong with quota sampling? After all, the basic idea behind it appears to be a good one: Force the sample to be a representative cross section of the population by having each important characteristic of the population proportionally represented in the sample. CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. Where do we stop? No matter how careful we might be, we might miss some criterion that would affect the way people vote, and the sample could be deficient in this regard. An even more serious flaw in quota sampling is that, other than meeting the quotas, the interviewers are free to choose whom they interview. This opens the door to selection bias. Looking back over the history of quota sampling, we can see a clear tendency to overestimate the Republican vote. CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. In 1936, using quota sampling, Gallup predicted that the Republican candidate would get 44% of the vote, but the actual number was 38%.In 1940, the prediction was 48%, and the actual vote was 45%; in 1944, the prediction was 48%, and the actual vote was 46%. Gallup was able to predict the winner correctly in each of these elections, mostly because the spread between the candidates was large enough to cover the error. In 1948, Gallup (and all the other pollsters) simply ran out of luck. CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION

Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. It was time to ditch quota sampling. The failure of quota sampling as a method for getting representative samples has a simple moral: Even with the most carefully laid plans, human intervention in choosing the sample can result in selection bias. CASE STUDY 3THE 1948 PRESIDENTIAL ELECTION