Sampling Design


Questions, questions, questions
– Do you support the U.S. role in Iraq?
– What % of a lettuce shipment is bad?
– How many children are obese?
– What’s the price of gas at the pump across Minnesota?
Practically impossible to poll the entire population
Use a part to make conclusions about the whole
Idea #1: Use a SAMPLE to make conclusions about the POPULATION
But the sample must be representative of the population

Polling began in Pennsylvania
Harrisburg Pennsylvanian in 1824 predicted Andrew Jackson the victor
– He did win the popular vote
– But, like Al Gore, he didn’t win the election: no candidate had a majority of electoral votes, and the House chose John Quincy Adams
Straw polls were convenience samples
Solicited opinions of the “man on the street”
No science of sampling for 100 years
Conventional wisdom: the bigger the better

1936 election and the Literary Digest survey
Magazine had correctly predicted every election since 1916
Sent out 10 million surveys, and 2.4 million responded
They said: Landon would win 57% of the vote
What happened: Roosevelt landslide with 62%

What went wrong?
Sample not representative
– Lists came from subscriptions, phone directories, club members
– Phones were a luxury in 1936
– Selection bias toward the rich
Voluntary response: Republicans were angry and more likely to respond
Context: Great Depression
– 9 million unemployed
– Real income down 33%
– Massive discontent, strike waves
Economy was the main issue in the election

Idea #2: Randomize
Randomization helps ensure the sample is representative of the population
Randomization protects against bias
Simple Random Sample (SRS): every possible group of a given size has an equal chance of being selected
How to do it right
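A minimal sketch of drawing an SRS in Python. The sampling frame (ID numbers 1 to 2,000) and the helper name `simple_random_sample` are illustrative assumptions, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every group of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the demo repeatable
    return rng.sample(list(population), n)

# Hypothetical sampling frame: ID numbers 1..2000
frame = range(1, 2001)
sample = simple_random_sample(frame, 100, seed=1)
print(len(sample), "people selected, all distinct:", len(set(sample)) == 100)
```

The key point is that the computer, not the interviewer, decides who is in the sample.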

Some examples of non-random, biased samples
– 100 people at the Mall of America
– 100 people in front of the Metrodome after a Twins game
– 100 friends, family and relatives
– 100 people who volunteered to answer a survey question on your web site
– 100 people who answered their phone during supper time
– The first 100 people you see after you wake up in the morning

Is blind chance better than careful planning and selection?
Another classic fiasco
1948 Election: Truman versus Dewey
Every major poll predicted Dewey would win by 5 percentage points
Truman showing the Chicago Daily Tribune headline the morning after the 1948 election.

What went wrong?
Pollsters tried to design a representative sample
Quota Sampling
– Each interviewer assigned a fixed quota of subjects in numerous categories (race, sex, age)
– In each category, interviewers free to choose
– Left room for human choice and inevitable bias
Republicans were wealthier, better educated, and easier to reach
– Had telephones, permanent addresses, “nicer” neighborhoods
Interviewers chose too many Republicans

Quota Sampling biased
Republican bias in Gallup Poll
Quota sampling eventually abandoned for random sampling
Repeated evidence points to superiority of random sampling
Table columns: Year | Prediction of GOP vote | Actual GOP vote | Error in favor of GOP

How large a sample?
Not 10 million, not even 10,000!
Remarkably, it doesn’t depend on the size of the population, as long as the population is at least 100 times larger than the sample
Idea #3: Accuracy of the sample depends on the sample size, not the population size
Like tasting a flavor at the ice cream shop
An SRS of 100 will be about as accurate at Carleton College as in New York City!
Most polls today rely on 1,000-2,000 people
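One way to check Idea #3 is to compute the standard error of a sample proportion with the finite population correction. This is a sketch under stated assumptions: the population sizes (2,000 for a small college, 8,000,000 for a large city) and the true proportion of 50% are illustrative numbers, not figures from the slides.

```python
import math

def standard_error(n, population_size, p=0.5):
    """Standard error of a sample proportion from an SRS of size n,
    including the finite population correction."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

# Population sizes below are rough, illustrative assumptions.
small_college = standard_error(100, 2_000)
large_city = standard_error(100, 8_000_000)
print(f"small college: {small_college:.4f}   large city: {large_city:.4f}")
```

Both values come out close to 0.05: the population size barely matters, while changing n changes the answer a lot.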

Gallup Poll record in presidential elections since 1948

Year  Sample size  Winning candidate  Gallup prediction  Election result  Error
1952  5,385        Eisenhower         51.0%              55.4%            4.4%
1956  8,144        Eisenhower         59.5%              57.8%            1.7%
1960  8,015        Kennedy            51.0%              50.1%            0.9%
1964  6,625        Johnson            64.0%              61.3%            2.7%
1968  4,414        Nixon              43.0%              43.5%            0.5%
1972  3,689        Nixon              62.0%              61.8%            0.2%
1976  3,439        Carter             49.5%              51.1%            1.6%
1980  3,500        Reagan             51.6%              55.3%            3.7%
1984  3,456        Reagan             59.0%              59.2%            0.2%
1988  4,089        Bush               56.0%              53.9%            2.1%
1992  2,019        Clinton            49.0%              43.2%            5.8%
1996               Clinton            52.0%              50.1%            1.9%
2000               Bush               48.0%              47.9%            0.1%

A peek ahead...
A good rule of thumb is that the margin of error in a sample is 1/√n, where n is the sample size.
For n = 1,600, that’s 1/40 = 2.5%.
Most political polls report margins of error between 2% and 3%.
The rule-of-thumb margin of error doesn’t depend on population size, only on sample size.
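The rule of thumb is easy to tabulate. The sketch below takes the formula to be 1/√n, which matches the 2.5% quoted for n = 1,600; the function name and the list of sample sizes are illustrative choices.

```python
import math

def rule_of_thumb_margin(n):
    """Approximate margin of error for a sample of size n: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

# Print the margin for a few sample sizes; only n appears in the formula.
for n in (100, 400, 1000, 1600, 2000):
    print(f"n = {n:>5}: about {rule_of_thumb_margin(n):.1%}")
```

Note that quadrupling the sample size only halves the margin of error, which is why polls rarely go far beyond a couple of thousand people.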

Other sampling schemes
Stratified sampling
Goal: Random sample of 240 Carleton students
To ensure representation across disciplines, divide the population into strata
– Arts and Literature 20%
– Humanities 15%
– Social Sciences 30%
– Math/Natural Sciences 35%
Choose
– 240 x 0.20 = 48 Arts and Literature
– 240 x 0.15 = 36 Humanities
– 240 x 0.30 = 72 Social Sciences
– 240 x 0.35 = 84 Math/Natural Sciences
Within each stratum, choose a simple random sample
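The allocation above can be sketched in Python as proportional stratified sampling. The student ID lists are made up for illustration (a hypothetical population of 2,000 split 20/15/30/35), and `stratified_sample` is a hypothetical helper name.

```python
import random

def stratified_sample(frames, total, seed=None):
    """Proportional allocation: an SRS within each stratum, sized by its share.
    With awkward shares the rounded sizes may not add up exactly to `total`."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in frames.values())
    sample = {}
    for name, members in frames.items():
        k = round(total * len(members) / population_size)
        sample[name] = rng.sample(members, k)   # SRS inside the stratum
    return sample

# Hypothetical sampling frames: lists of student IDs per division.
frames = {
    "Arts and Literature": [f"AL{i}" for i in range(400)],     # 20%
    "Humanities": [f"HU{i}" for i in range(300)],              # 15%
    "Social Sciences": [f"SS{i}" for i in range(600)],         # 30%
    "Math/Natural Sciences": [f"MN{i}" for i in range(700)],   # 35%
}
chosen = stratified_sample(frames, 240, seed=1)
counts = {name: len(ids) for name, ids in chosen.items()}
print(counts)
```

The randomness is confined to each stratum, so the overall mix of divisions is fixed in advance rather than left to chance.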

Stratified sampling
Advantages: Sample will be representative for the strata; can gain precision of estimate
Disadvantages: Logistically difficult; must know about the population; may not be possible
Note: A stratified sample is not a simple random sample
Not every possible group of 240 students is equally likely to be selected

Cluster sampling – an example
Warehouse contains 10,000 window frames stored on pallets
Goal: Estimate how many frames have wood rot
Determining if a frame has wood rot is costly
Sample about 500 window frames
Pallets numbered 1 to 400
Each pallet contains 20 to 30 window frames
Sample pallets, not windows: pick an SRS of 20 pallets from the population of 400
Cluster sample consists of all frames on each selected pallet
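A rough sketch of the pallet example in Python. The warehouse contents are randomly generated stand-ins (so the total will be near, not exactly, 10,000 frames), and `cluster_sample` is a hypothetical helper name.

```python
import random

rng = random.Random(0)   # fixed seed so the demo is repeatable

# Hypothetical warehouse: pallet number -> list of frame IDs (20 to 30 each).
warehouse = {
    pallet: [f"P{pallet}-F{i}" for i in range(rng.randint(20, 30))]
    for pallet in range(1, 401)
}

def cluster_sample(clusters, n_clusters, rng):
    """Pick an SRS of clusters, then take every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for c in chosen for unit in clusters[c]]
    return chosen, units

pallets, frames_to_inspect = cluster_sample(warehouse, 20, rng)
print(len(pallets), "pallets,", len(frames_to_inspect), "frames to inspect")
```

Only the 400 pallet numbers need to be listed in advance; no list of all individual frames is required.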

Cluster sampling
Door-to-door surveys
– City blocks are the clusters
Airlines get customer opinions
– Individual flights are the clusters
Advantage: Often much easier to implement, depending on context
Disadvantage: Greater sampling variability; less statistical accuracy

Who likes Statistics?

Most common forms of bias
Response bias
– Anything that biases/influences responses
Non-response bias
– When a large fraction of those sampled don’t respond, such as
Voluntary response bias
– Most common source of bias in polls

Sampling badly: Convenience sampling
– Sample individuals who are at hand
– Survey students on the Quad or in Sayles or in Stats class
– Internet polls are prime suspects
– American Family Association online poll on gay marriage

You critique it
► Before 2000 election: What to do with large government surplus
► (1) “Should the money be used for a tax cut, or should it be used to fund new government programs?”
► (2) “Should the money be used for a tax cut, or should it be spent on programs for education, the environment, health care, crime-fighting, and military defense?”
► (1): 60% for tax cut; (2): 22% for tax cut

Another type of response bias
“Some people say that the 1975 Public Affairs Act should be repealed. Do you agree or disagree that it should be repealed?”
Washington Post, Feb
Results: For repeal: 24%, Against repeal: 19%, No opinion: 57%
No such thing as the Public Affairs Act!

Non-response
– Non-respondents can be very different from respondents
– Student surveys at end of term had about 20% response rate
– General Social Survey ( has % response rate, with 90 minute survey!
– Huge variability in media and government response rates
– Typically, media rates at about 25%; government at about 50%.
– Takes large amount of money, time, and training to ensure good response.

Do you believe the poll? What questions should you ask?
– Who carried out the survey?
– What is the population?
– How was the sample selected?
– How large was the sample?
– What was the response rate?
– How were subjects contacted?
– When was the survey conducted?
– What are the exact questions asked?