§ 13.1 - 13.3 Populations, Surveys and Random Sampling

Presentation transcript:

§ 13.1 - 13.3 Populations, Surveys and Random Sampling
Kent: “Mr. Simpson, how do you respond to the charges that petty vandalism such as graffiti is down eighty percent, while heavy sack-beatings are up a shocking 900%?”
Homer: “Aw, people can come up with statistics to prove anything, Kent. Forty percent of all people know that.”
The Simpsons, “Homer the Vigilante”

So… what is statistics, anyway?
• The gathering, organizing, interpreting and understanding of data.

The Population
• The complete set of individuals or objects about which we are seeking information is referred to as the population.
• A silly example: if Mayor Quimby takes a poll to find how many Springfielders plan to vote for him in the next election, then the population is the voters of Springfield.

The N-value
• If it were possible to accurately count every member of a population, we would get a number, N, called the N-value of the population.

The N-value (A Point or Two)
• This value is often difficult to determine, and therefore can require various adjustments. For instance, if a scientist were studying the effect of genetically modified corn on the monarch butterfly, it would be practically impossible to accurately calculate the N-value.

The N-value (A Point or Two)
• This value can change with time.
• In the case of the monarch butterfly, the number of actual insects will obviously be different from year to year.

Example: U.S. Census
Article 1, Section 2: [...] Representatives and direct taxes shall be apportioned among the several states which may be included within this union, according to their respective numbers, which shall be determined by adding to the whole number of free persons, including those bound to service for a term of years, and excluding Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three years after the first meeting of the Congress of the United States, and within every subsequent term of ten years, in such manner as they shall by law direct.

What exactly is the census used for?
• Determining taxes and political representation.
• Collecting demographic information that is used to determine the allocation of federal money to state and local governments.
• Calculating government statistics like the Consumer Price Index.

Example: U.S. Census
Why are we even talking about this?
• In the 18th century the U.S. population was for the most part small, immobile and homogeneous.
• Today = The Exact Opposite.
• The result is that the modern census noticeably undercounts citizens. In fact, it differentially undercounts minorities and the poor.
• There are statistical methods that could be used to correct some of the inaccuracy, but a Supreme Court ruling bars their use for determining Congressional apportionment.

Surveys
• Since collecting information from large populations is so difficult, researchers instead gather data from selected subgroups and use that information to make inferences regarding the population as a whole.
• This process is what math-and-science-y types refer to as a survey.
• The selected subgroup is called a sample.

Surveys (cont’d)
• There are two major issues when setting up a survey:
1. You need a sample that is representative of the population being studied.
2. You need a sample that is large enough to draw accurate information from, yet still small enough to be practical.

Example: Literary Digest and the 1936 Presidential Election
• Literary Digest was a popular magazine that had accurately predicted the winner in the five elections prior to 1936.
• That year, the publication ambitiously decided to poll 10 million Americans.
• The individuals contacted came from magazine subscription lists and telephone directory listings.

Example: Literary Digest and the 1936 Presidential Election
• When the results came in, 2.4 million people had responded, and the survey predicted that the vote would end Landon: 57%, FDR: 43%.
• What actually happened?

• The actual results were FDR: 61%, Landon: 36.5%, Other: 2.5%. (Landon, in fact, did not even carry his home state.)

• George Gallup, however, made an accurate prediction with a sample of only 50,000 people.
• Why were his results superior? There are two main reasons:
1. The Literary Digest’s names were taken from phone directories and subscription lists, so the people surveyed were disproportionately wealthy. When a survey has an inherent tendency to exclude a segment of the population being studied, it is said to have selection bias.
2. Out of 10 million people contacted, only 24% replied. This is an example of what is called nonresponse bias, and it only magnified the first problem.
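The selection-bias effect can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the 30% share of voters who appear on phone or subscription lists, and the 40% and 70% FDR preference rates are hypothetical numbers chosen so that overall FDR support comes out near 61%. They are not historical data, and names such as `build_population` and `fdr_share` are made up for the illustration.

```python
import random

def build_population(size=100_000, seed=1936):
    """Build a hypothetical electorate of (on_list, votes_fdr) pairs."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        # Assumption: 30% of voters appear on phone/subscription lists.
        on_list = rng.random() < 0.30
        # Assumption: listed (wealthier) voters favor FDR less often.
        p_fdr = 0.40 if on_list else 0.70
        votes_fdr = rng.random() < p_fdr
        population.append((on_list, votes_fdr))
    return population

def fdr_share(voters):
    """Fraction of the given voters who favor FDR."""
    return sum(votes_fdr for _, votes_fdr in voters) / len(voters)

population = build_population()
rng = random.Random(0)

# Large sample drawn only from the listed voters (selection bias).
frame = [voter for voter in population if voter[0]]
biased_sample = rng.sample(frame, 20_000)

# Much smaller sample drawn at random from the whole population.
random_sample = rng.sample(population, 1_000)

print("true FDR share:      ", round(fdr_share(population), 3))
print("biased sample (20k): ", round(fdr_share(biased_sample), 3))
print("random sample (1k):  ", round(fdr_share(random_sample), 3))
```

In this toy setup the 20,000-person sample misses badly because every one of its members comes from the listed group, while the 1,000-person random sample lands close to the true share.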

Example: “Ain’t the way I heard it.”
• In 1935 Gallup had introduced a statistical method known as quota sampling. This technique was a systematic way of matching a sample to a designated profile.
• With a sample of 3250 people specifically chosen as a cross-section of the country, Gallup predicted a result for the 1948 election of Dewey: 49.5%, Truman: 44.5%, Thurmond, Wallace, etc.: 6%.

• The actual result was... Truman: 49.9%, Dewey: 44.5%, Others: 5%.
• So, what went wrong this time?
• There are too many characteristics you could use for your quota.
• The methods used in 1948 did not take economic status into account and oversampled Republican voters.
• Most pollsters stopped gathering data when Dewey was coming in 13 percentage points ahead of Truman in some of the surveys.

Lessons to take from these occurrences...
• A small, well-chosen sample is better than a poorly-chosen large one.
• Selection bias and nonresponse bias need to be taken into account.
• Don’t stop surveying early.
• Quota sampling is flawed.

Random Sampling
• Random sampling: methods in which an element of chance is used to choose a sample.
• Simple random sampling: a larger-scale version of picking names out of a hat.
• The problem with simple random sampling is one of practicality.
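A minimal sketch of simple random sampling, using Python’s standard `random.sample`, which draws without replacement so that every member of the list has the same chance of being picked. The voter list and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members from the population, names-out-of-a-hat style."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical list of every member of the population.
voters = [f"voter_{i}" for i in range(1, 10_001)]

sample = simple_random_sample(voters, 50, seed=42)
print(sample[:5])
```

The practicality problem shows up in the first argument: the code needs a complete list of the population, which for a national poll would be very hard to assemble.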

Random Sampling
• The solution, used in modern opinion polling, is stratified sampling.
• This method breaks the population down into strata (categories) and then randomly chooses a sample from the strata.
• The strata are then divided into substrata and the process is continued…

Example: Opinion Polls
• Modern opinion polls construct their strata as follows:
1. The nation is divided into “size of community” strata.
2. These strata are divided by geographic location.
3. Communities in each geographic region are picked randomly.
4. Wards, precincts and households are then selected randomly.
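The four steps above can be sketched on a toy data set. Everything here is an assumption made for illustration: the nested-dictionary layout, the stratum and region names, and the function `multistage_sample` are hypothetical, and the ward and precinct levels are collapsed into a single random draw of households per community to keep the sketch short.

```python
import random

def multistage_sample(nation, communities_per_region, households_per_community, seed=None):
    """Pick communities at random within each (size, region) stratum,
    then pick households at random within each chosen community."""
    rng = random.Random(seed)
    chosen = []
    for size_stratum, regions in nation.items():          # step 1: size of community
        for region, communities in regions.items():       # step 2: geographic location
            names = sorted(communities)
            k = min(communities_per_region, len(names))
            for community in rng.sample(names, k):        # step 3: random communities
                households = communities[community]
                m = min(households_per_community, len(households))
                for household in rng.sample(households, m):  # step 4: random households
                    chosen.append((size_stratum, region, community, household))
    return chosen

# Toy nation: 2 size strata x 2 regions x 5 communities x 20 households.
nation = {
    size: {
        region: {
            f"{size}-{region}-town{c}": [f"household{h}" for h in range(20)]
            for c in range(5)
        }
        for region in ("north", "south")
    }
    for size in ("large", "small")
}

sample = multistage_sample(nation, communities_per_region=2,
                           households_per_community=3, seed=7)
print(len(sample), "households selected")
```

Because every stratum contributes to the sample, no size class or region can be left out by chance, and the pollster only needs household lists for the communities that were actually picked.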

Example: Opinion Polls
• The result is an efficient method that generally yields accurate results.
• Usually, people are needed for an accurate opinion poll. This does not depend on the size of the population being studied.
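One way to see why the required sample size barely depends on the population size is the textbook margin-of-error formula for a proportion. The sketch below assumes simple random sampling, a 95% confidence level (z = 1.96) and the worst case p = 0.5. The sample size of 1,000 is a hypothetical figure used only for the comparison, and `margin_of_error` is an illustrative helper name.

```python
import math

def margin_of_error(n, population_size=None, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    If population_size is given, the finite population correction is applied.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error

n = 1_000  # hypothetical sample size
for population_size in (100_000, 1_000_000, 100_000_000):
    moe = margin_of_error(n, population_size)
    print(f"N = {population_size:>11,}: margin of error = {moe:.4f}")
```

Under these assumptions the margin of error is driven almost entirely by n: raising the population a thousandfold changes the result only in the fourth decimal place.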
