By Boon Xuan, Mei Ying and Fatin

Problem with Sampling. Based on: “Are First-Borns More Likely to Attend Harvard?”, a case study by Anthony Millner and Raphael Calel (2012)

Overview: Background; Michael Sandel; Problems with his claim (base rate fallacy, Bayes’ theorem, lack of data); Conclusion

Background: According to Michael Sandel, 75 to 85 percent of Harvard students are first-borns. He showed this by asking the students in his class to raise their hands if they were first-born. From this, he suggested that birth order has a significant effect on how much effort a child puts into their studies.

Michael Sandel: Who is he? He is an American political philosopher and a professor of political philosophy at Harvard University. His course “Justice” was the first Harvard course to be made freely available online and on television. It has been viewed by tens of millions of people around the world, including in China, where Sandel was named the “most influential foreign figure of the year” (China Newsweek).

Problems with his claim: base rate fallacy, lack of data, sampling bias. Base rate fallacy: also called base rate neglect or base rate bias. It is a formal fallacy in which, when presented with both general base rate information and specific information, the mind tends to ignore the general information and focus on the specific information.

Base rate fallacy: What is Sandel really doing? He is finding the probability that you are a first-born given that you are at Harvard, when what he really wants is the probability that you are at Harvard given that you are nth-born. That is, he measures P(1st-born | Harvard) instead of P(Harvard | nth-born).
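
To see why the two conditional probabilities differ, here is a minimal Python sketch. Only the 80 percent figure (the middle of the 75 to 85 percent range) comes from the slides. The overall share of people who attend Harvard (`P_HARVARD`) and the two candidate base rates of first-borns are hypothetical numbers chosen purely for illustration, and the function name is ours.

```python
def p_harvard_given_first_born(p_first_given_harvard, p_harvard, p_first):
    """Bayes' theorem:
    P(Harvard | 1st-born) = P(1st-born | Harvard) * P(Harvard) / P(1st-born)
    """
    return p_first_given_harvard * p_harvard / p_first


P_FIRST_GIVEN_HARVARD = 0.80  # from the show of hands (middle of 75-85 percent)
P_HARVARD = 0.001             # HYPOTHETICAL share of people who attend Harvard

# Two HYPOTHETICAL base rates for the share of first-borns in the population.
for p_first in (0.40, 0.80):
    p = p_harvard_given_first_born(P_FIRST_GIVEN_HARVARD, P_HARVARD, p_first)
    # Ratio > 1 means first-borns are over-represented relative to everyone.
    print(f"P(1st-born) = {p_first:.2f}: "
          f"P(Harvard | 1st-born) is {p / P_HARVARD:.2f} times P(Harvard)")
```

Under these assumed numbers, the same show of hands is consistent with either a large birth-order effect or none at all. Which one it is depends entirely on the base rate P(1st-born), and the show of hands does not provide it.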

Base rate fallacy: another example (drunk drivers). A group of police officers have breathalyzers that falsely indicate drunkenness in 5% of the cases in which the driver is sober. However, the breathalyzers never fail to detect a truly drunk person. One in a thousand drivers is driving drunk. Suppose the police officers stop a driver at random and force the driver to take a breathalyzer test. It indicates that the driver is drunk. We assume you don't know anything else about him or her. How high is the probability that he or she really is drunk? Many would answer as high as 0.95, but the correct probability is about 0.02.

Base rate fallacy: another example (drunk drivers), continued. To find the probability, we can use Bayes’ theorem. An easier explanation uses counts: out of 1,000 drivers, 1 is drunk and gets a true positive result from the breathalyzer. The other 999 drivers are not drunk, and 5 percent of them, or 49.95 drivers, get false positive results. Hence, the probability that a driver among the 1 + 49.95 = 50.95 positive results is really drunk is 1 / 50.95 ≈ 0.02.
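
The counting argument can be checked with a short Python sketch. The rates are the ones stated in the example (1 in 1,000 drivers drunk, 5% false positives, no missed drunk drivers); the function and variable names are our own.

```python
def natural_frequency_posterior(n_drivers, drunk_rate, false_positive_rate):
    """Count true and false positives among n_drivers.

    Assumes, as in the example, that every drunk driver tests positive.
    """
    true_positives = n_drivers * drunk_rate        # drunk, all detected
    sober = n_drivers - true_positives
    false_positives = sober * false_positive_rate  # sober but flagged
    total_positives = true_positives + false_positives
    return true_positives, false_positives, true_positives / total_positives


true_pos, false_pos, posterior = natural_frequency_posterior(1000, 0.001, 0.05)
print(f"true positives:  {true_pos:.2f}")
print(f"false positives: {false_pos:.2f}")
print(f"P(drunk | positive) = {posterior:.4f}")
```

Almost all positive results come from the large group of sober drivers, which is why the answer is close to 0.02 rather than 0.95.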

Bayes’ Theorem: To find either the probability of being at Harvard given that you are nth-born, or the probability that a driver is drunk given that the breathalyzer indicates he or she is drunk, we need Bayes’ theorem. What is it? It describes the probability of an event based on prior knowledge of conditions that might be related to the event.

Bayes’ Theorem: using Bayes’ theorem to find the probability that a driver is drunk when the breathalyzer shows a positive result.
What we need to find: P(drunk | positive)
Given:
P(drunk) = 0.001
P(sober) = 0.999
P(positive | drunk) = 1.00
P(positive | sober) = 0.05
Therefore:
P(positive) = (1.00 x 0.001) + (0.05 x 0.999) = 0.05095

Bayes’ Theorem: using Bayes’ theorem to find the probability that a driver is drunk when the breathalyzer shows a positive result (continued).
What we need to find: P(drunk | positive)
Formula: P(drunk | positive) = ( P(positive | drunk) x P(drunk) ) / P(positive) = (1.00 x 0.001) / 0.05095 ≈ 0.019627
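
The same calculation can be written as a small, reusable Python function. This is a sketch for a two-outcome case (drunk or sober); the function name `bayes_posterior` is ours, and the second prior in the loop (0.1) is a hypothetical value added only to show how strongly the answer depends on the base rate.

```python
def bayes_posterior(prior, p_pos_given_true, p_pos_given_false):
    """Return P(condition | positive) for a binary condition.

    prior              -- P(condition), the base rate
    p_pos_given_true   -- P(positive | condition)
    p_pos_given_false  -- P(positive | no condition), the false positive rate
    """
    # Law of total probability: P(positive)
    p_pos = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
    return p_pos_given_true * prior / p_pos


# Values from the breathalyzer example, then a HYPOTHETICAL higher base rate.
for prior in (0.001, 0.1):
    posterior = bayes_posterior(prior, p_pos_given_true=1.00, p_pos_given_false=0.05)
    print(f"P(drunk) = {prior}: P(drunk | positive) = {posterior:.6f}")
```

Keeping the base rate as an explicit argument makes it hard to forget, which is the mistake the base rate fallacy describes.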

Lack of Data: There is a limitless number of intermediate factors, such as the fertility rate, that would have to be taken into account before concluding that birth order does in fact affect whether a child is smart enough to enter Harvard University. From the information that Sandel gave us, it is not possible to determine that birth order is the only variable that affects the probability of getting into Harvard.

Sampling Bias: What is it? It is a bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others. It results in a biased sample, a non-random sample of a population (or non-human factors) in which not all individuals, or instances, were equally likely to have been selected. If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling. How did Sandel unknowingly commit this? By restricting the population of study to his class only, he is excluding the other students of Harvard University, and therefore his sample may not be representative of Harvard’s population.

Sampling Bias: real-life examples of how it may affect us. In 1936, in the early days of opinion polling, the American magazine The Literary Digest collected over 2 million postal surveys and predicted that the Republican candidate in the US presidential election, Alf Landon, would beat Franklin Roosevelt by a large margin. However, the result was the exact opposite. The sample collected from readers of the magazine over-represented the rich, who as a group were more likely to vote for the Republican candidate. On the night of the 1948 presidential election, the Chicago Tribune printed the wrong headline because its editor trusted the results of a phone survey. Telephones were not yet widely used at the time, so the survey was not representative of the general population.
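
The mechanism behind the 1936 failure can be illustrated with a small Python simulation. This is a sketch only: the electorate, the share of wealthy voters, their voting preferences, and the factor by which the wealthy are over-represented are all hypothetical numbers, not historical data.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def make_voter():
    """Return (is_wealthy, votes_republican) for one HYPOTHETICAL voter."""
    wealthy = rng.random() < 0.20           # assumed: 20% of voters are wealthy
    p_republican = 0.70 if wealthy else 0.35  # assumed voting preferences
    return wealthy, rng.random() < p_republican


def republican_share(voters):
    """Fraction of the given voters who vote Republican."""
    return sum(votes_rep for _, votes_rep in voters) / len(voters)


population = [make_voter() for _ in range(100_000)]
true_share = republican_share(population)

# Small simple random sample: every voter is equally likely to be picked.
random_sample = rng.sample(population, 2_000)

# Large biased sample: wealthy voters are 8 times as likely to be picked
# (drawn with replacement, which is good enough for an illustration).
weights = [8 if wealthy else 1 for wealthy, _ in population]
biased_sample = rng.choices(population, weights=weights, k=20_000)

random_estimate = republican_share(random_sample)
biased_estimate = republican_share(biased_sample)

print(f"true Republican share:      {true_share:.3f}")
print(f"random sample (n = 2,000):  {random_estimate:.3f}")
print(f"biased sample (n = 20,000): {biased_estimate:.3f}")
```

In this setup the biased sample is ten times larger than the random one, yet it points to the wrong winner. A larger sample does not repair a sampling frame that over-represents one group.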

Sampling Bias: How can we reduce sampling bias? Avoid judgement sampling and convenience sampling. Make sure that the target population is defined properly and that the sample frame matches it as closely as possible.

Conclusion: Be careful not to neglect the base rate. Gather enough reliable data to substantiate your claims. Make sure to reduce sampling bias. THANK YOU!