STT 421 Day 7: September 28, 2015

Presentation transcript:

STT 421 Day 7: September 28, 2015
September 28, 2015 STT 421: Vince Melfi

Sample Surveys
We want to learn something about a (often large) group called the population.
We can collect data only on a subset of the population, called the sample.
We’d like the sample to be “representative” of the population.
If a sampling method over- or underrepresents an important characteristic, it is called biased.

Literary Digest Poll (1936)
Goal: Predict the outcome of the 1936 presidential election between Roosevelt and Landon.
Literary Digest magazine mailed out 10 million surveys and got 2.4 million responses.
Of those who responded, 57% preferred Landon to Roosevelt.
On the basis of this (large!) sample, Literary Digest predicted a landslide victory for Landon.

Literary Digest Poll (1936)
George Gallup, a pollster, also tried to predict the outcome of the election.
He had a smaller sample size of 50,000.
But he selected his sample via “quota sampling,” in which he tried to make the proportions in his sample match those in the population for important groups.
For example, the sample should have the same proportion of middle-class urban women, lower-class rural men, etc., as the population.

Literary Digest Poll (1936)
Roosevelt won the election by a landslide.
Gallup’s poll predicted this.
Literary Digest went out of business shortly after 1936.
Gallup polls are still conducted today. (But they don’t use “quota sampling” any more. There are better methods that we’ll learn about.)

Literary Digest Poll (1936)
What went wrong for Literary Digest?
They found their 10 million names in three places:
    Their own readers (who tended to be affluent)
    Telephone registries (in 1936, at the height of the Depression, many poorer people had no phone)
    Automobile registries (in 1936, at the height of the Depression, many poorer people had no car)
So the sample wasn’t representative of the population. In fact, it overrepresented the wealthy.
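To see why a huge sample from a biased list can do worse than a small random one, here is a minimal simulation sketch. Every number in it is a made-up assumption for illustration (the population size, the share of wealthy voters, the support rates, and the chances of appearing on the mailing list); none comes from the 1936 data.

```python
import random

def simulate_poll(seed=1936):
    """Compare a large sample from a biased list with a small simple random sample."""
    rng = random.Random(seed)

    # Hypothetical population: 100,000 voters, about 30% of them wealthy.
    # Assumed support for Roosevelt: 35% among the wealthy, 70% among everyone else.
    population = []
    for _ in range(100_000):
        wealthy = rng.random() < 0.30
        votes_roosevelt = rng.random() < (0.35 if wealthy else 0.70)
        population.append((wealthy, votes_roosevelt))

    # The parameter: true proportion voting for Roosevelt.
    true_p = sum(vote for _, vote in population) / len(population)

    # Biased list: wealthy voters are far more likely to appear on it
    # (assumed 90% vs. 10%), mimicking phone and car registries.
    biased_sample = [person for person in population
                     if rng.random() < (0.90 if person[0] else 0.10)]
    biased_hat = sum(vote for _, vote in biased_sample) / len(biased_sample)

    # Simple random sample of only 1,000 voters.
    srs = rng.sample(population, 1000)
    srs_hat = sum(vote for _, vote in srs) / len(srs)

    return true_p, biased_hat, srs_hat, len(biased_sample)

true_p, biased_hat, srs_hat, biased_size = simulate_poll()
print(f"true p = {true_p:.3f}")
print(f"biased sample (n = {biased_size}): {biased_hat:.3f}")
print(f"simple random sample (n = 1000): {srs_hat:.3f}")
```

Under these assumptions the biased sample is many times larger than the random one, yet its estimate lands far from the true proportion, while the small random sample lands close to it.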

Randomization
How do we avoid bias even if we don’t know much about the population?
The key idea is randomization. By choosing people “at random” we guard against potential biases.
There are many sampling methods that employ randomization. One of the most basic is “simple random sampling.”

Population and Sample
The population is the group we’re interested in.
Numerical characteristics of the population are called parameters.
The sample is the group we’re able to collect data on.
Numerical characteristics of the sample are called statistics.

Population and Sample
Example: 1936 election prediction.
The population is all those who will vote.
The parameter of interest is p, the proportion of those who vote who will vote for Roosevelt.
The statistic we’d calculate from the sample is the proportion in the sample who say they’ll vote for Roosevelt, denoted p̂ (“p-hat”).

Simple Random Sample
A simple random sample of size n is drawn in such a way that every sample of size n from the population has the same chance of being selected.
Example: The population is A, B, C, D, and n = 2.
{A,B}, {A,C}, {A,D}, {B,C}, {B,D}, {C,D} are all the samples of size 2. All should have the same chance of being selected.
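The A, B, C, D example can be checked with a short sketch. It lists every possible sample of size 2 and then draws one simple random sample. The helper name draw_srs is hypothetical, chosen only for this illustration.

```python
import random
from itertools import combinations

population = ["A", "B", "C", "D"]
n = 2

# Every possible sample of size n (order does not matter).
all_samples = list(combinations(population, n))
print(all_samples)
print(len(all_samples), "possible samples")

def draw_srs(pop, size, rng=random):
    """Draw one simple random sample; each subset of `size` items is equally likely."""
    return tuple(sorted(rng.sample(pop, size)))

print(draw_srs(population, n))
```

Repeating the draw many times and counting how often each pair appears should give roughly equal counts for all six samples, which is what “same chance of being selected” means.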

“Good” samples aren’t so easy to obtain
Example: In an election poll, how do you determine who will actually vote, to avoid having people in your sample who are registered voters but won’t vote?
Even ignoring this, how do you deal with people who refuse to answer, who lie, who will change their vote by the time of the election, etc.?

The Salk polio vaccine study
Polio was a greatly feared disease in the first half of the 20th century.
Franklin Roosevelt contracted polio and was partially paralyzed.
Polio is caused by a virus.
Not all cases of polio cause severe symptoms: some mild cases are hard to distinguish from other illnesses.
February 13, 2013 STT 200: Vince Melfi

The Salk polio vaccine study
Two references (class material largely drawn from the second):
    “Polio: An American Story” by David Oshinsky
    “The Biggest Public Health Experiment Ever: The 1954 Field Trial of the Salk Poliomyelitis Vaccine” by Paul Meier

The early 1950s
In the early 1950s there were two vaccines under development that had substantial promise:
    A “live virus” vaccine developed by Albert Sabin
    A “killed virus” vaccine developed by Jonas Salk
Based on preliminary data, it was decided to do a large-scale study of the effectiveness of the Salk vaccine.
The vaccine was NOT expected to be 100% effective.

A Simple Study
Safety of the vaccine was not a worry.
A simple plan: Make the vaccine available as widely as possible; let subjects (or their parents) volunteer to get the vaccine.
See whether and how much the rate of polio drops.
This is an observational study.

Which of these are potential problems with the simple idea of distributing the vaccine widely and comparing the rate of polio with that in the past?
(a) If the rate drops, we don’t know whether the drop is due to the vaccine or other factors.
(b) Those who volunteer may have different health characteristics than those who do not.
(c) Since polio is hard to diagnose, doctors who know a patient is vaccinated might be less likely to diagnose polio.

Adding a control group
A control group (people who would not have the opportunity to receive the vaccine) can help with some of the issues.
A suggestion:
    Offer (but do not require) vaccination for all second graders (the treatment group).
    Don’t offer vaccination to others.
    First and third graders form the control group.

Which of these are potential problems with the modified study which includes a control group?
(a) Those who volunteer may have different health characteristics than those who do not.
(b) Since polio is hard to diagnose, doctors who know a patient is vaccinated might be less likely to diagnose polio.
(c) There may be differences between the treatment and control group that affect the results.

Experiment vs. observational study
Adding a control group moves us closer to a designed experiment.

An experimental study
Assign children at random to one of two groups:
    “Treatment” group: receives the polio vaccine
    “Placebo control” group: receives an injection of an innocuous serum that does not affect polio
Children, parents, and physicians are not allowed to know which children are in the control group and which are in the treatment group (a double-blind study).
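Random assignment can be sketched in a few lines. The function name assign_groups and the ID numbers are hypothetical, and this only illustrates the randomization step; it is not a description of how the 1954 trial carried out its assignment or its blinding.

```python
import random

def assign_groups(child_ids, seed=None):
    """Randomly split participants into a treatment group and a placebo control group."""
    rng = random.Random(seed)
    shuffled = list(child_ids)   # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)        # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

# Hypothetical ID numbers standing in for enrolled children.
groups = assign_groups(range(1, 21), seed=7)
print(groups["treatment"])
print(groups["placebo"])
```

Because chance alone decides who lands in which group, the two groups should be similar on average in health, family income, and anything else, measured or not. In a double-blind study the resulting labels would also be hidden from children, parents, and physicians.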

Sample Size
Polio was relatively rare, about 50 cases per 100,000.
The vaccine was not expected to be 100% effective without further refinement.
Clearly a large sample size would be needed to detect effectiveness.

If the incidence of polio is 50 per 100,000, the vaccine is 50% effective, and there are 40,000 children in the treatment group and 40,000 in the control group, how many children in the treatment group would be expected to contract polio?
Choices: 20, 40, 50, 10
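The arithmetic behind this question can be sketched as follows, assuming that “50% effective” means the vaccine cuts the incidence in half. The function name expected_cases is hypothetical.

```python
def expected_cases(group_size, incidence_per_100k, effectiveness=0.0):
    """Expected number of cases in a group, given the baseline incidence
    and the fraction of cases the vaccine is assumed to prevent."""
    return group_size * incidence_per_100k * (1 - effectiveness) / 100_000

control = expected_cases(40_000, 50)                        # no vaccine
treatment = expected_cases(40_000, 50, effectiveness=0.5)   # vaccine assumed to halve the rate

print(control)    # 20.0
print(treatment)  # 10.0
```

With only about 20 expected cases in the control group and 10 in the treatment group, chance variation could easily blur the comparison, which is why a much larger study was needed.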

Results of first study
Group                            Size      # Polio Cases   Rate (per 100,000)
Vaccine (2nd grade)              221,988   56              25
No vaccine (1st and 3rd grade)   725,173   391             54
Refused vaccine (2nd grade)      123,605                   44

Results of second study
Group        Size      # Polio Cases   Rate (per 100,000)
Vaccinated   200,745   57              28
Placebo      201,229   142             71
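The rates in the second study can be recomputed from the group sizes and case counts. The sketch below also computes the relative reduction in the polio rate, which is only a rough descriptive summary added here for illustration and not a figure from the slides.

```python
def rate_per_100k(cases, size):
    """Cases per 100,000 people in a group."""
    return cases / size * 100_000

vaccinated_rate = rate_per_100k(57, 200_745)
placebo_rate = rate_per_100k(142, 201_229)

# Fraction by which the rate in the vaccinated group falls below the placebo rate.
relative_reduction = 1 - vaccinated_rate / placebo_rate

print(round(vaccinated_rate))        # 28
print(round(placebo_rate))           # 71
print(round(relative_reduction, 2))  # 0.6
```

The rounded rates match the table, and the rate in the vaccinated group is roughly 60% lower than in the placebo group.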