Published by Marvin Poole
1
STT 421 Day 7: September 28, 2015
STT 421: Vince Melfi
2
Sample Surveys
We want to learn something about a (often large) group called the population.
We can collect data only on a subset of the population, called the sample.
We’d like the sample to be “representative” of the population.
If a sampling method over- or underrepresents an important characteristic, it’s called biased.
3
Literary Digest Poll (1936)
Goal: Predict the outcome of the 1936 presidential election between Roosevelt and Landon.
The Literary Digest magazine mailed out 10 million surveys and got 2.4 million responses.
Of those who responded, 57% preferred Landon to Roosevelt.
On the basis of this (large!) sample, the Literary Digest predicted a landslide victory for Landon.
4
Literary Digest Poll (1936)
George Gallup, a pollster, also tried to predict the outcome of the election.
He had a smaller sample size of 50,000.
But he selected his sample via “quota sampling,” in which he tried to make the proportions of important groups in his sample match those in the population.
For example, the sample should have the same proportion of middle-class urban women, lower-class rural men, etc., as the population.
5
Literary Digest Poll (1936)
Roosevelt won the election by a landslide, as Gallup’s poll had predicted.
The Literary Digest went out of business shortly after 1936.
Gallup polls are still conducted today. (But they don’t use “quota sampling” anymore. There are better methods that we’ll learn about.)
6
Literary Digest Poll (1936)
What went wrong for the Literary Digest?
They found their 10 million names in three places:
  Their own readers (who tended to be affluent)
  Telephone registries (in 1936, at the height of the Depression, many poorer people had no phone)
  Automobile registries (in 1936, at the height of the Depression, many poorer people had no car)
So the sample wasn’t representative of the population. In fact, it overrepresented the wealthy.
7
Randomization
How do we avoid bias even if we don’t know much about the population?
The key idea is randomization: by choosing people “at random,” we guard against potential biases.
There are many sampling methods that employ randomization. One of the most basic is “simple random sampling.”
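To make the idea concrete, here is a toy Python simulation. It is a sketch, not a model of the real 1936 electorate: the population size, the 20% “affluent” share, and the preference rates are all made-up assumptions. It compares an estimate drawn only from the affluent part of the population, loosely mimicking the Literary Digest’s frame, with a simple random sample of the same size.

```python
import random

# Hypothetical population of 100,000 voters (all numbers are assumptions for
# illustration, not 1936 data): the first 20% are "affluent", the rest are not.
random.seed(1936)
population = []
for i in range(100_000):
    affluent = i < 20_000
    # Assumed preference rates: affluent voters favor Landon 60% of the time,
    # everyone else only 30% of the time.
    prefers_landon = random.random() < (0.60 if affluent else 0.30)
    population.append((affluent, prefers_landon))


def proportion_landon(group):
    """Share of the group that prefers Landon."""
    return sum(landon for _, landon in group) / len(group)


true_p = proportion_landon(population)

# Biased method: draw only from the affluent part of the population,
# loosely like a frame built from magazine readers, phone and car owners.
frame = [person for person in population if person[0]]
biased_sample = random.sample(frame, 2_000)

# Simple random sample: every voter is equally likely to be chosen.
srs = random.sample(population, 2_000)

print(f"population proportion for Landon: {true_p:.3f}")
print(f"biased-frame estimate:            {proportion_landon(biased_sample):.3f}")
print(f"simple random sample estimate:    {proportion_landon(srs):.3f}")
```

With these assumed numbers the population proportion is about 0.36. The biased-frame estimate lands near 0.60 however large the sample, while the simple random sample lands close to 0.36.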
8
Population and Sample
The population is the group we’re interested in. Numerical characteristics of the population are called parameters.
The sample is the group we’re able to collect data on. Numerical characteristics of the sample are called statistics.
9
Population and Sample
Example: 1936 election prediction.
The population is all those who will vote.
The parameter of interest is p, the proportion of those who vote who will vote for Roosevelt.
The statistic we’d calculate from the sample is the proportion in the sample who say they’ll vote for Roosevelt, denoted p̂.
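A small Python sketch of the distinction, using a hypothetical population of 10,000 voters in which exactly 62% favor Roosevelt (an invented figure, not the 1936 result). The parameter p is computed once from the whole population; the statistic p̂ is recomputed from each simple random sample and changes from sample to sample.

```python
import random

# Hypothetical population: True means "will vote for Roosevelt".
# The 62% share is an assumption chosen for illustration.
random.seed(421)
population = [i < 6_200 for i in range(10_000)]

# Parameter: computed from the whole population; a single fixed number.
p = sum(population) / len(population)


def p_hat(sample):
    """Statistic: proportion of the sample who favor Roosevelt."""
    return sum(sample) / len(sample)


# Each simple random sample of size 500 gives its own value of p-hat.
estimates = [p_hat(random.sample(population, 500)) for _ in range(5)]

print(f"p = {p:.3f}")
print("p-hat from five samples:", [round(e, 3) for e in estimates])
```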
10
Simple Random Sample
A simple random sample of size n is drawn in such a way that every sample of size n from the population has the same chance of being selected.
Example: The population is A, B, C, D, and n = 2.
{A, B}, {A, C}, {A, D}, {B, C}, {B, D}, {C, D} are all the samples of size 2. All should have the same chance (1/6) of being selected.
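The example can be checked with Python’s standard library. This is a sketch, and the 60,000-draw frequency check is only an illustration: `itertools.combinations` lists the six possible samples, and `random.sample`, which draws without replacement, picks each of them with probability close to 1/6.

```python
import itertools
import random
from collections import Counter

population = ["A", "B", "C", "D"]
n = 2

# All possible samples of size 2 (order does not matter): six of them.
all_samples = list(itertools.combinations(population, n))
print(all_samples)

# random.sample draws n distinct items with every subset equally likely,
# so each draw is a simple random sample. Sorting makes {B, A} match ("A", "B").
random.seed(0)
trials = 60_000
draws = Counter(tuple(sorted(random.sample(population, n))) for _ in range(trials))
for s in all_samples:
    print(s, round(draws[s] / trials, 3))  # each should be close to 1/6
```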
11
“Good” samples aren’t so easy to obtain
Example: In an election poll, how do you determine who will actually vote, so that your sample doesn’t include people who are registered voters but won’t vote?
Even ignoring this, how do you deal with people who refuse to answer, who lie, who will change their vote by the time of the election, etc.?
12
The Salk polio vaccine study
Polio was a much-feared disease in the first half of the 20th century.
Franklin Roosevelt contracted polio and was partially paralyzed.
Polio is caused by a virus.
Not all cases of polio cause severe symptoms: some mild cases are hard to distinguish from other illnesses.
February 13, 2013 STT 200: Vince Melfi
13
The Salk polio vaccine study
Two references (class material largely drawn from the second):
  “Polio: An American Story” by David Oshinsky
  “The Biggest Public Health Experiment Ever: The 1954 Field Trial of the Salk Poliomyelitis Vaccine” by Paul Meier
14
The early 1950s
In the early 1950s there were two promising vaccines under development:
  A “live virus” vaccine developed by Albert Sabin
  A “killed virus” vaccine developed by Jonas Salk
Based on preliminary data, it was decided to do a large-scale study of the effectiveness of the Salk vaccine.
The vaccine was NOT expected to be 100% effective.
15
A Simple Study
Safety of the vaccine was not a worry.
A simple plan: Make the vaccine available as widely as possible, and let subjects (or their parents) volunteer to get it. See whether, and by how much, the rate of polio drops.
This is an observational study.
16
Which of these are potential problems with the simple idea of distributing the vaccine widely and comparing the rate of polio with that in the past?
(a) If the rate drops, we don’t know whether the drop is due to the vaccine or to other factors.
(b) Those who volunteer may have different health characteristics than those who do not.
(c) Since polio is hard to diagnose, doctors who know a patient is vaccinated might be less likely to diagnose polio.
17
18
Adding a control group
A control group (people who would not have the opportunity to receive the vaccine) can help with some of these issues.
A suggestion:
  Offer (but do not require) vaccination to all second graders (the treatment group).
  Don’t offer vaccination to others: first and third graders form the control group.
19
Which of these are potential problems with the modified study, which includes a control group?
(a) Those who volunteer may have different health characteristics than those who do not.
(b) Since polio is hard to diagnose, doctors who know a patient is vaccinated might be less likely to diagnose polio.
(c) There may be differences between the treatment and control groups that affect the results.
20
Experiment vs observational study
Adding a control group moves us closer to a designed experiment.
21
An experimental study
Assign children at random to one of two groups:
  “Treatment” group: receives the polio vaccine
  “Placebo control” group: receives an injection of an innocuous serum that does not affect polio
Children, parents, and physicians are not allowed to know which children are in the control group and which are in the treatment group (a double-blind study).
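One way to sketch random assignment plus blinding in Python, assuming a hypothetical roster of 1,000 child IDs and made-up six-digit vial codes (this is an illustration, not a description of the 1954 trial’s actual procedure): shuffle the roster, split it in half, and give every child a code whose meaning is stored only in a separate key.

```python
import random

# Hypothetical roster of child IDs; a real trial would use actual enrollees.
children = [f"child-{i:04d}" for i in range(1, 1001)]

random.seed(1954)
shuffled = children[:]      # copy so the roster itself is untouched
random.shuffle(shuffled)    # uniform random order, so every split is equally likely
half = len(shuffled) // 2
treatment = set(shuffled[:half])
placebo = set(shuffled[half:])

# Blinding (sketch): each child gets a distinct vial code that does not reveal
# the group. Only the separate key, kept away from children, parents and
# physicians, links codes to "vaccine" or "placebo".
codes = random.sample(range(100_000, 1_000_000), len(children))
vial_code = dict(zip(children, codes))
key = {vial_code[c]: ("vaccine" if c in treatment else "placebo") for c in children}

print(len(treatment), "vaccine,", len(placebo), "placebo")
```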
22
Sample Size
Polio was relatively rare: about 50 cases per 100,000.
The vaccine was not expected to be 100% effective without further refinement.
Clearly, a large sample size would be needed to detect effectiveness.
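To see why the sample must be large, here is a rough Python simulation. It assumes the number of polio cases in each group follows a Poisson distribution (a standard approximation for rare events), uses the slide’s rate of 50 per 100,000 for the unvaccinated group, and assumes a hypothetical vaccine that is 50% effective (25 per 100,000). It reports how often the vaccine group even shows fewer cases than the control group.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method
    (multiply uniforms until the product drops below exp(-lam));
    adequate for the modest means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


RATE_CONTROL = 50 / 100_000   # incidence quoted on the slide
RATE_VACCINE = 25 / 100_000   # assumes a hypothetical 50%-effective vaccine
TRIALS = 2_000


def share_showing_benefit(group_size, rng):
    """Fraction of simulated studies in which the vaccine group has fewer cases."""
    wins = 0
    for _ in range(TRIALS):
        control_cases = poisson(group_size * RATE_CONTROL, rng)
        vaccine_cases = poisson(group_size * RATE_VACCINE, rng)
        wins += vaccine_cases < control_cases
    return wins / TRIALS


rng = random.Random(7)
results = {n: share_showing_benefit(n, rng) for n in (1_000, 10_000, 100_000)}
for n, share in results.items():
    print(f"{n:>7,} children per group: vaccine group had fewer cases in {share:.0%} of studies")
```

With 1,000 children per group, each group expects well under one case, so the vaccine group comes out ahead in only about a third of the simulated studies; with 100,000 per group it comes out ahead almost every time.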
23
If the incidence of polio is 50 per 100,000, the vaccine is 50% effective, and there are 40,000 children in the treatment group and 40,000 in the control group, how many children in the treatment group would be expected to contract polio?
(a) 20
(b) 40
(c) 50
(d) 10
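The arithmetic behind the question, written as a short Python check (the 50% effectiveness comes from the question itself): the control group’s expected count is the group size times the incidence, and a 50%-effective vaccine halves it.

```python
cases_per_100k = 50       # incidence without vaccination
effectiveness = 0.50      # the vaccine prevents half of the cases
group_size = 40_000

# Expected cases without the vaccine: 40,000 * 50 / 100,000.
expected_control = group_size * cases_per_100k / 100_000
# A 50%-effective vaccine leaves half of those cases.
expected_treatment = expected_control * (1 - effectiveness)

print(expected_control, expected_treatment)  # 20.0 10.0
```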
24
Results of first study
Group                            Size      # Polio cases   Rate (per 100,000)
Vaccine (2nd grade)              221,988   56              25
No vaccine (1st and 3rd grade)   725,173   391             54
Refused vaccine (2nd grade)      123,605   (not given)     44
25
Results of second study
Group        Size      # Polio cases   Rate (per 100,000)
Vaccinated   200,745   57              28
Placebo      201,229   142             71
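The rates in both results tables can be recomputed in Python from the group sizes and case counts as cases ÷ size × 100,000, rounded to a whole number. The refused-vaccine group is left out because its case count is not given.

```python
# (group, size, number of polio cases) as listed on the two results slides.
studies = {
    "First study": [
        ("Vaccine (2nd grade)", 221_988, 56),
        ("No vaccine (1st and 3rd grade)", 725_173, 391),
    ],
    "Second study": [
        ("Vaccinated", 200_745, 57),
        ("Placebo", 201_229, 142),
    ],
}


def rate_per_100k(cases: int, size: int) -> int:
    """Cases per 100,000 children, rounded to the nearest whole number."""
    return round(cases / size * 100_000)


for study, groups in studies.items():
    print(study)
    for name, size, cases in groups:
        print(f"  {name:<32} {rate_per_100k(cases, size):>3} per 100,000")
```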