Experimental Design Statistics 2126
Introduction
–Remember, population and sample
–Samples
  –1523 randomly chosen voters
  –6 black-capped chickadees
  –The first 18 people you run into in the mall
Samples
–A sample is considered biased if it differs from the population in some systematic way
–Nobody cares about samples; people tend to care about populations
Problems leading to bias
–Undercoverage
–The 1936 US election poll by the Literary Digest
–Landon in a landslide!
–Umm, except it was FDR in a landslide
–Funny though, Gallup got it right
Sampling frame
–Sort of an operational definition of a population
–A phone book is pretty good
–So a poor frame will lead to a poor sample
Another problem
–Nonresponse
–People who do not participate or do not answer
–More and more people distrust anyone calling them on the phone
–The people who return a survey are different from those who do not, and that leads to bias
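A rough simulation can make the nonresponse point concrete. Everything here is an assumption made up for illustration: the population of 10,000 opinion scores, and the response model in which people with stronger opinions are more likely to answer. Under that model the respondents' mean should come out higher than the population mean, even though nobody was left off the list.

```python
import random
from statistics import mean

rng = random.Random(2126)  # fixed seed so the sketch is repeatable

# Hypothetical population: each person holds an opinion score from 0 to 10.
population = [rng.uniform(0, 10) for _ in range(10_000)]

# Assumed response model: the higher the score, the more likely the reply.
respondents = [x for x in population if rng.random() < x / 10]

population_mean = mean(population)
respondent_mean = mean(respondents)

print(f"population mean: {population_mean:.2f}")
print(f"respondent mean: {respondent_mean:.2f}")
print(f"response rate:   {len(respondents) / len(population):.0%}")
```

Collecting more surveys under the same response model would not help: the gap comes from who answers, not from how many answer.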
Again...
–Self-selection
–Sorta goes along with nonresponse
–Accidental or voluntary response sample
–Only those who show the initiative participate
–‘In our admittedly unscientific poll…’
–Given way too much credence in the media
Bias
–Even with a good sample, bias can come into play
–Leading questions
–Order of questions
–Experimenter effects
So how do you get an unbiased sample?
–You want a sample that is representative
–Measure everything? You cannot know everything
–Random sampling works!
–BTW, people don’t understand randomness
Simple random sample
–Using a random number table
–Or drawing names from a hat
–But don’t pick them yourself, thinking you can make something random
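The software version of drawing names from a hat might look like the sketch below. The frame of 1000 voter labels and the function name `simple_random_sample` are invented for illustration; `random.sample` draws without replacement, so nobody is picked twice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(list(frame), n)

# Hypothetical sampling frame: a list of 1000 voter labels.
voters = [f"voter_{i}" for i in range(1, 1001)]

sample = simple_random_sample(voters, 18, seed=2126)
print(sample)
```

The point of handing the choice to a random number generator is the same as the hat: the person running the study has no say in who ends up in the sample.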
Stratified random sample
–When you have a heterogeneous population and you want to be sure to get representation from various subgroups
–So you divide the population into homogeneous strata
–Sample from each stratum
–Either proportionally or equally
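A minimal sketch of both allocation options, assuming a made-up population of 1000 students split 600/300/100 across three years. The function name `stratified_sample` and its parameters are hypothetical. Note that rounding in the proportional case can make the total differ slightly from the requested size when the strata do not divide evenly.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, proportional=True, seed=None):
    """Split units into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    units = list(units)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for name, members in strata.items():
        if proportional:
            # Stratum share of the sample matches its share of the population.
            k = round(n * len(members) / len(units))
        else:
            # Same number from every stratum, regardless of stratum size.
            k = n // len(strata)
        k = min(k, len(members))  # cannot draw more than the stratum holds
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: (student id, year) pairs.
years = ["first"] * 600 + ["second"] * 300 + ["third"] * 100
students = [(f"s{i}", year) for i, year in enumerate(years)]

def by_year(student):
    return student[1]

proportional = stratified_sample(students, by_year, 50, proportional=True, seed=1)
equal = stratified_sample(students, by_year, 48, proportional=False, seed=1)

print({year: len(drawn) for year, drawn in proportional.items()})
print({year: len(drawn) for year, drawn in equal.items()})
```

With these numbers the proportional draw takes 30, 15, and 5 students from the three years, while the equal draw takes 16 from each. Equal allocation gives small strata more say than their population share, which is worth remembering when the results are combined.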
Multistage random sample
–Sort of going down level by level
–Used a lot in sociological research
–Keep sampling, and going down, until you get to individuals
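Going down level by level could be sketched like this. The nested districts, schools, and students are invented, as is the function name `multistage_sample`; the only idea taken from the slide is that you sample at one level, then sample again inside whatever you picked, until you reach individuals.

```python
import random

def multistage_sample(population, counts, seed=None):
    """Sample counts[0] units at the top level, counts[1] within each of
    those, and so on. Inner levels are dicts; the last level is a list."""
    rng = random.Random(seed)
    level = [population]
    for k in counts:
        next_level = []
        for node in level:
            children = list(node.values()) if isinstance(node, dict) else list(node)
            next_level.extend(rng.sample(children, min(k, len(children))))
        level = next_level
    return level

# Hypothetical population: 4 districts, 3 schools each, 20 students each.
population = {
    f"district_{d}": {
        f"d{d}_school_{s}": [f"d{d}_s{s}_student_{i}" for i in range(20)]
        for s in range(3)
    }
    for d in range(4)
}

# 2 districts, then 2 schools in each, then 3 students in each school.
chosen = multistage_sample(population, counts=(2, 2, 3), seed=2126)
print(chosen)
```

Only the sampled districts and schools ever need a full list of their members, which is why the approach is practical for large populations.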
Cluster sampling
–A sampling frame of individuals is nonexistent
–Yet a list of clusters of units would be easy to get
–So sample randomly from the clusters
–Then give the test or whatever to every member of each sampled cluster
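As a sketch, assume the clusters are classrooms and all we have is a list of rooms. The room names, class size, and the function name `cluster_sample` are made up. The random draw happens only at the cluster level; everyone inside a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: 8 classrooms of 25 students each.
classrooms = {
    f"room_{r}": [f"room_{r}_student_{i}" for i in range(1, 26)]
    for r in "ABCDEFGH"
}

tested = cluster_sample(classrooms, 3, seed=7)
for room, members in tested.items():
    print(room, len(members))
```

Compare this with the stratified case: there you sample a few units from every subgroup, here you take every unit from a few subgroups.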
Making causal claims
–For a causal relationship you need three things
  –1) Correlation
  –2) Temporal precedence
  –3) Elimination of other causes
–To do number 3 you want to manipulate one variable and hold everything else constant
Example
–Does money make people happy?
–Correlation between money and happiness
–But which way does it go?
–Give groups of people different amounts of money, say 5 vs 10 dollars
–Now measure happiness
Money money money
–Have we eliminated alternative explanations?
–Well, maybe the groups were different at the outset
–How do we know?
–Could measure, but you would have to measure everything
What to do, what to do…
–Trust randomness
–Randomization
–Random assignment makes the groups alike on average, so our manipulation is the most plausible reason any difference shows up
–Internal vs. external validity
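Random assignment for the money example might be sketched as follows. The 100 participants and their baseline happiness scores are simulated, and `randomly_assign` is a hypothetical helper. In a real study the baseline would usually be unmeasured; it is generated here only to show that shuffling tends to spread it evenly across groups.

```python
import random
from statistics import mean

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical participants with a simulated pre-existing happiness level.
rng = random.Random(42)
participants = [
    {"id": i, "baseline_happiness": rng.gauss(5, 2)} for i in range(100)
]

groups = randomly_assign(participants, ["5 dollars", "10 dollars"], seed=2126)
for condition, members in groups.items():
    baseline = mean(p["baseline_happiness"] for p in members)
    print(condition, len(members), round(baseline, 2))
```

The group baselines will not match exactly in any single shuffle, only on average over many shuffles. That leftover chance difference is what statistical tests are meant to account for.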
Important terms
–Independent variable
  –We control this
–Dependent variable
  –We measure this