Statistical Study Design:
1. Identify the individuals of interest
2. Specify the variables and the protocols for measurements
3. Decide on a sampling method
4. Collect the data
5. Use appropriate descriptive and inferential techniques
6. Identify possible errors
Data Collection: Census, Sampling, Simulation, Experiment, Observation, Survey
Census - Measurements or observations of an entire population.
Sampling - Measurements or observations of a part of a population.
Simulation - Use a mathematical or physical model to reproduce a dangerous or difficult situation.
Experiment - Apply a treatment to part of a population and observe the resulting change in the variable of study.
- Must be controlled
- “Completely randomized experiment”
- Placebo
- Double blind
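The random assignment behind a completely randomized experiment can be sketched as follows. This is a minimal illustration, not a full experimental protocol: the function name `completely_randomized_assignment`, the subject labels, and the group sizes are all hypothetical, and the sketch assumes a single treatment group and a single control group.

```python
import random

def completely_randomized_assignment(subjects, n_treatment, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)      # seeded only so the split can be reproduced
    shuffled = list(subjects)      # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]

# Hypothetical subjects: 20 volunteers, 10 of whom receive the treatment.
subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = completely_randomized_assignment(subjects, 10, seed=1)
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in each group, systematic differences between the groups become unlikely. In a placebo-controlled study the control group would receive the placebo.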
Observation - Observations and measurements are made in a way that will not change the variable of study.
Survey - Can be used for either a census or a sampling. Can easily be biased...
Bias - When the sample doesn’t really represent the population
- Are questions asked in a neutral way? Example: “Don’t you think the President could be doing a better job?”
- Will the location provide an appropriate cross-section? Example: conducting a survey on US military presence in 3rd world countries outside the gates of an army base.
- Is the surveyor neutral? Example: a uniformed police officer surveying college students on drug use.
Lurking variables
A variable that is not measured in the study, but has some influence over the measured variables.
WW II bombing accuracy studies:
- Higher altitude ↔ lower accuracy
- Different bomber models have different accuracies
- The presence of enemy fighters ↔ higher accuracy
What’s the lurking variable?
Another example: number of firefighters at a fire ↔ damage done
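The firefighter example can be made concrete with a small simulation. Everything in this sketch is an assumption made for illustration: the lurking variable is taken to be the size of the fire, the numeric scales are invented, and by construction the damage depends only on fire size, never on the number of firefighters.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
fires = []
for _ in range(2000):
    size = rng.uniform(1, 10)                         # lurking variable (made-up scale)
    firefighters = round(2 * size + rng.gauss(0, 1))  # bigger fire, more firefighters sent
    damage = 10 * size + rng.gauss(0, 5)              # depends on size only, by construction
    fires.append((size, firefighters, damage))

# Correlation across all fires, ignoring fire size.
overall = pearson([f[1] for f in fires], [f[2] for f in fires])

# Correlation among fires of similar size (size held roughly constant).
similar = [f for f in fires if 5 <= f[0] < 6]
within = pearson([f[1] for f in similar], [f[2] for f in similar])

print(f"all fires:          r = {overall:.2f}")
print(f"similar-size fires: r = {within:.2f}")
```

Across all fires, firefighters and damage move together strongly even though neither affects the other in this model. Among fires of similar size the association is much weaker, which is the signature of a lurking variable.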
Confounded variables
Confounded variables both vary in the study, in a way that makes it difficult to tell which one is causing the effect.
Example: Offer both a low interest rate and a low fee to one group of customers, and a higher interest rate and a higher fee to another. You'll never be able to tell whether customers were more motivated by the difference in interest rate or the difference in fee.
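One way to see the problem is to check whether a design ever changes one variable while holding the other fixed. The sketch below is illustrative only: the helper `can_separate` and the dictionary layout are hypothetical, and the four-group alternative (every combination of rate and fee) is an assumption added here for contrast, not something the example above prescribes.

```python
from itertools import product

def can_separate(groups):
    """True if some pair of groups differs only in rate AND some pair differs only in fee."""
    rate_alone = any(a["rate"] != b["rate"] and a["fee"] == b["fee"]
                     for a in groups for b in groups)
    fee_alone = any(a["fee"] != b["fee"] and a["rate"] == b["rate"]
                    for a in groups for b in groups)
    return rate_alone and fee_alone

# The design from the example: rate and fee always change together.
confounded_design = [
    {"rate": "low", "fee": "low"},
    {"rate": "high", "fee": "high"},
]

# Assumed alternative: one group for every combination of rate and fee.
crossed_design = [{"rate": r, "fee": f}
                  for r, f in product(["low", "high"], repeat=2)]

print("confounded design separable:", can_separate(confounded_design))
print("crossed design separable:   ", can_separate(crossed_design))
```

In the two-group design no comparison isolates either variable, so any difference in customer response could be due to the rate, the fee, or both.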
Sampling Techniques: Simple random, Stratified, Systematic, Cluster, Convenience
Simple random - Every possible sample of the same size has an equal chance of being chosen.
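A simple random sample can be sketched with Python's standard library. `random.sample` draws without replacement, so every subset of the requested size is equally likely. The population of 100 numbered members and the sample size of 10 are made up for illustration.

```python
import random

# Hypothetical population: members numbered 1 to 100.
population = list(range(1, 101))

rng = random.Random(42)                 # seeded only so the draw can be reproduced
simple_random = rng.sample(population, 10)
print(simple_random)
```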
Stratified - The population has multiple segments; each segment is represented proportionally.
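Proportional stratified sampling might look like the sketch below. The function `stratified_sample`, the strata names, and their sizes are hypothetical. The sketch assumes each segment's share of the sample is its share of the population, rounded to the nearest whole number, so with awkward proportions the total may differ slightly from the requested size.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(sample_size * len(members) / total)   # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 100 students in three segments.
strata = {
    "freshmen": [f"fr_{i}" for i in range(50)],
    "sophomores": [f"so_{i}" for i in range(30)],
    "juniors": [f"ju_{i}" for i in range(20)],
}
stratified = stratified_sample(strata, 10, seed=3)
for name, members in stratified.items():
    print(name, len(members), members)
```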
Systematic - Sample every Nth member, starting at some random point.
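Systematic sampling can be sketched with list slicing. The function name `systematic_sample` and the population are hypothetical. The sketch assumes the random starting point is chosen among the first N members, so that every member has the same chance of being selected.

```python
import random

def systematic_sample(population, step, seed=None):
    """Take every `step`-th member, starting at a random position in the first `step`."""
    rng = random.Random(seed)
    start = rng.randrange(step)        # random index from 0 to step - 1
    return population[start::step]

# Hypothetical population: members numbered 1 to 100, sampling every 10th.
population = list(range(1, 101))
systematic = systematic_sample(population, 10, seed=7)
print(systematic)
```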
Cluster - The population is broken into subgroups; survey all members of one or more subgroups.
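A cluster sample might be sketched as below. The function `cluster_sample` and the classroom data are hypothetical, and the sketch assumes the subgroups to survey are themselves picked at random, which the definition above does not spell out.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # choose cluster names, not members
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: five classrooms of four students each.
clusters = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(1, 5)]
    for r in range(1, 6)
}
cluster_result = cluster_sample(clusters, 2, seed=5)
for name, members in cluster_result.items():
    print(name, members)
```

Unlike stratified sampling, which takes some members from every segment, this approach takes all members from only some subgroups.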
Convenience - Sample whatever members are easily at hand. Likely to be biased!