Statistics
The science of collecting, analyzing, and interpreting data.

Planning a Study Using the Statistical Problem-Solving Process:
-Ask a question of interest
-Collect some data
-Analyze and describe the data
-Make a conclusion, answering the question of interest
2 Types of Studies

Observational Study
-Record data observed or surveyed
-No treatments imposed
-Used to describe a group or situation

Experimental Study
-Impose treatments on subjects
-Record results and compare groups
-Used to see if the treatments cause a change in the response
Measuring Data from Study Subjects or Experimental Units

Various Variables
-Explanatory (independent, x) variable: the treatment in an experiment or the group label in an observational study (may not exist in observational studies)
-Response (dependent, y) variable: the result measured at the end of every experimental and observational study
-Confounding variable: a variable that might exist in a study and that influences the response, but whose effect can’t be separated from that of the explanatory variable
Example of confounding
A study reports that a group of children who had certain vaccinations were more likely to develop autism than a group of children who did not receive those same vaccinations. Does this mean that vaccinations cause autism?
-Explanatory: whether or not they were vaccinated
-Response: whether or not they developed autism
-Possible confounding: children in the vaccination group could also have been given some new diet or supplement that children in the non-vaccination group were not given
-Effect of confounding: the vaccination group’s higher rate of autism may be tied to the diet or supplement rather than to vaccination
Census
The systematic collection of data on every single subject in the population. When the population is large, a census is time consuming and expensive.
Video on census/American Community Survey use at Target: http://www.census.gov/multimedia/www/videos/stats_in_action.php?intcmp=sldr4
Difference between ACS and Current Population Survey: http://www.census.gov/people/laborforce/publications/ACS-CPS_Comparison_Report.pdf
Observational Studies
Subjects are randomly selected and asked questions or observed in a particular setting. Subjects are not influenced in how they respond.
Good Survey Questions
-Avoid unnecessary complexity in the question
-Avoid misleading questions
-Randomize the ordering of questions
-Ensure confidentiality
-Avoid influencing the subject by tone, appearance, or suggestion
Video: http://www.learner.org/vod/vod_window.html?pid=152 (Video 17, start at 4:46, 2.5 min)
Sources of bias in surveys
If a selection process consistently obtains values that are too high or too low, then bias exists. Some group may be under- (or over-) represented.
-Response bias: the response is influenced in some way
-Non-response bias: a group is left out because its members feel uncomfortable, are too busy, etc.
-Selection bias: the sample is not randomly selected from the entire population of interest
Sampling Vocabulary
-Population of interest: the set of people or things you wish to know something about
-Sampling frame: a list of all subjects from which the sample is taken
-Sample: a portion of the population that is selected to represent the population of interest
-Random sampling: a way of getting a sample that reduces selection bias

Discussion questions:
-What is the difference between the sampling frame and the population of interest?
-When is the sampling frame not the same as the population of interest?
-How could we ensure a sample is randomly selected?
[Diagram: Population -> Random Selection -> Sample]
Sampling Methods
-Simple Random Sample (SRS)
-Stratified Random Sampling
-Cluster Sampling
-Systematic Sampling
-Multi-Stage Sampling
-Random Digit Dialing
-Self-Selected Sample
-Convenience Sample
-Judgment Sample
-“Quickie Polls”
SRS and stratified sampling methods are tested on the AP exam.
Simple Random Sampling
Every unit in the entire population has the same chance of belonging to the sample, and every possible grouping of the specified size has the same chance of being selected.
Like drawing names out of a hat.
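The "names out of a hat" idea can be sketched in Python. This is a minimal illustration, not a prescribed method: the population of 30 numbered students, the sample size of 5, and the helper name `simple_random_sample` are all assumptions made for the example. It relies on the standard library's `random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every group of size n is equally likely."""
    rng = random.Random(seed)  # seed is optional, only for reproducibility
    return rng.sample(list(population), n)

# Hypothetical population: 30 students labeled 1..30 (the slips in the hat)
students = list(range(1, 31))
sample = simple_random_sample(students, 5, seed=1)
print(sample)
```

Because the draw is without replacement, no student can appear in the sample twice.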
Stratified Sample vs. Cluster Sample

Stratified Sample: “some from all”
First divide the population into groups (strata), then take a Simple Random Sample from each stratum (one or more slips from each hat).

Cluster Sample: “all from some”
First divide the population into groups (clusters), then randomly select some clusters and sample everyone in those clusters (all slips from one or two hats).
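The contrast between "some from all" and "all from some" may be easier to see side by side in code. The sketch below is illustrative only: the four grade levels standing in for the "hats", the ten students per grade, and the function names are hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """'Some from all': take an SRS of per_stratum units from every stratum."""
    chosen = []
    for units in strata.values():
        chosen.extend(rng.sample(units, per_stratum))
    return chosen

def cluster_sample(clusters, n_clusters, rng):
    """'All from some': randomly pick n_clusters clusters, keep every unit in them."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in picked for unit in clusters[name]]

# Hypothetical school: four grades (the "hats"), ten students in each
groups = {
    grade: [f"{grade}-{i}" for i in range(1, 11)]
    for grade in ("9th", "10th", "11th", "12th")
}
rng = random.Random(0)
strat = stratified_sample(groups, 2, rng)  # 2 students from each of 4 grades
clust = cluster_sample(groups, 2, rng)     # all 10 students from 2 grades
print(len(strat), len(clust))
```

The stratified sample always represents every grade; the cluster sample covers only the grades that happened to be drawn.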
Systematic Sampling
From a list, randomly choose a starting point (e.g., the 4th entry), divide the list into consecutive segments (e.g., every 10 names), then sample at that same point in each segment (4, 14, 24, 34, …).

Random Digit Dialing
A sample that approximates an SRS of all households that have telephones with a specific exchange (e.g., 512-266-).
Pew Research: http://www.people-press.org/methodology/sampling/random-digit-dialing-our-standard-method/
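The systematic sampling procedure described above can be sketched as follows. The list of 40 numbered entries and the function name `systematic_sample` are assumptions for illustration; the segment length of 10 and the 4th-entry starting point mirror the example in the text.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start within the first segment of k units, then every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based position inside the first segment
    return units[start::k]

names = list(range(1, 41))  # hypothetical list: entries numbered 1..40
picked = systematic_sample(names, 10, seed=3)
print(picked)

# Fixing the start at the 4th entry (index 3) gives the sequence from the text
fixed = names[3::10]
print(fixed)  # → [4, 14, 24, 34]
```

Only the starting point is random; once it is chosen, the rest of the sample is determined.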
Samples typically resulting in biased results
-Self-Selected Sample: radio station call-in
-Convenience Sample: surveying folks in a mall who appear willing to talk to you
-Judgment Sample: surveying those you pick, acting as an “expert” selector
-“Quickie Polls”: hastily designed, poorly pre-tested, one-night survey sample for an evening news show
Random Number Table
19223 95034 05756 28713 96409 12531 42544 82853 73676 47150
-Assign a number label to each unit in the population.
-Read numbers from the table from left to right, starting anywhere.
-The subjects selected for the sample are those whose labels are read from the table.
-Repeats and numbers that are not part of the list are ignored.
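As a worked illustration of reading the table, the sketch below applies those rules to the row of digits shown above. The setup is assumed for the example only: a population of 30 units labeled 01 to 30, a sample of 5, two-digit reads, and a start at the beginning of the row. The function name `sample_from_table` is hypothetical.

```python
def sample_from_table(table, population_size, n, width=2):
    """Read fixed-width labels left to right, skipping repeats and unused labels."""
    digits = "".join(table.split())  # drop the spaces between 5-digit groups
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Keep the label only if it belongs to the list and is not a repeat
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

TABLE_ROW = "19223 95034 05756 28713 96409 12531 42544 82853 73676 47150"
result = sample_from_table(TABLE_ROW, population_size=30, n=5)
print(result)  # → [19, 22, 5, 13, 25]
```

Reading in pairs gives 19, 22, 39, 50, 34, 05, 75, 62, 87, 13, 96, 40, 91, 25; the labels above 30 are ignored, leaving units 19, 22, 05, 13, and 25.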
Sampling & Lays potato chips
http://www.learner.org/vod/vod_window.html?pid=152 (Video 16, start at 6:35, about 2 minutes)
Nielsen TV ratings
http://www.nielsen.com/us/en/nielsen-solutions/nielsen-measurement/nielsen-tv-measurement.html