Stat 217 – Day 10 Review
Last Time
- Judging “spread” of a distribution
- “Empirical rule”: in a mound-shaped, symmetric distribution, roughly 68% of observations fall within one standard deviation of the mean and roughly 95% fall within two standard deviations of the mean
  - So 2 SD = width of the middle 68% of the distribution
- Z-scores measure the relative position of an observation: they give a unitless measuring stick for how many standard deviations an observation falls from the mean
  - Very useful for comparing values from different distributions
- Boxplots: visual display of the five-number summary
  - Helpful for comparing distributions (spread, center)
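A minimal Python sketch of the two calculations on this slide, assuming an exactly normal (mound-shaped, symmetric) distribution. `normal_within(k)` gives the fraction of observations within k standard deviations of the mean, and `z_score` converts a value to standard units. Both function names are illustrative, and the exam scores, means, and SDs are made-up numbers.

```python
import math

def z_score(x, mean, sd):
    """Number of SDs that x falls above (+) or below (-) the mean."""
    return (x - mean) / sd

def normal_within(k):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

# Empirical rule check: roughly 68% within 1 SD, roughly 95% within 2 SD
print(round(normal_within(1), 3))  # 0.683
print(round(normal_within(2), 3))  # 0.954

# Hypothetical scores on two different exams (made-up numbers)
z_a = z_score(80, mean=70, sd=5)   # 2.0 SDs above exam A's mean
z_b = z_score(85, mean=75, sd=10)  # 1.0 SD above exam B's mean
print(z_a > z_b)  # True: the 80 on exam A is relatively better
```

Even though 85 is the higher raw score, the z-scores show that 80 sits farther above its own distribution's mean.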
Comments on HW 2
Problem 2: Identify terms
- Sampling frame: the list of members of the population from which the sample is selected
  - Does not include the response variable information!
- (b) average number of words on a page of a textbook
- (d) tend to gain an average of 15 lbs?
Comments on HW 2
Activity 5-14: Studies from Blink
- (a) and (b) had only response variables: observational studies
- (c) and (d) had two variables, and the explanatory variable was randomly assigned: experiments
  - So in (c) and (d) we can potentially draw cause-and-effect conclusions
- “Generalizability” means: can you take information from the sample and apply it to the larger population?
  - “There was not a significant difference in SAT performance in the sample, so I don’t think there is one in the population either”
  - Yes, if you have a random sample, so maybe only in (a)
Comments on HW 2
Question 4: Hand hold
- (b) Can the status of the explanatory variable (EV) be determined by Ashleigh?
  - Gender of participant vs. gender of researcher
- Random sampling vs. random assignment
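Comments on HW 2
Cause and effect vs. generalizing to population
Two questions: Were groups randomly assigned? Were observational units randomly selected?
- Randomly assigned (Yes) and randomly selected (Yes): can draw cause-and-effect conclusions and can generalize to the larger population
- Randomly assigned (Yes), not randomly selected (No): can draw cause-and-effect conclusions only
- Not randomly assigned (No), randomly selected (Yes): can generalize to the larger population only
- Neither (No, No): neither conclusion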
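The scope-of-conclusions table boils down to two independent yes/no checks. A small Python sketch of that logic (the function name `scope_of_conclusions` is hypothetical, just for illustration):

```python
def scope_of_conclusions(randomly_assigned, randomly_selected):
    """Return the conclusions a study design supports.

    Random assignment of groups  -> cause-and-effect conclusions.
    Random selection of units    -> generalize to the larger population.
    """
    conclusions = []
    if randomly_assigned:
        conclusions.append("cause-and-effect")
    if randomly_selected:
        conclusions.append("generalize to population")
    return conclusions

# An experiment on volunteers: random assignment, no random sampling
print(scope_of_conclusions(True, False))   # ['cause-and-effect']
# A random-sample survey: random sampling, no random assignment
print(scope_of_conclusions(False, True))   # ['generalize to population']
```

Keeping the two checks separate mirrors the point of the table: each kind of randomness buys a different conclusion, and having one does not give you the other.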
Comments on HW 2
Question 5: AIDS testing
- Most of you got the table right but then read the wrong proportion from it
- Of those who tested positive, what proportion had AIDS? 4,885/78,515 ≈ .062
- Of those who have AIDS, what proportion tested positive? 4,885/5,000 ≈ .977 (sensitivity)

                        Positive test   Negative test   Total
Carries AIDS virus      (2) 4,885       (2) 115         (1) 5,000
Does not carry AIDS     (3) 73,630      (3) 921,370     (1) 995,000
Total                   (4) 78,515      (4) 921,485     1,000,000

(Numbers in parentheses give the order in which the cells are filled in.)
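The two questions use the same numerator and differ only in which total goes in the denominator. A short Python sketch using the table's counts (the 921,370 cell is 995,000 minus 73,630):

```python
# Counts from the two-way table on this slide
counts = {
    ("carrier", "pos"): 4885,
    ("carrier", "neg"): 115,
    ("non-carrier", "pos"): 73630,
    ("non-carrier", "neg"): 921370,  # 995,000 - 73,630
}

total_pos = counts[("carrier", "pos")] + counts[("non-carrier", "pos")]
total_carrier = counts[("carrier", "pos")] + counts[("carrier", "neg")]

# P(carrier | positive test): divide by the "Positive test" column total
p_carrier_given_pos = counts[("carrier", "pos")] / total_pos
# P(positive test | carrier): divide by the "Carries AIDS virus" row total (sensitivity)
p_pos_given_carrier = counts[("carrier", "pos")] / total_carrier

print(total_pos)                      # 78515
print(round(p_carrier_given_pos, 3))  # 0.062
print(round(p_pos_given_carrier, 3))  # 0.977
```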
Lab 2 Notes (model online)
Comparing groups
- “Are people yawning a lot?” vs. “Does the yawn-seed group yawn more often?”
- Overall proportion vs. difference in conditional proportions
- 4.4% vs. 4.4 percentage points
- Yawned “a lot more” vs. “yawned a lot more often”
- Interpreting the p-value vs. drawing conclusions from the p-value
  - Probably want to explicitly compare the p-value to some cut-off
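The difference between “4.4%” and “4.4 percentage points” shows up once you compute the quantities side by side. This sketch assumes the lab used the MythBusters yawning counts (10 of 34 yawn-seeded subjects and 4 of 16 control subjects yawned), which reproduce the 4.4 figure; check them against your lab handout.

```python
# Assumed counts (MythBusters yawning data); verify against the lab handout
seed_yawn, seed_n = 10, 34
ctrl_yawn, ctrl_n = 4, 16

p_seed = seed_yawn / seed_n   # conditional proportion, yawn-seed group
p_ctrl = ctrl_yawn / ctrl_n   # conditional proportion, control group
p_overall = (seed_yawn + ctrl_yawn) / (seed_n + ctrl_n)  # ignores groups

diff = p_seed - p_ctrl        # difference in conditional proportions
relative = diff / p_ctrl      # relative increase over the control group

print(f"overall: {p_overall:.0%}")                        # overall: 28%
print(f"difference: {100 * diff:.1f} percentage points")  # difference: 4.4 percentage points
print(f"relative increase: {relative:.1%}")               # relative increase: 17.6%
```

Saying “4.4% more” would suggest the relative figure; the subtraction of two percentages is reported in percentage points.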
Lab 2 Notes
Interpretation of p-value
- If those subjects were going to yawn regardless of which condition they were in, how often would the random assignment process alone lead to such a large difference in the conditional proportions?
- Each dot represents one (fake) random assignment
  - Observational units = 1000 fake random assignments
  - Variable = difference in conditional proportions
- Roughly 51% of the fake random assignments (null model) gave a difference at least this large
- Don’t consider this a small p-value, since .51 > .05
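The simulation behind the dotplot can be sketched in Python: keep every subject's outcome fixed, reshuffle who lands in which group, and count how often the difference is at least as large as observed. This assumes the lab's data are the MythBusters yawning counts (10 of 34 yawn-seeded vs. 4 of 16 control subjects yawned; check against your handout); `randomization_p_value` is a hypothetical helper, and 1000 repetitions matches the slide.

```python
import random

def randomization_p_value(yes_a, n_a, yes_b, n_b, reps=1000, seed=1):
    """Estimate how often random assignment alone produces a difference in
    conditional proportions (group A minus group B) at least as large as observed."""
    rng = random.Random(seed)
    observed = yes_a / n_a - yes_b / n_b
    total_yes = yes_a + yes_b
    # Null model: each subject's outcome (1 = yawned) is fixed regardless of group
    outcomes = [1] * total_yes + [0] * (n_a + n_b - total_yes)
    count = 0
    for _ in range(reps):
        rng.shuffle(outcomes)                # one fake random assignment
        sim_yes_a = sum(outcomes[:n_a])      # first n_a "cards" go to group A
        sim_yes_b = total_yes - sim_yes_a
        sim_diff = sim_yes_a / n_a - sim_yes_b / n_b
        if sim_diff >= observed - 1e-12:     # tolerance for float rounding
            count += 1
    return count / reps

p = randomization_p_value(10, 34, 4, 16)
print(p)  # roughly 0.5; the exact value depends on the seed
```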
Lab 2 Notes Effect of sample size
Challenge Question Why was “random assignment” used in the study? Why did we shuffle the cards and deal them into 2 groups?
Lab 3
Randomization distribution
- If everyone was going to remember the same number of letters regardless of which sequence they got, how often would the random assignment process alone lead to such a big difference in the group means?
- Each dot represents one (fake) random assignment
  - Observational units = 1000 fake random assignments
  - Variable = difference in group means
- Where is the observed difference in means in this distribution?
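The same shuffling idea works for a quantitative response: pool the scores, re-deal them into groups of the original sizes, and record the difference in means each time. In this Python sketch the recall scores are made up (not the lab's data), and `diff_in_means_null` is an illustrative name.

```python
import random
from statistics import mean

def diff_in_means_null(group_a, group_b, reps=1000, seed=1):
    """Simulated differences in group means (A minus B) under random assignment alone."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)  # one fake random assignment
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

# Hypothetical letter-recall scores (made-up numbers for illustration)
group_a = [14, 12, 15, 11, 13, 16, 12, 14]
group_b = [10, 9, 12, 8, 11, 10, 9, 12]

observed = mean(group_a) - mean(group_b)
diffs = diff_in_means_null(group_a, group_b)
# Locate the observed difference: share of simulated diffs at least as large
p_value = sum(d >= observed - 1e-12 for d in diffs) / len(diffs)
print(observed)  # 3.25
print(p_value)
```

With these made-up scores the observed difference lands far in the upper tail, so very few fake random assignments reach it.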
About Exam 1
- 50 minutes, 50 points
- Will include one of the self-check activities
- Bring calculator, pencil, eraser
  - No cell phone calculators (make sure your calculator does square roots)
- Could be asked to use Minitab and/or to interpret Minitab output
- One 8.5x11 sheet of your own notes; both sides OK
- I will supply paper
Some advice for studying
- Review handout, problems online
  - See also p. 627?
- Review lecture notes, text, homework, labs
- See me for old homework, in-class activities
- Work problems
- Start with ideas that we have emphasized more often
Some advice during exam
- Read the entire question carefully before writing anything
- If you get stuck on a problem, move on to later parts or later problems
- Try to hit the highlights in your answer (e.g., not all sources of bias, just the most serious)
- Be succinct (think before you write)
- Show all of your work and explain well (communication points)
Some big, big ideas
- Observational units, variable
- Random assignment vs. random sampling
  - Implementation
  - Purpose
  - Consequence (scope of conclusions)
- What we see in the sample vs. saying something beyond the sample
  - Statistic vs. parameter
- Statistical significance
  - Interpretations, reasoning
  - Properties, “what if” questions…
- How are you deciding this?
Activity 4-19: Voter Turnout (p. 70)
- Statistic: .682 = proportion of the sample claiming to have voted
- Parameter: .490 = proportion of the population who actually voted
- What are some possible explanations for why these values differ?
  - Those in the sample do not represent the population
  - Those in the sample were not honest
  - Statistics vary from sample to sample and may differ from the parameter by chance
- Which of these explanations can we eliminate?
No longer believe it was just “by chance”…
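One way to check the “by chance” explanation is to simulate many random samples from a population where the parameter really is .490 and see how often a sample proportion reaches .682. This Python sketch uses a hypothetical sample size of 1000 (not taken from the activity; substitute the activity's actual sample size), and `simulate_sample_proportions` is an illustrative name.

```python
import random

def simulate_sample_proportions(p, n, reps=1000, seed=1):
    """Sample proportions from `reps` random samples of size n when the population proportion is p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

PARAMETER = 0.490  # population value (parameter) from the slide
STATISTIC = 0.682  # sample value (statistic) from the slide
N = 1000           # hypothetical sample size, an assumption for illustration

props = simulate_sample_proportions(PARAMETER, N)
count = sum(phat >= STATISTIC for phat in props)
print(count)       # how many of the 1000 random samples reached .682
print(max(props))  # largest simulated sample proportion
```

With a sample this large, the simulated proportions cluster tightly around .490 and none come near .682, which is why chance alone stops being a believable explanation. With a much smaller sample the spread would be wider, so the conclusion depends on the real sample size.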
Questions?