Stat 512 – Day 5: Statistical Significance with a Quantitative Response Variable
Last Time – Summarizing Quantitative Variables
Graphical summaries: (parallel) dotplots, boxplots, stemplots, histograms
   Shape (skewed? “even”?), center, spread, unusual observations
   Try several different graphs and scalings
Numerical summaries
   Center: median (five-number summary), mean
      Mean = average of all values (not “resistant”)
      Median = “typical” value
   Spread: interquartile range (IQR = Q3 - Q1), standard deviation
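A minimal Python sketch of the numerical summaries listed above, run on a small hypothetical data set (not from the lecture). The function name `numerical_summary` is ours. Quartiles come from `statistics.quantiles`, whose default "exclusive" method is only one of several conventions, so Minitab or other software may report slightly different Q1 and Q3 for the same data.

```python
import statistics

def numerical_summary(values):
    """Return five-number summary plus mean, IQR, and sample SD."""
    data = sorted(values)
    # Quartiles via Python's default "exclusive" method; other software may differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "min": data[0],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "max": data[-1],
        "mean": statistics.mean(data),
        "IQR": q3 - q1,                # width of the middle 50% of the data
        "SD": statistics.stdev(data),  # sample SD, divisor n - 1
    }

# Hypothetical data for illustration only.
sample = [2, 4, 4, 5, 7, 9, 10]
summary = numerical_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```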
Last Time – Summarizing Quantitative Variables (cont.)
Interquartile range
   Width of middle 50% of data values
   Length of box in boxplot
   1978: IQR = Q3 - Q1 = 23 min
   2003: IQR = Q3 - Q1 = 11 min
   Without outliers: IQR = Q3 - Q1 = 11 min
Last Time – Summarizing Quantitative Variables (cont.)
Interquartile range, IQR
Standard deviation
   Want to compare the distance of the observations from the mean
   Deviation from mean: $y_i - \bar{y}$
   Absolute deviations: $|y_i - \bar{y}|$
   Squared deviations: $(y_i - \bar{y})^2$
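Why move from raw deviations to absolute or squared deviations? Raw deviations always sum to zero, so their average says nothing about spread. The usual sample standard deviation (the version Minitab reports) averages the squared deviations using divisor $n - 1$ and then takes a square root to return to the original units:

```latex
\begin{align*}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} y_i \\
\sum_{i=1}^{n} (y_i - \bar{y}) &= \sum_{i=1}^{n} y_i - n\bar{y} = n\bar{y} - n\bar{y} = 0 \\
s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{align*}
```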
Old Faithful
   1978: SD = 13 minutes
   2003: SD = 8.5 minutes
   Without outliers: SD = 6.9 minutes (SD is not resistant!)
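To see why the SD is not resistant while the IQR is, the sketch below uses hypothetical waiting times (not the Old Faithful data) and replaces the largest value with an extreme one. Because the sample size stays the same, the quartiles do not move, but the SD nearly triples. The helper name `sd_and_iqr` is ours.

```python
import statistics

def sd_and_iqr(values):
    """Return (sample SD, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.stdev(values), q3 - q1

# Hypothetical waiting times in minutes (NOT the Old Faithful data).
typical = [60, 65, 70, 72, 75, 78, 80, 85, 90]
with_outlier = typical[:-1] + [150]  # largest value 90 replaced by an outlier

sd_a, iqr_a = sd_and_iqr(typical)
sd_b, iqr_b = sd_and_iqr(with_outlier)
print(f"without outlier: SD = {sd_a:.1f}, IQR = {iqr_a:.1f}")
print(f"with outlier:    SD = {sd_b:.1f}, IQR = {iqr_b:.1f}")
```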
Last Time – Summarizing Quantitative Variables (cont.)
Interquartile range, IQR
   Width of middle 50% of data values
   Length of box in boxplot
Standard deviation, s
   Want to compare the distance of the observations from the mean
   Loosely interpreted as a typical deviation of the data values from their mean
Example 3: What do we mean by variability?
   Most among classes A, B, C
   Least among classes A, B, C
   Most between C and D
   Most between D and E
   What about F?
PP 4
   Quartiles
   Vocabulary: “Normal,” “In lower quartile”
   Center vs. spread: SF vs. Raleigh (59.5 and 57 degrees)
   Quality control, freezer, distance of home runs, test scores in a class, real estate prices, drumstick breakage strength, times of cross country team, medical team response time, commuting times, earthquake strengths, weight loss
Notes on Using Minitab
   Worksheets vs. projects
   Saving graph windows
   Stacked vs. unstacked data
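As we understand Minitab's terminology, stacked data keep all responses in one column with a second column naming each row's group, while unstacked data give each group its own column. The hypothetical helpers `unstack` and `stack` below are plain Python, not Minitab commands; they only illustrate the two layouts.

```python
from collections import defaultdict

def unstack(values, groups):
    """Stacked (value column + group column) -> one list of values per group."""
    columns = defaultdict(list)
    for value, group in zip(values, groups):
        columns[group].append(value)
    return dict(columns)

def stack(columns):
    """One list per group -> stacked (values, groups) columns."""
    values, groups = [], []
    for group, column in columns.items():
        values.extend(column)
        groups.extend([group] * len(column))
    return values, groups

# Hypothetical stacked data: one response column, one group-label column.
values = [10, 12, 3, 5, 8]
groups = ["A", "A", "B", "B", "A"]
columns = unstack(values, groups)
print(columns)
print(stack(columns))
```

Stacking after unstacking regroups the rows by group, so the row order can change even though no values are lost.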
What’s Left?
We have learned how to compute descriptive statistics with a quantitative response variable. We found a difference in average rainfall amounts between seeded and unseeded clouds (442 acre-feet vs. … acre-feet). Are you convinced that this difference reflects a true treatment effect of cloud seeding?
Example 1: Sleep Deprivation and Visual Learning
“Visual discrimination learning requires post-training sleep,” Stickgold, R., James, L., & Hobson, J. A. Nature Neuroscience, 2: , 2000.
Example 1
Sleep group     Sample size     Mean improvement     Median improvement
Deprived
Unrestricted
Example Summary
These data come from a randomized, comparative experiment. The dotplots and descriptive statistics reveal that the sleep-deprived subjects tended to have lower improvements than those permitted unrestricted sleep. But is this difference statistically significant?
How Do We Decide?
All possible random assignments
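The "all possible random assignments" idea can be sketched for a tiny hypothetical data set, small enough to list every split. If sleep condition had no effect, each subject's improvement would be the same under either condition, so we enumerate every way to assign the pooled values to the two groups and report the fraction of assignments whose difference in means is at least as extreme as the observed one. The function name, the numbers, and the one-sided direction (first group lower) are assumptions for illustration. With larger groups the number of splits grows very quickly, so simulation is typically used instead.

```python
from itertools import combinations
from statistics import mean

def exact_randomization_pvalue(group1, group2):
    """One-sided p-value: share of all splits with mean(g1) - mean(g2) <= observed."""
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = mean(group1) - mean(group2)
    as_extreme = 0
    total = 0
    # Every way to choose which n1 pooled values land in the first group.
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(g1) - mean(g2)
        total += 1
        if diff <= observed + 1e-12:  # tolerance for floating-point ties
            as_extreme += 1
    return as_extreme / total

# Hypothetical improvement scores (not the study's data).
deprived = [1, 3, 4]
unrestricted = [6, 8, 9]
p = exact_randomization_pvalue(deprived, unrestricted)
print(f"exact one-sided p-value: {p:.3f}")
```

Here only 1 of the 20 possible splits of six values into two groups of three is as extreme as the observed one, so the p-value is 0.05.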
Example Summary (cont.)
Randomization alone rarely produced differences in group means as extreme as the one in the actual study (the p-value is less than .01). Thus, we have fairly strong evidence that learning improvements are genuinely lower for sleep-deprived subjects. Moreover, because this was a randomized experiment, we can draw a causal conclusion: sleep deprivation caused the lower improvements.
Example 2
[Figure panels: actual study; hypothetical data]
Example 3: Lifetimes of Notables
[Side-by-side stemplot: writers (n = 20) vs. scientists (n = 20); leaf unit = 1 year]
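A stemplot like the one on this slide splits each value into a stem (tens digit) and a leaf (ones digit, leaf unit = 1 year). A minimal sketch for nonnegative whole numbers, using hypothetical ages rather than the writers' or scientists' data; the function name `stemplot` is ours. A side-by-side version would print one group's leaves mirrored to the left of the shared stems.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines: stem = tens digit, leaf = ones digit (leaf unit = 1)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {digits}")
    return lines

# Hypothetical lifetimes in years (not the lecture's data).
ages = [38, 52, 61, 63, 64, 67, 70, 75, 81, 90]
lines = stemplot(ages)
for line in lines:
    print(line)
```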
Example Summary – Descriptive
Graphical and numerical summaries reveal that scientists tend to live longer than writers; the difference in median lifetimes is 10 years (76 for scientists, 66 for writers). Both distributions are roughly symmetric, perhaps slightly skewed to the left. Writers’ lifetimes vary more, ranging from the 20s through 90 years, whereas scientists’ lifetimes range from the 40s through the 90s; on the other hand, the writers’ lifetimes are strongly concentrated in the 60s. Neither group has obvious outliers.
Example Summary – Inferential
The simulation shows that the approximate p-value for comparing the group means is about .07. This suggests that if there were no difference between the groups, a difference this large would be unlikely, though not very unlikely, to occur by chance alone. However, we cannot attribute the longer lifetimes to the choice of occupation, because this observational study does not control for confounding variables. One explanation for the observed tendency is that scientists require more formal training in order to succeed than writers do, so someone who dies young but famous is more likely to have achieved fame as a writer than as a scientist.
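When listing every assignment is impractical, the p-value can be approximated by repeatedly shuffling the group labels, in the spirit of the simulation mentioned above. A minimal sketch with hypothetical data; the function name, the number of repetitions, the seed, and the one-sided direction (first group's mean larger) are our assumptions, not the lecture's settings. For an observational study like this one, the shuffling describes what chance alone would produce if there were no group difference; it does not make the comparison a randomized experiment.

```python
import random

def simulated_pvalue(group1, group2, reps=10_000, seed=1):
    """Approximate one-sided p-value: share of shuffles with mean(g1) - mean(g2) >= observed."""
    rng = random.Random(seed)  # seeded so results are reproducible
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = sum(group1) / n1 - sum(group2) / len(group2)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # one random re-assignment of group labels
        g1, g2 = pooled[:n1], pooled[n1:]
        diff = sum(g1) / len(g1) - sum(g2) / len(g2)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / reps

# Hypothetical data for illustration only.
group_a = [6, 8, 9]
group_b = [1, 3, 4]
p = simulated_pvalue(group_a, group_b)
print(f"approximate one-sided p-value: {p:.3f}")
```

For these six values the exact answer is 1 of the 20 possible splits, or 0.05, so the simulated proportion should land close to it; more repetitions tighten the approximation.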
Flexibility
For Thursday
   PP 5 (see HW handout for graph)
   Reading: finally start talking about selecting the observational/experimental units in the first place
   HW 3
   Perhaps some time for groups to meet together and brainstorm?