Stat 512 – Day 5 Statistical significance with quantitative response variable.


1 Stat 512 – Day 5 Statistical significance with quantitative response variable

2 Last Time – Summarizing Quantitative Variables Graphical summaries: (parallel) dotplots, boxplots, stemplots, histograms  Shape (skewed?, “even”?), center, spread, unusual observations  Try several different graphs, scalings Numerical summaries  Center: median (five-number summary), mean Mean = average of all values (not “resistant”) Median = “typical” value  Spread: interquartile range (IQR = Q3 - Q1), standard deviation

3 Last Time – Summarizing Quantitative Variables (cont.) Interquartile range  Width of middle 50% of data values Length of box in boxplot 1978 IQR = 81-58 = 23 min 2003 IQR = 98-87 = 11 min Without outliers IQR = 98-87 = 11 min

4 Last Time – Summarizing Quantitative Variables (cont.) Interquartile range, IQR Standard deviation  Want to compare the distance of the observations from the mean Deviation from mean: $y_i - \bar{y}$ Absolute deviations: $|y_i - \bar{y}|$ Squared deviations: $(y_i - \bar{y})^2$

5 Old Faithful  1978 SD = 13 minutes  2003 SD = 8.5 minutes Without outliers SD = 6.9 minutes (SD is not resistant!)

6 Last Time – Summarizing Quantitative Variables (cont.) Interquartile range, IQR  Width of middle 50% of data values Length of box in boxplot Standard deviation, s  Want to compare the distance of the observations from the mean Loose interpretation as a typical deviation from the mean of the data values
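The contrast between a resistant measure of spread (IQR) and a non-resistant one (SD) can be sketched in a few lines of Python. The data below are hypothetical values invented for illustration, not the Old Faithful measurements, and the helper name iqr is our own. Quartile conventions differ between software packages, so Minitab may report slightly different quartiles than Python's statistics module.

```python
import statistics

def iqr(values):
    """Interquartile range Q3 - Q1, using the default quartile method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical times in minutes, invented for illustration only.
typical = [86, 88, 89, 90, 91, 92, 93, 94, 95, 97]
with_outlier = typical + [45]  # add one unusually short value

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    # s = sqrt( sum((y_i - ybar)^2) / (n - 1) ), the sample standard deviation
    print(label, "IQR =", round(iqr(data), 2), "SD =", round(statistics.stdev(data), 2))
```

Adding the single outlier barely moves the IQR but inflates the SD several times over, which is what "not resistant" means.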

7 Example 3 What do we mean by variability? Most among classes A, B, C Least among classes A, B, C Most between C and D Most between D and E What about F?

8 PP 4 Quartiles Vocabulary  “Normal”  “In lower quartile” Center vs. spread  SF vs. Raleigh (59.5 and 57 degrees)  Quality control, freezer, distance of homeruns, test scores in a class, real estate prices, drumstick breakage strength, times of cross country team, medical team response time, commuting times, earthquake strengths, weight loss

9 Notes on Using Minitab Worksheets vs. Projects Saving graph windows Stacked vs. unstacked data

10 What’s Left? We have learned how to perform descriptive statistics with a quantitative response variable. We found a difference in average rainfall amounts between seeded and unseeded clouds (442 acre-feet vs. 164.6 acre-feet). Are you convinced that this reflects a true treatment effect from cloud seeding?

11 Example 1: Sleep Deprivation and Visual Learning “Visual discrimination learning requires post-training sleep,”  Stickgold, R., James, L., & Hobson, J.A.  Nature Neuroscience, 3:1237-1238, 2000.

12 Example 1
Sleep group | Sample size | Mean improvement | Median improvement
Deprived | 11 | 3.90 | 4.50
Unrestricted | 10 | 19.82 | 16.55

13 Example Summary These data come from a randomized, comparative experiment. The dotplots and descriptive statistics reveal that the sleep-deprived subjects tended to have lower improvements than those permitted unrestricted sleep. But is this difference statistically significant?

14 How to Decide?

15 All possible random assignments

16 Example Summary (cont.) Randomization alone rarely produced differences in group means as extreme as in the actual study (the p-value is less than .01). Thus, we have fairly strong evidence that the learning improvements are genuinely lower with the sleep-deprived subjects. Moreover, because this was a randomized experiment, we can draw a causal conclusion that the sleep deprivation was the cause.
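To give a sense of scale for "all possible random assignments" (slide 15), the count can be computed directly. This is a small sketch that assumes the group sizes in the slide 12 table (11 deprived, 10 unrestricted); the variable names are ours.

```python
import math

# Group sizes taken from the Example 1 table (assumed: 11 deprived, 10 unrestricted).
n_deprived, n_unrestricted = 11, 10
total = n_deprived + n_unrestricted

# Number of distinct ways to choose which 11 of the 21 subjects are sleep deprived.
assignments = math.comb(total, n_deprived)
print(assignments)  # 352716
```

With several hundred thousand possible assignments, listing them by hand is impractical, which is why a simulation of many random assignments is a natural way to approximate the p-value.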

17 Example 2 Actual study Hypothetical data

18 Example 3: Lifetimes of Notables
Writers (n=20) | Stem | Scientists (n=20)
       9 | 2 |
       5 | 3 |
       3 | 4 | 8
       9 | 5 | 0389
76622100 | 6 | 66
     751 | 7 | 0357789
    9530 | 8 | 679
       0 | 9 | 004
Leaf unit = 1 year

19 Example Summary - descriptive Graphical and numerical summaries reveal that scientists do tend to live longer than writers, and the difference in median lifetimes is 10 years (76 for scientists, 66 for writers). Both distributions are roughly symmetric, perhaps a bit skewed to the left. The lifetimes vary more for the writers in that they range from the 20s through 90 years, as opposed to scientists ranging from the 40s through the 90s, but on the other hand, the writers’ lifetimes have a strong concentration in the 60s. Neither group has obvious outliers.
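The medians quoted above can be checked with a short Python sketch. The two lists are our reading of the back-to-back stemplot on slide 18 (leaf unit = 1 year), so treat the individual values as a transcription rather than as the original data file.

```python
import statistics

# Lifetimes in years, as read from the stemplot on slide 18 (our transcription).
writers = [29, 35, 43, 59, 60, 60, 61, 62, 62, 66,
           66, 67, 71, 75, 77, 80, 83, 85, 89, 90]
scientists = [48, 50, 53, 58, 59, 66, 66, 70, 73, 75,
              77, 77, 78, 79, 86, 87, 89, 90, 90, 94]

for name, data in [("Writers", writers), ("Scientists", scientists)]:
    print(name, "n =", len(data),
          "min =", min(data),
          "median =", statistics.median(data),
          "max =", max(data))
```

Under this reading the medians come out to 66 years for writers and 76 years for scientists, matching the 10-year difference in the summary.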

20 Example Summary - inferential The simulation reveals that the approximate p-value for comparing the group means is about .07. This suggests that if there were no difference between the groups, it would be unlikely, but not terribly so, for such a large difference to occur by chance alone. However, we cannot attribute the longer lifetimes to the choice of occupation, because this observational study does not control for confounding variables. One explanation for the observed tendency is that scientists require more formal training in order to succeed than writers, so someone who dies young but famous is more likely to have achieved fame as a writer than as a scientist.
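A simulation of this kind can be sketched in Python as follows. This is a minimal illustration, not the applet or Minitab macro used in class: the data are our reading of the slide 18 stemplot, and the function name, the one-sided direction (scientists minus writers), the 10,000 repetitions, and the seed are all our assumptions.

```python
import random

# Lifetimes in years, as read from the stemplot on slide 18 (our transcription).
writers = [29, 35, 43, 59, 60, 60, 61, 62, 62, 66,
           66, 67, 71, 75, 77, 80, 83, 85, 89, 90]
scientists = [48, 50, 53, 58, 59, 66, 66, 70, 73, 75,
              77, 77, 78, 79, 86, 87, 89, 90, 90, 94]

def mean(values):
    return sum(values) / len(values)

def randomization_p_value(group_a, group_b, reps=10_000, seed=512):
    """Approximate one-sided p-value for mean(group_b) - mean(group_a).

    Repeatedly shuffles the pooled values into two groups of the original
    sizes and counts how often the shuffled difference in means is at
    least as large as the observed difference.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if diff >= observed:
            count += 1
    return observed, count / reps

observed, p_value = randomization_p_value(writers, scientists)
print("observed difference in means:", observed)
print("approximate one-sided p-value:", p_value)
```

The printed p-value varies a little with the seed and the number of repetitions, but under these assumptions it should land in the neighborhood of the .07 quoted above.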

21 Flexibility

22 For Thursday PP 5  See HW handout for graph Reading  Finally start talking about selecting the observational/experimental units in the first place HW 3 Perhaps some time for groups to meet together and brainstorm?

