Stat 217 – Day 28 Review
The National Transportation Safety Board recently divulged a highly secret plan they had funded with the U.S. auto makers for the past five years. The NTSB covertly funded a project whereby the auto makers were installing black boxes in four-wheel-drive pickup trucks in an effort to determine, in fatal accidents, the circumstances in the last 15 seconds before the crash. They were surprised to find that in 49 of the 50 states the last words of drivers in 61.2% of fatal crashes were, "Oh, Shit!" Only the state of Texas was different, where 89.3% of the final words were, "Hey Y'all, watch this!"
Last Time (p. 578)
  An observation is considered influential if removing it from the dataset substantially changes the least squares regression equation.
  An observation is considered an outlier in the regression setting if it has a large residual (vertical distance from the regression line).
  See also bottom of p. 580.
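The two definitions above can be checked numerically. The sketch below refits the least squares line with each observation removed and reports which removal moves the slope the most. The data values and the helper name `least_squares` are made up for illustration; they are not from the textbook or from the FEV data.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: six points near y = 2x, plus one point far out in x.
xs = [1, 2, 3, 4, 5, 6, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 15.0]

full_intercept, full_slope = least_squares(xs, ys)

# Residual = observed y minus the y predicted by the full-data line.
residuals = [y - (full_intercept + full_slope * x) for x, y in zip(xs, ys)]
largest_residual = max(range(len(xs)), key=lambda i: abs(residuals[i]))

# Refit with each observation left out and record how far the slope moves.
slope_changes = []
for i in range(len(xs)):
    _, slope_i = least_squares(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    slope_changes.append(abs(slope_i - full_slope))

most_influential = max(range(len(xs)), key=lambda i: slope_changes[i])
_, slope_without = least_squares(
    xs[:most_influential] + xs[most_influential + 1:],
    ys[:most_influential] + ys[most_influential + 1:],
)

print("slope with all points:", round(full_slope, 3))
print("most influential index:", most_influential)
print("slope without it:", round(slope_without, 3))
print("index of largest residual:", largest_residual)
```

In this toy data the point far out in x changes the slope the most, yet it is not the point with the largest residual, which is why the two ideas are defined separately.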
Last Time: FEV
  Smoker vs. non-smoker (obs unit, FEV): Avg age = [value missing], Avg age = 9.54
  The regression equation is predicted FEV = [coefficients missing] Age
  t-value = 29.53, p-value = .000
FEV
Example – Multiple Regression In a study, IQ test scores were obtained for 300 eight-year-old children who had been part of a study of premature babies in the early 1980s. The researchers reported the results of the regression of IQ on social class, mother’s education, gender, number of days of ventilation of the baby after birth, and an indicator of whether there was any breast milk in the baby’s diet.
Explanatory Variable     Coefficient   p
Social class
Mother’s education
Gender
Days of ventilation
Breast milk indicator    8.3           <.001

After accounting for the effects of social class, mother’s education, gender, and days of ventilation, the estimated mean IQ for those receiving breast milk was 8.3 pts higher than for those not receiving breast milk.
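To see what "after accounting for" means for an indicator coefficient, here is a minimal sketch on made-up, noise-free data. It is not the study's data: the variable names, the single confounder (mother's education), and the built-in effect of 5 points are all assumptions chosen so the arithmetic is exact. The raw difference in group means mixes the indicator's effect with the education effect, while the regression coefficient separates them.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(rows, ys):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical children: breast milk is more common when education is higher,
# and IQ is built as 80 + 2 * education + 5 * milk (no noise, for clarity).
children = []
for edu in range(8, 17):
    for k in range(4):
        milk = 1 if k < (edu - 8) // 2 else 0
        children.append((edu, milk, 80 + 2 * edu + 5 * milk))

iq_milk = [iq for _, milk, iq in children if milk == 1]
iq_none = [iq for _, milk, iq in children if milk == 0]
unadjusted = sum(iq_milk) / len(iq_milk) - sum(iq_none) / len(iq_none)

# Columns: intercept, education, milk indicator.
design = [[1.0, edu, milk] for edu, milk, _ in children]
coefficients = fit(design, [iq for _, _, iq in children])
adjusted = coefficients[2]

print("raw difference in mean IQ:", round(unadjusted, 2))
print("milk coefficient, adjusting for education:", round(adjusted, 2))
```

The raw difference is larger than the built-in effect because the milk group also has more educated mothers; the indicator's coefficient recovers the effect at a fixed education level.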
Example
  It is possible that the decision and ability to provide breast milk are related to social class and mother’s education, which are likely to be related to the child’s IQ score.
  It is desired to estimate the effect of breast milk that is separate from the association with these potentially confounding variables.
Lab 7 Comments
  Remember to state Ho/Ha before you collect data.
  Measurement units, symbols.
  Interpretation of p-value: 0.2% of random assignments show a difference in sample means of 31 or larger when there is no difference in “population” means (no genuine treatment effect).
  95% confident that the butterscotch (population) mean is 10.5 to 51.4 seconds longer.
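The p-value interpretation above describes a randomization distribution, which can be simulated directly. The sketch below uses made-up melting times (not the Lab 7 measurements), shuffles the group labels many times, and counts how often the re-randomized difference in means is at least as large as the observed one. The seed and the number of repetitions are arbitrary choices.

```python
import random

# Hypothetical melting times in seconds; these are NOT the Lab 7 data.
butterscotch = [95, 110, 82, 120, 101, 88, 130, 99]
chocolate = [70, 85, 60, 92, 75, 66, 98, 79]


def mean(values):
    return sum(values) / len(values)


observed_diff = mean(butterscotch) - mean(chocolate)

rng = random.Random(217)  # fixed seed so the run is repeatable
pooled = butterscotch + chocolate
n_first = len(butterscotch)
repetitions = 10000

count_as_extreme = 0
for _ in range(repetitions):
    # Re-randomize: under Ho the labels are arbitrary, so shuffle them.
    rng.shuffle(pooled)
    diff = mean(pooled[:n_first]) - mean(pooled[n_first:])
    if diff >= observed_diff:
        count_as_extreme += 1

p_value = count_as_extreme / repetitions
print("observed difference in means:", observed_diff)
print("proportion of random assignments at least this large:", p_value)
```

The printed proportion is read exactly like the slide's statement: the share of random assignments that give a difference this large or larger when there is no genuine treatment effect.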
Lab 7 Comments
  Once you look at “paired” data, you can no longer use a two-sample t-test.
  One-sample t-test on the differences.
  Parameter = mean difference for all Cal Poly students.
  95% confident that on average butterscotch chips take 6 sec to 20.6 sec longer.
  Differences have much less variability, so it is easier to see butterscotch vs. choc chip differences.
  24 of 33 people took longer with butterscotch.
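A small sketch of why the paired analysis is sharper, again on made-up times for eight hypothetical people rather than the class data. It computes the one-sample t statistic and a 95% interval on the differences, and compares the standard error with the one a two-sample analysis would use. The critical value 2.365 is the standard t multiplier for 7 degrees of freedom and is hardcoded because Python's standard library has no t distribution.

```python
import math
import statistics

# Hypothetical paired times in seconds: same eight people, both chip types.
butterscotch = [95, 110, 82, 120, 101, 88, 130, 99]
chocolate = [84, 96, 75, 105, 90, 80, 112, 91]

differences = [b - c for b, c in zip(butterscotch, chocolate)]
n = len(differences)

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)
paired_se = sd_diff / math.sqrt(n)
t_statistic = mean_diff / paired_se

# 95% confidence interval for the mean difference (t* for df = 7).
t_star = 2.365
ci_low = mean_diff - t_star * paired_se
ci_high = mean_diff + t_star * paired_se

# Standard error a two-sample (unpooled) analysis would use instead.
two_sample_se = math.sqrt(
    statistics.variance(butterscotch) / n + statistics.variance(chocolate) / n
)

took_longer = sum(1 for d in differences if d > 0)

print("mean difference:", mean_diff)
print("paired SE:", round(paired_se, 2), "two-sample SE:", round(two_sample_se, 2))
print("t statistic:", round(t_statistic, 2))
print("95% CI:", round(ci_low, 2), "to", round(ci_high, 2))
print("people slower with butterscotch:", took_longer, "of", n)
```

Person-to-person variability cancels in the differences, so the paired standard error is much smaller than the two-sample one for the same measurements.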
Lab 7 Comments
  Significantly different from zero? p-value
  How big is the average difference? Confidence interval
  Are you able to draw a cause and effect conclusion? Random order
  Representative of all Cal Poly students on this issue? Not a random sample but may not differ on this issue?
HW 7 Comments
  Proportion of gingivitis surfaces cured.
  At least one treatment leads to a different likelihood of curing gingivitis.
  Higher number of cured surfaces in the high treatment group than expected.
To Do
  Lab 9 due tomorrow
  Course evaluations Thursday in lab
  Mandatory!
  Time in lab: review case study
  Review Questions Handout online, Review problems
  HW 7 solutions
  Review session Tuesday 6-8pm?
Finals Week
  Final review
  Discussion Board forum
  Office Hours Monday 9-3pm
  Graded HW 7, Lab 9 available Monday?
  Final Exam Wednesday 1:10-4
  Tuesday evening review session?
Format of Final Exam
  Multiple choice (~30-40 min, ~20%)
    Big ideas, not memorization, little calculation (calculator)
  Short, long written answers
    Like midterms, partial credit
  Open 3 pages of notes (to be turned in), Minitab, TOS Calculator applet
  Cumulative, though some emphasis on more recent material
  Some very familiar questions
  Recognizing which procedure to use
  Sampling distributions!
Food for thought
  Simulation vs. (CLT) analysis
  Descriptive statistics vs. inferential statistics
  Confidence intervals
  Other study aids:
    Multiple choice review projects in BB (emphasizing terminology, earlier material)
    Jeopardy in evenings!
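As a study aid for the "Simulation vs. (CLT) analysis" item, the sketch below compares the two approaches on one question: how variable is a sample mean? The population (exponential with mean 1 and standard deviation 1), the sample size, the seed, and the number of repetitions are all assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(217)  # fixed seed so the run is repeatable
n = 30                    # sample size
repetitions = 5000        # number of simulated samples

# Simulation: draw many samples from a skewed population and keep each mean.
sample_means = []
for _ in range(repetitions):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

simulated_center = statistics.mean(sample_means)
simulated_sd = statistics.stdev(sample_means)

# CLT analysis: the sample mean is centered at the population mean (1.0)
# with standard deviation sigma / sqrt(n), where sigma = 1.0 here.
clt_center = 1.0
clt_sd = 1.0 / math.sqrt(n)

print("simulated center and SD:", round(simulated_center, 3), round(simulated_sd, 3))
print("CLT center and SD:      ", round(clt_center, 3), round(clt_sd, 3))
```

The two sets of numbers should be close: the simulation estimates the sampling distribution by brute force, and the CLT predicts its center and spread from the population mean, the population standard deviation, and n.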