1
Stat 512 – Lecture 18 Multiple Regression (Ch. 11)
2
Projects
Guidelines handout (one per group)
Project Do’s and Don’ts
Presentations
  5 minutes
  Will take volunteers for Tuesday
  Hit the highlights
  Tons more detail for me in your paper
  Sample hypothetical presentation
  Plan for technology in advance!
Group evaluation form next week
3
Last Time – Inference for Regression
Use residual plots to check technical conditions
  Linearity: If the plot of residuals vs. EV/fitted values shows no pattern (e.g., curvature), we will assume the original relationship was linear
  Independence: If we have a random sample or randomization, we will assume independence
  Normality: If the histogram of residuals is reasonably normal, we will assume the conditional distributions at each x are all normal
    Will be a bit more forgiving on this condition with large n
  Equal standard deviation: If the plot of residuals vs. EV/fitted values shows equal vertical spread across all the EV values, we will assume the conditional distributions at each x have the same SD
4
Last Time – Inference for Regression
Null hypothesis
  H0: no association between RV and EV (identify)
  H0: population slope = 0
  H0: no treatment effect from EV on RV (identify)
Minitab/SAS output
  Assumes two-sided alternative
  t = (observed slope - hypothesized slope) / SE(observed slope)
  d.f. = n - 2
  Equivalent to p-value reported by Minitab with correlation coefficient
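The slope t statistic on this slide, and its equivalence to the test on the correlation coefficient, can be checked by hand. The sketch below is only an illustration: the data values are made up, and the function names `slope_t_test` and `correlation_t` are hypothetical helpers, not Minitab or SAS routines.

```python
import math

def slope_t_test(x, y, hypothesized_slope=0.0):
    """Return (slope, se_slope, t, df) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # The residual standard deviation uses n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    t = (slope - hypothesized_slope) / se_slope
    return slope, se_slope, t, n - 2

def correlation_t(x, y):
    """t statistic for H0: correlation = 0, computed from r alone."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.5, 5.0, 8.5, 9.0]

slope, se_slope, t, df = slope_t_test(x, y)
print("slope:", slope, "SE:", se_slope, "t:", t, "df:", df)
print("t from correlation:", correlation_t(x, y))
```

When the hypothesized slope is 0, the two t statistics agree up to rounding, which is why the two p-values match.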
5
PP – Money Making Movies
box office = -42.9 + 1.86 × score
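As a quick arithmetic check, the fitted line can be evaluated for a given score. This is a minimal sketch: the function name `predict_box_office` is hypothetical, and the slide does not state the units of box office (millions of dollars is an assumption here).

```python
def predict_box_office(score):
    """Predicted box office from the fitted line on the slide.

    Units are not stated on the slide; millions of dollars is an assumption.
    """
    return -42.9 + 1.86 * score

# Predicted box office for a movie with a score of 50 (illustrative input).
print(round(predict_box_office(50), 1))
```

Each additional point of score raises the predicted box office by 1.86, the slope of the line.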
6
PP – Money Making Movies
7
Consequence: restrict population to movies earning less than $200 million
8
PP – Money Making Movies
Is the relationship statistically significant?
  Highly significant (p-value < .001)
  But not all that useful (r² = 8.9%)
  Not a cause and effect relationship
  Not clear what population this represents
9
Can we improve on these models?
Adding predictor variables to the model
  Average response = β0 + β1 x1 + β2 x2 + …
Graphical displays
Interpreting coefficients
Interpreting R²
Inference for model, coefficients
Checking technical conditions (the same!)
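To make the model form on this slide concrete (average response as an intercept plus a slope times each predictor), the sketch below fits a two-predictor regression by least squares using only the standard library. Everything here is an assumption for illustration: the data are made up, and `solve_linear_system`, `fit_multiple_regression`, and `r_squared` are hypothetical helper names, not the routines Minitab or SAS use.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    design = [[1.0] + list(r) for r in rows]  # column of 1s for the intercept
    p = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def r_squared(rows, y, coefs):
    """Proportion of variability in y explained by the fitted model."""
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up data: each row is (x1, x2); y is the response.
rows = [(1, 0), (2, 1), (3, 0), (4, 2), (5, 1), (6, 3)]
y = [3.5, 7.5, 7.0, 15.5, 13.5, 22.0]

coefs = fit_multiple_regression(rows, y)
print("intercept and slopes:", coefs)
print("R^2:", r_squared(rows, y, coefs))
```

Each slope is read as the average change in the response per unit change in that predictor, holding the other predictor constant.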
10
Three variables…
11
Summary
Both mileage and number of stops appear useful in predicting cost, even after controlling for the other variable.
If the number of stops is held constant, each additional mile costs about 5 cents…
The regression on mileage and number of stops allows us to explain 45.5% of the variability in airfares from LAX
12
Summary
Can restrict the population to “deal with” an extreme outlier
A predictor that is statistically significant individually may not be significant when added to a model
The overall F test just tells you that at least one of the slopes is nonzero; use t tests to examine them individually
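The overall F statistic can be written in terms of R², the sample size n, and the number of predictors k, using the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below only illustrates that arithmetic: the function name `overall_f` is hypothetical, and the sample size passed in is made up, since the slides do not report n for the airfare data.

```python
def overall_f(r2, n, k):
    """Overall F statistic for a regression with k predictors and n observations.

    Degrees of freedom are k (numerator) and n - k - 1 (denominator).
    """
    if k < 1 or n <= k + 1:
        raise ValueError("need k >= 1 and n > k + 1")
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# R^2 = 45.5% with two predictors, as on the airfare slide;
# n = 20 is a hypothetical sample size, not taken from the slides.
f_stat = overall_f(0.455, n=20, k=2)
print("F:", f_stat, "with df", 2, "and", 20 - 2 - 1)
```

A large F only says that at least one slope differs from zero; the individual t tests are still needed to see which ones.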
13
Summary
If you want to remove variables from the model, do so one at a time, as the p-values will change each time
Can use a 0-1 variable in the model.
  Interpret the slope coefficient as the average change in response between group 0 and group 1 (assuming the same relationship between response and explanatory)
  Otherwise consider interactions….
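The 0-1 variable interpretation is easiest to see in the simplest case, where the indicator is the only predictor: the least-squares slope is exactly the difference between the two group means. The sketch below uses made-up data, and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Least-squares (intercept, slope) for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: group is the 0-1 variable, y is the response.
group = [0, 0, 0, 1, 1, 1]
y = [10, 12, 14, 20, 21, 25]

intercept, slope = fit_line(group, y)

mean_0 = sum(v for g, v in zip(group, y) if g == 0) / group.count(0)
mean_1 = sum(v for g, v in zip(group, y) if g == 1) / group.count(1)

print("intercept:", intercept, "(mean of group 0:", mean_0, ")")
print("slope:", slope, "(difference in means:", mean_1 - mean_0, ")")
```

With other explanatory variables in the model, the same coefficient is the vertical shift between two parallel fits, which is the "same relationship" assumption noted on the slide.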
14
Multicollinearity
15
For Tuesday
  Have a great Thanksgiving!
  Check Final Exam Schedule on Web
  One Wed person back to Friday
  Submit PP (choice of procedure)
  HW 8
For Thursday
  Submit last PP (review questions)
  Review sheet will be posted online
  Presentations!