1
Stat 512 – Lecture 17 Inference for Regression (9.5, 9.6)
2
Last Time – Two Quantitative Variables
Question: Is there an association between the two variables?
Graphical summary: scatterplot of the response variable vs. the explanatory variable (on the horizontal axis)
Description: direction, form, strength
Numerical summary: if the form is linear, r, Pearson's correlation coefficient, with -1 ≤ r ≤ 1
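As a reminder of what r measures, here is a minimal sketch that computes Pearson's correlation coefficient from its definition. The function name `pearson_r` and the foot-length/height values are made up for illustration and are not the lecture's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
foot = [24.0, 26.0, 27.0, 29.0, 31.0]
height = [62.0, 66.0, 65.0, 70.0, 73.0]
r = pearson_r(foot, height)
print(round(r, 3))
```

The value always falls between -1 and 1, and it reaches either endpoint only when the points lie exactly on a line.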
3
Practice Problem: r = .504; r = .938
4
Temperatures vs. time: r = .029. Always plot the data!
5
Least-Squares Regression Line
Model: the least-squares regression line minimizes the sum of squared residuals
Response-hat = a + b × explanatory
a = intercept, the predicted value of the response when explanatory = 0
b = slope, the predicted change in the response associated with a 1-unit increase in the explanatory variable
Use the regression line for making predictions
Warning: the regression line is not resistant
Influential observation = an observation whose removal changes the regression equation substantially
Outlier = an observation with an extreme residual
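The intercept a and slope b described above have closed-form formulas. The sketch below is one way to compute them and the sum of squared residuals they minimize; the helper name `least_squares` and the data are hypothetical, not taken from the lecture.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line response-hat = a + b * explanatory."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

def sum_sq_resid(xs, ys, a, b):
    """Sum of squared residuals for the line a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(xs, ys)
sse = sum_sq_resid(xs, ys, a, b)
prediction = a + b * 3.5       # predicted response when explanatory = 3.5
print(a, b, sse, prediction)
```

Nudging either coefficient away from the fitted values can only increase the sum of squared residuals, which is what "least squares" means.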
6
R²
If we predict everyone to have the same height, there is a lot of "unexplained" variation (SSE = 475.75)
If we take the explanatory variable into account, there is much less "unexplained" variation (SSE = 235)
% change = (475.75 − 235) / 475.75 = 50.6%
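The percentage on this slide can be checked with a few lines of arithmetic. The two sums of squares are the values quoted on the slide; the variable names are chosen here for illustration.

```python
# Unexplained variation when every height is predicted by the mean (from the slide)
sse_mean_only = 475.75
# Unexplained variation after using the regression line (from the slide)
sse_regression = 235.0

# R^2 = proportional reduction in unexplained variation
r_squared = (sse_mean_only - sse_regression) / sse_mean_only
print(f"{r_squared:.1%}")  # 50.6%
```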
7
R²
GOOD interpretation: 50.6% of the variability in the heights is explained by this regression on foot length
BAD interpretations: "50.6% of points lie on the line"; "50.6% of predictions will be correct"
8
Example 1 (cont.): Airline costs
Each flight has a "set-up" cost of $151, and each additional mile of travel is associated with a predicted increase in cost of about 7 cents.
19.3% of the variability in airfare is explained by this regression on distance (still lots of unexplained variability)
Might investigate further why the cost for ACK was so much higher than expected
9
Inference in Regression
Is the relationship between the two variables statistically significant?
Need to understand the behavior/variability of regression lines from different random samples
10
Example 1: House Prices
Observational units, variable: houses, price (quantitative)
Positive, linear, moderately strong association: larger houses tend to cost more!
Predicted price = 65930 + 202.4 × sq ft, r² = 42.1%
Perhaps houses in Northern CA tend to be a bit more expensive even for the same size, but the difference is not huge
11
Example 1: House Prices
H₀: β = 0, there is no association between price and size
Hₐ: β > 0, there is a positive association between price and size
12
Example 1: House Prices
1) Curvature? Not really
2) Normality? No
3) Independence? Random sample
4) Equal spread? No
13
Example 1: House Prices
1) Curvature? No
2) Normality? Better
3) Independence? Random sample
4) Equal spread? Yes
14
Simulating Regression Lines
Sampling variability
A slope of .8899 would be quite surprising! .8899 is more than 7 standard errors from 0!
p-value < .001: fewer than .1% of random samples from a population with β = 0 would give a sample slope this extreme by chance alone
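The idea of simulating regression lines can be sketched as follows: draw many random samples from a population in which the slope is 0, record each sample slope, and see how far the observed slope sits from that null distribution. The population (independent standard normal x and y), the sample size of 20, and the function names are assumptions made for illustration, so the output will not reproduce the slide's "7 standard errors".

```python
import random
import statistics

def sample_slope(xs, ys):
    """Least-squares slope for paired data."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def simulate_null_slopes(n, reps, seed=0):
    """Sample slopes from a hypothetical population with no association."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        slopes.append(sample_slope(xs, ys))
    return slopes

slopes = simulate_null_slopes(n=20, reps=2000)
se = statistics.stdev(slopes)      # simulated standard error of the slope
observed = 0.8899                  # sample slope quoted on the slide
print("standard errors from 0:", observed / se)
# Share of simulated slopes at least as large as the observed one
p_sim = sum(s >= observed for s in slopes) / len(slopes)
print("simulated one-sided p-value:", p_sim)
```

The simulated slopes centre on 0, and the fraction that reach the observed slope is an empirical version of the p-value.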
15
Example 2: Airfare Costs
H₀: β = 0 (no association between price and distance)
Hₐ: β > 0 (cities farther away are associated with more expensive flights)
p-value = .002/2 = .001
Strong evidence against the null hypothesis: statistically significant evidence of a positive relationship between price and distance
BUT the residual plots don't look so great
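The halving step (.002/2) converts a two-sided p-value into a one-sided one, which is valid only when the sample slope points in the direction named in the alternative. A minimal sketch, assuming a symmetric test statistic such as t; the function name `one_sided_p` is hypothetical, and the positive slope of 0.07 is assumed from the 7-cents-per-mile estimate in the earlier airline slide.

```python
def one_sided_p(two_sided_p, estimate, alternative="greater"):
    """Convert a two-sided p-value to a one-sided one (symmetric statistic assumed)."""
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    agrees = estimate > 0 if alternative == "greater" else estimate < 0
    # Halve when the estimate agrees with the alternative, otherwise take the complement
    return two_sided_p / 2 if agrees else 1 - two_sided_p / 2

p = one_sided_p(0.002, estimate=0.07, alternative="greater")
print(p)
```

If the sample slope had been negative, the same two-sided p-value would give a one-sided p-value close to 1, so the direction check matters.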
16
Example 3: Money-Making Movies
17
For Tuesday
Skim the non-starred sections of Ch. 11
Submit PP 15 in Blackboard by 3pm
Submit Project Report 3 in class (see syllabus for details)