Stat 512 – Lecture 17 Inference for Regression (9.5, 9.6)


1 Stat 512 – Lecture 17 Inference for Regression (9.5, 9.6)

2 Last Time – Two Quantitative Variables Question: Is there an association between the two variables? Graphical summary: Scatterplot of response variable (vertical) vs. explanatory variable (horizontal). Description: Direction, form, strength. Numerical summary: If linear, r, Pearson’s correlation coefficient, -1 ≤ r ≤ 1.
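As a rough illustration of the numerical summary above, the sketch below computes Pearson's r directly from its definition. The function name `pearson_r` and the foot-length/height values are hypothetical, made up for illustration, and are not the data used in the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the lecture)
foot = [24, 25, 26, 27, 28]
height = [62, 66, 65, 70, 72]
r = pearson_r(foot, height)
print(round(r, 3))
```

The result always falls between -1 and 1, reaching either endpoint only when the points lie exactly on a line.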

3 Practice Problem r = .504, r = .938

4 Temperatures vs. time r = .029 Always plot the data!!

5 Least-Squares Regression Line Model: Least Squares Regression Line: minimize the sum of squared residuals. Response-hat = a + b × explanatory. a = intercept, predicted value when explanatory = 0. b = slope, predicted change in response associated with an increase in the explanatory variable by 1 unit. Use the regression line for making predictions. Warnings: the regression line is not resistant. Influential observation = removing the value changes the regression equation. Outlier = observation with an extreme residual value.
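A minimal sketch of the least-squares fit described on this slide, using the closed-form formulas for slope and intercept. The helper name `least_squares_line` and all data values are hypothetical. The second fit adds one far-off point to show how a single influential observation can change the equation.

```python
def least_squares_line(x, y):
    """Return (a, b) minimizing the sum of squared residuals of y - (a + b*x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data (not from the lecture)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Add one hypothetical point far from the rest and refit
a2, b2 = least_squares_line(x + [20], y + [5.0])
print(f"without extra point: slope = {b:.2f}; with it: slope = {b2:.2f}")
```

Residuals from a least-squares line with an intercept always sum to zero (up to rounding), which is a quick sanity check on any fit.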

6 R² If we predict everyone to have the same height, there is lots of “unexplained” variation (SSE = 475.75). If we take the explanatory variable into account, there is much less “unexplained” variation (SSE = 235). % change = (475.75 − 235) / 475.75 = 50.6%

7 R² Of the variability in the heights, 50.6% of that variation is explained by this regression on foot length. BAD interpretations: “50.6% of points lie on the line”; “50.6% of predictions will be correct”.
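The R² arithmetic from the height and foot-length example can be checked in a few lines. The two sums of squared errors are the values quoted on the slide; the variable names are my own.

```python
# Values quoted on the slide
sse_mean_only = 475.75    # unexplained variation when predicting the same height for everyone
sse_regression = 235.0    # unexplained variation after regressing on foot length

# Proportional reduction in unexplained variation
r_squared = (sse_mean_only - sse_regression) / sse_mean_only
print(f"{r_squared:.1%}")  # 50.6%
```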

8 Example 1 (cont.): Airline costs Each flight has a ‘set up’ cost of $151, and each additional mile of travel is associated with a predicted increase in cost of about 7 cents. 19.3% of the variability in airfare is explained by this regression on distance (still lots of unexplained variability). Might investigate further why the cost for ACK was so much higher than expected.

9 Inference in Regression Is the relationship between the two variables statistically significant? Need to understand the behavior/variability of regression lines from different random samples.

10 Example 1: House Prices Observational units, variable: houses, price (quantitative). Positive, linear, moderately strong association: larger houses tend to cost more! Predicted price = 65930 + 202.4 × sq ft, r² = 42.1%. Perhaps houses in Northern CA tend to be a bit more expensive even for the same size, but not a huge difference.
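To make the fitted equation concrete, the sketch below plugs sizes into the line reported on the slide. The coefficients come from the slide; the function name `predicted_price` and the example sizes are hypothetical.

```python
INTERCEPT = 65930.0  # from the slide
SLOPE = 202.4        # from the slide: predicted dollars per additional square foot

def predicted_price(sq_ft):
    """Predicted house price from the fitted line on the slide."""
    return INTERCEPT + SLOPE * sq_ft

# Hypothetical house sizes chosen for illustration
price_2000 = predicted_price(2000)
price_2001 = predicted_price(2001)
print(round(price_2000))               # 470730
print(round(price_2001 - price_2000, 1))  # 202.4, the slope
```

Since r² is only 42.1%, individual house prices can be expected to differ considerably from these predictions.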

11 Example 1: House Prices H₀: no association between price and size, H₀: β = 0. Hₐ: β > 0, there is a positive association between price and size.

12 Example 1: House Prices 1) Curvature? Not really 2) Normality? No 3) Independence? Random sample 4) Equal spread? No

13 Example 1: House Prices 1) Curvature? No 2) Normality? Better 3) Independence? Random sample 4) Equal spread? Yes

14 Simulating Regression Lines Sampling variability: A slope of .8899 would be quite surprising! .8899 is more than 7 standard errors from 0! p-value < .001: less than .1% of random samples from a population with β = 0 would see such an extreme sample slope by chance alone.
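The idea of simulating regression lines can be sketched as follows: draw many random samples from a population in which β = 0, fit a line to each, and see how often the sample slope is at least as large as the observed one. Only the observed slope (.8899) comes from the slide. The population (independent standard normal x and y), the sample size, and the number of samples are assumptions for illustration, so the spread of slopes here will not match the lecture's applet.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)    # fixed seed so the run is repeatable
n = 30                    # assumed sample size (hypothetical)
num_samples = 1000        # assumed number of simulated samples
observed_slope = 0.8899   # sample slope quoted on the slide

# Population with beta = 0: x and y are generated independently
null_slopes = []
for _ in range(num_samples):
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    null_slopes.append(slope(x, y))

mean_slope = sum(null_slopes) / num_samples
# Proportion of simulated slopes at least as large as the observed slope
p_value = sum(s >= observed_slope for s in null_slopes) / num_samples
print(f"mean simulated slope = {mean_slope:.3f}, approximate p-value = {p_value:.3f}")
```

The simulated slopes center near 0, and a slope as large as the observed one essentially never occurs, which is the sense in which the slide calls it "quite surprising".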

15 Example 2: Airfare Costs H₀: β = 0 (no association between price and distance). Hₐ: β > 0 (cities further away are associated with more expensive flights). p-value = .002/2 = .001. Strong evidence against the null hypothesis: statistically significant evidence of a positive relationship between price and distance, BUT the residual plots don’t look so great.
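The halving step on this slide works because software typically reports a two-sided p-value, and the sample slope falls on the side named by Hₐ. A small sketch of that conversion, assuming a symmetric test statistic; the function name `one_sided_p_greater` is hypothetical, the two-sided p-value of .002 is from the slide, and the slope of about 0.07 dollars per mile is the "7 cents" from the earlier airfare slide.

```python
def one_sided_p_greater(two_sided_p, estimate):
    """One-sided p-value for Ha: beta > 0, given a two-sided p-value
    from a symmetric test statistic and the sign of the sample slope."""
    half = two_sided_p / 2
    # If the estimate is on the side predicted by Ha, the one-sided
    # p-value is half the two-sided one; otherwise it is 1 minus that half.
    return half if estimate > 0 else 1 - half

p = one_sided_p_greater(0.002, 0.07)
print(p)  # 0.001
```

Had the sample slope been negative, the same two-sided p-value would give a one-sided p-value of 0.999 for this alternative, so halving is only appropriate when the estimate agrees in sign with Hₐ.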

16 Example 3: Money-Making Movies

17 For Tuesday Skim non-starred sections of Ch. 11. Submit PP 15 in Blackboard by 3pm. Submit Project Report 3 in class (see syllabus for details).

