Least Squares Regression: More Info (as usual) and Computer Output with Interpretations
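Where did it get its name? Calculate the error (residual) for each point and square it. The sum of all the squared errors is called the sum of squared errors (SSE). The least squares regression (LSR) line is the line that makes this sum as small as possible.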
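As a minimal sketch of the SSE calculation described above, the following computes each residual, squares it, and adds the squares. The four data points and the two candidate lines are hypothetical and do not come from the slides.

```python
# Hypothetical toy data, chosen only to illustrate the arithmetic.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]


def sse(slope, intercept, xs, ys):
    """Sum of squared errors for the line y_hat = slope * x + intercept."""
    total = 0.0
    for xi, yi in zip(xs, ys):
        residual = yi - (slope * xi + intercept)  # observed minus predicted
        total += residual ** 2
    return total


# Two candidate lines: the one with the smaller SSE fits these points better.
print(sse(2.0, 0.0, x, y))
print(sse(2.0, 0.5, x, y))
```

The least squares line is the candidate whose SSE cannot be beaten by any other slope and intercept.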
Population vs. Sample
Four Key Properties of LSR
1. The LSR line passes through the point $(\bar{x}, \bar{y})$.
2. The sum of the LSR residuals is zero.
3. The sum of the squared LSR residuals is an absolute minimum.
4. The residuals at any value of x are normally distributed (as are all the residuals of the LSR taken together); check this with a histogram of the residuals. Look at the residual plot to check for constant variance.
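The first three properties can be checked numerically. The sketch below fits a line with the usual closed-form least squares formulas on hypothetical data (not from the slides) and tests each property. The fourth property concerns the shape of the residual distribution and is judged from plots, so it is not checked here.

```python
# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least squares slope and intercept.
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar


def sse(b1, b0):
    """Sum of squared residuals for the line y_hat = b1 * x + b0."""
    return sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))


residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Property 1: the line passes through (x_bar, y_bar).
print(abs(slope * x_bar + intercept - y_bar) < 1e-9)
# Property 2: the residuals sum to zero (up to floating-point rounding).
print(abs(sum(residuals)) < 1e-9)
# Property 3: nudging the slope or intercept never lowers the SSE.
print(sse(slope, intercept) <= sse(slope + 0.1, intercept))
print(sse(slope, intercept) <= sse(slope, intercept - 0.1))
```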
Has the number of flights increased over the past years? An Internet search for information on air travel in the United States turned up data on the number of commercial aircraft flying in the United States during the years 1990-1998. The dates were recorded as years since 1990, so the year 1990 was recorded as year 0.
Has the number of flights increased?
Flights: r = 0.99885, $r^2$ = 0.99771
Flights: Computer Output

Predictor   Coef      Stdev   t-ratio   p
Constant    2939.93   20.55   143.09    0.000
Years       233.517   4.316    54.11    0.000

s = 33.43

Write the LSR equation from the computer output: Flights = 233.517(Years) + 2939.93
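To show how the output is read, the sketch below takes the Constant and Years coefficients from the table as intercept and slope, predicts for a given year, and recomputes each t-ratio as Coef / Stdev. The function name is a hypothetical helper. The recomputed t-ratios differ slightly from the printed ones because the table values are rounded.

```python
INTERCEPT = 2939.93  # "Constant" row, Coef column
SLOPE = 233.517      # "Years" row, Coef column


def predict_flights(years_since_1990):
    """Predicted Flights value from the LSR equation on the slide."""
    return SLOPE * years_since_1990 + INTERCEPT


print(predict_flights(0))  # 1990 is year 0, so this is just the intercept
print(predict_flights(8))  # 1998, the last year in the data

# Each t-ratio in the table is the coefficient divided by its standard deviation.
t_constant = 2939.93 / 20.55
t_years = 233.517 / 4.316
print(round(t_constant, 1), round(t_years, 1))
```

Predictions for years well beyond 1998 would be extrapolation and should be treated with caution.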
How tall is that building if you know how many stories are in it? 46 “tall” buildings were selected from all over the US. Height = 11.36·stories + 85
How tall is it? Is this a decent model? Why or why not? If a building had 10 stories, what would its height be? 198.6 ft. If a building stood 600 ft tall, how many stories would it have? About 45.
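The two answers on this slide can be reproduced with a short sketch. It assumes the intercept is +85, since that is the sign that yields the slide's answers of 198.6 ft and 45 stories. The function names are hypothetical.

```python
SLOPE = 11.36     # feet per story, from the slide
INTERCEPT = 85.0  # assumed +85, the sign consistent with the slide's answers


def height_from_stories(stories):
    """Predicted height in feet for a building with the given number of stories."""
    return SLOPE * stories + INTERCEPT


def stories_from_height(height_ft):
    """Solve the same equation for stories, given a height in feet."""
    return (height_ft - INTERCEPT) / SLOPE


print(round(height_from_stories(10), 1))  # 10-story building
print(round(stories_from_height(600)))    # 600 ft building
```

Solving the height equation backwards is what the slide does. Strictly, a regression fitted to predict stories from height would give a somewhat different line.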
Is there a relationship between body weight and height? The sample contains the body weight and corresponding height of 92 students (n = 92, matching the total DF of 91 in the computer output).
Height to Weight?
Height to Weight? Computer output (the slide's callouts mark the LSR equation, the y-intercept, the slope, and the slope's p-value)

The regression equation is
WEIGHT = - 205 + 5.09 HEIGHT

Predictor   Coef      Stdev    t-ratio   p
Constant    -204.74   29.16    -7.02     0.000    (y-intercept)
HEIGHT      5.0918    0.4237   12.02     0.000    (slope and its p-value)

s = 14.79   R-sq = 61.6%   R-sq(adj) = 61.2%

Analysis of Variance
SOURCE       DF   SS      MS      F        p
Regression    1   31592   31592   144.38   0.000
Error        90   19692     219
Total        91   51284

Unusual Observations
Obs.   HEIGHT   WEIGHT   Fit      Stdev.Fit   Residual
 9     72.0     195.00   161.87   2.08         33.13
25     61.0     140.00   105.86   3.62         34.14
40     72.0     215.00   161.87   2.08         53.13
84     68.0     110.00   141.50   1.57        -31.50
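The summary numbers in this output are tied together by simple arithmetic. The sketch below recomputes R-sq, s, and F from the Analysis of Variance rows, and the fit and residual for observation 40 from the coefficients. The variable and function names are hypothetical, and small differences from the printed values come from rounding in the output.

```python
import math

# Values copied from the Analysis of Variance table.
ss_regression, ss_error, ss_total = 31592.0, 19692.0, 51284.0
df_regression, df_error = 1, 90

r_sq = ss_regression / ss_total                       # R-sq = SS(Regression) / SS(Total)
ms_error = ss_error / df_error                        # MS(Error) = SS(Error) / DF(Error)
s = math.sqrt(ms_error)                               # s is the square root of MS(Error)
f_stat = (ss_regression / df_regression) / ms_error   # F = MS(Regression) / MS(Error)


def fitted_weight(height):
    """Fitted WEIGHT from the Coef column of the output."""
    return -204.74 + 5.0918 * height


# Observation 40: HEIGHT 72.0, WEIGHT 215.00.
fit_obs40 = fitted_weight(72.0)
residual_obs40 = 215.00 - fit_obs40  # residual = observed minus fitted

print(round(r_sq, 3), round(s, 2), round(f_stat, 1))
print(round(fit_obs40, 2), round(residual_obs40, 2))
```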
Height to Weight? Check the residual plot of residuals vs. height for constant variance. Check for a bell-shaped (normal) histogram of residuals.
Height to Weight?
Four Key Properties of LSR
1. The LSR line passes through the point $(\bar{x}, \bar{y})$: HEIGHT mean = 68.717, WEIGHT mean = 145.15.
2. The sum of the LSR residuals is zero: sum of residuals = -0.00029755 (zero up to rounding).
3. The sum of the squared LSR residuals is an absolute minimum: sum of squared residuals = 19692.
4. The residuals at any value of x are normally distributed (as are all the residuals of the LSR taken together); check this with a histogram of the residuals. Look at the residual plot for constant variance.
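The first property can be confirmed directly from the numbers on this slide: plugging the mean height into the fitted equation should return the mean weight. A minimal sketch, using the unrounded coefficients from the computer output:

```python
HEIGHT_MEAN = 68.717
WEIGHT_MEAN = 145.15

# Coefficients from the Coef column of the computer output.
INTERCEPT = -204.74
SLOPE = 5.0918

predicted_at_mean = INTERCEPT + SLOPE * HEIGHT_MEAN
print(round(predicted_at_mean, 2))

# The small gap from WEIGHT_MEAN comes from rounding in the printed values.
print(abs(predicted_at_mean - WEIGHT_MEAN) < 0.01)
```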
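Height to Weight? Hypothesis Testing of Slope, $\beta_1$: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$

Predictor   Coef      Stdev    t-ratio   p
Constant    -204.74   29.16    -7.02     0.000
HEIGHT      5.0918    0.4237   12.02     0.000

Since the p-value for the slope (reported as 0.000) is less than .05, we reject the null hypothesis and conclude that the slope does not equal zero, so there is evidence of a linear relationship between height and weight.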
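The same decision can be reached from the t-ratio rather than the p-value. The sketch below recomputes the t-ratio for the slope and compares it with a two-sided 5% critical value. The critical value 1.987 for 90 degrees of freedom is an assumption taken from a standard t-table, not from the slides.

```python
coef = 5.0918   # HEIGHT row, Coef column
stdev = 0.4237  # HEIGHT row, Stdev column

# Test statistic for H0: slope = 0 is (coef - 0) / stdev.
t_ratio = coef / stdev

# Assumed two-sided 5% critical value for df = n - 2 = 90 (from a t-table).
T_CRITICAL = 1.987

reject_null = abs(t_ratio) > T_CRITICAL
print(round(t_ratio, 2), reject_null)
```

A t-ratio this far beyond the critical value is why the output reports the p-value as 0.000.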
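Limitations to Predictions
Interpolation: predicting within the range of the observed x values.
Extrapolation: predicting outside that range, where the fitted line may no longer hold.
Confidence intervals estimate the mean value of the response variable for a particular value of x.
Prediction intervals (like confidence intervals) are centered on the fitted value, but they describe the variation among individuals with a particular value of x, so they are wider.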
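For reference, the standard textbook forms of the two intervals in simple linear regression are given below. The notation is an assumption, since the slides do not give these formulas: $x^*$ is the x value of interest, $\hat{y}$ is the fitted value at $x^*$, $s$ is the residual standard deviation from the output, and $t^*$ is the critical value with $n - 2$ degrees of freedom.

```latex
% Confidence interval for the mean response at x = x^*
\hat{y} \pm t^* \, s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

% Prediction interval for an individual response at x = x^*
\hat{y} \pm t^* \, s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The extra 1 under the square root accounts for individual variation around the mean, which is why a prediction interval is always wider than the confidence interval at the same $x^*$. Both intervals widen as $x^*$ moves away from $\bar{x}$.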
Confidence Interval for Slope. Formula: $b_1 \pm t^* \cdot SE(b_1)$. We are 95% confident that the true slope lies in the interval (-----, -----).
Confidence Interval for Slope: 0.0173098 ± 0.00388426 = (0.01342554, 0.02119406). Since the CI for the slope does not contain zero, we can conclude that the slope is not equal to zero.
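The interval arithmetic on this slide can be sketched as follows. The slide does not say which data set these numbers come from, and it gives only the combined margin $t^* \cdot SE(b_1)$, so the code treats that margin as a single given value.

```python
b1 = 0.0173098       # estimated slope, from the slide
margin = 0.00388426  # t* times SE(b1), given on the slide as one number

lower = b1 - margin
upper = b1 + margin
print(round(lower, 8), round(upper, 8))

# If zero lies outside the interval, a slope of zero is not plausible at this confidence level.
contains_zero = lower <= 0.0 <= upper
print(contains_zero)
```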