Simple Linear Regression

Often we want to understand the relationships among variables, e.g.:
- SAT scores and college GPA
- car weight and gas mileage
- amount of a certain pollutant in wastewater and bacteria growth in local streams
- number of takeoffs and landings and degree of metal fatigue in aircraft structures

The simplest relationship is a straight line, Y = α + βx, where α is the intercept and β is the slope.
Example

The owner of a small harness race track in Florida is interested in understanding the relationship between attendance at the track and the total amount bet each night. The data for a two-week period (10 racing nights) are as follows:

Attendance, x    Amount Bet ($000), Y
117              2.07
128              2.80
122              3.14
119              2.26
131              3.40
135              3.89
125              2.93
120              2.66
130              3.33
127              3.54
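Estimating the Regression Coefficients

Method of Least Squares: determine a and b (the estimates of α and β) so that the sum of the squares of the residuals, $\sum_{i=1}^{n} (y_i - a - b x_i)^2$, is minimized.

Steps:
1. Calculate b using
$$b = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
2. Calculate a using
$$a = \bar{y} - b\,\bar{x}$$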
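For reference, the two formulas come from setting the partial derivatives of the residual sum of squares S(a, b) to zero (the normal equations) and solving for a and b. A short sketch of that derivation:

```latex
S(a,b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \;\Longrightarrow\; \sum y_i = n a + b \sum x_i
  \;\Longrightarrow\; a = \bar{y} - b\,\bar{x}

\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
  \;\Longrightarrow\; \sum x_i y_i = a \sum x_i + b \sum x_i^2

% Substitute a = \bar{y} - b\bar{x} into the second equation and multiply by n:
n\sum x_i y_i - \sum x_i \sum y_i = b\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)
```

Dividing the last line by the bracketed term gives the formula for b; the first normal equation then gives a.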
For Our Example

Night  Attendance, x  Amount Bet, Y  xᵢyᵢ      xᵢ²
1      117            2.07           242.19    13689
2      128            2.80           358.40    16384
3      122            3.14           383.08    14884
4      119            2.26           268.94    14161
5      131            3.40           445.40    17161
6      135            3.89           525.15    18225
7      125            2.93           366.25    15625
8      120            2.66           319.20    14400
9      130            3.33           432.90    16900
10     127            3.54           449.58    16129
TOTAL  1254           30.02          3791.09   157558

(Amount bet Y is in $000.)

b = (10 × 3791.09 − 1254 × 30.02) / (10 × 157558 − 1254²) = 265.82 / 3064 ≈ 0.086756
a = 30.02/10 − b × (1254/10) = 3.002 − (265.82/3064) × 125.4 ≈ −7.877187
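The arithmetic above can be checked with a short Python sketch. It recomputes the column totals and applies the same formulas for b and a; the function name `least_squares` is our own, not from any library.

```python
# Data from the example: attendance (x) and amount bet in $000 (y), nights 1-10
x = [117, 128, 122, 119, 131, 135, 125, 120, 130, 127]
y = [2.07, 2.80, 3.14, 2.26, 3.40, 3.89, 2.93, 2.66, 3.33, 3.54]


def least_squares(x, y):
    """Return (a, b) for the line y = a + b*x, using the column-total formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - Sum(x)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = y_bar - b * x_bar (uses the unrounded slope)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares(x, y)
print(f"b = {b:.6f}")  # b = 0.086756
print(f"a = {a:.6f}")  # a = -7.877187
```

Using the unrounded b when computing a avoids the small discrepancy that appears if b is first rounded to 0.0868.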
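What does this mean?

We can draw the regression line that describes the relationship between attendance and amount bet:

ŷ = −7.8772 + 0.08676x

Each additional person in attendance is associated with an estimated increase of about 0.0868 thousand dollars (roughly $87) in the amount bet. We can also predict the amount bet from attendance, keeping in mind that the line was fitted only on attendances between 117 and 135.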
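A minimal prediction sketch, assuming the coefficients from the worked example rounded to six decimals. The helper name `predict_amount_bet` and the choice to warn (rather than refuse) outside the observed attendance range are our own illustrative choices.

```python
import warnings

# Fitted coefficients from the worked example, rounded to six decimals
A = -7.877187  # intercept a
B = 0.086756   # slope b ($000 per additional attendee)

# Attendance range observed in the 10 nights of data
X_MIN, X_MAX = 117, 135


def predict_amount_bet(attendance):
    """Predicted amount bet ($000) at the given attendance: y_hat = a + b * x."""
    if not X_MIN <= attendance <= X_MAX:
        # The line was fitted only on 117-135 attendees; outside that range it is extrapolation
        warnings.warn(f"attendance {attendance} is outside the observed range {X_MIN}-{X_MAX}")
    return A + B * attendance


print(round(predict_amount_bet(126), 4))  # 3.0541
```

So a night with 126 attendees is predicted to bring in about $3,054 in bets.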
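How good is our prediction?

Estimating the variance: the error variance is estimated from the sum of squared residuals (SSE), divided by n − 2 = 8 degrees of freedom because two coefficients (a and b) were estimated:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 0.639015, \qquad s^2 = \frac{SSE}{n-2} = \frac{0.639015}{8} = 0.079877$$

Coefficient of determination, R²: a measure of the "quality of fit," i.e., the proportion of the variability in Y explained by the fitted model.

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 2.94516, \qquad R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{0.639015}{2.94516} \approx 0.783029$$

So about 78% of the variability in the nightly amount bet is explained by attendance (see the calculations on the next page).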
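The quantities above can be reproduced with a short sketch in plain Python. It assumes the example data and computes the slope in the equivalent deviation form, b = Sxy / Sxx, which gives the same value as the column-total formula.

```python
# Data from the example: attendance (x) and amount bet in $000 (y)
x = [117, 128, 122, 119, 131, 135, 125, 120, 130, 127]
y = [2.07, 2.80, 3.14, 2.26, 3.40, 3.89, 2.93, 2.66, 3.33, 3.54]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope and intercept in deviation form (equivalent to the column-total formulas)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
s2 = sse / (n - 2)                                     # estimate of the error variance
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation about the mean
r2 = 1 - sse / sst                                     # coefficient of determination

print(f"SSE = {sse:.6f}, s^2 = {s2:.6f}, SST = {sst:.5f}, R^2 = {r2:.4f}")
# SSE = 0.639015, s^2 = 0.079877, SST = 2.94516, R^2 = 0.7830
```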
Calculations …

Rows are sorted by attendance, so the row numbers do not match the night numbers in the earlier table. ŷ is the fitted value a + bx.

Row    x     Y      xᵢyᵢ      xᵢ²      ŷ         (Y − ŷ)²    (Y − Ȳ)²
1      117   2.07   242.19    13689    2.27325   0.0413108   0.86862
2      119   2.26   268.94    14161    2.44676   0.0348802   0.55056
3      120   2.66   319.20    14400    2.53352   0.0159976   0.11696
4      122   3.14   383.08    14884    2.70703   0.187463    0.01904
5      125   2.93   366.25    15625    2.9673    0.0013911   0.00518
6      127   3.54   449.58    16129    3.14081   0.1593531   0.28944
7      128   2.80   358.40    16384    3.22757   0.1828121   0.0408
8      130   3.33   432.90    16900    3.40108   0.0050519   0.10758
9      131   3.40   445.40    17161    3.48783   0.0077146   0.1584
10     135   3.89   525.15    18225    3.83486   0.0030408   0.78854
TOTAL  1254  30.02  3791.09   157558             0.6390153   2.94516
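The per-row columns of this table (fitted value, squared residual, squared deviation from the mean) can be generated with a sketch like the following. It assumes the example data and sorts rows by attendance to match the table layout; the column labels in the printout are our own ASCII versions.

```python
# Data from the example: attendance (x) and amount bet in $000 (y)
x = [117, 128, 122, 119, 131, 135, 125, 120, 130, 127]
y = [2.07, 2.80, 3.14, 2.26, 3.40, 3.89, 2.93, 2.66, 3.33, 3.54]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / (
    n * sum(xi ** 2 for xi in x) - sum(x) ** 2
)
a = y_bar - b * x_bar

# One row per night: (x, Y, y_hat, squared residual, squared deviation from the mean)
rows = []
for xi, yi in sorted(zip(x, y)):  # sorted by attendance, as in the table
    y_hat = a + b * xi
    rows.append((xi, yi, y_hat, (yi - y_hat) ** 2, (yi - y_bar) ** 2))

print(f"{'x':>4} {'Y':>5} {'yhat':>8} {'resid^2':>10} {'(Y-Ybar)^2':>11}")
for xi, yi, yh, res2, dev2 in rows:
    print(f"{xi:>4} {yi:>5.2f} {yh:>8.5f} {res2:>10.7f} {dev2:>11.5f}")

# Column totals: SSE and SST
sse = sum(r[3] for r in rows)
sst = sum(r[4] for r in rows)
print(f"{'TOTAL':<19} {sse:>10.7f} {sst:>11.5f}")
```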
Or … Using Excel

The same regression can also be run in Excel. Note the confidence intervals in the output; we can also draw a confidence interval around our predictions.
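As a sketch of the interval around a prediction, the standard textbook formulas for simple linear regression give a 95% confidence interval for the mean amount bet at attendance x₀, ŷ₀ ± t·s·√(1/n + (x₀ − x̄)²/Sxx), and a wider prediction interval for a single new night, with 1 added under the square root. This is not a reproduction of Excel's output. The critical value t = 2.306 (Student's t, 8 degrees of freedom, 95% two-sided) is hardcoded from a t table. The choice x₀ = 126 and the helper name `intervals` are our own, for illustration only.

```python
import math

# Data from the example: attendance (x) and amount bet in $000 (y)
x = [117, 128, 122, 119, 131, 135, 125, 120, 130, 127]
y = [2.07, 2.80, 3.14, 2.26, 3.40, 3.89, 2.93, 2.66, 3.33, 3.54]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))  # residual standard error

# Two-sided 95% critical value of Student's t with n - 2 = 8 degrees of freedom (t table)
T_CRIT = 2.306


def intervals(x0):
    """Return (y_hat, ci, pi) at attendance x0: 95% CI for the mean response and 95% prediction interval."""
    y_hat = a + b * x0
    se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    ci = (y_hat - T_CRIT * se_mean, y_hat + T_CRIT * se_mean)
    pi = (y_hat - T_CRIT * se_pred, y_hat + T_CRIT * se_pred)
    return y_hat, ci, pi


y_hat, ci, pi = intervals(126)
print(f"y_hat = {y_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), 95% PI = ({pi[0]:.3f}, {pi[1]:.3f})")
```

Both intervals are narrowest at x₀ = x̄ = 125.4 and widen as attendance moves away from the mean. Evaluating `intervals` at several attendances and connecting the endpoints traces the bands around the regression line.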