Chapter 14, part C: Goodness of Fit

1 Chapter 14, part C Goodness of Fit.

2 III. Coefficient of Determination
We developed an estimated regression equation, but we don’t yet know how well it fits the data. The coefficient of determination gives us a measure of the goodness of fit for an estimated regression equation. The difference between the actual value, yi, and its estimate, ŷi, is called a residual.

3 A. Total Sum of Squares (SST)
If you had to estimate repair cost but had no knowledge of the car’s age, what would be your best guess? Probably the mean repair cost. If we subtract the mean from each yi, we get the error involved in using the mean to estimate cost. I hope our regression equation does a better job of estimating repair cost than just using the mean!

4 The calculation of SST
For the 4th observation, this difference is 300 - 276 = 24. Do this for each observation, square each difference, and sum the squares to get SST = 62,870.
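
The SST recipe on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own calculation: the function name `total_sum_of_squares` and the four cost values are hypothetical, since the repair cost data set itself is not reproduced in this transcript.

```python
def total_sum_of_squares(y):
    """Sum of squared deviations of each observation from the sample mean."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


# Hypothetical repair costs in dollars (NOT the data set from the slides).
costs = [100, 200, 300, 400]

# Mean is 250, deviations are -150, -50, 50, 150.
sst = total_sum_of_squares(costs)
print(sst)  # 50000.0
```

Applied to the actual repair cost data, the same steps would give the SST = 62,870 quoted above.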

5 B. Sum of Squares due to Error (SSE)
Every observation i has a residual, yi - ŷi. The process of Least Squares minimizes the sum of the squared residuals. Some observations will be overestimated, some underestimated. A predicted yi that is $20 too high is just as large a “miss” as one that is $20 too low. So squaring each residual gives equal weight to positive and negative residuals of equal magnitude.
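
To make the "least squares minimizes SSE" idea concrete, here is a small sketch using the standard closed-form slope and intercept for simple linear regression. The helper names and the four data points are assumptions chosen for easy arithmetic; they are not the repair cost data from the slides.

```python
def fit_least_squares(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_errors(x, y, b0, b1):
    """SSE: sum of squared residuals (actual minus predicted)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: car age in years, repair cost in hundreds of dollars.
ages = [1, 2, 3, 4]
costs = [2, 4, 5, 8]

b0, b1 = fit_least_squares(ages, costs)
sse = sum_squared_errors(ages, costs, b0, b1)

# Any other line, for example y_hat = 2 * x, gives a larger SSE.
sse_other = sum_squared_errors(ages, costs, 0.0, 2.0)
print(round(b0, 4), round(b1, 4), round(sse, 4), round(sse_other, 4))
```

For these made-up points the fitted slope is about 1.9 and SSE is about 0.70, while the alternative line has SSE of 1.0. Squaring is what lets a residual of +0.4 and one of -0.7 both count as misses.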

6 You can also see the variation around the mean, 276.
Take the 4th observation. The estimated repair cost for a 4-year-old car is $351.50, but the actual value is y4 = $300. So the residual is 300 - 351.50 = -51.50. Square the residual for every observation, sum the squares, and you get SSE = 5,867.5. For comparison, this observation's deviation from the mean is 300 - 276 = 24.

7 C. Sum of Squares due to Regression (SSR)
So SSE measures how closely the observations are clustered around the regression line, and SST measures how closely they are clustered around the mean. What’s left over is called SSR: SST = SSE + SSR, where SSR = Σ(ŷi - ȳ)² = 57,002.5. Since our regression model is designed to minimize SSE, I would hope that SSR makes up the bulk of the total variation in y.
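
The identity SST = SSE + SSR can be checked numerically. The sketch below uses hypothetical values, not the repair cost data: `y` holds four made-up observations and `y_hat` holds the fitted values that a least squares line (ŷ = 1.9x for x = 1, 2, 3, 4) would produce for them. The identity relies on the fit being a least squares line with an intercept, so arbitrary predictions would not generally satisfy it.

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSE, SSR) for observed values y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)               # around the mean
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # around the line
    ssr = sum((fi - mean_y) ** 2 for fi in y_hat)           # line vs. mean
    return sst, sse, ssr


# Hypothetical observations and their least squares fitted values.
y = [2, 4, 5, 8]
y_hat = [1.9, 3.8, 5.7, 7.6]

sst, sse, ssr = sums_of_squares(y, y_hat)
print(round(sst, 4), round(sse, 4), round(ssr, 4))

# The decomposition holds up to floating-point rounding.
assert abs(sst - (sse + ssr)) < 1e-9
```

Here SST is 18.75, of which roughly 18.05 is SSR and 0.70 is SSE, so the regression accounts for most of the variation.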

8 D. Coefficient of Determination (R2)
All of the variation in y is represented by SST. Since least squares is designed to minimize SSE, a very good model is one that explains most of the variation in y and thus has a very small SSE. Equivalently, you could think of a good model as having a large SSR relative to SSE. If so, SSR/SST is very close to 1.

9 This ratio of SSR to SST is called R2, the coefficient of determination.
A terrible model has a very large SSE and a very small SSR, so R2 is very close to zero. An excellent model has an R2 very close to 1.

10 Interpretation of R2
In the repair cost example, R2 = 0.9067. This means that 90.67% of the total sum of squares can be explained by using the estimated regression equation between age and repair cost.
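
As a quick arithmetic check, the 90.67% figure can be reproduced from the two sums quoted in the earlier slides (SST = 62,870 and SSR = 57,002.5). The variable names are illustrative only.

```python
# Sums of squares quoted in the slides for the repair cost example.
sst = 62870.0
ssr = 57002.5

# SSE is whatever the regression leaves unexplained.
sse = sst - ssr

# Two equivalent ways to write the coefficient of determination.
r2 = ssr / sst
r2_alt = 1 - sse / sst

print(f"R^2 = {r2:.4f}")  # R^2 = 0.9067
```

Both forms give the same value, which is why a small SSE and a large SSR are two descriptions of the same good fit.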

11 Excel Output
I’ve highlighted the relevant information in the table of regression output. Can you pick out the important information that we have been discussing?

