
nvmrao Three useful results: GOODNESS OF FIT

1 This sequence explains measures of goodness of fit in regression analysis. It is convenient to start by demonstrating three useful results. The first is that the mean value of the residuals must be zero.

2 The residual in any observation is the difference between the actual and fitted values of Y for that observation.
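For reference, the derivations in this sequence use the standard notation for a fitted simple regression line, with b₂ and b₁ given by the usual least squares formulas. This notation is assumed here; slides 6 and 21 substitute these definitions.

```latex
\hat{Y}_i = b_1 + b_2 X_i, \qquad e_i = Y_i - \hat{Y}_i, \qquad
b_2 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}, \qquad
b_1 = \bar{Y} - b_2 \bar{X}
```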

3 First substitute for the fitted value, Ŷ = b₁ + b₂X.

4 Now sum over all the observations.

5 Dividing through by n, we obtain the sample mean of the residuals in terms of the sample means of X and Y and the regression coefficients.

6 If we substitute b₁ = Ȳ - b₂X̄, the expression collapses to zero.
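Written out, slides 2 to 6 amount to the following derivation. This is a reconstruction, assuming the fitted line Ŷᵢ = b₁ + b₂Xᵢ and the least squares intercept b₁ = Ȳ - b₂X̄:

```latex
\begin{aligned}
e_i &= Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_i \\
\sum_{i=1}^{n} e_i &= \sum_{i=1}^{n} Y_i - n b_1 - b_2 \sum_{i=1}^{n} X_i \\
\bar{e} &= \bar{Y} - b_1 - b_2 \bar{X}
         = \bar{Y} - (\bar{Y} - b_2 \bar{X}) - b_2 \bar{X} = 0
\end{aligned}
```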

7 Next we will demonstrate that the mean of the fitted values of Y is equal to the mean of the actual values of Y.

8 Again, we start with the definition of a residual.

9 Sum over all the observations.

10 Divide through by n. The terms in the equation are, respectively, the mean of the residuals, the mean of the actual values of Y, and the mean of the fitted values of Y.
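Written out, slides 8 to 10 give the following. This is a reconstruction; the bar over Ŷ denotes the mean of the fitted values:

```latex
\begin{aligned}
e_i &= Y_i - \hat{Y}_i \\
\sum_{i=1}^{n} e_i &= \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} \hat{Y}_i \\
\bar{e} &= \bar{Y} - \overline{\hat{Y}}
\end{aligned}
```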

11 We have just shown that the mean of the residuals is zero. Hence the mean of the fitted values is equal to the mean of the actual values.
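The first two results can be checked numerically. The following is a minimal Python sketch using only the standard library. The data set is hypothetical, and the helper names `mean` and `cov` are illustrative; the sample covariance uses divisor n, which is an assumption (both results hold for either divisor). The sketch fits the line with the least squares formulas for b₁ and b₂ and prints the mean residual and the two means.

```python
# Hypothetical data set, invented purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance with divisor n (assumed definition)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


# Least squares coefficients: b2 = Cov(X, Y) / Var(X), b1 = mean(Y) - b2 * mean(X).
b2 = cov(X, Y) / cov(X, X)
b1 = mean(Y) - b2 * mean(X)

fitted = [b1 + b2 * x for x in X]
residuals = [y - y_hat for y, y_hat in zip(Y, fitted)]

# Result 1: the mean residual is zero, up to floating-point rounding.
print(f"mean of residuals: {mean(residuals):.1e}")
# Result 2: the fitted values have the same mean as the actual values.
print(f"mean of Y: {mean(Y):.4f}, mean of fitted values: {mean(fitted):.4f}")
```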

12 Finally, we will demonstrate that the covariance between the fitted values of Y and the residuals is zero.

13 We start by replacing the fitted value of Y with its definition.

14 Using Covariance Rule 1, the covariance can be decomposed into two terms.

15 Using Covariance Rule 3, Cov(b₁, e) is zero because b₁ is a constant.

16 Using Covariance Rule 2, b₂ can be taken out of the second term as a factor.
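Slides 13 to 16 can be summarized in one line. This is a reconstruction with Ŷ = b₁ + b₂X, using the covariance rules as the slides apply them (Rule 1: the covariance with a sum splits into a sum of covariances; Rule 2: a constant factor can be taken outside; Rule 3: the covariance with a constant is zero):

```latex
\operatorname{Cov}(\hat{Y}, e)
  = \operatorname{Cov}(b_1 + b_2 X,\, e)
  = \operatorname{Cov}(b_1, e) + \operatorname{Cov}(b_2 X,\, e)
  = 0 + b_2 \operatorname{Cov}(X, e)
```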

17 Substitute for e using the definition of the residual, e = Y - b₁ - b₂X.

18 Using Covariance Rule 1, we can decompose the expression into three terms.

19 Using Covariance Rule 3 again, Cov(X, b₁) is zero because b₁ is a constant.

20 Using Covariance Rule 2, b₂ in the third term can be taken out as a factor, leaving b₂Cov(X, X), which is just b₂Var(X).

21 Next, replace the remaining b₂ with its definition in terms of Cov(X, Y) and Var(X), that is, b₂ = Cov(X, Y)/Var(X).

22 The terms cancel and the expression collapses to zero, so Cov(X, e) = 0 and therefore b₂Cov(X, e) = 0 as well. Hence we have demonstrated that the covariance between the fitted values of Y and the residuals must be zero.
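Slides 17 to 22, reconstructed with e = Y - b₁ - b₂X and b₂ = Cov(X, Y)/Var(X), show that Cov(X, e) vanishes, and with it Cov(Ŷ, e) = b₂Cov(X, e). The step Cov(X, b₂X) = b₂Var(X) uses Rule 2 together with Cov(X, X) = Var(X):

```latex
\begin{aligned}
\operatorname{Cov}(X, e) &= \operatorname{Cov}(X,\, Y - b_1 - b_2 X) \\
  &= \operatorname{Cov}(X, Y) - \operatorname{Cov}(X, b_1) - \operatorname{Cov}(X, b_2 X) \\
  &= \operatorname{Cov}(X, Y) - 0 - b_2 \operatorname{Var}(X) \\
  &= \operatorname{Cov}(X, Y) - \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \operatorname{Var}(X) = 0
\end{aligned}
```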

nvmrao GOODNESS OF FIT

23 We start the discussion of goodness of fit by noting that the actual values of Y can be decomposed into the fitted values and the residuals: Y = Ŷ + e.

24 The variance of Y can therefore be written as Var(Y) = Var(Ŷ + e).

25 The variance has been decomposed using Variance Rule 1: Var(Ŷ + e) = Var(Ŷ) + Var(e) + 2Cov(Ŷ, e).

26 We have just shown that the covariance term is zero. Hence the variance of Y can be neatly decomposed into the variance of the fitted values and the variance of the residuals.
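In symbols, slides 23 to 26 give the following (a reconstruction using Y = Ŷ + e and Cov(Ŷ, e) = 0):

```latex
\operatorname{Var}(Y) = \operatorname{Var}(\hat{Y} + e)
  = \operatorname{Var}(\hat{Y}) + \operatorname{Var}(e) + 2\operatorname{Cov}(\hat{Y}, e)
  = \operatorname{Var}(\hat{Y}) + \operatorname{Var}(e)
```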

27 The decomposition has been rewritten using the definitions of the sample variances.

28 The equation has been rewritten again, multiplying through by n and making use of two of the preliminary results (the mean of the fitted values of Y is equal to the mean of the actual values, and the mean of the residuals is zero).
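Slides 27 and 28 correspond to the following, assuming the sample variance is defined with divisor n (which is what multiplying through by n implies). The second line uses the facts that the mean of the fitted values equals Ȳ and that the mean residual is zero:

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2
  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{Y}_i - \overline{\hat{Y}}\bigr)^2
   + \frac{1}{n}\sum_{i=1}^{n}(e_i - \bar{e})^2 \\
\sum_{i=1}^{n}(Y_i - \bar{Y})^2
  &= \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2
\end{aligned}
```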

29 The left side of the equation is the sum of the squared deviations of Y about its sample mean. This is described as the Total Sum of Squares (TSS).

30 The first term on the right side of the equation is the sum of the squared deviations of the fitted values of Y about their sample mean, which is the same as the sample mean of Y. This is described as the Explained Sum of Squares (ESS).

31 “Explained” really ought to be in inverted commas, because the explanation is conditional on the model being correctly specified, and often it is not.

32 The second term on the right side of the equation is the sum of the squares of the residuals. This is described as the Residual Sum of Squares (RSS).
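A similar sketch, again on a hypothetical data set, computes the three sums of squares and confirms that TSS = ESS + RSS, together with the zero covariance between the fitted values and the residuals on which the decomposition depends. The helper names are illustrative and the divisor-n covariance is an assumption.

```python
# Hypothetical data set, invented purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance with divisor n (assumed definition)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


# OLS fit via the least squares formulas.
b2 = cov(X, Y) / cov(X, X)
b1 = mean(Y) - b2 * mean(X)
fitted = [b1 + b2 * x for x in X]
residuals = [y - y_hat for y, y_hat in zip(Y, fitted)]

y_bar = mean(Y)
TSS = sum((y - y_bar) ** 2 for y in Y)                # Total Sum of Squares
ESS = sum((y_hat - y_bar) ** 2 for y_hat in fitted)   # Explained Sum of Squares
RSS = sum(e ** 2 for e in residuals)                  # Residual Sum of Squares

print(f"TSS = {TSS:.6f}, ESS + RSS = {ESS + RSS:.6f}")
# The decomposition relies on the fitted values and residuals being uncorrelated.
print(f"Cov(fitted, residuals) = {cov(fitted, residuals):.1e}")
```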

33 The main criterion of goodness of fit, formally described as the coefficient of determination but usually referred to as R², is defined as the ratio of ESS to TSS, that is, the proportion of the variance of Y explained by the regression equation.
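Written as a formula, with the sums taken over the n observations (the 1/n factors cancel, so this is also Var(Ŷ)/Var(Y)):

```latex
R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}}
    = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}
```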

34 Obviously we would like to locate the regression line so as to make the goodness of fit as high as possible, according to this criterion. Does this objective clash with our use of the least squares principle to determine b₁ and b₂?

35 Fortunately, there is no clash. To demonstrate that the two objectives are equivalent, we use the decomposition of the variance to obtain an alternative expression for R².
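Since TSS = ESS + RSS, the alternative expression is:

```latex
R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}}
    = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}}
    = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
    = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}
```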

36 The OLS regression coefficients are chosen so as to minimize the sum of the squares of the residuals. Since the total sum of squares depends only on the actual values of Y, and not on b₁ and b₂, minimizing RSS automatically maximizes R² = 1 - RSS/TSS.
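The argument can be illustrated by brute force. The sketch below, on hypothetical data, evaluates R² in the form 1 - RSS/TSS for the OLS line and for a grid of perturbed coefficients; the grid of shifts is an arbitrary choice. Because TSS is fixed by the data, any line with a larger RSS has a smaller R² by this measure. For a line other than the OLS line, ESS/TSS and 1 - RSS/TSS no longer agree, because TSS = ESS + RSS depends on the OLS properties; the sketch therefore uses the 1 - RSS/TSS form, which is the one the argument relies on.

```python
# Hypothetical data set, invented purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar = mean(X), mean(Y)
# OLS coefficients from the least squares formulas.
b2_ols = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum(
    (x - x_bar) ** 2 for x in X
)
b1_ols = y_bar - b2_ols * x_bar

# TSS depends only on the data, never on the candidate coefficients.
TSS = sum((y - y_bar) ** 2 for y in Y)


def r_squared(b1, b2):
    """R² in the form 1 - RSS/TSS for an arbitrary candidate line Y = b1 + b2*X."""
    rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
    return 1 - rss / TSS


best = r_squared(b1_ols, b2_ols)

# Perturb the OLS coefficients on a small grid; every perturbation raises RSS,
# so none of these lines should achieve a higher R² than the OLS line.
shifts = [-0.5, -0.1, 0.0, 0.1, 0.5]
trials = {
    (d1, d2): r_squared(b1_ols + d1, b2_ols + d2)
    for d1 in shifts
    for d2 in shifts
    if (d1, d2) != (0.0, 0.0)
}

print(f"OLS R^2 = {best:.6f}")
print(f"best perturbed R^2 = {max(trials.values()):.6f}")
```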

37 Another natural criterion of goodness of fit is the correlation between the actual and fitted values of Y. We will demonstrate that this is maximized by using the least squares principle to determine the regression coefficients.
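The correlation in question is the usual sample correlation coefficient between Y and Ŷ, whose numerator is the covariance that the following slides work on:

```latex
r_{Y,\hat{Y}} = \frac{\operatorname{Cov}(Y, \hat{Y})}{\sqrt{\operatorname{Var}(Y)\,\operatorname{Var}(\hat{Y})}}
```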

38 First substitute Ŷ + e for Y in the numerator.

39 Use Covariance Rule 1 to split the numerator into two terms.

40 We have already demonstrated that the second term, Cov(e, Ŷ), is zero. The first term, Cov(Ŷ, Ŷ), is just the sample variance of the fitted values.
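Written out, slides 38 to 40 give the numerator as follows (a reconstruction using Y = Ŷ + e):

```latex
\operatorname{Cov}(Y, \hat{Y})
  = \operatorname{Cov}(\hat{Y} + e,\, \hat{Y})
  = \operatorname{Cov}(\hat{Y}, \hat{Y}) + \operatorname{Cov}(e, \hat{Y})
  = \operatorname{Var}(\hat{Y}) + 0
```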

41 The numerator, Var(Ŷ), has been rewritten as the square of its square root, (√Var(Ŷ))².

42 The reason for this is that it then becomes easy to see that the square root of the variance of the fitted values can be cancelled from the numerator and the denominator.
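The cancellation in slides 41 to 43 then runs as follows. The last step uses R² = ESS/TSS = Var(Ŷ)/Var(Y), which holds when both variances are defined with the same divisor:

```latex
r_{Y,\hat{Y}}
  = \frac{\sqrt{\operatorname{Var}(\hat{Y})}\,\sqrt{\operatorname{Var}(\hat{Y})}}
         {\sqrt{\operatorname{Var}(Y)}\,\sqrt{\operatorname{Var}(\hat{Y})}}
  = \sqrt{\frac{\operatorname{Var}(\hat{Y})}{\operatorname{Var}(Y)}}
  = \sqrt{\frac{\mathrm{ESS}}{\mathrm{TSS}}}
  = \sqrt{R^2}
```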

43 Thus the correlation coefficient is the square root of R². Since the square root is an increasing function, it follows that the correlation is maximized by using the least squares principle to determine the regression coefficients.
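A final numerical check, again a sketch on hypothetical data: the correlation between the actual and fitted values should equal the square root of R². The helpers `mean`, `cov`, and `corr` are illustrative; the variances use divisor n, which cancels in both ratios.

```python
import math

# Hypothetical data set, invented purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance with divisor n (assumed definition)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def corr(u, v):
    """Sample correlation coefficient."""
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))


# OLS fit via the least squares formulas.
b2 = cov(X, Y) / cov(X, X)
b1 = mean(Y) - b2 * mean(X)
fitted = [b1 + b2 * x for x in X]

r = corr(Y, fitted)
# R² = ESS/TSS, which with divisor-n variances equals Var(fitted) / Var(Y).
R2 = cov(fitted, fitted) / cov(Y, Y)

print(f"r = {r:.6f}, sqrt(R^2) = {math.sqrt(R2):.6f}")
```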

