Prediction, Goodness-of-Fit, and Modeling Issues

Prediction, Goodness-of-Fit, and Modeling Issues
Chapter 4 Prepared by Vera Tabakova, East Carolina University

4.1 Least Squares Prediction
Figure 4.1 A point prediction Principles of Econometrics, 3rd Edition

4.2 Measuring Goodness-of-Fit
□ Consider how well the regression line describes the data □ The R2 measures how well OLS regression line fits the data (4.7) □ We assume and separate equation (4.7) into explainable and unexplainable components. (4.8) Principles of Econometrics, 3rd Edition

□ Since both parts of (4.8) are unobservable, we use (4.9) where (4.9) □ Subtract the sample mean from both sides, we have (4.10) (4.10) Principles of Econometrics, 3rd Edition

Figure 4.3 Explained and unexplained components of yi Principles of Econometrics, 3rd Edition

□ If we square and sum both sides of (4.10), we have (4.11). □ Note that the cross-product term, 𝑦 𝑖 − 𝑦 𝑒 𝑖 , is equal to 0 by the normal equations. (4.11) □ Equation (4.11) decomposes the total sample variation in y into explained and unexplained parts. Principles of Econometrics, 3rd Edition

= total sum of squares = SST: a measure of total variation in y about the sample mean. = sum of squares due to the regression = SSR: a part of total variation in y about the sample mean, that is explained by, or due to, the regression. Also known as the “explained sum of squares.” = sum of squares due to error = SSE: a part of total variation in y about its mean that is not explained by the regression. Also known as the unexplained sum of squares, the residual sum of squares, or the sum of squared errors. SST = SSR + SSE Principles of Econometrics, 3rd Edition

The closer R2 (coefficient of determination) is to one, the closer the sample values yi are to the fitted regression equation If R2= 1, then all the sample data fall exactly on the fitted least squares line, so SSE = 0, and the model fits the data “perfectly.” If the sample data for y and x are uncorrelated and show no linear association, then the least squares fitted line is “horizontal,” so that SSR = 0 and R2 = 0. (4.12) Principles of Econometrics, 3rd Edition

4.2 Properties of the R2 R2 measures the linear association, or goodness-of-fit, between the sample data and their predicted values. Consequently R2 is sometimes called a measure of “goodness-of-fit.” Notes; - the test of β2 addresses only the question of whether there is enough evidence to infer that a linear relationship exists. - the R2 measures the strength of that linear relationship, particularly when we compare several different models Principles of Econometrics, 3rd Edition

4.2 Properties of the R2 If R2 = 0.385, it means that 38.5% of the variation can be explained by the regression line; Microeconomic (cross section) data (ex. Household behavior) has R2 values from 0.1 to 0.4; Whereas macroeconomic (time series) data has R2 values from 0.4 to 0.9, or even higher; Adjusted R2 : where K is the total number of explanatory variables in the model, and N is the sample size. ⇒ Details will be explained in Ch. 6 Principles of Econometrics, 3rd Edition

Jarque-Bera Test for Normality
4.3 Modeling Issues Jarque-Bera Test for Normality Hypothesis tests and interval estimates for the coefficients rely on the assumption that the errors, and hence the dependent variable y, are normally distributed Are they normally distributed? We can check the distribution of the residuals using: A histogram Formal statistical test: Jarque–Bera test for normality

Residual histogram and summary statistics
for food expenditure example (Fig. 4.10) 4.3 Modeling Issues

4.3 Modeling Issues Jarque-Bera Test for Normality The Jarque–Bera test for normality is based on two measures, skewness and kurtosis Skewness refers to how symmetric the residuals are around zero Perfectly symmetric residuals will have a skewness of zero The skewness value for the food expenditure residuals is Thus residuals are slightly negatively skewed = left long tail Kurtosis refers to the ‘‘peakedness’’ of the distribution. For a normal distribution, the kurtosis value is 3

4.3 Modeling Issues Jarque-Bera Test for Normality The Jarque–Bera statistic is given by: where N = sample size S = skewness K = kurtosis

4.3 Modeling Issues Jarque-Bera Test for Normality When the residuals are normally distributed, the Jarque–Bera statistic has a chi-squared distribution with two degrees of freedom We reject the hypothesis of normally distributed errors if a calculated value of the statistic exceeds a critical value selected from the chi-squared distribution with two degrees of freedom The 5% critical value from a χ2-distribution with two degrees of freedom is 5.99, and the 1% critical value is 9.21

4.3 Modeling Issues Jarque-Bera Test for Normality For the food expenditure example, the Jarque–Bera statistic is: Because < 5.99 there is insufficient evidence from the residuals to conclude that the normal distribution assumption is unreasonable at the 5% level of significance

4.3 Modeling Issues Jarque-Bera Test for Normality We could reach the same conclusion by examining the p-value The p-value appears in Figure 4.10 described as ‘‘Probability’’ of J-B test. Thus, we also fail to reject the null hypothesis of normality on the grounds that > 0.05

Prediction, Goodness-of-Fit, and Modeling Issues

Similar presentations

Presentation on theme: "Prediction, Goodness-of-Fit, and Modeling Issues"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Prediction, Goodness-of-Fit, and Modeling Issues

Similar presentations

Presentation on theme: "Prediction, Goodness-of-Fit, and Modeling Issues"— Presentation transcript:

Similar presentations

About project

Feedback