Prediction, Goodness-of-Fit, and Modeling Issues


Prediction, Goodness-of-Fit, and Modeling Issues
Chapter 4
Prepared by Vera Tabakova, East Carolina University

4.1 Least Squares Prediction

Figure 4.1 A point prediction

Principles of Econometrics, 3rd Edition

4.2 Measuring Goodness-of-Fit

□ Consider how well the regression line describes the data: R² measures how well the OLS regression line fits the data.

□ The simple regression model is

yᵢ = β₁ + β₂xᵢ + eᵢ (4.7)

□ Assuming E(eᵢ) = 0, we separate equation (4.7) into explainable and unexplainable components:

yᵢ = E(yᵢ) + eᵢ, where E(yᵢ) = β₁ + β₂xᵢ (4.8)

4.2 Measuring Goodness-of-Fit

□ Since both parts of (4.8) are unobservable, we use the fitted values and least squares residuals instead:

yᵢ = ŷᵢ + êᵢ, where ŷᵢ = b₁ + b₂xᵢ (4.9)

□ Subtracting the sample mean ȳ from both sides, we have

yᵢ − ȳ = (ŷᵢ − ȳ) + êᵢ (4.10)

4.2 Measuring Goodness-of-Fit

Figure 4.3 Explained and unexplained components of yᵢ

4.2 Measuring Goodness-of-Fit

□ Squaring and summing both sides of (4.10), we have

Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σêᵢ² (4.11)

□ The cross-product term, 2Σ(ŷᵢ − ȳ)êᵢ, equals zero by the normal equations.

□ Equation (4.11) decomposes the total sample variation in y into explained and unexplained parts.

4.2 Measuring Goodness-of-Fit

Σ(yᵢ − ȳ)² = total sum of squares = SST: a measure of total variation in y about the sample mean.

Σ(ŷᵢ − ȳ)² = sum of squares due to the regression = SSR: the part of total variation in y about the sample mean that is explained by, or due to, the regression. Also known as the "explained sum of squares."

Σêᵢ² = sum of squares due to error = SSE: the part of total variation in y about its mean that is not explained by the regression. Also known as the unexplained sum of squares, the residual sum of squares, or the sum of squared errors.

SST = SSR + SSE

4.2 Measuring Goodness-of-Fit

The coefficient of determination is

R² = SSR/SST = 1 − SSE/SST (4.12)

The closer R² is to one, the closer the sample values yᵢ are to the fitted regression equation ŷᵢ = b₁ + b₂xᵢ. If R² = 1, all the sample data fall exactly on the fitted least squares line, so SSE = 0 and the model fits the data "perfectly." If the sample data for y and x are uncorrelated and show no linear association, then the fitted least squares line is horizontal, so SSR = 0 and R² = 0.
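The decomposition and R² can be checked numerically. Below is a minimal sketch in plain Python (the function name and data are illustrative, not from the text) that fits a simple least squares line and returns the sums of squares:

```python
def r_squared(x, y):
    """Fit y = b1 + b2*x by least squares; return (R^2, SST, SSR, SSE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least squares slope and intercept for the simple regression
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b1 = y_bar - b2 * x_bar
    y_hat = [b1 + b2 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained part
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained part
    return ssr / sst, sst, ssr, sse
```

For any sample, SST = SSR + SSE holds up to rounding because the cross-product term vanishes by the normal equations; when the data lie exactly on a line, SSE = 0 and R² = 1.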

4.2 Properties of the R²

R² measures the linear association, or goodness-of-fit, between the sample data and their predicted values; consequently it is sometimes called a measure of "goodness-of-fit."

Notes:
- The test of β₂ addresses only whether there is enough evidence to infer that a linear relationship exists.
- R² measures the strength of that linear relationship, which is particularly useful when we compare several different models.

4.2 Properties of the R²

If R² = 0.385, then 38.5% of the variation in y can be explained by the regression line.

Microeconomic (cross-section) data (e.g., household behavior) typically yield R² values from 0.1 to 0.4, whereas macroeconomic (time-series) data yield R² values from 0.4 to 0.9, or even higher.

Adjusted R²:

R̄² = 1 − (SSE/(N − K)) / (SST/(N − 1))

where K is the number of estimated parameters in the model and N is the sample size. ⇒ Details will be explained in Ch. 6.
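The adjusted R² is a one-line computation; this sketch (a hypothetical helper, assuming K counts the estimated parameters including the intercept) makes the degrees-of-freedom penalty explicit:

```python
def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 = 1 - (SSE/(N-K)) / (SST/(N-1)).

    n is the sample size, k the number of estimated parameters.
    Unlike the plain R^2, this can fall when a useless regressor is added,
    because the SSE term is divided by fewer degrees of freedom.
    """
    return 1 - (sse / (n - k)) / (sst / (n - 1))
```

Because (N − 1)/(N − K) ≥ 1 whenever K ≥ 1, the adjusted R² never exceeds the plain R².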

4.3 Modeling Issues: Jarque-Bera Test for Normality

Hypothesis tests and interval estimates for the coefficients rely on the assumption that the errors, and hence the dependent variable y, are normally distributed.

Are they normally distributed? We can check the distribution of the residuals using:
- A histogram
- A formal statistical test: the Jarque–Bera test for normality

4.3 Modeling Issues

Figure 4.10 Residual histogram and summary statistics for the food expenditure example

4.3 Modeling Issues: Jarque-Bera Test for Normality

The Jarque–Bera test for normality is based on two measures, skewness and kurtosis.

Skewness refers to how symmetric the residuals are around zero. Perfectly symmetric residuals have a skewness of zero. The skewness value for the food expenditure residuals is −0.097, so the residuals are slightly negatively skewed (a longer left tail).

Kurtosis refers to the "peakedness" of the distribution. For a normal distribution, the kurtosis value is 3.

4.3 Modeling Issues: Jarque-Bera Test for Normality

The Jarque–Bera statistic is given by:

JB = (N/6) × (S² + (K − 3)²/4)

where N = sample size, S = skewness, and K = kurtosis.

4.3 Modeling Issues: Jarque-Bera Test for Normality

When the residuals are normally distributed, the Jarque–Bera statistic has a chi-squared distribution with two degrees of freedom.

We reject the hypothesis of normally distributed errors if the calculated value of the statistic exceeds a critical value selected from the chi-squared distribution with two degrees of freedom.

The 5% critical value from a χ² distribution with two degrees of freedom is 5.99, and the 1% critical value is 9.21.
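The critical values 5.99 and 9.21 can be recovered in closed form: for a chi-squared distribution with two degrees of freedom, the survival function is exp(−x/2), so the level-α critical value is −2 ln α. A quick sketch (the function name is illustrative):

```python
import math

def chi2_crit_2df(alpha):
    # P(X > x) = exp(-x/2) for chi-squared with 2 df, so solving
    # exp(-x/2) = alpha for x gives the critical value -2*ln(alpha).
    return -2 * math.log(alpha)
```

Here chi2_crit_2df(0.05) ≈ 5.99 and chi2_crit_2df(0.01) ≈ 9.21, matching the values quoted above.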

4.3 Modeling Issues: Jarque-Bera Test for Normality

For the food expenditure example (N = 40, skewness −0.097, kurtosis 2.99), the Jarque–Bera statistic is:

JB = (40/6) × ((−0.097)² + (2.99 − 3)²/4) = 0.063

Because 0.063 < 5.99, there is insufficient evidence from the residuals to conclude that the normal distribution assumption is unreasonable at the 5% level of significance.

4.3 Modeling Issues: Jarque-Bera Test for Normality

We could reach the same conclusion by examining the p-value, which appears in Figure 4.10 as the "Probability" of the J-B test. We thus fail to reject the null hypothesis of normality on the grounds that 0.9688 > 0.05.
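The whole test can be sketched end to end in plain Python (an illustrative function, not from the text, using moment-based skewness and kurtosis; the p-value uses the closed-form chi-squared(2) survival function exp(−x/2)):

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, p-value) for the Jarque-Bera normality test."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of the residuals
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    s = m3 / m2 ** 1.5   # skewness (0 for symmetric residuals)
    k = m4 / m2 ** 2     # kurtosis (3 for a normal distribution)
    jb = (n / 6) * (s ** 2 + (k - 3) ** 2 / 4)
    # Chi-squared with 2 df has survival function exp(-x/2)
    p_value = math.exp(-jb / 2)
    return jb, p_value
```

With the food expenditure values reported above (N = 40, S = −0.097, and a kurtosis of roughly 2.99), the same formula reproduces JB ≈ 0.063 and a p-value far above 0.05.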