Statistics & Econometrics


1 Statistics & Econometrics. Statistics for Economists, Chap. 7: The Error for Regression. 1. Difference between Actual and Predicted Values 2. Computing RMSE Using the Correlation 3. The Residual Plot 4. The Vertical Strips 5. Approximating to the Normal Curve Inside a Vertical Strip

2 INDEX 1 Difference between Actual and Predicted Values 2 Computing RMSE Using the Correlation 3 The Residual Plot 4 The Vertical Strips 5 Approximating to the Normal Curve Inside a Vertical Strip

3 1. Difference between Actual and Predicted Values. Root-Mean-Square Error (RMSE), also called the Standard Error of Estimate or the Standard Error of Regression. [Figure: scatter plot with the regression line, marking an actual value, its estimate, and the error between them.]

4 Estimation error 1. Data: 4514 Korean men aged 10-90.
- Average height = 167.5 cm
- SD of height = 8.5 cm
- Average weight = 63.5 kg
- SD of weight = 11.9 kg
- Correlation coefficient = 0.67
For a height of 141 cm, the predicted weight (the average weight at that height) is 38.7 kg.
residual = actual weight - predicted weight
Residual of A = 54.5 kg - 38.7 kg = +15.8 kg
Residual of B = 67.4 kg - 84.0 kg = -16.6 kg
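The predicted weights above come from the regression method. The following is a minimal Python sketch, assuming the rounded summary statistics quoted on the slide; the function names `predict_weight` and `residual` are illustrative and not from the slides. With rounded inputs the prediction for 141 cm comes out near 38.6 kg, close to the 38.7 kg on the slide.

```python
# Summary statistics quoted on the slide (rounded values).
AVG_HEIGHT, SD_HEIGHT = 167.5, 8.5   # cm
AVG_WEIGHT, SD_WEIGHT = 63.5, 11.9   # kg
R = 0.67                             # correlation coefficient

def predict_weight(height_cm):
    """Regression estimate: each SD of height moves the estimate by r SDs of weight."""
    z_height = (height_cm - AVG_HEIGHT) / SD_HEIGHT
    return AVG_WEIGHT + R * z_height * SD_WEIGHT

def residual(actual, predicted):
    """Residual = actual value - predicted value."""
    return actual - predicted

predicted_a = predict_weight(141)
print(round(predicted_a, 1))

# Residuals of A and B, using the predicted weights printed on the slide.
print(round(residual(54.5, 38.7), 1))  # person A
print(round(residual(67.4, 84.0), 1))  # person B
```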

5 Estimation error 2. Estimation error = actual weight - predicted weight, generally called the residual. It is the vertical distance of a point from the regression line. The overall size of these errors is measured by taking their root mean square. [Figure: scatter plot of weight against height, marking the actual value, the predicted value, and the error.]

6 Computing the RMSE. RMSE = √( sum of squared residuals / (n - 2) ). The divisor is the degrees of freedom, 4514 - 2 = 4512: the errors are computed from the regression line, and the line is defined by two estimated quantities, the slope and the intercept, which lowers the degrees of freedom by 2. Meaning: a typical point on the scatter plot is above or below the regression line by 8.9 kg (vertical distance).
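The computation described on this slide can be sketched in a few lines of Python. This is only an illustration: the five data points are made up, and the helper names `fit_line` and `regression_rmse` are hypothetical. The one detail taken from the slide is the divisor, n - 2.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept of the regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def regression_rmse(xs, ys):
    """Root mean square of the residuals, dividing by n - 2 degrees of freedom."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in residuals) / (len(xs) - 2))

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
rmse = regression_rmse(xs, ys)
print(round(rmse, 3))
```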

7 Regression line & RMSE vs. Average & SD. The height of the regression line plays the role of the group average, and the RMSE plays the role of the SD (distance from the center). As with the normal curve, the 68-95 rule applies.

8 Regression and rule of thumb. About 68% of the points on a scatter diagram will be within 1 RMSE of the regression line; about 95% of them will be within 2 RMSEs. [Figure: bands of 1 RMSE (68%) and 2 RMSEs (95%) around the regression line.]
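The rule of thumb can be checked on simulated data. The sketch below is an illustration under stated assumptions: the data are generated, not taken from the slides, with a straight-line trend and normally distributed noise of constant spread, and the coefficients 0.9, -87 and 9 are arbitrary choices.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Made-up data: a straight-line trend plus normal noise (all values hypothetical).
n = 5000
xs = [random.uniform(150, 190) for _ in range(n)]
ys = [0.9 * x - 87 + random.gauss(0, 9) for x in xs]

# Least-squares regression line.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residuals and their RMSE (divisor n - 2).
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
rmse = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Share of points within 1 and 2 RMSEs of the line.
within_1 = sum(abs(e) <= rmse for e in residuals) / n
within_2 = sum(abs(e) <= 2 * rmse for e in residuals) / n
print(f"within 1 RMSE: {within_1:.1%}, within 2 RMSEs: {within_2:.1%}")
```

On data like these the two shares should land near 68% and 95%; the rule is not expected to hold when the residuals are far from normal or the spread changes with x.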

9 Elementary method for RMSE. Estimate y while ignoring x: estimate = average y, which gives a horizontal line of estimates. Then residual = (actual y) - (average y), and this elementary RMSE is SDy.
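A short Python check of this identity, on made-up numbers: if every estimate is the average of y, the root mean square of the residuals is the SD of y. Here the SD is taken with divisor n (`statistics.pstdev`), which is an assumption about the slide's convention.

```python
import math
import statistics

ys = [2, 4, 5, 4, 5]  # made-up values, for illustration only

avg_y = statistics.fmean(ys)
# Residuals from the horizontal line at the average of y.
residuals = [y - avg_y for y in ys]
baseline_rms = math.sqrt(sum(e * e for e in residuals) / len(ys))

sd_y = statistics.pstdev(ys)  # SD with divisor n
print(round(baseline_rms, 4), round(sd_y, 4))
```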

11 2. Computing RMSE Using the Correlation. RMSE of the regression line and SDy. [Figure: scatter plot of y against x, showing the regression line with its RMSE and the horizontal line at average y with SDy.] The RMSE of regression is about √(1 - r²) × SDy. RMSE of regression < SDy, because the regression line gets closer to the points than the horizontal line does. Note: the regression line is the line that comes closest to the points overall. r = 1 → RMSE = 0; r = -1 → RMSE = 0; r = 0 → RMSE ≈ SDy (up to the degrees-of-freedom adjustment).

12 RMSE and correlation coefficient. Correlation coefficient: measures spread relative to the SDs, without units. RMSE: measures vertical spread around the regression line, in absolute y-terms. We can get the RMSE from SDy using the correlation coefficient.

13 Regression analysis and correlation coefficient.
- r describes the clustering of the points around the SD line, relative to the SDs.
- Associated with each 1 SD increase in x there is an increase of only r SDs in y, on the average.
- r determines the accuracy of the regression predictions, through the formula RMSE = √(1 - r²) × SDy.
- RMSE describes how well the regression line summarizes the data.
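The shortcut RMSE = √(1 - r²) × SDy can be tried on the height and weight data from earlier in the chapter (r = 0.67, SD of weight = 11.9 kg). A minimal sketch; the function name `rmse_from_correlation` is illustrative:

```python
import math

def rmse_from_correlation(r, sd_y):
    """RMSE of the regression line from the correlation and the SD of y."""
    return math.sqrt(1 - r * r) * sd_y

# Height and weight data: r = 0.67, SD of weight = 11.9 kg.
print(round(rmse_from_correlation(0.67, 11.9), 1))  # about 8.8

# Limiting cases listed in this section.
print(rmse_from_correlation(1, 11.9))   # perfect positive correlation
print(rmse_from_correlation(-1, 11.9))  # perfect negative correlation
print(rmse_from_correlation(0, 11.9))   # no correlation: RMSE equals SDy
```

With these rounded inputs the result is about 8.8 kg, close to the 8.9 kg quoted earlier for the same data; the small gap is presumably due to rounding of the summary statistics.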

15 3. The Residual Plot. Plotting the residual plot:
- The residuals average out to 0.
- The regression line for the residual plot is the horizontal x-axis. The reason is that all the trend up or down has been taken out of the residuals and is in the regression line.
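Both properties can be verified numerically. The sketch below uses five made-up points and a hypothetical helper `fit_line`; it fits the regression line, then fits a second line to the residuals and shows that this second line is flat at 0.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of the regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Property 1: the residuals average out to 0.
mean_residual = sum(residuals) / len(residuals)

# Property 2: regressing the residuals on x gives a flat line along the x-axis.
residual_slope, residual_intercept = fit_line(xs, residuals)

print(round(mean_residual, 10), round(residual_slope, 10), round(residual_intercept, 10))
```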

16 A residual plot with a strong pattern. Such a pattern appears when it is a mistake to use a regression line. The residual plot should not have a strong pattern.

18 4. The Vertical Strips. Scatter plot and histograms inside the vertical strips: one strip for people with height about 165 cm, and one for people with height about 170 cm. The two histograms have similar shapes, and their SDs are nearly the same. [Figure: histograms for the two vertical strips.]

19 Homoscedasticity and Heteroscedasticity.
Homoscedasticity: all the vertical strips in a scatter plot show similar amounts of spread. The SDs of weight are not related to the x-value, and their size is about the RMSE.
Heteroscedasticity: the SDs of income differ from one vertical strip to another. In this case, the RMSE of the regression line gives only a sort of average error across all the different x-values.
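One way to tell the two cases apart is to compute the SD of y inside a few narrow vertical strips. The sketch below is an illustration on simulated data, not the slides' data: both data sets share the same trend, but in the second one the noise is made to grow with x. The helper name `strip_sds` and all numeric settings are assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Made-up data: same trend, constant noise vs. noise that grows with x.
n = 2000
xs = [random.uniform(0, 10) for _ in range(n)]
ys_homo = [2 * x + random.gauss(0, 3) for x in xs]
ys_hetero = [2 * x + random.gauss(0, 0.5 + 0.5 * x) for x in xs]

def strip_sds(xs, ys, strips):
    """SD of y inside each vertical strip, given as (low, high) x-ranges."""
    sds = []
    for low, high in strips:
        in_strip = [y for x, y in zip(xs, ys) if low <= x < high]
        sds.append(statistics.pstdev(in_strip))
    return sds

# Three narrow strips at the left, middle and right of the plot.
strips = [(1.0, 1.5), (5.0, 5.5), (9.0, 9.5)]
homo_sds = strip_sds(xs, ys_homo, strips)
hetero_sds = strip_sds(xs, ys_hetero, strips)

print([round(sd, 1) for sd in homo_sds])    # similar spread in every strip
print([round(sd, 1) for sd in hetero_sds])  # spread grows from left to right
```

The strips are kept narrow so that the trend inside a strip adds little to its spread.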

21 5. Approximating to the Normal Curve Inside a Vertical Strip. When the approximation is impossible: the estimates themselves are meaningless, and the errors do not follow the normal curve. The regression method using the RMSE is off by different amounts in different parts of the scatter plot.

22 Example 1. Midterm and final scores in econometrics, spring semester 2002: midterm average = 27.9, midterm SD = 8.5, final average = 56.4, final SD = 13.8, r = 0.49. The scatter plot is oval-shaped. (1) What percentage of students scored 66 or over on the final? (2) Of the students who scored 33 on the midterm, what percentage scored 66 or over on the final?

23 Example 1 (1). Neither the midterm statistics nor the correlation coefficient is needed. z = (66 - 56.4) / 13.8 ≈ 0.7. By the standard normal curve, about 24% of students scored 66 or over.

24 Example 1 (2). We get the new average from the regression analysis and the new SD from the RMSE of the regression line. Regression method:
1. The midterm score is above average by (33 - 27.9) / 8.5 = 0.6 SDx.
2. r = 0.49; 0.6 × 0.49 ≈ 0.3.
3. The final score is above average by 0.3 SDy ≈ 4.1.
4. The new average is 56.4 + 4.1 = 60.5.
The new SD is the RMSE, √(1 - 0.49²) × 13.8 ≈ 12.0, so z = (66 - 60.5) / 12.0 ≈ 0.5. By the standard normal curve, about 31%.
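Both parts of Example 1 can be reproduced with Python's standard library, using `statistics.NormalDist` for the standard normal curve. This is a sketch that assumes the rounded summary statistics given in the example; the variable names are illustrative. Without intermediate rounding, part (2) gives z ≈ 0.46 and roughly 32%, slightly above the 31% obtained by rounding z to 0.5.

```python
import math
from statistics import NormalDist

# Summary statistics from the example.
MID_AVG, MID_SD = 27.9, 8.5
FIN_AVG, FIN_SD = 56.4, 13.8
R = 0.49

std_normal = NormalDist()  # mean 0, SD 1

# (1) All students: share scoring 66 or over on the final.
z_all = (66 - FIN_AVG) / FIN_SD
share_all = 1 - std_normal.cdf(z_all)

# (2) Students with a midterm score of 33: new average and new SD.
z_mid = (33 - MID_AVG) / MID_SD              # SDs above the midterm average
new_avg = FIN_AVG + R * z_mid * FIN_SD       # regression estimate of the final
new_sd = math.sqrt(1 - R ** 2) * FIN_SD      # RMSE of the regression line
z_strip = (66 - new_avg) / new_sd
share_strip = 1 - std_normal.cdf(z_strip)

print(f"(1) z = {z_all:.2f}, share = {share_all:.0%}")
print(f"(2) new average = {new_avg:.1f}, new SD = {new_sd:.1f}, "
      f"z = {z_strip:.2f}, share = {share_strip:.0%}")
```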

