Part 3: Regression and Correlation 3-1/41 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics.


1 Part 3: Regression and Correlation 3-1/41 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics

2 Part 3: Regression and Correlation 3-2/41 Regression and Forecasting Models Part 3 – Model Fit and Correlation

3 Part 3: Regression and Correlation 3-3/41 Correlation and Linear Association
Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86.

Ht.  Inc.    Ht.  Inc.    Ht.  Inc.
70   2990    68   2910    75   3150
67   2870    66   2840    68   2860
69   2950    71   3180    69   2930
70   3140    68   3020    76   3210
65   2790    73   3220    71   3180
73   3230    73   3370    66   2670
64   2880    70   3180    69   3050
70   3140    71   3340    65   2750
69   3000    69   2970    67   2960
73   3170    73   3240    70   3050

Correlation = 0.845

4 Part 3: Regression and Correlation 3-4/41 Correlation Coefficient for Two Variables

5 Part 3: Regression and Correlation 3-5/41 Correlation and Linear Association
Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86. (Same 30 observations as on slide 3.)
Standard Deviation Height = 2.978
Standard Deviation Income = 176.903
Covariance of Height and Income = 445.034
Correlation = 445.034 / (2.978 x 176.903) = 0.845
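The arithmetic on this slide can be reproduced from the 30 observations. The following is a minimal sketch using only the Python standard library; the variable and function names are illustrative and not part of the original slides, and the (n - 1) divisor is an assumption that matches the reported values.

```python
from statistics import mean, stdev

# Height (inches) and income ($/mo.) pairs from the slide, read row by row.
heights = [70, 68, 75, 67, 66, 68, 69, 71, 69, 70,
           68, 76, 65, 73, 71, 73, 73, 66, 64, 70,
           69, 70, 71, 65, 69, 69, 67, 73, 73, 70]
incomes = [2990, 2910, 3150, 2870, 2840, 2860, 2950, 3180, 2930, 3140,
           3020, 3210, 2790, 3220, 3180, 3230, 3370, 2670, 2880, 3180,
           3050, 3140, 3340, 2750, 3000, 2970, 2960, 3170, 3240, 3050]


def sample_covariance(x, y):
    """Sample covariance with the (n - 1) divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


sd_height = stdev(heights)                     # sample standard deviation
sd_income = stdev(incomes)
cov_hi = sample_covariance(heights, incomes)
r = cov_hi / (sd_height * sd_income)           # correlation = cov / (sd * sd)

print(f"sd height = {sd_height:.3f}, sd income = {sd_income:.3f}")
print(f"covariance = {cov_hi:.3f}, correlation = {r:.3f}")
```

The printed values should agree with the slide (2.978, 176.903, 445.034 and 0.845) up to rounding.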

6 Part 3: Regression and Correlation 3-6/41 Sample Correlation Coefficients (four panels): r_xy = 0.723; r_xy = -0.402; r_xy = +1.000; r_xy = -0.06 (close to 0)

7 Part 3: Regression and Correlation 3-7/41 Inference About a Correlation Coefficient

8 Part 3: Regression and Correlation 3-8/41 Correlation and Linear Association
Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86. (Same 30 observations as on slide 3.)
Correlation = 0.845
t = 0.845 / sqrt((1 - 0.845²)/(30 - 2)) = 8.361
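The t statistic on this slide is a one-line computation. A small sketch follows; the function name `correlation_t_stat` is an illustrative choice, not something from the original deck.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for testing zero correlation: r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1.0 - r ** 2) / (n - 2))


# Values from the slide: r = 0.845 with N = 30 observations.
t_stat = correlation_t_stat(0.845, 30)
print(round(t_stat, 3))  # approximately 8.361
```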

9 Part 3: Regression and Correlation 3-9/41 Correlation is Not Causality
Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86. (Same 30 observations as on slide 3.)
Correlation = 0.845

10 Part 3: Regression and Correlation 3-10/41 Linear regression is about correlation
Left panel: Regression of salary vs. years of experience. Right panel: Regression of fuel bill vs. number of rooms for a sample of homes.
The variables are highly correlated because the regression does a good job of predicting changes in the y variable associated with changes in the x variable.

11 Part 3: Regression and Correlation 3-11/41 Regression Algebra

12 Part 3: Regression and Correlation 3-12/41 Variance Decomposition

13 Part 3: Regression and Correlation 3-13/41 ANOVA Table

14 Part 3: Regression and Correlation 3-14/41 Fit of the Model to the Data

15 Part 3: Regression and Correlation 3-15/41 Explained Variation
• The proportion of variation "explained" by the regression is called R-squared (R²).
• It is also called the Coefficient of Determination.
• (It is the square of something, to be shown later.)
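The formula images for the variance decomposition and R-squared did not survive in this transcript. As a hedged reconstruction in standard textbook notation (the symbols SST, SSR, SSE, a and b are assumptions here, not taken from the original slides):

```latex
\underbrace{\sum_{i=1}^{N}(y_i-\bar{y})^2}_{\text{SST (total)}}
 \;=\;
\underbrace{\sum_{i=1}^{N}(\hat{y}_i-\bar{y})^2}_{\text{SSR (regression)}}
 \;+\;
\underbrace{\sum_{i=1}^{N}(y_i-\hat{y}_i)^2}_{\text{SSE (residual)}},
\qquad \hat{y}_i = a + b\,x_i ,
\qquad
R^2 \;=\; \frac{\text{SSR}}{\text{SST}} \;=\; 1-\frac{\text{SSE}}{\text{SST}} .
```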

16 Part 3: Regression and Correlation 3-16/41 Movie Madness Fit: R²

17 Part 3: Regression and Correlation 3-17/41 Pretty Good Fit: R² = 0.722. Regression of Fuel Bill on Number of Rooms

18 Part 3: Regression and Correlation 3-18/41 Regression Fits (four panels): R² = 0.522; R² = 0.880; R² = 0.424; R² = 0.924

19 Part 3: Regression and Correlation 3-19/41 R² = 0.338. R² is still positive even if the correlation is negative.

20 Part 3: Regression and Correlation 3-20/41 R Squared Benchmarks
• Aggregate time series: expect 0.9+.
• Cross sections: 0.5 is good. Sometimes we do much better.
• Large survey data sets: 0.2 is not bad.
R² = 0.924 in this cross section.

21 Part 3: Regression and Correlation 3-21/41 R-Squared is r_xy²
• R-squared is the square of the correlation between y_i and the predicted y_i, which is a + b x_i.
• The correlation between y_i and (a + b x_i) is the same, up to sign, as the correlation between y_i and x_i.
• Therefore, …
• A regression with a high R² predicts y_i well.
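The claim that R-squared equals the squared correlation can be checked numerically. The sketch below fits a simple least squares line in plain Python and compares 1 - SSE/SST with r_xy². The data are made up for illustration (they are not from the slides) and were chosen with a negative slope, so the check also shows R² staying positive when the correlation is negative.

```python
from statistics import mean

# Hypothetical illustration data (not from the slides); slope is negative.
x = [1, 2, 3, 4, 5, 6]
y = [9.1, 8.2, 6.8, 6.1, 4.7, 4.4]


def ols_fit(x, y):
    """Least squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the fitted line."""
    a, b = ols_fit(x, y)
    my = mean(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


def correlation(x, y):
    """Sample correlation coefficient r_xy."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


r_xy = correlation(x, y)
r2 = r_squared(x, y)
print(f"r_xy = {r_xy:.4f}, r_xy^2 = {r_xy ** 2:.4f}, R^2 = {r2:.4f}")
```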

22 Part 3: Regression and Correlation 3-22/41 Squared Correlations (three panels): r_xy² = 0.522; r_xy² = 0.161; r_xy² = 0.924

23 Part 3: Regression and Correlation 3-23/41 Regression Fits
Left panel: Regression of salary vs. years of experience. Right panel: Regression of fuel bill vs. number of rooms for a sample of homes.

24 Part 3: Regression and Correlation 3-24/41 Is R² Large?
• Is there really a relationship between x and y? We cannot be 100% certain. We can be "statistically certain" (within limits) by examining R². F is used for this purpose.

25 Part 3: Regression and Correlation 3-25/41 The F Ratio

26 Part 3: Regression and Correlation 3-26/41 Is R² Large?
• Since F = (N-2)R²/(1 - R²), if R² is "large," then F will be large.
• For a model with one explanatory variable in it, the standard benchmark value for a "large" F is 4.
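As a quick illustration of this formula, the sketch below computes F for the height and income example from the earlier slides (r = 0.845, so R² = 0.845², with N = 30) and compares it with the benchmark of 4. The function name `f_ratio` is an illustrative choice.

```python
def f_ratio(r2, n):
    """F = (N - 2) * R^2 / (1 - R^2) for a regression with one explanatory variable."""
    return (n - 2) * r2 / (1.0 - r2)


# Height/income example: R^2 is the squared correlation, N = 30.
f_height_income = f_ratio(0.845 ** 2, 30)
is_large = f_height_income > 4  # benchmark for a "large" F from the slide

print(f"F = {f_height_income:.2f}, larger than 4: {is_large}")
```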

27 Part 3: Regression and Correlation 3-27/41 Movie Madness Fit: R² and F

28 Part 3: Regression and Correlation 3-28/41 Why Use F and not R²?
• When is R² "large"? We have no benchmarks to decide.
• We have a table for F statistics to determine when F is statistically large: yes or no.

29 Part 3: Regression and Correlation 3-29/41 F Table
The "critical value" depends on the number of observations. If F is larger than the value in the table, conclude that there is a "statistically significant" relationship. There is a huge table on pages 826-833 of your text. Analysts now use computer programs, not tables like this, to find the critical values of F for their model/data.
In the table, n2 is N-2.

30 Part 3: Regression and Correlation 3-30/41 Internet Buzz Regression

Regression Analysis: BoxOffice versus Buzz

The regression equation is
BoxOffice = - 14.4 + 72.7 Buzz

Predictor     Coef  SE Coef      T      P
Constant   -14.360    5.546  -2.59  0.012
Buzz         72.72    10.94   6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   7913.6  7913.6  44.16  0.000
Residual Error  60  10751.5   179.2
Total           61  18665.1

n2 is N-2
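The summary statistics in this output follow from the three sums of squares in the Analysis of Variance table. The sketch below recomputes them; the variable names are illustrative, N = 62 is inferred from the total degrees of freedom (61), and the adjusted R-squared formula used is the standard one, stated here as an assumption about how the software computed it.

```python
import math

# Sums of squares and sample size taken from the ANOVA table above.
ss_regression = 7913.6
ss_residual = 10751.5
ss_total = 18665.1
n = 62  # total DF is 61, so N = 62

r_sq = ss_regression / ss_total                      # R-Sq
ms_residual = ss_residual / (n - 2)                  # residual mean square
f_stat = (ss_regression / 1) / ms_residual           # F with 1 and N-2 DF
s = math.sqrt(ms_residual)                           # S, standard error of regression
r_sq_adj = 1.0 - ms_residual / (ss_total / (n - 1))  # R-Sq(adj), standard formula

print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
print(f"F = {f_stat:.2f}, S = {s:.4f}")
```

The same F also comes out of (N-2)R²/(1 - R²) with R² = 0.424, up to rounding.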

31 Part 3: Regression and Correlation 3-31/41 Inference About a Correlation Coefficient This is F

32 Part 3: Regression and Correlation 3-32/41 $135 Million: Klimt, to Ronald Lauder http://www.nytimes.com/2006/06/19/arts/design/19klim.html?ex=1308369600&en=37eb32381038a749&ei=5088&partner=rssnyt&emc=rss

33 Part 3: Regression and Correlation 3-33/41 $100 Million … sort of Stephen Wynn with a Prized Possession, 2007

34 Part 3: Regression and Correlation 3-34/41 An Enduring Art Mystery Why do larger paintings command higher prices? The Persistence of Memory. Salvador Dali, 1931 The Persistence of Econometrics. Greene, 2011 Graphics show relative sizes of the two works.

35 Part 3: Regression and Correlation 3-35/41

36 Part 3: Regression and Correlation 3-36/41

37 Part 3: Regression and Correlation 3-37/41 Monet in Large and Small
log(price in $) = a + b log(surface area) + e
Sale prices of 328 signed Monet paintings. The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.

38 Part 3: Regression and Correlation 3-38/41 The Data Note: Using logs in this context. This is common when analyzing financial measurements (e.g., price) and when percentage changes are more interesting than unit changes. (E.g., what is the % premium when the painting is 10% larger?)
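To make the percentage interpretation concrete: in a log-log model, the slope b translates a percentage change in size into a percentage change in price. The sketch below is only an illustration. The slope value 1.3 is hypothetical (the estimated coefficient does not appear in this transcript), and the function name is likewise made up.

```python
def percent_premium(b, pct_larger):
    """Exact % change in price implied by log(price) = a + b*log(area)
    when area increases by pct_larger percent."""
    ratio = 1.0 + pct_larger / 100.0
    return (ratio ** b - 1.0) * 100.0


b_hypothetical = 1.3  # HYPOTHETICAL slope, for illustration only

exact = percent_premium(b_hypothetical, 10.0)  # exact premium for a 10% larger painting
approx = b_hypothetical * 10.0                 # common shortcut: b times the % change

print(f"exact premium = {exact:.2f}%, shortcut = {approx:.1f}%")
```

The shortcut (b times the percentage change) is close to the exact figure for small changes, which is why b is usually read as an elasticity.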

39 Part 3: Regression and Correlation 3-39/41 Application: Monet Paintings
• Does the size of the painting really explain the sale prices of Monet's paintings?
• Investigate: Compute the regression.
• Hypothesis: The slope is actually zero.
• Rejection region: Slope estimates that are very far from zero.
The hypothesis that β = 0 is rejected.

40 Part 3: Regression and Correlation 3-40/41 An Equivalent Test
• Is there a relationship?
• H0: No correlation
• Rejection region: Large R².
• Test: F = (N-2)R²/(1 - R²)
• Reject H0 if F > 4
• Math result: F = t². Degrees of freedom for the F statistic are 1 and N-2.
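The result F = t² can be checked against numbers that appear earlier in the deck. The sketch below does it two ways: algebraically, from the height and income correlation (r = 0.845, N = 30), and from the reported Internet Buzz output, where the slope's t ratio is Coef / SE Coef. Variable names are illustrative.

```python
import math

# 1) Height/income example: t and F computed from the same r and N.
r, n = 0.845, 30
t_corr = r / math.sqrt((1.0 - r ** 2) / (n - 2))
f_corr = (n - 2) * r ** 2 / (1.0 - r ** 2)

# 2) Internet Buzz output: slope 72.72 with standard error 10.94; reported F = 44.16.
t_buzz = 72.72 / 10.94
f_buzz_reported = 44.16

print(f"height/income: t^2 = {t_corr ** 2:.3f}, F = {f_corr:.3f}")
print(f"buzz: t^2 = {t_buzz ** 2:.2f}, reported F = {f_buzz_reported}")
```

In the second comparison the two numbers differ slightly only because the printed coefficient and standard error are rounded.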

41 Part 3: Regression and Correlation 3-41/41 Monet Regression: There seems to be a regression. Is there a theory?

