Part 3: Regression and Correlation 3-1/41 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics
Regression and Forecasting Models Part 3 – Model Fit and Correlation
Correlation and Linear Association: Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86. [Table of Ht. and Inc. values] Correlation = 0.845
Correlation Coefficient for Two Variables
Correlation and Linear Association: Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86. [Table of Ht. and Inc. values] Standard Deviation Height = ; Standard Deviation Income = ; Covariance of Height and Income = ; Correlation = / (2.978 x ) = 0.845
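As a minimal sketch of the calculation on this slide, the Python below computes the two standard deviations, the covariance, and the correlation as covariance / (sd x sd). The numbers are made up for illustration and are not the WSJ data; the function name `sample_correlation` and the use of N-1 divisors are assumptions.

```python
import math


def sample_correlation(x, y):
    """Return (s_x, s_y, s_xy, r_xy), using N-1 divisors throughout."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance of x and y.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # Sample standard deviations.
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Correlation = covariance / (sd of x * sd of y).
    return s_x, s_y, s_xy, s_xy / (s_x * s_y)


# Hypothetical heights (inches) and incomes ($/mo.), NOT the WSJ sample.
heights = [66, 68, 70, 72, 74]
incomes = [2500, 2700, 3000, 3100, 3400]

s_x, s_y, s_xy, r = sample_correlation(heights, incomes)
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}, s_xy = {s_xy:.1f}, r = {r:.3f}")
```

Because the covariance is divided by both standard deviations, r has no units and always lies between -1 and +1.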
Sample Correlation Coefficients [Figure: four panels, each labeled with its r_xy; the last is r_xy = -.06 (close to 0)]
Inference About a Correlation Coefficient
Correlation and Linear Association: Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86. [Table of Ht. and Inc. values] Correlation = 0.845; t = .845 / sqrt((1 - .845²)/(30-2)) = 8.361
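The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `correlation_t_stat` is an assumption, and the formula used is the usual t ratio for a correlation, t = r / sqrt((1 - r²)/(N - 2)), with r = .845 and N = 30 taken from the slide.

```python
import math


def correlation_t_stat(r, n):
    """t ratio for testing that a correlation is zero, from r and sample size n."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return r / math.sqrt((1.0 - r * r) / (n - 2))


# Values from the height/income slide: r = 0.845, N = 30.
t = correlation_t_stat(0.845, 30)
print(f"t = {t:.3f}")
```

A t ratio this far from zero is what leads to the conclusion that the correlation is statistically different from zero.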
Correlation is Not Causality: Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86. [Table of Ht. and Inc. values] Correlation = 0.845
Linear regression is about correlation. [Figures: Regression of salary vs. years of experience; Regression of fuel bill vs. number of rooms for a sample of homes] The variables are highly correlated because the regression does a good job of predicting changes in the y variable associated with changes in the x variable.
Regression Algebra
Variance Decomposition
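A minimal sketch of the decomposition, assuming the usual least squares identity Total SS = Regression SS + Residual SS for a line fitted with a constant term. The rooms and fuel bill numbers are hypothetical, and the function names are illustrative only.

```python
def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sums_of_squares(x, y):
    """Return (total SS, regression SS, residual SS) for the fitted line."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    fitted = [a + b * xi for xi in x]
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    regression_ss = sum((fi - mean_y) ** 2 for fi in fitted)
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return total_ss, regression_ss, residual_ss


# Hypothetical data: number of rooms and fuel bill ($) for six homes.
rooms = [4, 5, 6, 7, 8, 9]
fuel = [420, 510, 560, 700, 720, 850]

total_ss, regression_ss, residual_ss = sums_of_squares(rooms, fuel)
print(f"Total SS      = {total_ss:.1f}")
print(f"Regression SS = {regression_ss:.1f}")
print(f"Residual SS   = {residual_ss:.1f}")
print(f"R-squared     = {regression_ss / total_ss:.3f}")
```

The three sums of squares are the SS column of an ANOVA table, and the ratio Regression SS / Total SS is the proportion of variation explained.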
ANOVA Table
Fit of the Model to the Data
Explained Variation: The proportion of variation “explained” by the regression is called R-squared (R²). It is also called the Coefficient of Determination. (It is the square of something, to be shown later.)
Movie Madness Fit: R²
Pretty Good Fit: R² = .722. Regression of Fuel Bill on Number of Rooms
Regression Fits [Figure: four panels, each labeled with its R²; the last is R² = 0.924]
R² is still positive even if the correlation is negative.
R Squared Benchmarks: Aggregate time series: expect .9+. Cross sections: .5 is good; sometimes we do much better. Large survey data sets: .2 is not bad. R² = in this cross section.
R-Squared is r_xy²: R-squared is the square of the correlation between y_i and the predicted y_i, which is a + bx_i. The correlation between y_i and (a + bx_i) is the same as the correlation between y_i and x_i, up to the sign of b. Therefore, …. A regression with a high R² predicts y_i well.
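The claim can be checked numerically. The sketch below fits a line by least squares, computes R² from the residuals, and compares it with the squared correlation between x and y. The data are hypothetical and deliberately slope downward, so the correlation is negative while R² is positive; the function name is an assumption.

```python
import math


def r_squared_and_correlation(x, y):
    """Return (R-squared of the fitted line, correlation between x and y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Least squares line y = a + b*x.
    b = sxy / sxx
    a = mean_y - b * mean_x
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - residual_ss / syy
    r = sxy / math.sqrt(sxx * syy)
    return r2, r


# Hypothetical data with a negative relationship.
x = [1, 2, 3, 4, 5, 6]
y = [10.2, 8.9, 8.1, 6.5, 6.0, 4.1]

r2, r = r_squared_and_correlation(x, y)
print(f"r = {r:.4f}, r squared = {r * r:.4f}, R-squared = {r2:.4f}")
```

The last two printed numbers agree, which is the sense in which R-squared "is the square of something."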
Squared Correlations: r_xy² = ; r_xy² = .161; r_xy² = .924
Regression Fits [Figures: Regression of salary vs. years of experience; Regression of fuel bill vs. number of rooms for a sample of homes]
Is R² Large? Is there really a relationship between x and y? We cannot be 100% certain. We can be “statistically certain” (within limits) by examining R². F is used for this purpose.
The F Ratio
Is R² Large? Since F = (N-2)R²/(1 - R²), if R² is “large,” then F will be large. For a model with one explanatory variable in it, the standard benchmark value for a “large” F is 4.
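A short sketch of this formula in Python, comparing F with the benchmark of 4. The first case uses r = .845 and N = 30 from the height/income slides; the second is a hypothetical weak fit (R² = .10) with the same N, added for contrast. The names `f_ratio_from_r_squared` and `BENCHMARK_F` are assumptions.

```python
def f_ratio_from_r_squared(r_squared, n):
    """F = (N-2) * R^2 / (1 - R^2) for a model with one explanatory variable."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if n <= 2:
        raise ValueError("need more than two observations")
    return (n - 2) * r_squared / (1.0 - r_squared)


BENCHMARK_F = 4.0  # rule-of-thumb value quoted on the slide

# Height/income example: r = 0.845, so R^2 = 0.845 squared, with N = 30.
f_height = f_ratio_from_r_squared(0.845 ** 2, 30)

# Hypothetical weak fit with the same N, for contrast.
f_weak = f_ratio_from_r_squared(0.10, 30)

for label, f in [("height/income", f_height), ("weak fit", f_weak)]:
    verdict = "large" if f > BENCHMARK_F else "not large"
    print(f"{label}: F = {f:.2f} ({verdict})")
```

Note that F depends on N as well as R²: the same R² gives a larger F in a larger sample.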
Movie Madness Fit: R² and F
Why Use F and not R²? When is R² “large”? We have no benchmarks to decide. We have a table for F statistics to determine when F is statistically large: yes or no.
F Table: The “critical value” depends on the number of observations. If F is larger than the value in the table, conclude that there is a “statistically significant” relationship. There is a huge table on pages of your text. Analysts now use computer programs, not tables like this, to find the critical values of F for their model/data. n₂ is N-2
Internet Buzz Regression
Regression Analysis: BoxOffice versus Buzz
The regression equation is BoxOffice = Buzz
Predictor    Coef    SE Coef    T    P
Constant
Buzz
S =    R-Sq = 42.4%    R-Sq(adj) = 41.4%
Analysis of Variance
Source            DF    SS    MS    F    P
Regression
Residual Error
Total
n₂ is N-2
Inference About a Correlation Coefficient: This is F
$135 Million: Klimt, to Ronald Lauder
$100 Million … sort of. Stephen Wynn with a Prized Possession, 2007
An Enduring Art Mystery: Why do larger paintings command higher prices? [Figures: The Persistence of Memory. Salvador Dali, 1931; The Persistence of Econometrics. Greene, 2011] Graphics show relative sizes of the two works.
Monet in Large and Small: Log of $price = a + b log surface area + e. Sale prices of 328 signed Monet paintings. The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.
The Data. Note: Logs are used in this context. This is common when analyzing financial measurements (e.g., price) and when percentage changes are more interesting than unit changes. (E.g., what is the % premium when the painting is 10% larger?)
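To make the percentage interpretation concrete, here is a small sketch under the assumption of a log-log model, log price = a + b log area. In that model, scaling the area by a factor (1 + p) scales the predicted price by (1 + p)^b. The slope b = 1.3 below is hypothetical and is not the estimate from the Monet regression; the function name is also an assumption.

```python
def pct_change_in_price(b, pct_change_in_area):
    """Percent change in predicted price in a log-log model with slope b."""
    factor = (1.0 + pct_change_in_area / 100.0) ** b
    return (factor - 1.0) * 100.0


# Hypothetical slope, NOT the estimate from the Monet data.
b = 1.3

premium = pct_change_in_price(b, 10.0)
print(f"A painting 10% larger is predicted to sell for {premium:.2f}% more.")
print(f"Quick approximation b * 10% = {b * 10.0:.2f}%")
```

For small changes the exact figure is close to b times the percentage change in area, which is why the slope in a log-log regression is read as an elasticity.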
Application: Monet Paintings. Does the size of the painting really explain the sale prices of Monet’s paintings? Investigate: Compute the regression. Hypothesis: The slope is actually zero. Rejection region: Slope estimates that are very far from zero. The hypothesis that β = 0 is rejected.
An Equivalent Test: Is there a relationship? H₀: No correlation. Rejection region: Large R². Test: F = (N-2)R²/(1 - R²). Reject H₀ if F > 4. Math result: F = t². Degrees of Freedom for the F statistic are 1 and N-2.
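The math result F = t² can be verified on any data set. The sketch below computes the t ratio for the slope (slope divided by its standard error) and the F ratio from the ANOVA sums of squares, then compares them. The log area and log price values are hypothetical, not the Monet sample, and the function name `slope_t_and_f` is an assumption.

```python
import math


def slope_t_and_f(x, y):
    """Return (t ratio for the slope, F ratio) for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residual_ss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    regression_ss = syy - residual_ss
    # Residual mean square, with N-2 degrees of freedom.
    s2 = residual_ss / (n - 2)
    # Standard error of the slope and its t ratio.
    se_b = math.sqrt(s2 / sxx)
    t = b / se_b
    # F = (Regression SS / 1) / (Residual SS / (N-2)).
    f = regression_ss / s2
    return t, f


# Hypothetical log surface areas and log prices, NOT the Monet data.
log_area = [5.5, 6.0, 6.2, 6.6, 6.9, 7.1, 7.4]
log_price = [-1.2, -0.3, 0.1, 0.2, 1.1, 0.9, 1.8]

t, f = slope_t_and_f(log_area, log_price)
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {f:.4f}")
```

Since F = t², testing the slope with t and testing the fit with F always give the same answer in a model with one explanatory variable.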
Monet Regression: There seems to be a regression. Is there a theory?