Descriptive measures of the degree of linear association: R-squared and correlation
Coefficient of determination: R² is a number (a proportion!) between 0 and 1. If R² = 1: all data points fall exactly on the regression line, and predictor X accounts for all of the variation in Y. If R² = 0: the fitted regression line is perfectly horizontal, and predictor X accounts for none of the variation in Y.
Interpretations of R²: R² × 100 percent of the variation in Y is reduced by taking into account predictor X. R² × 100 percent of the variation in Y is “explained by” the variation in predictor X.
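The "reduction in variation" reading can be made concrete with a small calculation. The sketch below assumes the usual definition R² = 1 - SSE/SSTO for a least-squares line; the function name `r_squared` and both data sets are hypothetical, chosen only to hit the two extreme cases.

```python
def r_squared(x, y):
    """Return R^2 = 1 - SSE/SSTO for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # estimated slope
    b0 = y_bar - b1 * x_bar     # estimated intercept
    # SSTO: variation in y around its mean (predictor ignored)
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    # SSE: variation in y around the fitted line (predictor used)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / ssto


x = [1, 2, 3, 4, 5]
y_line = [3, 5, 7, 9, 11]   # hypothetical data lying exactly on y = 1 + 2x
y_flat = [2, 1, 4, 1, 2]    # hypothetical data whose least-squares slope is 0

print(r_squared(x, y_line))  # all variation accounted for
print(r_squared(x, y_flat))  # no variation accounted for
```

With these inputs the first call gives R² = 1 and the second gives R² = 0, matching the two extreme cases on the earlier slide.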
R-sq on Minitab fitted line plot
R-sq on Minitab regression output: regression of Mort on Lat, with R-Sq = 68.0% and R-Sq(adj) = 67.3%. [The output also lists S and an Analysis of Variance table with columns Source, DF, SS, MS, F, P and rows Regression, Error, Total.]
Correlation coefficient: r is a number between -1 and 1, inclusive. The sign of r matches the sign of the slope of the fitted regression line: positive if the slope is positive, negative if the slope is negative.
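A quick check of the sign rule, as a minimal sketch: it computes the least-squares slope and the sample correlation from the same sums of squares. The helper name `slope_and_r` and the data are hypothetical.

```python
import math


def slope_and_r(x, y):
    """Return (b1, r): least-squares slope and sample correlation of x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return b1, r


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]     # hypothetical upward trend
y_down = [5, 4, 5, 4, 2]   # the same values reversed: downward trend

b1_up, r_up = slope_and_r(x, y_up)
b1_down, r_down = slope_and_r(x, y_down)
print(b1_up, r_up)       # both positive
print(b1_down, r_down)   # both negative
```

Both quantities share the numerator Σ(x - x̄)(y - ȳ) and have positive denominators, which is why their signs always agree.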
Correlation coefficient formulas
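For reference, the standard textbook formulas for the sample correlation coefficient in simple linear regression are sketched below. The notation is an assumption and may differ from the slide's: b_1 denotes the estimated slope, and the sign in the first formula is taken from the sign of b_1.

```latex
\[
r = \pm\sqrt{R^2},
\qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
r = b_1 \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
```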
Interpretation of correlation coefficient: Unlike R², r has no clear-cut operational interpretation. r = -1 is a perfect negative linear relationship. r = 1 is a perfect positive linear relationship. r = 0 is no linear relationship.
R² = 100% and r = +1
R² = 2.9% and r = 0.17
R² = 70.1% and r = (plot point labels: U.S., Norway, Finland, Italy, France)
R² = 82.8% and r = 0.91
R² = 50.4% and r = 0.71
R² = 0% and r = 0
Cautions about R² and r: Both are summary measures of linear association only. It is possible to get R² = 0 with a perfect curvilinear relationship. A large R² does not necessarily imply that the estimated regression line fits the data well; another function might describe the trend better. Both measures can be greatly affected by a single outlying data point.
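Two of these cautions are easy to reproduce numerically. The sketch below uses the identity R² = r² for simple linear regression; the function name `r_sq` and all data are hypothetical, constructed only to illustrate the points.

```python
def r_sq(x, y):
    """Return R^2 for simple linear regression, computed as r squared."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Caution 1: a perfect curvilinear relationship, y = x^2 on x values
# symmetric about 0, has no linear association at all.
x_sym = [-2, -1, 0, 1, 2]
y_quad = [xi ** 2 for xi in x_sym]
curved = r_sq(x_sym, y_quad)

# Caution 2: one outlying point can dominate the measure.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 1, 2]                    # no linear trend on its own
without_outlier = r_sq(x, y)
with_outlier = r_sq(x + [20], y + [20])  # add the single point (20, 20)

print(curved, without_outlier, with_outlier)
```

Here the quadratic data give R² = 0 even though y is determined exactly by x, and the single added point moves R² from 0 to about 0.94.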
Cautions about R² and r: A “statistically significant” R² does not imply that the slope is meaningfully different from 0. A large R² does not necessarily mean that useful predictions can be made: the resulting intervals can still be too wide to be useful.
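The first caution follows from how sample size enters the test. As a rough sketch, the standard t statistic for testing "no linear association" in simple linear regression can be written as t = r√(n - 2) / √(1 - r²). The function name `t_statistic` and the values of r and n below are hypothetical.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: no linear association, given correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.05                              # hypothetical, very weak correlation
r_squared_value = r ** 2              # X accounts for only 0.25% of the variation in Y

t_small_n = t_statistic(r, 20)        # small sample: far from significant
t_large_n = t_statistic(r, 10_000)    # large sample: about 5, clearly significant

print(r_squared_value, t_small_n, t_large_n)
```

With n = 10,000 the statistic is about 5, well beyond the usual cutoff near 2, even though R² is only 0.25%. Statistical significance here reflects the large sample, not a practically meaningful slope.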