Lecture 5 Correlation and Regression Dr Peter Wheale
A Scatter Plot of Monthly Returns
Interpretation of Correlation Coefficient
Correlation coefficient (r): Interpretation
r = +1: perfect positive correlation
0 < r < +1: positive linear relationship
r = 0: no linear relationship
-1 < r < 0: negative linear relationship
r = -1: perfect negative correlation
Scatter Plots and Correlation
Covariance of Rates of Return
Example: Calculate the covariance between the returns on the two stocks indicated below:
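Covariance Using Historical Data
Mean returns: $\bar{R}_1 = 0.05$, $\bar{R}_2 = 0.07$
Sum of cross-products of deviations: $\sum_{t=1}^{n}(R_{1,t}-\bar{R}_1)(R_{2,t}-\bar{R}_2) = 0.0154$
$\mathrm{Cov}(R_1,R_2) = \dfrac{0.0154}{n-1} = \dfrac{0.0154}{2} = 0.0077$ (n = 3 observations, so the divisor is n - 1 = 2)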
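A minimal Python sketch of the sample covariance calculation, assuming the n - 1 divisor used on the slide. The function name `sample_covariance` is hypothetical, and because the slide's raw return table is not reproduced here, the usage line only repeats the final division of the slide's cross-product sum.

```python
from statistics import mean


def sample_covariance(x: list[float], y: list[float]) -> float:
    """Sample covariance: sum of cross-products of deviations divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 observations")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of (x_t - x_bar) * (y_t - y_bar) over all periods
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (len(x) - 1)


# Slide figures: the cross-products sum to 0.0154 over n = 3 observations
slide_cov = 0.0154 / (3 - 1)
print(round(slide_cov, 4))  # 0.0077
```

Dividing by n - 1 rather than n gives the unbiased sample estimate, which matches the division by 2 on the slide.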
Sample Correlation Coefficient
Correlation, ρ, is a standardized measure of covariance and is bounded by -1 and +1:
$\rho_{1,2} = \dfrac{\mathrm{Cov}(R_1,R_2)}{\sigma_1 \sigma_2}$
Example: The covariance of returns on two assets is 0.0051, with σ1 = 7% and σ2 = 11%. Calculate ρ1,2.
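The example reduces to a single division, sketched below in Python. The helper name `correlation_from_cov` is hypothetical; the inputs are the slide's covariance and standard deviations, with percentages written as decimals.

```python
def correlation_from_cov(cov: float, sd1: float, sd2: float) -> float:
    """Correlation = covariance / (product of the two standard deviations)."""
    if sd1 <= 0 or sd2 <= 0:
        raise ValueError("standard deviations must be positive")
    return cov / (sd1 * sd2)


# Slide inputs: Cov = 0.0051, sigma1 = 7%, sigma2 = 11%
rho = correlation_from_cov(0.0051, 0.07, 0.11)
print(round(rho, 4))  # 0.6623
```

Here σ1σ2 = 0.07 × 0.11 = 0.0077, so ρ1,2 = 0.0051 / 0.0077 ≈ 0.66, which lies inside the [-1, +1] bound.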
Testing H0: Correlation = 0
The test of whether the true correlation between two random variables is zero (i.e., there is no correlation) is a t-test based on the sample correlation coefficient, r. With n (pairs of) observations, the test statistic is:
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
with n - 2 degrees of freedom.
Example
Data: n = 10, r = 0.475. Determine whether the sample correlation is significant at the 5% level of significance.
$t = \dfrac{0.475\sqrt{8}}{\sqrt{1-0.475^2}} = \dfrac{1.3435}{0.88} = 1.5267$
The two-tailed critical t-values at the 5% significance level with df = n - 2 = 8 are ±2.306. Since -2.306 ≤ 1.5267 ≤ 2.306, the null hypothesis cannot be rejected: the correlation between X and Y is not significantly different from zero at the 5% significance level.
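The test above can be sketched in Python as follows. The function name `correlation_t_stat` is hypothetical, and the critical value 2.306 is taken from the slide rather than computed, so the sketch needs only the standard library.

```python
import math


def correlation_t_stat(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t = correlation_t_stat(0.475, 10)
t_crit = 2.306  # two-tailed 5% critical value for df = 8, from the slide
reject_h0 = abs(t) > t_crit  # reject only if t falls outside +/- t_crit
print(round(t, 4), reject_h0)  # 1.5267 False
```

Comparing |t| with the critical value is equivalent to checking whether t lies inside the interval [-2.306, 2.306].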
Linear Regression
Dependent variable: the variable whose changes you are trying to explain
Independent variable: the variable used to explain changes in the dependent variable
Example: You want to predict housing starts using mortgage interest rates:
Independent variable = mortgage interest rates
Dependent variable = housing starts
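Regression Equation
$Y_i = b_0 + b_1 X_i + \varepsilon_i, \quad i = 1, \dots, n$
where b0 is the y-intercept, b1 is the slope coefficient, Xi is the independent variable and εi is the error term.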
Assumptions of Linear Regression
The relationship between the dependent and independent variables is linear
The independent variable is uncorrelated with the error term
The expected value of the error term is zero
The variance of the error term is constant
The error term is independently distributed
The error term is normally distributed
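Estimated Regression Coefficients
The estimated regression line is:
$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$
Slope: $\hat{b}_1 = \dfrac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \dfrac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$
Y-intercept: $\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$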
Estimating the Slope Coefficient
b1 = Cov(X,Y) / Var(X)
Example: Compute the slope coefficient and intercept term for the least squares regression equation using the following information: $\sum (X_i-\bar{X})(Y_i-\bar{Y}) = 445$ and $\sum (X_i-\bar{X})^2 = 374.50$. The sample means of X and Y are 25 and 75, respectively.
Because the n - 1 divisors in the covariance and the variance cancel, the slope coefficient is b1 = 445 / 374.50 = 1.188.
The intercept term is b0 = 75 - 1.188(25) = 45.3.
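A minimal Python sketch of the same calculation, working from the summary sums rather than raw data. The function name `ols_from_sums` and its parameter names are hypothetical; the inputs are the slide's figures.

```python
def ols_from_sums(sxy: float, sxx: float, x_bar: float, y_bar: float) -> tuple[float, float]:
    """Return (b0, b1) for simple OLS from deviation sums and sample means.

    sxy: sum of (X - X_mean) * (Y - Y_mean)
    sxx: sum of (X - X_mean) ** 2
    """
    if sxx <= 0:
        raise ValueError("X must vary (sxx > 0)")
    b1 = sxy / sxx  # Cov(X, Y) / Var(X); the n - 1 divisors cancel
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (X_mean, Y_mean)
    return b0, b1


b0, b1 = ols_from_sums(445, 374.5, 25, 75)
print(round(b1, 3), round(b0, 1))  # 1.188 45.3
```

Computing b0 from the unrounded slope gives 45.29, which rounds to the slide's 45.3.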
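Calculating the Standard Error of the Estimate (SEE)
SEE measures the accuracy of the predictions from a regression equation.
It is the estimated standard deviation of the error term: $\text{SEE} = \sqrt{\dfrac{\text{SSE}}{n-2}} = \sqrt{\dfrac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{n-2}}$, where SSE is the sum of squared residuals.
The lower the SEE, the more accurate the predictions.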
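The SEE formula for a simple (one-regressor) regression can be sketched in Python as below. The function name `standard_error_of_estimate` is hypothetical, and the usage values are made-up illustrative observations and fitted values, not data from the lecture.

```python
import math


def standard_error_of_estimate(y: list[float], y_hat: list[float]) -> float:
    """SEE = sqrt(SSE / (n - 2)) for simple linear regression."""
    n = len(y)
    if n != len(y_hat) or n < 3:
        raise ValueError("need equal-length series with at least 3 observations")
    # SSE: sum of squared residuals (actual minus fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - 2))


# Hypothetical observed and fitted values, for illustration only
y_obs = [3.0, 5.0, 7.0, 10.0]
y_fit = [3.0, 5.0, 7.0, 9.0]
see = standard_error_of_estimate(y_obs, y_fit)
print(round(see, 4))  # 0.7071
```

The divisor is n - 2 because two parameters (b0 and b1) are estimated from the data.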
Interpreting the Coefficient of Determination (R²)
R² measures the percentage of the total variation in the dependent variable that is explained by the independent variable.
An R² of 0.25 means the independent variable explains 25% of the variation in the dependent variable.
Caution: a high R² does not establish causation.
Calculating the Coefficient of Determination (R²)
For simple linear regression, R² is the square of the correlation coefficient (r).
Example: correlation coefficient between X and Y, r = 0.50
Coefficient of determination = 0.50² = 0.25
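Coefficient of Determination (R²)
R² can also be calculated from the total sum of squares (SST) and the regression sum of squares (SSR):
SS Total = SS Regression + SS Error, i.e. total variation = explained variation + unexplained variation
$\text{SST} = \sum_{i=1}^{n}(Y_i-\bar{Y})^2, \quad \text{SSR} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2, \quad \text{SSE} = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$
$R^2 = \dfrac{\text{SSR}}{\text{SST}} = 1 - \dfrac{\text{SSE}}{\text{SST}}$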
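A minimal Python sketch that fits a simple OLS line, splits the total variation into SSR and SSE, and returns R² = SSR / SST. The function name `r_squared_decomposition` is hypothetical, and the usage data are made-up illustrative values, not figures from the lecture.

```python
from statistics import mean


def r_squared_decomposition(x: list[float], y: list[float]) -> tuple[float, float, float, float]:
    """Fit Y = b0 + b1*X by OLS and return (SST, SSR, SSE, R^2)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)  # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse, ssr / sst


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
sst, ssr, sse, r2 = r_squared_decomposition(xs, ys)
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r2, 4))  # 6.0 3.6 2.4 0.6
```

For this data the correlation is r = 6 / √60 ≈ 0.775, and r² = 0.6 matches the R² from the sums of squares, as expected for simple linear regression.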