ASSESSING THE STRENGTH OF THE REGRESSION MODEL
Assessing the Model’s Strength
Although the best straight line through a set of points may have been found, and the assumptions for the error term ε may appear valid, is the resulting regression line useful in predicting y?
[Figure: three example data sets, labeled NO, NO, YES]
STEP 4: HOW GOOD IS THE MODEL?
Can we conclude that there is a linear relation between y and x?
– This is a hypothesis test (t-test).
What proportion of the overall variability in y (from its mean) can be explained by changes in x?
– This is a performance measure called the coefficient of determination, denoted $r^2$.
Can we conclude a linear relation exists between y and x?
We are hypothesizing that y changes linearly with x: $y = \beta_0 + \beta_1 x$. That is, if x goes up by 1, y changes by $\beta_1$. If no linear relation exists, then when x goes up by 1, y does not change, i.e. $\beta_1 = 0$.
The Hypothesis Test
To test whether or not a linear relation exists:
$H_0: \beta_1 = 0$ (no linear relation exists)
$H_A: \beta_1 \neq 0$ (a linear relation does exist)
$\alpha$ = the significance level
Reject $H_0$ (accept $H_A$) if $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$, with degrees of freedom $= n - (\#\text{ betas}) = n - 2$.
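The decision rule can be sketched in Python using only the standard library. The slides read $t_{\alpha/2}$ from a t table; this sketch instead approximates it by integrating the Student's t density numerically (Simpson's rule) and bisecting, which is a swapped-in technique rather than anything in the original. The function names `t_pdf`, `t_cdf`, `t_critical` and `reject_h0` are hypothetical.

```python
import math


def t_pdf(u: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry about 0 (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(alpha: float, df: int) -> float:
    """Upper critical value t_{alpha/2, df}, found by bisection on t_cdf."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1000.0
    for _ in range(70):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject_h0(t: float, alpha: float, n: int) -> bool:
    """Two-tailed test of H0: beta1 = 0 with n - 2 degrees of freedom."""
    crit = t_critical(alpha, n - 2)
    return t > crit or t < -crit


# n = 10 observations gives df = 8, as in the hand calculation.
print(round(t_critical(0.05, 8), 3))  # expected: 2.306, the table value t_{.025,8}
```

A t table is still the quicker route by hand; the numerical version just makes the rule "reject when |t| exceeds the critical value" explicit for any α and n.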
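The t-statistic for the test of $\beta_1 = 0$
$t = \dfrac{b_1 - 0}{s_{b_1}}$, where $b_1$ is the estimated slope and $s_{b_1}$ is its standard error.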
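The t-statistic $t = (b_1 - 0)/s_{b_1}$ can be computed directly from the data. This sketch assumes the standard textbook formulas $b_1 = SS_{xy}/SS_x$, $b_0 = \bar y - b_1 \bar x$, $s_\varepsilon = \sqrt{SSE/(n-2)}$ and $s_{b_1} = s_\varepsilon/\sqrt{SS_x}$, which the slide does not spell out; the data set and the function name `slope_t_statistic` are hypothetical.

```python
import math


def slope_t_statistic(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (b1, s_b1, t) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))  # standard error of estimate
    s_b1 = s_eps / math.sqrt(ss_x)    # standard error of the slope
    t = (b1 - 0) / s_b1               # tests H0: beta1 = 0
    return b1, s_b1, t


# Hypothetical data set with n = 5, so df = 3.
b1, s_b1, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b1 = {b1:.3f}, s_b1 = {s_b1:.4f}, t = {t:.4f}")
```

Here t is about 2.12, below $t_{.025,3} = 3.182$, so with only five points we could not reject $H_0$ at $\alpha = .05$.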
HAND CALCULATIONS
Test: Reject $H_0$ if $t > t_{.025,8} = 2.306$ or $t < -t_{.025,8} = -2.306$.
The computed t exceeds 2.306, so we can conclude $\beta_1 \neq 0$, i.e. a linear relation exists.
95% Confidence Interval for $\beta_1$
(Point Estimate) ± $t_{.025,n-2}$ (Appropriate st’d dev.), i.e. $b_1 \pm t_{.025,n-2}\, s_{b_1}$
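The interval formula is a one-liner once $b_1$ and $s_{b_1}$ are known. In this minimal sketch the critical value is passed in as a number read from a t table, as in the hand calculation; the inputs and the function name `slope_confidence_interval` are hypothetical.

```python
def slope_confidence_interval(b1: float, s_b1: float, t_crit: float) -> tuple[float, float]:
    """(Point estimate) +/- t_{.025, n-2} * (standard error of b1)."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


# Hypothetical inputs: b1 = 2.0, s_b1 = 0.5, n = 10 so t_{.025,8} = 2.306.
lo, hi = slope_confidence_interval(2.0, 0.5, 2.306)
print(f"95% CI for beta1: ({lo:.3f}, {hi:.3f})")
```

This interval, (0.847, 3.153), excludes 0, which agrees with rejecting $H_0: \beta_1 = 0$ in the two-tailed test at $\alpha = .05$.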
Coefficient of Determination: $r^2$
The proportion of the total change in y that can be explained by changes in the x values is called the coefficient of determination, denoted $r^2$.
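Hand Calculation of SSR, SSE, SST
Table columns (one row per observation i): $x_i$, $y_i$, $\hat y_i$, $(\hat y_i - \bar y)^2$, $(y_i - \hat y_i)^2$, $(y_i - \bar y)^2$
SUM row: $SSR = \sum_i (\hat y_i - \bar y)^2$, $SSE = \sum_i (y_i - \hat y_i)^2$, $SST = \sum_i (y_i - \bar y)^2$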
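The hand-calculation table can be mirrored in a few lines: one printed row per observation with the columns $x_i$, $y_i$, $\hat y_i$, $(\hat y_i - \bar y)^2$, $(y_i - \hat y_i)^2$, $(y_i - \bar y)^2$, followed by the sums SSR, SSE and SST. The data and the function name `sums_of_squares` are hypothetical; $b_0 = 2.2$ and $b_1 = 0.6$ are the least-squares estimates for these data.

```python
def sums_of_squares(x: list[float], y: list[float], b0: float, b1: float) -> tuple[float, float, float]:
    """Return (SSR, SSE, SST) for the fitted line y-hat = b0 + b1 * x."""
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    # One row per observation, mirroring the hand-calculation table.
    for xi, yi, yh in zip(x, y, y_hat):
        print(f"{xi:>4} {yi:>4} {yh:6.2f} {(yh - y_bar) ** 2:8.2f} "
              f"{(yi - yh) ** 2:8.2f} {(yi - y_bar) ** 2:8.2f}")
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return ssr, sse, sst


ssr, sse, sst = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], b0=2.2, b1=0.6)
print(f"SUM: SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
```

For these data SSR = 3.6, SSE = 2.4 and SST = 6.0, so SSR + SSE = SST, as it must for a least-squares line with an intercept.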
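Hand Calculation of $r^2$
$r^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$ (the two forms agree because $SST = SSR + SSE$ for a least-squares line)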
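Both forms of $r^2$ can be checked against each other once the column sums are known. This is a minimal sketch; the function names and the sums (SSR = 3.6, SSE = 2.4, SST = 6.0, from a small hypothetical data set) are assumptions for illustration.

```python
def r_squared(ssr: float, sst: float) -> float:
    """Coefficient of determination: share of SST explained by the regression."""
    return ssr / sst


def r_squared_from_sse(sse: float, sst: float) -> float:
    """Equivalent form 1 - SSE/SST (relies on SST = SSR + SSE)."""
    return 1 - sse / sst


# Hypothetical hand-calculation sums.
print(f"{r_squared(3.6, 6.0):.2f} {r_squared_from_sse(2.4, 6.0):.2f}")
```

Both print 0.60: 60% of the variation of y around its mean is explained by changes in x.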
Interpretation of $r^2$
$r^2 = 1$: perfect (positive or negative) relation, i.e. the points fit exactly along the regression line.
$r^2$ close to 0: very little relation.
The higher the value of $r^2$, the better the model fits the data.
Pearson Correlation Coefficient, r
$r = \pm\sqrt{r^2}$, which can also be calculated as $\operatorname{cov}(x,y)/(s_x s_y)$, is called the Pearson correlation coefficient. It is also used to measure the strength of the relation between y and x.
r = −1 means perfect negative correlation (all points fit exactly on a line with negative slope).
r = +1 means perfect positive correlation (all points fit exactly on a line with positive slope).
r = 0 means no (linear) correlation.
Other values give relative strength but have no exact interpretation the way $r^2$ does, so we usually use $r^2$.
When we take the square root of $r^2$ to get r, the sign of r is the sign of $b_1$ (positive or negative slope).
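The covariance form $r = \operatorname{cov}(x,y)/(s_x s_y)$ translates directly into code. In this sketch the data and the function name `pearson_r` are hypothetical; the fitted slope for these data is positive ($b_1 = 0.6$), so r should come out positive, and squaring it should recover $r^2 = 0.6$.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation r = cov(x, y) / (s_x * s_y), using n - 1 divisors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return cov_xy / (s_x * s_y)


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The $n - 1$ divisors cancel, so using $n$ throughout gives the same r. On Python 3.10 or later, `statistics.correlation(x, y)` can serve as a cross-check.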
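EXCEL
The regression output reports:
– R Square: $r^2$
– Multiple R: r, shown without a sign; use "+" if $b_1 > 0$ and "−" if $b_1 < 0$
– SS column of the ANOVA table: SSR (Regression row), SSE (Residual row), SST (Total row)
– Standard Error in the x-variable row: $s_{b_1}$
– t Stat in the x-variable row: t statistic for the $\beta_1$ test
– P-value in the x-variable row: p-value for the $\beta_1$ test
– Lower 95% and Upper 95% in the x-variable row: 95% confidence interval for $\beta_1$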
Steps Using Excel
1. Determine the regression equation: $\hat y = b_0 + b_1 x$, with $b_0$ and $b_1$ read from the Coefficients column.
2. Can you conclude a linear relation exists between y and x? The p-value for the test is $< \alpha = .05$, so YES.
3. What proportion of the overall variation in y is explained by changes in x? This is $r^2$, which is high here.
CONCLUSION: Overall a good model!
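The three Excel steps can be reproduced without Excel. This is a minimal standard-library sketch, not Excel's algorithm: the two-sided p-value is approximated by integrating the Student's t density with Simpson's rule, and the data and the names `t_two_sided_p` and `regression_report` are hypothetical. It assumes the points do not all lie exactly on a line (SSE > 0), since otherwise $s_{b_1} = 0$.

```python
import math


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|) for Student's t, via Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u: float) -> float:
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = 2 * total * h / 3  # approximates P(-|t| <= T <= |t|)
    return max(0.0, 1.0 - central)


def regression_report(x: list[float], y: list[float], alpha: float = 0.05):
    """Mirror the three Excel steps: equation, p-value for beta1, and r^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ss_x)  # requires SSE > 0
    t = b1 / s_b1
    p = t_two_sided_p(t, n - 2)
    r2 = 1 - sse / sst
    print(f"1. Equation: y-hat = {b0:.3f} + {b1:.3f} x")
    print(f"2. p-value for beta1 = {p:.4f} -> linear relation? {'YES' if p < alpha else 'NO'}")
    print(f"3. r^2 = {r2:.3f}")
    return b0, b1, p, r2


regression_report([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

With only five hypothetical points the p-value is above .05, so step 2 answers NO even though $r^2 = 0.6$; the p-value, not $r^2$ alone, decides whether a linear relation can be concluded.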
Review
Can we conclude a linear relation exists?
– Two-tailed t-test of $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$
– Look at the p-value for the x-variable in the Excel output
Computation of a confidence interval for the amount y will change per unit increase in x (i.e. for $\beta_1$)
– By hand
– Printed in the Excel output
What proportion of the overall variation in y is explained by changes in x? ($r^2$)
– By hand
– Printed in the Excel output
Pearson correlation coefficient, r
– Square root of $r^2$
– Sign is the same as $b_1$