Reasoning in Psychology Using Statistics 2015
Announcements
- Quiz 8 due Fri., Apr. 17; includes both correlation and regression.
- Final Project due Wed., April 29 (you should get your cases assigned to you in labs today).
Exam(s) 3
- Lecture Exam 3 mean: 53.6 (53.6/75 = 71.4%)
- Lab Exam 3 mean: 61.0 (61.0/75 = 81.3%)
- Combined Exam 3 mean: 116.1 (116.1/150 = 77.4%)
Regression
- Regression procedures can be used to predict the response variable based on the explanatory variable(s).
- Suppose that you notice that the more you study for an exam, the better your score typically is. This suggests that there is a relationship between the variables.
- You can use this relationship to predict test performance based on study time.
- [Plot: test performance vs. study time]
Decision tree
- Two variables → relationship between variables → quantitative variables → making predictions based on the form of the relationship → Regression
- Regression: describing the nature of the relationship between variables for the purposes of prediction.
Regression
- For correlation, “it doesn’t matter which variable goes on the X-axis or the Y-axis.” For regression this is NOT the case.
- The variable that you are predicting (the response, or predicted, variable) goes on the Y-axis (e.g., quiz performance).
- The variable that you are making the prediction based on (the explanatory, or predictor, variable) goes on the X-axis (e.g., hours of study).
- [Plot: quiz performance (Y) vs. hours of study (X)]
Regression
- For correlation: “imagine a line through the points.” But there are lots of possible lines.
- One line is the “best fitting line.”
- Today: learn how to compute the equation corresponding to this “best fitting line.”
- [Plot: quiz performance (Y) vs. hours of study (X), with several candidate lines]
Regression: a brief review of geometry
- Y = (X)(slope) + (intercept)
- The intercept is the value of Y when X = 0 (here, 2.0).
- [Plot: a line crossing the Y-axis at 2.0]
Regression: a brief review of geometry
- Y = (X)(slope) + (intercept)
- slope = (change in Y) / (change in X); here Y changes by 0.5 when X changes by 1 (e.g., from X = 1 to X = 2), so the slope is 0.5.
- [Plot: the same line, with the rise over run marked]
Regression: a brief review of geometry
- Y = (X)(slope) + (intercept)
- Y = (X)(0.5) + 2.0
- In regression analysis this line (or the equation that describes it) represents our predicted values of Y given particular values of X.
Regression: a brief review of geometry
- Consider a perfect correlation. If X = 5, what is Y?
- Y = (X)(0.5) + 2.0 = (5)(0.5) + 2.0 = 2.5 + 2.0 = 4.5
- We can make specific predictions about Y based on X.
Regression
- Consider a less-than-perfect correlation. The line still represents the predicted values of Y given X.
- If X = 5: Y = (X)(0.5) + 2.0 = (5)(0.5) + 2.0 = 2.5 + 2.0 = 4.5
- [Plot: scattered points around the line, with the prediction at X = 5 marked]
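A minimal Python sketch of this prediction step, assuming the 0.5 slope and 2.0 intercept from the geometry review (the function name is illustrative, not from the slides):

```python
# Predicting Y from X with the line used in the geometry review:
# Y-hat = (X)(0.5) + 2.0.

def predict(x, slope=0.5, intercept=2.0):
    """Return the predicted Y for a given X on the line Y = slope * X + intercept."""
    return slope * x + intercept

print(predict(5))  # 4.5, matching the worked example above
```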
Regression
- The “best fitting line” is the one that minimizes the differences (errors, or residuals) between the predicted scores (the line) and the actual scores (the points).
- Rather than comparing the errors from different candidate lines and picking the best, we will directly compute the equation for the best fitting line.
- [Plot: data points with vertical distances to a candidate line]
Example
Using the dataset from our correlation lecture. Suppose that you notice that the more you study for an exam (X = hours of study), the better your exam score typically is (Y = exam score). Compute the regression equation predicting exam score with study time.

          X    Y
    A     6    6
    B     1    2
    C     5    6
    D     3    4
    E     3    2
Example (cont.)

          X    Y    X-X̄   (X-X̄)²   Y-Ȳ   (Y-Ȳ)²   (X-X̄)(Y-Ȳ)
    A     6    6     2.4    5.76     2.0    4.0        4.8
    B     1    2    -2.6    6.76    -2.0    4.0        5.2
    C     5    6     1.4    1.96     2.0    4.0        2.8
    D     3    4    -0.6    0.36     0.0    0.0        0.0
    E     3    2    -0.6    0.36    -2.0    4.0        1.2
    mean 3.6  4.0
    sum              0.0   15.20     0.0   16.0       14.0
                           (SSX)           (SSY)      (SP)
Example (cont.)
From the table: SSX = 15.20, SSY = 16.0, SP = 14.0, X̄ = 3.6, Ȳ = 4.0.
Slope: b = SP / SSX = 14.0 / 15.20 ≈ 0.92
Example (cont.)
Intercept: a = Ȳ - b(X̄) = 4.0 - (0.92)(3.6) = 4.0 - 3.312 = 0.688
Regression equation: Ŷ = (0.92)X + 0.688
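A minimal Python sketch of the hand computation above (not part of the original slides). Note that exact arithmetic gives a slope of about 0.921; the slides round to 0.92 before computing the intercept, which is why they report 0.688:

```python
# Hand computation of the regression equation for the example data (A-E).
X = [6, 1, 5, 3, 3]   # hours of study
Y = [6, 2, 6, 4, 2]   # exam score
n = len(X)

mean_x = sum(X) / n   # 3.6
mean_y = sum(Y) / n   # 4.0

SSX = sum((x - mean_x) ** 2 for x in X)                       # 15.20
SSY = sum((y - mean_y) ** 2 for y in Y)                       # 16.00
SP = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))   # 14.00

slope = SP / SSX                      # ~0.921 (the slides round this to 0.92)
intercept = mean_y - slope * mean_x   # ~0.684 (0.688 if the rounded slope is used)

print(f"Y-hat = {slope:.3f} * X + {intercept:.3f}")
```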
Example (cont.)
[Plot: the five data points (A–E) with the regression line drawn through them; X̄ = 3.6, Ȳ = 4.0]
Example (cont.)
The two means will be on the line: the best fitting line always passes through (X̄, Ȳ). Check: (0.92)(3.6) + 0.688 = 3.312 + 0.688 = 4.0 = Ȳ.
[Plot: the regression line passing through the point (3.6, 4.0)]
Example (cont.)
Hypothesis tests can be carried out on each piece of the regression equation: the slope and the intercept (next slides).
Hypothesis testing with Regression
- SPSS Regression output gives you a lot of stuff.
Hypothesis testing with Regression
- SPSS Regression output gives you a lot of stuff.
- Make sure you put the variables in the correct roles: the response (predicted) variable is the dependent variable, and the explanatory (predictor) variable is the independent variable.
Hypothesis testing with Regression
- SPSS Regression output gives you a lot of stuff.
- In the coefficients table, the unstandardized coefficients are the regression equation: “(Constant)” is the intercept, and the row labeled with the predictor’s variable name is the slope.
- The t-tests on these coefficients test the hypotheses H0: intercept (constant) = 0 and H0: slope = 0.
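If you want to check the SPSS coefficients outside of SPSS, here is a sketch using scipy’s linregress, which reports the slope, the intercept, and the p-value for the test of H0: slope = 0 (the use of scipy is an assumption; the lab itself uses SPSS):

```python
# Reproducing the key numbers from the SPSS coefficients table with scipy.
from scipy import stats

X = [6, 1, 5, 3, 3]   # hours of study
Y = [6, 2, 6, 4, 2]   # exam score

result = stats.linregress(X, Y)
print("slope            :", round(result.slope, 3))      # ~0.921
print("intercept        :", round(result.intercept, 3))  # ~0.684
print("r                :", round(result.rvalue, 3))     # ~0.898
print("p-value for slope:", round(result.pvalue, 3))     # tests H0: slope = 0
```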
Measures of Error in Regression
- The linear equation isn’t the whole thing; we also need a measure of error.
- Y = (X)(0.5) + 2.0 + error
- Two datasets can have the same best fitting line but different relationships (a difference in strength), so the same line can come with more or less error.
- [Plots: two scatterplots with the same best fitting line but different spread around it]
Measures of Error in Regression
- The linear equation isn’t the whole thing; we also need a measure of error.
- Three common measures of error:
  - r² (r-squared)
  - Sum of the squared residuals = SSresidual = SSerror
  - Standard error of the estimate
Measures of Error in Regression
- r-squared (r²) represents the percent of variance in Y accounted for by X.
- r = 0.8 → r² = 0.64: 64% of the variance in Y is explained by X.
- r = 0.5 → r² = 0.25: 25% of the variance in Y is explained by X.
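As a worked check for the running example (not shown on this slide), the correlation and r² follow from the SP, SSX, and SSY values computed earlier:

```python
# r and r-squared for the study-time example, from the sums computed earlier.
import math

SP, SSX, SSY = 14.0, 15.2, 16.0
r = SP / math.sqrt(SSX * SSY)
print(round(r, 2), round(r ** 2, 2))  # ~0.90 and ~0.81: about 81% of the
                                      # variance in exam scores is accounted for
```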
Measures of Error in Regression
- Sum of the squared residuals = SSresidual = SSerror
- Compute the difference between the predicted values and the observed values (the “residuals”), square the differences, and add up the squared differences.
Measures of Error in Regression
Sum of the squared residuals = SSresidual = SSerror. Start with the predicted values of Y (the points on the line) for each X:

          X    Y    Ŷ (predicted)
          6    6
          1    2
          5    6
          3    4
          3    2
    mean 3.6  4.0
Measures of Error in Regression
Fill in the predicted values using the regression equation Ŷ = (0.92)X + 0.688:

          X    Y    Ŷ
          6    6    6.2   = (0.92)(6) + 0.688
          1    2
          5    6
          3    4
          3    2
    mean 3.6  4.0
Measures of Error in Regression

          X    Y    Ŷ
          6    6    6.2   = (0.92)(6) + 0.688
          1    2    1.6   = (0.92)(1) + 0.688
          5    6    5.3   = (0.92)(5) + 0.688
          3    4    3.45  = (0.92)(3) + 0.688
          3    2    3.45  = (0.92)(3) + 0.688
    mean 3.6  4.0
Measures of Error in Regression
[Plot: the observed points and the predicted values (6.2, 1.6, 5.3, 3.45, 3.45) marked on the regression line]
Measures of Error in Regression
Residuals = observed Y - predicted Ŷ:

          X    Y    Ŷ       residual
          6    6    6.2     6 - 6.2  = -0.20
          1    2    1.6     2 - 1.6  =  0.40
          5    6    5.3     6 - 5.3  =  0.70
          3    4    3.45    4 - 3.45 =  0.55
          3    2    3.45    2 - 3.45 = -1.45
    mean 3.6  4.0           sum      =  0.00  (quick check: residuals sum to zero)
Measures of Error in Regression
Square the residuals and add them up:

          X    Y    Ŷ       residual   residual²
          6    6    6.2      -0.20       0.04
          1    2    1.6       0.40       0.16
          5    6    5.3       0.70       0.49
          3    4    3.45      0.55       0.30
          3    2    3.45     -1.45       2.10
    mean 3.6  4.0   sum       0.00       3.09 = SSerror
Measures of Error in Regression
Compare SSerror with SSY: SSerror = 3.09 is the variability in Y left unexplained by the line, while SSY = 16.0 is the total variability in Y. Most of the variability in exam scores is accounted for by the regression line.
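A minimal Python sketch of the residual computation above (not part of the original slides). It uses the rounded line Ŷ = 0.92X + 0.688 from the slides; because the slides also round the predicted values before squaring, they report 3.09 where the unrounded arithmetic gives about 3.11:

```python
# Residuals and SS_error for the example data, using the slides' rounded line.
X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]
slope, intercept = 0.92, 0.688

predicted = [slope * x + intercept for x in X]             # ~6.2, 1.6, 5.3, 3.45, 3.45
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]

print("sum of residuals:", round(sum(residuals), 2))       # 0.0 (quick check)

SS_error = sum(r ** 2 for r in residuals)
print("SS_error:", round(SS_error, 2))                     # ~3.11 (3.09 on the slides)

SSY = 16.0                                                 # total variability in Y
print("unexplained proportion:", round(SS_error / SSY, 2)) # ~0.19
```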
Measures of Error in Regression
- The standard error of the estimate represents the average deviation from the line.
- Standard error of the estimate = sqrt(SSerror / df), where df = n - 2.
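A quick worked check for the example data (not shown on the slide), using the SSerror of 3.09 computed earlier and n = 5:

```python
# Standard error of the estimate = sqrt(SS_error / (n - 2)).
import math

SS_error, n = 3.09, 5
print(round(math.sqrt(SS_error / (n - 2)), 2))  # ~1.01: on average the scores
                                                # fall about 1 point from the line
```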
Measures of Error in Regression
SPSS Regression output also reports these measures of error:
- r²: the percent of variance in Y accounted for by X
- Standard error of the estimate: the average deviation from the line
- SSresiduals (SSerror)
In lab
You’ll practice computing the regression equation and error for the “best fitting line” (by hand and using SPSS).