Regression

Correlation and regression are closely related, both in how they are used and in their math. Correlation summarizes the relationship between two variables. Regression is used to predict values of one variable from values of the other (e.g., using SAT scores to predict GPA).

Basic Ideas (2)
Sample value: Y = a + bX + e
Intercept (a): the value of Y where X = 0.
Slope (b): the change in Y if X changes 1 unit. Rise over run.
If error is removed, we have a predicted value for each person at X (the line): Y' = a + bX
Suppose on average houses are worth about $75.00 a square foot. Then the equation relating price to size would be Y' = 0 + 75X. The predicted price for a 2000 square foot house would be $150,000.

Linear Transformation
1 to 1 mapping of variables via a line.
Permissible operations are addition and multiplication (interval data): add a constant; multiply by a constant.

Linear Transformation (2) Centigrade to Fahrenheit
Note the 1 to 1 map. Intercept? Slope?
[Figure: Degrees F (0 to 240) plotted against Degrees C (0 to 120), with points marked at 0 degrees C, 32 degrees F and at 100 degrees C, 212 degrees F.]
Intercept is 32. When X (Cent) is 0, Y (Fahr) is 32. Slope is 1.8. When Cent goes from 0 to 100 (run), Fahr goes from 32 to 212 (rise), and 212-32 = 180. Then 180/100 = 1.8 is rise over run, which is the slope. Y = 32+1.8X. F = 32+1.8C.
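The rise-over-run calculation can be sketched in a few lines of Python. This is only an illustration: the helper name line_through is made up for this example, and the two points are the ones marked on the Centigrade/Fahrenheit plot.

```python
def line_through(p1, p2):
    """Return (intercept, slope) of the line through two (x, y) points.

    Hypothetical helper for illustration; assumes the two x values differ.
    """
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # value of Y when X is 0
    return intercept, slope


# The two marked points: (0 C, 32 F) and (100 C, 212 F)
a, b = line_through((0, 32), (100, 212))


def c_to_f(c):
    """Linear transformation F = a + b*C."""
    return a + b * c


print(a, b)          # intercept and slope
print(c_to_f(100))   # boiling point in Fahrenheit
```

Because the map is a line, each Centigrade value gives exactly one Fahrenheit value, which is the 1 to 1 property noted on the slide.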

Regression Line (1) Basics 1. Passes through both means (the point at the mean of X and the mean of Y). 2. Passes close to the points. Note the errors. 3. Described by an equation.

Regression Line (2) Slope The equation for a line is Y=mX+b in algebra. In regression, the equation is usually written Y=a+bX. Y is the DV (weight), X is the IV (height), a is the intercept (-327) and b is the slope (7.15). The slope, b, indicates rise over run. It tells how many units Y changes for a 1 unit change in X. In our example, the slope is a bit over 7, so a change of 1 inch is expected to produce a change of a bit more than 7 pounds.

Regression Line (3) Intercept The Y intercept, a, tells where the line crosses the Y axis; it is the value of Y when X is zero. The intercept is calculated by: a = M_Y - b*M_X, where M_Y and M_X are the means of Y and X. Sometimes the intercept has meaning; sometimes not. It depends on the meaning of X=0. In our example, the intercept is -327. This means that if a person were 0 inches tall, we would expect them to weigh -327 lbs. Nonsense. But if X were the number of smiles, then a would have meaning.

Correlation & Regression Correlation & regression are closely related.
1. The correlation coefficient is the slope of the regression line if X and Y are measured as z scores. It is interpreted as the change in Y, in SD units, for a change of 1 SD in X.
2. For raw scores, the slope is: b = r*(SD_Y/SD_X). The slope for raw scores is the correlation times the ratio of the 2 standard deviations. (These SDs are computed with (N-1), not N.) In our example, the correlation was .96, so the slope can be found by b = .96*(33.95/4.54) = .96*7.48 = 7.18, which agrees with the slope of 7.15 up to rounding in r. Recall that a = M_Y - b*M_X. Our intercept is 150.7 - 7.15*66.8 ≈ -327.

Correlation & Regression (2) 3. The regression equation is used to make predictions. The formula to do so is just: Y' = a + bX. Suppose someone is 68 inches tall. Predicted weight is -327+7.15*68 = 159.2 pounds.
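The three steps above (slope from the correlation, intercept from the means, prediction from the line) can be sketched in Python using the numbers quoted on the slides. The variable and function names are chosen for this example only, and the slope of 7.15 is taken from the slides rather than recomputed.

```python
# Summary statistics quoted on the slides (height in inches, weight in pounds)
r = 0.96
sd_x, sd_y = 4.54, 33.95
mean_x, mean_y = 66.8, 150.7

# Slope for raw scores: correlation times the ratio of the two SDs.
# With r rounded to .96 this comes out slightly above the reported 7.15.
slope_from_r = r * sd_y / sd_x

# Intercept: mean of Y minus slope times mean of X, using the reported slope
slope = 7.15
intercept = mean_y - slope * mean_x


def predict_weight(height):
    """Predicted weight Y' = a + bX for a given height."""
    return intercept + slope * height


print(round(slope_from_r, 2))
print(round(intercept, 1))
print(round(predict_weight(68), 1))
```

The prediction here uses the unrounded intercept, so it can differ from the slide's 159.2 by a few hundredths of a pound.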

Review What is the slope? What does it tell or mean? What is the intercept? What does it tell or mean? How are the slope of the regression line and the correlation coefficient related? What is the main use of the regression line?

Test Questions
[Figure: four panels labeled A, B, C, D.]
What is the approximate value of the intercept for Figure C? a. 0 b. 10 c. 15 d. 20

Test Questions In a regression line, the equation used is typically Y = a + bX. What does the value a stand for? independent variable; intercept; predicted value (DV); slope

Regression of Weight on Height
      Ht      Wt
      61      105
      62      120
      63      120
      65      160
      65      120
      68      145
      69      175
      70      160
      72      185
      75      210
N=10
M     67      150
SD    4.57    33.99
Correlation (r) = .94. Regression equation: Y' = -316.86 + 6.97X
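As a check on the numbers for this example, the slope, intercept, and correlation can be recomputed from the ten height and weight pairs. The sketch below uses plain Python and sums of squares and cross products; the function name fit_line is an assumption made for this example.

```python
heights = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
weights = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]


def fit_line(x, y):
    """Least-squares line of y on x. Returns (intercept, slope, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)                     # sum of squares, X
    syy = sum((yi - my) ** 2 for yi in y)                     # sum of squares, Y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # cross products
    slope = sxy / sxx
    intercept = my - slope * mx        # line passes through both means
    r = sxy / (sxx * syy) ** 0.5       # Pearson correlation
    return intercept, slope, r


a, b, r = fit_line(heights, weights)
print(round(a, 2), round(b, 2), round(r, 2))
```

The (N-1) terms cancel in the ratios, so the same slope and r result whether the SDs are computed with N or N-1.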

Predicted Values & Errors
N           Ht      Wt        Y'        Error
1           61      105       108.19    -3.19
2           62      120       115.16    4.84
3           63      120       122.13    -2.13
4           65      160       136.06    23.94
5           65      120       136.06    -16.06
6           68      145       156.97    -11.97
7           69      175       163.94    11.06
8           70      160       170.91    -10.91
9           72      185       184.84    0.16
10          75      210       205.75    4.25
M           67      150       150.00    0.00
SD          4.57    33.99     31.85     11.89
Variance    20.89   1155.56   1014.37   141.32
Numbers for the linear part and the error. Note the M of Y' and of the residuals. Note that the variance of Y is V(Y') + V(res).
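The two notes under the table (the residuals average zero, and the variance of Y splits into a predicted part and an error part) can be checked with a short Python sketch. It refits the line from the raw data and uses the standard library's statistics.variance, which divides by N-1. Results may differ from the table in the first or second decimal place, presumably because the table was built from rounded values.

```python
from statistics import mean, variance

heights = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
weights = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

# Fit the least-squares line of weight on height
mx, my = mean(heights), mean(weights)
slope = (sum((x - mx) * (y - my) for x, y in zip(heights, weights))
         / sum((x - mx) ** 2 for x in heights))
intercept = my - slope * mx

# Predicted value (the linear part) and error for each person
predicted = [intercept + slope * x for x in heights]
residuals = [y - p for y, p in zip(weights, predicted)]

var_y = variance(weights)        # total variance of Y (N-1 in the denominator)
var_pred = variance(predicted)   # variance of the linear part
var_res = variance(residuals)    # error variance

print(round(mean(predicted), 2), round(mean(residuals), 2))
print(round(var_y, 2), round(var_pred, 2), round(var_res, 2))
```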

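Error variance The error variance is the variance of the residuals, Y - Y'. In our example, the table of predicted values and errors gives a residual variance of 141.32. Standard error of the estimate: the square root of the error variance, roughly the average distance of the scores from their predictions. In our example, the table gives a residual SD of 11.89. (Heiman’s notation for error is not standard.)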

Variance Accounted for The basic idea is to try to maximize r-square, the proportion of the variance in Y that is accounted for by X. The closer this value is to 1.0, the more accurate the predictions will be. (Heiman’s notation for error is not standard.)
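For the height and weight example, r-square and the standard error of the estimate can be read off the variances in the predicted values table. The short Python sketch below does the arithmetic; the variable names are made up for this example, and the variances are the (N-1) values from that table.

```python
import math

# Variances from the predicted values table (computed with N-1)
var_y = 1155.56      # variance of observed weight
var_pred = 1014.37   # variance of predicted weight, V(Y')
var_res = 141.32     # variance of the errors, V(res)

# Proportion of variance accounted for, computed two equivalent ways
r_squared = var_pred / var_y
r_squared_from_error = 1 - var_res / var_y

# Standard error of the estimate: square root of the error variance
std_error_estimate = math.sqrt(var_res)

print(round(r_squared, 3), round(r_squared_from_error, 3))
print(round(std_error_estimate, 2))
```

Both versions of r-square land close to the square of the reported correlation (.94), with small differences due to rounding.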

Sample Exam Data from Previous Class
Exam 1    Exam 2
86.00     56.00
98.00     70.00
70.00     76.00
84.00     82.00
82.00     74.00
92.00     94.00
92.00     78.00
72.00     56.00
96.00     66.00
82.00     72.00
A sample of 10 scores from both exams. Assuming these are representative, what can you say about the exams? The students?
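One way to start on the question is to summarize the ten pairs of scores. The sketch below is only a starting point under the assumption that each row is one student's Exam 1 and Exam 2 score; the helper name pearson_r is made up for this example.

```python
from statistics import mean, stdev

exam1 = [86, 98, 70, 84, 82, 92, 92, 72, 96, 82]
exam2 = [56, 70, 76, 82, 74, 94, 78, 56, 66, 72]


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Central tendency and spread for each exam (stdev divides by N-1)
print("Exam 1:", mean(exam1), round(stdev(exam1), 2))
print("Exam 2:", mean(exam2), round(stdev(exam2), 2))

# How strongly the two exams go together in this small sample
print("r:", round(pearson_r(exam1, exam2), 2))
```

With only 10 cases, these figures are rough; the full-class descriptives and correlation are based on many more students.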

Scatterplot & Boxplots of 2 Exams
[Figure: scatterplot and boxplots for Exam 1 and Exam 2.]

Descriptive Stats
Descriptives
                          Statistic    Std. Error
Exam1   Mean              83.4412      .89508
        Median            86.0000
        Variance          108.959
        Std. Deviation    10.43837
        Minimum           52.00
        Maximum           100.00
        Range             48.00
Exam2   Mean              70.7721      1.27332
        Median            72.0000
        Variance          220.503
        Std. Deviation    14.84935
        Minimum           24.00
        Maximum           100.00
        Range             76.00

Correlations
                               Exam1     Exam2
Exam1   Pearson Correlation    1         .420**
        Sig. (2-tailed)                  .000
        N                      165       136
Exam2   Pearson Correlation    .420**    1
        Sig. (2-tailed)        .000
        N                      136       139
**. Correlation is significant at the 0.01 level (2-tailed).

Scatterplot with means and regression line
Coefficients (a)
                  Unstandardized Coefficients    Standardized Coefficients
Model             B          Std. Error          Beta                         t        Sig.
1   (Constant)    20.895     9.377                                            2.228    .028
    Exam1         .598       .112                .420                         5.360    .000
a. Dependent Variable: Exam2
Note that the correlation, r, is .42 and the squared correlation, R-square, is .177. R-square is also the variance accounted for. We can predict a bit less than 20 percent of the variance in Exam 2 from Exam 1.

Predicted Scores
Coefficients (a)
                  Unstandardized Coefficients    Standardized Coefficients
Model             B          Std. Error          Beta                         t        Sig.
1   (Constant)    20.895     9.377                                            2.228    .028
    Exam1         .598       .112                .420                         5.360    .000
a. Dependent Variable: Exam2
Predicted Exam 2 = 20.895 + .598*Exam1
For example, if I got 85 on Exam 1, then my predicted score for Exam 2 is 20.895 + .598*85 = 71.73, or about 72 percent.
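The prediction, and the link between the slope and the correlation, can be sketched in Python using only numbers reported in the output for these exams. The function name predict_exam2 is made up for this example. Because r is rounded to .420, the recomputed slope and R-square agree with the reported .598 and .177 only to about three decimals.

```python
# Unstandardized coefficients from the regression output
intercept = 20.895
slope = 0.598


def predict_exam2(exam1_score):
    """Predicted Exam 2 = intercept + slope * Exam 1."""
    return intercept + slope * exam1_score


print(round(predict_exam2(85), 2))

# Slope as correlation times the ratio of the SDs (values from the
# correlation and descriptives output)
r = 0.420
sd_exam1, sd_exam2 = 10.43837, 14.84935
slope_from_r = r * sd_exam2 / sd_exam1
print(round(slope_from_r, 3))

# Variance accounted for is the squared correlation
r_squared = r ** 2
print(round(r_squared, 3))
```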
