BPK 304W Regression Linear Regression Least Sum of Squares Assumptions about the relationship between Y and X Standard Error of Estimate Multiple Regression Standardized Regression
Prediction Can we predict one variable from another? Linear Regression Analysis Y = mX + c m = slope; c = intercept Regression
Linear Regression Correlation Coefficient (r) how well the line fits Standard Error of Estimate (S.E.E.) how well the line predicts Regression
Least Sum of Squares Curve Fitting (Residual) Regression
Assumptions about the relationship between Y and X For each value of X there is a normal distribution of Y from which the sample value of Y is drawn The population of values of Y corresponding to a selected X has a mean that lies on the straight line In each population the standard deviation of Y about its mean has the same value
Standard Error of Estimate measure of how well the equation predicts Y has units of Y true score 68.26% of time is within plus or minus 1 SEE of predicted score Standard deviation of the normal distribution of residuals Regression
Right Hand L. = 0.99Left Hand L. + 0.254 r = 0.94 S.E.E. = 0.38cm Regression Kin 304 Spring 2009
How good is my equation? Regression equations are sample specific Cross-validation Studies Test your equation on a different sample Split sample studies Take a 50% random sample and develop your equation then test it on the other 50% of the sample Regression
Multiple Regression More than one independent variable Y = m1X1 + m2X2 + m3X3 …… + c Same meaning for r, and S.E.E., just more measures used to predict Y Stepwise regression variables are entered into the equation based upon their relative importance Regression
Building a multiple regression equation X1 has the highest correlation with Y, therefore it would be the first variable included in the equation. X3 has a higher correlation with Y than X2. However, X2 would be a better choice than X3. to include in an equation with X1, to predict Y. X2 has a low correlation with X1 and explains some of the variance that X1 does not. X3 Y X1 X2
Standardized Regression The numerical value is of mn is dependent upon the size of the independent variable Y = m1X1 + m2X2 + m3X3 …… + c Variables are transformed into standard scores before regression analysis, therefore mean and standard deviation of all independent variables are 0 and 1 respectively. The numerical value of zmn now represents the relative importance of that independent variable to the prediction Y = zm1X1 + zm2X2 + zm3X3 …… + c Regression