BPK 304W Correlation
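Correlation Coefficient (r)
- The correlation coefficient (r) is a measure of association between two variables.
- It varies from -1 to +1.
- r is the ratio of the shared variability of X and Y (their covariance) to the product of their separate variabilities (standard deviations).
- 0 = no relationship; +1 or -1 = perfect relationship (positive or negative).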
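As a minimal sketch of how r can be computed, the function below uses the standard Pearson definition: the covariance of X and Y divided by the product of their standard deviations. The function name `pearson_r` and the sample values are hypothetical and are not taken from the course material.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (numerator of the covariance)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)                                 # about 0.775
perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])   # -1: a perfect negative relationship
```

The common factor 1/(n - 1) in the covariance and the standard deviations cancels, so the sketch works directly with sums of deviations.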
Linear Fit
- A high correlation coefficient does not mean a linear fit.
Correlation does not mean causation
- Spurious correlations: coincidental correlations between two unrelated variables.
- A study of boys aged 6 to 18 years produced correlations of standing broad jump with other measures. Which had the highest correlation?
Range of the Data affects the Correlation Coefficient
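The effect of range can be illustrated with a small simulation. This is only a sketch on synthetic data: the seed, the sample size and the rule "keep cases with x above 0" are arbitrary assumptions, and `pearson_r` is a hypothetical helper.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(304)  # fixed seed so the sketch is repeatable

# Synthetic sample: y depends on x plus random noise
x = [random.gauss(0, 1) for _ in range(5000)]
y = [xi + random.gauss(0, 1) for xi in x]
r_full = pearson_r(x, y)

# Restrict the range: keep only cases with above-average x
restricted = [(xi, yi) for xi, yi in zip(x, y) if xi > 0]
x_r = [p[0] for p in restricted]
y_r = [p[1] for p in restricted]
r_restricted = pearson_r(x_r, y_r)

print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The underlying relationship between x and y is identical in both cases; only the spread of x in the sample differs, and the restricted sample gives the lower r.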
Correlation coefficient depends upon the orientation of the two groups
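A hedged illustration of the group effect, using made-up data rather than the data behind the slide: within each of two hypothetical groups the correlation is perfectly negative, yet pooling the groups produces a strong positive correlation because of where the groups sit relative to each other.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical groups: group B scores higher than group A on both variables
group_a_x, group_a_y = [1, 2, 3], [3, 2, 1]
group_b_x, group_b_y = [7, 8, 9], [9, 8, 7]

r_a = pearson_r(group_a_x, group_a_y)   # -1 within group A
r_b = pearson_r(group_b_x, group_b_y)   # -1 within group B
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)  # about +0.86
```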
Significance of the Correlation Coefficient
- The critical value of the correlation coefficient is determined by the sample size.
- Bigger sample size = lower critical value of r.
- Statistical significance of r does not imply “practical significance”.
Table 2-4.2: Critical Values of the Correlation Coefficient
Degrees of   Probability      Degrees of   Probability
Freedom      0.05    0.01     Freedom      0.05    0.01
1            .997    1.000    24           .388    .496
2            .950    .990     25           .381    .487
3            .878    .959     26           .374    .478
4            .811    .917     27           .367    .470
5            .754    .874     28           .361    .463
6            .707    .834     29           .355    .456
7            .666    .798     30           .349    .449
8            .632    .765     35           .325    .418
9            .602    .735     40           .304    .393
10           .576    .708     45           .288    .372
11           .553    .684     50           .273    .354
12           .532    .661     60           .250
13           .514    .641     70           .232    .302
14           .497    .623     80           .217    .283
15           .482    .606     90           .205    .267
16           .468    .590     100          .195    .254
17                   .575     125          .174    .228
18           .444    .561     150          .159    .208
19           .433    .549     200          .138    .181
20           .423    .537     300          .113    .148
21           .413    .526     400          .098    .128
22           .404    .515     500          .088    .115
23           .396    .505     1,000        .062    .081
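A sketch of how the table might be used. It assumes df = n - 2, the usual choice for a Pearson correlation (the slides do not state it), and copies only a few 0.05 rows from Table 2-4.2. The names `CRITICAL_R_05` and `is_significant` are hypothetical.

```python
# Critical values of r at probability 0.05, copied from selected rows of Table 2-4.2
CRITICAL_R_05 = {10: 0.576, 20: 0.423, 30: 0.349, 50: 0.273, 100: 0.195}

def is_significant(r, n, table=CRITICAL_R_05):
    """Return True if |r| meets or exceeds the tabled critical value.

    Assumes df = n - 2 and that this df is one of the rows copied into
    `table`; any other df raises KeyError.
    """
    df = n - 2
    return abs(r) >= table[df]

# The same r = 0.40 is not significant with 12 subjects but is with 52
small_sample = is_significant(0.40, 12)   # df = 10, critical value .576
large_sample = is_significant(0.40, 52)   # df = 50, critical value .273
```

The two calls show the point made on the preceding slide: a bigger sample lowers the critical value, so the same r can cross the threshold.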
Coefficient of Determination: R squared (r²)
- The circle represents the total variance in the measure (Weight).
- Correlation of Weight with Arm Girth: r = 0.5, r² = 0.25.
- Therefore 25% of the variance in weight is explained by arm girth; the remaining 75% is unexplained.
Correlation Matrix: correlations between all variables
- Weight vs Arm Girth: r = 0.5, r² = 0.25
- Weight vs Calf Girth: r = 0.6, r² = 0.36
- Arm Girth vs Calf Girth: r = 0.4, r² = 0.16
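The arithmetic behind these values can be checked with a short sketch. The r values are the ones on the slide; the dictionary layout and the function name `variance_explained` are hypothetical.

```python
# Pairwise correlations as given on the slide
correlations = {
    ("Weight", "Arm Girth"): 0.5,
    ("Weight", "Calf Girth"): 0.6,
    ("Arm Girth", "Calf Girth"): 0.4,
}

def variance_explained(r):
    """Coefficient of determination (r squared) as a percentage."""
    return round(r ** 2 * 100, 1)

explained = {pair: variance_explained(r) for pair, r in correlations.items()}

for (a, b), pct in explained.items():
    print(f"{a} vs {b}: {pct}% explained, {round(100 - pct, 1)}% unexplained")
```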
BPK 304W Regression
Prediction
- Can we predict one variable from another?
- Linear Regression Analysis: Y = mX + c
- m = slope; c = intercept
Linear Regression
- Correlation Coefficient (r): how well the line fits
- Standard Error of Estimate (S.E.E.): how well the line predicts
Least Sum of Squares Curve Fitting
- The fitted line is the one that minimizes the sum of the squared residuals (observed Y minus predicted Y).
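A minimal sketch of a least-squares fit for Y = mX + c, using the standard closed-form expressions for the slope and intercept. The data and the function names `fit_line` and `sum_squared_residuals` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept c for Y = mX + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    c = mean_y - m * mean_x   # the fitted line passes through the two means
    return m, c

def sum_squared_residuals(x, y, m, c):
    """Sum of squared differences between observed and predicted Y."""
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

m, c = fit_line(x, y)                                 # m = 0.6, c = 2.2
ss_best = sum_squared_residuals(x, y, m, c)           # 2.4
ss_other = sum_squared_residuals(x, y, 0.8, 1.6)      # 2.8 for a different line
```

The second line also passes through the point of means, yet its sum of squared residuals is larger, which is what "least sum of squares" refers to.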
Assumptions about the relationship between Y and X
- For each value of X there is a normal distribution of Y from which the sample value of Y is drawn.
- The population of values of Y corresponding to a selected X has a mean that lies on the straight line.
- In each population the standard deviation of Y about its mean has the same value.
Standard Error of Estimate
- A measure of how well the equation predicts Y.
- Has the units of Y.
- The true score is within plus or minus 1 S.E.E. of the predicted score 68.26% of the time.
- It is the standard deviation of the normal distribution of residuals.
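A sketch of the S.E.E. calculation for a one-predictor equation. It assumes the common n - 2 denominator, which the slides do not state. The observed and predicted values are hypothetical, with the predictions coming from the line Y = 0.6X + 2.2 at X = 1 to 5.

```python
import math

def standard_error_of_estimate(observed, predicted):
    """Standard deviation of the residuals, using n - 2 degrees of freedom
    (an assumption appropriate to a single-predictor equation)."""
    n = len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_residual / (n - 2))

# Hypothetical observed scores and the scores predicted by Y = 0.6X + 2.2
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

see = standard_error_of_estimate(observed, predicted)   # about 0.89, in the units of Y
```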
Right Hand L. = 0.99 × Left Hand L. + 0.254
r = 0.94; S.E.E. = 0.38 cm
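As a worked example, the equation on this slide can be applied to a single measurement. The slope, intercept and S.E.E. are the slide's values. The left-hand measurement of 18.0 cm and the function names are hypothetical.

```python
SLOPE = 0.99       # from the slide
INTERCEPT = 0.254  # from the slide
SEE_CM = 0.38      # standard error of estimate from the slide, in cm

def predict_right_hand(left_cm):
    """Predicted right-hand measurement from the left-hand measurement."""
    return SLOPE * left_cm + INTERCEPT

def one_see_band(left_cm):
    """Range of plus or minus 1 S.E.E. around the predicted score."""
    predicted = predict_right_hand(left_cm)
    return predicted - SEE_CM, predicted + SEE_CM

predicted = predict_right_hand(18.0)   # 18.074 cm
low, high = one_see_band(18.0)         # 17.694 cm to 18.454 cm
```

If the regression assumptions hold, the true right-hand score would fall inside this band about 68% of the time.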
How good is my equation?
- Regression equations are sample specific.
- Cross-validation studies: test your equation on a different sample.
- Split-sample studies: take a 50% random sample and develop your equation, then test it on the other 50% of the sample.
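The split-sample idea can be sketched as follows. The data are synthetic (generated from Y = 2X + 1 plus noise), and the seeds, the function names and the use of root-mean-square error as the test statistic are all assumptions made for illustration.

```python
import math
import random

def fit_line(x, y):
    """Least-squares slope and intercept for Y = mX + c."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def split_sample_validation(x, y, seed=0):
    """Develop the equation on a random 50% and test it on the other 50%."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    develop, test = indices[:half], indices[half:]
    m, c = fit_line([x[i] for i in develop], [y[i] for i in develop])
    # Prediction errors for cases that played no part in building the equation
    errors = [y[i] - (m * x[i] + c) for i in test]
    rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))
    return m, c, rmse

# Synthetic sample of 200 cases
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2 * xi + 1 + rng.gauss(0, 1) for xi in x]

m, c, rmse = split_sample_validation(x, y)
print(f"Y = {m:.2f}X + {c:.2f}; error on the held-out half = {rmse:.2f}")
```

A prediction error on the held-out half that is much larger than the error on the development half would suggest the equation is too specific to the sample it was built on.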
Multiple Regression
- More than one independent variable: Y = m1X1 + m2X2 + m3X3 + … + c
- Same meaning for r and S.E.E., just more measures used to predict Y.
- Stepwise regression: variables are entered into the equation based upon their relative importance.
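A minimal sketch of fitting Y = m1X1 + m2X2 + c by least squares, solving the normal equations with a small Gaussian elimination routine so that no external library is needed. The data are hypothetical and were constructed so that Y = 2X1 + 3X2 + 1 exactly. Statistical packages typically use more numerically robust methods.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back-substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def fit_multiple_regression(predictors, y):
    """Return [m1, m2, ..., c] for Y = m1X1 + m2X2 + ... + c."""
    n = len(y)
    # One column per independent variable, plus a column of ones for the intercept
    columns = [list(p) for p in predictors] + [[1.0] * n]
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in columns] for ci in columns]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in columns]
    return solve_linear_system(xtx, xty)

# Hypothetical data built from Y = 2*X1 + 3*X2 + 1
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9, 8, 19, 18, 29, 28]

m1, m2, c = fit_multiple_regression([x1, x2], y)   # close to 2, 3 and 1
```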
Building a multiple regression equation
- X1 has the highest correlation with Y, therefore it would be the first variable included in the equation.
- X3 has a higher correlation with Y than X2. However, X2 would be a better choice than X3 to include in an equation with X1 to predict Y.
- X2 has a low correlation with X1 and explains some of the variance that X1 does not.
[Figure: diagram of Y, X1, X2 and X3]
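The argument can be checked numerically with the standard formula for the squared multiple correlation with two predictors, R² = (r_y1² + r_y2² - 2·r_y1·r_y2·r_12) / (1 - r_12²). All correlation values below are hypothetical, chosen so that X3 correlates more highly with Y than X2 does but overlaps heavily with X1.

```python
def r_squared_two_predictors(r_ya, r_yb, r_ab):
    """Squared multiple correlation of Y with two predictors A and B,
    given their correlations with Y and with each other."""
    return (r_ya ** 2 + r_yb ** 2 - 2 * r_ya * r_yb * r_ab) / (1 - r_ab ** 2)

# Hypothetical correlations
r_y_x1 = 0.7    # X1 has the highest correlation with Y
r_y_x2 = 0.4    # X2 has a lower correlation with Y ...
r_x1_x2 = 0.1   # ... but is almost unrelated to X1
r_y_x3 = 0.5    # X3 has a higher correlation with Y than X2 ...
r_x1_x3 = 0.8   # ... but largely duplicates X1

r2_x1_x2 = r_squared_two_predictors(r_y_x1, r_y_x2, r_x1_x2)   # 0.60
r2_x1_x3 = r_squared_two_predictors(r_y_x1, r_y_x3, r_x1_x3)   # 0.50
```

With these values, adding X2 to X1 explains 60% of the variance in Y, while adding X3 explains only 50%, even though X3 has the higher correlation with Y on its own.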
Standardized Regression
- The numerical value of mn depends upon the size (scale) of the independent variable Xn: Y = m1X1 + m2X2 + m3X3 + … + c
- Variables are transformed into standard scores before regression analysis, therefore the mean and standard deviation of all independent variables are 0 and 1 respectively.
- The numerical value of zmn now represents the relative importance of that independent variable to the prediction: Y = zm1X1 + zm2X2 + zm3X3 + … + c
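A sketch of the standard-score transformation on hypothetical data. It assumes that Y is standardized along with X and that the sample (n - 1) standard deviation is used. With a single predictor, the standardized coefficient then equals the correlation coefficient r, a standard result.

```python
import math

def z_scores(values):
    """Transform values to standard scores (mean 0, standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

zx = z_scores(x)
zy = z_scores(y)

# Least-squares slope through the standardized scores (the intercept is 0
# because both standardized variables have a mean of 0)
standardized_slope = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
# about 0.775, the same as r for these data
```

Because the slope no longer depends on the units of X, coefficients from different independent variables can be compared directly.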