4.2 Correlation
The Correlation Coefficient r
Properties of r
Correlation
We can often see the strength of the relationship between two quantitative variables in a scatterplot, but be careful. The two figures here are both scatterplots of the same data, drawn on different scales, yet the second seems to show a stronger association. So we need a measure of association that does not depend on how the graph is drawn.
Measuring Linear Association
A scatterplot displays the strength, direction, and form of the relationship between two quantitative variables. Linear relationships are important because a straight line is a simple pattern that is quite common. Our eyes are not good judges of how strong a relationship is. Therefore, we use a numerical measure to supplement our scatterplot and help us interpret the strength of the linear relationship.
The correlation r measures the strength of the linear relationship between two quantitative variables.
Measuring Linear Association
We say a linear relationship is strong if the points lie close to a straight line and weak if they are widely scattered about a line. The following facts about r help us further interpret the strength of the linear relationship.
Properties of Correlation
r is always a number between –1 and 1.
r > 0 indicates a positive association.
r < 0 indicates a negative association.
Values of r near 0 indicate a very weak linear relationship.
The strength of the linear relationship increases as r moves away from 0 toward –1 or 1.
The extreme values r = –1 and r = 1 occur only in the case of a perfect linear relationship.
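The properties above can be checked numerically. The following is a minimal sketch, assuming r is computed as the average product of z-scores with divisor n - 1 (the usual sample definition); the function name `correlation` and the small data sets are hypothetical illustrations, not data from these slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

# Hypothetical data, chosen only to illustrate the properties.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2 * v + 1 for v in x])     # points exactly on a rising line
r_down = correlation(x, [10 - 3 * v for v in x])  # points exactly on a falling line
r_noisy = correlation(x, [2, 1, 4, 3, 5])         # rising trend with scatter

print(r_up, r_down, r_noisy)
```

Up to floating-point rounding, the two perfect lines give r = 1 and r = -1, while the scattered data give r = 0.8: positive, but short of the extreme value.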
Correlation
The correlation coefficient r
Time to swim: x̄ = 35, s_x = 0.7
Pulse rate: ȳ = 140, s_y = 9.5
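For reference, the standard definition of the sample correlation, written in terms of the means and standard deviations listed above, is sketched below. The formula itself is not legible in this slide, so treat it as the usual textbook form rather than a transcription. The swimmer with time 34.3 and pulse 149.5 is a hypothetical example, chosen so that the z-scores come out to whole numbers.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)

% Hypothetical swimmer, using \bar{x} = 35, s_x = 0.7, \bar{y} = 140, s_y = 9.5:
z_x = \frac{34.3 - 35}{0.7} = -1, \qquad
z_y = \frac{149.5 - 140}{9.5} = 1, \qquad
z_x z_y = -1
```

A faster-than-average time paired with a higher-than-average pulse contributes a negative product to the sum, which is how a negative association produces r < 0.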
r does not distinguish between x and y
The correlation coefficient r treats x and y symmetrically. "Time to swim" is the explanatory variable here and belongs on the x-axis. However, r is the same in either plot (r = -0.75).
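The symmetry can be seen directly in a computation: swapping the two variables only swaps the factors in each z-score product. Below is a minimal sketch under the same z-score definition; the swim times and pulse rates are made-up values, not the data behind the slide, so the resulting r is not expected to equal -0.75.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical swim times and pulse rates (illustrative only).
swim_time = [34.1, 35.7, 34.7, 35.4, 36.0]
pulse = [152, 124, 140, 136, 130]

r_time_on_x = correlation(swim_time, pulse)   # time as explanatory variable
r_pulse_on_x = correlation(pulse, swim_time)  # axes swapped

print(r_time_on_x, r_pulse_on_x)
```

Both calls return the same negative value, so the choice of explanatory variable matters for how the plot is drawn but not for r.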
r has no unit of measure (unlike x and y)
Changing the units of measure of the variables does not change the correlation coefficient r, because we "standardize out" the units when computing z-scores. The z-score plot (z for time against z for pulse) is the same for both plots.
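A short sketch of why the units drop out: rescaling a variable by a positive factor rescales its mean and standard deviation by the same factor, so the z-scores are unchanged. The data and the unit conversions (minutes to seconds, beats per minute to beats per second) are assumptions for illustration; the slides do not state the units.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample std dev."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / (len(xs) - 1)

# Hypothetical data; the units are assumed for the sake of the example.
time_min = [34.1, 35.7, 34.7, 35.4, 36.0]
pulse_bpm = [152, 124, 140, 136, 130]

time_sec = [60 * t for t in time_min]    # minutes -> seconds
pulse_bps = [p / 60 for p in pulse_bpm]  # beats/minute -> beats/second

r_original = correlation(time_min, pulse_bpm)
r_rescaled = correlation(time_sec, pulse_bps)

print(r_original, r_rescaled)
```

The two values agree up to floating-point rounding. Note that this holds for positive scale factors; multiplying one variable by a negative number would flip the sign of r.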
Cautions:
Correlation requires that both variables be quantitative.
Correlation does not describe curved relationships between variables, no matter how strong the relationship is.
Correlation is not resistant. r is strongly affected by a few outlying observations.
Correlation is not a complete summary of two-variable data.
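Two of these cautions are easy to demonstrate with small made-up data sets. The sketch below, using the same z-score definition of r, shows a perfect curved relationship with r near 0 and a single outlier reversing the sign of r. All values are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Caution 1: a perfect but curved relationship (y = x^2 on symmetric x values).
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = correlation(x_curve, y_curve)

# Caution 2: r is not resistant. Add one outlying point to a rising pattern.
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 1, 4, 3, 5]
r_clean = correlation(x_clean, y_clean)
r_outlier = correlation(x_clean + [10], y_clean + [0])  # outlier at (10, 0)

print(r_curve, r_clean, r_outlier)
```

Here y is completely determined by x in the first data set, yet r is essentially 0 because the relationship is not linear. In the second, one added point changes r from 0.8 to a negative value, which is why a scatterplot should always accompany r.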
HW: Read section 4.2 on the Correlation Coefficient. Pay particular attention to Figure 4.12… Work the following exercises: # , ,