The Correlation Coefficient
Social Security Numbers
A Scatter Diagram
The Point of Averages
Where is the center of the cloud? Take the average of the x-values and the average of the y-values; this is the point of averages, and it locates the center of the cloud. To measure the spread of the cloud, take the SD of the x-values (horizontal spread) and the SD of the y-values (vertical spread).
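A minimal Python sketch of these two steps, using only the standard library. The data are hypothetical, and the SD is assumed to divide by the number of points (statistics.pstdev); the function names are illustrative only.

```python
from statistics import mean, pstdev

def point_of_averages(xs, ys):
    """Return (average of x, average of y): the center of the cloud."""
    return mean(xs), mean(ys)

def spreads(xs, ys):
    """Return (SD of x, SD of y), dividing by the number of points."""
    return pstdev(xs), pstdev(ys)

# Hypothetical data, for illustration only.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

print(point_of_averages(xs, ys))  # (4, 7)
print(spreads(xs, ys))            # (2.0, 4.0)
```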
The Correlation Coefficient
An association can be stronger or weaker. Remember: a strong association means that knowing one variable helps a great deal in predicting the other. The correlation coefficient is a number that measures the strength of the linear association, that is, how tightly the points cluster around a line.
The Correlation Coefficient
We denote the correlation coefficient by r. If r = 0, there is no linear association between the variables: the cloud shows no upward or downward trend. If r = 1, all the points lie exactly on a line sloping upward (not necessarily the line y = x), and the correlation is perfect.
Strong and Weak
The Correlation Coefficient
What about negative values? The correlation coefficient always lies between –1 and 1. A negative value indicates negative association, and a positive value indicates positive association. Note that –0.90 shows the same degree of association as +0.90; only the direction differs. If r = –1, all the points lie exactly on a line sloping downward.
Computing the Correlation Coefficient
1. Convert each variable to standard units.
2. For each point, multiply the x-value in standard units by the y-value in standard units. The average of these products is the correlation coefficient r.
r = average of [(x in standard units) × (y in standard units)]
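The two steps translate directly into code. This is a sketch rather than a library routine, and the function names are hypothetical. It assumes the SD divides by the number of values (statistics.pstdev); using an SD that divides by n − 1 while still averaging over n would make the result slightly smaller in size than r.

```python
from statistics import mean, pstdev

def standard_units(values):
    """Convert each value to standard units: (value - average) / SD."""
    avg = mean(values)
    sd = pstdev(values)  # SD that divides by the number of values
    return [(v - avg) / sd for v in values]

def correlation_coefficient(xs, ys):
    """r = average of (x in standard units) * (y in standard units)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    zx = standard_units(xs)                      # step 1, for x
    zy = standard_units(ys)                      # step 1, for y
    products = [a * b for a, b in zip(zx, zy)]   # step 2: multiply pairwise
    return mean(products)                        # step 2: average the products

# Hypothetical usage.
print(correlation_coefficient([1, 3, 4, 5, 7], [5, 9, 7, 1, 13]))
```

If either variable has an SD of 0 (all values equal), standard units are undefined and this sketch raises ZeroDivisionError; r is undefined in that case too.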
Example
We must first convert to standard units. Find the average and the SD of the x-values: average = 4, SD = 2. Subtract the average from each value to get its deviation from average, then divide each deviation by the SD. Then do the same for the y-values.
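The conversion for the x-values can be sketched as follows. The x-values are hypothetical, chosen only so that their average is 4 and their SD is 2 as stated above; the SD divides by the number of values.

```python
from statistics import mean, pstdev

xs = [1, 3, 4, 5, 7]   # hypothetical values with average 4 and SD 2
avg = mean(xs)          # 4
sd = pstdev(xs)         # 2.0

deviations = [x - avg for x in xs]               # [-3, -1, 0, 1, 3]
in_standard_units = [d / sd for d in deviations]
print(in_standard_units)  # [-1.5, -0.5, 0.0, 0.5, 1.5]
```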
Example: Standard Units
[Table: x, y, x in standard units, y in standard units, and their products]
Example
Finally, take the average of the products: r = average of [(x in standard units) × (y in standard units)]. In this example, r = …
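Putting the whole procedure together, the sketch below builds a table of standard units and products and then averages the products. All values are hypothetical (the x-values have average 4 and SD 2, the y-values average 7 and SD 4), so the resulting r describes this made-up data only.

```python
from statistics import mean, pstdev

def standard_units(values):
    """(value - average) / SD, with the SD dividing by the number of values."""
    avg, sd = mean(values), pstdev(values)
    return [(v - avg) / sd for v in values]

# Hypothetical paired data.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

zx = standard_units(xs)
zy = standard_units(ys)
products = [a * b for a, b in zip(zx, zy)]

print(f"{'x':>3} {'y':>3} {'x (SU)':>7} {'y (SU)':>7} {'product':>8}")
for x, y, a, b, p in zip(xs, ys, zx, zy, products):
    print(f"{x:>3} {y:>3} {a:>7.2f} {b:>7.2f} {p:>8.2f}")

r = mean(products)  # average of the products
print(f"r = {r:.2f}")  # r = 0.40
```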
The SD Line
If there is some association, the points in the scatter diagram cluster around a line. But around which line? Generally, this is the SD line. It passes through the point of averages and rises (for a positive correlation) or falls (for a negative correlation) by one vertical SD for each horizontal SD. Its slope is therefore (SD of y) / (SD of x) when the correlation is positive, and –(SD of y) / (SD of x) when it is negative.
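A minimal sketch of the SD line as described above: it passes through the point of averages, with slope ±(SD of y) / (SD of x). The function name and data are hypothetical. The sign is read off the sum of the products of deviations, which always has the same sign as r; this sketch only handles r > 0 or r < 0, the two cases described above.

```python
from statistics import mean, pstdev

def sd_line(xs, ys):
    """Return (slope, intercept) of the SD line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations: it has the same sign as r.
    direction = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if direction == 0:
        raise ValueError("this sketch only handles r > 0 or r < 0")
    slope = pstdev(ys) / pstdev(xs)      # one vertical SD per horizontal SD
    if direction < 0:
        slope = -slope                   # negative correlation: the line falls
    intercept = y_bar - slope * x_bar    # forces the line through the point of averages
    return slope, intercept

# Hypothetical data: average x = 4, SD x = 2, average y = 7, SD y = 4.
print(sd_line([1, 3, 4, 5, 7], [5, 9, 7, 1, 13]))  # (2.0, -1.0)
```

For this data the SD line is y = 2x − 1: one SD to the right of the average (x = 6) it is one SD above the average (y = 11).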
Five-Number Summary
Remember the five-number summary of a data set: minimum, lower quartile, median, upper quartile, and maximum. Similarly, a scatter diagram can be summarized by five numbers: the average of the x-values, the SD of the x-values, the average of the y-values, the SD of the y-values, and the correlation coefficient r.