Section 5.1: Correlation
Correlation Coefficient
A quantitative assessment of the strength of the relationship between the x and y values in a set of (x, y) pairs.
– Positive correlation: As x increases, y increases
– Negative correlation: As x increases, y decreases
– No correlation: No strong relationship between x and y (there is no tendency for y either to increase or to decrease as x increases)
Examples of Correlations in Scatterplots
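Pearson’s Sample Correlation Coefficient
Let (x_1, y_1), (x_2, y_2), …, (x_n, y_n) denote a sample of (x, y) pairs.
Pearson’s sample correlation coefficient r is given by:
$$r = \frac{\sum z_x z_y}{n - 1}, \qquad z_x = \frac{x - \bar{x}}{s_x}, \quad z_y = \frac{y - \bar{y}}{s_y}$$
where \bar{x}, \bar{y} are the sample means and s_x, s_y are the sample standard deviations.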
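As a minimal sketch, r can be computed as the sum of the products of the standardized x and y values, divided by n − 1 (the usual z-score form of Pearson's coefficient). The function name `pearson_r` and the data below are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's sample r: sum of z_x * z_y over all pairs, divided by (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has zero spread")
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))  # 0.775
```

For this made-up sample, r is positive and fairly large, matching the upward trend in the pairs.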
Example: Correlation Calculation
Some Correlation Pictures
Properties of r
– The value of r does not depend on the unit of measurement for either variable.
– The value of r does not depend on which of the two variables is considered x.
– The value of r is between -1 and 1.
– r = 1 only when all the points in a scatterplot of the data lie exactly on a straight line that slopes upward.
– r = -1 only when all the points lie exactly on a downward-sloping line.
– The value of r is a measure of the extent to which x and y are linearly related.
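These properties can be checked numerically. The sketch below uses an algebraically equivalent form of r (the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations). The height and weight values are hypothetical, and `pearson_r` is an illustrative helper, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's sample r via sums of deviations (equivalent to the z-score form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical measurements, made up for illustration only.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

# Same data in different units (inches -> cm, pounds -> kg).
heights_cm = [2.54 * h for h in heights_in]
weights_kg = [0.45359237 * w for w in weights_lb]

r_original = pearson_r(heights_in, weights_lb)
r_converted = pearson_r(heights_cm, weights_kg)   # unit change: same r
r_swapped = pearson_r(weights_lb, heights_in)     # x and y exchanged: same r

# Points lying exactly on a line give r = 1 (upward) or r = -1 (downward).
r_up = pearson_r(heights_in, [3 * h + 10 for h in heights_in])
r_down = pearson_r(heights_in, [-2 * h + 500 for h in heights_in])

print(r_original, r_converted, r_swapped, r_up, r_down)
```

Up to floating-point rounding, the first three values agree, and the last two are 1 and -1.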
Population Correlation Coefficient (denoted by ρ)
A measure of how strongly x and y are related in the entire population.
– ρ is a number between -1 and 1 that does not depend on the unit of measurement for either x or y, or on which variable is labeled x and which is labeled y.
– ρ = 1 or -1 if and only if all (x, y) pairs in the population lie exactly on a straight line, so ρ measures the extent to which there is a linear relationship in the population.
Example
Consider the following bivariate data set:
Computing the Pearson correlation coefficient, we find that r = 0.001.
With r = 0.001, there appears to be little or no linear relationship between x and y. Be careful not to infer from this that there is no relationship at all between x and y.
When the scatterplot is drawn, it shows an almost perfect quadratic relationship between x and y.
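The same effect can be reproduced with a small sketch. The data below are not the data set from this example; they are hypothetical points placed exactly on the parabola y = x², with x values symmetric about zero, so r comes out as 0 even though y is completely determined by x.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's sample r via sums of deviations (illustrative helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: a perfect quadratic relationship, symmetric in x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # essentially 0: no linear relationship, despite an exact quadratic one
```

A value of r near 0 therefore rules out only a linear pattern, so it is worth looking at the scatterplot before concluding that x and y are unrelated.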