Scatter Plots and Correlation Coefficients
Technology Activity 11
By Maria Frederick
Before We Begin…
Let’s go over some vocabulary:
Scatter plot: a graph of paired data, with one data point per person or object.
Correlation: used to determine whether there is a relationship between two variables and, if so, the strength and direction of that relationship.
Regression line (line of best fit): the line that makes the spread of the data points around it as small as possible. The usual least-squares line does this by minimizing the sum of the squared vertical distances from the points to the line.
Correlation coefficient: a number, designated r, that describes the strength and direction of the relationship between two variables whose data points lie on or near a line.
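To see what r and the line of best fit actually measure, here is a minimal Python sketch of the standard Pearson formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), together with the least-squares line. The names `pearson_r`, `best_fit_line`, and `_sums` are hypothetical helpers written for illustration, not part of any library.

```python
import math


def _sums(xs, ys):
    """Means plus the sums of cross and squared deviations for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Correlation coefficient r = sxy / sqrt(sxx * syy); in theory between -1 and 1."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (minimizes squared vertical distances)."""
    mean_x, mean_y, sxy, sxx, _ = _sums(xs, ys)
    if sxx == 0:
        raise ValueError("the line is undefined when every x value is the same")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Calling `pearson_r(temperatures, sales)` on your own paired lists (hypothetical names) gives the exact value that the slides ask you to estimate by eye.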
Here is an example: This scatter plot and regression line (line of best fit) show two variables: ice-cream sales and outside temperature. According to this graph, is there a correlation? What might the correlation coefficient be?
Interpreting the Information
First, how are the data points arranged relative to the regression line? Do they lie directly on the line, close to it, or are they scattered all over? Most of the data points lie on or very close to the line of best fit, so there appears to be a strong correlation. Remember: correlation does not necessarily mean causation!
Interpreting the Information
Second, are the variables positively or negatively correlated? As one variable increases, does the other tend to increase or decrease? Does the regression line rise or fall from left to right? Both variables tend to increase together, and the regression line (line of best fit) rises from left to right, so the variables are positively correlated.
Interpreting the Information
Finally, can we estimate what the correlation coefficient might be, based on the graph and what we have already determined? Let’s look at the next slide before answering.
r ≈ ?
Interpreting the Information
Using this number line, we see that correlation coefficients range from -1 to 1. Since we have determined that there is a fairly strong positive correlation between ice-cream sales and outside temperature, what would you estimate r to be? A reasonable estimate for r is between 0.7 and 0.9.
Are you ready to interpret some graphs on your own? Remember, you are looking for the following:
1. Is there a correlation?
2. If there is a correlation, is it strong or weak?
3. If there is a correlation, is it positive or negative?
4. Estimate what the correlation coefficient could be (r ≈ ?).
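The checklist can be turned into a small helper that reads an r value and describes it in this activity’s vocabulary. The function `describe_r` is hypothetical, and its cutoffs (|r| below 0.1 counts as no correlation, 0.6 or above as strong, exactly 1 as perfect) are assumed rules of thumb chosen to agree with this activity’s answers; textbooks draw these lines in different places.

```python
import math


def describe_r(r, tol=1e-9):
    """Describe r in words using hypothetical rule-of-thumb cutoffs."""
    if not -1 - tol <= r <= 1 + tol:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.1:                      # assumed cutoff for "no correlation"
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if math.isclose(size, 1.0, abs_tol=tol):
        strength = "perfect"            # every point lies exactly on a line
    elif size >= 0.6:                   # assumed cutoff for "strong"
        strength = "strong"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"
```

Because any cutoff is a convention, the same r near a boundary (say 0.55 or 0.65) could reasonably be called weak or strong; the estimate of r itself carries more information than the label.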
Graph 1
Graph 1 has a strong negative correlation. r ≈ -0.8
Graph 2
Graph 2 has a perfect positive correlation. r = 1
Graph 3
Graph 3 has no correlation. r ≈ 0
Graph 4
Graph 4 has a weak negative correlation. r ≈ -0.3
Graph 5
Graph 5 has a perfect negative correlation. r = -1
Graph 6
Graph 6 has a weak positive correlation. r ≈ 0.3
How did you do?
1. Were you able to determine whether there was a correlation?
2. If there was a correlation, were you able to determine whether it was strong or weak, positive or negative?
3. Were you able to estimate the correlation coefficient?
If you answered yes to all of these questions, congratulations! You are ready to move from estimating to computing the actual correlation coefficient and regression line. But that is another lesson!