Sullivan – Fundamentals of Statistics – 2nd Edition – Chapter 4 Section 1 – Slide 1 of 30
Chapter 4 Section 1
Scatter Diagrams and Correlation
Slide 2 of 30
●Learning objectives
    Draw and interpret scatter diagrams
    Understand the properties of the linear correlation coefficient
    Compute and interpret the linear correlation coefficient
    Determine whether there is a linear relation between two variables
Slide 4 of 30
●In many studies, we measure more than one variable for each individual
●Some examples are
    Rainfall amounts and plant growth
    Exercise and cholesterol levels for a group of people
    Height and weight for a group of people
●In these cases, we are interested in whether the two variables have some kind of a relationship
Slide 5 of 30
●When we have two variables, they could be related in one of several different ways
    They could be unrelated
    One variable (the explanatory or predictor variable) could be used to explain the other (the response or dependent variable)
    One variable could be thought of as causing the other variable to change
●In this chapter, we examine the second case: explanatory and response variables
Slide 6 of 30
●Sometimes it is not clear which variable is the explanatory variable and which is the response variable
●Sometimes the two variables are related without either one being an explanatory variable
●Sometimes the two variables are both affected by a third variable, a lurking variable, that had not been included in the study
Slide 7 of 30
●An example of a lurking variable
●A researcher studies a group of elementary school children
    Y = the student’s height
    X = the student’s shoe size
●It is not reasonable to claim that shoe size causes height to change
●The lurking variable of age affects both of these variables
Slide 8 of 30
●Some other examples
●Rainfall amounts and plant growth
    Explanatory variable – rainfall
    Response variable – plant growth
    Possible lurking variable – amount of sunlight
●Exercise and cholesterol levels
    Explanatory variable – amount of exercise
    Response variable – cholesterol level
    Possible lurking variable – diet
Slide 9 of 30
●The most useful graph to show the relationship between two quantitative variables is the scatter diagram
●Each individual is represented by a point in the diagram
    The explanatory (X) variable is plotted on the horizontal scale
    The response (Y) variable is plotted on the vertical scale
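The plotting rule (one point per individual, explanatory variable across, response variable up) can be sketched in a few lines of Python. This is only an illustrative text-mode sketch: the function name `ascii_scatter`, the grid size, and the rainfall and growth numbers are all made up for the example.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text scatter diagram: x on the horizontal scale, y on the vertical."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range so the scaling below never divides by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 is printed first, so flip the row to put large y at the top.
        grid[height - 1 - row][col] = "*"
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]

# Hypothetical data: rainfall (explanatory) and plant growth (response).
rainfall = [1, 2, 3, 4, 5, 6, 7, 8]
growth = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]
plot = ascii_scatter(rainfall, growth)
print("\n".join(plot))
```

Because the vertical scale here runs from the smallest to the largest y value rather than starting at zero, the sketch draws a truncated vertical scale, which is worth remembering when reading the picture.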
Slide 10 of 30
●An example of a scatter diagram
●Note the truncated vertical scale!
Slide 11 of 30
●There are several different types of relations between two variables
    A relationship is linear when, plotted on a scatter diagram, the points follow the general pattern of a line
    A relationship is nonlinear when, plotted on a scatter diagram, the points follow a general pattern, but it is not a line
    A relationship has no correlation when, plotted on a scatter diagram, the points do not show any pattern
Slide 12 of 30
●Linear relations have points that cluster around a line
●Linear relations can be either positive (the points slant upwards to the right) or negative (the points slant downwards to the right)
Slide 13 of 30
●For positive (linear) associations
    Above average values of one variable are associated with above average values of the other (above/above, the points trend right and upwards)
    Below average values of one variable are associated with below average values of the other (below/below, the points trend left and downwards)
●Examples
    “Age” and “Height” for children
    “Temperature” and “Sales of ice cream”
Slide 14 of 30
●For negative (linear) associations
    Above average values of one variable are associated with below average values of the other (above/below, the points trend right and downwards)
    Below average values of one variable are associated with above average values of the other (below/above, the points trend left and upwards)
●Examples
    “Age” and “Time required to run 50 meters” for children
    “Temperature” and “Sales of hot chocolate”
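The above/above and above/below descriptions can be checked directly by comparing each point with the two means. The sketch below is one possible way to do that; the helper `quadrant_counts` and the data for age, height, and run time are hypothetical values chosen for illustration.

```python
from statistics import mean

def quadrant_counts(xs, ys):
    """Count points whose deviations from the means agree or disagree in sign."""
    x_bar, y_bar = mean(xs), mean(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            same += 1       # above/above or below/below
        elif product < 0:
            opposite += 1   # above/below or below/above
    return same, opposite

# Hypothetical data for children: age (years), height (cm), 50 m run time (s).
age = [6, 7, 8, 9, 10, 11]
height = [115, 121, 128, 133, 138, 144]
run_time = [12.5, 11.8, 11.0, 10.4, 9.9, 9.5]

print(quadrant_counts(age, height))    # (6, 0): positive association
print(quadrant_counts(age, run_time))  # (0, 6): negative association
```

Counting signs only shows the direction of the association; it says nothing about how tightly the points follow a line.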
Slide 15 of 30
●Nonlinear relations have points that have a trend, but not around a line
●The trend has some bend in it
Slide 16 of 30
●When two variables are not related
    There is no linear trend
    There is no nonlinear trend
●Changes in values for one variable do not seem to have any relation with changes in the other
Slide 17 of 30
●Nonlinear relations and no relations are very different
    Nonlinear relations are definitely patterns, just not patterns that look like lines
    No relations are when no patterns appear at all
●This distinction will be very important in the remainder of this chapter
Slide 18 of 30
●Examples of nonlinear relations
    “Age” and “Height” for people (including both children and adults)
    “Temperature” and “Comfort level” for people
●Examples of no relations
    “Temperature” and “Closing price of the Dow Jones Industrials Index” (probably)
    “Age” and “Last digit of telephone number” for adults
Slide 20 of 30
●The linear correlation coefficient is a measure of the strength of linear relation between two quantitative variables
●The sample correlation coefficient “r” is
    $r = \dfrac{\sum \left( \dfrac{x_i - \bar{x}}{s_x} \right) \left( \dfrac{y_i - \bar{y}}{s_y} \right)}{n - 1}$
    where $\bar{x}$ and $\bar{y}$ are the sample means, $s_x$ and $s_y$ are the sample standard deviations, and $n$ is the number of individuals
●This should be computed with software (and not by hand) whenever possible
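Since the slide recommends software, here is a minimal sketch of the computation, assuming the standard definition of the sample correlation coefficient: the sum of the products of the z-scores of x and y, divided by n − 1. The function name `sample_correlation` and the five data points are hypothetical.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """Sample linear correlation coefficient r via products of z-scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return z_products / (n - 1)

# Hypothetical data set of five individuals.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 3))  # 0.775
```

In practice a library routine (for example `statistics.correlation` in Python 3.10 and later) would normally replace the hand-written function; the explicit version is shown only to mirror the definition.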
Slide 21 of 30
●Some properties of the linear correlation coefficient
    r is a unitless measure (so that r would be the same for a data set whether x and y are measured in feet, inches, meters, or fathoms)
    r is always between –1 and +1
    Positive values of r correspond to positive relations
    Negative values of r correspond to negative relations
Slide 22 of 30
●Some more properties of the linear correlation coefficient
    The closer r is to +1, the stronger the positive relation; when r = +1, there is a perfect positive relation
    The closer r is to –1, the stronger the negative relation; when r = –1, there is a perfect negative relation
    The closer r is to 0, the less of a linear relation (either positive or negative)
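Two of these properties are easy to check numerically: r does not change when the units change, and points lying exactly on a line give r = +1 or r = –1. The sketch below assumes the usual computational form of r; the helper `corr` and the height and reach measurements are hypothetical.

```python
from math import sqrt

def corr(xs, ys):
    """Linear correlation coefficient from sums of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements in feet.
height_ft = [4.0, 4.5, 5.0, 5.5, 6.0]
reach_ft = [3.9, 4.6, 4.9, 5.7, 6.1]

# Unitless: converting feet to inches rescales both variables but not r.
height_in = [12 * h for h in height_ft]
reach_in = [12 * v for v in reach_ft]
r_feet = corr(height_ft, reach_ft)
r_inches = corr(height_in, reach_in)

# Points exactly on a line give a perfect positive or negative relation.
r_up = corr(height_ft, [2 * h + 1 for h in height_ft])
r_down = corr(height_ft, [-3 * h + 10 for h in height_ft])

print(r_feet, r_inches, r_up, r_down)
```

The slope of the line does not matter for r_up and r_down; only its sign does.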
Slide 23 of 30
●Examples of positive correlation
    Strong Positive, r = 0.8
    Moderate Positive, r = 0.5
    Very Weak, r = 0.1
●In general, if the correlation is visible to the eye, then it is likely to be strong
Slide 24 of 30
●Examples of negative correlation
    Strong Negative, r = –0.8
    Moderate Negative, r = –0.5
    Very Weak, r = –0.1
●In general, if the correlation is visible to the eye, then it is likely to be strong
Slide 25 of 30
●Nonlinear correlation and no correlation
    Panels: Nonlinear Relation, No Relation
●Both sets of variables have r = 0.1, but the difference is that the nonlinear relation shows a clear pattern
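A constructed example makes the point sharper: data that follow a parabola exactly can still have a linear correlation coefficient of 0. This is a sketch with made-up data (not the data behind the slide's panels), and the helper `corr` is an assumed implementation of r.

```python
from math import sqrt

def corr(xs, ys):
    """Linear correlation coefficient from sums of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# y is determined exactly by x, but the pattern is a curve, not a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r = corr(x, y)
print(r)  # r is 0 even though the relation is perfect (and nonlinear)
```

A value of r near 0 therefore rules out a linear relation only; the scatter diagram is still needed to see whether some other pattern is present.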
Slide 27 of 30
●Correlation is not causation!
●Just because two variables are correlated does not mean that one causes the other to change
●There is a strong correlation between shoe sizes and vocabulary sizes for grade school children
    Clearly larger shoe sizes do not cause larger vocabularies
    Clearly larger vocabularies do not cause larger shoe sizes
●Often lurking variables result in confounding
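The shoe size and vocabulary example can be imitated with constructed data. The sketch below assumes that age is the lurking variable (as in the shoe size and height example) and uses made-up scales for shoe size and vocabulary; the numbers are not real measurements. Both variables grow with age, and within any single age group they are built to be unrelated.

```python
from math import sqrt

def corr(xs, ys):
    """Linear correlation coefficient from sums of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Offsets that are unrelated to each other within an age group.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

children = []  # (age, shoe_size, vocabulary), all hypothetical
for age in range(6, 12):
    for dx, dy in offsets:
        shoe_size = 0.5 * age + 0.25 * dx    # grows with age (made-up scale)
        vocabulary = 1000 * age + 300 * dy   # grows with age (made-up scale)
        children.append((age, shoe_size, vocabulary))

# All children together: shoe size and vocabulary look strongly related.
r_all = corr([c[1] for c in children], [c[2] for c in children])

# Hold the lurking variable fixed: look only at 8-year-olds.
eight = [c for c in children if c[0] == 8]
r_age_8 = corr([c[1] for c in eight], [c[2] for c in eight])

print(round(r_all, 2), r_age_8)
```

Across all ages the correlation is strong, while among children of the same age it disappears, which is the pattern a lurking variable produces.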
Slide 29 of 30
●How large does the correlation coefficient have to be before we can say that there is a relation?
●We’re not quite ready to answer that question (that’s Chapter 12 – Section 3)
●For now, we can look at Table VIII in Appendix A
●For example, for n = 15
    A correlation coefficient of greater than 0.514 would indicate a positive linear correlation
    A correlation coefficient of less than –0.514 would indicate a negative linear correlation
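The table lookup amounts to comparing r with a critical value that depends on the sample size. A minimal sketch of that comparison follows; the function name `linear_relation` and its return strings are hypothetical, and the only critical value used is the 0.514 quoted above for n = 15. Any other sample size would need its own entry from the table.

```python
def linear_relation(r, critical_value):
    """Classify r against the table's critical value for the sample size."""
    if r > critical_value:
        return "positive linear relation"
    if r < -critical_value:
        return "negative linear relation"
    return "no linear relation indicated"

# Critical value for n = 15, as quoted on the slide.
CRITICAL_N15 = 0.514

print(linear_relation(0.62, CRITICAL_N15))   # positive linear relation
print(linear_relation(-0.70, CRITICAL_N15))  # negative linear relation
print(linear_relation(0.30, CRITICAL_N15))   # no linear relation indicated
```

The same r can lead to a different conclusion for a different n, so the critical value is kept as an explicit argument rather than hard-coded in the function.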
Slide 30 of 30
Summary: Chapter 4 – Section 1
●Correlation between two variables can be described with both visual and numeric methods
●Visual methods
    Scatter diagrams
    Analogous to histograms for single variables
●Numeric methods
    Linear correlation coefficient
    Analogous to mean and variance for single variables
●Care should be taken in the interpretation of linear correlation (nonlinearity and causation)