Chapter 7: Scatterplots, Association, and Correlation “You can observe a lot by watching.” Yogi Berra
Scatterplots: The most common and effective display of data. Use them to observe patterns, trends, relationships, and outlying values, and to see the relationship between two quantitative variables. Ask whether there is an association between the two variables.
Scatterplots: Look at the direction of the points (positive or negative). Look at the form (linear or non-linear). Look at how much scatter the plot has. Look for any outliers: points that stand away from the overall pattern.
Scatterplots – TI Tips
Roles for Variables: Explanatory (predictor) variable: x-axis. Response variable: y-axis. The roles we choose are based on how we think about the variables. The variables may or may not explain anything or respond to anything.
Correlation Conditions: Correlation measures the strength of the linear association between two quantitative variables. Quantitative Variables Condition: know the variables’ units and what they measure. Straight Enough Condition: check that the association is linear. Outlier Condition: report the correlation with and without the outlier(s).
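The Outlier Condition can be sketched in a few lines of Python. The data below are made up for illustration (they are not from the textbook), and `correlation` is a hypothetical helper written for this example, not a calculator or library function.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r: sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up data: six points with an upward trend, plus one stray point.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 5, 7]
outlier_x, outlier_y = 12, 1

r_without = correlation(xs, ys)
r_with = correlation(xs + [outlier_x], ys + [outlier_y])

# Report both values, as the Outlier Condition asks.
print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

In this made-up example a single stray point is enough to change the sign of r, which is why both values should be reported.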
Finding the Correlation: Always check the conditions first.
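The formula from this slide did not survive, so the sketch below assumes the usual textbook definition: standardize both variables, multiply the z-scores pairwise, add the products, and divide by n - 1. The function name `correlation` and the data are illustrative assumptions.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    z_products = (
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)

# Made-up illustrative data (not from the textbook exercise).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 78]
r = correlation(hours, scores)
print(round(r, 3))
```

The function only does the arithmetic. Checking the conditions still means looking at the scatterplot first.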
Correlation Properties: Sign gives the direction of the association. Always between -1 and +1. Treats x and y symmetrically. Has no units. Not affected by changes in units, scale, or center. Measures the strength of the linear association. Sensitive to outliers.
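Two of these properties are easy to check numerically. The sketch below uses made-up heights and weights and a hypothetical helper `correlation`. It swaps x and y, converts units, and shifts the center, and r should come out the same each time, up to rounding.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r: sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up data: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 140, 155, 160, 180]

r_xy = correlation(heights_in, weights_lb)

# Symmetry: swapping the roles of x and y leaves r unchanged.
r_yx = correlation(weights_lb, heights_in)

# Change of units: inches to centimeters, pounds to kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]
r_metric = correlation(heights_cm, weights_kg)

# Change of center: subtract a constant from every height.
r_shifted = correlation([h - 50 for h in heights_in], weights_lb)

print(r_xy, r_yx, r_metric, r_shifted)
```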
Straightening Scatterplots: When a scatterplot shows a non-linear form that consistently increases or decreases, we can straighten the form by re-expressing one or both variables.
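As a sketch of re-expression, the made-up data below grow exponentially, so the raw scatterplot would curve upward. Taking the logarithm of y is one common re-expression. It is assumed here for illustration, and other choices such as square roots or reciprocals may suit other data better. `correlation` is again a hypothetical helper.

```python
from math import log2
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r: sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up data: y doubles each time x increases by 1 (curved, always increasing).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 ** x for x in xs]

r_raw = correlation(xs, ys)

# Re-express y with a logarithm; log2(2 ** x) = x, so the form becomes a straight line.
log_ys = [log2(y) for y in ys]
r_log = correlation(xs, log_ys)

print(f"r before re-expression: {r_raw:.3f}")
print(f"r after re-expression:  {r_log:.3f}")
```

The higher r after re-expression reflects a straighter form. The scatterplot of the re-expressed data is still the first thing to check.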
What Can Go Wrong?? Don’t say “correlation” when you mean “association.” Association: a vague term that describes the relationship between two variables. Correlation: a precise term that describes the linear relationship between quantitative variables.
What Can Go Wrong?? Check the conditions! Don’t correlate categorical variables. Be sure the association is linear. Don’t confuse correlation with causation!!! Don’t try to explain a correlation by saying that the predictor variable has caused the response variable to change. Watch out for lurking variables!! A lurking variable is a hidden variable that simultaneously affects both variables.
Let’s Try Lunch! (Pg 133 #13) Variables: Calories: average number of calories a child consumed during lunch. Time: average number of minutes a child spent at the table when lunch was served. Conditions: Quantitative: both calories and time are quantitative. Straight enough: the scatterplot looks linear. Outlier: there are a few stray points, but none are very far from the rest of the points.
Lunchtime!!
Lunchtime!! The correlation coefficient is -0.65. Interpretation: The scatterplot shows a negative direction, with lower calories going with higher times. The plot is generally straight with a moderate amount of scatter. The correlation coefficient of -0.65 indicates a moderate negative linear association. A few cases stand out with lower times related to higher calorie intake, as well as a few with higher times related to lower calorie intake.