3.1 Scatterplots and Correlation

Objectives
SWBAT:
IDENTIFY explanatory and response variables in situations where one variable helps to explain or influences the other.
MAKE a scatterplot to display the relationship between two quantitative variables.
DESCRIBE the direction, form, and strength of a relationship displayed in a scatterplot and identify outliers in a scatterplot.
INTERPRET the correlation.
UNDERSTAND the basic properties of correlation, including how the correlation is influenced by outliers.
USE technology to calculate correlation.
EXPLAIN why association does not imply causation.
We study relationships between two variables to make predictions and to explain phenomena.

Reminder: the response variable is the outcome we are trying to measure, and the explanatory variable is what helps explain or predict changes in the response variable. The explanatory variable goes on the x-axis and the response variable on the y-axis.

Four characteristics to consider when interpreting a scatterplot:
Direction: positive association, negative association, or no association
Form: linear or nonlinear
Strength: how closely the points follow the form (usually described as very strong, strong, moderate, or weak)
Outliers: values that fall outside the overall pattern (there is no specific rule for identifying them)
The following scatterplot shows the amount of sodium (in milligrams) and the amount of fat (in grams) in salads from McDonald's (with no dressing). Describe the relationship between sodium and fat.

Direction: There is a positive association between sodium and fat; salads with more sodium tend to have more fat.
Form: The overall association is nonlinear, as the pattern does not follow a straight line.
Strength: Even so, the association is fairly strong, as the points do not deviate much from the nonlinear form.
Outliers: There do not appear to be any outliers.

Finally, there are three distinct clusters of points, formed by salads with no chicken (lower-left), salads with grilled chicken (lower-right), and salads with crispy chicken (upper-right). Within each cluster there is a positive, linear association between sodium and fat.
What is correlation (r), and what are some characteristics of correlation?

The correlation r measures the direction and strength of the linear relationship between two quantitative variables.

r is always a number between -1 and 1.
r > 0 indicates a positive association.
r < 0 indicates a negative association.
Values of r near 0 indicate a very weak linear relationship.
The strength of the linear relationship increases as r moves away from 0 towards -1 or 1.
The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship.

Note: correlation describes only linear relationships. You can calculate r for a nonlinear relationship, but the value tells us nothing about how strong the curved pattern is.
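The properties above can be checked numerically. The sketch below is not from the original slides: it assumes the standard definition of r as the sum of the products of the z-scores divided by n - 1, and it uses made-up data. The function name `correlation` and all the lists are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Return r: the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical data, made up for illustration only.
x = [-3, -2, -1, 0, 1, 2, 3]
perfect_line = [2 * v + 1 for v in x]  # exact positive linear pattern
falling_line = [5 - v for v in x]      # exact negative linear pattern
parabola = [v * v for v in x]          # perfect pattern, but curved

print(round(correlation(x, perfect_line), 4))
print(round(correlation(x, falling_line), 4))
print(round(correlation(x, parabola), 4))
```

Up to rounding, the two straight-line patterns give r = 1 and r = -1, while the parabola gives r = 0 even though y is completely determined by x. That is the sense in which r says nothing useful about a nonlinear relationship.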
Measuring Linear Association: Correlation
Now it’s time for everyone’s favorite game… Guess the Correlation!!!
A few more notes about correlation:

Correlation is not resistant to outliers. r is strongly affected by a few outlying observations.

Something to be aware of: Correlation and association are NOT synonyms. Association is a more general word that describes the relationship between any two variables, whether numerical or categorical. Correlation is a specific measure of the strength and direction of a linear association between two numerical variables.

Be cautioned: Correlation does NOT imply cause-and-effect. Just think of Happy Gilmore. He was able to crush the ball off the tee, but he still finished in last place early in the movie because other facets of his game were not polished. So even though there is a correlation between average driving distance and scoring average, there is no guarantee that increasing driving distance will result in lower scores.
Calculating Correlation on the TI-84

1) Turn on the diagnostic feature. Press 2nd, then 0, to enter the CATALOG. Scroll down to DiagnosticOn. Press ENTER twice so that the home screen says Done.
2) Enter the top 10 LPGA data into lists L1 and L2.
3) Press the STAT button, move to the CALC menu, and scroll down to select option 8: LinReg(a+bx). After choosing it, enter L1, L2 and press ENTER. The last line of the output gives the value of r. In this case the correlation is r =