Activity: Car Correlations
Consumer Reports data from a sample of n = 109 car models.
We'll explore the following associations:
(a) Weight vs. City MPG
(b) Weight vs. Fuel Capacity
(c) PageNumber vs. Fuel Capacity
(d) Weight vs. Qtr Mile Time
(e) Acc060 vs. Qtr Mile Time
(f) City MPG vs. Qtr Mile Time
Make your initial guesses on handout…
Scatterplots
Correlation
The correlation is a measure of the strength and direction of linear association between two quantitative variables.
Sample correlation: r
Population correlation: ρ ("rho")
Be able to find this with technology.
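The slide asks for r to be found with technology. One possible way is sketched below in Python with NumPy; the choice of tool is an assumption, and the numbers are hypothetical values invented for illustration, not the Consumer Reports data.

```python
import numpy as np

# Hypothetical values for illustration only (NOT the Consumer Reports data).
weight = np.array([2500.0, 3000.0, 3500.0, 4000.0, 4500.0])  # pounds
city_mpg = np.array([32.0, 27.0, 23.0, 19.0, 16.0])          # miles per gallon

# np.corrcoef returns the 2x2 correlation matrix of the two variables;
# the off-diagonal entry is the sample correlation r.
r = np.corrcoef(weight, city_mpg)[0, 1]
print(f"r = {r:.2f}")
```

In this made-up sample heavier cars get fewer miles per gallon, so r comes out strongly negative.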
Correlations
What are the properties of correlation?
r for plots (a) through (f), in order: −0.91, 0.89, −0.08, −0.45, 0.99, 0.51
Correlation
1. −1 ≤ r ≤ 1
2. The sign indicates the direction of association:
   positive association: r > 0
   negative association: r < 0
   no linear association: r ≈ 0
3. The closer r is to ±1, the stronger the linear association.
4. r has no units and does not depend on the units of measurement.
5. The correlation between X and Y is the same as the correlation between Y and X.
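Two of these properties, that r does not depend on the units and that it is symmetric in X and Y, can be checked numerically. The sketch below uses hypothetical values (not the class data) and NumPy; the helper name `corr` is introduced here only for illustration.

```python
import numpy as np

def corr(a, b):
    """Sample correlation r between two equal-length arrays."""
    return np.corrcoef(a, b)[0, 1]

# Hypothetical values for illustration only.
weight_lb = np.array([2500.0, 3000.0, 3500.0, 4000.0, 4500.0])
city_mpg = np.array([32.0, 27.0, 23.0, 19.0, 16.0])

r_xy = corr(weight_lb, city_mpg)

# Symmetry: swapping X and Y leaves r unchanged.
r_yx = corr(city_mpg, weight_lb)

# Units: pounds -> kilograms and mpg -> km per liter are positive rescalings,
# so r is unchanged.
weight_kg = weight_lb * 0.4536
city_kmpl = city_mpg * 0.4251
r_metric = corr(weight_kg, city_kmpl)

print(np.isclose(r_xy, r_yx), np.isclose(r_xy, r_metric))
```

Multiplying one variable by a negative constant would flip the sign of r but leave its magnitude unchanged.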
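Is the formula worth teaching?
The sign of each product of z-scores depends on the quadrant: (+, +) and (–, –) give positive products; (+, –) and (–, +) give negative products.
It depends! Might be worth showing, but don't expect students to use it… (USE TECHNOLOGY!)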
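For anyone who does want to show the formula, here is a minimal sketch assuming the standard z-score form: r is the sum of the products of paired z-scores, divided by n − 1, with z-scores computed from the sample standard deviation. The function name is hypothetical, and the result is compared against NumPy's built-in calculation.

```python
import numpy as np

def correlation_from_z_scores(x, y):
    """Compute r as sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    z_x = (x - x.mean()) / x.std(ddof=1)  # ddof=1 gives the sample SD
    z_y = (y - y.mean()) / y.std(ddof=1)
    # Points in the (+, +) and (-, -) quadrants add to the sum;
    # points in the (+, -) and (-, +) quadrants subtract from it.
    return np.sum(z_x * z_y) / (n - 1)

# Hypothetical values for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
print(np.isclose(correlation_from_z_scores(x, y), np.corrcoef(x, y)[0, 1]))
```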
Correlation Guessing Game
You can create a school-specific or class-specific leaderboard!
Group ID: uscots
Correlation: NFL Teams
r = 0.43
Dennis Lock, Director of Analytics, Miami Dolphins
Correlation
Same plot, but with the Dolphins and Raiders (outliers) removed: r = 0.08
Correlation Cautions 1. Correlation can be heavily affected by outliers. Always plot your data!
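The effect of a single outlier can be reproduced with a small synthetic example. The data below are invented for illustration and are not the NFL data; the sketch only shows that one extreme point can turn a near-zero correlation into a strong one.

```python
import numpy as np

# Synthetic data, invented for illustration (NOT the NFL data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.0, 1.0, 4.0, 2.0, 4.0, 1.0])
r_without = np.corrcoef(x, y)[0, 1]

# Add one point far from the rest in both variables.
x_out = np.append(x, 20.0)
y_out = np.append(y, 20.0)
r_with = np.corrcoef(x_out, y_out)[0, 1]

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

A scatterplot of the seven points would make the outlier obvious at a glance, which is the reason to plot first.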
Testosterone Levels and Time
What is the correlation between testosterone levels and hour of the day?
a) Positive b) Negative c) About 0
Are testosterone level and hour of the day associated?
a) Yes b) No
Correlation Cautions 1. Correlation can be heavily affected by outliers. Always plot your data! 2. r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data!
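Caution 2 can be illustrated with a made-up curved relationship. The sketch below assumes a perfect parabola, chosen only because it is the simplest symmetric curve; it is not a model of the testosterone data.

```python
import numpy as np

# Synthetic example: y is completely determined by x, but not linearly.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}")  # essentially zero despite a perfect association
```

Because the curve rises as much on one side as it falls on the other, the positive and negative contributions to r cancel.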
TVs and Life Expectancy Should you buy more TVs to live longer?
Correlation Cautions 1. Correlation can be heavily affected by outliers. Always plot your data! 2. r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data! 3. Correlation does not imply causation! Beware of confounding variables.
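Caution 3 can be explored with a small simulation. In the sketch below a hypothetical confounder drives two simulated variables that have no direct effect on each other; all names and numbers are assumptions made for illustration, not the TV and life expectancy data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical confounder that influences both variables.
confounder = rng.normal(size=n)

# Neither variable affects the other; each is the confounder plus its own noise.
var_a = confounder + rng.normal(scale=0.5, size=n)
var_b = confounder + rng.normal(scale=0.5, size=n)

# The two variables are strongly correlated anyway.
r_ab = np.corrcoef(var_a, var_b)[0, 1]

# Once the confounder is subtracted out, the leftover noise is unrelated.
r_adjusted = np.corrcoef(var_a - confounder, var_b - confounder)[0, 1]

print(f"raw: r = {r_ab:.2f}, after removing the confounder: r = {r_adjusted:.2f}")
```

Subtracting the confounder is only possible here because the simulation built it in; with real data the confounder has to be identified and measured first.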