Chapter 5 Correlation
Suppose we found the age and weight of a sample of 10 adults. Create a scatterplot of the data. Is there any relationship between the age and weight of these adults?
Age      24   30   41   28   50   46   49   35   20   39
Weight  256  124  320  185  158  129  103  196  110  130
Suppose we found the height and weight of a sample of 10 adults. Create a scatterplot of the data. Is there any relationship between the height and weight of these adults?
Height   74   65   77   72   68   60   62   73   61   64
Weight  256  124  320  185  158  129  103  196  110  130
Is it positive or negative? Weak or strong?
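If plotting software is not at hand, a rough text scatterplot can stand in for the sketch these two slides ask for. The following is a minimal Python sketch; the function name `ascii_scatter` and its `width` and `height` parameters are illustrative choices, not part of the original material. Each point is scaled linearly onto a character grid, so two nearby points can land in the same cell.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a rough text scatterplot of paired data as a string.

    Hypothetical helper for illustration: each (x, y) pair is scaled
    linearly onto a width-by-height grid and drawn as '*'.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so larger y is drawn higher
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # x-axis
    return "\n".join(lines)


ages = [24, 30, 41, 28, 50, 46, 49, 35, 20, 39]
heights = [74, 65, 77, 72, 68, 60, 62, 73, 61, 64]
weights = [256, 124, 320, 185, 158, 129, 103, 196, 110, 130]

print("Weight vs. age")
print(ascii_scatter(ages, weights))
print()
print("Weight vs. height")
print(ascii_scatter(heights, weights))
```

In the weight-vs-height plot the stars drift from lower left to upper right, while the weight-vs-age plot shows no clear trend.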
The closer the points in a scatterplot are to a straight line, the stronger the relationship. The farther the points are from a straight line, the weaker the relationship.
Positive relationship, negative relationship, or no relationship?
1. Heights of mothers and heights of their adult daughters: positive (+)
2. Age of a car in years and its current value: negative (-)
3. Weight of a person and calories consumed: positive (+)
4. Height of a person and the person’s birth month: none
5. Number of hours spent in safety training and the number of accidents that occur: negative (-)
Correlation coefficient (r): measures the strength and direction of a linear relationship. It is used for bivariate numerical data.
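The slide does not give a formula. A standard one for Pearson's r is r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The minimal Python sketch below implements that formula directly (the name `pearson_r` is our own) and applies it to the height/weight and age/weight samples from the earlier slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


ages = [24, 30, 41, 28, 50, 46, 49, 35, 20, 39]
heights = [74, 65, 77, 72, 68, 60, 62, 73, 61, 64]
weights = [256, 124, 320, 185, 158, 129, 103, 196, 110, 130]

print(f"age vs. weight:    r = {pearson_r(ages, weights):.3f}")
print(f"height vs. weight: r = {pearson_r(heights, weights):.3f}")
```

Age and weight give r of roughly -0.11 (weak), while height and weight give roughly 0.92 (strong and positive), matching what the scatterplots suggest.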
Calculate r. Interpret it in context.
Speed limit (mph)             55  50  45  40  30  20
Avg. # of accidents (weekly)  28  25  21  17  11   6
There is a strong, positive, linear relationship between speed limit and average number of accidents per week.
Properties of r
Legitimate values of r: from -1 to 1.
[Figure: scale of r from -1 to 1 labeled strong, moderate, weak, and no correlation]
r does not depend on units
x (in mm)  12  15  21  32  26  19  24
y           4   7  10  14   9   8  12
Find r. Change x to cm and find r again. The correlations are the same.
r does not depend on which variable is x and which is y
Switch x and y, then find r. The correlations are the same.
Outliers affect the correlation coefficient
x  12  15  21  32  26  19  24
y   4   7  10  14   9   8  22
Find r and compare it with r for the earlier data set, where the last y-value was 12. r is non-resistant.
r only measures how well x and y are linearly related
r = 0, but there's a definite relationship!
Do Methodist ministers drink a lot of Cuban rum?
Year   Methodist Ministers   Barrels of Cuban Rum
       in New England        Imported to Boston
1860           63                   8,376
1865           48                   6,406
1870           53                   7,005
1875           64                   8,486
1880           72                   9,595
1885           80                  10,643
1890           85                  11,265
1895           76                  10,071
1900           80                  10,547
1905           83                  11,008
1910          105                  13,885
1915          140                  18,559
1920          175                  23,024
1925          183                  24,185
1930          192                  25,434
1935          221                  29,238
1940          262                  34,705
The correlation coefficient for this relationship is r = .999986. What can we conclude?
Did the increase in Methodist ministers cause the increase in consumption of Cuban rum?
Study suggests attending religious services sharply cuts risk of death
Facebook Users Get Worse Grades in College
TV raises blood pressure in obese kids
Deep-voiced men have more kids
Height Affects How People Perceive Their Quality Of Life
Eating pizza cuts cancer risk
Diet of fish can prevent teen violence
Happiness wards off heart disease, study suggests
Sugar Rush… to Prison? Study Says Lots of Candy Could Lead to Violence
Correlation does not imply causation