Correlation
Let’s study the age and weight of a sample of 10 adults. Create a scatterplot of the data below. Is there any relationship between the age and weight of these adults?

Age   24   30   41   28   50   46   49   35   20   39
Wt   256  124  320  185  158  129  103  196  110  130

Let’s study the height and weight of a sample of 10 adults. Create a scatterplot of the data below. Is there any relationship between the height and weight of these adults? Is it positive or negative? Weak or strong?

Ht    74   65   77   72   68   60   62   73   61   64
Wt   256  124  320  185  158  129  103  196  110  130
The closer the points in a scatterplot are to a straight line, the stronger the relationship. The farther the points are from a straight line, the weaker the relationship.
Identify each pair as having a positive association, a negative association, or no association.
1. Heights of mothers and heights of their adult daughters: positive (+)
2. Age of a car in years and its current value: negative (-)
3. Weight of a person and calories consumed: positive (+)
4. Height of a person and the person’s birth month: no association
5. Number of hours spent in safety training and the number of accidents that occur: negative (-)
Correlation Coefficient (r): a quantitative assessment of the strength and direction of the linear relationship in bivariate, quantitative data. Pearson’s sample correlation is the one used most often.
parameter: ρ (rho)
statistic: r
Calculate r. Interpret r in context.

Speed Limit (mph)              55   50   45   40   30   20
Avg. # of accidents (weekly)   28   25   21   17   11    6

There is a strong, positive, linear relationship between speed limit and average number of accidents per week.
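The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration that computes Pearson's sample correlation directly from its definition; the function name `pearson_r` and the variable names are assumptions made for this example, not part of the original slides.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from the slide
speed_limit = [55, 50, 45, 40, 30, 20]
accidents = [28, 25, 21, 17, 11, 6]

r_speed = pearson_r(speed_limit, accidents)
print(round(r_speed, 4))  # 0.9964
```

A value this close to 1 is consistent with the interpretation given on the slide: a strong, positive, linear relationship.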
Properties of r (correlation coefficient)
Legitimate values of r lie in the interval [-1, 1].
[Figure: scale of r values marking regions of strong, moderate, and weak correlation and no correlation]
The value of r does not depend on the unit of measurement for either variable.

x (in mm)   12   15   21   32   26   19   24
y            4    7   10   14    9    8   12

Find r. Change x to cm and find r again. The correlations are the same.
The value of r does not depend on which of the two variables is labeled x.

y            4    7   10   14    9    8   12

Switch x and y and find r. The correlations are the same.
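Both invariance properties (change of units, and swapping which variable is called x) can be checked numerically. The sketch below assumes the x values in millimeters from the units example are paired with the y values shown here; `pearson_r` is a helper name chosen for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x_mm = [12, 15, 21, 32, 26, 19, 24]
y = [4, 7, 10, 14, 9, 8, 12]
x_cm = [v / 10 for v in x_mm]  # convert millimeters to centimeters

r_mm = pearson_r(x_mm, y)       # original units
r_cm = pearson_r(x_cm, y)       # x rescaled to cm
r_swapped = pearson_r(y, x_mm)  # roles of x and y exchanged

# All three agree up to floating-point rounding (about 0.918 here)
print(round(r_mm, 3), round(r_cm, 3), round(r_swapped, 3))
```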
The value of r is non-resistant.

x   12   15   21   32   26   19   24
y    4    7   10   14    9    8   22

Find r. Outliers affect the correlation coefficient.
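To see how much a single outlier matters, one can compare r for this data against r for the earlier version of the data, where the last y value was 12 instead of 22. The sketch below is an illustration only; the helper name `pearson_r` and the variable names are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [12, 15, 21, 32, 26, 19, 24]
y_original = [4, 7, 10, 14, 9, 8, 12]  # data from the earlier slides
y_outlier = [4, 7, 10, 14, 9, 8, 22]   # last value changed from 12 to 22

r_original = pearson_r(x, y_original)
r_outlier = pearson_r(x, y_outlier)

# Changing one point lowers r from roughly 0.92 to roughly 0.63
print(round(r_original, 3), round(r_outlier, 3))
```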
The value of r is a measure of the extent to which x and y are linearly related. A value of r close to zero does not rule out a strong relationship between x and y; it only rules out a strong linear one.
r = 0, but there is a definite relationship!
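A small made-up example shows how this can happen. The data below are hypothetical (they do not come from the slides): y is exactly x squared, so the relationship is perfect but curved, and the positive and negative deviations cancel in the correlation formula.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect quadratic relationship, symmetric about x = 0
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_quadratic = pearson_r(x, y)
print(abs(r_quadratic) < 1e-12)  # True: no linear relationship detected
```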
In the 1900’s, it was noticed that as the number of Methodist ministers increased, the imports of rum also increased. The correlation was r = .9999. So does an increase in ministers cause an increase in the consumption of rum?
Are Ministers Alcoholics?

        Number of Methodist Ministers   Cuban Rum Imported to Boston
Year    in New England                  (in barrels)
---------------------------------------------------------------------
1860     63                              8376
1865     48                              6406
1870     53                              7005
1875     64                              8486
1880     72                              9595
1885     80                             10,643
1890     85                             11,265
1895     76                             10,071
1900     80                             10,547
1905     83                             11,008
1910    105                             13,885
1915    140                             18,559
1920    175                             23,024
1925    183                             24,185
1930    192                             25,434
1935    221                             29,238
1940    262                             34,705
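The near-perfect correlation in this table can be reproduced with a short script. This is a sketch using the table values as listed; the helper name `pearson_r` and the variable names are assumptions for illustration. The exact digits printed may differ slightly from the rounded figure quoted on the earlier slide.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Values from the table, 1860 through 1940 in five-year steps
ministers = [63, 48, 53, 64, 72, 80, 85, 76, 80, 83,
             105, 140, 175, 183, 192, 221, 262]
rum_barrels = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547,
               11008, 13885, 18559, 23024, 24185, 25434, 29238, 34705]

r_rum = pearson_r(ministers, rum_barrels)
print(r_rum)  # very close to 1
```

A correlation this high says only that the two series rise and fall together; it says nothing about why.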
Correlation does not imply causation