Correlation: A statistic to describe the relationship between variables
[Figure: three plots with axes labeled Hours Worked and Pay]
Univariate vs. Bivariate Statistics
* Univariate analyses/graphical representations: frequency histograms, measures of central tendency and variability, Z-scores
* Bivariate analyses/graphical representations: scatterplots, correlation

How can we define correlation?
* A linear pattern of relationship between one variable (x) and another variable (y)
* An association between two variables
* The relative position of one variable correlates with the relative distribution of another variable
Correlations allow us to look for evidence of a relationship between variables.
Correlations can vary in strength
Correlations can vary in direction
[Figure: plot with an axis labeled "flu shots given"]

So, how do we QUANTIFY a correlation? We need to come up with a NUMBER that reflects both the strength and direction of the correlation.
Correlation finds the strength and direction of the best fitting line to the data.

$$ r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}} $$

The number we calculate in Statistics is called the correlation coefficient. Developed by Karl Pearson, it is also sometimes referred to as Pearson’s r.
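Example Calculation: The following data represent the number of emergency room visits per year (x) and cigarettes smoked a day (y) by three individuals recruited from New York Methodist Hospital.

       x     y     xy    x^2   y^2
       2     4      8      4    16
       3     5     15      9    25
       7     6     42     49    36
Sum   12    15     65     62    77

$$ r = \frac{65 - \dfrac{(12)(15)}{3}}{\sqrt{\left[62 - \dfrac{(12)^2}{3}\right]\left[77 - \dfrac{(15)^2}{3}\right]}} = \frac{5}{\sqrt{(14)(2)}} = 0.94 $$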
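The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch of the computational formula, not a library routine; the function name `pearson_r` and the variable names are chosen here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the computational (raw-score) formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Numerator: sum of XY minus (sum of X)(sum of Y) / n
    numerator = sum_xy - (sum_x * sum_y) / n
    # Denominator: square root of the product of the two bracketed terms
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator

visits = [2, 3, 7]      # x: emergency room visits per year
cigarettes = [4, 5, 6]  # y: cigarettes smoked a day
r = pearson_r(visits, cigarettes)
print(round(r, 3))  # 0.945
```

The unrounded value is 5/√28, roughly 0.9449, which is why the result appears as 0.94 when reported to two decimals.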
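Another way to think of the correlation: the product of the Z-scores for each pair of scores

$$ r = \frac{\sum (Z_x Z_y)}{n - 1} $$

x    y
2    4
3    5
7    6

Each score is converted to a Z-score using the mean and standard deviation of its own variable (x: mean 4, SD 2.65; y: mean 5, SD 1):
If x = 2, Zx = (2 - 4)/2.65 = -.76 …
If y = 4, Zy = (4 - 5)/1 = -1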
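Another way to think of the correlation (continued)

x    y    Zx     Zy    ZxZy
2    4    -.76   -1     .76
3    5    -.38    0    0
7    6    1.13    1    1.13

$$ r = \frac{\sum (Z_x Z_y)}{n - 1} = \frac{1.89}{2} = .945 \approx .95 $$

The small difference from 0.94 comes from rounding the Z-scores before multiplying.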
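The Z-score route can be sketched in Python as well. The sketch below assumes the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes and what the 2.65 used for x corresponds to; the function name `pearson_r_from_z` is made up for this example.

```python
import statistics

def pearson_r_from_z(xs, ys):
    """Pearson's r as the sum of Z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sample standard deviations (n - 1 in the denominator)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

xs = [2, 3, 7]  # x scores from the example
ys = [4, 5, 6]  # y scores from the example
r_z = pearson_r_from_z(xs, ys)
print(round(r_z, 3))  # 0.945
```

Because the code keeps full precision in the Z-scores, it lands on the same value as the raw-score formula rather than on a slightly rounded one.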
Interpreting the Pearson r
* Range of values: -1.0 to +1.0
* Direction from the sign
  negative => anticorrelated: as one variable goes up, the other goes down in value.
  positive => correlated: as one variable goes up, so does the other.
* Strength from the magnitude
  | r | = 1.0: perfect relationship
  | r | = 0.0: no evidence of relationship
  0.0 < | r | < 1.0: intermediate strength relationship
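The interpretation rules above can be written as a small helper. This is only an illustrative sketch: the function name `describe_r` and the returned wording are assumptions made here, and the labels simply mirror the three cases listed on the slide.

```python
def describe_r(r):
    """Return (direction, strength) labels for a Pearson r value."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")

    # Direction comes from the sign.
    if r > 0:
        direction = "positive (correlated)"
    elif r < 0:
        direction = "negative (anticorrelated)"
    else:
        direction = "none"

    # Strength comes from the magnitude |r|.
    magnitude = abs(r)
    if magnitude == 1.0:
        strength = "perfect relationship"
    elif magnitude == 0.0:
        strength = "no evidence of relationship"
    else:
        strength = "intermediate strength relationship"

    return direction, strength

print(describe_r(0.94))
print(describe_r(-1.0))
```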
When NOT to use a correlation:
* Extreme scores (example plot: r = .97)
* Non-linear relationships (example plot: r = .20)
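Both problems are easy to reproduce. The sketch below uses small made-up data sets (they are not the data behind the slide's plots): five points with no linear trend, the same points plus one extreme score, and a perfectly curved relationship. The helper `pearson_r` is defined here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: five points with no linear trend at all.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 1, 3, 1, 2]
r_base = pearson_r(base_x, base_y)

# The same five points plus one extreme score at (20, 20).
r_with_outlier = pearson_r(base_x + [20], base_y + [20])

# Hypothetical curved data: y is fully determined by x, but not linearly.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

print(f"without extreme score: r = {r_base:.2f}")
print(f"with extreme score:    r = {r_with_outlier:.2f}")
print(f"curved relationship:   r = {r_curve:.2f}")
```

In this made-up data a single extreme score moves r from essentially 0 to about .97, while the curved data gives r of essentially 0 even though y depends entirely on x. In both cases r misdescribes the pattern, which is why a scatterplot should be checked first.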
Some Issues with Correlation
* NO CAUSATION! A correlation between two variables does not show that one causes the other.
* Spurious correlation
* Number of people who drowned in a swimming pool & number of Nicolas Cage films in a given year: r = .67
* Per capita consumption of cheese & number of deaths by becoming tangled in bed sheets: r = .95
* Divorce rate in Maine & consumption of margarine in the United States: r = .99
Preview of Next Lecture: Regression: finding the best-fitting line to a data set.