Correlation and Covariance R. F. Riesenfeld (Based on web slides by James H. Steiger)
Goals ⇨ Introduce concepts of Covariance Correlation ⇨ Develop computational formulas 2R F Riesenfeld Sp 2010CS5961 Comp Stat
Covariance ⇨ Variables may change in relation to each other ⇨ measures how much the movement in one variable predicts the movement in a corresponding variable ⇨ Covariance measures how much the movement in one variable predicts the movement in a corresponding variable 3R F Riesenfeld Sp 2010CS5961 Comp Stat
Smoking and Lung Capacity ⇨ Example: investigate relationship between and ⇨ Example: investigate relationship between cigarette smoking and lung capacity ⇨ Data: sample group response data on smoking habits, measured lung capacities, respectively ⇨ Data: sample group response data on smoking habits, and measured lung capacities, respectively R F Riesenfeld Sp 2010CS5961 Comp Stat4
Smoking v Lung Capacity Data 5 N Cigarettes ( X )Lung Capacity ( Y ) R F Riesenfeld Sp 2010CS5961 Comp Stat
Smoking and Lung Capacity 6
Smoking v Lung Capacity ⇨ Observe that as smoking exposure goes up, corresponding lung capacity goes down ⇨ Variables inversely ⇨ Variables covary inversely ⇨ and quantify relationship ⇨ Covariance and Correlation quantify relationship 7R F Riesenfeld Sp 2010CS5961 Comp Stat
Covariance ⇨ Variables that inversely, like smoking and lung capacity, tend to appear on opposite sides of the group means ⇨ Variables that covary inversely, like smoking and lung capacity, tend to appear on opposite sides of the group means When smoking is above its group mean, lung capacity tends to be below its group mean. ⇨ Average measures extent to which variables covary, the degree of linkage between them ⇨ Average product of deviation measures extent to which variables covary, the degree of linkage between them 8R F Riesenfeld Sp 2010CS5961 Comp Stat
The Sample Covariance ⇨ Similar to variance, for theoretical reasons, average is typically computed using ( N -1 ), not N. Thus, 9R F Riesenfeld Sp 2010CS5961 Comp Stat
Cigs ( X )Lung Cap ( Y ) Calculating Covariance R F Riesenfeld Sp 2010CS5961 Comp Stat10
Calculating Covariance Cigs (X ) Cap (Y ) ∑= R F Riesenfeld Sp 2010CS5961 Comp Stat11
Evaluation yields, Evaluation yields, 12 Covariance Calculation (2) R F Riesenfeld Sp 2010CS5961 Comp Stat
Covariance under Affine Transformation 13R F Riesenfeld Sp 2010CS5961 Comp Stat
14 CovarianceunderAffineTransf Covariance under Affine Transf (2) R F Riesenfeld Sp 2010CS5961 Comp Stat
(Pearson) Correlation Coefficient r xy ⇨ Like covariance, but uses Z-values instead of deviations. Hence, invariant under linear transformation of the raw data. 15R F Riesenfeld Sp 2010CS5961 Comp Stat
Alternative (common) Expression 16R F Riesenfeld Sp 2010CS5961 Comp Stat
Computational Formula 1 17R F Riesenfeld Sp 2010CS5961 Comp Stat
Computational Formula 2 18R F Riesenfeld Sp 2010CS5961 Comp Stat
Table for Calculating r xy 19 Cigs ( X ) X 2 XYY 2 Cap (Y ) ∑= R F Riesenfeld Sp 2010CS5961 Comp Stat
Computing from Table Computing r xy from Table 20R F Riesenfeld Sp 2010CS5961 Comp Stat
Computing Correlation 21R F Riesenfeld Sp 2010CS5961 Comp Stat
Conclusion ⇨ r xy = implies almost certainty smoker will have diminish lung capacity ⇨ Greater smoking exposure implies greater likelihood of lung damage 22R F Riesenfeld Sp 2010CS5961 Comp Stat
End Covariance & Correlation 23 Notes R F Riesenfeld Sp 2010CS5961 Comp Stat