Correlation 10/30

Relationships Between Continuous Variables
Some studies measure multiple variables
– Any paired-sample experiment
– Training & testing performance; personality variables; neurological measures
– Continuous independent variables
How are these variables related?
– Positive relationship: both tend to be large or both small
– Negative relationship: when one is large, the other tends to be small
– Independent: the value of one tells nothing about the other

Scatterplots
Graph of the relationship between two variables, X and Y
One point per subject
– Horizontal coordinate: X
– Vertical coordinate: Y
[Figure: scatterplot with one example point labeled Height = 67.7, Weight = 181.7]

Correlation
Measure of how closely two variables are related
– Population correlation: ρ (rho)
– Sample correlation: r
Direction
– r > 0: positive relationship; big X goes with big Y
– r < 0: negative relationship; big X goes with small Y
Strength
– ±1 means perfect relationship
  Data lie exactly on a line
  If you know X, you know Y
– 0 means no linear relationship
  Knowing X does not help predict Y with a straight line
[Figure: nine scatterplots showing r = -1, -.75, -.5, -.25, 0, .25, .5, .75, 1]

Computing Correlation
1. Get z-scores for both samples
2. Multiply all pairs
3. Get the average by summing the products and dividing by n – 1: $r = \sum z_X z_Y / (n - 1)$
Positive relationship
– Positive $z_X$ tends to go with positive $z_Y$
– Negative $z_X$ tends to go with negative $z_Y$
– $z_X z_Y$ tends to be positive
Negative relationship
– Positive $z_X$ tends to go with negative $z_Y$
– Negative $z_X$ tends to go with positive $z_Y$
– $z_X z_Y$ tends to be negative
[Figure: two scatterplots, each split into quadrants at $M_X$ and $M_Y$, with regions labeled $z_X > 0$, $z_X < 0$, $z_Y > 0$, $z_Y < 0$]
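
The three steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the data values and the function names `z_scores` and `correlation` are assumptions made for the example, not part of the lecture.

```python
from statistics import mean, stdev

def z_scores(values):
    """Step 1: convert raw scores to z-scores using the sample SD (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Steps 2 and 3: multiply paired z-scores, then divide the sum by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    products = [a * b for a, b in zip(zx, zy)]
    return sum(products) / (len(xs) - 1)

# Hypothetical paired data (not the values from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 2))  # 0.77
```

Most products of paired z-scores are positive for these data, so r comes out positive, matching the "positive relationship" case.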

Computing Correlation
[Table with columns: X, Y, X – $M_X$, Y – $M_Y$, $z_X$, $z_Y$, $z_X z_Y$]
$M_X$ = 5, $M_Y$ = 7, $s_X$ = 2.6, $s_Y$ = 3.7
$\sum z_X z_Y$ = 5.80
r = .97
[Figure: scatterplot of Y against X]

Correlation and Linear Relationships
Correlation measures how well data fit on a straight line
– Assumes linear relationship between X and Y
Not useful for nonlinear relationships
[Figure: Performance plotted against Arousal, r = 0]
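
A small numerical sketch shows how a strong nonlinear relationship can still give r = 0. The inverted-U shape and the data values below are assumptions chosen for illustration; they are not the data behind the arousal and performance plot.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of paired z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical arousal scores, centered on zero
arousal = [-3, -2, -1, 0, 1, 2, 3]
# Assumed inverted U: performance peaks at moderate arousal
performance = [9 - a ** 2 for a in arousal]

r = correlation(arousal, performance)
print(abs(r) < 1e-9)  # True: no linear relationship
```

Performance is completely determined by arousal here, yet the positive and negative z-score products cancel, so the correlation is zero.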

Predicting One Variable from Another
Knowing one measure gives information about others from the same subject
– Knowing a person's weight tells you about their height
Goal: Come up with a rule or function that uses X to compute the best estimate of Y
$\hat{Y}$ (Y-hat)
– Predicted value of Y
– Function of X
– Best prediction of Y based on X

Linear Prediction
Simplest way to predict one variable from another
Straight line through data
$\hat{Y}$ is a linear function of X
[Figure: scatterplot with prediction line; prediction marked at X = 71]
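
As a rough sketch, a linear prediction rule is just a function with an intercept and a slope. The intercept, slope, and the name `make_linear_predictor` below are hypothetical, picked only to show the mechanics of predicting at X = 71.

```python
def make_linear_predictor(intercept, slope):
    """Return a function that maps X to the predicted value Y-hat."""
    def predict(x):
        return intercept + slope * x
    return predict

# Hypothetical line for predicting weight (lb) from height (in);
# these coefficients are made up for illustration.
predict_weight = make_linear_predictor(intercept=-200.0, slope=5.5)

y_hat = predict_weight(71)
print(y_hat)  # 190.5
```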

How Good is the Prediction?
Sometimes data fall nearly on a perfect line
– Strong relationship between variables
– r near ±1
– Good prediction
Sometimes data are more scattered
– Weak relationship
– r near 0
– Can’t predict well
[Figure: three scatterplots of Y against X]

How Good is the Prediction?
Goal: Keep error close to zero
– Minimize mean squared error (MS Error): the average of the squared prediction errors $(Y - \hat{Y})^2$
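
The idea of minimizing mean squared error can be illustrated with a short sketch. Two assumptions are made here: the squared errors are averaged over n – 1 (to match the sample-variance convention used for z-scores), and the best line is found with the standard least-squares formulas. The data and function names are hypothetical.

```python
from statistics import mean

def ms_error(xs, ys, intercept, slope):
    """Mean squared error of the line Y-hat = intercept + slope * X."""
    errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sum(e ** 2 for e in errors) / (len(xs) - 1)  # assumption: n - 1

def least_squares_line(xs, ys):
    """Intercept and slope of the line that minimizes squared error."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

best_intercept, best_slope = least_squares_line(x, y)
best_mse = ms_error(x, y, best_intercept, best_slope)
# Any other line, such as one with a steeper slope, has a larger error
other_mse = ms_error(x, y, best_intercept, best_slope + 0.3)
print(best_mse < other_mse)  # True
```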

Correlation and Prediction
Best prediction line minimizes MS Error
– Closest to data; best “fit”
Correlation determines best prediction line
– Slope = r when plotting z-scores: $\hat{z}_Y = r \, z_X$
[Figure: $z_Y$ plotted against $z_X$; r = .75, slope = .75]
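
Because the slope equals r in z-score units, a prediction can be made by converting X to a z-score, multiplying by r, and converting back to raw units. The sketch below follows that recipe; the data and the name `predict_from_r` are assumptions for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw scores to z-scores using the sample SD (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def predict_from_r(x_new, xs, ys):
    """Predict Y for x_new via z-scores: z_Y-hat = r * z_X."""
    zx, zy = z_scores(xs), z_scores(ys)
    r = sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)
    z_x_new = (x_new - mean(xs)) / stdev(xs)   # X to z-score
    z_y_hat = r * z_x_new                      # slope of r in z units
    return mean(ys) + z_y_hat * stdev(ys)      # back to raw Y units

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

y_hat = predict_from_r(5, x, y)
print(round(y_hat, 2))  # 5.2
```

Predicting at the mean of X returns the mean of Y, since a z-score of 0 maps to a predicted z-score of 0.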
[Figure: nine scatterplots showing r = -1, -.75, -.5, -.25, 0, .25, .5, .75, 1]

Explained Variance
[Figure: spread of Y without knowing X compared with spread of Y knowing X]
– Without knowing X: Original Variance
– Knowing X: Residual Variance
– Explained Variance: reduction from knowing X
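
The split into original, residual, and explained variance can be checked numerically. This sketch assumes a least-squares prediction line and a residual variance averaged over n – 1; the data and the name `variance_breakdown` are hypothetical.

```python
from statistics import mean, variance

def variance_breakdown(xs, ys):
    """Return (original variance, residual variance, proportion explained)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    original = variance(ys)  # spread of Y without knowing X
    residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)
    ) / (len(ys) - 1)        # spread left over after predicting from X
    return original, residual, (original - residual) / original

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

original, residual, proportion = variance_breakdown(x, y)
print(round(original, 2), round(residual, 2), round(proportion, 2))  # 1.5 0.6 0.6
```

For these data r is about .77, so the proportion explained, .6, agrees with r squared.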

Properties of Correlation
Measures relationship between two continuous variables
– How well data are fit by a straight line
Sign of r shows direction of relationship
Magnitude of r shows strength of relationship
– Strongest relationships have r = ±1; weak relationships have r ≈ 0
Best prediction line minimizes error of prediction (MS Error)
– Correlation gives slope of line (when using z-scores): $\hat{z}_Y = r \, z_X$
$r^2$ equals proportion of variance in one variable explained by the other
– Reduction from original variance ($s_Y^2$) to residual variance (MS Error)

Review
Find the scatterplot with correlation r = .7.
[Figure: answer choices A, B, C, D]

Review
Calculate the correlation between X and Y.
$z_X$ = [ ]
$z_Y$ = [ ]
A. -.94
B. -.70
C. -.02
D. -.001

Review
The correlation between IQ and number of bicycles owned is r = .6. Predict the IQ of someone who owns 4 bikes ($z_{bike}$ = 2.5). Recall that $\mu_{IQ}$ = 100 and $\sigma_{IQ}$ = 15.
A B C
D. 137.5