\(r_{xy}\): the correlation coefficient
When two variables are correlated, we can predict a score on one variable from a score on the other. The stronger the correlation, the more accurate our prediction will be.
We need a measure of the “strength” of a correlation.
We need a number that gets bigger when big numbers are paired with big numbers and small numbers with small numbers. We need a number that gets smaller when big numbers are paired with small numbers and small numbers with big numbers.
Remember the height/weight example: a big value of our measure should indicate this (a strong positive correlation).
[Figure: height/weight scatterplot, height axis in feet and inches, points labelled a to f]
A small value should indicate this (a strong negative correlation).
[Figure: height/weight scatterplot, points labelled a to f]
We have two sets of scores, \(x_i\) and \(y_i\). What could we do?
Multiply each pair and sum the products:
\[ \sum_{i=1}^{n} x_i y_i \]
– This sum is greatest when big numbers are paired with big numbers and small numbers with small numbers.
– It is least when small numbers are paired with big numbers and big numbers with small numbers.
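As a minimal sketch in Python (the function name `sum_of_products` and the scores are hypothetical, chosen only for illustration), the same five \(y\) scores give different sums depending on how they are paired with the \(x\) scores:

```python
def sum_of_products(xs, ys):
    """Multiply each x by its partner y and add up the products."""
    return sum(x * y for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]
same_order = [10, 20, 30, 40, 50]      # big with big, small with small
scrambled = [30, 10, 50, 20, 40]       # no consistent pairing
reversed_order = [50, 40, 30, 20, 10]  # big with small, small with big

print(sum_of_products(x, same_order))      # 550
print(sum_of_products(x, scrambled))       # 480
print(sum_of_products(x, reversed_order))  # 350
```

The scrambled pairing lands between the two extremes, as the bullet points predict.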
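Analogy: this pairing of coin counts with coins gets you the most money.
[Figure: coin counts paired with pennies, quarters, loonies]
Analogy: this pairing gets you the least.
[Figure: coin counts paired with pennies, quarters, loonies]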
Because 3 × $1 + 2 × $0.25 + 1 × $0.01 = $3.51 is more than 1 × $1 + 2 × $0.25 + 3 × $0.01 = $1.53. (A loonie is the Canadian $1 coin.)
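The coin arithmetic can be checked by brute force. This sketch (hypothetical variable names, with values in cents to avoid floating-point rounding) tries every way of assigning 1, 2, and 3 coins to pennies, quarters, and loonies:

```python
from itertools import permutations

# Coin values in cents: penny, quarter, loonie (Canadian $1 coin).
coin_values = [1, 25, 100]
counts = [1, 2, 3]  # how many of each coin you get, in some order

# Try every way of assigning the counts to the coins.
totals = {}
for assignment in permutations(counts):
    totals[assignment] = sum(n * v for n, v in zip(assignment, coin_values))

best = max(totals, key=totals.get)
worst = min(totals, key=totals.get)
print(best, totals[best])    # (1, 2, 3) 351: 1 penny, 2 quarters, 3 loonies
print(worst, totals[worst])  # (3, 2, 1) 153: 3 pennies, 2 quarters, 1 loonie
```

The largest total comes from pairing the biggest count with the biggest coin, which is the same principle as pairing big \(x\) scores with big \(y\) scores.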
But there’s a problem: the sum of products is not a good measure, because its value depends on \(n\) and on the size of the numbers.
Try dividing by \(n\):
\[ \frac{1}{n} \sum_{i=1}^{n} x_i y_i \]
Still not so good: this no longer depends on \(n\), but it does depend on the size of the \(x\)'s and \(y\)'s.
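A small sketch of both problems, using hypothetical data and function names: repeating the same pairs doubles the sum of products but leaves the mean of products unchanged, while multiplying every score by 10 still changes the result.

```python
def sum_of_products(xs, ys):
    """Multiply each x by its partner y and add up the products."""
    return sum(x * y for x, y in zip(xs, ys))

def mean_of_products(xs, ys):
    """Divide the sum of products by n, the number of pairs."""
    return sum_of_products(xs, ys) / len(xs)

x = [1, 2, 3]
y = [2, 4, 6]

# Same pattern, twice as many pairs (n = 6 instead of 3).
x_twice, y_twice = x * 2, y * 2
# Same pattern, every score ten times bigger.
x_big = [10 * v for v in x]
y_big = [10 * v for v in y]

print(sum_of_products(x, y), sum_of_products(x_twice, y_twice))      # 28 56
print(mean_of_products(x, y) == mean_of_products(x_twice, y_twice))  # True
print(sum_of_products(x_big, y_big))                                 # 2800
```

Dividing by \(n\) removes the dependence on the number of pairs, but the tenfold scores make every product 100 times larger, so the mean grows 100-fold as well.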
How about multiplying deviation scores, comparing each variable to its own mean:
\[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
Now the value no longer depends on the size of the numbers, but it does depend on the spread of the data.
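A sketch with hypothetical scores and a hypothetical helper `mean_deviation_product`: adding 100 to every \(x\) leaves the average deviation product unchanged, but stretching the \(x\) scores by a factor of 10 multiplies it by 10.

```python
from statistics import fmean

def mean_deviation_product(xs, ys):
    """Average of (x_i - mean_x) * (y_i - mean_y), dividing by n."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

x = [1, 2, 3, 4]
y = [2, 1, 4, 3]

shifted = [v + 100 for v in x]   # bigger numbers, same spread
stretched = [v * 10 for v in x]  # same pattern, ten times the spread

print(mean_deviation_product(x, y))          # 0.75
print(mean_deviation_product(shifted, y))    # 0.75 (adding a constant changes nothing)
print(mean_deviation_product(stretched, y))  # 7.5  (depends on the spread)
```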
So standardize the scores:
\[ z_{x_i} = \frac{x_i - \bar{x}}{S_x}, \qquad z_{y_i} = \frac{y_i - \bar{y}}{S_y} \]
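A minimal sketch of standardizing, assuming the standard deviation is computed with \(n\) in the denominator (Python's `statistics.pstdev`). The helper name `z_scores` and the scores are hypothetical:

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Subtract the mean, then divide by the standard deviation.

    Uses the population standard deviation (divide by n) via pstdev.
    """
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores: mean 5, SD 2
z = z_scores(scores)
print(z)          # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(fmean(z))   # 0.0
print(pstdev(z))  # 1.0
```

Whatever the original units, the standardized scores have mean 0 and standard deviation 1, which is what removes the dependence on spread.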
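This measures the strength of the correlation:
\[ r_{xy} = \frac{1}{n} \sum_{i=1}^{n} z_{x_i} z_{y_i} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n S_x S_y} \]
where \(S_x\) and \(S_y\) are the standard deviations computed with \(n\) in the denominator.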
\(r_{xy}\) ranges from −1.0, indicating a perfect negative correlation, to +1.0, indicating a perfect positive correlation. An \(r_{xy}\) of zero indicates no linear correlation: high scores on one variable are no more likely to be paired with high scores than with low scores on the other.
\(r_{xy}\) also has a geometric meaning. Recall that the mean of each z-score distribution (\(z_x\) and \(z_y\)) is zero, and each z-score is a deviation from that mean.
Each point \((z_x, z_y)\) lands in one of four quadrants.
[Figure: scatterplot with axes \(z_x\) and \(z_y\), quadrants I to IV]
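Notice that:
– Quadrant I: \(z_x\) and \(z_y\) are both positive, so the point's cross-product \(z_x z_y\), its contribution to \(r_{xy}\), is positive.
– Quadrant II: \(z_x\) is negative and \(z_y\) is positive, so \(z_x z_y\) is negative.
– Quadrant III: \(z_x\) and \(z_y\) are both negative, so \(z_x z_y\) is positive.
– Quadrant IV: \(z_x\) is positive and \(z_y\) is negative, so \(z_x z_y\) is negative.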
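A minimal sketch of the quadrant rules, using a hypothetical helper `quadrant` and made-up z-score pairs, one per quadrant:

```python
def quadrant(zx, zy):
    """Return the quadrant (I to IV) of the point (z_x, z_y), or None on an axis."""
    if zx > 0 and zy > 0:
        return "I"
    if zx < 0 and zy > 0:
        return "II"
    if zx < 0 and zy < 0:
        return "III"
    if zx > 0 and zy < 0:
        return "IV"
    return None  # the point lies on an axis, so its cross-product is 0

points = [(1.2, 0.8), (-0.5, 1.0), (-1.1, -0.9), (0.7, -0.4)]  # hypothetical
for zx, zy in points:
    product = zx * zy
    sign = "+" if product > 0 else "-" if product < 0 else "0"
    print(quadrant(zx, zy), sign)
# I +
# II -
# III +
# IV -
```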
So if most points fall around a line with a positive slope (for example, the 45-degree line \(z_y = z_x\)), in quadrants I and III, the cross-products will tend to be positive. If most points fall around a line with a negative slope (quadrants II and IV), the cross-products will tend to be negative.
If the points were scattered randomly, the positive and negative cross-products would tend to cancel, giving an \(r_{xy}\) near zero.
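A sketch that tallies the signs of the cross-products \(z_x z_y\) for three hypothetical \(y\) patterns and averages them to get r. It uses `math.fsum` so that exactly cancelling products sum to exactly zero; the helper names are hypothetical:

```python
from math import fsum
from statistics import fmean, pstdev

def z_scores(values):
    """Standardize with the population SD (divide by n)."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

def cross_products(xs, ys):
    """z_x * z_y for each pair of scores."""
    return [a * b for a, b in zip(z_scores(xs), z_scores(ys))]

x = [1, 2, 3, 4, 5]
patterns = {  # hypothetical y scores
    "rising": [1, 3, 2, 5, 4],
    "falling": [5, 4, 2, 3, 1],
    "scattered": [2, 4, 3, 4, 2],
}
for name, y in patterns.items():
    cp = cross_products(x, y)
    positive = sum(1 for p in cp if p > 0)
    negative = sum(1 for p in cp if p < 0)
    r = fsum(cp) / len(cp)  # average cross-product = r_xy
    print(name, positive, negative, round(r, 2))
# rising 3 0 0.8
# falling 0 3 -0.9
# scattered 2 2 0.0
```

In the scattered pattern, the two positive cross-products are exactly offset by the two negative ones, so r is zero.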
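Covariance
A related measure of the relationship between scores on two different variables is the covariance:
\[ S_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
so that \(r_{xy} = S_{xy} / (S_x S_y)\).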
Notice that the variance, \(S_x^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\), is the covariance between a variable and itself!
Regression
If two variables are perfectly correlated (\(r = \pm 1.0\)), then a score on one variable can be predicted exactly from a score on the other.
For example, a university charges a $250 registration fee plus $100 per credit.
tuition = $100 × x + $250, where x is the number of credits.
Notice that this is a linear relationship, an equation of the form y = ax + b, with:
– a = $100 per credit (the slope)
– b = $250 (the intercept)
– x = the number of credits
Tuition as a function of credit hours is a straight line. There is a perfect correlation between credit hours and tuition (\(r = +1.0\)), so you could predict the required tuition exactly from the number of credit hours.
Next Time
Regression: read Chapter 8.