Correlation
What is a correlation? A correlation examines the relationship between two measured variables. The experimenter does not manipulate anything; the variables are simply observed. E.g., look at the relationship between height and weight. You can correlate any two variables as long as they are numerical (not nominal). Is there a relationship between the height and weight of the students in this room? Of course! Taller students tend to weigh more.
A relationship has two aspects: strength and direction. 1) Strength of the Relationship: The relationship between any two variables is rarely a perfect correlation. Perfect correlation: +1.00 or -1.00, the strongest possible relationship, which is tough to find. No correlation: 0.00 (no relationship), e.g., height and Social Security number.
2) Direction of the Relationship: Positive relationship: the variables change in the same direction. As X increases, Y increases; as X decreases, Y decreases. E.g., as height increases, so does weight. Negative relationship: the variables change in opposite directions. As X increases, Y decreases; as X decreases, Y increases. E.g., as TV time increases, grades decrease. Direction is indicated by the sign, (+) or (-).
Scatter Plots and Types of Correlation: Positive Correlation (as x increases, y increases). [Figure: scatter plot, x = Math SAT score, y = GPA.]
Scatter Plots and Types of Correlation: Negative Correlation (as x increases, y decreases). [Figure: scatter plot, x = hours of training, y = number of accidents.]
Scatter Plots and Types of Correlation: No linear correlation. [Figure: scatter plot, x = height, y = IQ.]
Correlation Coefficient Interpretation
Range of |r|    Strength of Relationship
0.00 - 0.20     Very Low
0.20 - 0.40     Low
0.40 - 0.60     Moderate
0.60 - 0.80     High Moderate
0.80 - 1.00     Very High
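The interpretation table can be expressed as a small lookup, which also makes explicit that only the absolute size of r matters for strength. This is a sketch under one stated assumption: the table's ranges share their endpoints (0.20, 0.40, ...), so here each range is assumed to include its lower bound. The function name describe_strength is hypothetical.

```python
def describe_strength(r: float) -> str:
    """Map Pearson's r to the verbal label in the interpretation table.

    Assumption: each range includes its lower bound, so |r| = 0.20 is "Low".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)  # strength depends only on the absolute value, not the sign
    if size < 0.20:
        return "Very Low"
    if size < 0.40:
        return "Low"
    if size < 0.60:
        return "Moderate"
    if size < 0.80:
        return "High Moderate"
    return "Very High"


print(describe_strength(0.80))   # Very High
print(describe_strength(-0.80))  # Very High
print(describe_strength(0.40))   # Moderate
```

Because the sign is discarded before the lookup, -.80 and +.80 receive the same label.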
Direction: Positive relationship, r = +.80. [Figure: scatter plot, weight vs. height.]
Direction: Negative relationship, r = -.80. [Figure: scatter plot, TV watching per week vs. exam score.]
Interpreting correlations: Summary. The absolute size shows the strength of the relationship: the higher the absolute number, the stronger the relationship. A correlation of -.80 reflects as powerful a relationship as one of +.80. A correlation of 0.00 means no relationship, e.g., you can’t predict GPA from ID number. All correlations range from -1.00 to +1.00.
Strength of relationship: Perfect correlation, r = -1.0. [Figure: scatter plot, TV watching per week vs. exam score.]
Strength of relationship: Strong correlation, r = +0.8. [Figure: scatter plot, quality of breakfast vs. exam score.]
Strength of relationship: Moderate correlation, r = +0.4. [Figure: scatter plot, shoe size vs. weight.]
Strength of relationship: Weak correlation (negative), r = -0.2. [Figure: scatter plot, shoe size vs. weight.]
Strength of relationship: No correlation (horizontal line), r = 0.0. [Figure: scatter plot, IQ vs. height.]
One more example: correlations with exam grade. Amount of study time: +.80. Number of classes missed: -.60. Social Security number: .00.
More examples. Positive relationships: water consumption and temperature; study time and grades; time spent in jail and severity of offense. What else? Negative relationships: alcohol consumption and driving ability; number of hateful remarks and number of friends. What else? Why correlations are used: 1) Prediction; 2) Validity (does something measure what it’s supposed to measure?); 3) Reliability (does something produce a consistent score?). *** Correlational studies are easier to do than experiments. ***
Pearson correlation coefficient (r): Pearson’s r measures the degree to which two variables (X and Y) vary together (i.e., covary), relative to how much they vary separately. Pearson’s r is the most common correlation coefficient, but there are others.
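Computing the Pearson correlation coefficient: $r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}}$. To put it another way: $r = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}}$.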
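Sum of Products of Deviations: Measuring X and Y individually (the denominator of r): compute the sum of squares for each variable, $SS_X = \sum (X - M_X)^2$ and $SS_Y = \sum (Y - M_Y)^2$. Measuring X and Y together: the sum of products. Definitional formula: $SP = \sum (X - M_X)(Y - M_Y)$. Computational formula: $SP = \sum XY - \frac{(\sum X)(\sum Y)}{n}$, where n is the number of (X, Y) pairs.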
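A minimal sketch of the sums of squares and the sum of products, assuming plain Python lists of paired scores. It computes SP both ways to show that the definitional (deviation) formula and the computational formula agree. The data and function names are hypothetical.

```python
def sums_of_squares_and_products(xs, ys):
    """Return (SS_X, SS_Y, SP) using the definitional (deviation) formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (X, Y) pairs")
    n = len(xs)
    mx = sum(xs) / n  # M_X
    my = sum(ys) / n  # M_Y
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return ss_x, ss_y, sp


def sp_computational(xs, ys):
    """SP = sum(XY) - (sum X)(sum Y) / n, where n is the number of pairs."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


# Hypothetical scores, invented for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(sums_of_squares_and_products(X, Y))  # (10.0, 6.0, 6.0)
print(sp_computational(X, Y))              # 6.0
```

The computational formula avoids computing each deviation, which is why it is convenient for hand calculation; both give SP = 6 here.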
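Correlation Coefficient: the equation for Pearson’s r is $r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$. Expanded form: $r = \frac{\sum (X - M_X)(Y - M_Y)}{\sqrt{\sum (X - M_X)^2 \, \sum (Y - M_Y)^2}}$.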
Example: What is the correlation between study time and test score?
Calculating values to find the SS and SP:
Calculating SS and SP
Calculating r
Limitations of Pearson’s r: Correlation does not mean causation! Third-variable problem: there is always the possibility that a third factor is causing the relationship. E.g., there is a moderate, positive relationship between viewing violent TV and engaging in aggressive behaviors.
Possibilities: 1) Viewing violent television causes a tendency to engage in aggressive behaviors. 2) A tendency to engage in aggressive behaviors causes viewing violent television. 3) A third factor (e.g., a genetic tendency to like violence) causes both viewing violent television and a tendency to engage in aggressive behaviors.
Limitations of Pearson’s r (continued): 1) Correlation does not mean causation. 2) Restriction of range: a restricted range of measured values can lead to inaccurate conclusions about the data.
Limitations of Pearson’s r (continued): 3) Outliers (extreme scores): scores with an extreme X and/or Y value can drastically affect Pearson’s r. 4) Ambiguity of the strength of the relationship: Pearson’s r does not give a directly interpretable measure of the strength of the relationship between X and Y. 5) Interval or ratio data: both variables must be measured on an interval or ratio scale.
Coefficient of Determination: r² = the proportion (percentage) of variance in Y accounted for by X. It is calculated by squaring r, the Pearson correlation coefficient. It ranges from 0 to 1 and is never negative. Unlike Pearson’s r, this number is a meaningful proportion.
Coefficient of Determination: An example. What percentage of the variance in Y is accounted for by X when Pearson’s r = 0.50? r² = (0.50)² = 0.25 = 25%. The number is never negative, because squaring removes the sign of r.
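The worked example can be checked with a short sketch that squares r and reports the result as a percentage of variance. The function name coefficient_of_determination is hypothetical; -0.50 is included to show that the sign of r does not affect r².

```python
def coefficient_of_determination(r: float) -> float:
    """Return r squared: the proportion of variance in Y accounted for by X."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    return r ** 2


for r in (0.50, -0.50, 0.80):
    r2 = coefficient_of_determination(r)
    # e.g. "r = +0.50  ->  r^2 = 0.25 (25% of variance)"
    print(f"r = {r:+.2f}  ->  r^2 = {r2:.2f} ({r2:.0%} of variance)")
```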