The Pearson Correlation
The Pearson correlation “r” measures the direction and degree of the linear (straight-line) relationship between two variables. The magnitude of the Pearson correlation ranges from 0 (indicating no linear relationship between X and Y) to 1.00 (indicating a perfect straight-line relationship between X and Y). The correlation can be either positive or negative, depending on the direction of the relationship.
Figure 15-3 Examples of different values for linear correlations: (a) a perfect negative correlation, –1.00, (b) no linear trend, 0.00, (c) a strong positive relationship, approximately +.90, and (d) a relatively weak negative correlation, approximately –0.40.
The Pearson Correlation
r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
r = (covariability of X and Y) / (variability of X and Y separately)
The Pearson correlation compares the amount of covariability of X and Y to the amount that X and Y vary separately. If there is a perfect linear relationship, every change in X is matched by a corresponding change in Y. See Figure 15-3(a), which illustrates a perfect negative correlation: when X goes up one unit, Y goes down one unit; when X goes up two units, Y goes down two units. X and Y therefore covary perfectly.
The Pearson Correlation
To compute the Pearson correlation:
Calculate the covariability, which is the sum of products of deviation scores: SP = Σ(X - MX)(Y - MY)
Calculate the variability of the X scores and the Y scores separately by computing SS for each variable: SSX and SSY
The Pearson correlation is the ratio of SP to the square root of the product (SSX)(SSY): r = SP / √((SSX)(SSY))
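The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming plain lists of paired scores; the function name `pearson_r` and the two small data sets are hypothetical and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the definitional formulas: r = SP / sqrt(SSX * SSY)."""
    n = len(xs)
    mx = sum(xs) / n                      # mean of X
    my = sum(ys) / n                      # mean of Y
    # Step 1: sum of products of deviation scores (covariability)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Step 2: sum of squared deviations for each variable (separate variability)
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    # Step 3: ratio of SP to the square root of SSX * SSY
    return sp / math.sqrt(ssx * ssy)

# Hypothetical scores: Y rises two units for every one-unit rise in X
r_positive = pearson_r([1, 2, 3], [2, 4, 6])
# Hypothetical scores: Y falls one unit for every one-unit rise in X
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
print(r_positive, r_negative)
```

Both hypothetical data sets lie exactly on a straight line, so the function returns the two extreme values of r, one positive and one negative.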
Excel file for calculating correlation coefficient: example of a perfect positive correlation calculation
The Pearson Correlation Calculations
Example 15.1
Calculating SP from the definitional formula
Note: SS columns are not in the textbook

X    Y    X - MX   Y - MY   Products   (X - MX)²   (Y - MY)²
1    3    -1.5     -2       +3.0       2.25        4
2    6    -0.5     +1       -0.5       0.25        1
4    4    +1.5     -1       -1.5       2.25        1
3    7    +0.5     +2       +1.0       0.25        4
MX = 2.5   MY = 5   SP = +2   SSX = 5   SSY = 10

SP = Σ(X - MX)(Y - MY)
SP = (+3) + (-0.5) + (-1.5) + (+1)
SP = +2

Calculation of the Pearson correlation
r = SP / √((SSX)(SSY))
r = 2 / √((5)(10))
r = 2 / √50
r ≈ 0.28
The Pearson Correlation Calculations
Example 15.1 Using the computational formula
Not on the exam

X    Y    X²   Y²    XY
1    3    1    9     3
2    6    4    36    12
4    4    16   16    16
3    7    9    49    21
ΣX = 10   ΣY = 20   ΣX² = 30   ΣY² = 110   ΣXY = 52

SP using the computational formula
SP = ΣXY - (ΣX)(ΣY) / n
SP = 52 - (10)(20) / 4 = 52 - 50 = 2

SS using the computational formula from page 113
SSX = ΣX² - (ΣX)² / n
SSX = 30 - (10)² / 4 = 30 - 25 = 5
SSY = ΣY² - (ΣY)² / n
SSY = 110 - (20)² / 4 = 110 - 100 = 10

Calculation of the Pearson correlation
r = SP / √((SSX)(SSY))
r = 2 / √((5)(10)) = 2 / √50
r ≈ 0.28
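The computational formulas can be checked mechanically. The sketch below assumes the four score pairs for Example 15.1 are X = 1, 2, 4, 3 and Y = 3, 6, 4, 7; these pairs are reconstructed from the slide's column totals rather than quoted from it, and the variable names are illustrative only.

```python
import math

# Score pairs reconstructed from the slide's column totals
# (sum X = 10, sum Y = 20, sum X^2 = 30, sum Y^2 = 110, sum XY = 52).
X = [1, 2, 4, 3]
Y = [3, 6, 4, 7]
n = len(X)

# Column totals used by the computational formulas
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

# SP = sum(XY) - (sum X)(sum Y) / n
sp = sum_xy - sum_x * sum_y / n
# SS = sum(X^2) - (sum X)^2 / n, computed separately for X and Y
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n

# r = SP / sqrt(SSX * SSY)
r = sp / math.sqrt(ss_x * ss_y)
print(sp, ss_x, ss_y, round(r, 3))
```

Running the same arithmetic with deviation scores instead of column totals should give identical values for SP, SSX, and SSY, which is a quick way to catch a slip in either table.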
Calculating Sum of Products (SP)
Example 15.3 Using the definitional formula (Table 15.1)
r = SP / √((SSX)(SSY))
r = 28 / √((64)(16))
r = 28 / 32 = +0.875
Figure 15.4 Scatter plot of data from Example 15.3
r = SP / √((SSX)(SSY))
r = 28 / √((64)(16))
r = 28 / 32 = +0.875
Time For More Fun With SPSS
Using and Interpreting the Pearson Correlation
Prediction: knowing the relationship between SAT and GPA makes it possible to use SAT scores to predict GPA.
Validity: when two tests of the same construct, such as “anxiety”, have a high correlation, there is evidence of construct validity.
Reliability: test-retest reliability.
Theory verification: when a theory makes a prediction about the relationship between two variables, the prediction can be tested with a correlation. Example: amount of sleep is positively related to GPA.
Interpreting Correlations
Correlations describe relationships but do not explain why they exist, so cause-and-effect conclusions cannot be drawn from them. However, causation is not ruled out either: cigarette smoking is positively correlated with cancer.
Correlations are sensitive to the range of scores.
Correlations are sensitive to outliers.
Correlations are not proportions: the size of the r value does not directly express the strength of the relationship. Use r² to interpret the strength of the relationship.
Correlations describe relationships but do not explain why they exist; cause-and-effect conclusions cannot be drawn.
Figure 15-5 Hypothetical data showing the logical relationship between the number of churches and the number of serious crimes for a sample of U.S. cities.
Correlations are sensitive to the range of scores
Problem of Restricted Range
Figure 15-6 In this example, when the full range of X and Y values is used (green ellipse), there is a strong positive correlation. However, when the X values are restricted to a narrow range of scores (brown circle), the correlation is near zero.
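The restricted-range problem can be reproduced with a small constructed data set. The sketch below is an illustration only: the twelve points are hypothetical (three clusters centred at 2, 5, and 8 on both variables), chosen so that the clusters line up along a diagonal while the points inside any one cluster show no linear trend.

```python
import math

def pearson_r(pairs):
    """Pearson r = SP / sqrt(SSX * SSY) for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sp = sum((x - mx) * (y - my) for x, y in pairs)
    ssx = sum((x - mx) ** 2 for x, _ in pairs)
    ssy = sum((y - my) ** 2 for _, y in pairs)
    return sp / math.sqrt(ssx * ssy)

# Hypothetical data: each cluster is a small diamond around (c, c)
centres = [2, 5, 8]
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
full = [(c + dx, c + dy) for c in centres for dx, dy in offsets]

# Restrict X to the middle of its range (only the centre cluster survives)
restricted = [(x, y) for x, y in full if 4 <= x <= 6]

r_full = pearson_r(full)
r_restricted = pearson_r(restricted)
print(round(r_full, 3), round(r_restricted, 3))
```

With the full range the correlation is strong and positive; within the restricted range it drops to zero, even though the restricted points are a subset of the same data.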
Correlations are sensitive to outliers Problem of Outliers Figure 15-7 A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
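The influence of a single extreme point can also be shown numerically. In this sketch the four base points and the outlier at (10, 10) are hypothetical values picked for illustration, not the data behind Figure 15-7.

```python
import math

def pearson_r(pairs):
    """Pearson r = SP / sqrt(SSX * SSY) for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sp = sum((x - mx) * (y - my) for x, y in pairs)
    ssx = sum((x - mx) ** 2 for x, _ in pairs)
    ssy = sum((y - my) ** 2 for _, y in pairs)
    return sp / math.sqrt(ssx * ssy)

# Hypothetical base data with no linear trend at all
base = [(1, 2), (2, 1), (2, 3), (3, 2)]
# The same data plus one extreme point far from the rest
with_outlier = base + [(10, 10)]

r_base = pearson_r(base)
r_outlier = pearson_r(with_outlier)
print(round(r_base, 3), round(r_outlier, 3))
```

Adding one point moves the correlation from zero to a value close to +1, which is why a scatter plot should be inspected before an r value is interpreted.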
Correlation and Strength of the Relationship
Coefficient of determination: r²
Using correlation for prediction (for example, using SAT to predict GPA): the accuracy of the prediction is based on the degree of the relationship.
The r value itself is not a good measure of how accurate predictions will be.
r² measures the proportion of variability in one variable that can be determined from its relationship with the other variable.
Small, medium, and large values: see Table 9.3.
r² is also used as a measure of effect size for the t test: the amount of variance in the dependent variable explained by the independent variable.
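One way to see why r² is read as a proportion of variability is to compare it with the share of SSY that a straight-line prediction accounts for. The sketch below assumes the usual least-squares line as the predictor, which these slides do not derive, and uses four score pairs reconstructed from the totals in Example 15.1; all variable names are illustrative.

```python
import math

# Four score pairs, reconstructed here from the totals in Example 15.1
X = [1, 2, 4, 3]
Y = [3, 6, 4, 7]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)
r = sp / math.sqrt(ss_x * ss_y)

# Assumption: predict Y with the least-squares line Y_hat = a + b * X
b = sp / ss_x
a = my - b * mx

# Variability in Y that the line fails to predict
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# Proportion of Y variability accounted for by X
explained = 1 - ss_residual / ss_y
print(round(r ** 2, 4), round(explained, 4))
```

The two printed values agree, so for these data r² is the proportion of the variability in Y that the linear relationship with X accounts for.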
Figure 15.8 Three sets of data showing three different degrees of linear relationship: r = 0.00 (r² = 0.00), r = +0.60 (r² = 0.36), and r = +1.00 (r² = 1.00).
Correlation and the Strength of the Relationship
When there is a less-than-perfect correlation between two variables, extreme scores (high or low) on one variable tend to be paired with less extreme scores (more toward the mean) on the second variable. This fact is called regression toward the mean.
FIGURE 15.9 A demonstration of regression toward the mean. The figure shows a scatter plot for data with a less-than-perfect correlation. Notice that the highest scores on variable 1 (extreme right-hand points) are not the highest scores on variable 2, but are displaced downward toward the mean. Also, the lowest scores on variable 1 (extreme left-hand points) are not the lowest scores on variable 2, but are displaced upward toward the mean.
The Pearson Correlation and z-scores
Calculations for the Pearson correlation coefficient
Definitional formula: r = SP / √((SSX)(SSY)), where SP = Σ(X - MX)(Y - MY)
Computational formula: SP = ΣXY - (ΣX)(ΣY) / n
z-score formula (for samples): r = Σ(zX zY) / (n - 1)
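The z-score formula and the definitional formula should give the same r. The sketch below checks this, assuming sample z-scores are computed with the sample standard deviation (SS divided by n - 1); the data are the four score pairs reconstructed from Example 15.1, and the helper name `sample_z_scores` is hypothetical.

```python
import math

X = [1, 2, 4, 3]
Y = [3, 6, 4, 7]
n = len(X)

def sample_z_scores(values):
    """z = (score - M) / s, with s = sqrt(SS / (n - 1)) for a sample."""
    m = sum(values) / len(values)
    ss = sum((v - m) ** 2 for v in values)
    s = math.sqrt(ss / (len(values) - 1))
    return [(v - m) / s for v in values]

# z-score formula: r = sum(zX * zY) / (n - 1)
zx = sample_z_scores(X)
zy = sample_z_scores(Y)
r_from_z = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Definitional formula: r = SP / sqrt(SSX * SSY)
mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)
r_definitional = sp / math.sqrt(ss_x * ss_y)

print(round(r_from_z, 4), round(r_definitional, 4))
```

The agreement depends on using n - 1 consistently: the standard deviation inside each z-score and the divisor in the final sum must both use the sample convention.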
Partial Correlations
Occasionally a researcher may suspect that the relationship between two variables is being distorted by the influence of a third variable. A partial correlation measures the relationship between two variables while controlling the influence of a third variable by holding it constant.
With three variables, X “churches”, Y “crimes”, and Z “city size”, it is possible to compute three individual Pearson correlations:
rXY, measuring the correlation between X and Y (churches-crimes)
rXZ, measuring the correlation between X and Z (churches-city size)
rYZ, measuring the correlation between Y and Z (crimes-city size)
Partial Correlations
These three individual correlations can then be used to compute a partial correlation. For example, the partial correlation between X and Y, holding Z constant, is determined by the following formula, which you do not need to know for the exam:
rXY·Z = (rXY - rXZ rYZ) / √((1 - rXZ²)(1 - rYZ²))
TABLE 15.2 Hypothetical data showing the relationship between the number of churches, the number of crimes, and the populations for a set of n = 15 cities.
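For readers who want to experiment, the standard first-order partial-correlation formula can be written as a short function. This is a sketch only: the function name `partial_r` and the correlation values passed to it are hypothetical and are not taken from Table 15.2.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y holding Z constant:
    (rXY - rXZ * rYZ) / sqrt((1 - rXZ^2) * (1 - rYZ^2))."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical case 1: X and Y are related only through Z,
# so rXY equals rXZ * rYZ and the partial correlation vanishes.
through_z = partial_r(0.64, 0.80, 0.80)

# Hypothetical case 2: Z is unrelated to both X and Y,
# so controlling for Z leaves the X-Y correlation unchanged.
unrelated_z = partial_r(0.50, 0.00, 0.00)

print(round(through_z, 4), round(unrelated_z, 4))
```

The first case mirrors the churches-and-crimes logic: if a third variable such as city size drives both counts, the apparent relationship between them can shrink toward zero once that variable is held constant.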
Hypothetical Data Showing the Logical Relationship
FIGURE 15.10 Hypothetical data showing the logical relationship between the number of churches and the number of crimes for three groups of cities: those with small populations (Z = 1), those with medium populations (Z = 2), and those with large populations (Z = 3).