Published by Harvey Bailey. Modified over 9 years ago.
1
Using Correlation to Describe Relationships between Two Quantitative Variables
2
Pearson’s Correlation Coefficient When we describe the association between two variables, we can use a scatterplot to help our description. However, words like strong, moderate, and weak, used to describe the strength of the relationship, can be very subjective. (Remember the saying, “Beauty is in the eye of the beholder.”) So statisticians have another tool, a numeric measure, to help us be clearer and more consistent when we describe these relationships.
3
What it Measures In the lesson on scatterplots, we indicated that a tighter oval around the data points indicated a stronger relationship. In some sense this must mean that the closer the points are to each other, the stronger the relationship. Pearson’s Correlation Coefficient helps us to numerically measure this “spread” of our data.
4
So how does it Measure this “spread”? We know that in describing a distribution we think about both the “center” of the distribution and the “spread” of the distribution. When looking at the relationship between two variables, we need to consider both the “center” and the “spread” of each, and how these two distributions interact.
5
Properties of r
“r” is unitless, which allows us to change scales or calculate the relationship between two variables that are not measured in the same units.
“r” measures the linear relationship between two quantitative variables.
-1 ≤ r ≤ 1
The sign of “r” indicates the direction of the relationship.
The closer “r” is to either +1 or -1, the stronger the relationship.
The closer “r” is to 0, the weaker the relationship.
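The unitless property can be checked numerically. The sketch below is only an illustration: the heights and weights are made-up values, and `pearson_r` is a hypothetical helper that uses an algebraically equivalent form of the formula introduced later in the lesson.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (the n - 1 factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, for illustration only.
heights_in = [60, 62, 65, 68, 71]        # inches
weights_lb = [110, 120, 140, 155, 180]   # pounds

# Same measurements expressed in different units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

r_original = pearson_r(heights_in, weights_lb)
r_rescaled = pearson_r(heights_cm, weights_kg)

print(r_original, r_rescaled)  # the two values agree up to rounding error
```

Changing inches to centimeters and pounds to kilograms rescales both lists, but the two values of “r” agree up to floating-point rounding.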
6
Numeric Guidelines
These cutoffs apply to the size of “r”, ignoring its sign.
Physical Sciences (“Hard Sciences”)
  .80 or higher: Strong
  .50 to .80: Moderate
  .50 or lower: Weak
Social Sciences (“Soft Sciences”)
  .50 or higher: Strong
  .30 to .50: Moderate
  .30 or lower: Weak
Remember that these numbers are just guidelines. Each set of data is different and the context for the data must be considered.
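As a rough sketch, the guidelines can be written as a small lookup. The function name `describe_strength`, the field labels, and the choice to count a value exactly on a cutoff in the stronger category are all assumptions made for this illustration; the slide itself leaves the boundary cases open.

```python
def describe_strength(r, field="physical"):
    """Label the strength of a correlation using the guideline cutoffs.

    Assumption: a value exactly on a cutoff is placed in the stronger category.
    """
    cutoffs = {
        "physical": (0.80, 0.50),  # (strong, moderate) for the "hard sciences"
        "social": (0.50, 0.30),    # (strong, moderate) for the "soft sciences"
    }
    strong, moderate = cutoffs[field]
    size = abs(r)  # strength depends on the size of r, not its sign
    if size >= strong:
        return "strong"
    if size >= moderate:
        return "moderate"
    return "weak"

print(describe_strength(0.82))             # physical-science guideline
print(describe_strength(-0.40, "social"))  # social-science guideline
```

The labels are only a starting point; the context of the data still has to be considered.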
7
The Formula Notice that the formula is adding terms together (we’ll talk about what those terms are shortly) and then dividing that sum by 1 less than the number of data points we have. So, it appears that we are looking for “an average” of sorts.
8
The Formula (cont.) Now the terms that we are adding together are the product of z-scores. Remember that a z-score is the number of standard deviations a piece of data is from the mean of the distribution. So each term is the product of the z-scores in each direction (x and y) for each point. So, how can we calculate this value?
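Written out, the formula described on these two slides is presumably the standard z-score form of Pearson's correlation coefficient. The notation below is a reconstruction from the description (a sum of z-score products divided by n - 1), not a copy of the slide:

```latex
\[
r \;=\; \frac{1}{n-1}\sum_{i=1}^{n} z_{x_i}\, z_{y_i}
  \;=\; \frac{1}{n-1}\sum_{i=1}^{n}
        \left(\frac{x_i-\bar{x}}{s_x}\right)
        \left(\frac{y_i-\bar{y}}{s_y}\right)
\]
```

Here x̄ and ȳ are the sample means and s_x and s_y are the sample standard deviations of the two variables.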
9
Back to the 1992 Dream Team
United States Dream Team

Player               Min   FG-A     FT-A    R    A    S   Pf   Pts
Charles Barkley       48   14-22    4-5     6    7    7   10    36
Larry Bird            47   6-12     4-5    13    7    4    2    17
Clyde Drexler         54   13-21    0-0     7    9    5    5    28
Patrick Ewing         54   14-24    4-9    14    0    2    5    32
“Magic” Johnson       72   15-25    2-4     9   21    6    3    38
Michael Jordan        67   20-45    7-7     9    9   10    7    47
Christian Laettner    17   1-3      11-12   8    2    1    6    14
Karl Malone           45   13-21    10-14  15    2    5    4    36
Chris Mullin          60   16-20    3-4     5    8    2    2    42
Scottie Pippen        56   11-17    2-3     6   14    6    3    26
David Robinson        57   13-20    10-14  17    1    5    3    36
John Stockton         23   3-6      1-1     1    8    0    1     7
Total                600   139-236  58-78 110   88   53   51   359

Shooting: field goals, 58.9%; free throws, 74.3%

Key for Table
Min: Minutes played
FG-A: Field goals made, field goals attempted
FT-A: Free throws made, free throws attempted
R: Rebounds
A: Assists
S: Steals
Pf: Personal fouls
Pts: Total points scored
10
Calculating “r”
We can calculate “r” using this formula and the lists.
L1: x (minutes played)
L2: y (points scored)
L3: z_x = (x - x̄)/s_x
L4: z_y = (y - ȳ)/s_y
L5: L3 * L4
Once these lists are created, find the sum of L5 and then divide by n - 1.
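The list procedure above can be sketched in Python as a cross-check on the calculator work. The variable names are illustrative; the data are the minutes and points columns from the Dream Team table, and `statistics.stdev` is the sample standard deviation (dividing by n - 1), which matches the s_x and s_y used here.

```python
from statistics import mean, stdev

# L1 and L2: minutes played and points scored for the 12 players.
minutes = [48, 47, 54, 54, 72, 67, 17, 45, 60, 56, 57, 23]
points = [36, 17, 28, 32, 38, 47, 14, 36, 42, 26, 36, 7]

x_bar, s_x = mean(minutes), stdev(minutes)
y_bar, s_y = mean(points), stdev(points)

# L3 and L4: z-scores for each variable.
z_x = [(x - x_bar) / s_x for x in minutes]
z_y = [(y - y_bar) / s_y for y in points]

# L5: product of the two z-scores for each player.
products = [zx * zy for zx, zy in zip(z_x, z_y)]

# Sum L5 and divide by n - 1.
n = len(minutes)
r = sum(products) / (n - 1)

print(round(r, 5))  # 0.82403
```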
11
Another formula for “r” Starting with our original formula: Now, the standard deviations of our x-values and y-values are constants once our data has been collected, so they will be the same for each term in the summation. This means that we can factor those out of the sum, leaving: Now, expanding the summation gives us:
12
Another formula for “r” (cont.) Now, using the distributive property to multiply the binomials in each term gives: Then, collapsing the sums gives:
13
Another formula for “r” (cont.) Now, ∑x_i and ∑y_i can be written as n·x̄ and n·ȳ. Two of the last three terms then cancel each other out, so we are left with:
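The steps described on the last three slides can be written out as follows. This is a reconstruction of the algebra from the verbal description, so the exact layout on the original slides may differ; all sums run over i = 1 to n.

```latex
\begin{align*}
r &= \frac{1}{n-1}\sum \left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) \\
  &= \frac{1}{(n-1)\,s_x s_y}\sum (x_i-\bar{x})(y_i-\bar{y}) \\
  &= \frac{1}{(n-1)\,s_x s_y}\sum \left(x_i y_i - x_i\bar{y} - \bar{x}\,y_i + \bar{x}\bar{y}\right) \\
  &= \frac{1}{(n-1)\,s_x s_y}\left(\sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y}\right) \\
  &= \frac{1}{(n-1)\,s_x s_y}\left(\sum x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y}\right) \\
  &= \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{(n-1)\,s_x s_y}
\end{align*}
```

Every quantity in the last line (∑xy, n, x̄, ȳ, s_x, s_y) is a single summary value for the data set.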
14
Evaluating the Formula This formula is helpful to us because our calculator gives us each of the terms we see here. With our data in the lists, L1 (minutes) and L2 (points) in this case, we calculate the 2-Var Stats to find these values.
15
Calculating “r” Now, calculate the 2-Var Stats for L1, L2 (STAT, CALC). This gives us all the values we need to calculate “r”. We can then describe numerically the relationship between minutes on the court and points scored.
16
Calculating “r” r = .82403 Substituting the values for each of the variables, we find that the correlation coefficient is r = .82, indicating a strong, positive linear relationship: as the number of minutes on the court increases, the points scored tend to increase as well.
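As a check on the substitution, the sketch below computes the same summary values that 2-Var Stats reports and plugs them into the second form of the formula, r = (∑xy - n·x̄·ȳ) / ((n - 1)·s_x·s_y). The variable names are illustrative, and Python stands in for the calculator here.

```python
from statistics import mean, stdev

# L1 and L2 from the Dream Team table.
minutes = [48, 47, 54, 54, 72, 67, 17, 45, 60, 56, 57, 23]
points = [36, 17, 28, 32, 38, 47, 14, 36, 42, 26, 36, 7]

# Summary values of the kind 2-Var Stats reports.
n = len(minutes)
x_bar = mean(minutes)
y_bar = mean(points)
s_x = stdev(minutes)  # sample standard deviation of x
s_y = stdev(points)   # sample standard deviation of y
sum_xy = sum(x * y for x, y in zip(minutes, points))

# Substitute into the second form of the formula.
r = (sum_xy - n * x_bar * y_bar) / ((n - 1) * s_x * s_y)

print(f"n = {n}, sum of xy = {sum_xy}")
print(f"r = {r:.5f}")  # r = 0.82403
```

Both forms of the formula give the same value, as the algebra says they should.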
17
And yet another way to find “r” In the next section, we will look at yet another way to find the value of Pearson’s correlation coefficient. For now, either method used in this lesson is appropriate.
18
Additional Resources
The Practice of Statistics (YMM), pp. 128-136
The Practice of Statistics (YMS), pp. 140-149