Published by Alaina Glenn. Modified over 9 years ago.
1
Chapter 7: Correlation
Bivariate distribution: a distribution that shows the relation between two variables.
[Figure: scatter plot of Area of primary visual cortex and Visual Acuity, with separate points for the left hemisphere and the right hemisphere]
This graph is called a scatter plot or scatter diagram.
2
How do we quantify the strength of the relationship between the two variables in a bivariate distribution?
4
Example from the book: two measures made for each subject, stress level and eating difficulties (E.D.).

Stress  E.D.
17      9
8       13
8       7
20      18
14      11
7       1
21      5
22      15
19      26
30      28

[Figure: scatter plot of Stress and Eating Difficulties]
5
The most common way to quantify the relation between the two variables in a bivariate distribution is the Pearson correlation coefficient, labeled r. r is always between -1 and 1.
The z-score formula is the most intuitive formula:
$r = \frac{\sum z_x z_y}{n}$
Example: use the z-score formula to calculate r:

Raw scores: mean of X = 16.60, standard deviation of X = 7.02, mean of Y = 13.30, standard deviation of Y = 8.28

X    Y    z_x    z_y    z_x z_y
17   9    0.06   -0.52  -0.03
8    13   -1.23  -0.04  0.04
8    7    -1.23  -0.76  0.93
20   18   0.48   0.57   0.27
14   11   -0.37  -0.28  0.10
7    1    -1.37  -1.48  2.03
21   5    0.63   -1.00  -0.63
22   15   0.77   0.21   0.16
19   26   0.34   1.53   0.52
30   28   1.91   1.77   3.39

Sum of z_x z_y = 6.78, so r = 6.78 / 10 = 0.68 (rounded).
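The z-score calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard deviation that divides by n (which matches the values 7.02 and 8.28 on the slide); the function names are made up for this example.

```python
import math

# Stress (X) and eating-difficulties (Y) scores from the example.
stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]

def z_scores(values):
    """Convert raw scores to z-scores (standard deviation divides by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r_from_z(xs, ys):
    """Pearson r as the mean of the products of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

r = pearson_r_from_z(stress, eating)
print(round(r, 2))  # 0.68
```

Working from unrounded z-scores, the sum of products is slightly larger than the sum of the rounded table entries, but r still rounds to 0.68.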
6
How does each data point contribute to the correlation value?
Points in the upper right or lower left quadrants add to the correlation value.
Points in the upper left or lower right quadrants subtract from the correlation value.
[Figure: scatter plot of Stress and Eating Difficulties, split into quadrants at the means of x and y, shown next to the raw scores and z-scores from the previous slide; r = 0.68]
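One way to check the quadrant idea is to look at the sign of each point's deviation product. The sketch below assumes the quadrants are drawn at the means of x and y; the variable names are illustrative only.

```python
# Stress (X) and eating-difficulties (Y) scores from the example.
stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]

n = len(stress)
mean_x = sum(stress) / n
mean_y = sum(eating) / n

# A point adds to r when both deviations share a sign (upper right or
# lower left quadrant) and subtracts when the signs differ.
adds = [(x, y) for x, y in zip(stress, eating)
        if (x - mean_x) * (y - mean_y) > 0]
subtracts = [(x, y) for x, y in zip(stress, eating)
             if (x - mean_x) * (y - mean_y) < 0]

print(len(adds))   # 8
print(subtracts)   # [(17, 9), (21, 5)]
```

Eight of the ten points fall in the "adding" quadrants, which is why r comes out clearly positive.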
7
Fun fact about the Pearson correlation statistic
Since the z-scores do not change when you add a constant to the raw scores or multiply them by a positive constant, the Pearson correlation doesn't change either.
Example: multiplying y by 2 and adding 100.
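The invariance can be verified numerically. The sketch below uses the stress and eating-difficulties data with a hand-written `pearson_r` helper (a name chosen for this example), and also shows the one caveat: multiplying by a negative constant flips the sign of r.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]

r_original = pearson_r(stress, eating)
# Multiply y by 2 and add 100, as on the slide.
r_rescaled = pearson_r(stress, [2 * y + 100 for y in eating])
# Multiplying by a negative constant reverses the sign of r.
r_flipped = pearson_r(stress, [-y for y in eating])

print(math.isclose(r_original, r_rescaled))  # True
print(math.isclose(r_flipped, -r_original))  # True
```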
8
Similarly, the correlation stays the same no matter how you stretch your axes.
[Figure: three scatter plots of Stress and Eating Difficulties drawn with different axis ranges; r = 0.68 in each]
As a rule, you should plot your axes with an equal scale.
9
Guess that correlation!
Average of parent's height (in) vs. Student's height (in): n = 90, r = 0.34
10
Guess that correlation!
Father's height (in) vs. Male student's height (in): n = 21, r = 0.34
11
Mother's height (in) vs. Female student's height (in): n = 70, r = 0.68
12
Guess that correlation!
High School GPA vs. UW GPA: n = 90, r = 0.19
13
Guess that correlation!
Caffeine (cups/day) vs. Sleep (hours/night): n = 91, r = -0.12
14
Guess that correlation!
Caffeine (cups/day) vs. Drinks (per week): n = 91, r = 0.01
15
Guess that correlation!
Facebook friends vs. Drinks (per week): n = 91, r = 0.10
16
Guess that correlation!
Favorite outdoor temperature (F) vs. Video game playing (hours/week): n = 91, r = -0.19
17
Guess that correlation!
[Scatter plot of x and y] r = -0.56
18
Guess that correlation!
[Scatter plot of x and y] r = 0.94
19
Guess that correlation!
[Scatter plot of x and y] r = 0.08
20
Guess that correlation!
[Scatter plot of x and y] r = -1.00
21
Guess that correlation!
[Scatter plot of x and y] r = -0.08
22
Guess that correlation!
[Scatter plot of x and y] r = 0.49
23
Guess that correlation!
[Scatter plot of x and y] r = -0.92
24
Guess that correlation!
[Scatter plot of x and y] r = -0.77
25
r is a measure of the linear relation between two variables.
[Scatter plot of x and y] r = 0.01
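To see why "linear" matters, consider a perfectly predictable but nonlinear relation. The data below are hypothetical (not the points plotted on the slide): y is exactly x squared, yet r is zero because the positive and negative deviation products cancel.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical U-shaped data: y is fully determined by x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(abs(r_curved) < 1e-12)  # True
```

A value of r near zero therefore rules out a linear relation, not every relation.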
26
Guess that correlation!
[Scatter plot of x and y] r = 0.00
27
Guess that correlation!
[Scatter plot of x and y] r = 0.91
28
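Z-score formula for calculating r (intuitive, but not very practical):
$r = \frac{\sum z_x z_y}{n}$
Substituting the formula for z, $z_x = \frac{X - \bar{X}}{S_X}$ and $z_y = \frac{Y - \bar{Y}}{S_Y}$, gives the deviation-score formula for calculating r (somewhat intuitive, somewhat more practical):
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_X S_Y}$
Computational formula for calculating r (less intuitive, more practical):
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{SS_X \, SS_Y}}$
where $SS_X = \sum (X - \bar{X})^2$, $SS_Y = \sum (Y - \bar{Y})^2$, $S_X = \sqrt{SS_X / n}$ and $S_Y = \sqrt{SS_Y / n}$.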
29
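Computational formula for calculating r (less intuitive, more practical):
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{SS_X \, SS_Y}}$
A little algebra shows that:
$\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$
Computational raw-score formula for calculating r (least intuitive, most practical):
$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{SS_X \, SS_Y}}$
where $SS_X = \sum X^2 - \frac{(\sum X)^2}{n}$ and $SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$.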
30
Using the computational raw-score formula:

n = 10

X       Y       X^2     Y^2     XY
17      9       289     81      153
8       13      64      169     104
8       7       64      49      56
20      18      400     324     360
14      11      196     121     154
7       2       49      4       14
21      5       441     25      105
22      15      484     225     330
19      26      361     676     494
30      28      900     784     840
Totals:
166     134     3248    2458    2610

SS_X = 492.4
SS_Y = 662.4
r = 0.675
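The table's arithmetic can be reproduced directly from the column totals. This sketch uses the scores exactly as tabulated on this slide (the sixth pair is entered here as X = 7, Y = 2); the variable names are chosen for the example.

```python
import math

# Scores as tabulated on this slide.
xs = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
ys = [9, 13, 7, 18, 11, 2, 5, 15, 26, 28]
n = len(xs)

# Column totals needed by the raw-score formula.
sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Sums of squares and the sum of cross-products, all from raw totals.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp = sum_xy - sum_x * sum_y / n

r_raw = sp / math.sqrt(ss_x * ss_y)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 166 134 3248 2458 2610
print(round(ss_x, 1), round(ss_y, 1))        # 492.4 662.4
print(round(r_raw, 3))                       # 0.675
```

The appeal of this form is that only five running totals are needed, with no means or z-scores computed along the way.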
31
A second measure of correlation, called the Spearman rank-order coefficient, is appropriate for ordinal scores. It is calculated by:
$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$
where D is the difference between each pair of ranks.
Most often used when:
a) At least one variable is an ordinal scale
b) One of the distributions is very skewed or has outliers
32
Fact (according to Wikipedia anyway): In 1995, National Pax had planned to replace the "Sir Isaac Lime" flavor with "Scarlett O'Cherry," until a group of Orange County, California fourth-graders created a petition in opposition and picketed the company's headquarters in early 1996. The crusade also included an e-mail campaign, in which a Stanford professor reportedly accused the company of "Otter-cide." After meeting with the children, company executives relented and retained the Sir Isaac Lime flavor.
Example: Is there a correlation between your preference for Otter Pops® flavors and mine?
33
Example: Suppose two wine experts were asked to rank-order their preference for eight wines. How can we measure the similarity of their rankings?

X    Y    Rank X    Rank Y    D     D^2
1    2    1         2         -1    1
2    1    2         1         1     1
3    5    3         5         -2    4
4    3    4         3         1     1
5    4    5         4         1     1
6    7    6         7         -1    1
7    8    7         8         -1    1
8    6    8         6         2     4

n = 8, sum of D^2 = 14
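Plugging the wine rankings into the usual Spearman formula, r_s = 1 - 6 * (sum of D^2) / (n * (n^2 - 1)), takes only a few lines. This is a sketch that assumes no tied ranks, which holds for this example; the names are illustrative.

```python
# Rankings of the eight wines by the two experts.
rank_x = [1, 2, 3, 4, 5, 6, 7, 8]
rank_y = [2, 1, 5, 3, 4, 7, 8, 6]
n = len(rank_x)

# D is the difference between each pair of ranks.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d2 = sum(diff ** 2 for diff in d)

# Spearman rank-order coefficient (valid as written when there are no ties).
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2)         # 14
print(round(r_s, 2))  # 0.83
```

The two experts agree fairly closely: a value of 1 would mean identical rankings.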
34
Pearson correlation is much more sensitive to outlying values than the Spearman coefficient. From: http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
35
Pearson correlation is much more sensitive to outlying values than the Spearman coefficient.
Caffeine (cups/day) vs. Sleep (hours/night): n = 89, Pearson's r = 0.06, Spearman's r_s = 0.07
36
Only the rank order matters for the Spearman coefficient
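A small experiment illustrates both points: replacing the largest x value with an extreme outlier leaves every rank unchanged, so Spearman's coefficient stays put while Pearson's r moves a lot. The data are hypothetical, and the helper functions are written for this sketch and assume no tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman coefficient from rank differences; assumes no ties."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data; the second x list swaps the largest value for an outlier.
x_plain = [1, 2, 3, 4, 5, 6]
x_outlier = [1, 2, 3, 4, 5, 600]
y = [2, 1, 4, 3, 6, 5]

rs_plain = spearman_rs(x_plain, y)
rs_outlier = spearman_rs(x_outlier, y)
r_plain = pearson_r(x_plain, y)
r_outlier = pearson_r(x_outlier, y)

print(rs_plain == rs_outlier)            # True: ranks did not change
print(abs(r_plain - r_outlier) > 0.1)    # True: Pearson's r shifted
```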