Chapter 7: Correlation

1 Chapter 7: Correlation. Bivariate distribution: a distribution that shows the relation between two variables. [Scatter plot: area of primary visual cortex and visual acuity, with separate points for the left hemisphere and the right hemisphere] This graph is called a scatter plot or scatter diagram.

2 How do we quantify the strength of the relationship between the two variables in a bivariate distribution?

3

4 Example from the book: Two measures made for each subject: stress level and eating difficulties (E.D.)

Stress   E.D.
  17       9
   8      13
   8       7
  20      18
  14      11
   7       1
  21       5
  22      15
  19      26
  30      28

[Scatter plot of Stress and Eating Difficulties]

5 The most common way to quantify the relation between the two variables in a bivariate distribution is the Pearson correlation coefficient, labeled r. r is always between -1 and 1. The z-score formula is the most intuitive formula. Example: use the z-score formula to calculate r:

μx = 16.60, σx = 7.02, μy = 13.30, σy = 8.28

  raw scores         z scores
   X     Y      z_x     z_y    z_x·z_y
  17     9     0.06   -0.52    -0.03
   8    13    -1.23   -0.04     0.04
   8     7    -1.23   -0.76     0.93
  20    18     0.48    0.57     0.27
  14    11    -0.37   -0.28     0.10
   7     1    -1.37   -1.48     2.03
  21     5     0.63   -1.00    -0.63
  22    15     0.77    0.21     0.16
  19    26     0.34    1.53     0.52
  30    28     1.91    1.77     3.39
                        Sum     6.68
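The calculation on this slide can be sketched in a few lines of Python. This is a minimal sketch, assuming the z-score formula has its usual form, r = Σ(z_x · z_y) / n, with z-scores based on the population standard deviation (dividing by n), which matches the means and standard deviations listed above. The function names `z_scores` and `pearson_r_from_z` are illustrative, not from the slides.

```python
from math import sqrt

# Raw scores from the slide: stress (X) and eating difficulties (Y).
stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def z_scores(values):
    """Convert raw scores to z-scores using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r_from_z(xs, ys):
    """Assumed z-score form of Pearson's r: the mean of the z-score products."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


r = pearson_r_from_z(stress, eating)
print(round(r, 2))  # 0.68
```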

6 How does each data point contribute to the correlation value? Points in the upper right or lower left quadrants add to the correlation value. Points in the upper left or lower right quadrants subtract from the correlation value.

[Table: the raw scores x, y and z-scores z_x, z_y from slide 5. Scatter plot of Stress and Eating Difficulties, divided into quadrants at μx and μy; r = 0.68]
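The quadrant rule can be checked directly on the stress data. The sketch below labels each point by the sign of the product of its deviations from the two means, which has the same sign as the product of its z-scores because standard deviations are positive. The labels "adds" and "subtracts" are illustrative names.

```python
# Raw scores from the stress / eating difficulties example.
stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]

mean_x = sum(stress) / len(stress)
mean_y = sum(eating) / len(eating)

# A point adds to r when both deviations share a sign (upper right or
# lower left quadrant) and subtracts when the signs differ.
contributions = []
for x, y in zip(stress, eating):
    product = (x - mean_x) * (y - mean_y)
    contributions.append("adds" if product > 0 else "subtracts")

for x, y, label in zip(stress, eating, contributions):
    print(x, y, label)
```

For these data, only the points (17, 9) and (21, 5) fall in the lower right quadrant and subtract from the correlation; the other eight add to it.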

7 Fun fact about the Pearson correlation statistic: Since z-scores do not change when you add a constant to the raw scores or multiply them by a positive constant, the Pearson correlation doesn't change either. Example: multiplying y by 2 and adding 100.
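A quick numerical check of this invariance, using the stress data and the slide's own transformation (multiply y by 2 and add 100). The helper `pearson_r` is a hypothetical name and uses the deviation form of the formula. The last line covers a case the slide does not mention: multiplying by a negative constant flips the sign of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]

r_original = pearson_r(stress, eating)
# The slide's example: multiply y by 2 and add 100.
r_transformed = pearson_r(stress, [2 * y + 100 for y in eating])
# Not on the slide: a negative multiplier reverses the sign of r.
r_flipped = pearson_r(stress, [-y for y in eating])

print(round(r_original, 2), round(r_transformed, 2), round(r_flipped, 2))
```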

8 Similarly, the correlation stays the same no matter how you stretch your axes. As a rule, you should plot your axes with an equal scale.

[Three scatter plots of Stress and Eating Difficulties drawn with different axis ranges; each shows r = 0.68]

9 Guess that correlation! Average of parent's height (in) vs. Student's height (in): n = 90, r = 0.34

10 Guess that correlation! Father's height (in) vs. Male student's height (in): n = 21, r = 0.34

11 Mother's height (in) vs. Female student's height (in): n = 70, r = 0.68

12 Guess that correlation! High School GPA vs. UW GPA: n = 90, r = 0.19

13 Guess that correlation! Caffeine (cups/day) vs. Sleep (hours/night): n = 91, r = -0.12

14 Guess that correlation! Caffeine (cups/day) vs. Drinks (per week): n = 91, r = 0.01

15 Guess that correlation! Facebook friends vs. Drinks (per week): n = 91, r = 0.10

16 Guess that correlation! Favorite outdoor temperature (F) vs. Video game playing (hours/week): n = 91, r = -0.19

17 x y r = -0.56 Guess that correlation!

18 x y r = 0.94 Guess that correlation!

19 x y r = 0.08 Guess that correlation!

20 x y r = -1.00 Guess that correlation!

21 x y r = -0.08 Guess that correlation!

22 x y r = 0.49 Guess that correlation!

23 x y r = -0.92 Guess that correlation!

24 x y r = -0.77 Guess that correlation!

25 r is a measure of the linear relation between two variables x y r = 0.01
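To see why "linear" matters, here is a small sketch with hypothetical data (not the data plotted on the slide): y is completely determined by x through y = x², yet r is zero because the relation has no linear component over a range of x that is symmetric about zero. The helper name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical data: a perfect but purely curved (U-shaped) relation.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(round(r, 2))
```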

26 x y r = 0.00 Guess that correlation!

27 x y r = 0.91 Guess that correlation!

28 Z-score formula for calculating r (intuitive, but not very practical)
Deviation-score formula for calculating r (somewhat intuitive, somewhat more practical), obtained by substituting the formula for z
Computational formula for calculating r (less intuitive, more practical)
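The three formulas named on this slide are commonly written as follows. This is a sketch in standard textbook notation, assuming population standard deviations S_X and S_Y (dividing by n), which is consistent with the z-scores in the worked example. The symbols are assumptions and may differ from those on the original slide. SS_Y is defined in the same way as SS_X.

```latex
\begin{align*}
r &= \frac{\sum z_X z_Y}{n},
  \qquad z_X = \frac{X - \bar{X}}{S_X},\quad z_Y = \frac{Y - \bar{Y}}{S_Y}
  && \text{(z-score formula)} \\[4pt]
r &= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n\, S_X S_Y}
  && \text{(deviation-score formula)} \\[4pt]
r &= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{SS_X\, SS_Y}},
  \qquad SS_X = \sum (X - \bar{X})^2 = n S_X^2
  && \text{(computational formula)}
\end{align*}
```

The last step follows because n·S_X·S_Y = √(n·S_X²) · √(n·S_Y²) = √(SS_X · SS_Y).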

29 Computational formula for calculating r: (less intuitive, more practical) A little algebra shows that: Computational raw score formula for calculating r: (least intuitive, most practical)
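The "little algebra" can be sketched as follows, again in standard notation that is assumed rather than taken from the slide. Expanding the product of deviations and using X̄ = ΣX/n and Ȳ = ΣY/n gives the raw-score form of the numerator; the same expansion applied to SS_X and SS_Y gives the denominator.

```latex
\begin{align*}
\sum (X - \bar{X})(Y - \bar{Y})
  &= \sum XY - \bar{X}\sum Y - \bar{Y}\sum X + n\bar{X}\bar{Y}
   = \sum XY - \frac{(\sum X)(\sum Y)}{n} \\[4pt]
SS_X &= \sum X^2 - \frac{(\sum X)^2}{n},
  \qquad SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n} \\[4pt]
r &= \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}
          {\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]
                 \left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}
\end{align*}
```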

30 Using the computational raw-score formula:

           X     Y     X²     Y²     XY
          17     9    289     81    153
           8    13     64    169    104
           8     7     64     49     56
          20    18    400    324    360
          14    11    196    121    154
           7     2     49      4     14
          21     5    441     25    105
          22    15    484    225    330
          19    26    361    676    494
          30    28    900    784    840
Totals   166   134   3248   2458   2610

n = 10, SS_X = 492.4, SS_Y = 662.4, r = 0.675
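The table's arithmetic can be reproduced with a short script. This sketch uses the X and Y values exactly as they appear in the table on this slide and assumes the usual raw-score definitions SS = ΣX² − (ΣX)²/n and SP = ΣXY − (ΣX)(ΣY)/n. The variable names are illustrative.

```python
from math import sqrt

# X and Y columns as listed in the table on this slide.
xs = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
ys = [9, 13, 7, 18, 11, 2, 5, 15, 26, 28]
n = len(xs)

# Column totals.
sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Sums of squares and sum of products from the raw-score totals.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp = sum_xy - sum_x * sum_y / n

r = sp / sqrt(ss_x * ss_y)
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(ss_x, 1), round(ss_y, 1), round(r, 3))
```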

31 A second measure of correlation, called the Spearman rank-order coefficient, is appropriate for ordinal scores. It is calculated from D, the difference between each pair of ranks. Most often used when:
a) At least one variable is on an ordinal scale
b) One of the distributions is very skewed or has outliers
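The Spearman coefficient is usually written as below. This is the standard form, given here as an assumption about what the slide showed. It is exact only when there are no tied ranks, and n is the number of pairs.

```latex
r_s = 1 - \frac{6 \sum D^2}{n\,(n^2 - 1)}
```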

32 Fact (according to Wikipedia, anyway): In 1995, National Pax had planned to replace the "Sir Isaac Lime" flavor with "Scarlett O'Cherry," until a group of Orange County, California fourth-graders created a petition in opposition and picketed the company's headquarters in early 1996. The crusade also included an e-mail campaign, in which a Stanford professor reportedly accused the company of "Otter-cide." After meeting with the children, company executives relented and retained the Sir Isaac Lime flavor. [1]
Example: Is there a correlation between your preference for Otter Pops® flavors and mine?

33 Example: Suppose two wine experts were asked to rank-order their preference for eight wines. How can we measure the similarity of their rankings?

 X   Y   Rank X   Rank Y    D    D²
 1   2      1        2     -1     1
 2   1      2        1      1     1
 3   5      3        5     -2     4
 4   3      4        3      1     1
 5   4      5        4      1     1
 6   7      6        7     -1     1
 7   8      7        8     -1     1
 8   6      8        6      2     4

n = 8, ΣD² = 14
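The wine example can be worked through in code. This sketch assumes the standard no-ties formula r_s = 1 − 6ΣD² / (n(n² − 1)). The resulting value of r_s is computed here and is not a figure taken from the slides.

```python
# Rankings of the eight wines by the two experts, from the table.
rank_x = [1, 2, 3, 4, 5, 6, 7, 8]
rank_y = [2, 1, 5, 3, 4, 7, 8, 6]
n = len(rank_x)

# D is the difference between each pair of ranks.
d = [a - b for a, b in zip(rank_x, rank_y)]
sum_d2 = sum(v * v for v in d)

# Assumed standard Spearman formula (valid when there are no tied ranks).
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r_s, 3))  # 14 0.833
```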

34 Pearson correlation is much more sensitive to outlying values than the Spearman coefficient. From: http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

35 Pearson correlation is much more sensitive to outlying values than the Spearman coefficient. Caffeine (cups/day) vs. Sleep (hours/night): n = 89, Pearson's r = 0.06, Spearman's r_s = 0.07
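The difference in sensitivity is easy to demonstrate on made-up numbers. The data below are hypothetical, not the caffeine and sleep data from the slide. One y value is replaced by an extreme outlier that keeps its rank, so Spearman's coefficient (computed here as Pearson's r on the ranks) is unchanged while Pearson's r drops sharply. The helper names are illustrative, and `ranks` assumes there are no ties.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_r(xs, ys):
    """Spearman's coefficient as Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data with a strong positive relation.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
# Same data, but the largest y value (10) becomes an outlier (100).
ys_outlier = [2, 1, 4, 3, 6, 5, 8, 7, 100, 9]

pearson_before, pearson_after = pearson_r(xs, ys), pearson_r(xs, ys_outlier)
spearman_before, spearman_after = spearman_r(xs, ys), spearman_r(xs, ys_outlier)

print(round(pearson_before, 2), round(pearson_after, 2))
print(round(spearman_before, 2), round(spearman_after, 2))
```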

36 Only the rank order matters for the Spearman coefficient
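One consequence is that any strictly increasing transformation of a variable leaves its ranks, and therefore the Spearman coefficient, unchanged. A minimal sketch with hypothetical values and an illustrative `ranks` helper (which assumes no ties):

```python
from math import exp


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


# Hypothetical scores and a strictly increasing transformation of them.
scores = [3.0, 1.5, 4.2, 0.7, 2.8]
transformed = [exp(s) for s in scores]

# The raw values differ greatly, but the ranks are identical.
print(ranks(scores))
print(ranks(transformed))
```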

