1
S1 :: Chapter 6 Correlation
Dr J Frost Last modified: 20th January 2016
2
Recap of correlation
Correlation gives the strength of the relationship (and the type of relationship) between two variables. Each description therefore has two parts, a strength and a type. The scatter diagrams on this slide show: weak negative correlation, weak positive correlation, no correlation, and strong positive correlation.
3
$S_{xx}$
$S_{xx}$ represents the total squared distance from the mean.
Formula based on definition: $S_{xx} = \sum (x - \bar{x})^2$
Simplified formula: $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
Bro Exam Tip: Given in formula booklet, but useful to memorise.
Recall that variance is defined as "the average squared distance from the mean". We could therefore express $\sigma^2$ in terms of $S_{xx}$: $\sigma^2 = \frac{S_{xx}}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
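As a quick check that the two forms of $S_{xx}$ agree and that dividing by $n$ gives the variance, here is a minimal Python sketch. The data values and the function names are made up for illustration; they are not from the slides.

```python
import math
import statistics


def s_xx_definition(values):
    """S_xx from the definition: sum of squared distances from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def s_xx_simplified(values):
    """S_xx from the simplified formula: sum(x^2) - (sum(x))^2 / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n


data = [2, 4, 4, 6, 9]  # illustrative data only (assumption, not from the slides)

sxx = s_xx_simplified(data)
variance = sxx / len(data)  # sigma^2 = S_xx / n

print(sxx, s_xx_definition(data), variance)
# statistics.pvariance is the population variance (divides by n), so it should match.
print(math.isclose(variance, statistics.pvariance(data)))
```

The simplified form only needs the running totals $\sum x$ and $\sum x^2$, which is why exam questions can supply those totals instead of the raw data.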
4
Covariance (this won't be tested in an exam but is intended to provide background)
We understand variance as "how much a variable varies". We can extend variance to two variables: we might be interested in how one variable varies with another. For example, we can say that as distance (say $x$) increases, the cost (say $y$) increases. Thus the covariance of $x$ and $y$ is positive.
5
Covariance (this won't be tested in an exam but is intended to provide background)
Comment on the covariance between the variables in each scatter diagram ($y$ plotted against $x$).
First diagram: as $y$ increases, $x$ doesn't change very much. So the covariance is small (but positive).
Second diagram: as $x$ increases, $y$ doesn't change very much. So the covariance is small (but positive).
6
Covariance (this won't be tested in an exam but is intended to provide background)
Comment on the covariance between the variables in each scatter diagram ($y$ plotted against $x$).
First diagram: as $y$ varies, $x$ doesn't vary at all. So we say that the variables are independent, and the covariance is 0.
Second diagram: as $x$ increases, $y$ decreases. So the covariance is negative.
7
$S_{xy}$
Just as $S_{xx}$ gave a measure of how much a variable varies (and $S_{yy}$ does the same for $y$), $S_{xy}$ gives a measure of how two variables $x$ and $y$ vary with each other.
Formula based on definition: $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$
Simplified formula: $S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$
Interesting things to note (but not examined):
Just as $\text{variance}(x) = \frac{S_{xx}}{n}$, $\text{covariance}(x, y) = \frac{S_{xy}}{n}$.
How could $\text{variance}(x)$ be expressed in terms of covariance? $\text{variance}(x) = \text{covariance}(x, x)$, i.e. variance is the extent to which a variable varies with itself!
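The two forms of $S_{xy}$, and the observation that variance is just the covariance of a variable with itself, can be checked with a short Python sketch. The data set and function names are illustrative assumptions, not taken from the slides.

```python
import math


def s_xy_definition(xs, ys):
    """S_xy from the definition: sum of (x - mean_x) * (y - mean_y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def s_xy_simplified(xs, ys):
    """S_xy from the simplified formula: sum(xy) - sum(x) * sum(y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


xs = [1, 2, 3, 4, 5]  # illustrative data only (assumption)
ys = [2, 4, 5, 4, 5]

sxy = s_xy_simplified(xs, ys)
covariance = sxy / len(xs)       # covariance(x, y) = S_xy / n
sxx = s_xy_simplified(xs, xs)    # pairing x with itself gives S_xx
variance_x = sxx / len(xs)       # so variance(x) = covariance(x, x)

print(sxy, covariance, sxx, variance_x)
```

Passing the same list twice is a deliberate choice here: it shows that no separate $S_{xx}$ formula is needed once $S_{xy}$ is defined.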
8
Product Moment Correlation Coefficient (PMCC)
We saw that $S_{xy}$ gives a measure of how two variables vary with each other. That sounds like correlation! Wouldn't it be nice if we could somehow "normalise" it so we end up with just a number between -1 and 1?
Have an intelligent guess based on the discussion above: $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
$r$ is known as the Product Moment Correlation Coefficient (PMCC). We'll interpret what that means in a second.
9
Interpreting the PMCC
We've seen the PMCC varies between -1 and 1.
$r = 1$ means perfect positive correlation.
$r = 0$ means no correlation.
$r = -1$ means perfect negative correlation.
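The formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ and the two extreme values can be tried out in a few lines of Python. This is only a sketch: the helper names `s` and `pmcc` and the small data sets are illustrative assumptions.

```python
import math


def s(us, vs):
    """Generic S_uv = sum(uv) - sum(u) * sum(v) / n (gives S_xx, S_yy or S_xy)."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy). Undefined if either variable is constant."""
    return s(xs, ys) / math.sqrt(s(xs, xs) * s(ys, ys))


xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]    # points exactly on a line with positive gradient
falling = [11, 9, 7, 5, 3]   # points exactly on a line with negative gradient
scattered = [2, 4, 5, 4, 5]  # a roughly increasing but imperfect trend

print(pmcc(xs, rising))     # perfect positive correlation
print(pmcc(xs, falling))    # perfect negative correlation
print(pmcc(xs, scattered))  # somewhere strictly between 0 and 1
```

Points lying exactly on a straight line give the extreme values; anything less tidy falls strictly between them.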
10
Interpreting the PMCC
Match the $r$ value to each scatter diagram: $r = 0.8$, $r = -0.4$, $r = 0.96$.
11
Example
Baby                      A     B     C     D     E     F
Head circumference (x)    31.1  33.3  30.0  31.5  35.0  30.2
Gestation period (y)      36    37    38    38    40    40
$\sum x = 191.1$, $\sum y = 229$, $\sum xy = 7296.7$, $\sum x^2 = 6105.39$, $\sum y^2 = 8753$, $n = 6$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 18.855$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 12.833$
$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 3.05$
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 0.196$
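The arithmetic in this example can be reproduced from the summary totals alone. The following Python sketch uses only the sums quoted on the slide; the variable names are chosen for this illustration.

```python
import math

# Summary statistics quoted on the slide
n = 6
sum_x, sum_y = 191.1, 229
sum_xy = 7296.7
sum_x2, sum_y2 = 6105.39, 8753

# Simplified formulae for S_xx, S_yy and S_xy
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product moment correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 2), round(r, 3))
```

A value of $r$ this close to 0 suggests only weak positive correlation between head circumference and gestation period in this sample.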
12
Let's do it on our calculators!
Baby                      A     B     C     D     E     F
Head circumference (x)    31.1  33.3  30.0  31.5  35.0  30.2
Gestation period (y)      36    37    38    38    40    40
Put the calculator in Stats mode: MODE, then 2.
Select 2 for A+BX (i.e. calculations to do with linear relationships).
Insert the data into your table. Use the arrow keys and "=" to add the values.
Once done, press the AC button. This goes to normal calculation input.
We want to insert $r$ into the calculation. Press SHIFT+1, and choose 5 for REGRESSION.
Select 3 for $r$. $r$ is now in your calculation, so press =.
13
Test Your Understanding
June 2013 Q1
14
Further Practice
Quite often the values are given to you in an exam.
15
Interpreting the PMCC: "Interpret" vs "State"
In general in Statistics exams, the word "interpret" means "explain in context using non-statistical language".
Bob wants to establish if there's a connection between waiting time ($x$) at the post office and customer satisfaction ($y$). He calculates $r$ and finds a strongly negative value. Interpret this correlation coefficient.
A bad answer (that may or may not be accepted): "Strong negative correlation" (this is stating the correlation, not interpreting it).
A good answer: "As the waiting time increases, the customer satisfaction tends to decrease".
16
Exam Questions (on provided sheet) Q1
17
(Before you go on to Q2) Effects of coding
We know that $\text{variance}(x) = \frac{S_{xx}}{n}$ and $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$. Therefore, if all our data values $x$ become $k$ times bigger and all values $y$ become $m$ times bigger, what happens to each of the following?
(Recap) Variance of $x$: $k^2$ times as big.
$S_{xx}$: $k^2$ times as big.
$S_{yy}$: $m^2$ times as big.
$S_{xy}$: $km$ times as big.
$r$: unaffected! The factor $km$ in the numerator cancels with $\sqrt{k^2 m^2}$ in the denominator (for positive $k$ and $m$).
Adding or subtracting a constant changes none of these quantities.
Bro Exam Note: For the purposes of the S1 exam, you just need to remember that:
Coding affects $S_{xx}$ in the same way that the variance is affected, i.e. if the variance becomes 9 times larger, so does $S_{xx}$.
The PMCC is completely unaffected by (linear) coding.
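These scaling rules can be checked numerically. The Python sketch below is an illustration only: the data, the helper names `s` and `pmcc`, and the scale factors (called `k` for $x$ and `m` for $y$ here, both positive) are assumptions made for the demonstration.

```python
import math


def s(us, vs):
    """Generic S_uv = sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    return s(xs, ys) / math.sqrt(s(xs, xs) * s(ys, ys))


xs = [1, 2, 3, 4, 5]  # illustrative data only (assumption)
ys = [2, 4, 5, 4, 5]
k, m = 3, 5           # hypothetical positive scale factors

scaled_xs = [k * x for x in xs]
scaled_ys = [m * y for y in ys]

print(s(scaled_xs, scaled_xs) / s(xs, xs))  # ratio for S_xx: k squared
print(s(scaled_ys, scaled_ys) / s(ys, ys))  # ratio for S_yy: m squared
print(s(scaled_xs, scaled_ys) / s(xs, ys))  # ratio for S_xy: k times m
print(pmcc(xs, ys), pmcc(scaled_xs, scaled_ys))  # r is the same before and after
```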
18
Example
x    1020  1032  1028  1034  1023  1038
y    320   335   345   355   360   380
Coding: $p = \frac{x - 1020}{1}$, $q = \frac{y - 300}{5}$
p    0     12    8     14    3     18
q    4     7     9     11    12    16
We can now just find the PMCC of this new data set, and no further adjustment is needed: $r = 0.655$
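To confirm that the coded data give the same PMCC as the original data, the example can be run through a short Python sketch. The data and coding are those on the slide; the helper names `s` and `pmcc` are assumptions for this illustration.

```python
import math


def s(us, vs):
    """Generic S_uv = sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    return s(xs, ys) / math.sqrt(s(xs, xs) * s(ys, ys))


xs = [1020, 1032, 1028, 1034, 1023, 1038]
ys = [320, 335, 345, 355, 360, 380]

# Coding from the slide: p = (x - 1020) / 1 and q = (y - 300) / 5
ps = [(x - 1020) / 1 for x in xs]
qs = [(y - 300) / 5 for y in ys]

r_raw = pmcc(xs, ys)
r_coded = pmcc(ps, qs)

print(ps, qs)
print(round(r_raw, 3), round(r_coded, 3))  # both round to the slide's value
```

Working with the coded values keeps the numbers small, which is the practical reason for coding when calculating by hand.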
19
Exam Questions (on provided sheet) Q2
20
Exam Questions (on provided sheet) Q3
21
Exam Questions (on provided sheet) Q4
22
Exam Questions (on provided sheet) Q5
23
Exam Questions (on provided sheet) Q6
24
Exam Questions (on provided sheet) Q7
25
Exam Questions (on provided sheet) Q8
26
Exam Questions (on provided sheet) Q9
27
Limitations of correlation
Often there's a third variable that explains two others, but the two variables themselves are not connected.
Q1: The number of cars on the road has increased, and the number of DVD recorders bought has decreased. Is there a correlation between the two variables?
Answer: Buying a car does not necessarily mean that you will not buy a DVD recorder, so we cannot say there is a correlation between the two.
Q2: Over the past 10 years the memory capacity of personal computers has increased, and so has the average life expectancy of people in the western world. Is there a correlation between these two variables?
Answer: The two are not connected, but both are due to scientific development over time (i.e. a third variable!).