S1 :: Chapter 6 Correlation


Dr J Frost (jfrost@tiffin.kingston.sch.uk), www.drfrostmaths.com. Last modified: 20th January 2016

Recap of correlation: Correlation gives the strength of the relationship (and the type of relationship) between two variables. The example scatter diagrams show weak negative correlation, weak positive correlation, no correlation and strong positive correlation. In a description such as "strong positive correlation", "strong" is the strength and "positive" is the type.

$S_{xx}$ represents the total squared distance from the mean.
Formula based on definition: $S_{xx} = \sum (x - \bar{x})^2$
Simplified formula: $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
Bro Exam Tip: Given in the formula booklet, but useful to memorise.
Recall that variance is defined as "the average squared distance from the mean", which we calculate as $\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$. We could therefore express $\sigma^2$ in terms of $S_{xx}$: $\sigma^2 = \frac{S_{xx}}{n}$.
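As a quick check of the two $S_{xx}$ formulas, here is a minimal Python sketch. Python is an assumption, since the slides themselves use a calculator. It uses the head circumference values from the baby example later in this chapter, computes $S_{xx}$ both from the definition and from the simplified formula, and confirms that $\sigma^2 = \frac{S_{xx}}{n}$.

```python
# Head circumference values (x) from the baby example in this chapter.
x = [31.1, 33.3, 30.0, 31.5, 35.0, 30.2]
n = len(x)
mean_x = sum(x) / n

# Formula based on definition: total squared distance from the mean.
sxx_definition = sum((xi - mean_x) ** 2 for xi in x)

# Simplified formula: sum of squares minus (sum of values)^2 / n.
sxx_simplified = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

# Variance: mean of the squares minus the square of the mean.
variance = sum(xi ** 2 for xi in x) / n - (sum(x) / n) ** 2

print(round(sxx_definition, 3), round(sxx_simplified, 3))  # 18.855 18.855
print(abs(variance - sxx_simplified / n) < 1e-9)           # True: sigma^2 = S_xx / n
```

The simplified formula is the one to use with summary totals, because it needs only $\sum x$, $\sum x^2$ and $n$, not the individual values.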

Covariance (this won't be tested in an exam but is intended to provide background)
We understand variance as "how much a variable varies". We can extend variance to two variables, because we might be interested in how one variable varies with another. For example, as distance (say $x$) increases, the cost (say $y$) increases. Thus the covariance of $x$ and $y$ is positive.

Covariance (continued): Comment on the covariance between the variables in each scatter diagram ($y$ plotted against $x$).
First diagram: as $y$ increases, $x$ doesn't change very much, so the covariance is small (but positive).
Second diagram: as $x$ increases, $y$ doesn't change very much, so the covariance is small (but positive).

Covariance (continued): Comment on the covariance between the variables in each scatter diagram ($y$ plotted against $x$).
First diagram: as $y$ varies, $x$ doesn't vary at all, so we say that the variables are independent, and the covariance is 0.
Second diagram: as $x$ increases, $y$ decreases, so the covariance is negative.
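The sign of the covariance in these diagrams can be checked numerically. The sketch below assumes covariance is computed as $\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$. The three small data sets are hypothetical, made up to mimic "y rises with x", "y falls as x rises" and "x does not vary at all".

```python
def covariance(x, y):
    """Covariance as the mean of (x - x_bar)(y - y_bar)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

# Hypothetical data sets, made up for illustration only.
increasing = covariance([1, 2, 3, 4], [2, 4, 5, 8])   # y rises as x rises
decreasing = covariance([1, 2, 3, 4], [8, 5, 4, 2])   # y falls as x rises
constant_x = covariance([3, 3, 3, 3], [1, 4, 2, 7])   # x does not vary at all

print(increasing > 0, decreasing < 0, constant_x == 0)  # True True True
```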

Just as $S_{xx}$ gave a measure of how much a variable varies, $S_{xy}$ gives a measure of how two variables $x$ and $y$ vary with each other.
Formula based on definition: $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$
Simplified formula: $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$
Similarly, $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$.
Interesting things to note (but not examined): just as $\text{Variance}(x) = \frac{S_{xx}}{n}$, $\text{Covariance}(x, y) = \frac{S_{xy}}{n}$.
How could $\text{Variance}(x)$ be expressed in terms of covariance? $\text{Variance}(x) = \text{Covariance}(x, x)$, i.e. variance is the extent to which a variable varies with itself!
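Here is a minimal sketch checking that the definition and the simplified formula for $S_{xy}$ agree, and that $\text{Variance}(x) = \text{Covariance}(x, x)$. The helper names `s_def` and `s_simplified` and the data are hypothetical, chosen only for illustration.

```python
def s_def(a, b):
    """S_ab by definition: sum of (a - a_bar)(b - b_bar)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

def s_simplified(a, b):
    """S_ab by the simplified formula: sum(ab) - sum(a) * sum(b) / n."""
    n = len(a)
    return sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / n

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

print(s_def(x, y), s_simplified(x, y))   # 6.0 6.0
covariance_xy = s_def(x, y) / n          # Covariance(x, y) = S_xy / n
variance_x = s_def(x, x) / n             # Variance(x) = S_xx / n = Covariance(x, x)
print(covariance_xy, variance_x)         # 1.2 2.0
```

Passing the same list twice, as in `s_def(x, x)`, is exactly the "variable varying with itself" idea: $S_{xx}$ is $S_{xy}$ with $y$ replaced by $x$.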

Product Moment Correlation Coefficient (PMCC)
We saw that $S_{xy}$ gives a measure of how two variables vary with each other. That sounds like correlation! Wouldn't it be nice if we could somehow "normalise" it so we end up with just a number between -1 and 1… Have an intelligent guess based on the discussion above.
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
We'll interpret what that means in a second. $r$ is known as the Product Moment Correlation Coefficient (PMCC).

Interpreting the PMCC: We've seen the PMCC varies between -1 and 1.
$r = 1$ means perfect positive correlation.
$r = 0$ means no (linear) correlation.
$r = -1$ means perfect negative correlation.
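To see these boundary values, the sketch below computes $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ for hypothetical data lying exactly on an increasing line, exactly on a decreasing line, and on the curve $y = x^2$. The last case gives $r = 0$ even though $y$ is completely determined by $x$. This is why $r = 0$ is best read as "no linear correlation".

```python
import math

def pmcc(x, y):
    """PMCC r = S_xy / sqrt(S_xx * S_yy), using the simplified formulas."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
r_pos = pmcc(xs, [2 * v + 1 for v in xs])               # points on y = 2x + 1
r_neg = pmcc(xs, [10 - 3 * v for v in xs])              # points on y = 10 - 3x
r_quad = pmcc([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])       # points on y = x^2

print(r_pos, r_neg, r_quad)  # 1.0 -1.0 0.0
```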

Interpreting the PMCC: Match each $r$ value to a scatter diagram: $r = 0.8$, $r = -0.4$, $r = 0.96$.
[Figure: three scatter diagrams to match to the $r$ values]

Example
Baby:                      A     B     C     D     E     F
Head circumference (x):    31.1  33.3  30.0  31.5  35.0  30.2
Gestation period (y):      36    37    38    38    40    40
Summary values: $\sum x = 191.1$, $\sum y = 229$, $\sum xy = 7296.7$, $\sum x^2 = 6105.39$, $\sum y^2 = 8753$, $n = 6$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 18.855$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 12.833$
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 3.05$
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 0.196$
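The arithmetic in this example can be reproduced from the summary values alone. This is a minimal Python sketch that uses exactly the totals above. It rounds to 3 decimal places (2 for $S_{xy}$) to match the slide.

```python
import math

# Summary values from the baby example.
n = 6
sum_x, sum_y = 191.1, 229
sum_xy = 7296.7
sum_x2, sum_y2 = 6105.39, 8753

# Simplified formulas for S_xx, S_yy and S_xy.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# PMCC.
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 2), round(r, 3))  # 18.855 12.833 3.05 0.196
```

Keeping the unrounded $S$ values until the final division avoids rounding errors creeping into $r$.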

Let's do it on our calculators, using the baby data from the example above. Put the calculator in Stats mode: MODE → 2. Select 2 for $A+BX$ (i.e. calculations to do with linear relationships). Insert the data into the table, using the arrow keys and "=" to add the values. Once done, press the AC button; this returns to normal calculation input. We now want to insert $r$ into the calculation: press SHIFT+1, choose 5 for REGRESSION, then select 3 for $r$. $r$ is now in your calculation, so press =.

Test Your Understanding: June 2013 Q1

Further Practice: Quite often the summary values (such as $\sum x$, $\sum x^2$ and $\sum xy$, or $S_{xx}$, $S_{yy}$ and $S_{xy}$) are given to you in an exam.

Interpreting the PMCC: "Interpret" vs "State"
In general in Statistics exams, the word "interpret" means "explain in context using non-statistical language".
Bob wants to establish whether there's a connection between waiting time ($x$) at the post office and customer satisfaction ($y$). He calculates $r$ as -0.81. Interpret this correlation coefficient.
A bad answer (that may or may not be accepted): "Strong negative correlation". This states the correlation rather than interpreting it.
A good answer: "As the waiting time increases, the customer satisfaction tends to decrease."

Exam Questions (on provided sheet): Q1

(Before you go on to Q2) Effects of coding
We know that $\text{Variance}(x) = \frac{S_{xx}}{n}$ and $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$. Therefore, if all our data values $x$ become $k$ times bigger and all values $y$ become $q$ times bigger, what happens to each of the following?
Variance of $x$ (recap): $k^2$ times as big.
$S_{xx}$: $k^2$ times as big.
$S_{yy}$: $q^2$ times as big.
$S_{xy}$: $kq$ times as big.
$r$: unaffected! ($r$ is multiplied by $\frac{kq}{\sqrt{k^2 q^2}}$, which equals 1 when $k$ and $q$ are both positive.)
Bro Exam Note: For the purposes of the S1 exam, you just need to remember that:
Coding affects $S_{xx}$ in the same way that the variance is affected, i.e. if the variance becomes 9 times larger, so does $S_{xx}$.
The PMCC is completely unaffected by (linear) coding.
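The effect of coding can be checked on a small hypothetical data set. In this sketch, the data, the scale factors $k = 3$ and $q = 2$, and the shifts $+7$ and $-1$ are all made up. It shows that $S_{xx}$, $S_{yy}$ and $S_{xy}$ scale by $k^2$, $q^2$ and $kq$, while $r$ is unchanged. It also shows that a negative scale factor on just one variable flips the sign of $r$, since $\sqrt{k^2 q^2} = |kq|$.

```python
import math

def summary(x, y):
    """Return (S_xx, S_yy, S_xy, r) using the simplified formulas."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return s_xx, s_yy, s_xy, s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
k, q = 3, 2  # scale factors for x and y

# Linear coding: scale each variable, then shift it by a constant.
coded_x = [k * v + 7 for v in x]
coded_y = [q * v - 1 for v in y]

s_xx, s_yy, s_xy, r = summary(x, y)
c_xx, c_yy, c_xy, c_r = summary(coded_x, coded_y)

print(c_xx / s_xx, c_yy / s_yy, c_xy / s_xy)  # 9.0 4.0 6.0, i.e. k^2, q^2, kq
print(math.isclose(r, c_r))                    # True: r is unaffected

# A negative scale factor on one variable only flips the sign of r.
flipped_r = summary([-v for v in x], y)[3]
print(math.isclose(flipped_r, -r))             # True
```

The shifts (+7 and -1) have no effect at all on the $S$ values, because $S_{xx}$, $S_{yy}$ and $S_{xy}$ only measure spread about the mean.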

Example
$x$: 1020, 1032, 1028, 1034, 1023, 1038
$y$: 320, 335, 345, 355, 360, 380
Code the data using $p = x - 1020$ and $q = \frac{y - 300}{5}$:
$p$: 0, 12, 8, 14, 3, 18
$q$: 4, 7, 9, 11, 12, 16
We can now just find the PMCC of this new data set, and no further adjustment is needed: $r = 0.655$.
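Here is a minimal sketch of this example. It applies the coding $p = x - 1020$, $q = \frac{y - 300}{5}$ and compares the PMCC of the coded data with the PMCC of the raw data. The `pmcc` helper is a hypothetical name, implemented with the simplified formulas from earlier in the chapter.

```python
import math

def pmcc(x, y):
    """PMCC r = S_xy / sqrt(S_xx * S_yy), using the simplified formulas."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1020, 1032, 1028, 1034, 1023, 1038]
y = [320, 335, 345, 355, 360, 380]

# Coding from the example: p = x - 1020, q = (y - 300) / 5.
p = [v - 1020 for v in x]
q = [(v - 300) / 5 for v in y]

print(p)  # [0, 12, 8, 14, 3, 18]
print(q)  # [4.0, 7.0, 9.0, 11.0, 12.0, 16.0]
print(round(pmcc(p, q), 3), round(pmcc(x, y), 3))  # 0.655 0.655
```

The coded values are much smaller, which is why coding makes hand or calculator work easier without changing $r$.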

Exam Questions (on provided sheet): Q2

Exam Questions (on provided sheet): Q3

Exam Questions (on provided sheet): Q4

Exam Questions (on provided sheet): Q5

Exam Questions (on provided sheet): Q6

Exam Questions (on provided sheet): Q7

Exam Questions (on provided sheet): Q8

Exam Questions (on provided sheet): Q9

Limitations of correlation
Often there's a third variable that explains two others, but the two variables themselves are not connected.
Q1: The number of cars on the road has increased, and the number of DVD recorders bought has decreased. Is there a correlation between the two variables?
Buying a car does not necessarily mean that you will not buy a DVD recorder. So even though the two quantities have changed over the same period, we cannot say that one is connected to the other.
Q2: Over the past 10 years the memory capacity of personal computers has increased, and so has the average life expectancy of people in the western world. Is there a correlation between these two variables?
The two are not connected, but both are due to scientific development over time (i.e. a third variable!).