Correlation.


This chapter is on correlation. We will look at patterns in data on a scatter graph, calculate the variance and co-variance of variables, and see how to measure numerically the strength of correlation between two variables.

Scatter Graphs (6A)

Scatter graphs are a way of representing two sets of data. It is then possible to see whether they are related.

Positive correlation: as one variable increases, so does the other.
Negative correlation: as one variable increases, the other decreases.
No correlation: there seems to be no pattern linking the two variables.

Scatter Graphs (6A)

In a study of a city, the population density, in people/hectare, and the distance from the city centre, in km, were investigated by choosing sample areas. The results are as follows:

Area                            A     B     C     D     E     F     G     H     I     J
Distance (km)                   0.6   3.8   2.4   3.0   2.0   1.5   1.8   3.4   4.0   0.9
Pop. density (people/hectare)   50    22    14    20    33    47    25    8     16    38

Plot a scatter graph and describe the correlation. Interpret what the correlation means.

[Scatter graph: population density (people/hectare) against distance from centre (km)]

The correlation is negative, which means that as we get further from the city centre, the population density decreases.

Teachings for Exercise 6B and 6C

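Variability of Bivariate Data (6B/C)

We learnt in chapter 3 that:

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$

In correlation we use ('how x varies'):

$$S_{xx} = \sum (x - \bar{x})^2$$

Similarly for y ('how y varies'):

$$S_{yy} = \sum (y - \bar{y})^2$$

And you can also calculate the co-variance of both variables ('how x and y vary together'):

$$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$$

(Although remember that the chapter 3 formula was changed to make it easier to use.)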

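Variability of Bivariate Data (6B/C)

Like in chapter 3, we can use a formula which will make calculations easier. Writing $S_{xx} = \sum (x - \bar{x})^2$, the variance is $S_{xx}/n$. Multiplying both sides by n:

$$S_{xx} = n \times \text{Variance}$$

The easier formula for variance from chapter 3 is:

$$\text{Variance} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

For the second fraction, square the top and bottom separately:

$$\text{Variance} = \frac{\sum x^2}{n} - \frac{\left(\sum x\right)^2}{n^2}$$

Multiplying both fractions by n cancels a 'divide by n' from each of them:

$$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$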

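Variability of Bivariate Data (6B/C)

These are the formulae for Sxx, Syy and Sxy:

$$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} \qquad S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$

You are given these in the formula booklet. You do not need to know how to derive them (like we just did!).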
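The equivalence between the definition and the booklet form can be checked numerically. The sketch below is only an illustration: the data values and the function names `s_definition` and `s_booklet` are made up for this check and do not come from the slides.

```python
def s_definition(us, vs):
    """Sum of products of deviations from the means: sum((u - u_bar)(v - v_bar))."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))


def s_booklet(us, vs):
    """Booklet form: sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


# Made-up data, used only to compare the two forms.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 5.0, 4.0, 10.0]

# Passing the same list twice gives Sxx (or Syy); passing both gives Sxy.
sxx_definition = s_definition(xs, xs)
sxx_booklet = s_booklet(xs, xs)
sxy_definition = s_definition(xs, ys)
sxy_booklet = s_booklet(xs, ys)

print(sxx_definition, sxx_booklet)
print(sxy_definition, sxy_booklet)
```

Using one function for all three quantities works because Sxx is just Sxy with y replaced by x.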

Variability of Bivariate Data (6B/C)

Calculate Sxx, Syy and Sxy, based on the following information.

Variability of Bivariate Data (6B/C)

The following table shows babies' head circumferences (cm) and the gestation period (weeks) for 6 newborn babies. Calculate Sxx, Syy and Sxy.

We need:

Baby                   A      B      C      D      E      F
Head size (x)          31     33     30     31     35     30
Gestation period (y)   36     37     38     38     40     40
x²                     961    1089   900    961    1225   900
y²                     1296   1369   1444   1444   1600   1600
xy                     1116   1221   1140   1178   1400   1200

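Variability of Bivariate Data (6B/C)

For the 6 babies, the totals we need are Σx = 190, Σy = 229, Σx² = 6036, Σy² = 8753 and Σxy = 7255, with n = 6. Then:

$$S_{xx} = 6036 - \frac{190^2}{6} \approx 19.33$$

$$S_{yy} = 8753 - \frac{229^2}{6} \approx 12.83$$

$$S_{xy} = 7255 - \frac{190 \times 229}{6} \approx 3.33$$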
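The same calculation can be sketched in Python as a check on the arithmetic. The data are the six babies from the table; the helper name `s` and the variable names are illustrative choices, not part of the slides.

```python
def s(us, vs):
    """Booklet formula: sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


head_size = [31, 33, 30, 31, 35, 30]   # x, in cm
gestation = [36, 37, 38, 38, 40, 40]   # y, in weeks

s_xx = s(head_size, head_size)
s_yy = s(gestation, gestation)
s_xy = s(head_size, gestation)

print(round(s_xx, 2), round(s_yy, 2), round(s_xy, 2))  # 19.33 12.83 3.33
```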

Product Moment Correlation Coefficient (6B/C)

We can test the correlation of data by calculating the Product Moment Correlation Coefficient (PMCC). This uses Sxx, Syy and Sxy:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

The value of r tells you what the correlation is and how strong it is, and always lies between -1 and 1. The closer to 1, the stronger the positive correlation. The same applies for -1 and negative correlation. A value close to 0 implies no linear correlation.
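As a rough illustration of the PMCC, the sketch below applies it to the city data from the scatter graph example (distance from the centre against population density). The function names `s` and `pmcc` are assumptions made for this example.

```python
import math


def s(us, vs):
    """Booklet formula: sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """Product Moment Correlation Coefficient: Sxy / sqrt(Sxx * Syy)."""
    return s(xs, ys) / math.sqrt(s(xs, xs) * s(ys, ys))


# Areas A to J from the city study.
distance = [0.6, 3.8, 2.4, 3.0, 2.0, 1.5, 1.8, 3.4, 4.0, 0.9]   # km
density = [50, 22, 14, 20, 33, 47, 25, 8, 16, 38]               # people/hectare

r = pmcc(distance, density)
print(round(r, 3))
```

The result is negative and fairly close to -1, which agrees with the description of the scatter graph: population density falls as distance from the centre increases.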

Product Moment Correlation Coefficient (6B/C)

Given the following data, calculate the Product Moment Correlation Coefficient. There is positive correlation: as x increases, y does as well.

Limitations of the Product Moment Correlation Coefficient (6B/C)

Sometimes the PMCC may indicate correlation between unrelated variables.

Cars on a particular street have increased, as have the sales of DVDs in town. The PMCC would indicate positive correlation where the two are most likely not linked.

The speed of computers has increased, as has life expectancy amongst people. These are not directly linked, but are both due to scientific developments.

Using Coding with the PMCC (6D)

Calculating the PMCC from this table.

x     102      103      102      103      104      103
y     320      335      345      355      360      380
x²    10404    10609    10404    10609    10816    10609
y²    102400   112225   119025   126025   129600   144400
xy    32640    34505    35190    36565    37440    39140

Using Coding with the PMCC (6D)

Calculating the PMCC from this table, using coding. The coded values in the table correspond to p = x - 100 and q = (y - 300)/5.

x     102   103   102   103   104   103
y     320   335   345   355   360   380
p     2     3     2     3     4     3
q     4     7     9     11    12    16
p²    4     9     4     9     16    9
q²    16    49    81    121   144   256
pq    8     21    18    33    48    48

Using Coding with the PMCC (6D)

Both the original values (x, y) and the coded values (p, q) give r ≈ 0.563. So coding will not affect the PMCC!
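A short sketch can confirm that coding leaves the PMCC unchanged. It uses the x and y values from the table; the coding p = x - 100 and q = (y - 300)/5 is inferred from the tabulated p and q values rather than stated on the slide, and the function names are illustrative.

```python
import math


def s(us, vs):
    """Booklet formula: sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """Product Moment Correlation Coefficient: Sxy / sqrt(Sxx * Syy)."""
    return s(xs, ys) / math.sqrt(s(xs, xs) * s(ys, ys))


x = [102, 103, 102, 103, 104, 103]
y = [320, 335, 345, 355, 360, 380]

# Coding inferred from the table: subtract a constant, then divide by a positive constant.
p = [xi - 100 for xi in x]
q = [(yi - 300) / 5 for yi in y]

r_original = pmcc(x, y)
r_coded = pmcc(p, q)

print(round(r_original, 3), round(r_coded, 3))  # 0.563 0.563
```

Working with the coded values keeps the sums small (for example Σq² = 667 instead of Σy² = 733675), which is the practical reason for coding by hand.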

Summary

We have looked at plotting scatter graphs.
We have looked at calculating the measures of variability Sxx, Syy and Sxy.
We have also seen types of correlation and how to recognise them on a graph.
We have calculated the Product Moment Correlation Coefficient, and interpreted it. It is a numerical measure of correlation.

Spearman’s Rank Correlation CEV

So far, we have considered the relationship of bivariate data which can be plotted directly, e.g. the results of a set of pupils in two tests, or the price against the age of cars. Sometimes we do not have data which suits this ideal, but we might still want to compare two sets of related data.

Example

Consider a manufacturer experimenting with different flavours of a drink. Two tasters put eight flavours, labelled A-H, in order of preference (starting with their favourite). The results are given in the table below:

Taster 1   D   C   G   B   A   E   F   H
Taster 2   C   D   B   G   H   E   A   F

Unsurprisingly the two tasters do not agree exactly. However, there is clearly some consensus between them on the more pleasant flavours (C and D), and the least pleasant (e.g. F). It would be useful to measure how well the tasters agree. This is where we use the idea of "ranking" the flavours for each taster, according to where each flavour appears in their list.

Example

Flavour   Rank for Taster 1, x   Rank for Taster 2, y
A         5                      7
B         4                      3
C         2                      1
D         1                      2
E         6                      6
F         7                      8
G         3                      4
H         8                      5

If we plotted the scatter diagram for this information, we would see some positive correlation. It would seem reasonable to measure the degree of agreement by calculating the PMCC for the two sets of ranks. However, to emphasise that x and y are ranks rather than continuous variables, the coefficient is given a new symbol, $r_s$. We can calculate $r_s$ using the same formula as for the PMCC. However, there is a simpler formula...

Example

Flavour   Rank for Taster 1 (x)   Rank for Taster 2 (y)   di    di²
A         5                       7                       -2    4
B         4                       3                       1     1
C         2                       1                       1     1
D         1                       2                       -1    1
E         6                       6                       0     0
F         7                       8                       -1    1
G         3                       4                       -1    1
H         8                       5                       3     9

Spearman's Rank Correlation Coefficient is:

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where di = xi - yi for the ith item.

Note: Σdi = 0 always!

Here Σdi² = 18 and n = 8, so:

$$r_s = 1 - \frac{6 \times 18}{8 \times 63} \approx 0.786$$
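The ranking and the simpler formula can be sketched as follows. One loud assumption: Taster 2's preference order is not legible in this transcript, so the order `CDBGHEAF` used below is reconstructed from the surviving ranks and differences in the table and should be treated as an inference. The variable names are illustrative.

```python
# Preference orders, favourite first. Taster 1's order is given in the example;
# Taster 2's order is an ASSUMPTION reconstructed from the surviving rank table.
taster_1 = "DCGBAEFH"
taster_2 = "CDBGHEAF"

# Rank of each flavour = its position in the taster's list (1 = favourite).
rank_1 = {flavour: position + 1 for position, flavour in enumerate(taster_1)}
rank_2 = {flavour: position + 1 for position, flavour in enumerate(taster_2)}

flavours = sorted(rank_1)                           # A, B, ..., H
d = [rank_1[f] - rank_2[f] for f in flavours]       # d_i = x_i - y_i
sum_d = sum(d)                                      # always 0 when both are rankings of the same items
sum_d_squared = sum(di * di for di in d)

n = len(flavours)
r_s = 1 - 6 * sum_d_squared / (n * (n * n - 1))

print(d, sum_d, sum_d_squared, round(r_s, 3))
```

Under that assumed order the sketch gives Σd² = 18 and a coefficient of about 0.786, a fairly strong level of agreement between the tasters.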

Interpreting $r_s$

When interpreting Spearman's Rank Correlation Coefficient, it is important to remember that this value is the PMCC of the ranks. If the rank orders for the two tasters had been identical, then the points on the scatter diagram would fall exactly on the line y = x, and $r_s$ would equal 1. Similarly, if one rank order was the exact reverse of the other, then all the points would fall on a straight line of gradient -1 (the line x + y = n + 1) and $r_s$ = -1. If there was little agreement between the two rank orders, $r_s$ would be close to 0.
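The two extreme cases can be checked with a few lines of Python. This is a minimal sketch for eight items with no tied ranks; the function name `spearman` is an assumption made for the example.

```python
def spearman(ranks_x, ranks_y):
    """Spearman's coefficient from two rankings of the same items (no ties)."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


ranks = list(range(1, 9))        # 1, 2, ..., 8
reversed_ranks = ranks[::-1]     # 8, 7, ..., 1

identical = spearman(ranks, ranks)            # points lie on y = x
opposite = spearman(ranks, reversed_ranks)    # points lie on x + y = 9

print(identical, opposite)  # 1.0 -1.0
```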

Comparing PMCC and Spearman's

For some sets of data it would be reasonable to calculate both r and $r_s$, for example the table below, for which r = 0.817 and $r_s$ = 0.745.

Student     A    B    C    D    E    F    G    H    I    J
Pure        42   21   25   32   34   27   23   40   20   16
Pure Rank   1    8    6    4    3    5    7    2    9    10

Stats: 41 36 29 35 24 22 47 30
Stats Rank:

Comparing PMCC and Spearman's

These two values are not the same, because they are measuring different things. r measures how close the points on the scatter diagram are to a straight line, which is the strength of the linear relationship between x and y, whereas $r_s$ measures the tendency for y to increase as x increases, not necessarily in a linear way.

NOTE: For negative correlation, $r_s$ measures the tendency for y to decrease as x increases.
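The difference can be seen on a small made-up data set in which y always increases with x but not along a straight line. The data (y = x³) and the function names below are hypothetical and chosen only to illustrate the point. Here $r_s$ is computed as the PMCC of the ranks.

```python
import math


def s(us, vs):
    """Booklet formula: sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """Product Moment Correlation Coefficient: Sxy / sqrt(Sxx * Syy)."""
    return s(xs, ys) / math.sqrt(s(xs, xs) * s(ys, ys))


def to_ranks(values):
    """Rank 1 for the largest value; assumes there are no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


# Hypothetical data: y rises with x every time, but along a curve.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

r = pmcc(xs, ys)                          # strength of the linear relationship
r_s = pmcc(to_ranks(xs), to_ranks(ys))    # PMCC of the ranks

print(round(r, 3), round(r_s, 3))
```

Because the rank orders agree perfectly, $r_s$ comes out as 1, while r is below 1 because the points do not lie on a straight line.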