Published by Amos Davidson. Modified over 9 years ago.

17-1 Copyright 2010 McGraw-Hill Australia Pty Ltd PowerPoint slides to accompany Croucher, Introductory Mathematics and Statistics, 5e
Chapter 17 Correlation
Introductory Mathematics & Statistics

Learning Objectives
Understand correlation analysis and relationships between variables
Draw and interpret a scatter diagram
Understand and calculate the product-moment correlation coefficient
Understand and calculate the rank correlation coefficient
Recognise spurious correlation
Test a correlation coefficient for significance

17.1 Introduction
The consideration of whether there is any relationship or association between two variables is called correlation analysis
The correlation coefficient is the index which defines the strength of association between two variables
There are many instances where you may want to test whether there is a relationship between two variables
–Examples
   the number of CDs sold in a district and the number of teenagers who live in that district
   the number of breakdowns of a certain type of machine and the age of the machine
   the level of education obtained by an individual and his or her income in later working life
If there is a relationship between any two variables, it may be possible to predict the value of one of the variables from the value of the other

17.1 Introduction (cont…)
Dependent and independent variables
–To establish whether there is a relationship between two variables, an appropriate random sample must be taken and a measurement recorded of each of the two variables
–Such data are said to be bivariate data, since they consist of two variables
–Data may be written as ordered pairs, where they are expressed in a specific order for each individual, i.e. (first variable value, second variable value)

17.1 Introduction (cont…)
–To predict the value of one variable from the value of the other (if a relationship exists), it is useful to label the variables as follows
   The dependent variable is the one whose value is to be predicted. It is usually denoted by the letter y
   The independent variable is the one whose value is used to make the prediction. It is usually denoted by the letter x
–Written in the form (x, y), the ordered pairs resemble points on a graph
–Plotting them makes it possible to get a ‘feel’ for what the actual relationship between the variables may be, even before any calculations are undertaken
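
As a minimal sketch of the ordered-pair layout, bivariate data can be held as a list of (x, y) tuples and then split into the independent and dependent variables. The numbers are hypothetical (machine age in years, breakdowns per year) and do not come from the slides.

```python
# Hypothetical bivariate sample: (machine age in years, breakdowns per year).
pairs = [(1, 0), (3, 2), (5, 3), (7, 6), (9, 7)]

# Split the ordered pairs into the independent (x) and dependent (y) variables.
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

print(xs)  # [1, 3, 5, 7, 9]
print(ys)  # [0, 2, 3, 6, 7]
```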

17.2 Scatter diagrams
A scatter diagram or scatter plot is a display in which ordered pairs of measurements are plotted on a system of coordinate axes
The independent variable (x) is represented on the horizontal axis
The dependent variable (y) is represented on the vertical axis
The points representing the data are usually plotted as either dots or crosses

17.2 Scatter diagrams (cont…)
Example
–An insurance company manager is concerned about the health of female adults, since the company is prepared to give a reduced premium rate to those who have a certain level of fitness. In particular, he would like to investigate how their height is related to their weight, with a view to possibly using these measurements as a fitness criterion.
–To this end, he selects a random sample of 12 adult females and measures both their height (in cm) and weight (in kg). The results are:
Number: 1 2 3 4 5 6 7 8 9 10 11 12
Height (cm): 167 168 165 [...] 160 156 169 166 162 158 168
Weight (kg): 71.8 72.0 69.3 70.0 64.2 58.1 74.0 70.0 59.3 59.0 67.1 64.0

17.2 Scatter diagrams (cont…)
Solution
[Figure: scatter diagram of weight (kg) against height (cm)]

17.3 The Pearson product-moment correlation coefficient
The numerical measure of the degree of association between two variables is given by the product-moment correlation coefficient
This index provides a quantitative measure of the extent to which the two variables are associated
The value of the correlation coefficient is calculated from the bivariate data by means of a formula that involves the values of the data points
The value of the correlation coefficient calculated from a sample is denoted by the letter r
The value of the correlation coefficient calculated from a population is denoted by the Greek letter ρ

17.3 The Pearson product-moment correlation coefficient (cont…)
The value of r
Where

17.3 The Pearson product-moment correlation coefficient (cont…)
S_x = the standard deviation of the x-variable
S_y = the standard deviation of the y-variable
n = the number of pairs of observations
Alternative formulae for the calculation of r
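
For reference, a standard way of writing r that is consistent with the symbols defined above is sketched below, together with the usual computational alternative built from column totals. It assumes that S_x and S_y are sample standard deviations with the n – 1 divisor; the exact form printed on the original slides is not known here, so treat the layout as an assumption.

```latex
\begin{align*}
r &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, S_x S_y},
\qquad
S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}},
\quad
S_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \\[6pt]
r &= \frac{n \sum xy - \sum x \sum y}
          {\sqrt{\left[\, n \sum x^2 - \left(\sum x\right)^2 \right]
                 \left[\, n \sum y^2 - \left(\sum y\right)^2 \right]}}
\end{align*}
```

The two forms are algebraically equivalent; the second avoids computing the means and deviations explicitly, which is convenient for hand calculation.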

17.3 The Pearson product-moment correlation coefficient (cont…)
Note:
–The value of r must always lie between –1 and +1 (both inclusive)
–If r = +1, the two variables have perfect positive correlation. This means that on a scatter diagram, the points all lie on a straight line that has a positive slope
–If r = –1, the two variables have perfect negative correlation. This means that on a scatter diagram, the points all lie on a straight line that has a negative slope
–If the two variables are positively correlated, but not perfectly so, the coefficient lies between 0 and 1
–If the two variables are negatively correlated, but not perfectly so, the coefficient lies between –1 and 0
–If the two variables have no overall upward or downward trend whatsoever, the coefficient is 0
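
The properties listed above can be checked numerically. The sketch below is one possible implementation of the sample correlation coefficient using deviations from the means; the function name `pearson_r` and all data values are hypothetical choices for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample product-moment correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                      # hypothetical x values
r_up = pearson_r(xs, [3, 5, 7, 9, 11])    # points on the line y = 2x + 1
r_down = pearson_r(xs, [9, 7, 5, 3, 1])   # points on the line y = 11 - 2x
r_flat = pearson_r(xs, [2, 1, 3, 1, 2])   # no overall upward or downward trend

print(round(r_up, 6), round(r_down, 6))   # 1.0 -1.0
print(abs(r_flat) < 1e-12)                # True
```

Points on a line with positive slope give r = +1, points on a line with negative slope give r = –1, and the symmetric third data set gives a coefficient of 0 up to floating-point rounding.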

17.3 The Pearson product-moment correlation coefficient (cont…)
Positive and negative correlation
–It can be seen that, if correlation exists, it can be in one of two directions: positive or negative
–If two variables x and y are positively correlated, this means that:
   large values of x are associated with large values of y, and
   small values of x are associated with small values of y
–If two variables x and y are negatively correlated, this means that:
   large values of x are associated with small values of y, and
   small values of x are associated with large values of y

17.3 The Pearson product-moment correlation coefficient (cont…)
Examples of scatter diagrams: positive correlation

17.3 The Pearson product-moment correlation coefficient (cont…)
Examples of scatter diagrams: negative correlation

17.3 The Pearson product-moment correlation coefficient (cont…)
Calculation of r
Example
Calculate the correlation coefficient for the data in the previous example
Solution
In this case, we denote height by x and weight by y
n = 12

17.3 The Pearson product-moment correlation coefficient (cont…)
Solution (cont…)
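
The height row of the example table is incomplete in this transcript, so the sketch below uses a small hypothetical data set rather than the insurance data. It follows the usual computational form of r, built from the totals Σx, Σy, Σxy, Σx² and Σy²; that this mirrors the layout of the original worked solution is an assumption.

```python
import math

# Hypothetical paired observations (NOT the insurance data from the example).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Column totals, as they would appear at the foot of a hand-calculation table.
n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 5 15 20 66 55 86
print(round(r, 4))                              # 0.7746
```

For these values the numerator is 30 and the denominator is √1500, giving a moderately strong positive correlation.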

17.4 The Spearman rank correlation coefficient
An alternative measure of the degree of association between two variables is the rank correlation coefficient
This coefficient does not strictly measure the degree of association between the actual observations, but rather the association between the ranks of the observations
r_s = 1 – 6Σd² / [n(n² – 1)]
Where:
d = difference between corresponding pairs of rankings
n = number of pairs of observations

17.4 The Spearman rank correlation coefficient (cont…)
Being a correlation coefficient, r_s has the following properties:
–r_s = +1 for perfect positive correlation of the ranks, that is, when the x-rank = the y-rank in each case, and hence Σd² = 0
–r_s = –1 for perfect negative correlation of the ranks, that is, when they run in precisely opposite order to each other, and hence Σd² = n(n² – 1)/3
–All other values of r_s lie between –1 and +1
The subscript s in r_s stands for ‘Spearman’ and is used to distinguish it from the Pearson product-moment correlation coefficient
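
A minimal sketch of the rank calculation follows, using the standard formula r_s = 1 – 6Σd² / [n(n² – 1)]. It assumes there are no tied values (ties need an adjusted procedure that is not covered here); the helper names `ranks` and `spearman_rs` and all scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; this sketch assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman rank correlation from the squared rank differences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

xs = [86, 72, 91, 65, 78]                          # hypothetical scores
rs_same = spearman_rs(xs, [40, 31, 47, 25, 38])    # identical rank order
rs_opposite = spearman_rs(xs, [25, 40, 20, 47, 38])  # exactly reversed ranks
rs_mixed = spearman_rs(xs, [38, 31, 47, 25, 40])   # two adjacent ranks swapped

print(rs_same, rs_opposite)   # 1.0 -1.0
print(round(rs_mixed, 4))     # 0.9
```

In the reversed case Σd² = 40, which equals n(n² – 1)/3 for n = 5, so the formula returns exactly –1.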

17.5 Spurious correlation
If two variables are significantly correlated, this does not imply that one must be the cause of the other
The degree of association is not directly proportional to the magnitude of the correlation coefficient
The correlation coefficient is subject to variations in sampling
Correlation between two variables that is really induced by other external variables is referred to as spurious correlation or false correlation

17.6 Interpretation of the correlation coefficient
Testing a value of r
–The actual test involves the calculation of a t-statistic, which can be found from the values of r and n
–The steps are:
   1. Assume that the two variables are uncorrelated
   2. Calculate the correlation coefficient (r)
   3. Calculate the value of the t-statistic: t = r√(n – 2) / √(1 – r²)
   4. Calculate the value of ν, where ν = n – 2
   5. Use Table 7 to find the critical value. This is the value in the 0.05 column
   If |t| < critical value, there is no significant correlation between the two variables
   If |t| > critical value, there is significant correlation between the two variables
   The risk of wrongly concluding that a correlation exists when there is none is 5%
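
The steps above can be sketched in a few lines, using the standard statistic for this test, t = r√((n – 2)/(1 – r²)), with n – 2 degrees of freedom. The values of r and n are hypothetical. Table 7 is not reproduced in this transcript, so the critical value 2.228 is taken from standard t tables (two-tailed 5% level, 10 degrees of freedom) on the assumption that this is what the 0.05 column of Table 7 lists.

```python
import math

def t_statistic(r, n):
    """t-statistic for testing whether a sample correlation r differs from 0."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * math.sqrt((n - 2) / (1 - r * r))

def is_significant(r, n, critical_value):
    """Step 5: compare |t| with the tabulated critical value."""
    return abs(t_statistic(r, n)) > critical_value

r, n = 0.6, 12            # hypothetical sample result
df = n - 2                # degrees of freedom: 10
critical = 2.228          # assumed Table 7 entry: two-tailed 5%, 10 df
t = t_statistic(r, n)

print(df, round(t, 4))                  # 10 2.3717
print(is_significant(r, n, critical))   # True
```

Here |t| exceeds the critical value, so the assumption that the variables are uncorrelated would be rejected at the 5% level. Passing the critical value in as a parameter keeps the sketch independent of any particular table.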

17.6 Interpretation of the correlation coefficient (cont…)
Testing a value of r_s for significance
–The value of r_s is tested for significance using a different procedure from that for r. It is outlined below:
   1. Assume that the two sets of rankings are uncorrelated
   2. Find the critical value of r_s for the given value of n, using Table 8
   3. If |r_s| > critical value, reject the assumption in Step 1; a significant relationship does exist between the two sets of rankings
   4. If |r_s| < critical value, do not reject the assumption in Step 1; a significant relationship does not exist between the two sets of rankings

17.7 Unusual claimed correlations
There is no shortage of research in which some very odd relations are claimed to be found
Some include the following
   1. Age at which a mother gives birth and life expectancy of the child (the chance of living to 100 is doubled if the child was born to a woman aged under 25 years)
   2. Attractiveness of a couple and the sex of their first child (physically attractive couples are 36% more likely than an unattractive couple to produce a girl as their first child)
   3. Height of children and mental development (short children perform more poorly on intelligence tests than tall ones)
   4. Blood flow to the heart and type of movie watched (watching comedy boosts blood flow; sad or distressing movies lower it)

Summary
We understood correlation analysis and relationships between variables
We drew and interpreted a scatter diagram
We understood and calculated the product-moment correlation coefficient
We understood and calculated the rank correlation coefficient
We recognised spurious correlation
We tested a correlation coefficient for significance