Copyright © 2010 McGraw-Hill Australia Pty Ltd. PowerPoint slides to accompany Croucher, Introductory Mathematics and Statistics, 5e. Chapter 17: Correlation.


1 Chapter 17 Correlation
Introductory Mathematics & Statistics

2 Learning Objectives
Understand correlation analysis and relationships between variables
Draw and interpret a scatter diagram
Understand and calculate the product-moment correlation coefficient
Understand and calculate the rank correlation coefficient
Recognise spurious correlation
Test a correlation coefficient for significance

3 17.1 Introduction
The consideration of whether there is any relationship or association between two variables is called correlation analysis
The correlation coefficient is the index which defines the strength of association between two variables
There are many instances where you may want to test whether there is a relationship between two variables
– Examples
  • the number of CDs sold in a district and the number of teenagers who live in that district
  • the number of breakdowns of a certain type of machine and the age of the machine
  • the level of education obtained by an individual and his or her income in later working life
If there is a relationship between any two variables, it may be possible to predict the value of one of the variables from the value of the other

4 17.1 Introduction (cont…)
Dependent and independent variables
– To establish whether there is a relationship between two variables, an appropriate random sample must be taken and a measurement recorded of each of the two variables
– Such data are said to be bivariate data, since they consist of two variables
– Data may be written as ordered pairs, where they are expressed in a specific order for each individual, i.e. (first variable value, second variable value)

5 17.1 Introduction (cont…)
– To predict the value of one variable from the value of the other (if a relationship exists), it is useful to label the variables as follows:
  • The dependent variable is the one whose value is to be predicted. It is usually denoted by the letter y
  • The independent variable is the one whose value is used to make the prediction. It is usually denoted by the letter x
– Written in the form (x, y), the ordered pairs resemble points on a graph
– By plotting them, it is possible to get a ‘feel’ for what the actual relationship between the variables may be, even before any calculations are undertaken

6 17.2 Scatter diagrams
A scatter diagram or scatter plot is a display in which ordered pairs of measurements are plotted on a coordinate axes system
The independent variable (x) is represented on the horizontal axis
The dependent variable (y) is represented on the vertical axis
The points representing the data are usually plotted either by dots or crosses

7 17.2 Scatter diagrams (cont…)
Example
– An insurance company manager is concerned about the health of female adults, since the company is prepared to give a reduced premium rate to those who have a certain level of fitness. In particular, he would like to investigate how their height is related to their weight, with a view to possibly using these measurements as a fitness criterion.
– To this end, he selects a random sample of 12 adult females and measures both their height (in cm) and weight (in kg). The results are:
Number:      1 2 3 4 5 6 7 8 9 10 11 12
Height (cm): 167 168 165 … 160 156 169 166 162 158 168
Weight (kg): 71.8 72.0 69.3 70.0 64.2 58.1 74.0 70.0 59.3 59.0 67.1 64.0

8 17.2 Scatter diagrams (cont…)
Solution
[Scatter diagram of weight (kg) against height (cm)]

9 17.3 The Pearson product-moment correlation coefficient
The numerical measure of the degree of association between two variables is given by the product-moment correlation coefficient
This index provides a quantitative measure of the extent to which the two variables are associated
The value of the correlation coefficient is calculated from the bivariate data by means of a formula that involves the values of the data points
The value of the correlation coefficient calculated from a sample is denoted by the letter r
The value of the correlation coefficient calculated from a population is denoted by the Greek letter ρ

10 17.3 The Pearson product-moment correlation coefficient (cont…)
The value of r
Where

11 17.3 The Pearson product-moment correlation coefficient (cont…)
S_x = the standard deviation of the x-variable
S_y = the standard deviation of the y-variable
n = the number of pairs of observations
Alternative formulae for the calculation of r
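The formulae on these two slides are not reproduced in the transcript. As a hedged reconstruction, a standard statement of the product-moment coefficient that fits the definitions of S_x, S_y and n is given below. It assumes that S_x and S_y are sample standard deviations (divisor n − 1) and introduces \bar{x} and \bar{y} for the sample means, symbols that do not appear in the transcript. The second form is the usual computational alternative and may not be the exact one shown on the slide.

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, S_x S_y}
\qquad\text{or equivalently}\qquad
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[\,n\sum x^2 - \left(\sum x\right)^2\right]\left[\,n\sum y^2 - \left(\sum y\right)^2\right]}}
```

The two forms agree because \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{1}{n}\sum x \sum y and (n - 1) S_x^2 = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2, and likewise for y.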

12 17.3 The Pearson product-moment correlation coefficient (cont…)
Note:
– The value of r must always lie between –1 and +1 (both inclusive)
– If r = +1, the two variables have perfect positive correlation. This means that on a scatter diagram, the points all lie on a straight line that has a positive slope
– If r = –1, the two variables have perfect negative correlation. This means that on a scatter diagram, the points all lie on a straight line that has a negative slope
– If the two variables are positively correlated, but not perfectly so, the coefficient lies between 0 and 1
– If the two variables are negatively correlated, but not perfectly so, the coefficient lies between –1 and 0
– If the two variables have no overall upward or downward trend whatsoever, the coefficient is 0
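The limiting cases in these notes can be checked numerically. The sketch below is illustrative only: the function name `pearson_r` and the small data sets are hypothetical, and it assumes the sample standard deviation (divisor n − 1) in the definition of r.

```python
import statistics


def pearson_r(xs, ys):
    """Sample correlation: sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    s_y = statistics.stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical data chosen to hit the three special cases.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a line with positive slope
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # points on a line with negative slope
r_flat = pearson_r(xs, [2, 1, 0, 1, 2])           # symmetric "V": no overall trend

print(r_up, r_down, r_flat)
```

Up to floating-point rounding, the three values are +1, –1 and 0. The last case also shows that r = 0 rules out an overall linear trend, not every kind of relationship.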

13 17.3 The Pearson product-moment correlation coefficient (cont…)
Positive and negative correlation
– It can be seen that, if correlation exists, it can be in one of two directions: positive or negative
– If two variables x and y are positively correlated, this means that:
  • large values of x are associated with large values of y, and
  • small values of x are associated with small values of y
– If two variables x and y are negatively correlated, this means that:
  • large values of x are associated with small values of y, and
  • small values of x are associated with large values of y

14 17.3 The Pearson product-moment correlation coefficient (cont…)
Positive correlation
[Examples of scatter diagrams]

15 17.3 The Pearson product-moment correlation coefficient (cont…)
Negative correlation
[Examples of scatter diagrams]

16 17.3 The Pearson product-moment correlation coefficient (cont…)
Calculation of r
Example
Calculate the correlation coefficient for the data in the previous example
Solution
In this case, we denote height by x and weight by y
n = 12

17 17.3 The Pearson product-moment correlation coefficient (cont…)
Solution (cont…)
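The worked solution is not reproduced in the transcript, and the height data as transcribed are incomplete, so the slide's own figures cannot be recomputed here. The sketch below instead shows the usual hand-calculation route (a table of sums fed into the computational formula) on a small hypothetical data set. The function name `pearson_r_from_sums` and the five data pairs are assumptions made for illustration.

```python
import math


def pearson_r_from_sums(pairs):
    """Computational formula:
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
    Returns r together with the intermediate sums."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    sums = {"n": n, "sum_x": sum_x, "sum_y": sum_y,
            "sum_xy": sum_xy, "sum_x2": sum_x2, "sum_y2": sum_y2}
    return numerator / denominator, sums


# Hypothetical (x, y) pairs, NOT the height and weight sample from the slides.
pairs = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
r, sums = pearson_r_from_sums(pairs)

print(sums)         # the column totals a hand calculation would tabulate
print(round(r, 4))  # moderately strong positive correlation
```

For these pairs the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30 / √1500, which is about 0.77. Once all 12 height values are available, the same function applies with x as height and y as weight.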

18 17.4 The Spearman rank correlation coefficient
An alternative measure of the degree of association between two variables is the rank correlation coefficient
This coefficient does not strictly measure the degree of association between the actual observations, but rather the association between the ranks of the observations
r_s = 1 − 6Σd² / [n(n² − 1)]
Where:
d = difference between corresponding pairs of rankings
n = number of pairs of observations

19 17.4 The Spearman rank correlation coefficient (cont…)
Being a correlation coefficient, r_s has the following properties:
– r_s = +1 for perfect positive correlation of the ranks, that is when the x-rank = the y-rank in each case, and hence Σd² = 0
– r_s = –1 for perfect negative correlation of the ranks, that is when they run in precisely opposite order to each other, and hence Σd² takes its largest possible value
– All other values of r_s lie between –1 and +1
The subscript s in r_s stands for ‘Spearman’ and is used to distinguish it from the Pearson product-moment correlation coefficient
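A minimal sketch of the rank calculation follows, using the standard formula r_s = 1 − 6Σd² / [n(n² − 1)]. The helper names `ranks` and `spearman_rs` and the data are hypothetical, and the code assumes there are no tied values, since ties need average ranks and a corrected formula.

```python
def ranks(values):
    """Rank each value, 1 = smallest. Assumes no ties (raises if any are found)."""
    ordered = sorted(values)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tied values are not handled in this sketch")
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data illustrating the three properties listed above.
xs = [10, 20, 30, 40, 50]
rs_same = spearman_rs(xs, [1, 2, 3, 4, 5])      # identical rank order, sum(d^2) = 0
rs_opposite = spearman_rs(xs, [5, 4, 3, 2, 1])  # exactly reversed rank order
rs_mixed = spearman_rs(xs, [1, 3, 2, 5, 4])     # two neighbouring swaps

print(rs_same, rs_opposite, rs_mixed)
```

With n = 5 the reversed ordering gives Σd² = 40, so r_s = 1 − 240/120 = –1. The mixed ordering gives Σd² = 4 and r_s = 0.8.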

20 17.5 Spurious correlation
If two variables are significantly correlated, this does not imply that one must be the cause of the other
The degree of association is not directly proportional to the magnitude of the correlation coefficient
The correlation coefficient is subject to variations in sampling
Correlation between two variables that is really induced by other external variables is referred to as spurious correlation or false correlation

21 17.6 Interpretation of the correlation coefficient
Testing a value of r
– The actual test involves the calculation of a t-statistic, which can be found from the values of r and n
– The steps are:
  1. Assume that the two variables are uncorrelated
  2. Calculate the correlation coefficient (r)
  3. Calculate the value of the t-statistic: t = r √(n – 2) / √(1 – r²)
  4. Calculate the value of ν (the degrees of freedom), where ν = n – 2
  5. Use Table 7 to find the critical value. This is the value in the 0.05 column
If |t| < critical value, conclude that there is no significant correlation between the two variables
If |t| > critical value, conclude that there is a significant correlation between the two variables
The risk of concluding that a correlation exists when the two variables are in fact uncorrelated is 5%
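The steps above can be sketched as follows. The function names are hypothetical. The critical value is passed in by the caller because Table 7 is not part of this transcript. The figure 2.228 used in the example is the standard two-tailed 5% critical value of t for 10 degrees of freedom, and whether it matches the 0.05 column of Table 7 is an assumption.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def is_significant(r, n, critical_value):
    """Reject the 'uncorrelated' assumption when |t| exceeds the table value."""
    return abs(t_statistic(r, n)) > critical_value


# Hypothetical example: r = 0.6 from n = 12 pairs, so there are 10 degrees of freedom.
CRITICAL_T_10_DF = 2.228  # assumed to correspond to the table's 0.05 column
t_example = t_statistic(0.6, 12)
significant_06 = is_significant(0.6, 12, CRITICAL_T_10_DF)
significant_05 = is_significant(0.5, 12, CRITICAL_T_10_DF)

print(round(t_example, 3), significant_06, significant_05)
```

Here t is about 2.37 for r = 0.6, which exceeds 2.228, while r = 0.5 gives t of about 1.83, which does not. With only 12 pairs, a coefficient of 0.5 would therefore not be judged significant at the 5% level.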

22 17.6 Interpretation of the correlation coefficient (cont…)
Testing a value of r_s for significance
– The value of r_s is tested for significance using a different procedure from that for r. It is outlined below:
  1. Assume that the two sets of rankings are uncorrelated
  2. Find the critical value of r_s for the given value of n, using Table 8
  3. If |r_s| > critical value, reject the assumption in Step 1; a significant relationship does exist between the two sets of rankings
  4. If |r_s| < critical value, accept the assumption in Step 1; a significant relationship does not exist between the two sets of rankings

23 17.7 Unusual claimed correlations
There is no shortage of research in which some very odd relations are claimed to be found
Some include the following:
  1. Age at which a mother gives birth and life expectancy of the child (the chance of living to 100 is doubled if the child was born to a woman aged under 25 years)
  2. Attractiveness of a couple and the sex of their first child (physically attractive couples are 36% more likely than an unattractive couple to produce a girl as their first child)
  3. Height of children and mental development (short children perform more poorly on intelligence tests than tall ones)
  4. Blood flow to the heart and type of movie watched (watching comedy boosts blood flow; sad or distressing movies lower it)

24 Summary
We understood correlation analysis and relationships between variables
We drew and interpreted a scatter diagram
We understood and calculated the product-moment correlation coefficient
We understood and calculated the rank correlation coefficient
We recognised spurious correlation
We tested a correlation coefficient for significance

