Published by Noreen Parsons. Modified over 8 years ago.
1
Copyright © 2009 Pearson Education, Inc.
7.1 Seeking Correlation
LEARNING GOAL: Be able to define correlation, recognize positive and negative correlations on scatter diagrams, and understand the correlation coefficient as a measure of the strength of a correlation.
2
Definition: A correlation exists between two variables when higher values of one variable consistently go with higher values of another variable, or when higher values of one variable consistently go with lower values of another variable.
3
Here are a few examples of correlations:
There is a correlation between the variables amount of smoking and likelihood of lung cancer; that is, heavier smokers are more likely to get lung cancer.
There is a correlation between the variables height and weight for people; that is, taller people tend to weigh more than shorter people.
There is a correlation between the variables demand for apples and price of apples; that is, demand tends to decrease as price increases.
There is a correlation between practice time and skill among piano players; that is, those who practice more tend to be more skilled.
4
TIME OUT TO THINK
Suppose there really were a gene that made people prone to both smoking and lung cancer. Explain why we would still find a strong correlation between smoking and lung cancer in that case, but would not be able to say that smoking causes lung cancer.
5
Scatter Diagrams
Definition: A scatter diagram (or scatterplot) is a graph in which each point represents the values of two variables.
6
Table 7.1 (diamond weight, color, and price data)
7
The following procedure describes how to make the scatter diagram in Figure 7.1.
1. We assign one variable to each axis and label the axis with values that comfortably fit all the data. Sometimes the axis selection is arbitrary, but if we suspect that one variable depends on the other, then we plot the explanatory variable on the horizontal axis and the response variable on the vertical axis. In this case, we expect a diamond's price to depend at least in part on its weight; we therefore say that weight is the explanatory variable (because it helps explain the price) and price is the response variable (because it responds to changes in the explanatory variable).
Figure 7.1
8
1. (cont.) We choose a range of 0 to 2.5 carats for the weight axis and $0 to $16,000 for the price axis.
2. For each diamond in Table 7.1, we plot a single point at the horizontal position corresponding to its weight and the vertical position corresponding to its price. For example, the point for Diamond 10 goes at a position of 1.11 carats on the horizontal axis and $3,670 on the vertical axis. The dashed lines in Figure 7.1 show how we locate this point.
3. (Optional) We can label some (or all) of the data points, as is done for Diamonds 10, 16, and 19 in Figure 7.1.
Figure 7.1
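The three steps can also be sketched in code. This is a minimal, text-only illustration, not how Figure 7.1 was drawn: the helper name `scatter_diagram`, its parameters, and the character-grid rendering are hypothetical choices. The only data point used is Diamond 10 (1.11 carats, $3,670) from the text, plotted on the axis ranges chosen in step 1; the rest of Table 7.1 is not reproduced here.

```python
def scatter_diagram(points, x_range, y_range, width=50, height=16, labels=None):
    """Return a text scatter diagram (hypothetical helper for illustration).

    points:  list of (x, y) pairs; x = explanatory variable (horizontal axis),
             y = response variable (vertical axis)
    x_range: (min, max) for the horizontal axis, chosen to fit all the data
    y_range: (min, max) for the vertical axis, chosen to fit all the data
    labels:  optional dict mapping a point's index to a one-character label
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[" "] * width for _ in range(height)]
    for i, (x, y) in enumerate(points):
        # Step 1: the axis ranges must fit every data point.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            raise ValueError(f"point {i} {(x, y)} lies outside the axis ranges")
        # Step 2: horizontal position from x, vertical position from y.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Step 3 (optional): use a label character instead of the default "*".
        mark = labels.get(i, "*") if labels else "*"
        grid[height - 1 - row][col] = mark  # flip so larger y values sit higher up
    lines = ["|" + "".join(cells) for cells in grid]  # "|" marks the vertical axis
    lines.append("+" + "-" * width)                   # bottom line is the horizontal axis
    return "\n".join(lines)


# Diamond 10 from the text: 1.11 carats and $3,670, on the axis ranges from step 1.
print(scatter_diagram([(1.11, 3670)], x_range=(0, 2.5), y_range=(0, 16000)))
```

Placing each value in proportion to where it falls within its axis range is the same idea as locating a point with the dashed lines in Figure 7.1, and the range check enforces step 1's requirement that the axes fit all the data.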
9
TIME OUT TO THINK
Identify the points in Figure 7.1 (previous slide) that represent Diamonds 3, 7, and 23.
10
EXAMPLE 1 Color and Price
Using the data in Table 7.1 (slide 6), create a scatter diagram to look for a correlation between a diamond's color and price. Comment on the correlation.
Solution: We expect price to depend on color, so we plot the explanatory variable, color, on the horizontal axis and the response variable, price, on the vertical axis in Figure 7.2. (You should check a few of the points against the data in Table 7.1.) The points appear much more scattered than in Figure 7.1. Nevertheless, you may notice a weak trend diagonally downward from the upper left toward the lower right.
Figure 7.2
11
EXAMPLE 1 Color and Price (cont.)
Solution (cont.): This trend represents a weak negative correlation in which diamonds with more yellow color (higher numbers for color) are less expensive. This trend is consistent with what we would expect, because colorless diamonds appear to sparkle more and are generally considered more desirable.
Figure 7.2
12
TIME OUT TO THINK
Thanks to a large bonus at work, you have a budget of $6,000 for a diamond ring. A dealer offers you the following two choices for that price. One diamond weighs 1.20 carats and has color = 4. The other weighs 1.18 carats and has color = 3. Assuming all other characteristics of the diamonds are equal, which would you choose? Why?
13
Types of Correlation
Figure 7.3 Types of correlation seen on scatter diagrams. (Detailed descriptions of these graphs appear in the next few slides.)
14
Parts a to c of Figure 7.3 show positive correlations, in which the values of y tend to increase with increasing values of x. The correlation becomes stronger as we proceed from a to c. In fact, c shows a perfect positive correlation, in which all the points fall along an ascending straight line.
Figure 7.3(a-c) Types of correlation seen on scatter diagrams.
15
Parts d to f of Figure 7.3 show negative correlations, in which the values of y tend to decrease with increasing values of x. The correlation becomes stronger as we proceed from d to f. In fact, f shows a perfect negative correlation, in which all the points fall along a descending straight line.
Figure 7.3(d-f) Types of correlation seen on scatter diagrams.
16
Part g of Figure 7.3 shows no correlation between x and y. In other words, values of x do not appear to be linked to values of y in any way.
Figure 7.3(g) Types of correlation seen on scatter diagrams.
17
Part h of Figure 7.3 shows a nonlinear relationship, in which x and y appear to be related but the relationship does not correspond to a straight line. (Linear means along a straight line, and nonlinear means not along a straight line.)
Figure 7.3(h) Types of correlation seen on scatter diagrams.
18
Types of Correlation
Positive correlation: Both variables tend to increase (or decrease) together.
Negative correlation: The two variables tend to change in opposite directions, with one increasing while the other decreases.
No correlation: There is no apparent (linear) relationship between the two variables.
Nonlinear relationship: The two variables are related, but the relationship results in a scatter diagram that does not follow a straight-line pattern.
19
Measuring the Strength of a Correlation
Statisticians measure the strength of a correlation with a number called the correlation coefficient, represented by the letter r.
20
Properties of the Correlation Coefficient, r
The correlation coefficient, r, is a measure of the strength of a correlation. Its value can range only from -1 to 1.
If there is no correlation, the points do not follow any ascending or descending straight-line pattern, and the value of r is close to 0.
If there is a positive correlation, the correlation coefficient is positive (0 < r ≤ 1): both variables increase together. A perfect positive correlation (in which all the points on a scatter diagram lie on an ascending straight line) has a correlation coefficient r = 1. Values of r close to 1 mean a strong positive correlation, and positive values closer to 0 mean a weak positive correlation.
21
Properties of the Correlation Coefficient, r (cont.)
If there is a negative correlation, the correlation coefficient is negative (-1 ≤ r < 0): when one variable increases, the other decreases. A perfect negative correlation (in which all the points lie on a descending straight line) has a correlation coefficient r = -1. Values of r close to -1 mean a strong negative correlation, and negative values closer to 0 mean a weak negative correlation.
22
EXAMPLE 3 U.S. Farm Size
Figure 7.5 shows a scatter diagram for the variables number of farms and mean farm size in the United States. Each dot represents data from a single year between 1950 and 2000; on this diagram, the earlier years generally are on the right and the later years on the left. Estimate the correlation coefficient by comparing this diagram to those in Figure 7.3 (slide 13) and discuss the underlying reasons for the correlation.
Figure 7.5 Scatter diagram for farm size data. Source: U.S. Department of Agriculture.
23
EXAMPLE 3 U.S. Farm Size (cont.)
Solution: The scatter diagram shows a strong negative correlation that most closely resembles the scatter diagram in Figure 7.3f, suggesting a correlation coefficient around r = -0.9. The correlation shows that as the number of farms decreases, the size of the remaining farms increases. This trend reflects a basic change in the nature of farming: prior to 1950, most farms were small family farms. Over time, these small farms have been replaced by large farms owned by agribusiness corporations.
Figure 7.5 Scatter diagram for farm size data.
24
TIME OUT TO THINK
For further practice, visually estimate the correlation coefficients for the data for diamond weight and price (Figure 7.1) and diamond color and price (Figure 7.2). (They are reproduced on the next slide.)
25
Figure 7.1 Scatter diagram for the price and weight data in Table 7.1.
Figure 7.2 Scatter diagram for the color and price data in Table 7.1.
26
Calculating the Correlation Coefficient (Optional Section)
The formula for the (linear) correlation coefficient r can be expressed in several different ways that are all algebraically equivalent, which means that they produce the same value. The following expression has the advantage of relating more directly to the underlying rationale for r:
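One standard way to write this definitional form (the usual textbook expression, assumed here) uses the sample means x̄ and ȳ and the sample standard deviations s_x and s_y of n data pairs:

r = Σ[ ((x − x̄)/s_x) · ((y − ȳ)/s_y) ] / (n − 1)

Each factor is a z-score: it says how many standard deviations a value lies from its mean. A pair contributes a positive product when x and y are both above (or both below) their means, and a negative product when they fall on opposite sides, so the sign and size of the total reflect how consistently the two variables move together. A minimal Python sketch follows; the function name `r_definitional` and the sample data are hypothetical, and the code assumes neither variable is constant (otherwise a standard deviation would be 0).

```python
from statistics import mean, stdev  # stdev is the sample standard deviation (divides by n - 1)


def r_definitional(xs, ys):
    """Sum of products of z-scores, divided by n - 1 (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values each")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # assumes neither list is constant
    # Each term multiplies x's z-score by y's z-score for one data pair.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]  # made-up illustrative data
print(round(r_definitional(x, y), 3))
```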
27
An alternative formula for r simplifies calculations, so it is often used whenever manual calculations are necessary. This formula is also easy to program into statistical software or calculators: first calculate each of the required sums, then substitute the values into the formula. Be sure to note that Σx² and (Σx)² are not equal: Σx² tells you to first square all the values of the variable x and then add them; (Σx)² tells you to add the x values first and then square this sum. In other words, perform the operation within the parentheses first. Similarly, Σy² and (Σy)² are not the same.
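The computational formula described in this passage is commonly written as follows (standard textbook form, assumed here), where n is the number of data pairs:

r = [ nΣxy − (Σx)(Σy) ] / [ √(nΣx² − (Σx)²) · √(nΣy² − (Σy)²) ]

The sketch below follows the passage's advice literally: compute each required sum first, then substitute. The function name `r_computational` is hypothetical, and the sample data are made up.

```python
from math import sqrt


def r_computational(xs, ys):
    """Correlation coefficient from the sums n, Σx, Σy, Σx², Σy², Σxy (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values each")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)  # Σx²: square each value first, then add
    sum_y2 = sum(y * y for y in ys)  # Σy²
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Σxy: multiply each pair, then add
    numerator = n * sum_xy - sum_x * sum_y
    # (Σx)² is sum_x ** 2: add first, then square the sum.
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]  # made-up illustrative data
print(round(r_computational(x, y), 3))
```

For these x values, Σx² = 91 while (Σx)² = 21² = 441, which shows why the two must not be confused. Substituting the sums gives r = 87/105, or about 0.83.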
28
The End