1
7.1 Seeking Correlation LEARNING GOAL
Be able to define correlation, recognize positive and negative correlations on scatter diagrams, and understand the correlation coefficient as a measure of the strength of a correlation. Page 286
2
Definition: A correlation exists between two variables when higher values of one variable consistently go with higher values of another variable or when higher values of one variable consistently go with lower values of another variable. Page 286
3
Here are a few examples of correlations:
There is a correlation between the variables amount of smoking and likelihood of lung cancer; that is, heavier smokers are more likely to get lung cancer.
There is a correlation between the variables height and weight for people; that is, taller people tend to weigh more than shorter people.
There is a correlation between the variables demand for apples and price of apples; that is, demand tends to decrease as price increases.
There is a correlation between the variables practice time and skill among piano players; that is, those who practice more tend to be more skilled.
Page 286
4
TIME OUT TO THINK Suppose there really were a gene that made people prone to both smoking and lung cancer. Explain why we would still find a strong correlation between smoking and lung cancer in that case, but would not be able to say that smoking causes lung cancer. Page 286 Slide
5
Scatter Diagrams Definition
A scatter diagram (or scatterplot) is a graph in which each point represents the values of two variables.
6
Table 7.1. Page 287
7
The following procedure describes how to make the scatter diagram in Figure 7.1.
1. We assign one variable to each axis and label the axis with values that comfortably fit all the data. Sometimes the axis selection is arbitrary, but if we suspect that one variable depends on the other, then we plot the explanatory variable on the horizontal axis and the response variable on the vertical axis. In this case, we expect the diamond price to depend at least in part on its weight; we therefore say that weight is the explanatory variable (because it helps explain the price) and price is the response variable (because it responds to changes in the explanatory variable).
8
The following procedure describes how to make the scatter diagram in Figure 7.1 (cont.).
1. (cont.) We choose a range of 0 to 2.5 carats for the weight axis and $0 to $16,000 for the price axis.
2. For each diamond in Table 7.1, we plot a single point at the horizontal position corresponding to its weight and the vertical position corresponding to its price. For example, the point for Diamond 10 goes at a position of 1.11 carats on the horizontal axis and $3,670 on the vertical axis. The dashed lines on Figure 7.1 show how we locate this point.
3. (Optional) We can label some (or all) of the data points, as is done for Diamonds 10, 16, and 19 in Figure 7.1.
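Steps 1 and 2 of the procedure can be sketched in a few lines of Python. This is only an illustration: the names WEIGHT_RANGE, PRICE_RANGE, and to_point are hypothetical, and only the values for Diamond 10 come from the text; the other rows of Table 7.1 would be entered the same way.

```python
# Step 1: weight (explanatory) goes on the horizontal axis, price (response)
# on the vertical axis, with the axis ranges quoted in the text.
WEIGHT_RANGE = (0.0, 2.5)   # carats
PRICE_RANGE = (0, 16_000)   # dollars

# Diamond number -> (weight in carats, price in dollars).
# Only Diamond 10 is given in the text; the rest of Table 7.1 is not shown here.
diamonds = {10: (1.11, 3670)}


def to_point(weight, price):
    """Step 2: return the (x, y) position for one diamond.

    Raises ValueError if the value would fall outside the chosen axes.
    """
    if not WEIGHT_RANGE[0] <= weight <= WEIGHT_RANGE[1]:
        raise ValueError(f"weight {weight} is outside the weight axis")
    if not PRICE_RANGE[0] <= price <= PRICE_RANGE[1]:
        raise ValueError(f"price {price} is outside the price axis")
    return (weight, price)


# Step 3 (optional): keeping the diamond number as the key lets us label points.
points = {label: to_point(w, p) for label, (w, p) in diamonds.items()}
print(points[10])  # (1.11, 3670)
```

The resulting (x, y) pairs can then be passed to whatever plotting tool is available.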
9
TIME OUT TO THINK Identify the points in Figure 7.1 (previous slide) that represent Diamonds 3, 7, and 23. Figure 7.1 is on page 288. Slide
10
EXAMPLE 1 Color and Price
Using the data in Table 7.1 (slide 6), create a scatter diagram to look for a correlation between a diamond’s color and price. Comment on the correlation.
Solution: We expect price to depend on color, so we plot the explanatory variable color on the horizontal axis and the response variable price on the vertical axis in Figure 7.2. (You should check a few of the points against the data in Table 7.1.) The points appear much more scattered than in Figure 7.1. Nevertheless, you may notice a weak trend diagonally downward from the upper left toward the lower right.
11
EXAMPLE 1 Color and Price
Solution (cont.): This trend represents a weak correlation in which diamonds with more yellow color (higher numbers for color) are less expensive. This trend is consistent with what we would expect, because colorless diamonds appear to sparkle more and are generally considered more desirable.
12
TIME OUT TO THINK Thanks to a large bonus at work, you have a budget of $6,000 for a diamond ring. A dealer offers you the following two choices for that price. One diamond weighs 1.20 carats and has color = 4. The other weighs 1.18 carats and has color = 3. Assuming all other characteristics of the diamonds are equal, which would you choose? Why? Page 289 Slide
13
Types of Correlation (Note: detailed descriptions of these graphs appear in the next few slides.) Page 289 Figure 7.3 Types of correlation seen on scatter diagrams. Slide
14
Figure 7.3(a-c) Types of correlation seen on scatter diagrams.
Parts a to c of Figure 7.3 show positive correlations, in which the values of y tend to increase with increasing values of x. The correlation becomes stronger as we proceed from a to c. In fact, c shows a perfect positive correlation, in which all the points fall along a straight line.
15
Figure 7.3(d-f) Types of correlation seen on scatter diagrams.
Parts d to f of Figure 7.3 show negative correlations, in which the values of y tend to decrease with increasing values of x. The correlation becomes stronger as we proceed from d to f. In fact, f shows a perfect negative correlation, in which all the points fall along a straight line.
16
Figure 7.3(g) Types of correlation seen on scatter diagrams.
Part g of Figure 7.3 shows no correlation between x and y. In other words, values of x do not appear to be linked to values of y in any way.
17
Figure 7.3(h) Types of correlation seen on scatter diagrams.
Part h of Figure 7.3 shows a nonlinear relationship, in which x and y appear to be related but the relationship does not correspond to a straight line. (Linear means along a straight line, and nonlinear means not along a straight line.)
18
Types of Correlation
Positive correlation: Both variables tend to increase (or decrease) together.
Negative correlation: The two variables tend to change in opposite directions, with one increasing while the other decreases.
No correlation: There is no apparent (linear) relationship between the two variables.
Nonlinear relationship: The two variables are related, but the relationship results in a scatter diagram that does not follow a straight-line pattern.
Page 290
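The word "(linear)" in the definition of no correlation matters. The sketch below uses made-up points that lie exactly on the curve y = x², so the variables are clearly related, yet a linear correlation measure comes out as 0. The helper name linear_r is hypothetical, and the deviation-from-the-mean formula it uses is one standard way of computing the linear correlation coefficient, not necessarily the form shown in the slides.

```python
from math import sqrt


def linear_r(xs, ys):
    """Linear correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative data (not from the text): a perfect nonlinear relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # [4, 1, 0, 1, 4]

r = linear_r(xs, ys)
print(abs(r) < 1e-12)  # True: no linear correlation despite the exact relationship
```

This is why a scatter diagram is worth inspecting even when a numerical measure suggests no correlation.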
19
Measuring the Strength of a Correlation
Statisticians measure the strength of a correlation with a number called the correlation coefficient, represented by the letter r. Page 291 Slide
20
Properties of the Correlation Coefficient, r
The correlation coefficient, r, is a measure of the strength of a correlation. Its value can range only from -1 to 1. If there is no correlation, the points do not follow any ascending or descending straight-line pattern, and the value of r is close to 0. If there is a positive correlation, the correlation coefficient is positive (0 < r ≤ 1): Both variables increase together. A perfect positive correlation (in which all the points on a scatter diagram lie on an ascending straight line) has a correlation coefficient r = 1. Values of r close to 1 mean a strong positive correlation and positive values closer to 0 mean a weak positive correlation. Page 291
21
Properties of the Correlation Coefficient, r (cont.)
If there is a negative correlation, the correlation coefficient is negative (-1 ≤ r < 0): When one variable increases, the other decreases. A perfect negative correlation (in which all the points lie on a descending straight line) has a correlation coefficient r = -1. Values of r close to -1 mean a strong negative correlation and negative values closer to 0 mean a weak negative correlation. Page 291 Slide
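The properties of r listed above can be summarized in a small helper. This is a sketch only: the function name describe_r and its return format are hypothetical, and it deliberately reports the strength as the distance of r from 0 rather than inventing cutoffs for "strong" and "weak", since the text gives none.

```python
def describe_r(r):
    """Return (direction, strength) for a correlation coefficient r.

    direction follows the sign of r; strength is |r|, where values near 1
    mean a strong correlation and values near 0 mean a weak one.
    """
    if not -1 <= r <= 1:
        raise ValueError("r can range only from -1 to 1")
    if r == 1:
        direction = "perfect positive"
    elif r == -1:
        direction = "perfect negative"
    elif r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)


print(describe_r(-0.9))  # ('negative', 0.9)
print(describe_r(1))     # ('perfect positive', 1)
```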
22
EXAMPLE 3 U.S. Farm Size
Figure 7.5 shows a scatter diagram for the variables number of farms and mean farm size in the United States. Each dot represents data from a single year between 1950 and 2000; on this diagram, the earlier years generally are on the right and the later years on the left. Estimate the correlation coefficient by comparing this diagram to those in Figure 7.3 (slide 13) and discuss the underlying reasons for the correlation.
Figure 7.5 Scatter diagram for farm size data. Source: U.S. Department of Agriculture.
Figure 7.3 is on page 289.
23
EXAMPLE 3 U.S. Farm Size
Solution: The scatter diagram shows a strong negative correlation that most closely resembles the scatter diagram in Figure 7.3f, suggesting a correlation coefficient around r = -0.9. The correlation shows that as the number of farms decreases, the size of the remaining farms increases. This trend reflects a basic change in the nature of farming: Prior to 1950, most farms were small family farms. Over time, these small farms have been replaced by large farms owned by agribusiness corporations.
Figure 7.5 Scatter diagram for farm size data. Source: U.S. Department of Agriculture.
Note that Examples 4 and 5 provide further practice for estimating correlation coefficients.
24
TIME OUT TO THINK For further practice, visually estimate the correlation coefficients for the data for diamond weight and price (Figure 7.1) and diamond color and price (Figure 7.2). (They are reproduced on the next slide.) Figures 7.1 and 7.2 are on page 298.
25
Figures 7.1 and 7.2 are reproduced here for the previous slide. They can be found on page 298.
Figure 7.1 Scatter diagram for the price and weight data in Table 7.1.
Figure 7.2 Scatter diagram for the color and price data in Table 7.1.
26
Calculating the Correlation Coefficient (Optional Section)
The formula for the (linear) correlation coefficient r can be expressed in several different ways that are all algebraically equivalent, which means that they produce the same value. The following expression has the advantage of relating more directly to the underlying rationale for r (page 294):
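One standard way to write such an expression is sketched below. The symbols are assumptions rather than notation confirmed by the slide: $\bar{x}$ and $\bar{y}$ denote the sample means, $s_x$ and $s_y$ the sample standard deviations, and $n$ the number of data pairs.

```latex
r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)
```

Each term multiplies a standardized x value by the matching standardized y value. Pairs that lie on the same side of both means contribute positively to r, and pairs that lie on opposite sides contribute negatively, which is why r is positive for ascending patterns and negative for descending ones.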
27
The following alternative formula for r has the advantage of simplifying calculations, so it is often used whenever manual calculations are necessary. It is also easy to program into statistical software or calculators:
r = [nΣxy - (Σx)(Σy)] / [√(n(Σx²) - (Σx)²) · √(n(Σy²) - (Σy)²)]
First calculate each of the required sums, then substitute the values into the formula. Be sure to note that (Σx²) and (Σx)² are not equal: (Σx²) tells you to first square all the values of the variable x and then add them; (Σx)² tells you to add the x values first and then square this sum. In other words, perform the operation within the parentheses first. Similarly, (Σy²) and (Σy)² are not the same. Page 294
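The "calculate the sums, then substitute" recipe can be sketched in Python as follows. The function name correlation and the sample data are illustrative assumptions (the data are made up, not taken from Table 7.1), and the code assumes the usual computational form r = [nΣxy - (Σx)(Σy)] / [√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)].

```python
from math import sqrt


def correlation(xs, ys):
    """Linear correlation coefficient r from the required sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # First calculate each of the required sums.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)   # (Σx²): square each value, then add
    sum_y2 = sum(y * y for y in ys)   # (Σy²): square each value, then add

    # Then substitute the sums into the formula.
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when one variable never changes")
    return numerator / denominator


# Illustrative data: y is exactly twice x, a perfect positive correlation.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

print(sum(x * x for x in xs))    # 30  -> (Σx²)
print(sum(xs) ** 2)              # 100 -> (Σx)², a different number
print(round(correlation(xs, ys), 6))  # 1.0
```

The two printed sums show concretely why (Σx²) and (Σx)² must not be confused: swapping them changes the denominator and therefore the value of r.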
28
The End Slide