7.1 Seeking Correlation LEARNING GOAL


7.1 Seeking Correlation LEARNING GOAL Be able to define correlation, recognize positive and negative correlations on scatter diagrams, and understand the correlation coefficient as a measure of the strength of a correlation. Page 286

Definition A correlation exists between two variables when higher values of one variable consistently go with higher values of another variable or when higher values of one variable consistently go with lower values of another variable. Page 286 Slide 7.1- 2

Here are a few examples of correlations: There is a correlation between the variables amount of smoking and likelihood of lung cancer; that is, heavier smokers are more likely to get lung cancer. There is a correlation between the variables height and weight for people; that is, taller people tend to weigh more than shorter people. There is a correlation between the variables demand for apples and price of apples; that is, demand tends to decrease as price increases. There is a correlation between practice time and skill among piano players; that is, those who practice more tend to be more skilled. Page 286 Slide 7.1- 3

TIME OUT TO THINK Suppose there really were a gene that made people prone to both smoking and lung cancer. Explain why we would still find a strong correlation between smoking and lung cancer in that case, but would not be able to say that smoking causes lung cancer. Page 286 Slide 7.1- 4

Scatter Diagrams Definition A scatter diagram (or scatterplot) is a graph in which each point represents the values of two variables. Pages 286- 287 Slide 7.1- 5

Page 287 Slide 7.1- 6

The following procedure describes how to make the scatter diagram in Figure 7.1. 1. We assign one variable to each axis and label the axis with values that comfortably fit all the data. Sometimes the axis selection is arbitrary, but if we suspect that one variable depends on the other, then we plot the explanatory variable on the horizontal axis and the response variable on the vertical axis. In this case, we expect the diamond price to depend at least in part on its weight; we therefore say that weight is the explanatory variable (because it helps explain the price) and price is the response variable (because it responds to changes in the explanatory variable). Figure 7.1 Pages 287-288 Slide 7.1- 7

The following procedure describes how to make the scatter diagram in Figure 7.1. 1. (cont.) We choose a range of 0 to 2.5 carats for the weight axis and $0 to $16,000 for the price axis. 2. For each diamond in Table 7.1, we plot a single point at the horizontal position corresponding to its weight and the vertical position corresponding to its price. For example, the point for Diamond 10 goes at a position of 1.11 carats on the horizontal axis and $3,670 on the vertical axis. The dashed lines on Figure 7.1 show how we locate this point. 3. (Optional) We can label some (or all) of the data points, as is done for Diamonds 10, 16, and 19 in Figure 7.1. Figure 7.1 Pages 287-288 Slide 7.1- 8
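The plotting steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name scatter_grid and the grid size are made up for this sketch, Table 7.1 is not reproduced in this transcript, and the only real data point used is Diamond 10 (1.11 carats, $3,670); the other two points are hypothetical placeholders.

```python
# Sketch: place (weight, price) points on a character grid, following the
# steps in the text. Only Diamond 10 comes from the source; the other two
# points are hypothetical placeholders.

def scatter_grid(points, x_range, y_range, width=50, height=16):
    """Return a text scatter diagram as a list of strings, top row first."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in points:
        # Horizontal position comes from the explanatory variable (weight).
        col = round((x - x_min) / (x_max - x_min) * width)
        # Vertical position comes from the response variable (price).
        row = round((y - y_min) / (y_max - y_min) * height)
        grid[height - row][col] = "*"  # list index 0 is the top of the diagram
    return ["".join(line) for line in grid]


# Step 1: axis ranges taken from the text.
weight_axis = (0.0, 2.5)   # carats, horizontal axis
price_axis = (0, 16000)    # dollars, vertical axis

# Step 2: one point per diamond.
diamonds = {
    "Diamond 10": (1.11, 3670),        # from the text
    "hypothetical A": (0.50, 1500),    # placeholder, not from Table 7.1
    "hypothetical B": (2.00, 12000),   # placeholder, not from Table 7.1
}

rows = scatter_grid(diamonds.values(), weight_axis, price_axis)
for line in rows:
    print("|" + line)
print("+" + "-" * 51)
```

A real scatter diagram would normally be drawn with a plotting library; the character grid is used here only so the sketch runs without extra dependencies.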

TIME OUT TO THINK Identify the points in Figure 7.1 (previous slide) that represent Diamonds 3, 7, and 23. Figure 7.1 is on page 288. Slide 7.1- 9

EXAMPLE 1 Color and Price Using the data in Table 7.1 (slide 6), create a scatter diagram to look for a correlation between a diamond’s color and price. Comment on the correlation. Solution: We expect price to depend on color, so we plot the explanatory variable color on the horizontal axis and the response variable price on the vertical axis in Figure 7.2. (You should check a few of the points against the data in Table 7.1.) The points appear much more scattered than in Figure 7.1. Nevertheless, you may notice a weak trend diagonally downward from the upper left toward the lower right. Pages 288-289 Figure 7.2 Slide 7.1- 10

EXAMPLE 1 Color and Price Using the data in Table 7.1 (slide 6), create a scatter diagram to look for a correlation between a diamond’s color and price. Comment on the correlation. Solution: (cont.) This trend represents a weak correlation in which diamonds with more yellow color (higher numbers for color) are less expensive. This trend is consistent with what we would expect, because colorless diamonds appear to sparkle more and are generally considered more desirable. Pages 288-289 Figure 7.2 Slide 7.1- 11

TIME OUT TO THINK Thanks to a large bonus at work, you have a budget of $6,000 for a diamond ring. A dealer offers you the following two choices for that price. One diamond weighs 1.20 carats and has color = 4. The other weighs 1.18 carats and has color = 3. Assuming all other characteristics of the diamonds are equal, which would you choose? Why? Page 289 Slide 7.1- 12

Types of Correlation (Note: detailed descriptions of these graphs appear in the next few slides.) Page 289 Figure 7.3 Types of correlation seen on scatter diagrams. Slide 7.1- 13

Figure 7.3(a-c) Types of correlation seen on scatter diagrams. Parts a to c of Figure 7.3 show positive correlations, in which the values of y tend to increase with increasing values of x. The correlation becomes stronger as we proceed from a to c. In fact, c shows a perfect positive correlation, in which all the points fall along a straight line. Pages 289-290 Slide 7.1- 14

Figure 7.3(d-f) Types of correlation seen on scatter diagrams. Parts d to f of Figure 7.3 show negative correlations, in which the values of y tend to decrease with increasing values of x. The correlation becomes stronger as we proceed from d to f. In fact, f shows a perfect negative correlation, in which all the points fall along a straight line. Pages 289-290 Slide 7.1- 15

Figure 7.3(g) Types of correlation seen on scatter diagrams. Part g of Figure 7.3 shows no correlation between x and y. In other words, values of x do not appear to be linked to values of y in any way. Pages 289-290 Slide 7.1- 16

Figure 7.3(h) Types of correlation seen on scatter diagrams. Part h of Figure 7.3 shows a nonlinear relationship, in which x and y appear to be related but the relationship does not correspond to a straight line. (Linear means along a straight line, and nonlinear means not along a straight line.) Pages 289-290 Slide 7.1- 17

Types of Correlation Positive correlation: Both variables tend to increase (or decrease) together. Negative correlation: The two variables tend to change in opposite directions, with one increasing while the other decreases. No correlation: There is no apparent (linear) relationship between the two variables. Nonlinear relationship: The two variables are related, but the relationship results in a scatter diagram that does not follow a straight-line pattern. Page 290 Slide 7.1- 18

Measuring the Strength of a Correlation Statisticians measure the strength of a correlation with a number called the correlation coefficient, represented by the letter r. Page 291 Slide 7.1- 19

Properties of the Correlation Coefficient, r The correlation coefficient, r, is a measure of the strength of a correlation. Its value can range only from -1 to 1. If there is no correlation, the points do not follow any ascending or descending straight-line pattern, and the value of r is close to 0. If there is a positive correlation, the correlation coefficient is positive (0 < r ≤ 1): Both variables increase together. A perfect positive correlation (in which all the points on a scatter diagram lie on an ascending straight line) has a correlation coefficient r = 1. Values of r close to 1 mean a strong positive correlation and positive values closer to 0 mean a weak positive correlation. Page 291 Slide 7.1- 20

Properties of the Correlation Coefficient, r (cont.) If there is a negative correlation, the correlation coefficient is negative (-1 ≤ r < 0): When one variable increases, the other decreases. A perfect negative correlation (in which all the points lie on a descending straight line) has a correlation coefficient r = -1. Values of r close to -1 mean a strong negative correlation and negative values closer to 0 mean a weak negative correlation. Page 291 Slide 7.1- 21
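These properties can be checked on small made-up data sets. The sketch below is illustrative only: the data are hypothetical, the function names pearson_r and describe are invented for this example, and the cutoffs 0.1 and 0.7 separating "no", "weak", and "strong" are assumptions, since the text gives no numeric thresholds.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    # Guard against rounding pushing r slightly outside [-1, 1].
    return max(-1.0, min(1.0, r))


def describe(r, weak_cutoff=0.1, strong_cutoff=0.7):
    """Label r in words. Both cutoffs are illustrative assumptions."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if abs(r) < weak_cutoff:
        return "no apparent linear correlation"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Hypothetical data sets.
xs = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 * x + 1 for x in xs]     # points on an ascending straight line
falling = [-2 * x + 1 for x in xs]   # points on a descending straight line
curved = [x * x for x in xs]         # y depends on x, but not along a line

for name, ys in [("rising", rising), ("falling", falling), ("curved", curved)]:
    r = pearson_r(xs, ys)
    print(f"{name}: r = {r:+.2f} ({describe(r)})")
```

The curved data set shows why a value of r near 0 rules out only a straight-line pattern: y is completely determined by x there, yet the relationship is nonlinear.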

EXAMPLE 3 U.S. Farm Size Figure 7.5 shows a scatter diagram for the variables number of farms and mean farm size in the United States. Each dot represents data from a single year between 1950 and 2000; on this diagram, the earlier years generally are on the right and the later years on the left. Estimate the correlation coefficient by comparing this diagram to those in Figure 7.3 (slide 13) and discuss the underlying reasons for the correlation. Figure 7.5 Scatter diagram for farm size data. Source: U.S. Department of Agriculture. Pages 291-292. Figure 7.3 is on page 289. Slide 7.1- 22

EXAMPLE 3 U.S. Farm Size Solution: The scatter diagram shows a strong negative correlation that most closely resembles the scatter diagram in Figure 7.3f, suggesting a correlation coefficient around r = -0.9. The correlation shows that as the number of farms decreases, the size of the remaining farms increases. This trend reflects a basic change in the nature of farming: Prior to 1950, most farms were small family farms. Over time, these small farms have been replaced by large farms owned by agribusiness corporations. Figure 7.5 Scatter diagram for farm size data. Source: U.S. Department of Agriculture. Pages 291-292. Note that Examples 4 and 5, on pages 292-293, provide further practice for estimating correlation coefficients. Slide 7.1- 23

TIME OUT TO THINK For further practice, visually estimate the correlation coefficients for the data for diamond weight and price (Figure 7.1) and diamond color and price (Figure 7.2). (They are reproduced on the next slide.) Page 294. Figures 7.1 and 7.2 are on page 298. Slide 7.1- 24

Figures 7.1 and 7.2 are reproduced here for the previous slide. They can be found on page 298. Figure 7.1 Scatter diagram for the price and weight data in Table 7.1. Figure 7.2 Scatter diagram for the color and price data in Table 7.1. Slide 7.1- 25

Calculating the Correlation Coefficient (Optional Section) The formula for the (linear) correlation coefficient r can be expressed in several different ways that are all algebraically equivalent, which means that they produce the same value. The following expression has the advantage of relating more directly to the underlying rationale for r : Page 294 Slide 7.1- 26
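The expression itself is not reproduced in this transcript. One standard form that fits the description, given here as an assumption about what the slide showed, standardizes each x and y value, multiplies the standardized values pair by pair, and averages the products using n - 1:

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Here n is the number of data pairs, \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations. A point that is above average in both variables, or below average in both, contributes a positive product; a point that is above average in one and below average in the other contributes a negative product. That is the sense in which this form relates directly to the rationale for r.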

The following alternative formula for r has the advantage of simplifying calculations, so it is often used whenever manual calculations are necessary. This formula is also easy to program into statistical software or calculators. First calculate each of the required sums, then substitute the values into the formula. Be sure to note that (Σx²) and (Σx)² are not equal: (Σx²) tells you to first square all the values of the variable x and then add them; (Σx)² tells you to add the x values first and then square this sum. In other words, perform the operation within the parentheses first. Similarly, (Σy²) and (Σy)² are not the same. Page 294 Slide 7.1- 27
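The sums-based calculation can be sketched in Python. The formula used below is the usual computational form, r = [nΣxy - (Σx)(Σy)] / [√(n(Σx²) - (Σx)²) · √(n(Σy²) - (Σy)²)], which is assumed here to match the slide because it uses exactly the sums the text names. The data and the function names r_from_sums and r_from_z_scores are hypothetical. The second function computes r from standardized values, to show that the two forms produce the same value.

```python
from math import sqrt


def r_from_sums(xs, ys):
    """Correlation coefficient from the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)   # (Σx²): square each value, then add
    sum_y2 = sum(y * y for y in ys)   # (Σy²): square each value, then add
    numerator = n * sum_xy - sum_x * sum_y
    # sum_x ** 2 is (Σx)²: add the values first, then square the sum.
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def r_from_z_scores(xs, ys):
    """Correlation coefficient as the averaged product of standardized values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = (((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Hypothetical data, chosen only to keep the arithmetic small.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sum_of_squares = sum(x * x for x in xs)   # (Σx²) = 1 + 4 + 9 + 16 + 25 = 55
square_of_sum = sum(xs) ** 2              # (Σx)² = 15² = 225

print("(Σx²) =", sum_of_squares, " (Σx)² =", square_of_sum)
print("r from sums:    ", round(r_from_sums(xs, ys), 4))
print("r from z-scores:", round(r_from_z_scores(xs, ys), 4))
```

For this data set the sums are n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, and Σy² = 86, so r = 30 / (√50 · √30), which is about 0.77.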

The End Slide 7.1- 28