Download presentation
Presentation is loading. Please wait.
Published byScot Dalton Modified over 8 years ago
1
Slide Slide 1 Chapter 10 Correlation and Regression 10-1 Overview 10-2 Correlation 10-3 Regression 10-4 Variation and Prediction Intervals 10-5 Multiple Regression 10-6 Modeling
2
Slide Slide 2 Section 10-1 Overview
3
Slide Slide 3 Overview This chapter introduces important methods for making inferences about a correlation (or relationship) between two variables, and describing such a relationship with an equation that can be used for predicting the value of one variable given the value of the other variable. We consider sample data that come in pairs.
4
Slide Slide 4 Section 10-2 Correlation
5
Slide Slide 5 Context of data: Important! 1)What is the key issue if this data is test grades of subjects before and after formal instruction? 2)What is the key issue if these data are reasoning tests for sample of men (x) and a separate sample of women (y)? 3)What is the key issue if each pair represents a math reasoning score (x) and a starting salary (y) in thousands of dollars? XY 7889 8593 9299 100 8584
6
Slide Slide 6 Activity Partner up! 1)Measure the length of your thumb; measure your partner’s thumb. 2)Text the following messages to you partner and record the time it takes for them to text the message to you (must be exactly the same): Ms. OLoughlin is the best statistics teacher ever! 3)Record your results on the board 4)Plot your data; which one is x and which one is y? 5)Do you see a relationship between the two variables? Describe it as best you can. Length of thumb Speed of text
7
Slide Slide 7 Key Concept This section introduces the linear correlation coefficient r, which is a numerical measure of the strength of the relationship between two variables representing quantitative data. Because technology can be used to find the value of r, it is important to focus on the concepts in this section, without becoming overly involved with tedious arithmetic calculations. Use r to conclude whether or not there is a relationship between two variables.
8
Slide Slide 8 Definitions A correlation exists between two variables when one of them is related to the other in some way. Part 1: Basic Concepts of Correlation The linear correlation coefficient r measures the strength of the linear relationship between paired x- and y- quantitative values in a sample.
9
Slide Slide 9 Exploring the Data We can often see a relationship between two variables by constructing a scatterplot. Figure 10-2 following shows scatterplots with different characteristics. You should study the “overall” pattern of plotted points! Beware of outliers!
10
Slide Slide 10 Scatterplots of Paired Data Figure 10-2
11
Slide Slide 11 Scatterplots of Paired Data Figure 10-2
12
Slide Slide 12 Requirements 1. The sample of paired ( x, y ) data is a random sample of independent quantitative data. 2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern. 3. The outliers must be removed if they are known to be errors. The effects of any other outliers should be considered by calculating r with and without the outliers included.
13
Slide Slide 13 Notation for the Linear Correlation Coefficient n represents the number of pairs of data present. denotes the addition of the items indicated. x denotes the sum of all x- values. x 2 indicates that each x - value should be squared and then those squares added. ( x ) 2 indicates that the x- values should be added and the total then squared. xy indicates that each x -value should be first multiplied by its corresponding y- value. After obtaining all such products, find their sum. r represents linear correlation coefficient for a sample. represents linear correlation coefficient for a population.
14
Slide Slide 14 Formula 10-1 n xy – ( x)( y) n( x 2 ) – ( x) 2 n( y 2 ) – ( y) 2 r =r = The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample. Calculators can compute r Formula
15
Slide Slide 15 Interpreting r Using Table A-6: If the absolute value of the computed value of r exceeds the value in Table A-6, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation. Using Software: If the computed P-value is less than or equal to the significance level, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation.
16
Slide Slide 16 Round to three decimal places so that it can be compared to critical values in Table A-6. Use calculator or computer if possible. Rounding the Linear Correlation Coefficient r
17
Slide Slide 17 3535 1818 3636 5454 Data x y You! Calculate r… Using the simple random sample of data below, find the value of r. 1) Use Table A-6 to tell if there is a linear correlation at.01 and.05. 2) Use your calculator to see if there is a linear correlation at.01 and.05.
18
Slide Slide 18 Activity Partner up! 1)Measure the length of your thumb; measure your partner’s thumb. 2)Text the following messages to you partner and record the time it takes for them to text the message to you (must be exactly the same): Ms. OLoughlin is the best statistics teacher ever! 3)Record your results on the board 4)Plot your data; which one is x and which one is y? 5)Do you see a relationship between the two variables? Describe it as best you can. 6) Use Table A-6 to tell if there is a linear correlation at.01 and.05. 7) Use your calculator to see if there is a linear correlation at.01 and.05 (LinRegTTest). Length of thumb Speed of text
19
Slide Slide 19 Properties of the Linear Correlation Coefficient r 1. –1 r 1 2. The value of r does not change if all values of either variable are converted to a different scale. 3. The value of r is not affected by the choice of x and y. Interchange all x- and y- values and the value of r will not change. 4. r measures strength of a linear relationship.
20
Slide Slide 20 Interpreting r : Explained Variation The value of r 2 is the proportion of the variation in y that is explained by the linear relationship between x and y.
21
Slide Slide 21 Using the duration/interval data in Table 10-1, we have found that the value of the linear correlation coefficient r = 0.926. What proportion of the variation of the interval after eruption times can be explained by the variation in the duration times? With r = 0.926, we get r 2 = 0.857. We conclude that 0.857 (or about 86%) of the variation in interval after eruption times can be explained by the linear relationship between the duration times and the interval after eruption times. This implies that 14% of the variation in interval after eruption times cannot be explained by the duration times. Example: Old Faithful
22
Slide Slide 22 Common Errors Involving Correlation 1. Causation: It is wrong to conclude that correlation implies causality. 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient. 3. Linearity: There may be some relationship between x and y even when there is no linear correlation.
23
Slide Slide 23 Always graph your data! 1)Find r, r-squared 2)Comment on the form, direction, and strength. Should we use r for this data? X1081391114641275 Y 9.148.148.748.779.268.106.133.109.137.264.74
24
Slide Slide 24 Extra Example Many of us have heard that a tip should be 15% of the bill. The accompanying table lists some sample data collected from some students. 1)Find r, r-squared. Interpret r-squared. 2)Comment on the form, direction, and strength. 3) Is there sufficient evidence to conclude that there is a relationship between the amount of the bill and the amount of the tip? (Use Table A-6 and LinRegTTest) Bill ($) 33.4650.6887.9298.8463.6107.34 Tip ($)5.558.08171216
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.