Copyright © Cengage Learning. All rights reserved.

Slides:



Advertisements
Similar presentations
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Advertisements

© 2010 Pearson Prentice Hall. All rights reserved Least Squares Regression Models.
The Simple Regression Model
SIMPLE LINEAR REGRESSION
Regression Chapter 10 Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania.
Chapter 9: Correlation and Regression
SIMPLE LINEAR REGRESSION
Chapter 12 Section 1 Inference for Linear Regression.
1 Chapter 10 Correlation and Regression We deal with two variables, x and y. Main goal: Investigate how x and y are related, or correlated; how much they.
SIMPLE LINEAR REGRESSION
Measures of Regression and Prediction Intervals
Introduction to Linear Regression and Correlation Analysis
Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Copyright © Cengage Learning. All rights reserved. 10 Inferences Involving Two Populations.
Correlation.
Correlation and Regression
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-1 Review and Preview.
Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION.
CHAPTER 14 MULTIPLE REGRESSION
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
1 Chapter 12 Simple Linear Regression. 2 Chapter Outline  Simple Linear Regression Model  Least Squares Method  Coefficient of Determination  Model.
Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.
Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 1 Chapter Correlation and Regression 9.
Regression Analysis Deterministic model No chance of an error in calculating y for a given x Probabilistic model chance of an error First order linear.
Copyright © Cengage Learning. All rights reserved. 8 9 Correlation and Regression.
Linear Regression Hypothesis testing and Estimation.
Copyright © Cengage Learning. All rights reserved. 8 4 Correlation and Regression.
Chapter 13 Linear Regression and Correlation. Our Objectives  Draw a scatter diagram.  Understand and interpret the terms dependent and independent.
HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Chapter 13.
Lecture Slides Elementary Statistics Twelfth Edition
Hypothesis Testing: One-Sample Inference
Correlation and Regression
Copyright © Cengage Learning. All rights reserved.
Regression and Correlation
Copyright © Cengage Learning. All rights reserved.
Chapter 14 Inference on the Least-Squares Regression Model and Multiple Regression.
Inference for Least Squares Lines
1 Functions and Applications
Inference for Regression (Chapter 14) A.P. Stats Review Topic #3
10.2 Regression If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line which is.
Chapter 5 STATISTICS (PART 4).
Copyright © Cengage Learning. All rights reserved.
SIMPLE LINEAR REGRESSION MODEL
Copyright © Cengage Learning. All rights reserved.
Correlation and Regression
Elementary Statistics
Inferences Concerning Regression Parameters
CHAPTER 10 Correlation and Regression (Objectives)
Lecture Slides Elementary Statistics Thirteenth Edition
Correlation and Regression
Chapter 12 Inference on the Least-squares Regression Line; ANOVA
Copyright © Cengage Learning. All rights reserved.
Chapter 10 Correlation and Regression
Statistical Inference about Regression
Interval Estimation and Hypothesis Testing
Inference in Linear Models
Correlation and Regression
Elementary Statistics: Looking at the Big Picture
Section 2: Linear Regression.
SIMPLE LINEAR REGRESSION
CHAPTER 12 More About Regression
CHAPTER 14 MULTIPLE REGRESSION
Simple Linear Regression
SIMPLE LINEAR REGRESSION
Copyright © Cengage Learning. All rights reserved.
Measures of Regression Prediction Interval
Presentation transcript:

Copyright © Cengage Learning. All rights reserved. Correlation and Regression 8 10 Copyright © Cengage Learning. All rights reserved.

Inferences for Correlation and Regression Section 10.3 Inferences for Correlation and Regression Copyright © Cengage Learning. All rights reserved.

Focus Points Test the correlation coefficient . • Use sample data to compute the standard error of estimate Se. • Find a confidence interval for the value of y predicted for a specified value of x. Test the slope  of the least-squares line. Find a confidence interval for the slope  of the least-squares line and interpret its meaning.

Inferences for Correlation and Regression Learn more, earn more! We have probably all heard this platitude. The question is whether or not there is some truth in this statement. Do college graduates have an improved chance at a better income? Is there a trend in the general population to support the “learn more, earn more” statement? Consider the following variables: x = percentage of the population 25 or older with at least four years of college and y = percentage growth in per capita income over the past seven years.

Inferences for Correlation and Regression A random sample of six communities in Ohio gave the information (based on Life in America’s Small Cities by G. S. Thomas) shown in Table 9-10. We can compute the sample correlation coefficient r and the least-squares line = a + bx using the data of Table 9-10. Education and Income Growth Percentages Table 9.10

Inferences for Correlation and Regression However, r is only a sample correlation coefficient, and = a + bx is only a “sample-based” least-squares line. What if we used all possible data pairs (x, y) from all U.S. cities, not just six towns in Ohio? If we accomplished this seemingly impossible task, we would have the population of all (x, y) pairs. From this population of (x, y) pairs, we could (in theory) compute the population correlation coefficient, which we call  (Greek letter rho, pronounced like “row”).

Inferences for Correlation and Regression We could also compute the least-squares line for the entire population, which we denote as y =  + x using more Greek letters, (alpha) and  (beta).

Testing the Correlation Coefficient

Testing the Correlation Coefficient The first topic we want to study is the statistical significance of the sample correlation coefficient r. To do this, we construct a statistical test of  , the population correlation coefficient.

Testing the Correlation Coefficient Procedure:

Example 6 – Testing  Let’s return to our data from Ohio regarding the percentage of the population with at least four years of college and the percentage of growth in per capita income (Table 9-10). We’ll develop a test for the population correlation coefficient  . Education and Income Growth Percentages Table 9.10

Example 6 – Solution First, we compute the sample correlation coefficient r. Using a calculator, statistical software, or a “by-hand” calculation, we find r  0.887 Now we test the population correlation coefficient . Remember that x represents percentage college graduates and y represents percentage salary increases in the general population.

Example 6 – Solution cont’d We suspect the population correlation is positive,  > 0. Let’s use a 1% level of significance: H0 :  = 0 (no linear correlation) H1 :  > 0 (positive linear correlation) Convert the sample test statistic r = 0.887 to t using n = 6. with d.f. = n – 2 = 6 – 2 = 4

Example 6 – Solution cont’d The P-value for the sample test statistic t = 3.84 is shown in Figure 9-15. P-value Figure 9.15

Example 6 – Solution cont’d Since we have a right-tailed test, we use the one-tail area in the Student’s t distribution (Table 6 of Appendix II). From Table 9-11, we see that 0.005 < P-value < 0.010 Excerpt from Student’s t Distribution Table 9.11

Example 6 – Solution cont’d Since the interval containing the P-value is less than the level of significance  = 0.01, we reject H0 and conclude that the population correlation coefficient between x and y is positive. Technology gives P-value  0.0092. Caution: Although we have shown that x and y are positively correlated, we have not shown that an increase in education causes an increase in earnings.

Standard Error of Estimate

Standard Error of Estimate Sometimes a scatter diagram clearly indicates the existence of a linear relationship between x and y, but it can happen that the points are widely scattered about the least-squares line. We need a method (besides just looking) for measuring the spread of a set of points about the least-squares line. There are three common methods of measuring the spread. One method uses the standard error of estimate. The others use the coefficient of correlation and the coefficient of determination.

Standard Error of Estimate For the standard error of estimate, we use a measure of spread that is in some ways like the standard deviation of measurements of a single variable. Let = a + bx be the predicted value of y from the least-squares line. Then y – is the difference between the y value of the data point (x, y) shown on the scatter diagram (Figure 9-16) and the value of the point on the least-squares line with the same x value. The Distance Between Points (x, y) and (x, ) Figure 9.16

Standard Error of Estimate The quantity y – is known as the residual. To avoid the difficulty of having some positive and some negative values, we square the quantity (y – ). Then we sum the squares and, for technical reasons, divide this sum by n – 2. Finally, we take the square root to obtain the standard error of estimate, denoted by Se.

Standard Error of Estimate Procedure:

Example 7 – Least-squares line and Se June and Jim are partners in the chemistry lab. Their assignment is to determine how much copper sulfate (CuSO4) will dissolve in water at 10, 20, 30, 40, 50, 60, and 70C. Their lab results are shown in Table 9-12, where y is the weight in grams of copper sulfate that will dissolve in 100 grams of water at xC. Sketch a scatter diagram, find the equation of the least-squares line, and compute Se. Lab Results (x = C, y = amount of CuSo4) Table 9.12

Example 7 – Solution Figure 9-17 includes a scatter diagram for the data of Table 9-12. Scatter Diagram and Least-Squares Line for Chemistry Experiment Figure 9.17

Example 7 – Solution cont’d To find the equation of the least-squares line and the value of Se, we set up a computational table (Table 9-13). Scatter Diagram and Least-Squares Line for Chemistry Experiment Table 9.13

Example 7 – Solution and a = y – bx  30.429 – 0.507(40)  10.149 cont’d and a = y – bx  30.429 – 0.507(40)  10.149

Example 7 – Solution The equation of the least-squares line is cont’d The equation of the least-squares line is = a + bx  10.14 + 0.51x The graph of the least-squares line is shown in Figure 9-17. Notice that it passes through the point (x, y ) = (40, 30.4) . Scatter Diagram and Least-Squares Line for Chemistry Experiment Figure 9.17

Example 7 – Solution cont’d Another point on the line can be found by using x = 15 in the equation of the line = 10.14 + 0.51x . When we use 15 in place of x, we obtain = 10.14 + 0.51(15) = 17.8. The point (15, 17.8) is the other point we used to graph the least-squares line in Figure 9-17.

Example 7 – Solution The standard error of estimate is computed using the computational formula Note: This formula is very sensitive to rounded values of a and b.

Confidence Intervals for y

Confidence Intervals for y The least-squares line gives us a predicted value for a specified x value. However, we used sample data to get the equation of the line. The line derived from the population of all data pairs is likely to have a slightly different slope, which we designate by the symbol  for population slope, and a slightly different y intercept, which we designate by the symbol  for population intercept. In addition, there is some random error ε, so the true y value is y =  + x + ε

Confidence Intervals for y Because of the random variable ε, for each x value there is a corresponding distribution of y values. The methods of linear regression were developed so that the distribution of y values for a given x is centered on the population regression line. Furthermore, the distributions of y values corresponding to each x value all have the same standard deviation, estimated by the standard error of estimate Se. Using all this background, the theory tells us that for a specific x, a c confidence interval for y is given by the next procedure.

Confidence Intervals for y Procedure:

Example 8 – Confidence interval for prediction Using the data of Table 9-13 of Example 7, find a 95% confidence interval for the amount of copper sulfate that will dissolve in 100 grams of water at 45C. Solution: First, we need to find for x = 45C . We use the equation of the least-squares line that we found in Example 7.  10.14 + 0.51x from Example 7  10.14 + 0.51(45) use 45 in place of x  33

Example 8 – Solution A 95% confidence interval for y is then cont’d A 95% confidence interval for y is then – E < y < + E 33 – E < y < 33 + E Where E From Example 7, we have n = 7, x = 280, x2 = 14,000, x = 40, and Se  2.35. Using n – 2 = 7 – 2 = 5 degrees of freedom, we find from Table 6 of Appendix II that t0.95 = 2.571.

Example 8 – Solution E  (2.571)(2.35)  (2.571)(2.35)  6.5 cont’d E  (2.571)(2.35)  (2.571)(2.35)  6.5 A 95% confidence interval for y is 33 – 6.5  y  33 + 6.5 26.5  y  39.5

Example 8 – Solution cont’d This means we are 95% sure that the interval between 26.5 grams and 39.5 grams is one that contains the predicted amount of copper sulfate that will dissolve in 100 grams of water at 45C. The interval is fairly wide but would decrease with more sample data.

Inferences about the Slope 

Inferences about the Slope  Recall that = a + bx is the sample-based least-squares line and that y =  + x is the population-based least-squares line computed (in theory) from the population of all (x, y) data pairs. In many real-world applications, the slope  is very important because  measures the rate at which y changes per unit change in x. Our next topic is to develop statistical tests and confidence intervals for .

Inferences about the Slope  Procedure:

Inferences about the Slope  cont’d

Example 8 – Testing  and finding a confidence interval for  Plate tectonics and the spread of the ocean floor are very important topics in modern studies of earthquakes and earth science in general. A random sample of islands in the Indian Ocean gave the following information. x = age of volcanic island in the Indian Ocean (units in 106 years) y = distance of the island from the center of the midoceanic ridge (units in 100 kilometers)

Example 8 – Testing  and finding a confidence interval for  cont’d (a) Starting from raw data values (x, y), the first step is simple but tedious. In short, you may verify (if you wish) that x = 415, y = 128, x2 = 30,203, y2 = 2558.5, xy = 8133, x = 51.875, and y = 16 (b) The next step is to compute b, a, and Se. Using a calculator, statistical software, or the formulas, we get b  0.1721 and a  7.072

Example 8 – Testing  and finding a confidence interval for  cont’d Since n = 8, we get

Example 9 – Testing  and finding a confidence interval for  cont’d Use an  = 5% level of significance to test the claim that  is positive. Solution: = 0.05; H0:  = 0; H1:  > 0. The sample test statistic is With d.f. = n – 2 = 8 – 2 = 6.

Example 9 – Testing  and finding a confidence interval for  cont’d We use the Student’s t distribution (Table 6 of Appendix II) to find an interval containing the P-value. The test is a one-tailed test. Technology gives P-value  0.0244. Since the interval containing the P-value is less than  = 0.05,we reject H0 and conclude that, at the 5% level of significance, the slope is positive. Excerpt from Table 6, Appendix II Table 9.14

Example 9 – Testing  and finding a confidence interval for  cont’d Find a 75% confidence interval for . Solution: For c = 0.75 and d.f. = n – 2 = 8 – 2 = 6, the critical value tc = 1.273. The margin of error E for the confidence interval is

Example 9 – Testing  and finding a confidence interval for  cont’d Using b  0.17, a 75% confidence interval for  is b – E <  < b + E 0.17 – 0.09 <  < 0.17 + 0.09 0.08 <  < 0.26 (e) Interpretation What does the confidence interval mean? Recall the units involved (x in 106 years and y in 100 kilometers). It appears that, in this part of the world, we can be 75% confident that we have an interval showing that the ocean floor is moving at a rate of between 8 mm and 26 mm per year.

Computation Hints for Sample Test Statistic t Used in Testing  and Testing 

Using this relationship and some algebra, it can be shown that Computation Hints for Sample Test Statistic t Used in Testing  and Testing  In Section 9.2 we saw that the values for the sample correlation coefficient r and slope b of the least-squares line are related by the formula b = r Using this relationship and some algebra, it can be shown that

Computation Hints for Sample Test Statistic t Used in Testing  and Testing  Consequently, when doing calculations or using results from technology, we can use the following strategies. (a) Calculations “by hand”: Find the sample test statistic t corresponding to r. The sample test statistic t corresponding to b is the same. (b) Using computer results: Most computer-based statistical packages provide the sample test statistic t corresponding to b. The sample test statistic t corresponding to r is the same.