Correlation and Linear Regression


Correlation and Linear Regression Chapter 13

Learning Objectives
LO13-1 Explain the purpose of correlation analysis.
LO13-2 Calculate a correlation coefficient to test and interpret the relationship between two variables.
LO13-3 Apply regression analysis to estimate the linear relationship between two variables.
LO13-4 Evaluate the significance of the slope of the regression equation.
LO13-5 Evaluate a regression equation’s ability to predict using the standard error of estimate and the coefficient of determination.
LO13-6 Calculate and interpret confidence and prediction intervals.
LO13-7 Use a log function to transform a nonlinear relationship.

LO13-1 Explain the purpose of correlation analysis.
Correlation Analysis – Measuring the Relationship Between Two Variables
Correlation analysis examines the relationship between two quantitative variables. The basic question of correlation analysis: do the data indicate that there is a relationship between the two variables? For the Applewood Auto sales data, the data are displayed in a scatter diagram. Are profit per vehicle and age correlated?

LO13-1 Correlation Analysis – Measuring the Relationship Between Two Variables
The coefficient of correlation (r) is a measure of the strength of the relationship between two variables. The sample correlation coefficient is identified by the lowercase letter r. It shows the direction and strength of the linear relationship between two interval- or ratio-scale variables. It ranges from −1 to +1, inclusive. A value near 0 indicates little linear relationship between the variables; a value near +1 indicates a direct, or positive, linear relationship; a value near −1 indicates an inverse, or negative, linear relationship.


LO13-2 Calculate a correlation coefficient to test and interpret the relationship between two variables.
Correlation Analysis – Measuring the Relationship Between Two Variables
Computing the correlation coefficient:
r = Σ(x − x̄)(y − ȳ) / [(n − 1)·sx·sy]
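As a rough sketch, this formula can be expressed in Python; the data below are invented for illustration and are not the textbook sample:

```python
import statistics

def correlation(x, y):
    # r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * sx * sy)
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

# Illustrative (made-up) data: monthly sales calls vs. copiers sold
calls   = [40, 60, 80, 100, 120, 140]
copiers = [20, 25, 31, 34, 45, 48]
print(round(correlation(calls, copiers), 3))  # ≈ 0.988, a strong positive relationship
```

A perfectly linear data set returns exactly ±1, which is a quick sanity check on the implementation.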

LO13-2 Correlation Analysis – Example
The sales manager of Copier Sales of America has a large sales force throughout the United States and Canada and wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. The manager selects a random sample of 15 representatives and determines the number of sales calls each representative made last month and the number of copiers sold. Determine whether the number of sales calls and the number of copiers sold are correlated.

LO13-2 Correlation Analysis – Example
Step 1: State the null and alternate hypotheses.
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Step 2: Select a level of significance. We select the .05 level of significance.
Step 3: Identify the test statistic. To test a hypothesis about a correlation, we use the t statistic with n − 2 degrees of freedom.

LO13-2 Correlation Analysis – Example
Step 4: Formulate a decision rule. Reject H0 if:
t > tα/2,n−2 or t < −tα/2,n−2
t > t0.025,13 or t < −t0.025,13
t > 2.160 or t < −2.160

LO13-2 Correlation Coefficient – Example
Step 5: Take a sample, calculate the statistics, arrive at a decision.
From the sample, r = .865. The test statistic is
t = r√(n − 2) / √(1 − r²) = .865√(15 − 2) / √(1 − .865²) = 6.216

LO13-2 Correlation Coefficient – Example
Step 5 (continued): Take a sample, calculate the statistics, arrive at a decision. The t-test statistic, 6.216, is greater than 2.160. Therefore, reject the null hypothesis that the correlation coefficient is zero.
Step 6: Interpret the result. The data indicate that there is a significant correlation between the number of sales calls and copiers sold. We can also observe that the correlation coefficient is .865, which indicates a strong, positive relationship. In other words, more sales calls are strongly related to more copier sales. Please note that this statistical analysis does not provide any evidence of a causal relationship; another type of study is needed to test that hypothesis.
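The test statistic can be reproduced from the summary values reported in this example (r = .865, n = 15); a minimal sketch:

```python
import math

def t_stat(r, n):
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_stat(0.865, 15)
print(round(t, 3))  # ≈ 6.216, which exceeds the critical value 2.160
```

Because 6.216 falls in the rejection region, the null hypothesis of zero population correlation is rejected, matching the slide's conclusion.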

LO13-3 Apply regression analysis to estimate the linear relationship between two variables. Correlation Analysis tests for the strength and direction of the relationship between two quantitative variables. Regression Analysis evaluates and “measures” the relationship between two quantitative variables with a linear equation. This equation has the same elements as any equation of a line, that is, a slope and an intercept. The relationship between X and Y is defined by the values of the intercept, a, and the slope, b. In regression analysis, we use data (observed values of X and Y) to estimate the values of a and b. Y = a + b X

LO13-3 Regression Analysis – Examples
Assuming a linear relationship between the size of a home, measured in square feet, and the cost to heat the home in January, how does the cost vary relative to the size of the home?
In a study of automobile fuel efficiency, assuming a linear relationship between miles per gallon and the weight of a car, how does the fuel efficiency vary relative to the weight of a car?

LO13-3 Regression Analysis: Variables
Y = a + bX
Y is the dependent variable: the variable being predicted or estimated.
X is the independent variable: the variable used to estimate the dependent variable, Y. X is also called the predictor variable.
Examples of dependent and independent variables:
How does the size of a home, measured in square feet, relate to the cost to heat the home in January? We would use home size as the independent variable, X, to predict the heating cost, the dependent variable, Y. Regression equation: Heating cost = a + b(home size).
How does the weight of a car relate to the car's fuel efficiency? We would use car weight as the independent variable, X, to predict the car's fuel efficiency, the dependent variable, Y. Regression equation: Miles per gallon = a + b(car weight).

LO13-3 Regression Analysis – Example
Regression analysis estimates a and b by fitting a line to the observed data. Each line (Y = a + bX) is defined by values of a and b. The line of “best fit” is found using the least squares principle: determining the regression equation by minimizing the sum of the squared vertical distances between the actual Y values and the predicted values of Y.

LO13-3 Regression Analysis – Example
Recall the example involving Copier Sales of America. The sales manager gathered information on the number of sales calls made and the number of copiers sold for a random sample of 15 sales representatives. Use the least squares method to determine a linear equation to express the relationship between the two variables. In this example, the number of sales calls is the independent variable, X, and the number of copiers sold is the dependent variable, Y. What is the expected number of copiers sold by a representative who made 20 calls?
Number of Copiers Sold = a + b(Number of Sales Calls)

LO13-3 Regression Analysis – Example
[Excel output: descriptive statistics for the sample and the correlation coefficient.]

LO13-3 Regression Analysis – Example
Step 1: Find the slope (b) of the line: b = r(sy/sx) = 0.2608.
Step 2: Find the y-intercept (a): a = ȳ − b·x̄ = 19.9632.
Step 3: Create the regression equation.
Number of Copiers Sold = 19.9632 + 0.2608 (Number of Sales Calls)
Step 4: What is the predicted number of copiers sold if a representative makes 20 sales calls?
Number of Copiers Sold = 19.9632 + 0.2608(20) = 25.1792
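The least squares computations can be sketched directly; the data here are illustrative, not the Copier Sales sample:

```python
def least_squares(x, y):
    # b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  a = y_bar - b * x_bar
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Illustrative (made-up) data
x = [10, 20, 30, 40, 50]
y = [22, 25, 27, 31, 33]
a, b = least_squares(x, y)
print(round(a, 2), round(b, 2))  # → 19.2 0.28
print(a + b * 20)                # predicted y at x = 20
```

The same four steps apply: compute b, compute a, write the equation, then substitute a value of X to obtain a prediction.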

LO13-3 Regression Analysis ANOVA (Excel) – Example
Number of Copiers Sold = 19.9800 + 0.2606 (Number of Sales Calls)
*Note: the differences between these Excel values of a and b and the hand-computed values are due to rounding.

LO13-4 Evaluate the significance of the slope of the regression equation.
Regression Analysis: Testing the Significance of the Slope – Example
Step 1: State the null and alternate hypotheses.
H0: β = 0 (the slope of the regression equation is 0)
H1: β ≠ 0 (the slope of the regression equation is not 0)
Step 2: Select a level of significance. We select the .05 level of significance.
Step 3: Identify the test statistic. To test a hypothesis about the slope of a regression equation, we use the t statistic with n − 2 degrees of freedom.

Regression Analysis: Testing the Significance of the Slope – Example
Step 4: Formulate a decision rule. Reject H0 if:
t > tα/2,n−2 or t < −tα/2,n−2
t > t0.025,13 or t < −t0.025,13
t > 2.160 or t < −2.160

Regression Analysis: Testing the Significance of the Slope – Example
Step 5: Take a sample, run the ANOVA (Excel), arrive at a decision.
Decision: Reject the null hypothesis that the slope of the regression equation is equal to zero.

Regression Analysis: Testing the Significance of the Slope – Example
Step 6: Interpret the result. For the regression equation that predicts the number of copiers sold based on the number of sales calls, the data indicate that the slope (0.2606) is not equal to zero. Therefore, the slope can be interpreted and used to relate the dependent variable (number of copiers sold) to the independent variable (number of sales calls). In fact, the value of the slope indicates that for an increase of 1 sales call, the number of copiers sold is predicted to increase by 0.2606; if a salesperson increases the number of sales calls by 10, the number of copiers sold is predicted to increase by 2.606. As in correlation analysis, please note that this statistical analysis does not provide any evidence of a causal relationship; another type of study is needed to test that hypothesis.
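The slope test behind this slide can be sketched with the usual formulas t = b/s_b, where s_b = s_y·x / √Σ(x − x̄)²; the data and fitted values below are illustrative, not the textbook's:

```python
import math

def slope_t_stat(x, y, a, b):
    # t = b / s_b, where s_b = s_yx / sqrt(sum((x - x_bar)^2)); df = n - 2
    n = len(x)
    mx = sum(x) / n
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))                      # standard error of estimate
    s_b = s_yx / math.sqrt(sum((xi - mx) ** 2 for xi in x))
    return b / s_b

# Illustrative data; a = 0.05, b = 0.99 are the least squares fit for it
x = [1, 2, 3, 4, 5]
y = [1.1, 2.0, 2.9, 4.1, 5.0]
print(round(slope_t_stat(x, y, 0.05, 0.99), 1))  # ≈ 33.0
```

A t value this large would fall far into the rejection region for any common significance level, so the slope would be judged significantly different from zero.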

LO13-5 Evaluate a regression equation’s ability to predict using the standard error of estimate and the coefficient of determination.
Regression Analysis: The Standard Error of Estimate
The standard error of estimate measures the scatter, or dispersion, of the observed values around the regression line for a given value of X. The standard error of estimate is important in the calculation of confidence and prediction intervals. Formula used to compute the standard error:
s_y·x = √( Σ(y − ŷ)² / (n − 2) )
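A minimal sketch of this formula, assuming the fitted values a and b are already available:

```python
import math

def std_error_of_estimate(x, y, a, b):
    # s_yx = sqrt( sum((y - y_hat)^2) / (n - 2) ), where y_hat = a + b * x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# A perfect fit has zero scatter about the line:
print(std_error_of_estimate([1, 2, 3, 4], [3, 5, 7, 9], 1, 2))  # → 0.0
```

The larger s_y·x is, the more the observed Y values scatter about the regression line, and the wider the resulting confidence and prediction intervals.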

LO13-5 Regression Analysis ANOVA: The Standard Error of Estimate – Example
Recall the example involving Copier Sales of America. The sales manager determined the least squares regression equation. Determine the standard error of estimate as a measure of how well the values fit the regression line.

LO13-5 Regression Analysis: Coefficient of Determination
The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained, or accounted for, by the variation in the independent variable (X). It is the square of the coefficient of correlation. It ranges from 0 to 1. It does not provide any information on the direction of the relationship between the variables.

LO13-5 Regression Analysis: Coefficient of Determination – Example
The coefficient of determination, r², is 0.748. It can be computed as the square of the correlation coefficient: (0.865)². The coefficient of determination is expressed as a proportion or percent; we say that 74.8 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.

LO13-5 Regression Analysis ANOVA: Coefficient of Determination – Example
The coefficient of determination can also be computed from its definition: divide the Regression Sum of Squares (the variation in the dependent variable explained by the regression equation) by the Total Sum of Squares (the total variation in the dependent variable): r² = SSR/SST.
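That definition translates directly into code; the observed and fitted values below are invented for illustration, not the ANOVA output from the example:

```python
def r_squared(y, y_hat):
    # r^2 = SSR / SST, computed here as the equivalent 1 - SSE / SST
    my = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst

y     = [10, 20, 30, 40]   # observed values (illustrative)
y_hat = [12, 18, 32, 38]   # fitted values (illustrative)
print(round(r_squared(y, y_hat), 3))  # → 0.968
```

Perfect predictions give r² = 1, and predicting the mean for every observation gives r² = 0, matching the 0-to-1 range stated above.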

LO13-6 Calculate and interpret confidence and prediction intervals.
Regression Analysis: Computing Interval Estimates for Y
A regression equation is used to predict or estimate the population value of the dependent variable, Y, for a given X. In general, estimates of population parameters are subject to sampling error. Recall that confidence intervals account for sampling error by providing an interval estimate of a population parameter. In regression analysis, interval estimates provide a complete picture of the point estimate of Y for a given X by accounting for sampling error. There are two types of intervals:
A confidence interval reports the interval estimate for the mean value of Y for a given X.
A prediction interval reports the interval estimate for an individual value of Y for a particular value of X.

LO13-6 Regression Analysis: Computing Interval Estimates for Y
Assumptions underlying linear regression:
For each value of X, the Y values are normally distributed.
The means of these normal distributions of Y values all lie on the regression line.
The standard deviations of these normal distributions are equal.
The Y values are statistically independent: in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.

LO13-6 Regression Analysis: Computing Interval Estimates for Y – Example
We return to the Copier Sales of America illustration. Determine a 95 percent confidence interval for the mean number of copiers sold by all sales representatives who make 50 sales calls.
*Note: the values of a and b differ from the Excel values due to rounding.
Thus, the 95% confidence interval for all sales representatives who make 50 calls is from 27.3942 up to 38.6122. To interpret, let's round the values: for all sales representatives who make 50 calls, the predicted mean number of copiers sold is 33, and the mean sales will range from 27 to 39 copiers.

LO13-6 Regression Analysis: Computing Interval Estimates for Y – Example
Comments on the calculation: The t statistic is 2.160, based on a two-tailed test with n − 2 = 15 − 2 = 13 degrees of freedom. The only new quantity in the calculation is Σ(x − x̄)². Note that the width of the interval, the margin of error when predicting the dependent variable, is related to the standard error of estimate.
*Note: the values of a and b differ from the Excel values due to rounding.

LO13-6 Regression Analysis: Computing Interval Estimates for Y – Example
We return to the Copier Sales of America illustration. Determine a 95 percent prediction interval for an individual sales representative, such as Sheila Baker, who makes 50 sales calls.
*Note: the values of a and b differ from the Excel values due to rounding.
Thus, the prediction interval for the number of copiers sold by an individual salesperson, such as Sheila Baker, who makes 50 sales calls is from 17.442 up to 48.5644 copiers. Rounding these results, the predicted number of copiers sold will be between 17 and 49. This interval is quite large, much larger than the confidence interval for all sales representatives who made 50 calls. It is logical, however, that there should be more variation in the sales estimate for an individual than for the mean of a group.

LO13-6 Regression Analysis: Computing Interval Estimates for Y – Example
Comments on the calculation: The t statistic is 2.160, based on a two-tailed test with n − 2 = 15 − 2 = 13 degrees of freedom. Note that the width of the interval, the margin of error when predicting the dependent variable, is related to the standard error of estimate. Also note that the prediction interval is wider because 1 is added to the sum under the square root sign.
*Note: the values of a and b differ from the Excel values due to rounding.
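Both interval formulas can be sketched in a single function; the data, fitted coefficients, and standard error below are illustrative only, not the Copier Sales values:

```python
import math

def interval(x_new, x, y, a, b, s_yx, t_crit, prediction=False):
    # confidence interval:  y_hat ± t * s_yx * sqrt(     1/n + (x_new - x_bar)^2 / Sxx)
    # prediction interval:  y_hat ± t * s_yx * sqrt(1 +  1/n + (x_new - x_bar)^2 / Sxx)
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    half = t_crit * s_yx * math.sqrt((1.0 if prediction else 0.0)
                                     + 1 / n + (x_new - mx) ** 2 / sxx)
    y_hat = a + b * x_new
    return y_hat - half, y_hat + half

# Illustrative data; a = 19.2, b = 0.28 are its least squares fit,
# and s_yx = 1.5 is an assumed standard error of estimate
x = [10, 20, 30, 40, 50]
y = [22, 25, 27, 31, 33]
ci = interval(30, x, y, 19.2, 0.28, 1.5, 2.160)
pi = interval(30, x, y, 19.2, 0.28, 1.5, 2.160, prediction=True)
print(ci, pi)
```

Because of the extra 1 under the square root, the prediction interval is always wider than the confidence interval at the same X, mirroring the slide's comparison of the two.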

LO13-6 Regression Analysis: Computing Interval Estimates for Y – Minitab Illustration
[Minitab output showing the confidence intervals and prediction intervals for the example.]

LO13-7 Use a log function to transform a nonlinear relationship.
Regression Analysis: Transforming Nonlinear Relationships
One of the assumptions of regression analysis is that the relationship between the dependent and independent variables is linear. Sometimes two variables have a nonlinear relationship. When this occurs, the data can be transformed to create a linear relationship, and regression analysis is applied to the transformed variables.

LO13-7 Regression Analysis: Transforming Nonlinear Relationships
In this case, the dependent variable, sales, is transformed to log(sales). The graph shows that the relationship between log(sales) and price is linear. Now regression analysis can be used to create the regression equation between log(sales) and price.
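A sketch of the transform-then-fit approach, using invented, roughly exponential data rather than the data from the slide:

```python
import math

# Hypothetical nonlinear data: sales fall off roughly exponentially with price
price = [10, 12, 14, 16, 18, 20]
sales = [905, 740, 610, 495, 405, 330]

# Step 1: transform the dependent variable to log10(sales)
log_sales = [math.log10(s) for s in sales]

# Step 2: fit least squares on the transformed values (same formulas as before)
n = len(price)
mx, my = sum(price) / n, sum(log_sales) / n
b = sum((p - mx) * (l - my) for p, l in zip(price, log_sales)) / \
    sum((p - mx) ** 2 for p in price)
a = my - b * mx

# Step 3: back-transform a prediction, e.g. estimated sales at price 15
print(round(10 ** (a + b * 15)))  # ≈ 548, between the neighboring observations
```

The fitted slope is negative on the log scale, and back-transforming with 10^(a + bX) returns predictions in the original units of sales.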