Introduction to Linear Regression and Correlation Analysis


Chapter 11 Introduction to Linear Regression and Correlation Analysis

Chapter 11 - Chapter Outcomes After studying the material in this chapter, you should be able to: Calculate and interpret the simple correlation between two variables. Determine whether the correlation is significant. Calculate and interpret the simple linear regression coefficients for a set of data. Understand the basic assumptions behind regression analysis. Determine whether a regression model is significant.

Chapter 11 - Chapter Outcomes (continued) After studying the material in this chapter, you should be able to: Calculate and interpret confidence intervals for the regression coefficients. Recognize regression analysis applications for purposes of prediction and description. Recognize some potential problems if regression analysis is used incorrectly. Recognize several nonlinear relationships between two variables.

Scatter Diagrams A scatter plot, also referred to as a scatter diagram, is a graph that may be used to represent the relationship between two variables.

Dependent and Independent Variables A dependent variable is the variable to be predicted or explained in a regression model. This variable is assumed to be functionally related to the independent variable.

Dependent and Independent Variables An independent variable is the variable related to the dependent variable in a regression equation. The independent variable is used in a regression model to estimate the value of the dependent variable.

Two Variable Relationships (Figure 11-1) Scatter plots of Y against X showing five patterns: (a) Linear, (b) Linear, (c) Curvilinear, (d) Curvilinear, (e) No Relationship

Correlation The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables. The correlation ranges from +1.0 to -1.0. A correlation of ±1.0 indicates a perfect linear relationship, whereas a correlation of 0 indicates no linear relationship.

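SAMPLE CORRELATION COEFFICIENT $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$ where: r = Sample correlation coefficient; n = Sample size; x = Value of the independent variable; y = Value of the dependent variable

SAMPLE CORRELATION COEFFICIENT or the algebraic equivalent: $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$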
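
As an illustration of the two forms of the sample correlation coefficient, the sketch below computes r both from deviations about the means and from the algebraic equivalent. The five data points are hypothetical and are not the values from Table 11-1; the function names are made up for this example.

```python
import math

# Hypothetical data (assumption): not the chapter's Table 11-1 values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def corr_deviation(x, y):
    """Sample correlation from deviations about the means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def corr_algebraic(x, y):
    """Sample correlation from raw sums (the algebraic equivalent)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r_dev = corr_deviation(x, y)
r_alg = corr_algebraic(x, y)
print(round(r_dev, 4), round(r_alg, 4))
```

Both functions should return the same value up to floating-point rounding, since the second formula is only an algebraic rearrangement of the first.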

Correlation (Example 11-1) Table 11-1: data for the example. Figure 11-5: Excel correlation output for the correlation between Years and Sales.

TEST STATISTIC FOR CORRELATION $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$ with $n - 2$ degrees of freedom, where: t = Number of standard deviations r is from 0; r = Simple correlation coefficient; n = Sample size

Correlation Significance Test (Example 11-1) Rejection regions: $\alpha/2 = 0.025$ in each tail. Since t = 4.752 > 2.228 (the critical t for n - 2 = 10 degrees of freedom), reject H0; there is a significant linear relationship.
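
The test statistic can be checked numerically. The sketch below assumes that r is the positive square root of the R-squared value of 0.6931 reported for the Midwest example later in the chapter, and that the sample size is n = 12; neither value is stated on this slide, so both are assumptions. The critical value is taken from a t table for 10 degrees of freedom.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 d.f."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Assumptions: r = sqrt(0.6931) and n = 12 are inferred, not given on the slide.
r = math.sqrt(0.6931)
n = 12

t = correlation_t(r, n)

# Two-tailed critical value from a t table: alpha/2 = 0.025, d.f. = 10.
t_critical = 2.228
reject_h0 = abs(t) > t_critical

print(round(t, 3), reject_h0)
```

Under these assumptions the computed t agrees with the 4.752 quoted in the example to within rounding.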

Correlation Spurious correlation occurs when there is a correlation between two otherwise unrelated variables.

Simple Linear Regression Analysis Simple linear regression analysis analyzes the linear relationship that exists between a dependent variable and a single independent variable.

Simple Linear Regression Analysis SIMPLE LINEAR REGRESSION MODEL (POPULATION MODEL) $y = \beta_0 + \beta_1 x + \varepsilon$ where: y = Value of the dependent variable; x = Value of the independent variable; $\beta_0$ = Population's y-intercept; $\beta_1$ = Slope of the population regression line; $\varepsilon$ = Error term, or residual

Simple Linear Regression Analysis The simple linear regression model has four assumptions: Individual values of the error terms, $\varepsilon_i$, are statistically independent of one another. The distribution of all possible values of $\varepsilon$ is normal. The distributions of possible $\varepsilon_i$ values have equal variances for all values of x. The means of the dependent variable, y, for all specified values of the independent variable can be connected by a straight line called the population regression model.

Simple Linear Regression Analysis REGRESSION COEFFICIENTS In the simple regression model, there are two coefficients: the intercept and the slope.

Simple Linear Regression Analysis The regression slope coefficient gives the average change in the dependent variable for a unit increase in the independent variable. The slope coefficient may be positive or negative, depending on the relationship between the two variables.

Simple Linear Regression Analysis The least squares criterion is used for determining a regression line that minimizes the sum of squared residuals.

Simple Linear Regression Analysis A residual is the difference between the actual value of the dependent variable and the value predicted by the regression model.

Simple Linear Regression Analysis Figure: Sales in Thousands (vertical axis) against Years with Company (horizontal axis). At x = 4 the actual value is 312 and the regression line predicts 390, so Residual = 312 - 390 = -78.

Simple Linear Regression Analysis ESTIMATED REGRESSION MODEL (SAMPLE MODEL) $\hat{y} = b_0 + b_1 x$ where: $\hat{y}$ = Estimated, or predicted, y value; $b_0$ = Unbiased estimate of the regression intercept; $b_1$ = Unbiased estimate of the regression slope; x = Value of the independent variable

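Simple Linear Regression Analysis LEAST SQUARES EQUATIONS $b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ algebraic equivalent: $b_1 = \dfrac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$ and $b_0 = \bar{y} - b_1 \bar{x}$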

Simple Linear Regression Analysis SUM OF SQUARED ERRORS $SSE = \sum (y - \hat{y})^2 = \sum y^2 - b_0 \sum y - b_1 \sum xy$
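
A minimal sketch of the least squares calculations follows. It uses hypothetical data rather than the Midwest data, computes the slope by both the deviation form and the raw-sum form, and then computes the sum of squared errors directly from the residuals and from the shortcut expression in raw sums.

```python
# Hypothetical data (assumption): not the Midwest Distribution values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Slope from deviations about the means.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Slope from the algebraic equivalent in raw sums.
b1_alt = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

# Intercept: the fitted line passes through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

# SSE computed from the residuals and from the shortcut formula.
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy

print(round(b0, 4), round(b1, 4), round(sse_direct, 4))
```

The shortcut form of SSE holds only for the least squares coefficients, so it is a convenient cross-check on hand calculations.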

Simple Linear Regression Analysis (Midwest Example) (Table 11-3)

Simple Linear Regression Analysis (Table 11-3) The least squares regression line is:

Simple Linear Regression Analysis (Figure 11-11) Excel Midwest Distribution Results

Least Squares Regression Properties The sum of the residuals from the least squares regression line is 0. The sum of the squared residuals is a minimum. The simple regression line always passes through the mean of the y variable and the mean of the x variable, that is, through the point $(\bar{x}, \bar{y})$. The least squares coefficients are unbiased estimates of $\beta_0$ and $\beta_1$.

Simple Linear Regression Analysis SUM OF RESIDUALS $\sum (y - \hat{y}) = 0$ SUM OF SQUARED RESIDUALS $\sum (y - \hat{y})^2 = \text{minimum}$
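
The first three properties can be checked numerically on any data set. The sketch below, again on hypothetical data, confirms that the residuals sum to zero, that the fitted line passes through the point of means, and that nudging either coefficient away from its least squares value increases the sum of squared residuals. The perturbation sizes are arbitrary choices for illustration.

```python
# Hypothetical data (assumption), used only to illustrate the properties.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar


def sse(intercept, slope):
    """Sum of squared residuals for an arbitrary candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Property 1: the residuals sum to zero (up to rounding).
residual_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))

# Property 3: the line passes through (x_bar, y_bar).
passes_through_means = abs((b0 + b1 * x_bar) - y_bar) < 1e-12

# Property 2: any small change to b0 or b1 increases the SSE.
sse_min = sse(b0, b1)
sse_perturbed = [
    sse(b0 + d0, b1 + d1)
    for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
]

print(abs(residual_sum) < 1e-9, passes_through_means, min(sse_perturbed) > sse_min)
```

Unbiasedness, the fourth property, is a statement about repeated sampling and cannot be verified from a single sample in this way.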

Simple Linear Regression Analysis TOTAL SUM OF SQUARES $TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$ where: TSS = Total sum of squares; n = Sample size; y = Values of the dependent variable; $\bar{y}$ = Average value of the dependent variable

Simple Linear Regression Analysis SUM OF SQUARES ERROR (RESIDUALS) $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ where: SSE = Sum of squares error; n = Sample size; y = Values of the dependent variable; $\hat{y}$ = Estimated value for the average of y for the given x value

Simple Linear Regression Analysis SUM OF SQUARES REGRESSION $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ where: SSR = Sum of squares regression; $\bar{y}$ = Average value of the dependent variable; y = Values of the dependent variable; $\hat{y}$ = Estimated value for the average of y for the given x value

Simple Linear Regression Analysis SUMS OF SQUARES $TSS = SSE + SSR$

Simple Linear Regression Analysis The coefficient of determination is the portion of the total variation in the dependent variable that is explained by its relationship with the independent variable. The coefficient of determination is also called R-squared and is denoted as $R^2$.

Simple Linear Regression Analysis COEFFICIENT OF DETERMINATION ($R^2$) $R^2 = \dfrac{SSR}{TSS}$

Simple Linear Regression Analysis (Midwest Example) COEFFICIENT OF DETERMINATION ($R^2$) 69.31% of the variation in the sales data for this sample can be explained by the linear relationship between sales and years of experience.

Simple Linear Regression Analysis COEFFICIENT OF DETERMINATION SINGLE INDEPENDENT VARIABLE CASE $R^2 = r^2$ where: $R^2$ = Coefficient of determination; r = Simple correlation coefficient
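
The sums of squares and the two routes to the coefficient of determination can be sketched as follows. The data are hypothetical, so the resulting value is not the 69.31% of the Midwest example; the point is only that the total variation splits into error and regression parts, and that SSR/TSS equals the squared correlation when there is a single independent variable.

```python
import math

# Hypothetical data (assumption): not the Midwest Distribution values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained

r_squared = ssr / tss
r = sxy / math.sqrt(sxx * syy)

print(round(tss, 4), round(sse, 4), round(ssr, 4), round(r_squared, 4))
```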

Simple Linear Regression Analysis STANDARD DEVIATION OF THE REGRESSION SLOPE COEFFICIENT (POPULATION) $\sigma_{b_1} = \dfrac{\sigma_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}$ where: $\sigma_{b_1}$ = Standard deviation of the regression slope (called the standard error of the slope); $\sigma_\varepsilon$ = Population standard error of the estimate

Simple Linear Regression Analysis ESTIMATOR FOR THE STANDARD ERROR OF THE ESTIMATE $s_\varepsilon = \sqrt{\dfrac{SSE}{n - k - 1}}$ where: SSE = Sum of squares error; n = Sample size; k = Number of independent variables in the model

Simple Linear Regression Analysis ESTIMATOR FOR THE STANDARD DEVIATION OF THE REGRESSION SLOPE $s_{b_1} = \dfrac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}$ where: $s_{b_1}$ = Estimate of the standard error of the least squares slope; $s_\varepsilon$ = Sample standard error of the estimate

Simple Linear Regression Analysis TEST STATISTIC FOR TEST OF SIGNIFICANCE OF THE REGRESSION SLOPE $t = \dfrac{b_1 - \beta_1}{s_{b_1}}$ with $n - 2$ degrees of freedom, where: $b_1$ = Sample regression slope coefficient; $\beta_1$ = Hypothesized slope; $s_{b_1}$ = Estimator of the standard error of the slope

Significance Test of Regression Slope (Example 11-5) Rejection regions: $\alpha/2 = 0.025$ in each tail. Since t = 4.753 > 2.228 (the critical t for n - 2 = 10 degrees of freedom), reject H0 and conclude that the true slope is not zero.
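
The slope test can be sketched end to end on a small hypothetical data set. The code estimates the standard error of the estimate with k = 1 independent variable, derives the standard error of the slope, and forms the t statistic for a hypothesized slope of zero. The critical value is a t-table entry for 3 degrees of freedom, which applies to this five-point example only.

```python
import math

# Hypothetical data (assumption): not the data behind Example 11-5.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 1  # k = number of independent variables

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, then standard error of the slope.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - k - 1))
s_b1 = s_e / math.sqrt(sxx)

# t statistic for H0: beta_1 = 0.
hypothesized_slope = 0.0
t = (b1 - hypothesized_slope) / s_b1

# Two-tailed critical value from a t table: alpha/2 = 0.025, d.f. = n - 2 = 3.
t_critical = 3.182
reject_h0 = abs(t) > t_critical

print(round(s_b1, 4), round(t, 4), reject_h0)
```

With only five observations the slope here is not significant at the 0.05 level, which shows how strongly the conclusion depends on sample size as well as on the size of the slope.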

Simple Linear Regression Analysis MEAN SQUARE REGRESSION $MSR = \dfrac{SSR}{k}$ where: SSR = Sum of squares regression; k = Number of independent variables in the model

Simple Linear Regression Analysis MEAN SQUARE ERROR $MSE = \dfrac{SSE}{n - k - 1}$ where: SSE = Sum of squares error; n = Sample size; k = Number of independent variables in the model

Significance Test (Example 11-6) Rejection region: $\alpha = 0.05$ in the upper tail. Since F = MSR/MSE = 22.59 > 4.96, reject H0 and conclude that the regression model explains a significant amount of the variation in the dependent variable.
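
A short sketch of the F test on hypothetical data follows. It forms MSR and MSE, takes their ratio, and also shows that in simple regression the F statistic equals the square of the slope's t statistic. The critical value is an F-table entry for 1 and 3 degrees of freedom, which applies to this five-point example and not to Example 11-6.

```python
import math

# Hypothetical data (assumption): not the data behind Example 11-6.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 1

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / k              # mean square regression
mse = sse / (n - k - 1)    # mean square error
f_stat = msr / mse

# Slope t statistic for H0: beta_1 = 0, for comparison with F.
t = b1 / (math.sqrt(mse) / math.sqrt(sxx))

# Upper-tail critical value from an F table: alpha = 0.05, d.f. = (1, 3).
f_critical = 10.13
reject_h0 = f_stat > f_critical

print(round(f_stat, 4), round(t ** 2, 4), reject_h0)
```

The same relationship appears in the chapter's own numbers: 4.753 squared is approximately 22.59.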

Simple Regression Steps Develop a scatter plot of y and x. You are looking for a linear relationship between the two variables. Calculate the least squares regression line for the sample data. Calculate the correlation coefficient and the simple coefficient of determination, $R^2$. Conduct one of the significance tests.

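Simple Linear Regression Analysis CONFIDENCE INTERVAL ESTIMATE FOR THE REGRESSION SLOPE $b_1 \pm t_{\alpha/2}\, s_{b_1}$ or equivalently: $b_1 \pm t_{\alpha/2}\, \dfrac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}$ with $n - 2$ degrees of freedom, where: $s_{b_1}$ = Standard error of the regression slope coefficient; $s_\varepsilon$ = Standard error of the estimate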

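Simple Linear Regression Analysis CONFIDENCE INTERVAL FOR $E(y) \mid x_p$ $\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}$ where: $\hat{y}$ = Point estimate of the dependent variable; t = Critical value with n - 2 d.f.; $s_\varepsilon$ = Standard error of the estimate; n = Sample size; $x_p$ = Specific value of the independent variable; $\bar{x}$ = Mean of independent variable observations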

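Simple Linear Regression Analysis PREDICTION INTERVAL FOR $y \mid x_p$ $\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}$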
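
The difference between the two intervals is easiest to see side by side. The sketch below uses hypothetical data and an arbitrarily chosen x_p = 4 to compute a 95% confidence interval for the mean of y and a 95% prediction interval for a single y value. The critical t is a table value for 3 degrees of freedom and applies to this example only.

```python
import math

# Hypothetical data and x_p (assumptions), for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_p = 4

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate

# Critical value from a t table: alpha/2 = 0.025, d.f. = n - 2 = 3.
t_critical = 3.182

y_hat = b0 + b1 * x_p
distance_term = 1 / n + (x_p - x_bar) ** 2 / sxx

# Interval for the mean of y at x_p, and for one new observation at x_p.
ci_half = t_critical * s_e * math.sqrt(distance_term)
pi_half = t_critical * s_e * math.sqrt(1 + distance_term)

ci = (y_hat - ci_half, y_hat + ci_half)
pi = (y_hat - pi_half, y_hat + pi_half)

print(round(y_hat, 4), [round(v, 4) for v in ci], [round(v, 4) for v in pi])
```

The prediction interval is always the wider of the two because it adds the variability of an individual observation to the uncertainty in the estimated mean, and both intervals widen as x_p moves away from the mean of x.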

Residual Analysis Before using a regression model for description or prediction, you should do a check to see if the assumptions concerning the normal distribution and constant variance of the error terms have been satisfied. One way to do this is through the use of residual plots.

Key Terms Coefficient of Determination; Correlation Coefficient; Dependent Variable; Independent Variable; Least Squares Criterion; Regression Coefficients; Regression Slope Coefficient; Residual; Scatter Plot; Simple Linear Regression Analysis; Spurious Correlation