Presentation on theme: "Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after."— Presentation transcript:

1 Chapter 4 Linear Regression

2 Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after considering the relationship between advertising expenditures and sales, a marketing manager might attempt to predict sales for a given level of advertising expenditures. Sometimes a manager will rely on intuition to judge how two variables are related. However, if data can be obtained, a statistical procedure called regression analysis can be used to develop an equation showing how the variables are related.

3 Introduction Dependent variable (or response): the variable being predicted. Independent variables (or predictor variables): the variables used to predict the value of the dependent variable. Simple regression: a regression analysis involving one independent variable and one dependent variable. In statistical notation: y = dependent variable, x = independent variable.

4 Introduction Linear regression: a regression analysis in which any one-unit change in the independent variable, x, is assumed to result in the same change in the dependent variable, y. Multiple regression: a regression analysis involving two or more independent variables.

5 The Simple Linear Regression Model

6 The Simple Linear Regression Model Regression model: the equation that describes how y is related to x and an error term. Simple linear regression model: y = β0 + β1x + ε. Parameters: the characteristics of the population, β0 and β1. Error term: the random variable ε, which accounts for the variability in y that cannot be explained by the linear relationship between x and y.
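The role of the error term can be illustrated with a short simulation. The parameter values and error variance below are made up for illustration; they are not taken from the chapter:

```python
import random

# Hypothetical population parameters (illustrative only; not from the chapter)
beta0, beta1 = 1.0, 0.5
sigma = 0.25  # assumed standard deviation of the error term

random.seed(42)

def draw_y(x):
    """One realization of y = beta0 + beta1*x + epsilon, where epsilon
    is normal with mean 0 and constant variance sigma**2."""
    return beta0 + beta1 * x + random.gauss(0.0, sigma)

# For a fixed x, observed y values scatter around E(y|x) = beta0 + beta1*x
sample = [draw_y(2.0) for _ in range(5)]
```

Averaging many draws at the same x approaches the expected value E(y|x), which is exactly what the regression equation on the next slide describes.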

7 The Simple Linear Regression Model Regression equation: the equation that describes how the expected value of y, denoted E(y), is related to x. Regression equation for simple linear regression: E(y|x) = β0 + β1x, where E(y|x) = expected value of y for a given value of x, β0 = y-intercept of the regression line, and β1 = slope. The graph of the simple linear regression equation is a straight line.

8 Figure 4.1 - Possible Regression Lines in Simple Linear Regression

9 The Simple Linear Regression Model The parameter values are usually not known and must be estimated using sample data. Sample statistics (denoted b0 and b1) are computed as estimates of the population parameters β0 and β1. Estimated regression equation: the equation obtained by substituting the values of the sample statistics b0 and b1 for β0 and β1 in the regression equation.

10 The Simple Linear Regression Model

11 Figure 4.2 - The Estimation Process in Simple Linear Regression

12 Least Squares Method

13 Least Squares Method Least squares method: a procedure for using sample data to find the estimated regression equation; here, we determine the values of b0 and b1. Interpretation of b0 and b1: The slope b1 is the estimated change in the mean of the dependent variable y that is associated with a one-unit increase in the independent variable x. The y-intercept b0 is the estimated value of the dependent variable y when the independent variable x is equal to 0.
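The least squares estimates have the familiar closed forms b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄. A minimal pure-Python sketch (the tiny data set is hypothetical):

```python
def least_squares(x, y):
    """Return (b0, b1), the least squares estimates for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum of cross-deviations / sum of squared deviations of x
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data lying exactly on y = 2x: estimates recover b0 = 0, b1 = 2
b0, b1 = least_squares([1, 2, 3], [2, 4, 6])
```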

14 Table 4.1 - Miles Traveled and Travel Time (in Hours) for Ten Butler Trucking Company Driving Assignments

15 Figure 4.3 - Scatter Chart of Miles Traveled and Travel Time in Hours for Sample of Ten Butler Trucking Company Driving Assignments

16 Least Squares Method

17 Least Squares Method

18 Least Squares Method Least squares estimates of the regression parameters using Excel:

19 Least Squares Method

20 Least Squares Method Interpretation of b1: If the length of a driving assignment were 1 unit (1 mile) longer, the mean travel time for that driving assignment would be 0.0678 units (0.0678 hours, or approximately 4 minutes) longer. Interpretation of b0: If the driving distance for a driving assignment were 0 units (0 miles), the mean travel time would be 1.2739 units (1.2739 hours, or approximately 76 minutes).
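The estimates b0 = 1.2739 and b1 = 0.0678 quoted above can be reproduced from the ten observations. The miles and hours below are the Table 4.1 values as commonly reported for this example; the table image itself did not survive extraction, so treat them as an assumption:

```python
# Assumed Table 4.1 data: miles traveled (x) and travel time in hours (y)
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]

n = len(miles)
x_bar = sum(miles) / n                 # 80.0
y_bar = sum(hours) / n                 # approximately 6.7
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, hours)) \
     / sum((x - x_bar) ** 2 for x in miles)
b0 = y_bar - b1 * x_bar

# Predicted mean travel time for a 75-mile assignment
y_hat_75 = b0 + b1 * 75
```

Rounding gives the estimated regression equation ŷ = 1.2739 + 0.0678x, matching the interpretation on this slide.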

21 Least Squares Method Experimental region: the range of values of the independent variables in the data used to estimate the model; the regression model is valid only over this region. Extrapolation: prediction of the value of the dependent variable outside the experimental region; extrapolation is risky.

22 Least Squares Method

23 Table 4.2 - Predicted Travel Time in Hours and the Residuals for Ten Butler Trucking Company Driving Assignments

24 Figure 4.4 - Scatter Chart of Miles Traveled and Travel Time in Hours for Butler Trucking Company Driving Assignments with Regression Line Superimposed

25 Figure 4.5 - A Geometric Interpretation of the Least Squares Method Applied to the Butler Trucking Company Example

26 Assessing the Fit of the Simple Linear Regression Model

27 Assessing the Fit of the Simple Linear Regression Model

28

29 Table 4.3 - Calculations for the Sum of Squares Total for the Butler Trucking Simple Linear Regression

30 Assessing the Fit of the Simple Linear Regression Model

31 Assessing the Fit of the Simple Linear Regression Model

32 Figure 4.9 - Excel Spreadsheet with Original Data, Scatter Chart, Estimated Regression Line, Estimated Regression Equation, and Coefficient of Determination r² for Butler Trucking Company
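The coefficient of determination r² shown in Figure 4.9 is r² = SSR/SST = 1 − SSE/SST. A sketch computing it from the ten Butler Trucking observations as commonly reported for this example (an assumption, since Table 4.1 did not survive extraction):

```python
# Assumed Butler Trucking data: miles traveled (x) and travel time in hours (y)
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]

n = len(miles)
x_bar, y_bar = sum(miles) / n, sum(hours) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, hours)) \
     / sum((x - x_bar) ** 2 for x in miles)
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in hours)                         # total sum of squares
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, hours))  # error sum of squares
r2 = 1 - sse / sst   # proportion of the variability in y explained by the line
```

With these values r² comes out to about 0.664, i.e. roughly two-thirds of the variability in travel time is explained by miles traveled.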

33 The Multiple Regression Model

34 The Multiple Regression Model Regression model and regression equation. Multiple regression model: y = β0 + β1x1 + β2x2 + ⋯ + βqxq + ε, where y = dependent variable; x1, x2, ..., xq = independent variables; β0, β1, β2, ..., βq = parameters; ε = error term (accounts for the variability in y that cannot be explained by the linear effect of the q independent variables).

35 The Multiple Regression Model Interpretation of slope coefficient βj: represents the change in the mean value of the dependent variable y that corresponds to a one-unit increase in the independent variable xj, holding the values of all other independent variables in the model constant. The multiple regression equation that describes how the mean value of y is related to x1, x2, ..., xq: E(y | x1, x2, ..., xq) = β0 + β1x1 + β2x2 + ⋯ + βqxq
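Estimating β0, β1, ..., βq from sample data amounts to solving the normal equations (XᵀX)b = Xᵀy. A self-contained pure-Python sketch on a small hypothetical data set (the data are constructed for illustration, not taken from the chapter):

```python
def fit_multiple_regression(rows, y):
    """Estimate b0, b1, ..., bq by solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]      # design matrix with intercept column
    m, p = len(X), len(X[0])
    # Augmented matrix for the normal equations
    A = [[sum(X[i][a] * X[i][b] for i in range(m)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(m))] for a in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b

# Hypothetical data constructed to satisfy y = 1 + 2*x1 + 3*x2 exactly
coeffs = fit_multiple_regression([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)],
                                 [1, 3, 4, 6, 8])
```

Because the data lie exactly on the plane, the estimates recover b0 = 1, b1 = 2, b2 = 3.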

36 The Multiple Regression Model

37 Figure 4.10 - The Estimation Process for Multiple Regression

38 The Multiple Regression Model

39 The Multiple Regression Model

40 Figure 4.11 - Data Analysis Tools Box. Illustration: Butler Trucking Company and multiple regression (contd.) Using Excel's Regression tool to develop the estimated multiple regression equation:

41 Figure 4.12 - Regression Dialog Box

42 Figure 4.13 - Excel Spreadsheet with Results for the Butler Trucking Company Multiple Regression with Miles and Deliveries as Independent Variables

43 Figure 4.14 - Graph of the Regression Equation for Multiple Regression Analysis with Two Independent Variables

44 Inference and Regression

45 Inference and Regression

46 Inference and Regression Conditions necessary for valid inference in the least squares regression model: For any given combination of values of the independent variables x1, x2, ..., xq, the population of potential error terms ε is normally distributed with a mean of 0 and a constant variance. The values of ε are statistically independent.
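A basic check on these conditions starts from the residuals, the sample estimates of ε. A sketch using hypothetical data, showing that least squares residuals always average to zero and producing the (predicted, residual) pairs one would plot in a residual chart:

```python
def residual_diagnostics(x, y):
    """Fit a simple linear regression and return (predicted, residual) pairs."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    preds = [b0 + b1 * xi for xi in x]
    resids = [yi - pi for yi, pi in zip(y, preds)]
    return list(zip(preds, resids))

# Hypothetical data; in a residual plot these points should show no pattern
pairs = residual_diagnostics([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
mean_resid = sum(r for _, r in pairs) / len(pairs)  # ~0 by construction
```

A funnel shape or a curve in the plotted pairs would suggest the constant-variance or linearity condition is violated, as Figures 4.16 and 4.17 illustrate.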

47 Figure 4.15 - Illustration of the Conditions for Valid Inference in Regression

48 Figure 4.16 - Example of an Ideal Scatter Chart of Residuals and Predicted Values of the Dependent Variable

49 Figure 4.17 - Examples of Diagnostic Scatter Charts of Residuals from Four Regressions

50 Figure 4.18 - Excel Residual Plots for the Butler Trucking Company Multiple Regression

51 Inference and Regression Testing for an overall regression relationship: use an F test based on the F probability distribution. If the F test leads us to reject the hypothesis that the values of β1, β2, ..., βq are all zero, conclude that there is an overall regression relationship; otherwise, conclude that there is no overall regression relationship.
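For simple linear regression the test statistic is F = MSR/MSE = (SSR/q) / (SSE/(n − q − 1)) with q = 1. A sketch computing it on the assumed Butler Trucking values (the data are an assumption, reconstructed to match the coefficients quoted earlier; the critical value 5.32 is the standard F table entry for 1 and 8 degrees of freedom at α = 0.05):

```python
# Assumed Butler Trucking data
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]

n, q = len(miles), 1                      # n observations, q = 1 predictor
x_bar, y_bar = sum(miles) / n, sum(hours) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, hours)) \
     / sum((x - x_bar) ** 2 for x in miles)
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in hours)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, hours))
ssr = sst - sse

# Compare f_stat with the F(q, n-q-1) critical value at the chosen alpha
f_stat = (ssr / q) / (sse / (n - q - 1))
```

Here f_stat is about 15.81, well above the 0.05-level critical value of 5.32 for F(1, 8), so we would reject the hypothesis that β1 = 0.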

52 Inference and Regression

53 Inference and Regression Testing individual regression parameters: to determine whether statistically significant relationships exist between the dependent variable y and each of the independent variables x1, x2, ..., xq individually. If βj = 0, there is no linear relationship between the dependent variable y and the independent variable xj. If βj ≠ 0, there is a linear relationship between y and xj.

54 Inference and Regression

55 Inference and Regression Testing individual regression parameters (contd.): A confidence interval can be used to test whether each of the regression parameters β0, β1, β2, ..., βq is equal to zero. Confidence interval: an estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence. Confidence level: indicates how frequently interval estimates based on samples of the same size, taken from the same population using identical sampling techniques, will contain the true value of the parameter we are estimating.
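For the slope this works out to the t statistic t = b1/s_b1 with s_b1 = s/√Σ(xi − x̄)², and the interval b1 ± t_{α/2} s_b1. A sketch on the assumed Butler Trucking values (both the data and the t table value for 8 degrees of freedom are stated assumptions):

```python
# Assumed Butler Trucking data
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]

n = len(miles)
x_bar, y_bar = sum(miles) / n, sum(hours) / n
s_xx = sum((x - x_bar) ** 2 for x in miles)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, hours)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, hours))
s = (sse / (n - 2)) ** 0.5        # standard error of the estimate
s_b1 = s / s_xx ** 0.5            # estimated standard deviation of b1

t_stat = b1 / s_b1                # tests H0: beta1 = 0
t_crit = 2.306                    # t table value for alpha/2 = 0.025, 8 df
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # 95% confidence interval
```

The interval excludes zero, so at the 0.05 level we reject H0: β1 = 0, the same conclusion the t statistic (about 3.98) gives.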

56 Inference and Regression Addressing nonsignificant independent variables: If practical experience dictates that a nonsignificant independent variable has a relationship with the dependent variable, the independent variable should be left in the model. If the model sufficiently explains the dependent variable without the nonsignificant independent variable, then consider rerunning the regression without it. The appropriate treatment of the inclusion or exclusion of the y-intercept when b0 is not statistically significant may require special consideration.

57 Inference and Regression Multicollinearity: correlation among the independent variables in multiple regression analysis. In t tests for the significance of individual parameters, the difficulty caused by multicollinearity is that we may conclude that a parameter associated with one of the multicollinear independent variables is not significantly different from zero even when the independent variable actually has a strong relationship with the dependent variable. This problem is avoided when there is little correlation among the independent variables.
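A common first screen for multicollinearity is the pairwise correlation between independent variables. A pure-Python sketch with hypothetical predictor data; the 0.7 cutoff is a rule of thumb, not a value from the chapter:

```python
def pearson_r(u, v):
    """Sample correlation coefficient between two variables."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    cov = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    return cov / (sum((a - u_bar) ** 2 for a in u) ** 0.5
                  * sum((b - v_bar) ** 2 for b in v) ** 0.5)

# Hypothetical predictors: x2 is nearly a linear function of x1
x1 = [10, 20, 30, 40, 50]
x2 = [21, 39, 62, 78, 101]
r = pearson_r(x1, x2)
flag = abs(r) > 0.7   # rule-of-thumb warning threshold for multicollinearity
```

A high pairwise correlation like this one signals that the individual t tests on these two coefficients should be interpreted with caution.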

58 Inference and Regression Inference and very large samples: If the sample size is sufficiently large, virtually all relationships between the independent variables and the dependent variable will be statistically significant, so inference can no longer be used to discriminate between meaningful and specious relationships. This is because the variability in potential values of an estimator bj of a regression parameter βj depends on two factors: (1) how closely the members of the population adhere to the relationship between xj and y that is implied by βj, and (2) the size of the sample on which the value of the estimator bj is based.

