Chapter 4 Linear Regression
Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after considering the relationship between advertising expenditures and sales, a marketing manager might attempt to predict sales for a given level of advertising expenditures. Sometimes a manager will rely on intuition to judge how two variables are related. However, if data can be obtained, a statistical procedure called regression analysis can be used to develop an equation showing how the variables are related.
Introduction Dependent variable or response: the variable being predicted. Independent variables or predictor variables: the variables used to predict the value of the dependent variable. Simple regression: a regression analysis involving one independent variable and one dependent variable. In statistical notation, y = dependent variable and x = independent variable.
Introduction Linear regression: a regression analysis in which any one-unit change in the independent variable, x, is assumed to result in the same change in the dependent variable, y. Multiple regression: regression analysis involving two or more independent variables.
The Simple Linear Regression Model
Regression model: the equation that describes how y is related to x and an error term. Simple linear regression model: y = β0 + β1x + ε. Parameters: β0 and β1 are the characteristics of the population. Error term: ε is a random variable that accounts for the variability in y that cannot be explained by the linear relationship between x and y.
The Simple Linear Regression Model Regression equation: the equation that describes how the expected value of y, denoted E(y), is related to x. Regression equation for simple linear regression: E(y|x) = β0 + β1x, where E(y|x) = expected value of y for a given value of x, β0 = y-intercept of the regression line, and β1 = slope. The graph of the simple linear regression equation is a straight line.
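The model above can be sketched as a short simulation: each observation is the regression equation E(y|x) = β0 + β1x plus a random error term ε. The parameter values (β0 = 1, β1 = 0.5, σ = 0.25) and the x values are illustrative assumptions, not numbers from the text.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
beta0, beta1, sigma = 1.0, 0.5, 0.25

random.seed(1)
x_values = [2, 4, 6, 8, 10]

# Each observed y is the regression equation E(y|x) = beta0 + beta1*x
# plus a normally distributed error term epsilon with mean 0.
y_values = [beta0 + beta1 * x + random.gauss(0, sigma) for x in x_values]
```

Because ε has mean 0, the simulated points scatter around the straight line E(y|x) = β0 + β1x rather than falling exactly on it.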
Figure 4.1 - Possible Regression Lines in Simple Linear Regression
The Simple Linear Regression Model The parameter values are usually not known and must be estimated using sample data. Sample statistics (denoted b0 and b1) are computed as estimates of the population parameters β0 and β1. Estimated regression equation: the equation obtained by substituting the values of the sample statistics b0 and b1 for β0 and β1 in the regression equation.
Figure 4.2 - The Estimation Process in Simple Linear Regression
Least Squares Method
Least Squares Method Least squares method: a procedure for using sample data to find the estimated regression equation; here, we determine the values of b0 and b1. Interpretation of b0 and b1: the slope b1 is the estimated change in the mean of the dependent variable y that is associated with a one-unit increase in the independent variable x. The y-intercept b0 is the estimated value of the dependent variable y when the independent variable x is equal to 0.
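The least squares estimates can be computed directly from the standard formulas b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄. A minimal sketch, using a made-up data set with an exact linear relationship y = 2 + 3x so the answer is easy to check:

```python
from statistics import mean

def least_squares(x, y):
    """Compute the least squares estimates b0 and b1."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data lying exactly on the line y = 2 + 3x:
b0, b1 = least_squares([1, 2, 3, 4], [5, 8, 11, 14])
# b0 → 2.0, b1 → 3.0
```

Because the points fall exactly on a line here, the residuals are all zero; with real data such as the Butler Trucking assignments, the estimates minimize (rather than eliminate) the sum of squared residuals.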
Table 4.1 - Miles Traveled and Travel Time (in Hours) for Ten Butler Trucking Company Driving Assignments
Figure 4.3 - Scatter Chart of Miles Traveled and Travel Time in Hours for Sample of Ten Butler Trucking Company Driving Assignments
Least Squares Method Least squares estimates of the regression parameters using Excel.
Least Squares Method Interpretation of b1: if the length of a driving assignment were 1 unit (1 mile) longer, the mean travel time for that driving assignment would be 0.0678 units (0.0678 hours, or approximately 4 minutes) longer. Interpretation of b0: if the driving distance for a driving assignment were 0 units (0 miles), the mean travel time would be 1.2739 units (1.2739 hours, or approximately 76 minutes).
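With the estimates reported above, a predicted travel time is just the estimated regression equation evaluated at a mileage. The 75-mile input below is a made-up example, not an assignment from the text:

```python
b0, b1 = 1.2739, 0.0678  # least squares estimates from the Butler Trucking example

def predicted_travel_time(miles):
    """yhat = b0 + b1 * miles, in hours."""
    return b0 + b1 * miles

y_hat = predicted_travel_time(75)  # 1.2739 + 0.0678 * 75 = 6.3589 hours
```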
Least Squares Method Experimental region: the range of values of the independent variables in the data used to estimate the model; the regression model is valid only over this region. Extrapolation: prediction of the value of the dependent variable outside the experimental region. It is risky because the fitted relationship has not been observed there.
Table 4.2 - Predicted Travel Time in Hours and the Residuals for Ten Butler Trucking Company Driving Assignments
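Each residual in the table is the observed travel time minus the predicted travel time. A minimal sketch, using made-up observations and assumed estimates (not the Butler Trucking values):

```python
# Hypothetical observations and assumed estimates, for illustration only.
x = [10, 20, 30]
y = [2.0, 2.5, 3.4]
b0, b1 = 1.3, 0.07

# residual_i = y_i - yhat_i, where yhat_i = b0 + b1 * x_i
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# residuals → [0.0, -0.2, 0.0] (up to floating-point rounding)
```

The least squares method chooses b0 and b1 to make the sum of the squares of these residuals as small as possible.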
Figure 4.4 - Scatter Chart of Miles Traveled and Travel Time in Hours for Butler Trucking Company Driving Assignments with Regression Line Superimposed
Figure 4.5 - A Geometric Interpretation of the Least Squares Method Applied to the Butler Trucking Company Example
Assessing the Fit of the Simple Linear Regression Model
Table 4.3 - Calculations for the Sum of Squares Total for the Butler Trucking Simple Linear Regression
Figure 4.9 - Excel Spreadsheet with Original Data, Scatter Chart, Estimated Regression Line, Estimated Regression Equation, and Coefficient of Determination r² for Butler Trucking Company
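The coefficient of determination shown in the figure can be computed from two sums of squares: r² = 1 − SSE/SST, where SST = Σ(yi − ȳ)² is the sum of squares total and SSE = Σ(yi − ŷi)² is the sum of squared residuals. A sketch on made-up observed and predicted values:

```python
from statistics import mean

def r_squared(y, y_hat):
    """Coefficient of determination: r2 = 1 - SSE/SST."""
    y_bar = mean(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                # sum of squares total
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # sum of squared residuals
    return 1 - sse / sst

# Made-up observed values and fitted values:
r2 = r_squared([2, 4, 6], [2.5, 4.0, 5.5])
# SST = 8, SSE = 0.5, so r2 → 0.9375
```

An r² near 1 means the estimated regression line explains most of the variability in y; an r² near 0 means it explains very little.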
The Multiple Regression Model
The Multiple Regression Model Regression model and regression equation. Multiple regression model: y = β0 + β1x1 + β2x2 + ⋯ + βqxq + ε, where y = dependent variable; x1, x2, ..., xq = independent variables; β0, β1, β2, ..., βq = parameters; and ε = error term (accounts for the variability in y that cannot be explained by the linear effect of the q independent variables).
The Multiple Regression Model Interpretation of slope coefficient βj: the change in the mean value of the dependent variable y that corresponds to a one-unit increase in the independent variable xj, holding the values of all other independent variables in the model constant. The multiple regression equation that describes how the mean value of y is related to x1, x2, ..., xq: E(y | x1, x2, ..., xq) = β0 + β1x1 + β2x2 + ⋯ + βqxq.
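Evaluating an estimated multiple regression equation works the same way as in the simple case, with one term per independent variable. A sketch with two independent variables (miles and deliveries, as in the Butler Trucking example); the coefficient values here are assumed for illustration and are not the text's estimates:

```python
# Hypothetical estimated coefficients for two independent variables.
b0, b1, b2 = 0.9, 0.06, 0.9

def predict(miles, deliveries):
    """yhat = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * miles + b2 * deliveries

y_hat = predict(50, 3)  # 0.9 + 0.06*50 + 0.9*3 = 6.6
```

Holding deliveries fixed, increasing miles by one unit changes the prediction by b1; that is the "all other variables held constant" interpretation of a slope coefficient.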
Figure 4.10 - The Estimation Process for Multiple Regression
Figure 4.11 - Data Analysis Tools Box. Illustration: Butler Trucking Company and multiple regression (contd.). Using Excel's Regression tool to develop the estimated multiple regression equation.
Figure 4.12 - Regression Dialog Box
Figure 4.13 - Excel Spreadsheet with Results for the Butler Trucking Company Multiple Regression with Miles and Deliveries as Independent Variables
Figure 4.14 - Graph of the Regression Equation for Multiple Regression Analysis with Two Independent Variables
Inference and Regression
Inference and Regression Conditions necessary for valid inference in the least squares regression model: (1) for any given combination of values of the independent variables x1, x2, ..., xq, the population of potential error terms ε is normally distributed with a mean of 0 and a constant variance; (2) the values of ε are statistically independent.
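These conditions are usually checked by examining the residuals, as the following figures illustrate: the residual mean should be near zero, and the spread of the residuals should look similar across the range of predicted values. A crude numeric sketch of those two checks, on hypothetical residuals listed in order of increasing predicted value:

```python
from statistics import mean, pvariance

# Hypothetical residuals, ordered by the predicted value they belong to.
residuals = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.05]

res_mean = mean(residuals)                 # should be close to 0
low_half_var = pvariance(residuals[:4])    # spread at small predicted values
high_half_var = pvariance(residuals[4:])   # spread at large predicted values
```

A residual mean far from zero, or a spread that grows or shrinks systematically with the predicted value, would suggest the conditions for valid inference are violated.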
Figure 4.15 - Illustration of the Conditions for Valid Inference in Regression
Figure 4.16 - Example of an Ideal Scatter Chart of Residuals and Predicted Values of the Dependent Variable
Figure 4.17 - Examples of Diagnostic Scatter Charts of Residuals from Four Regressions
Figure 4.18 - Excel Residual Plots for the Butler Trucking Company Multiple Regression
Inference and Regression Testing for an overall regression relationship: use an F test based on the F probability distribution. If the F test leads us to reject the hypothesis that the values of β1, β2, ..., βq are all zero, conclude that there is an overall regression relationship; otherwise, conclude that there is no overall regression relationship.
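The test statistic behind this test is a ratio of mean squares, F = MSR/MSE, where MSR = SSR/q, MSE = SSE/(n − q − 1), and SSR = SST − SSE is the sum of squares due to regression. A sketch with hypothetical sums of squares (the values below are made up, not the Butler Trucking results):

```python
def overall_f_statistic(sst, sse, n, q):
    """F = MSR / MSE, with MSR = SSR/q and MSE = SSE/(n - q - 1)."""
    ssr = sst - sse          # sum of squares due to regression
    msr = ssr / q            # mean square due to regression
    mse = sse / (n - q - 1)  # mean square error
    return msr / mse

# Hypothetical sums of squares for n = 10 observations, q = 2 predictors:
f = overall_f_statistic(sst=23.9, sse=2.3, n=10, q=2)
```

A large F value (relative to the F distribution with q and n − q − 1 degrees of freedom) leads to rejecting the hypothesis that β1, ..., βq are all zero.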
Inference and Regression Testing individual regression parameters: to determine whether statistically significant relationships exist between the dependent variable y and each of the independent variables x1, x2, ..., xq individually. If βj = 0, there is no linear relationship between the dependent variable y and the independent variable xj; if βj ≠ 0, there is a linear relationship between y and xj.
Inference and Regression Testing individual regression parameters (contd.): a confidence interval can be used to test whether each of the regression parameters β0, β1, β2, ..., βq is equal to zero. Confidence interval: an estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence. Confidence level: indicates how frequently interval estimates based on samples of the same size, taken from the same population using identical sampling techniques, will contain the true value of the parameter being estimated.
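The interval itself has the form bj ± t*·s_bj, where s_bj is the standard error of the estimate and t* is the critical t value for the chosen confidence level; if the interval excludes zero, reject the hypothesis that βj = 0. A sketch with a hypothetical estimate, standard error, and critical value:

```python
def coefficient_confidence_interval(bj, se_bj, t_crit):
    """Interval bj +/- t_crit * se_bj; if 0 lies outside it, reject H0: beta_j = 0."""
    return bj - t_crit * se_bj, bj + t_crit * se_bj

# Hypothetical slope estimate, standard error, and critical t value:
lo, hi = coefficient_confidence_interval(0.0678, 0.0061, 2.365)
contains_zero = lo <= 0 <= hi  # False here, so beta_j = 0 would be rejected
```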
Inference and Regression Addressing nonsignificant independent variables: if practical experience dictates that a nonsignificant independent variable has a relationship with the dependent variable, the independent variable should be left in the model. If the model sufficiently explains the dependent variable without the nonsignificant independent variable, consider rerunning the regression without it. The appropriate treatment of the inclusion or exclusion of the y-intercept when b0 is not statistically significant may require special consideration.
Inference and Regression Multicollinearity: correlation among the independent variables in multiple regression analysis. In t tests for the significance of individual parameters, the difficulty caused by multicollinearity is that it is possible to conclude that a parameter associated with one of the multicollinear independent variables is not significantly different from zero even when the independent variable actually has a strong relationship with the dependent variable. This problem is avoided when there is little correlation among the independent variables.
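A simple first check for multicollinearity is the pairwise Pearson correlation between independent variables; a common informal rule of thumb flags absolute correlations above roughly 0.7. A sketch using the correlation formula on made-up values for two independent variables:

```python
from statistics import mean, pstdev

def correlation(x, y):
    """Pearson correlation coefficient between two variables."""
    x_bar, y_bar = mean(x), mean(y)
    cov = mean([(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical values for two independent variables in a multiple regression:
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
r = correlation(miles, deliveries)
```

If the correlation between a pair of independent variables is large in absolute value, the individual t tests for those variables should be interpreted with caution.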
Inference and Regression Inference and very large samples: if the sample size is sufficiently large, virtually all relationships between the independent variables and the dependent variable will be statistically significant, so inference can no longer be used to discriminate between meaningful and specious relationships. This is because the variability in potential values of an estimator bj of a regression parameter βj depends on two factors: (1) how closely the members of the population adhere to the relationship between xj and y that is implied by βj; and (2) the size of the sample on which the value of the estimator bj is based.