Published by Cory Armstrong. Modified over 9 years ago.
1
Business Statistics, Can. ed. By Black, Chakrapani & Castillo
Chapter 14 Building Multiple Regression Models Prepared by Dr. Clarence S. Bayne JMSB, Concordia University Business Statistics, Can. Ed. © 2010 John Wiley & Sons Canada, Ltd.
2
Learning Objectives
- Analyze and interpret nonlinear variables in multiple regression analysis.
- Understand the role of qualitative variables and how to use them in multiple regression analysis.
- Learn how to build and evaluate multiple regression models.
- Understand what multicollinearity is and how to deal with it.
3
Mathematical Transformations: Recoding Independent Variables to Create Non-linear Models
Description of Models (the slide pairs each description with its equation):
- First-order model with two independent variables
- Second-order model with one independent variable
- Second-order model with an interaction term
- Second-order model with two independent variables
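The equations for these four model types are not legible in this transcript. As a sketch, the usual textbook forms are given below; the symbols (β for coefficients, ε for the error term) are an assumption about the slide's notation, not a copy of it.

```latex
\begin{align*}
\text{First-order, two independent variables:}\quad
  & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \\
\text{Second-order, one independent variable:}\quad
  & y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon \\
\text{Second-order with an interaction term:}\quad
  & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \\
\text{Second-order, two independent variables:}\quad
  & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon
\end{align*}
```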
4
A Curvilinear Scatter Plot of Sales Data for 13 Manufacturing Companies
[Scatter plot: Sales ($1,000,000) on the vertical axis (ticks 50 to 500) against Number of Representatives on the horizontal axis (ticks 2 to 12).]

Manufacturer   Sales ($1,000,000)
 1               2.1
 2               3.6
 3               6.2
 4              10.4
 5              22.8
 6              35.6
 7              57.1
 8              83.5
 9             109.4
10             128.6
11             196.8
12             280.0
13             462.3

[The Number of Manufacturing Representatives column was not preserved.]
5
Excel Simple Linear Regression Output for the Manufacturing Example
Regression Statistics
Multiple R            0.933
R Square              0.870
Adjusted R Square     0.858
Standard Error        51.10
Observations          13

             Coefficients      Standard Error   t Stat   P-value
Intercept    [not preserved]   28.737           -3.72    0.003
numbers      41.026            4.779             8.58    0.000

ANOVA
             df   SS       MS       F       Significance F
Regression    1   192395   192395   73.69   0.000
Residual     11    28721     2611
Total        12   221117
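The summary statistics in this output follow directly from the ANOVA sums of squares. The short check below is a sketch that recomputes them from the SS and df values reported on the slide; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_regression, df_regression = 192395, 1
ss_residual, df_residual = 28721, 11
ss_total, df_total = 221117, 12

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_square = ss_regression / ss_total
adj_r_square = 1 - (ss_residual / df_residual) / (ss_total / df_total)
standard_error = math.sqrt(ms_residual)   # standard error of the estimate
f_stat = ms_regression / ms_residual

print(round(r_square, 3), round(adj_r_square, 3),
      round(standard_error, 2), round(f_stat, 2))
```

Rounded to the precision of the slide, these reproduce R Square 0.870, Adjusted R Square 0.858, Standard Error 51.10 and F 73.69.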
6
Second Order Model with one Independent Variable: Manufacturing Sales Data: Table 14.2
7
Scatter Plots Showing Original Curvilinear With More Linear Transformed Data: Figure 14.2
8
Computer Output for Quadratic Model to Predict Sales
Regression Statistics
Multiple R            0.986
R Square              0.973
Adjusted R Square     0.967
Standard Error        24.593
Observations          13

             Coefficients      Standard Error   t Stat   P-value
Intercept    18.067            24.673            0.73    0.481
MfgrRp       [not preserved]   9.5450           -1.65    0.131
MfgrRpSq     4.750             0.776             6.12    0.000

ANOVA
             df   SS       MS       F        Significance F
Regression    2   215069   107534   177.79   0.000
Residual     10     6048      605
Total        12   221117
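The quadratic model is still fitted by ordinary least squares; the only change is that the predictor is recoded so that its square enters as a second column. The sketch below illustrates this with NumPy on hypothetical data generated from a known quadratic. The slide's data are not used, because the representative counts are missing from this transcript, and the helper name `fit` is an assumption for illustration.

```python
import numpy as np

def fit(X, y):
    """Ordinary least squares: return coefficients and R-squared."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    sse = float(residuals @ residuals)
    sst = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - sse / sst

# Hypothetical data generated exactly from y = 2 + 3x + 0.5x^2.
x = np.arange(1.0, 9.0)
y = 2.0 + 3.0 * x + 0.5 * x**2

ones = np.ones_like(x)
X_linear = np.column_stack([ones, x])            # first-order model
X_quadratic = np.column_stack([ones, x, x**2])   # recoded: x and x squared

_, r2_linear = fit(X_linear, y)
coef_quadratic, r2_quadratic = fit(X_quadratic, y)
```

On curvilinear data like this, the recoded model recovers the generating coefficients and has a higher R-squared than the straight-line fit, which mirrors the improvement from 0.870 to 0.973 reported for the sales data.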
9
Tukey’s Ladder of Transformation The Four Quadrant Approach
10
Regression Models With Interactions
- Often in the real world of business and economics, interaction occurs between two variables.
- One variable acts differently over one range of values of the second variable than it does over another range of values of the second variable.
- In a manufacturing plant, humidity might affect the hardness of material differently at different temperatures.
- The ANOVA model in Chapter 11 addressed this problem by using an interaction variable as a blocking variable.
- In regression analysis, interaction can be examined as a separate independent variable.
- This is illustrated by the second-order model with two independent variables and an interaction term.
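Treating interaction as a separate independent variable means adding the product x1·x2 as an extra column. The sketch below uses hypothetical data (not the stock prices from the next slide) generated with a known interaction effect, and compares the fit with and without the product term. The helper `fit` and all variable names are illustrative assumptions.

```python
import numpy as np

def fit(X, y):
    """Ordinary least squares: return coefficients and R-squared."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    sse = float(residuals @ residuals)
    sst = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - sse / sst

# Hypothetical data on a 4 x 4 grid, generated with a known interaction effect.
g1, g2 = np.meshgrid(np.arange(1.0, 5.0), np.arange(1.0, 5.0))
x1, x2 = g1.ravel(), g2.ravel()
y = 10.0 + 2.0 * x1 + 3.0 * x2 + 1.5 * x1 * x2

ones = np.ones_like(x1)
X_first_order = np.column_stack([ones, x1, x2])
X_interaction = np.column_stack([ones, x1, x2, x1 * x2])  # add x1*x2 as a predictor

_, r2_first_order = fit(X_first_order, y)
coef_interaction, r2_interaction = fit(X_interaction, y)
```

The first-order model cannot reproduce the way the slope on x1 changes with x2, so its R-squared is lower; the model with the product term recovers the generating coefficients.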
11
Table 14.3 Share Prices of Three Stocks over a 15-Month Period
[Table 14.3: closing prices of Stock 1, Stock 2 and Stock 3 over 15 months. The table survives only as an incomplete run of values with the column assignment lost: 41 36 35 39 38 32 45 51 52 43 55 47 57 49 58 54 62 65 70 77 72 75 74 33 83 81 28 101 92 31 107 91]

Problem Definition: The data represent the closing prices for three corporations over a 15-month period. An investment firm wants to use the prices of stocks 2 and 3 to develop a regression model to predict the price of stock 1.
12
Develop Model Using Step by Step Approach and Explore for Interaction
- First-order model with two independent variables: y = β0 + β1x1 + β2x2 + ε
- Second-order model with an interaction term: y = β0 + β1x1 + β2x2 + β3x1x2 + ε
13
Initial Regression First-order Model with Two Independent Variables
The regression equation expresses Stock 1 as a linear function of Stock 2 and Stock 3 [numeric estimates not preserved].
Predictor table (rows Constant, Stock 2, Stock 3; columns Coef, StDev, T, P): [values not preserved]
S = [not preserved]   R-Sq = 47.2%   R-Sq(adj) = 38.4%
Analysis of Variance (rows Regression, Error, Total; columns DF, SS, MS, F, P): [values not preserved]
14
Excel Regression Second-order Model with Interaction Term for the Three Stocks
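The regression equation expresses Stock 1 as a linear function of Stock 2, Stock 3 and the interaction term Inter, which enters with a negative coefficient [numeric estimates not preserved].
Predictor table (rows Constant, Stock 2, Stock 3, Inter; columns Coef, StDev, T, P): [values not preserved]
S = [not preserved]   R-Sq = 80.4%   R-Sq(adj) = 75.1%
Analysis of Variance (rows Regression, Error, Total; columns DF, SS, MS, F, P): [values not preserved]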
15
Response Surface for the Stock Example- Without and With Interaction
16
Summary Regression Statistics for Share Prices of Three Stocks
Regression Statistics from Two Excel Output Summaries, With and Without Interaction

                      With No Interaction   With Interaction
Multiple R            [not preserved]       [not preserved]
R Square              0.472                 0.804
Adjusted R Square     0.384                 0.751
Standard Error        [not preserved]       [not preserved]
Observations          15                    15
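Adjusted R-squared penalizes R-squared for the number of predictors, which matters here because the interaction model uses one more term. As a sketch, the standard formula reproduces the adjusted values reported in the two regression outputs (R-Sq 47.2% and 80.4%, n = 15). The function name is illustrative.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# R-squared values and sample size as reported on the slides.
no_interaction = adjusted_r_square(0.472, n=15, k=2)    # Stock 2, Stock 3
with_interaction = adjusted_r_square(0.804, n=15, k=3)  # plus the interaction term

print(round(no_interaction, 3), round(with_interaction, 3))
```

Rounded to three decimals these give 0.384 and 0.751, matching R-Sq(adj) = 38.4% and 75.1%.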
17
Analysis and Conclusions
- Using the interaction term increases the coefficient of determination (R²) from 0.47 to 0.80.
- The standard error of the estimate decreases from the first model to the second [values not preserved].
- The t ratios for the X1 term and the interaction term are statistically significant in the second model: t = 3.36 for X1 [p-value not preserved; t and p-value for X1X2 not preserved].
- Including X1X2 helped the model account for a substantially greater amount of the variation in the dependent variable. It is a significant contributor to the model.
- The second graph in Figure 14.6 shows how the interaction term bends the curve to fit the data as Stock 2 increases.
- Be cautious in interpreting the partial regression coefficients because of the high likelihood of multicollinearity.
18
Model-Building: Search Procedures
- Search procedures are processes whereby more than one multiple regression model is developed for a given database, and the models are compared and sorted by different criteria, depending on the given procedure.
- There are many search procedures. Among the most widely known are:
  - All possible regressions
  - Stepwise regression
  - Forward selection
  - Backward elimination
- Which approach is best is subject to much debate and depends on the discipline and the philosophy of enquiry that the researcher brings to the research.
19
All Possible Regressions
- The all possible regressions search procedure computes all possible linear multiple regression models from the data using all variables.
- If a data set contains k independent variables, all possible regressions will determine 2^k - 1 different models.
- This produces all possible models with a single predictor, two predictors, three predictors, and so on up to all k predictors.
- The next slide shows the predictor sets for all possible regressions with five independent variables.
- If a research methodology and study design exist that identify all essential variables, the procedure enables the business researcher to examine every model.
- Warning: this search through all possible models can be tedious, time consuming, inefficient, and perhaps overwhelming.
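The count of 2^k - 1 models comes from enumerating every non-empty subset of the k predictors. A minimal sketch using only the standard library is shown below; the function name and the labels X1 to X5 are illustrative.

```python
from itertools import combinations

def all_possible_models(predictors):
    """Return every non-empty subset of predictors, smallest subsets first."""
    models = []
    for size in range(1, len(predictors) + 1):
        models.extend(combinations(predictors, size))
    return models

predictors = ["X1", "X2", "X3", "X4", "X5"]
models = all_possible_models(predictors)

# Number of models by subset size: 5, 10, 10, 5, 1.
counts = [sum(1 for m in models if len(m) == size) for size in range(1, 6)]
print(len(models), counts)
```

For five predictors this yields 31 candidate models, each of which would then be fitted and compared.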
20
All Possible Regressions with Five Independent Variables
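Single Predictor: X1; X2; X3; X4; X5
Two Predictors: X1,X2; X1,X3; X1,X4; X1,X5; X2,X3; X2,X4; X2,X5; X3,X4; X3,X5; X4,X5
Three Predictors: X1,X2,X3; X1,X2,X4; X1,X2,X5; X1,X3,X4; X1,X3,X5; X1,X4,X5; X2,X3,X4; X2,X3,X5; X2,X4,X5; X3,X4,X5
Four Predictors: X1,X2,X3,X4; X1,X2,X3,X5; X1,X2,X4,X5; X1,X3,X4,X5; X2,X3,X4,X5
Five Predictors: X1,X2,X3,X4,X5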
21
Stepwise Regression
- Stepwise regression is a step-by-step process that begins by developing a regression model with a single predictor variable and then adds and deletes predictors one step at a time.
- It allows the researcher to examine the fit of the model at each step until no more significant predictors remain outside the model.
- It starts by choosing the single-predictor regression with the highest t or F value that is significant at some predetermined alpha level. If none of the independent variables meets this criterion, no model is recommended.
- Other variables are then added to the equation one at a time, and each is tested for the significance of its contribution to explaining total variation, given the variables already in the model. The procedure continues until all significant predictors are included.
- Stepwise regression allows checks for multicollinearity and the dropping of variables that were included at earlier stages.
22
Forward Selection
Like stepwise regression, except that variables are not reevaluated after entering the model.
23
Backward Elimination
1. Start with the "full model" (all k predictors).
2. If all predictors are significant, stop.
3. Otherwise, eliminate the most nonsignificant predictor and return to step 2.
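The backward elimination loop can be sketched in a few lines with NumPy. This is an illustration under stated assumptions, not the textbook's software procedure: significance is judged by comparing |t| with a fixed cutoff (`t_crit = 2.0`, a rough stand-in for a 5% test) rather than by exact p-values, and the data and variable names are hypothetical.

```python
import numpy as np

def t_statistics(X, y):
    """OLS t statistics for every column of the design matrix X."""
    n_obs, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    mse = float(residuals @ residuals) / (n_obs - p)
    std_err = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
    return coef / std_err

def backward_elimination(data, y, t_crit=2.0):
    """data maps predictor name -> 1-D array. Returns the predictors kept."""
    kept = list(data)                      # step 1: start with the full model
    while kept:
        X = np.column_stack([np.ones(len(y))] + [data[name] for name in kept])
        abs_t = np.abs(t_statistics(X, y)[1:])   # skip the intercept
        weakest = int(np.argmin(abs_t))
        if abs_t[weakest] >= t_crit:       # step 2: all significant, stop
            break
        kept.pop(weakest)                  # step 3: drop the weakest, repeat
    return kept

# Hypothetical data: y depends on x1 and x2; x3 is unrelated noise.
rng = np.random.default_rng(0)
data = {name: rng.normal(size=40) for name in ("x1", "x2", "x3")}
y = 5.0 + 3.0 * data["x1"] - 2.0 * data["x2"] + rng.normal(scale=0.5, size=40)

kept = backward_elimination(data, y)
```

With data built this way, `x1` and `x2` survive elimination; whether the noise variable `x3` is dropped depends on its t value in the particular sample.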
24
Stepwise Regression
1. Perform k simple regressions and select the best as the initial model.
2. Evaluate each variable not in the model. If none meets the criterion, stop.
3. Add the best variable to the model; reevaluate the previous variables and drop any that are not significant.
4. Return to step 2.
- The criteria for inclusion and exclusion of variables may be technical, based on common-sense observation, based on a body of theory, or based on the usefulness of newly discovered relationships as insights into meaning.
- The researcher has to be keenly aware of the problem of spurious relationships when using these search procedures.
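The four steps above can be sketched as follows. As assumptions for illustration, entry and removal are decided by comparing |t| with fixed cutoffs (`t_enter`, `t_remove`) instead of the F-to-enter and F-to-remove values a statistics package would use, an iteration cap guards against a variable being added and dropped repeatedly, and the data are hypothetical.

```python
import numpy as np

def abs_t(data, names, y):
    """Absolute t statistics (intercept excluded) for the model using `names`."""
    X = np.column_stack([np.ones(len(y))] + [data[name] for name in names])
    n_obs, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    mse = float(residuals @ residuals) / (n_obs - p)
    std_err = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
    return dict(zip(names, np.abs(coef / std_err)[1:]))

def stepwise(data, y, t_enter=2.0, t_remove=2.0):
    selected = []
    for _ in range(2 * len(data)):         # cap iterations to avoid cycling
        candidates = [name for name in data if name not in selected]
        # Evaluate each variable not in the model by its t value when added.
        scores = {c: abs_t(data, selected + [c], y)[c] for c in candidates}
        if not scores or max(scores.values()) < t_enter:
            break                          # none meets the criterion: stop
        best = max(scores, key=scores.get)
        selected.append(best)
        # Reevaluate earlier variables; drop any that are no longer significant.
        t_now = abs_t(data, selected, y)
        selected = [s for s in selected if s == best or t_now[s] >= t_remove]
    return selected

# Hypothetical data: y depends on x1 and x2; x3 is unrelated noise.
rng = np.random.default_rng(1)
data = {name: rng.normal(size=40) for name in ("x1", "x2", "x3")}
y = 5.0 + 3.0 * data["x1"] - 2.0 * data["x2"] + rng.normal(scale=0.5, size=40)

selected = stepwise(data, y)
```

On the first pass `selected` is empty, so each candidate is scored in a simple regression, which corresponds to step 1. Removing the reevaluation line turns the sketch into forward selection.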
25
Choosing the Variables for a Stepwise Regression Predicting World Crude Oil Production Example
- Problem definition: predicting world crude oil production.
- Choice of method: many different types of prediction models can be constructed. The researcher adopts an econometric approach using multiple regression.
- After a preliminary survey of the industry and the factors surrounding it, the researcher realizes that much of the world crude oil market is driven by variables related to usage and production in the USA.
- The researcher identifies five independent variables as predictors:
  1. U.S. energy consumption
  2. Gross U.S. nuclear electricity generation
  3. U.S. coal production
  4. Total U.S. dry gas (natural gas) production
  5. Fuel rate of U.S.-owned automobiles
26
Systematic Framework Underlying Data Collection
- A survey of published and other data on energy production and usage suggests that world production of crude oil is driven by the previous years' activities in the U.S.
- It was expected that as U.S. energy consumption increased, so would world production of crude oil.
- It seemed reasonable to introduce nuclear electricity generation, coal production, dry gas production and fuel rates into the study. Rationale: their increased output may be expected to have a negative effect on crude oil production if energy consumption remained fixed.
- Data on the five independent variables and the dependent variable (world crude oil production) were gathered and are presented on the next slide.
27
Definition and Measurement of Variables: Data for Multiple Regression Model to Predict World Crude Oil Production

Y      X1     X2      X3       X4     X5
55.7   74.3    83.5    598.6   21.7   13.30
--     72.5   114.0    610.0   20.7   13.42
52.8   70.5   172.5    654.6   19.2   13.52
57.3   74.4   191.1    684.9   19.1   13.53
59.7   76.3   250.9    697.2   --     13.80
60.2   78.1   276.4    670.2   --     14.04
62.7   78.9   255.2    781.1   19.7   14.41
59.6   76.0   251.1    829.7   19.4   15.46
56.1   74.0   272.7    823.8   --     15.94
53.5   70.8   282.8    838.1   17.8   16.65
53.3   --     293.7    782.1   16.1   17.14
54.5   74.1   327.6    895.9   17.5   17.83
54.0   --     383.7    883.6   16.5   18.20
56.2   --     414.0    890.3   --     18.27
56.7   76.9   455.3    918.8   16.6   19.20
58.7   80.2   527.0    950.3   17.1   19.87
59.9   81.3   529.4    980.7   17.3   20.31
60.6   --     576.9   1029.1   --     21.02
--     81.1   612.6    996.0   17.7   21.69
--     82.1   618.8    997.5   --     21.68
--     83.9   610.3    945.4   18.2   21.04
60.9   85.6   640.4   1033.5   18.9   21.48

[-- marks a value that was not preserved.]

Y   World Crude Oil Production (millions of barrels per day)
X1  U.S. Energy Consumption (quadrillion BTUs per year)
X2  U.S. Nuclear Generation (billion kilowatt-hours)
X3  U.S. Coal Production (million short tons)
X4  U.S. Dry Gas Production (trillion cubic feet)
X5  U.S. Fuel Rate for Autos (miles per gallon)
28
Step 1: Stepwise Regression Results with One Predictor
Simple regressions using each independent variable in turn to predict oil production yield the initial regression equation, in which y is world crude oil production and x1 is U.S. energy consumption [coefficient estimates not preserved]. The t value for x1 (11.77) in Table 14.8 is the highest of all the variables tried, and R-squared is 85.2%.
29
Excel Output of Regression for Crude Oil Production
30
Step 2: Stepwise Regression Results with Two Predictors
x1 is retained in the model, and a search is conducted to determine which of the other variables, together with it, produces the highest significant t value (that is, adds most to explaining the variation in y). The new equation that emerges has the form y = b0 + b1x1 - 0.517x2, where x2 is the U.S. fuel rate (listed as X5 in the data table) [b0 and b1 not preserved]. Its t value and the R-squared are both reported as very significant [values not preserved].
31
Step 3: Regression Results with Three Predictors
Step 3 continues the search for additional predictor variables. The table shows that none of the remaining variables makes a significant contribution when added to the regression obtained at step 2; their t values are very small.
32
Minitab Stepwise Output
Stepwise Regression
F-to-Enter: [not preserved]   F-to-Remove: [not preserved]
Response is Coiler on 5 predictors, with N = 26
Rows reported at each step: Constant; Seconds (T-Value, P-value); Fuel Rate (T-Value, P-value); S; R-Sq [values not preserved]
33
Key Concerns
- The search procedures provide a framework for analysis and must be applied subject to common sense and an explanatory theory or analysis.
- Avoid the mistake of using the strict sequential order in which variables enter a computer printout (in stepwise and forward selection) to rank the importance of the variables.
- In multiple regression (unlike simple regression), the importance of an independent variable is ranked by its net contribution to explaining y when used with the other variables, not by its individual correlation with y.
- Problems of multicollinearity require transformation or omission of variable(s) before or as the analysis proceeds. Adding a variable that is highly correlated with other independent variables is very problematic: it distorts the values of the coefficients and renders all tests unreliable.
- An increase in R-squared is not in and of itself a good indicator of the importance of the last variable added.
- Common sense and usefulness are the final arbiters in choosing the final model.
34
Multicollinearity
A condition that occurs when two or more of the independent variables of a multiple regression model are highly correlated.
Effects of multicollinearity:
- It is difficult, if not impossible, to interpret the estimates of the regression coefficients.
- The t values for the regression coefficients are inordinately small.
- The standard deviations of the regression coefficients are overestimated, so the t tests and F test may have no meaning.
- The algebraic sign of a predictor variable's coefficient is the opposite of what is expected.
In practice, correlations as high as 60 to 70 percent may be tolerated without causing a serious problem of multicollinearity.
35
Testing for Multicollinearity
Two techniques for detecting the possible existence of multicollinearity:
1. Prepare a correlation matrix of the independent variables using Excel or another software program, and identify those pairs of variables that have correlations in excess of 0.70.
2. The variance inflation factor (VIF): conduct a regression analysis to predict one independent variable from the other independent variables, so that the independent variable being predicted becomes the dependent variable. This is done for each independent variable in turn, and the R-squared (coefficient of determination) of each regression is calculated. VIF = 1 / (1 - R²) is the measure that determines whether the standard errors of the estimates are inflated. Some researchers follow the guideline that a VIF greater than 10, or equivalently an R² greater than 0.90, for the largest VIFs indicates a severe multicollinearity problem.
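A minimal sketch of the VIF calculation is shown below: each predictor is regressed on the remaining predictors and its R-squared is converted with VIF = 1 / (1 - R²). The data are hypothetical, constructed so that `x2` is nearly a copy of `x1`, and the function names are illustrative rather than taken from any package.

```python
import numpy as np

def r_square(X, y):
    """R-squared from regressing y on the columns of X plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - float(residuals @ residuals) / float(((y - y.mean()) ** 2).sum())

def vif_table(data):
    """data maps predictor name -> 1-D array. Returns name -> VIF."""
    table = {}
    for name in data:
        # The predictor being examined plays the role of the dependent variable.
        others = np.column_stack([data[o] for o in data if o != name])
        table[name] = 1.0 / (1.0 - r_square(others, data[name]))
    return table

# Hypothetical predictors: x2 is almost collinear with x1; x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
data = {
    "x1": x1,
    "x2": x1 + 0.1 * rng.normal(size=50),
    "x3": rng.normal(size=50),
}
vifs = vif_table(data)
```

Here `x1` and `x2` receive VIFs far above the guideline of 10, while `x3` stays close to 1, the value for a predictor unrelated to the others.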
36
Correlations among Oil Production Predictor Variables
                     Energy Consumption   Nuclear   Coal     Dry Gas   Fuel Rate
Energy Consumption    1                    0.856     0.791    0.057     0.796
Nuclear               0.856                1         0.952   -0.404     0.972
Coal                  0.791                0.952     1       -0.448     0.968
Dry Gas               0.057               -0.404    -0.448    1        -0.423
Fuel Rate             0.796                0.972     0.968   -0.423     1
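The correlation-matrix screen can be automated by flagging every pair whose absolute correlation exceeds 0.70. The sketch below uses the ten correlations from this slide. Because the slide's layout is flattened in this transcript, the assignment of values to variable pairs is an assumption, in particular the sign and position of the 0.796 entry.

```python
from itertools import combinations

names = ["Energy Consumption", "Nuclear", "Coal", "Dry Gas", "Fuel Rate"]

# Correlations from the slide; the pair assignment is an assumed reading.
corr = [
    [1.000, 0.856, 0.791, 0.057, 0.796],
    [0.856, 1.000, 0.952, -0.404, 0.972],
    [0.791, 0.952, 1.000, -0.448, 0.968],
    [0.057, -0.404, -0.448, 1.000, -0.423],
    [0.796, 0.972, 0.968, -0.423, 1.000],
]

def flag_pairs(names, corr, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    return [
        (names[i], names[j], corr[i][j])
        for i, j in combinations(range(len(names)), 2)
        if abs(corr[i][j]) > threshold
    ]

flagged = flag_pairs(names, corr)
for a, b, r in flagged:
    print(f"{a} / {b}: r = {r}")
```

Under this reading, every pair among energy consumption, nuclear, coal and fuel rate is flagged, and only dry gas stays below the 0.70 guideline with each of the others.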
37
Problem of Interpretation When Multicollinearity Exists: World Crude Oil Production Regression
- The algebraic signs in a regression model must conform to common-sense observation or established theory.
- Note the following three equations considered at different stages of the stepwise regression analysis [numeric estimates not preserved]:
  1. Ŷ = b0 + b1(fuel rate), with b1 positive. The positive fuel rate coefficient can be interpreted in terms of economic theory: the price substitution effect.
  2. Ŷ = b0 + b1(coal), with b1 positive. The positive coal coefficient is explainable in a complementary sense.
  3. Ŷ = b0 + b1(coal) - b2(fuel rate). The negative fuel rate coefficient is opposite in sign to that in equation 1 and is contrary to what would normally be expected from economic theory or common-sense observation.
- The apparent contradiction in equation 3 can be attributed to multicollinearity (VIF = 31) [R² value not preserved].
38
Copyright Notice
Copyright © 2010 John Wiley & Sons Canada, Ltd. All rights reserved. Reproduction or translation of this work beyond that permitted by Access Copyright (The Canadian Copyright Licensing Agency) is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information herein.