Chapter 13 Additional Topics in Regression Analysis


1 Chapter 13 Additional Topics in Regression Analysis
Statistics for Business and Economics, 8th Edition. Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall

2 Chapter Goals
After completing this chapter, you should be able to:
- Explain regression model-building methodology
- Apply dummy variables for categorical variables with more than two categories
- Explain how dummy variables can be used in experimental design models
- Incorporate lagged values of the dependent variable as regressors
- Describe specification bias and multicollinearity
- Examine residuals for heteroscedasticity and autocorrelation

3 Model-Building Methodology
13.1 The stages of statistical model building: Model Specification, Coefficient Estimation, Model Verification, and Interpretation and Inference.
Stage 1, Model Specification:
- Understand the problem to be studied
- Select the dependent and independent variables
- Identify the model form (linear, quadratic, ...)
- Determine the data required for the study

4 The Stages of Model Building
(continued) Stage 2, Coefficient Estimation:
- Estimate the regression coefficients using the available data
- Form confidence intervals for the regression coefficients
- For prediction, the goal is the smallest standard error of the estimate, s_e
- If estimating individual slope coefficients, examine the model for multicollinearity and specification bias

5 The Stages of Model Building
(continued) Stage 3, Model Verification:
- Logically evaluate the regression results in light of the model (e.g., are the coefficient signs correct?)
- Are any coefficients biased or illogical?
- Evaluate the regression assumptions (e.g., are the residuals random and independent?)
- If any problems are suspected, return to model specification and adjust the model

6 The Stages of Model Building
(continued) Stage 4, Interpretation and Inference:
- Interpret the regression results in the setting and units of your study
- Form confidence intervals or test hypotheses about the regression coefficients
- Use the model for forecasting or prediction

7 Multiple Regression Analysis Application Procedure
12.9 (review) Errors (residuals) from the regression model: e_i = y_i − ŷ_i
Assumptions:
- The errors are normally distributed
- The errors have a constant variance
- The model errors are independent
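The residual definition above is a one-line computation; a tiny illustration in Python (all numbers invented for the example):

```python
# Residuals are observed values minus fitted values: e_i = y_i - y_hat_i
y_obs = [10.0, 12.5, 15.2, 14.1]   # observed y_i (invented)
y_hat = [10.4, 12.1, 15.0, 14.5]   # fitted y_hat_i (invented)

e = [yo - yh for yo, yh in zip(y_obs, y_hat)]
print(e)   # one residual per observation
```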

8 Analysis of Residuals
These residual plots are used in multiple regression:
- Residuals vs. ŷ_i
- Residuals vs. x_1i
- Residuals vs. x_2i
- Residuals vs. time (if time-series data)
Use the residual plots to check for violations of the regression assumptions

9 Dummy Variables and Experimental Design
13.2 Dummy variables can be used when the categorical variable of interest has more than two categories. They are also useful in experimental design:
- Experimental design is used to identify possible causes of variation in the value of the dependent variable
- Y outcomes are measured at specific combinations of levels for treatment and blocking variables
- The goal is to determine how the different treatments influence the Y outcome

10 Dummy Variable Models (More than 2 Levels)
Consider a categorical variable with K levels. The number of dummy variables needed is one less than the number of levels, K − 1.
Example: y = house price; x1 = square feet. If the style of the house is also thought to matter:
Style = ranch, split level, condo
Three levels, so two dummy variables are needed

11 Dummy Variable Models (More than 2 Levels)
(continued) Example: Let "condo" be the default category, and let x2 and x3 indicate the other two categories:
y = house price
x1 = square feet
x2 = 1 if ranch, 0 otherwise
x3 = 1 if split level, 0 otherwise
The multiple regression equation is:
ŷ = b0 + b1 x1 + b2 x2 + b3 x3
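The K − 1 coding above can be sketched in a few lines of Python (the function and variable names here are mine, not the slide's):

```python
# Encode a K-level categorical variable as K - 1 dummies, omitting the
# default category (here "condo"), as described on the slide.

def make_dummies(value, levels, default):
    """Return K-1 dummy indicators for `value`, one per non-default level."""
    return [1 if value == lvl else 0 for lvl in levels if lvl != default]

levels = ["condo", "ranch", "split level"]

# x2 = 1 if ranch, x3 = 1 if split level; a condo is all zeros
print(make_dummies("ranch", levels, default="condo"))        # [1, 0]
print(make_dummies("split level", levels, default="condo"))  # [0, 1]
print(make_dummies("condo", levels, default="condo"))        # [0, 0]
```

The default category never gets a column of its own; it is represented by all dummies being zero, which is what makes the coefficients b2 and b3 differences relative to a condo.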

12 Interpreting the Dummy Variable Coefficients (with 3 Levels)
Consider the regression equation ŷ = b0 + b1 x1 + b2 x2 + b3 x3:
- For a condo: x2 = x3 = 0
- For a ranch: x2 = 1, x3 = 0. With the same square feet, a ranch has an estimated average price b2 thousand dollars higher than a condo
- For a split level: x2 = 0, x3 = 1. With the same square feet, a split level has an estimated average price b3 thousand dollars higher than a condo

13 Experimental Design
Consider an experiment in which four treatments will be used, and the outcome also depends on three environmental factors that cannot be controlled by the experimenter:
- Let z1 denote the treatment, where z1 = 1, 2, 3, or 4
- Let z2 denote the environmental factor (the "blocking variable"), where z2 = 1, 2, or 3
- To model the four treatments, three dummy variables are needed
- To model the three environmental factors, two dummy variables are needed

14 Experimental Design (continued)
Define five dummy variables: x1, x2, x3, x4, and x5
Let treatment level 1 be the default (z1 = 1):
- x1 = 1 if z1 = 2, x1 = 0 otherwise
- x2 = 1 if z1 = 3, x2 = 0 otherwise
- x3 = 1 if z1 = 4, x3 = 0 otherwise
Let environment level 1 be the default (z2 = 1):
- x4 = 1 if z2 = 2, x4 = 0 otherwise
- x5 = 1 if z2 = 3, x5 = 0 otherwise

15 Experimental Design: Dummy Variable Tables
The dummy variable values can be summarized in tables:

z1 : x1 x2 x3        z2 : x4 x5
 1 :  0  0  0         1 :  0  0
 2 :  1  0  0         2 :  1  0
 3 :  0  1  0         3 :  0  1
 4 :  0  0  1

16 Experimental Design Model
The experimental design model can be estimated using the equation
y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε
The estimated value of β2, for example, shows the amount by which the y value for treatment 3 exceeds the value for treatment 1

17 Example: demand for pizza brand B1 vs. three other brands B2, B3, B4


20 Exercise 13.3: predict cost per unit produced
as a function of factory type (classic technology, computer-controlled machines, computer-controlled materials) and country (Colombia, South Africa, Japan)

21 Solution

22 Specification Bias
13.4 Suppose an important independent variable z is omitted from a regression model:
- If z is uncorrelated with all the other included independent variables, the influence of z is left unexplained and is absorbed by the error term, ε
- But if there is any correlation between z and any of the included independent variables, some of the influence of z is captured in the coefficients of the included variables

23 Specification Bias (continued)
- If some of the influence of the omitted variable z is captured in the coefficients of the included independent variables, then those coefficients are biased...
- ...and the usual inferential statements from hypothesis tests or confidence intervals can be seriously misleading
- In addition, the estimated model error will include the effect of the missing variable(s) and will be larger

24 Exercise 13.21: Y = miles per gallon. Independent variables:
vehicle horsepower and vehicle weight. Suppose the weight variable is omitted; what can be concluded about the coefficient of horsepower?

25 Solution

26 Solution (cont.)

27 Solution (cont.)

28 Multicollinearity
13.5 Collinearity: high correlation exists among two or more independent variables. This means the correlated variables contribute redundant information to the multiple regression model

29 Multicollinearity (continued)
Including two highly correlated explanatory variables can adversely affect the regression results:
- No new information is provided
- Can lead to unstable coefficients (large standard errors and low t-values)
- Coefficient signs may not match prior expectations

30 Indicators of Multicollinearity
- Coefficients differ from the values expected by theory or experience, or have incorrect signs
- Coefficients of variables believed to be a strong influence have small t statistics, indicating that their values do not differ from 0
- All the coefficient t statistics are small, indicating no individual effect, but the overall F statistic indicates a strong effect for the total regression model
- High correlations exist between individual independent variables, or one or more of the independent variables have a strong linear regression relationship with the other independent variables, or a combination of both

31 Detecting Multicollinearity
- Examine the simple correlation matrix to determine whether strong correlation exists between any of the model's independent variables
- Look for a large change in the value of a previous coefficient when a new variable is added to the model
- Does a previously significant variable become insignificant when a new independent variable is added?
- Does the estimate of the standard deviation of the model increase when a variable is added to the model?
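The first check above (scanning the correlation matrix) is easy to sketch in Python; the data and threshold here are invented for illustration:

```python
# Flag pairs of candidate regressors whose sample correlation is high,
# a first screen for possible multicollinearity.

def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]   # nearly 2 * x1: collinear with x1
x3 = [5, 1, 4, 2, 6, 3]                 # unrelated to x1 and x2

data = {"x1": x1, "x2": x2, "x3": x3}
names = list(data)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson_r(data[names[i]], data[names[j]])
        if abs(r) > 0.9:   # illustrative cutoff, not a formal test
            print(f"possible collinearity: {names[i]}, {names[j]} (r = {r:.3f})")
```

A high pairwise correlation is only a screen: collinearity can also involve a linear relationship among three or more variables that no single pairwise r reveals, which is why the slide also suggests watching coefficient changes as variables are added.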

32 Corrections for Multicollinearity
- Remove one or more of the highly correlated independent variables. But, as shown in Section 13.4, this might lead to bias in coefficient estimation.
- Change the model specification, possibly including a new independent variable that is a function of several correlated independent variables.
- Obtain additional data that do not have the same strong correlations between the independent variables.

33 Heteroscedasticity
13.6
- Homoscedasticity: the probability distribution of the errors has constant variance
- Heteroscedasticity: the error terms do not all have the same variance; the size of the error variances may depend on the size of the dependent variable value, for example

34 Heteroscedasticity (continued)
When heteroscedasticity is present:
- Least squares is not the most efficient procedure for estimating the regression coefficients
- The usual procedures for deriving confidence intervals and tests of hypotheses are not valid

35 Residual Analysis for Homoscedasticity
(Two residuals-vs-x plots, shown as images in the original: one illustrating constant variance, one illustrating non-constant variance)

36 Tests for Heteroscedasticity
To test the null hypothesis that the error terms εi all have the same variance against the alternative that their variances depend on the expected values:
- Estimate the simple regression of the squared residuals e_i² on the predicted values ŷ_i
- Let R² be the coefficient of determination of this new regression
- The null hypothesis is rejected if nR² is greater than χ²(1, α), the critical value of the chi-square random variable with 1 degree of freedom and probability of error α
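A sketch of this test in Python, assuming the common form in which the squared residuals are regressed on the fitted values (all data below are invented, with the error spread deliberately growing with ŷ; 3.841 is the chi-square critical value with 1 degree of freedom at α = 0.05):

```python
# nR^2 test for heteroscedasticity: regress e_i^2 on y_hat_i, then compare
# n * R^2 with the chi-square(1) critical value.

def simple_ols(x, y):
    """Intercept and slope of the least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
         sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    b0, b1 = simple_ols(x, y)
    my = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst

# Invented fitted values and residuals whose spread grows with y_hat
y_hat = [2, 4, 6, 8, 10, 12, 14, 16]
e     = [0.1, -0.2, 0.4, -0.5, 0.9, -1.1, 1.4, -1.6]

e_sq = [ei ** 2 for ei in e]
n = len(e)
stat = n * r_squared(y_hat, e_sq)
print(f"nR^2 = {stat:.2f}, reject homoscedasticity at 5%: {stat > 3.841}")
```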

37 Exercise 13.26: Retail sales vs. disposable income per household

38 Solution


40 Exercise 13.29: Home prices vs. family variables, mainly income

41 Solution

42 Solution (cont.)

43 Solution (cont.)

44 Solution (cont.)

45 Variable-Selection Procedures
Stepwise Regression At each iteration, the first consideration is to see whether the least significant variable currently in the model can be removed because its F value, FMIN, is less than the user-specified or default F value, FREMOVE. If no variable can be removed, the procedure checks to see whether the most significant variable not in the model can be added because its F value, FMAX, is greater than the user-specified or default F value, FENTER. If no variable can be removed and no variable can be added, the procedure stops.

46 Variable-Selection Procedures
Forward Selection: This procedure is similar to stepwise regression but does not permit a variable to be deleted. The forward-selection procedure starts with no independent variables and adds variables one at a time as long as a significant reduction in the error sum of squares (SSE) can be achieved.
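The forward-selection idea can be sketched as follows (illustrative code, not the textbook's; for simplicity a variable is "significant" here when it cuts SSE by more than a fixed share `tol` of the total sum of squares, where real packages use an F test):

```python
# Greedy forward selection: repeatedly add the candidate variable that most
# reduces SSE, stopping when the best reduction is too small to matter.

def ols_sse(X, y):
    """SSE from least squares of y on the columns of X (intercept added)."""
    n = len(y)
    Xa = [[1.0] + list(row) for row in X]
    p = len(Xa[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination
    A = [[sum(Xa[i][r] * Xa[i][c] for i in range(n)) for c in range(p)]
         + [sum(Xa[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    b = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(bk * xk for bk, xk in zip(b, row)) for row in Xa]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def forward_select(cols, y, tol=0.01):
    """Add variables while the best addition cuts SSE by more than tol*SST."""
    my = sum(y) / len(y)
    sst = sum((yi - my) ** 2 for yi in y)
    chosen, sse, remaining = [], sst, set(cols)
    while remaining:
        trials = {name: ols_sse([[cols[c][i] for c in chosen + [name]]
                                 for i in range(len(y))], y)
                  for name in remaining}
        best = min(trials, key=trials.get)
        if sse - trials[best] <= tol * sst:
            break                        # no significant reduction: stop
        chosen.append(best)
        sse = trials[best]
        remaining.remove(best)
    return chosen

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]               # correlated with x1
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
y = [3 * a + e for a, e in zip(x1, noise)]  # y really depends only on x1
print(forward_select({"x1": x1, "x2": x2}, y))
```

On this invented data the procedure adds x1 and then stops: once x1 is in the model, the correlated x2 no longer reduces SSE appreciably, which is exactly the behavior the slide describes.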

47 Variable-Selection Procedures
Backward Elimination This procedure begins with a model that includes all the independent variables the modeler wants considered. It then attempts to delete one variable at a time by determining whether the least significant variable currently in the model can be removed because its F value, FMIN, is less than the user-specified or default F value, FREMOVE. Once a variable has been removed from the model it cannot reenter at a subsequent step.

48 Variable-Selection Procedures
Best-Subsets Regression: The three preceding procedures are one-variable-at-a-time methods and offer no guarantee that the best model for a given number of variables will be found. Some software packages include best-subsets regression, which enables the user to find, for a specified number of independent variables, the best regression model. Minitab output identifies the two best one-variable estimated regression equations, the two best two-variable equations, and so on.

49 Exercise 11.81: Data file Student GPA. Variables:
GPA, Sex, SATverb (SAT verbal), SATmath, ACTeng (ACT English), ACTmath, ACTss (ACT social science), ACTcomp (ACT composite), HSPct (high school performance), EconGPA (GPA in undergraduate economics courses)

50 Aim: what determines EconGPA?
Math scores: SATmath, ACTmath; verbal scores: SATverb, ACTverb
a) Scatter plots of the dependent variable vs. the independent variables
b) What is the best predictor of EconGPA?
c) Descriptive statistics of the variables


61 Exercise 12.99: a) Use the SAT variables and class rank
b) Use the ACT variables and class rank
c) Which model is better?

62 12.99


72 Exercise 12.108: ACT variables, gender, and high school performance
a) Develop a model
b) Describe your strategy
c) What would you advise the school directors?

73 12.108


78 Chapter Summary
- Discussed regression model building
- Introduced dummy variables for more than two categories and for experimental design
- Used lagged values of the dependent variable as regressors
- Discussed specification bias and multicollinearity
- Described heteroscedasticity
- Defined autocorrelation and used the Durbin-Watson test to detect positive and negative autocorrelation

79 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

