Slide 1
• Slides by John Loucks, St. Edward's University
• © 2011 Cengage Learning. All Rights Reserved.

Slide 2: Chapter 16 Regression Analysis: Model Building
• Multiple Regression Approach to Experimental Design
• General Linear Model
• Determining When to Add or Delete Variables
• Variable Selection Procedures
• Autocorrelation and the Durbin-Watson Test

Slide 3: General Linear Model
• A general linear model involving p independent variables is
  y = β₀ + β₁z₁ + β₂z₂ + ... + βₚzₚ + ε
• Each of the independent variables z is a function of x₁, x₂, ..., x_k (the variables for which data have been collected).
• Models in which the parameters (β₀, β₁, ..., βₚ) all have exponents of one are called linear models.

Slide 4: General Linear Model
• The simplest case is when we have collected data for just one variable x₁ and want to estimate y by using a straight-line relationship. In this case z₁ = x₁, and the model is
  E(y) = β₀ + β₁x₁
• This model is called a simple first-order model with one predictor variable.
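A minimal Python sketch of fitting this simple first-order model with statsmodels is shown below; the data are synthetic and the variable names are illustrative, not taken from the slides.

import numpy as np
import statsmodels.api as sm

# Synthetic data for a straight-line relationship y = 5 + 2*x1 + error
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=30)
y = 5 + 2 * x1 + rng.normal(scale=1.0, size=30)

X = sm.add_constant(x1)      # adds the intercept column (b0)
fit = sm.OLS(y, X).fit()     # least squares estimates of b0 and b1
print(fit.params)            # estimated intercept and slope
print(fit.rsquared)          # coefficient of determination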

Slide 5: Modeling Curvilinear Relationships
• To account for a curvilinear relationship, we might set z₁ = x₁ and z₂ = x₁², giving
  E(y) = β₀ + β₁x₁ + β₂x₁²
• This model is called a second-order model with one predictor variable.

Slide 6: Interaction
• If the original data set consists of observations for y and two independent variables x₁ and x₂, we might develop a second-order model with two predictor variables:
  E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂
• In this model, the variable z₅ = x₁x₂ is added to account for the potential effects of the two variables acting together.
• This type of effect is called interaction.
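As an illustration of building the z variables, the sketch below fits a second-order model with two predictors and an interaction term using the statsmodels formula interface; the data frame and coefficients are synthetic assumptions.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic observations for y, x1, x2
rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.uniform(0, 5, 40), "x2": rng.uniform(0, 5, 40)})
df["y"] = 2 + df.x1 + 0.5 * df.x2 + 0.3 * df.x1**2 + 0.8 * df.x1 * df.x2 \
          + rng.normal(scale=0.5, size=40)

# The formula terms create z3 = x1^2, z4 = x2^2, and z5 = x1*x2 (interaction)
fit = smf.ols("y ~ x1 + x2 + I(x1**2) + I(x2**2) + x1:x2", data=df).fit()
print(fit.summary())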

Slide 7: Transformations Involving the Dependent Variable
• Often the problem of nonconstant variance can be corrected by transforming the dependent variable to a different scale.
• Most statistical packages provide the ability to apply logarithmic transformations using either base 10 (the common log) or base e = 2.71828... (the natural log).
• Another approach, called a reciprocal transformation, is to use 1/y as the dependent variable instead of y.

Slide 8: Nonlinear Models That Are Intrinsically Linear
• Models in which the parameters (β₀, β₁, ..., βₚ) have exponents other than one are called nonlinear models.
• In some cases we can perform a transformation of variables that will enable us to use regression analysis with the general linear model.
• The exponential model involves the regression equation E(y) = β₀β₁ˣ.
• We can transform this nonlinear model to a linear model by taking the logarithm of both sides: log E(y) = log β₀ + x log β₁.
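A sketch of the log transformation for the exponential model, again on synthetic data: regressing log(y) on x recovers log β₀ and log β₁, which are then back-transformed.

import numpy as np
import statsmodels.api as sm

# Synthetic data from E(y) = b0 * b1**x with multiplicative error
rng = np.random.default_rng(2)
x = np.linspace(1, 10, 25)
y = 3.0 * 1.4**x * np.exp(rng.normal(scale=0.1, size=25))

fit = sm.OLS(np.log(y), sm.add_constant(x)).fit()   # linear in log(y)
b0_hat = np.exp(fit.params[0])    # back-transformed intercept, near 3.0
b1_hat = np.exp(fit.params[1])    # back-transformed base, near 1.4
print(b0_hat, b1_hat)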

Slide 9: Determining When to Add or Delete Variables
• To test whether the addition of x₂ to a model involving x₁ (or the deletion of x₂ from a model involving x₁ and x₂) is statistically significant, we can perform an F test.
• The F test is based on the amount of reduction in the error sum of squares that results from adding one or more independent variables to the model.

Slide 10: Determining When to Add or Delete Variables
• The p-value criterion can also be used to determine whether it is advantageous to add one or more independent variables to a multiple regression model.
• The p-value associated with the computed F statistic can be compared to the level of significance α.
• It is difficult to determine the p-value directly from tables of the F distribution, but computer software packages, such as Minitab or Excel, provide the p-value.
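For example, the F test for adding x₂ to a model that already contains x₁ can be carried out with statsmodels' anova_lm on two nested fitted models; the data below are synthetic and the column names are illustrative.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Synthetic data in which both x1 and x2 contribute to y
rng = np.random.default_rng(3)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["y"] = 1 + 2 * df.x1 + 0.8 * df.x2 + rng.normal(size=50)

reduced = smf.ols("y ~ x1", data=df).fit()      # model with x1 only
full = smf.ols("y ~ x1 + x2", data=df).fit()    # model with x1 and x2

# F statistic based on the reduction in SSE from adding x2, with its p-value
print(anova_lm(reduced, full))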

Slide 11: Variable Selection Procedures
• Stepwise Regression, Forward Selection, and Backward Elimination: iterative procedures in which one independent variable at a time is added or deleted, based on the F statistic.
• Best-Subsets Regression: different subsets of the independent variables are evaluated.
• The first three procedures are heuristics and therefore offer no guarantee that the best model will be found.

Slide 12: Variable Selection: Stepwise Regression
• At each iteration, the first consideration is whether the least significant variable currently in the model can be removed because its F value is less than the user-specified or default alpha to remove.
• If no variable can be removed, the procedure checks whether the most significant variable not in the model can be added because its F value is greater than the user-specified or default alpha to enter.
• If no variable can be removed and no variable can be added, the procedure stops.

Slide 13: Variable Selection: Stepwise Regression (flowchart)
1. Start with no independent variables in the model.
2. Compute the F statistic and p-value for each independent variable in the model. If any p-value is greater than alpha to remove, remove the independent variable with the largest p-value and go to the next iteration (step 2).
3. Otherwise, compute the F statistic and p-value for each independent variable not in the model. If any p-value is less than alpha to enter, enter the independent variable with the smallest p-value and go to the next iteration (step 2).
4. If no variable can be removed or entered, stop.
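The flowchart can be sketched in a few lines of Python. The helper below is a simplified illustration, not Minitab's implementation: it assumes a pandas DataFrame df with a response column named "y", and it uses p-values from statsmodels fits with user-chosen alpha-to-enter and alpha-to-remove thresholds.

import statsmodels.api as sm

def stepwise(df, response="y", alpha_enter=0.05, alpha_remove=0.05):
    candidates = [c for c in df.columns if c != response]
    selected = []
    while True:
        changed = False
        # Removal step: largest p-value among variables currently in the model
        if selected:
            fit = sm.OLS(df[response], sm.add_constant(df[selected])).fit()
            pvals = fit.pvalues.drop("const")
            worst = pvals.idxmax()
            if pvals[worst] > alpha_remove:
                selected.remove(worst)
                changed = True
        # Entry step: smallest p-value among variables not yet in the model
        if not changed:
            best, best_p = None, 1.0
            for var in candidates:
                if var in selected:
                    continue
                fit = sm.OLS(df[response],
                             sm.add_constant(df[selected + [var]])).fit()
                if fit.pvalues[var] < best_p:
                    best, best_p = var, fit.pvalues[var]
            if best is not None and best_p < alpha_enter:
                selected.append(best)
                changed = True
        if not changed:          # nothing removed or entered: stop
            return selected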

Slide 14: Variable Selection: Forward Selection
• This forward-selection procedure starts with no independent variables.
• It adds variables one at a time as long as a significant reduction in the error sum of squares (SSE) can be achieved.
• The procedure is similar to stepwise regression, but it does not permit a variable to be deleted.

Slide 15: Variable Selection: Forward Selection (flowchart)
1. Start with no independent variables in the model.
2. Compute the F statistic and p-value for each independent variable not in the model.
3. If any p-value is less than alpha to enter, enter the independent variable with the smallest p-value and return to step 2; otherwise, stop.

Slide 16: Variable Selection: Backward Elimination
• This procedure begins with a model that includes all the independent variables the modeler wants considered.
• It then attempts to delete one variable at a time by determining whether the least significant variable currently in the model can be removed because its p-value is greater than the user-specified or default alpha to remove.
• Once a variable has been removed from the model, it cannot reenter at a subsequent step.

Slide 17: Variable Selection: Backward Elimination (flowchart)
1. Start with all independent variables in the model.
2. Compute the F statistic and p-value for each independent variable in the model.
3. If any p-value is greater than alpha to remove, remove the independent variable with the largest p-value and return to step 2; otherwise, stop.
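A correspondingly compact sketch of the backward elimination loop, again using p-values from statsmodels fits; the DataFrame, response name, and threshold are assumptions for illustration.

import statsmodels.api as sm

def backward_eliminate(df, response="y", alpha_remove=0.05):
    selected = [c for c in df.columns if c != response]
    while selected:
        fit = sm.OLS(df[response], sm.add_constant(df[selected])).fit()
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] > alpha_remove:
            selected.remove(worst)   # removed variables never reenter
        else:
            break                    # all remaining p-values are acceptable
    return selected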

Slide 18: Variable Selection: Backward Elimination
• Example: Clarksville Homes
  Tony Zamora, a real estate investor, has just moved to Clarksville and wants to learn about the city's residential real estate market. Tony has randomly selected 25 house-for-sale listings from the Sunday newspaper and collected the data partially listed on the next slide.
  Using the backward elimination procedure, develop a multiple regression model to predict the selling price of a house in Clarksville.

Slide 19: Variable Selection: Backward Elimination
• Partial Data (the table of the 25 selected listings is not reproduced in this transcript).

Slide 20: Variable Selection: Backward Elimination
• Regression Output (not reproduced here): the greatest p-value is greater than .05, so the corresponding variable is to be removed.

Slide 21: Variable Selection: Backward Elimination
• Cars (garage size) is the independent variable with the highest p-value (.697), which is greater than .05.
• The Cars variable is removed from the model.
• Multiple regression is performed again on the remaining independent variables.

Slide 22: Variable Selection: Backward Elimination
• Regression Output (not reproduced here): the greatest p-value is greater than .05, so the corresponding variable is to be removed.

Slide 23: Variable Selection: Backward Elimination
• Bedrooms is the independent variable with the highest p-value (.281), which is greater than .05.
• The Bedrooms variable is removed from the model.
• Multiple regression is performed again on the remaining independent variables.

Slide 24: Variable Selection: Backward Elimination
• Regression Output (not reproduced here): the greatest p-value is greater than .05, so the corresponding variable is to be removed.

Slide 25: Variable Selection: Backward Elimination
• Bathrooms is the independent variable with the highest p-value (.110), which is greater than .05.
• The Bathrooms variable is removed from the model.
• Multiple regression is performed again on the remaining independent variable.

Slide 26: Variable Selection: Backward Elimination
• Regression Output (not reproduced here): the greatest p-value is now less than .05, so no further variables are removed.

Slide 27: Variable Selection: Backward Elimination
• House size is the only independent variable remaining in the model.
• The estimated regression equation is given on the slide but is not reproduced in this transcript.

Slide 28: Variable Selection: Best-Subsets Regression
• The three preceding procedures are one-variable-at-a-time methods offering no guarantee that the best model for a given number of variables will be found.
• Some software packages include best-subsets regression, which enables the user to find, for a specified number of independent variables, the best regression model.
• Minitab output identifies the two best one-variable estimated regression equations, the two best two-variable equations, and so on.
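Best-subsets regression can be sketched by exhaustively fitting every subset of the candidate variables; the helper below ranks subsets of each size by adjusted R-squared rather than by Minitab's C-p statistic, and the DataFrame and response name are illustrative assumptions. With k candidates this fits 2^k - 1 models, so it is practical only for small k.

from itertools import combinations
import statsmodels.api as sm

def best_subsets(df, response="y"):
    candidates = [c for c in df.columns if c != response]
    best = {}
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            fit = sm.OLS(df[response],
                         sm.add_constant(df[list(subset)])).fit()
            if size not in best or fit.rsquared_adj > best[size][1]:
                best[size] = (subset, fit.rsquared_adj)
    return best   # best subset and its adjusted R-squared for each size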

Slide 29: Variable Selection: Best-Subsets Regression
• Example: PGA Tour Data
  The Professional Golfers Association keeps a variety of statistics regarding performance measures. Data include the average driving distance, percentage of drives that land in the fairway, percentage of greens hit in regulation, average number of putts, percentage of sand saves, and average score.

Slide 30: Variable-Selection Procedures
• Variable names and definitions:
  Drive: average length of a drive in yards
  Fair: percentage of drives that land in the fairway
  Green: percentage of greens hit in regulation (a par-3 green is "hit in regulation" if the player's first shot lands on the green)
  Putt: average number of putts for greens that have been hit in regulation
  Sand: percentage of sand saves (landing in a sand trap and still scoring par or better)
  Score: average score for an 18-hole round

Slide 31: Variable-Selection Procedures
• Sample Data (Part 1): columns Drive, Fair, Green, Putt, Sand, Score (values not reproduced in this transcript).

Slide 32: Variable-Selection Procedures
• Sample Data (Part 2): columns Drive, Fair, Green, Putt, Sand, Score (values not reproduced in this transcript).

Slide 33: Variable-Selection Procedures
• Sample Data (Part 3): columns Drive, Fair, Green, Putt, Sand, Score (values not reproduced in this transcript).

Slide 34: Variable-Selection Procedures
• Sample correlation coefficients among Score, Drive, Fair, Green, Putt, and Sand (values not reproduced in this transcript).

Slide 35: Variable-Selection Procedures
• Best-subsets regression of Score: for each number of variables, the output reports R-sq, R-sq(adj), C-p, s, and which of Drive, Fair, Green, Putt, and Sand are included (values not reproduced in this transcript).

Slide 36: Variable-Selection Procedures
• Minitab output: the estimated regression equation expresses Score in terms of Drive, Fair, Green, and Putt (coefficients not reproduced in this transcript).
• s = .2691, R-sq = 72.4%, R-sq(adj) = 66.8%
• The predictor table (Coef, Stdev, t-ratio, p) is not reproduced in this transcript.

Slide 37: Variable-Selection Procedures
• Minitab output: analysis of variance table (SOURCE, DF, SS, MS, F, P for Regression, Error, and Total; values not reproduced in this transcript).

Slide 38: Multiple Regression Approach to Experimental Design
• The use of dummy variables in a multiple regression equation can provide another approach to solving analysis of variance and experimental design problems.
• We will use the results of multiple regression to perform the ANOVA test on the difference in the means of three populations.

Slide 39: Multiple Regression Approach to Experimental Design
• Example: Reed Manufacturing
  Janet Reed would like to know if there is any significant difference in the mean number of hours worked per week by the department managers at her three manufacturing plants (in Buffalo, Pittsburgh, and Detroit). A simple random sample of five managers from each of the three plants was taken, and the number of hours worked by each manager during the previous week is shown on the next slide.

Slide 40: Multiple Regression Approach to Experimental Design
• Sample data: observations, sample mean, and sample variance for Plant 1 (Buffalo), Plant 2 (Pittsburgh), and Plant 3 (Detroit) (values not reproduced in this transcript).

Slide 41: Multiple Regression Approach to Experimental Design
• We begin by defining two dummy variables, A and B, that indicate the plant from which each sample observation was selected:
  A = 0, B = 0 if the observation is from the Buffalo plant
  A = 1, B = 0 if the observation is from the Pittsburgh plant
  A = 0, B = 1 if the observation is from the Detroit plant
• In general, if there are k populations, we need to define k - 1 dummy variables.

Slide 42: Multiple Regression Approach to Experimental Design
• Input data: y (hours worked) with dummy variables A and B for each observation from the Buffalo, Pittsburgh, and Detroit plants (values not reproduced in this transcript).

Slide 43: Multiple Regression Approach to Experimental Design
• E(y) = expected number of hours worked = β₀ + β₁A + β₂B
• For Buffalo: E(y) = β₀ + β₁(0) + β₂(0) = β₀
• For Pittsburgh: E(y) = β₀ + β₁(1) + β₂(0) = β₀ + β₁
• For Detroit: E(y) = β₀ + β₁(0) + β₂(1) = β₀ + β₂

Slide 44: Multiple Regression Approach to Experimental Design
• Excel produced the estimated regression equation ŷ = 55 + 13A + 2B.
• Plant estimates of E(y):
  Buffalo: b₀ = 55
  Pittsburgh: b₀ + b₁ = 55 + 13 = 68
  Detroit: b₀ + b₂ = 55 + 2 = 57

Slide 45: Multiple Regression Approach to Experimental Design
• Next, we observe that if there is no difference in the means:
  E(y) for the Pittsburgh plant - E(y) for the Buffalo plant = 0
  E(y) for the Detroit plant - E(y) for the Buffalo plant = 0

Slide 46: Multiple Regression Approach to Experimental Design
• Because β₀ equals E(y) for the Buffalo plant and β₀ + β₁ equals E(y) for the Pittsburgh plant, the first difference is equal to (β₀ + β₁) - β₀ = β₁.
• Because β₀ + β₂ equals E(y) for the Detroit plant, the second difference is equal to (β₀ + β₂) - β₀ = β₂.
• We would conclude that there is no difference in the three means if β₁ = 0 and β₂ = 0.

Slide 47: Multiple Regression Approach to Experimental Design
• The null hypothesis for a test of the difference of means is H₀: β₁ = β₂ = 0.
• To test this null hypothesis, we must compare the value of MSR/MSE to the critical value from an F distribution with the appropriate numerator and denominator degrees of freedom.
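The test can be reproduced with a short regression sketch. The individual observations are not listed in this transcript, so the values below are illustrative only, chosen to be consistent with the plant mean estimates (55, 68, 57) and the F statistic quoted on the following slides; they should not be read as the official data set.

import numpy as np
import statsmodels.api as sm
from scipy import stats

hours = np.array([48, 54, 57, 54, 62,      # Buffalo (illustrative values)
                  73, 63, 66, 64, 74,      # Pittsburgh (illustrative values)
                  51, 63, 61, 54, 56])     # Detroit (illustrative values)
A = np.array([0]*5 + [1]*5 + [0]*5)        # Pittsburgh indicator
B = np.array([0]*5 + [0]*5 + [1]*5)        # Detroit indicator

X = sm.add_constant(np.column_stack([A, B]))
fit = sm.OLS(hours, X).fit()
print(fit.params)                 # b0, b1, b2: plant means b0, b0+b1, b0+b2
print(fit.fvalue, fit.f_pvalue)   # overall F = MSR/MSE and its p-value

# Critical value of F with 2 and 12 degrees of freedom at alpha = .05
print(stats.f.ppf(0.95, dfn=2, dfd=12))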

Slide 48: Multiple Regression Approach to Experimental Design
• ANOVA table produced by Excel (Source of Variation: Regression, Error, Total; the Sum of Squares, Degrees of Freedom, and Mean Squares columns are not reproduced in this transcript): F = 9.55, p-value = .003.

Slide 49: Multiple Regression Approach to Experimental Design
• At a .05 level of significance, the critical value of F with k - 1 = 3 - 1 = 2 numerator d.f. and n_T - k = 15 - 3 = 12 denominator d.f. is 3.89.
• Because the observed value of F (9.55) is greater than the critical value of 3.89, we reject the null hypothesis.
• Alternatively, we reject the null hypothesis because the p-value of .003 is less than α = .05.

Slide 50: Autocorrelation and the Durbin-Watson Test
• Often, the data used for regression studies in business and economics are collected over time.
• It is not uncommon for the value of y at one time period to be related to the value of y at previous time periods.
• In this case, we say autocorrelation (or serial correlation) is present in the data.

Slide 51: Autocorrelation and the Durbin-Watson Test
• With positive autocorrelation, we expect a positive residual in one period to be followed by a positive residual in the next period.
• With positive autocorrelation, we expect a negative residual in one period to be followed by a negative residual in the next period.
• With negative autocorrelation, we expect a positive residual in one period to be followed by a negative residual in the next period, then a positive residual, and so on.

Slide 52: Autocorrelation and the Durbin-Watson Test
• When autocorrelation is present, one of the regression assumptions is violated: the error terms are not independent.
• When autocorrelation is present, serious errors can be made in performing tests of significance based upon the assumed regression model.
• The Durbin-Watson statistic can be used to detect first-order autocorrelation.

Slide 53: Autocorrelation and the Durbin-Watson Test
• Durbin-Watson test statistic:
  d = Σᵢ₌₂ⁿ (eᵢ - eᵢ₋₁)² / Σᵢ₌₁ⁿ eᵢ²
• The i-th residual is denoted eᵢ = yᵢ - ŷᵢ.

Slide 54: Autocorrelation and the Durbin-Watson Test
• The statistic ranges in value from zero to four.
• A value of two indicates no autocorrelation.
• If successive values of the residuals are close together (positive autocorrelation is present), the statistic will be small.
• If successive values are far apart (negative autocorrelation is present), the statistic will be large.
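A short sketch of computing the Durbin-Watson statistic from regression residuals with statsmodels; the data here are synthetic, and in practice the residuals come from the model being checked.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Synthetic time-ordered data with independent errors
rng = np.random.default_rng(4)
x = np.arange(50, dtype=float)
y = 10 + 0.5 * x + rng.normal(size=50)

fit = sm.OLS(y, sm.add_constant(x)).fit()
d = durbin_watson(fit.resid)   # sum((e_t - e_{t-1})^2) / sum(e_t^2)
print(d)                       # values near 2 suggest no first-order autocorrelation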

Slide 55: Autocorrelation and the Durbin-Watson Test
• Suppose the values of ε (the error terms) are not independent but are related in the following manner:
  εₜ = ρ εₜ₋₁ + zₜ
  where ρ is a parameter with an absolute value less than one and zₜ is a normally and independently distributed random variable with a mean of zero and variance of σ².
• We see that if ρ = 0, the error terms are not related.
• The Durbin-Watson test uses the residuals to determine whether ρ = 0.

Slide 56: Autocorrelation and the Durbin-Watson Test
• The null hypothesis always is H₀: ρ = 0 (there is no autocorrelation).
• The alternative hypothesis is:
  Hₐ: ρ > 0 to test for positive autocorrelation
  Hₐ: ρ < 0 to test for negative autocorrelation
  Hₐ: ρ ≠ 0 to test for positive or negative autocorrelation

Slide 57: Autocorrelation and the Durbin-Watson Test
• A sample of critical values for the Durbin-Watson test for autocorrelation: significance points of d_L and d_U at α = .05, tabulated by sample size n and by the number of independent variables (values not reproduced in this transcript).

Slide 58: Autocorrelation and the Durbin-Watson Test (decision regions)
• Test for positive autocorrelation: d < d_L indicates positive autocorrelation; d_L ≤ d ≤ d_U is inconclusive; d > d_U gives no evidence of positive autocorrelation.
• Test for negative autocorrelation: d > 4 - d_L indicates negative autocorrelation; 4 - d_U ≤ d ≤ 4 - d_L is inconclusive; d < 4 - d_U gives no evidence of negative autocorrelation.
• Two-sided test: d < d_L indicates positive autocorrelation and d > 4 - d_L indicates negative autocorrelation; d_L ≤ d ≤ d_U or 4 - d_U ≤ d ≤ 4 - d_L is inconclusive; otherwise there is no evidence of autocorrelation.

Slide 59: End of Chapter 16