Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building.

Slides:



Advertisements
Similar presentations
Chap 12-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 12 Simple Regression Statistics for Business and Economics 6.
Advertisements

Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
Simple Linear Regression and Correlation
Multiple Regression [ Cross-Sectional Data ]
Analysis of Economic Data
Chapter 13 Multiple Regression
Chapter 12 Simple Regression
Statistics for Managers Using Microsoft® Excel 5th Edition
Interaksi Dalam Regresi (Lanjutan) Pertemuan 25 Matakuliah: I0174 – Analisis Regresi Tahun: Ganjil 2007/2008.
Regresi dan Rancangan Faktorial Pertemuan 23 Matakuliah: I0174 – Analisis Regresi Tahun: Ganjil 2007/2008.
Chapter 12 Multiple Regression
Statistics for Business and Economics
© 2000 Prentice-Hall, Inc. Chap Multiple Regression Models.
Multiple Regression Models. The Multiple Regression Model The relationship between one dependent & two or more independent variables is a linear function.
EPI809/Spring Models With Two or More Quantitative Variables.
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Statistics for Business and Economics
© 2003 Prentice-Hall, Inc.Chap 14-1 Basic Business Statistics (9 th Edition) Chapter 14 Introduction to Multiple Regression.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 11 th Edition.
Pengujian Parameter Koefisien Korelasi Pertemuan 04 Matakuliah: I0174 – Analisis Regresi Tahun: Ganjil 2007/2008.
Chapter Topics Types of Regression Models
Linear Regression Example Data
Ch. 14: The Multiple Regression Model building
EPI809/Spring Testing Individual Coefficients.
Chapter 14 Introduction to Linear Regression and Correlation Analysis
Chapter 7 Forecasting with Simple Regression
Introduction to Regression Analysis, Chapter 13,
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Regression Chapter 14.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-1 Chapter 13 Introduction to Multiple Regression Statistics for Managers.
Chapter 8 Forecasting with Multiple Regression
Regression and Correlation Methods Judy Zhong Ph.D.
Purpose of Regression Analysis Regression analysis is used primarily to model causality and provide prediction –Predicts the value of a dependent (response)
© 2011 Pearson Education, Inc. Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building.
© 2003 Prentice-Hall, Inc.Chap 11-1 Business Statistics: A First Course (3 rd Edition) Chapter 11 Multiple Regression.
Chapter 12 Multiple Regression and Model Building.
Lecture 14 Multiple Regression Model
© 2002 Prentice-Hall, Inc.Chap 14-1 Introduction to Multiple Regression Model.
Statistics for Business and Economics Chapter 10 Simple Linear Regression.
OPIM 303-Lecture #8 Jose M. Cruz Assistant Professor.
1 1 Slide © 2007 Thomson South-Western. All Rights Reserved Chapter 13 Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple.
Chapter 14 Introduction to Multiple Regression
12a - 1 © 2000 Prentice-Hall, Inc. Statistics Multiple Regression and Model Building Chapter 12 part I.
© 2003 Prentice-Hall, Inc.Chap 13-1 Basic Business Statistics (9 th Edition) Chapter 13 Simple Linear Regression.
© 2001 Prentice-Hall, Inc. Statistics for Business and Economics Simple Linear Regression Chapter 10.
Introduction to Linear Regression
Chap 12-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 12 Introduction to Linear.
EQT 373 Chapter 3 Simple Linear Regression. EQT 373 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value.
Chap 14-1 Copyright ©2012 Pearson Education, Inc. publishing as Prentice Hall Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics.
Copyright ©2011 Brooks/Cole, Cengage Learning Inference about Simple Regression Chapter 14 1.
Lecture 4 Introduction to Multiple Regression
Lecture 10: Correlation and Regression Model.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
Copyright ©2011 Pearson Education, Inc. publishing as Prentice Hall 14-1 Chapter 14 Introduction to Multiple Regression Statistics for Managers using Microsoft.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice- Hall, Inc. Chap 14-1 Business Statistics: A Decision-Making Approach 6 th Edition.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 10 th Edition.
Chap 13-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 13 Multiple Regression and.
Statistics for Managers Using Microsoft® Excel 5th Edition
Introduction to Multiple Regression Lecture 11. The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & 2 or more.
Multiple Regression Learning Objectives n Explain the Linear Multiple Regression Model n Interpret Linear Multiple Regression Computer Output n Test.
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.Chap 14-1 Statistics for Managers Using Microsoft® Excel 5th Edition Chapter.
© 2000 Prentice-Hall, Inc. Chap Chapter 10 Multiple Regression Models Business Statistics A First Course (2nd Edition)
12b - 1 © 2000 Prentice-Hall, Inc. Statistics Multiple Regression and Model Building Chapter 12 part II.
11-1 Copyright © 2014, 2011, and 2008 Pearson Education, Inc.
Chapter 14 Introduction to Multiple Regression
Chapter 15 Multiple Regression and Model Building
Statistics for Managers using Microsoft Excel 3rd Edition
Multiple Regression Analysis and Model Building
Statistics for Business and Economics
Korelasi Parsial dan Pengontrolan Parsial Pertemuan 14
Presentation transcript:

Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building

Learning Objectives 1.Explain the Linear Multiple Regression Model 2.Describe Inference About Individual Parameters 3.Test Overall Significance 4.Explain Estimation and Prediction 5.Describe Various Types of Models 6.Describe Model Building 7.Explain Residual Analysis 8.Describe Regression Pitfalls

Types of Regression Models Simple 1 Explanatory Variable Regression Models 2+ Explanatory Variables Multiple Linear Non- Linear Non- Linear

Models With Two or More Quantitative Variables

Types of Regression Models Simple 1 Explanatory Variable Regression Models 2+ Explanatory Variables Multiple Linear Non- Linear Non- Linear

Multiple Regression Model General form: k independent variables x 1, x 2, …, x k may be functions of variables —e.g. x 2 = (x 1 ) 2

Regression Modeling Steps 1.Hypothesize deterministic component 2.Estimate unknown model parameters 3.Specify probability distribution of random error term Estimate standard deviation of error 4.Evaluate model 5.Use model for prediction and estimation

Probability Distribution of Random Error

Regression Modeling Steps 1.Hypothesize deterministic component 2.Estimate unknown model parameters 3.Specify probability distribution of random error term Estimate standard deviation of error 4.Evaluate model 5.Use model for prediction and estimation

Assumptions for Probability Distribution of ε 1.Mean is 0 2.Constant variance, σ 2 3.Normally Distributed 4.Errors are independent

Linear Multiple Regression Model

Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable

Regression Modeling Steps 1.Hypothesize deterministic component 2.Estimate unknown model parameters 3.Specify probability distribution of random error term Estimate standard deviation of error 4.Evaluate model 5.Use model for prediction and estimation

First–Order Multiple Regression Model Relationship between 1 dependent and 2 or more independent variables is a linear function Dependent (response) variable Independent (explanatory) variables Population slopes Population Y-intercept Random error

Relationship between 1 dependent and 2 independent variables is a linear function Model Assumes no interaction between x 1 and x 2 —Effect of x 1 on E(y) is the same regardless of x 2 values First-Order Model With 2 Independent Variables

y x1x1  0 Response Plane (Observed y)  i Population Multiple Regression Model Bivariate model: x2x2 (x 1i, x 2i )

Sample Multiple Regression Model y  0 Response Plane (Observed y)  i ^ ^ Bivariate model: x1x1 x2x2 (x 1i, x 2i )

No Interaction Effect (slope) of x 1 on E(y) does not depend on x 2 value E(y) x1x E(y) = 1 + 2x 1 + 3(2) = 7 + 2x 1 E(y) = 1 + 2x 1 + 3x 2 E(y) = 1 + 2x 1 + 3(1) = 4 + 2x 1 E(y) = 1 + 2x 1 + 3(0) = 1 + 2x 1 E(y) = 1 + 2x 1 + 3(3) = x 1

Parameter Estimation

Regression Modeling Steps 1.Hypothesize Deterministic Component 2.Estimate Unknown Model Parameters 3.Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error 4.Evaluate Model 5.Use Model for Prediction & Estimation

:::: First-Order Model Worksheet Run regression with y, x 1, x 2 Case, iyiyi x1ix1i x2ix2i

Multiple Linear Regression Equations Too complicated by hand! Ouch!

Interpretation of Estimated Coefficients 2.Y-Intercept (  0 ) Average value of y when x k = 0 ^ ^ ^ ^ 1.Slope (  k ) Estimated y changes by  k for each 1 unit increase in x k holding all other variables constant –Example: if  1 = 2, then sales (y) is expected to increase by 2 for each 1 unit increase in advertising (x 1 ) given the number of sales rep’s (x 2 )

1 st Order Model Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters. You’ve collected the following data: (y) (x 1 ) (x 2 ) RespSizeCirc

Parameter Estimates Parameter Standard T for H0: Variable DF Estimate Error Param=0 Prob>|T| INTERCEP ADSIZE CIRC Parameter Estimation Computer Output 22 ^ 00 ^ 11 ^

Interpretation of Coefficients Solution^ 2.Slope (  2 ) Number of responses to ad is expected to increase by.2805 (28.05) for each 1 unit (1,000) increase in circulation holding ad size constant ^ 1.Slope (  1 ) Number of responses to ad is expected to increase by.2049 (20.49) for each 1 sq. in. increase in ad size holding circulation constant

Estimation of σ 2

Regression Modeling Steps 1.Hypothesize Deterministic Component 2.Estimate Unknown Model Parameters 3.Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error 4.Evaluate Model 5.Use Model for Prediction & Estimation

Estimation of σ 2 For a model with k independent variables

Calculating s 2 and s Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Find SSE, s 2, and s.

Analysis of Variance Source DF SS MS F P Regression Residual Error Total Analysis of Variance Computer Output SSE S2S2

Evaluating the Model

Regression Modeling Steps 1.Hypothesize Deterministic Component 2.Estimate Unknown Model Parameters 3.Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error 4.Evaluate Model 5.Use Model for Prediction & Estimation

Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis

Variation Measures

Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis

Multiple Coefficient of Determination Proportion of variation in y ‘explained’ by all x variables taken together Never decreases when new x variable is added to model —Only y values determine SS yy —Disadvantage when comparing models

Adjusted Multiple Coefficient of Determination Takes into account n and number of parameters Similar interpretation to R 2

Estimation of R 2 and R a 2 Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Find R 2 and R a 2.

Excel Computer Output Solution R2R2 Ra2Ra2

Testing Parameters

Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis

Inference for an Individual β Parameter Confidence Interval Hypothesis Test Ho: β i = 0 Ha: β i ≠ 0 (or ) Test Statistic df = n – (k + 1)

Confidence Interval Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Find a 95% confidence interval for β 1.

Excel Computer Output Solution

Confidence Interval Solution

Hypothesis Test Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Test the hypothesis that the mean ad response increases as circulation increases (ad size constant). Use α =.05.

Hypothesis Test Solution H 0 : H a :   df  Critical Value(s): Test Statistic: Decision: Conclusion: t Reject H 0  2 = 0  2  = 3

Excel Computer Output Solution

Hypothesis Test Solution H 0 : H a :   df  Critical Value(s): Test Statistic: Decision: Conclusion: t Reject H 0  2 = 0  2  = 3 Reject at  =.05 There is evidence the mean ad response increases as circulation increases

Excel Computer Output Solution P–Value

Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis

Testing Overall Significance Shows if there is a linear relationship between all x variables together and y Hypotheses — H 0 :  1 =  2 =... =  k = 0  No linear relationship — H a : At least one coefficient is not 0  At least one x variable affects y

Test Statistic Degrees of Freedom 1 = k 2 = n – (k + 1) —k = Number of independent variables —n = Sample size Testing Overall Significance

Testing Overall Significance Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Conduct the global F–test of model usefulness. Use α =.05.

Testing Overall Significance Solution H 0 : H a :  = 1 = 2 = Critical Value(s): Test Statistic: Decision: Conclusion: F  =.05 β 1 = β 2 = 0 At least 1 not zero

Testing Overall Significance Computer Output Analysis of Variance Sum of Mean Source DF Squares Square F Value Prob>F Model Error C Total k n – (k + 1) MS(Error) MS(Model)

Testing Overall Significance Solution H 0 : H a :  = 1 = 2 = Critical Value(s): Test Statistic: Decision: Conclusion: F  =.05 β 1 = β 2 = 0 At least 1 not zero Reject at  =.05 There is evidence at least 1 of the coefficients is not zero

Testing Overall Significance Computer Output Solution Analysis of Variance Sum of Mean Source DF Squares Square F Value Prob>F Model Error C Total P-Value MS(Model) MS(Error)

Interaction Models

Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable

Interaction Model With 2 Independent Variables Hypothesizes interaction between pairs of x variables —Response to one x variable varies at different levels of another x variable Contains two-way cross product terms Can be combined with other models —Example: dummy-variable model

Effect of Interaction Given: Without interaction term, effect of x 1 on y is measured by  1 With interaction term, effect of x 1 on y is measured by  1 +  3 x 2 —Effect increases as x 2 increases

Interaction Model Relationships Effect (slope) of x 1 on E(y) depends on x 2 value E(y) x1x E(y) = 1 + 2x 1 + 3x 2 + 4x 1 x 2 E(y) = 1 + 2x 1 + 3(1) + 4x 1 (1) = 4 + 6x 1 E(y) = 1 + 2x 1 + 3(0) + 4x 1 (0) = 1 + 2x 1

Interaction Model Worksheet ::::: Multiply x 1 by x 2 to get x 1 x 2. Run regression with y, x 1, x 2, x 1 x 2 Case, iyiyi x1ix1i x2ix2i x 1i x 2i

Interaction Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Conduct a test for interaction. Use α =.05.

Interaction Model Worksheet Multiply x 1 by x 2 to get x 1 x 2. Run regression with y, x 1, x 2, x 1 x 2 x1ix1i x2ix2i x 1i x 2i yiyi

Excel Computer Output Solution Global F–test indicates at least one parameter is not zero P-Value F

Interaction Test Solution H 0 : H a :   df  Critical Value(s): Test Statistic: Decision: Conclusion: t Reject H  3 = 0  3 ≠ = 4

Excel Computer Output Solution

Interaction Test Solution H 0 : H a :   df  Critical Value(s): Test Statistic: Decision: Conclusion: t Reject H  3 = 0  3 ≠ = 4 Do no reject at  =.05 There is no evidence of interaction t =

Second–Order Models

Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable

Second-Order Model With 1 Independent Variable Relationship between 1 dependent and 1 independent variable is a quadratic function Useful 1 st model if non-linear relationship suspected Model Linear effect Curvilinear effect

y Second-Order Model Relationships yy y  2 > 0  2 < 0 x1x1 x1x1 x1x1 x1x1

Second-Order Model Worksheet Create x 2 column. Run regression with y, x, x :::: Case, iyiyi xixi 2 xixi

2 nd Order Model Example The data shows the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2 nd order model, conduct the global F–test, and test if β 2 ≠ 0. Use α =.05 for all tests. Errors (y) Weeks (x)

Second-Order Model Worksheet Create x 2 column. Run regression with y, x, x yiyi xixi 2 xixi :::

Excel Computer Output Solution

Overall Model Test Solution Global F–test indicates at least one parameter is not zero P-Value F

β 2 Parameter Test Solution β 2 test indicates curvilinear relationship exists P-Value t

Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable

Second-Order Model With 2 Independent Variables Relationship between 1 dependent and 2 independent variables is a quadratic function Useful 1 st model if non-linear relationship suspected Model

Second-Order Model Relationships y x2x2 x1x1  4 +  5 > 0 y  4 +  5 < 0 y  3 2 > 4  4  5 x2x2 x1x1 x2x2 x1x1

Second-Order Model Worksheet ::::::: Multiply x 1 by x 2 to get x 1 x 2 ; then create x 1 2, x 2 2. Run regression with y, x 1, x 2, x 1 x 2, x 1 2, x 2 2. Case, iyiyi x1ix1i 2 x1ix1i 2 x2ix2i x2ix2i x1ix2ix1ix2i

Models With One Qualitative Independent Variable

Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable

Dummy-Variable Model Involves categorical x variable with 2 levels —e.g., male-female; college-no college Variable levels coded 0 and 1 Number of dummy variables is 1 less than number of levels of variable May be combined with quantitative variable (1 st order or 2 nd order model)

Dummy-Variable Model Worksheet :::: x 2 levels: 0 = Group 1; 1 = Group 2. Run regression with y, x 1, x 2 Case, iyiyi x1ix1i x2ix2i

Interpreting Dummy- Variable Model Equation Given: y = Starting salary of college graduates x 1 = GPA 0 if Male 1 if Female Same slopes x 2 = Male ( x 2 = 0 ): Female ( x 2 = 1 ):

Dummy-Variable Model Example Computer Output: Same slopes 0 if Male 1 if Female x 2 = Male ( x 2 = 0 ): Female ( x 2 = 1 ):

Dummy-Variable Model Relationships y x1x1 0 0 Same Slopes  1 00  0 +  2 ^ ^ ^ ^ Female Male

Nested Models

Comparing Nested Models Contains a subset of terms in the complete (full) model Tests the contribution of a set of x variables to the relationship with y Null hypothesis H 0 :  g+1 =... =  k = 0 —Variables in set do not improve significantly the model when all other variables are included Used in selecting x variables or models Part of most computer programs

Selecting Variables in Model Building

A butterfly flaps its wings in Japan, which causes it to rain in Nebraska. -- Anonymous Use Theory Only!Use Computer Search!

Model Building with Computer Searches Rule: Use as few x variables as possible Stepwise Regression —Computer selects x variable most highly correlated with y —Continues to add or remove variables depending on SSE Best subset approach —Computer examines all possible sets

Residual Analysis

Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis

Residual Analysis Graphical analysis of residuals —Plot estimated errors versus x i values  Difference between actual y i and predicted y i  Estimated errors are called residuals —Plot histogram or stem-&-leaf of residuals Purposes —Examine functional form (linear v. non-linear model) —Evaluate violations of assumptions

Residual Plot for Functional Form Add x 2 TermCorrect Specification x e ^ x e ^

Residual Plot for Equal Variance Unequal VarianceCorrect Specification Fan-shaped. Standardized residuals used typically. x e ^ x e ^

Residual Plot for Independence x Not IndependentCorrect Specification Plots reflect sequence data were collected. e ^ x e ^

Residual Analysis Computer Output Dep Var Predict Student Obs SALES Value Residual Residual | |** | | *| | | | | | **| | | |*** | Plot of standardized (student) residuals

Regression Pitfalls

Parameter Estimability —Number of different x–values must be at least one more than order of model Multicollinearity —Two or more x–variables in the model are correlated Extrapolation —Predicting y–values outside sampled range Correlated Errors

Multicollinearity High correlation between x variables Coefficients measure combined effect Leads to unstable coefficients depending on x variables in model Always exists – matter of degree Example: using both age and height as explanatory variables in same model

Detecting Multicollinearity Significant correlations between pairs of x variables are more than with y variable Non–significant t–tests for most of the individual parameters, but overall model test is significant Estimated parameters have wrong sign

Solutions to Multicollinearity Eliminate one or more of the correlated x variables Avoid inference on individual parameters Do not extrapolate

y Interpolation x Extrapolation Sampled Range Extrapolation

Conclusion 1.Explained the Linear Multiple Regression Model 2.Described Inference About Individual Parameters 3.Tested Overall Significance 4.Explained Estimation and Prediction 5.Described Various Types of Models 6.Described Model Building 7.Explained Residual Analysis 8.Described Regression Pitfalls