Analysis of Variance and Regression Using Dummy Variables


Multivariate Models: Analysis of Variance and Regression Using Dummy Variables

Models
A model: a statement of the relationship between a phenomenon to be explained and the factors, or variables, that explain it.
Steps in the process of quantitative analysis:
1. Specification of the model
2. Estimation of the model
3. Evaluation of the model

Model of Housing Values and Building Size
Historian A hypothesizes that there is a linear relationship among housing value, building size, and the number of families in the dwelling.
Building Size = square feet / 1000
Housing Value = 1905 property assessment in 2002 dollars / 1000
Families = number of families in the dwelling
Housing Value = a + b1(Building Size) + b2(Families)

The Model of Determinants of Housing Value
Dep Var: NEWVAL   N: 467   Multiple R: 0.724   Squared multiple R: 0.524
Adjusted squared multiple R: 0.522   Standard error of estimate: 20.284
Effect      Coefficient  Std Error  Std Coef  Tolerance        t  P(2 Tail)
CONSTANT         -2.551      3.029     0.000          .   -0.842      0.400
NEWSIZE          25.893      1.146     0.734      0.972   22.595      0.000
FAMILIES         -5.626      2.094    -0.087      0.972   -2.687      0.007
Analysis of Variance
Source      Sum-of-Squares   df  Mean-Square  F-ratio      P
Regression      210541.070    2   105270.535  255.858  0.000
Residual        190908.482  464      411.441
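The summary statistics in this output can be re-derived from the Analysis of Variance table alone. The following is a minimal sketch in Python, using only the sums of squares, degrees of freedom, and N printed above; the variable names are chosen for illustration and are not part of the original output.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_regression = 210541.070
ss_residual = 190908.482
df_regression = 2    # two predictors: NEWSIZE and FAMILIES
df_residual = 464    # N - predictors - 1 = 467 - 2 - 1
n = 467

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F-ratio, R squared, adjusted R squared, and standard error of estimate.
f_ratio = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
std_error_estimate = math.sqrt(ms_residual)

print(f"F-ratio: {f_ratio:.3f}")
print(f"Squared multiple R: {r_squared:.3f}")
print(f"Adjusted squared multiple R: {adj_r_squared:.3f}")
print(f"Standard error of estimate: {std_error_estimate:.3f}")
```

Each printed value should agree with the corresponding figure in the output to the precision shown there.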

New Questions…
Historian B suggests that there will be a neighborhood effect on housing values: even taking size and number of families into consideration, values will differ among the Northwest Side, South Side, and East Side. Historian B poses the problem to Historian A.

New Possibility: Analysis of Variance
Analysis of variance compares the levels of an interval-level dependent variable across the categories of a categorical (nominal) independent variable. Are the property values different in the three neighborhoods: East, NW, and South? Take a look first at the mean differences.

Value by Neighborhood

But… Are the results statistically significant? What is the strength of the relationship? How would we integrate this information into the earlier regression model?

Concepts
We partition the total variation, or variance, into two components:
(1) variance that is a function of group membership, that is, the differences between the groups; and
(2) variance within the groups.
More formally: Total Sum of Squares = Between Groups Sum of Squares + Within Groups Sum of Squares

Equation
Total Sum of Squares = Within Groups Sum of Squares + Between Groups Sum of Squares
TSS = SSW + SSB
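The decomposition can be checked numerically on any grouped data. The sketch below uses a small set of made-up values (the groups "A", "B", and "C" are hypothetical and unrelated to the housing data) and confirms that the two components add up to the total.

```python
# Hypothetical toy data: three groups of made-up values.
groups = {
    "A": [4.0, 6.0, 8.0],
    "B": [1.0, 2.0, 3.0],
    "C": [10.0, 12.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Total sum of squares: squared deviations from the grand mean.
tss = sum((v - grand_mean) ** 2 for v in all_values)

ssb = 0.0  # between groups: group size times squared distance of group mean
ssw = 0.0  # within groups: squared deviations from each group's own mean
for values in groups.values():
    group_mean = sum(values) / len(values)
    ssb += len(values) * (group_mean - grand_mean) ** 2
    ssw += sum((v - group_mean) ** 2 for v in values)

print(f"TSS = {tss:.4f}")
print(f"SSW + SSB = {ssw + ssb:.4f}")
```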

Calculations

LET SSBETWEEN = N * (MEAN - 28.818) * (MEAN - 28.818)
Case  VAR00001$     MEAN        N      SD  VARIANCE  SSBETWEEN
3     EASTSIDE    47.313   92.000  18.334   336.134  31469.982
4     NW          26.035  308.000  12.096   146.305   2385.316
5     SOUTHSID    17.992   78.000   8.994    80.890   9141.271
6     Total       28.818  478.000  16.171   261.487      0.000
      Sum                                             42996.569
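The LET statement above can be reproduced as a short Python sketch. The group means and counts are copied from the table; because the means are rounded to three decimals here, the results may differ from the printed SSBETWEEN column by a fraction of a unit.

```python
# Grand mean and (mean, N) per neighborhood, copied from the table above.
grand_mean = 28.818
groups = {
    "EASTSIDE": (47.313, 92),
    "NW": (26.035, 308),
    "SOUTHSID": (17.992, 78),
}

# SSBETWEEN = N * (MEAN - grand mean)^2 for each neighborhood.
ss_between = {
    name: n * (mean - grand_mean) ** 2
    for name, (mean, n) in groups.items()
}
total_ss_between = sum(ss_between.values())

for name, value in ss_between.items():
    print(f"{name:<10}{value:12.3f}")
print(f"{'Sum':<10}{total_ss_between:12.3f}")
```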

Anova Table
DF between = k - 1
DF within = N - k

Degrees of Freedom
DF between = k - 1
DF within = N - k
Website for F Table: http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm#ONE-05-1-10
Eta Squared = SS Between / Total SS = .345 (equivalent to R Square)
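As a rough check, eta squared and the F-ratio can be computed from the summary figures in the calculations table. This sketch assumes that the reported total variance (261.487) is a sample variance with an N - 1 denominator, so that Total SS = variance * (N - 1); that assumption is not stated in the slides, although it does reproduce the .345 figure.

```python
# Figures copied from the calculations table.
ss_between = 42996.569
n_total = 478
k = 3                      # number of neighborhoods
total_variance = 261.487

# Assumption: the reported variance uses an N - 1 denominator.
ss_total = total_variance * (n_total - 1)
ss_within = ss_total - ss_between

df_between = k - 1
df_within = n_total - k

eta_squared = ss_between / ss_total
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(f"DF between = {df_between}, DF within = {df_within}")
print(f"Eta squared = {eta_squared:.3f}")
print(f"F-ratio = {f_ratio:.2f}")
```

The resulting F-ratio is then compared with the critical value from an F table at (DF between, DF within) degrees of freedom.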

So, now what… We know that the neighborhood affects the value of the house. How do we integrate that knowledge into a regression model?

A Dilemma…
Regression requires interval-level measurement. One cannot enter a categorical variable directly into the equation. Historian A proposes testing separate models for the three neighborhoods.

Results
Regression Models for the Three Wards: Determinants of Housing Value
            Northwest  East Side  South Side
Constant        5.90*     -13.26       5.35*
Newsize        11.99*     41.49*      14.88*
Families        1.37     -19.90*      -1.38
N                295         98          74
R Squared        .57        .55         .60
*Statistically significant at the .05 level.

Is there another way?
Can we develop one model instead of three?
Answer: Yes, by remeasuring the neighborhood at the interval level.
How? By creating new variables that identify the presence or absence of each neighborhood, that is, a set of binary variables called dummy variables.

Illustration of Dummy Variables
[Table: dummy-variable coding of Neighborhood for the East Side, South Side, and Northwest Side]

Illustration continued…
Two new binary variables provide all the information needed for the three categories.
Rule: Create k - 1 dummy variables for the original categorical variable.
The omitted category serves as the reference: it represents the value of the equation when all the dummy variables = 0.

New variables: Northwest Side as the Omitted Category
Variable: Eastside. Codes: Yes=1; No=0
Variable: South. Codes: Yes=1; No=0
By implication:
For a household on the East Side, Eastside = 1 and South = 0.
For a household on the South Side, Eastside = 0 and South = 1.
For a household on the Northwest Side, Eastside = 0 and South = 0.
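The coding scheme above can be sketched as a small Python function. The function name `dummy_code` and the label strings are illustrative choices, not part of the original analysis; the Northwest Side is the omitted category, so it receives zeros on both dummy variables.

```python
# Category labels are illustrative; "Northwest" is the omitted category.
NEIGHBORHOODS = ["Northwest", "Eastside", "South"]
OMITTED = "Northwest"


def dummy_code(neighborhood):
    """Return the k - 1 dummy variables for one household's neighborhood."""
    if neighborhood not in NEIGHBORHOODS:
        raise ValueError(f"unknown neighborhood: {neighborhood!r}")
    # One 0/1 indicator per category, skipping the omitted category.
    return {
        name: int(neighborhood == name)
        for name in NEIGHBORHOODS
        if name != OMITTED
    }


for place in NEIGHBORHOODS:
    print(place, dummy_code(place))
```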

Results
Newval = a + b1(Newsize) + b2(Families) + b3(Eastside) + b4(South)
Dep Var: NEWVAL   N: 467   Multiple R: 0.75   Squared multiple R: 0.56
Adjusted squared multiple R: 0.55   Standard error of estimate: 19.61
Effect      Coefficient  Std Error  Std Coef  Tolerance       t  P(2 Tail)
CONSTANT          -3.32       2.95      0.00          .   -1.13       0.26
NEWSIZE           23.60       1.32      0.67       0.68   17.88       0.00
FAMILIES          -5.27       2.15     -0.08       0.87   -2.46       0.01
EASTSIDE          14.06       2.53      0.20       0.78    5.56       0.00
SOUTH              6.08       2.75      0.08       0.81    2.21       0.03
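To see what the dummy coefficients mean, the fitted equation can be evaluated for the same house placed in each neighborhood. The sketch below uses the coefficients reported above; the example house (2,000 square feet, so Newsize = 2.0, with one family) is hypothetical, and `predict_newval` is an illustrative helper name.

```python
# Coefficients copied from the regression output above.
COEF = {
    "CONSTANT": -3.32,
    "NEWSIZE": 23.60,
    "FAMILIES": -5.27,
    "EASTSIDE": 14.06,
    "SOUTH": 6.08,
}


def predict_newval(newsize, families, eastside, south):
    """Predicted housing value (2002 dollars / 1000) from the fitted equation."""
    return (
        COEF["CONSTANT"]
        + COEF["NEWSIZE"] * newsize
        + COEF["FAMILIES"] * families
        + COEF["EASTSIDE"] * eastside
        + COEF["SOUTH"] * south
    )


# Hypothetical house: 2,000 square feet (Newsize = 2.0), one family.
northwest = predict_newval(2.0, 1, eastside=0, south=0)
east = predict_newval(2.0, 1, eastside=1, south=0)
south = predict_newval(2.0, 1, eastside=0, south=1)

print(f"Northwest: {northwest:.2f}")
print(f"East Side: {east:.2f}")
print(f"South Side: {south:.2f}")
```

The gaps between the predictions equal the EASTSIDE and SOUTH coefficients regardless of the size or family count chosen, which is what a shift in the intercept means.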

Implications
1. Separate regressions for each neighborhood imply that the other coefficients in the equation vary by ward.
2. Regression with dummy variables implies that the neighborhood effect is a shift of the Y intercept, with the slope coefficients held common across neighborhoods.
There may be interactions between the slope coefficients and the dummy variables, i.e., both 1 and 2 may be the case.
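One way to picture such interactions: in a single model that includes the neighborhood dummies and the products of each dummy with Newsize and Families, each dummy or interaction coefficient is the difference between a ward's coefficient and the omitted ward's coefficient. The slides do not report such a model, so the sketch below is only an illustration. It takes the separate-ward coefficients reported in the three-ward results table and, with Northwest as the omitted category, computes the differences that a fully interacted least-squares model fitted to the same cases would be expected to yield as point estimates.

```python
# Coefficients copied from the separate regressions for the three wards.
separate = {
    "Northwest": {"Constant": 5.90, "Newsize": 11.99, "Families": 1.37},
    "East Side": {"Constant": -13.26, "Newsize": 41.49, "Families": -19.90},
    "South Side": {"Constant": 5.35, "Newsize": 14.88, "Families": -1.38},
}
REFERENCE = "Northwest"  # omitted category


def interaction_terms(models, reference):
    """Difference of each ward's coefficients from the reference ward's."""
    base = models[reference]
    return {
        ward: {term: coefs[term] - base[term] for term in base}
        for ward, coefs in models.items()
        if ward != reference
    }


terms = interaction_terms(separate, REFERENCE)
for ward, diffs in terms.items():
    for term, value in diffs.items():
        # "Constant" differences play the role of the dummy coefficient;
        # the others play the role of dummy-by-predictor interactions.
        print(f"{ward:<11}{term:<10}{value:8.2f}")
```

Large differences on the Newsize and Families rows suggest that a common-slope dummy-variable model may be too restrictive, although testing that formally would require fitting the interacted model.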