The Use of Dummy Variables

In the examples so far, the independent variables have been continuous numerical variables. Suppose that some of the independent variables are categorical. Dummy variables are artificially defined variables designed to convert a model including categorical independent variables into the standard multiple regression model.
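
As a quick illustration (a sketch added here, not part of the original slides; the column names are hypothetical), this is exactly the conversion that pandas performs with get_dummies:

import pandas as pd

# A small data set with one categorical independent variable.
df = pd.DataFrame({"treatment": ["A", "A", "B", "C", "B"],
                   "y": [4.1, 3.9, 5.2, 6.0, 5.5]})

# Convert the categorical column into 0/1 dummy columns,
# one per category (treatment_A, treatment_B, treatment_C).
dummies = pd.get_dummies(df["treatment"], prefix="treatment")
print(pd.concat([df, dummies], axis=1))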

Example: Comparison of Slopes of k Regression Lines with Common Intercept

Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both
– Y (the response variable) and
– X (an independent variable).
Y is assumed to be linearly related to X with
– the slope dependent on treatment (population), while
– the intercept is the same for each treatment.

The Model: for an observation receiving treatment i (i = 1, 2, …, k),
Y = β0 + βiX + ε,
with a common intercept β0 and a treatment-specific slope βi.

This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments. Dummy variables are variables that are artificially defined for this purpose.

In this case we define a new variable for each category of the categorical variable. That is, we will define Xi for each category i of treatments as follows:
Xi = X if the observation received treatment i, and Xi = 0 otherwise.

Then the model can be written as follows. The Complete Model:
Y = β0 + β1X1 + β2X2 + … + βkXk + ε,
where the Xi are the dummy variables defined above.

In this case: Dependent Variable: Y. Independent Variables: X1, X2, …, Xk.

In the above situation we would likely be interested in testing the equality of the slopes, namely the Null Hypothesis
H0: β1 = β2 = … = βk (q = k – 1 restrictions).

The Reduced Model: Dependent Variable: Y. Independent Variable: X = X1 + X2 + … + Xk.
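
To make the complete-versus-reduced comparison concrete, here is a minimal Python sketch (added for illustration, not from the original slides; the data are simulated and all names are illustrative). It builds the dummy variables Xi = X for treatment i (0 otherwise), fits both models by least squares, and forms the extra-sum-of-squares F statistic for H0:

import numpy as np

rng = np.random.default_rng(1)
k, n_per = 3, 12
treatment = np.repeat(np.arange(k), n_per)          # 0, 1, 2
x = np.tile(np.linspace(1, 6, n_per), k)
slopes = np.array([1.0, 1.5, 2.5])                  # true treatment slopes
y = 2.0 + slopes[treatment] * x + rng.normal(0, 0.5, k * n_per)

# Dummy variables: X_i = X when the observation is in treatment i, else 0.
X_dummies = np.zeros((k * n_per, k))
X_dummies[np.arange(k * n_per), treatment] = x

def rss(design, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

ones = np.ones((k * n_per, 1))
rss_complete = rss(np.hstack([ones, X_dummies]), y)   # separate slopes
rss_reduced = rss(np.hstack([ones, x[:, None]]), y)   # common slope

q = k - 1                          # restrictions under H0
df_res = k * n_per - (k + 1)       # n minus number of complete-model parameters
F = ((rss_reduced - rss_complete) / q) / (rss_complete / df_res)
print(f"F = {F:.2f} on ({q}, {df_res}) df")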

Example: In the following example we are measuring
– Yield, Y, as it depends on
– the amount, X, of a pesticide.
Again we will assume that the dependence of Y on X is linear. (I should point out that the concepts used in this discussion can easily be adapted to the non-linear situation.)

Suppose that the experiment is going to be repeated for three brands of pesticide: A, B and C. The quantity, X, of pesticide in this experiment was set at three different levels:
– 2 units/hectare,
– 4 units/hectare and
– 8 units/hectare.
Four test plots were randomly assigned to each of the nine combinations of brand and level of pesticide.

Note that we would expect a common intercept for each brand of pesticide, since when the amount of pesticide, X, is zero the three brands of pesticide would be equivalent.

The data for this experiment are given in the following table:
[Table: yields for brands A, B and C at pesticide levels X = 2, 4 and 8 units/hectare]

[The data as it would appear in a data file, with one row per test plot and columns Pesticide, X (Amount), X1, X2, X3 and Y.] The variables X1, X2 and X3 are the “dummy” variables.

Fitting the complete model:
[ANOVA table (df, SS, MS, F, Significance F) for the regression, and the estimated coefficients for the Intercept, X1, X2 and X3]

Fitting the reduced model:
[ANOVA table for the regression on the single variable X, and the estimated coefficients for the Intercept and X]

The ANOVA table for testing the equality of slopes:
[Sources: common slope zero, slope comparison, Residual, Total; with df, SS, MS, F and Significance F]

Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)

Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable). Y is assumed to be linearly related to X, with the intercept dependent on treatment (population), while the slope is the same for each treatment. Y is called the response variable, while X is called the covariate.

The Model: for an observation receiving treatment i (i = 1, 2, …, k),
Y = β0i + βX + ε,
with a treatment-specific intercept β0i and a common slope β.

Equivalent Forms of the Model:
1) Y = β0i + βX + ε, with a separate intercept β0i for each treatment;
2) Y = β0 + βi + βX + ε, where β0 is the intercept for a reference treatment (treatment k) and βi (i = 1, …, k – 1) is the shift in intercept for treatment i, with βk = 0.

This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments.

In this case we define a new variable for each category of the categorical variable. That is, we will define Xi for categories i = 1, 2, …, (k – 1) of treatments as follows:
Xi = 1 if the observation received treatment i, and Xi = 0 otherwise.

Then the model can be written as follows. The Complete Model:
Y = β0 + β1X1 + β2X2 + … + βk–1Xk–1 + βX + ε,
where β0 is the intercept for treatment k and βi is the change in the intercept for treatment i.

In this case: Dependent Variable: Y. Independent Variables: X1, X2, …, Xk–1, X.

In the above situation we would likely be interested in testing the equality of the intercepts, namely the Null Hypothesis
H0: β1 = β2 = … = βk–1 = 0 (q = k – 1 restrictions).

The Reduced Model: Dependent Variable: Y. Independent Variable: X.
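
Again for concreteness, a minimal Python sketch of this test (added here, not from the slides; the data are simulated and the names illustrative). The dummies are now 0/1 intercept shifters, and X enters once as the common-slope covariate:

import numpy as np

rng = np.random.default_rng(2)
k, n_per = 5, 15
treatment = np.repeat(np.arange(k), n_per)
x = rng.uniform(40, 90, k * n_per)                 # covariate (e.g. pretest score)
intercepts = np.array([12.0, 8.0, 5.0, 10.0, 3.0]) # true treatment intercepts
y = intercepts[treatment] + 0.8 * x + rng.normal(0, 3, k * n_per)

n = k * n_per
# 0/1 dummies for treatments 1 .. k-1 (treatment k is the reference).
D = np.zeros((n, k - 1))
for i in range(k - 1):
    D[:, i] = (treatment == i).astype(float)

def rss(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

ones = np.ones((n, 1))
rss_complete = rss(np.hstack([ones, D, x[:, None]]), y)  # k intercepts, common slope
rss_reduced = rss(np.hstack([ones, x[:, None]]), y)      # one intercept

q = k - 1
df_res = n - (k + 1)               # parameters: k intercept terms + 1 slope
F = ((rss_reduced - rss_complete) / q) / (rss_complete / df_res)
print(f"F = {F:.2f} on ({q}, {df_res}) df")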

Example: In the following example we are interested in comparing the effects of five workbooks (A, B, C, D, E) on the performance of students in Mathematics. For each workbook, 15 students are selected (a total of n = 15 × 5 = 75). Each student is given a pretest (pretest score ≡ X) and a final test (final score ≡ Y). The data are given on the following slide.

The data: [table of pretest score X and final score Y for the 15 students under each workbook]. The Model: the common-slope, separate-intercept model given above.

Graphical display of data

Some comments:
1. The linear relationship between Y (Final Score) and X (Pretest Score) models the differing aptitudes for mathematics.
2. The shifting up and down of this linear relationship measures the effect of the workbooks on the final score Y.

The Model:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + βX + ε,
with workbook E as the reference category.

The data as it would appear in a data file.

The data as it would appear in a data file with the dummy variables (X1, X2, X3, X4) added.

Here is the data file in SPSS with the dummy variables (X1, X2, X3, X4) added. These can be added within SPSS.

Fitting the complete model. The dependent variable is the final score, Y. The independent variables are the pre-score X and the four dummy variables X1, X2, X3, X4.

The Output

The Output - continued

The interpretation of the coefficients: the common slope, β.

The interpretation of the coefficients: the intercept for workbook E, β0.

The interpretation of the coefficients: the changes in the intercept when we change from workbook E to the other workbooks.

1. When the workbook is E, then X1 = 0, …, X4 = 0 and the model reduces to Y = β0 + βX + ε.
2. When the workbook is A, then X1 = 1 and X2 = X3 = X4 = 0, and hence β1 is the change in the intercept when we change from workbook E to workbook A.

Testing for the equality of the intercepts. The reduced model: the only independent variable is X (the pre-score).

Fitting the reduced model. The dependent variable is the final score, Y. The only independent variable is the pre-score, X.

The Output for the reduced model: lower R².

The Output - continued: increased RSS (residual sum of squares).

The F Test:
F = [(RSSreduced – RSScomplete) / q] / [RSScomplete / (n – p)],
where q is the number of restrictions being tested and p is the number of parameters in the complete model.
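
As a small added sketch (not from the slides; the numbers below are purely illustrative), the significance of an observed F can be computed from the F distribution:

from scipy.stats import f

F_value, q, df_res = 6.25, 4, 69     # illustrative values only
p_value = f.sf(F_value, q, df_res)   # upper-tail probability of F(q, df_res)
print(f"p = {p_value:.4g}")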

The Reduced model: Y = β0 + βX + ε. The Complete model: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + βX + ε.

The F test

Testing for zero slope. The reduced model: the independent variables are X1, X2, X3, X4 (the dummies) only.

The Reduced model: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε. The Complete model: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + βX + ε.

The F test

The Analysis of Covariance. This analysis can also be performed by using a package that can perform Analysis of Covariance (ANACOVA). The package sets up the dummy variables automatically.
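
The same idea is available in Python's statsmodels (a sketch added here, not part of the original slides; the column names and data are illustrative): wrapping the categorical variable in C(...) in a model formula makes the package create the dummy variables automatically, much as the ANACOVA procedure does:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Illustrative data frame: one row per student.
df = pd.DataFrame({
    "workbook": ["A", "B", "C", "D", "E"] * 3,
    "pretest":  [55, 60, 48, 70, 65, 52, 58, 63, 61, 49, 72, 66, 57, 54, 68],
    "final":    [75, 78, 60, 88, 70, 70, 74, 77, 79, 55, 90, 72, 73, 68, 74],
})

# C(workbook) tells statsmodels to build the dummy variables itself;
# pretest enters as the (common-slope) covariate.
fit = smf.ols("final ~ C(workbook) + pretest", data=df).fit()
print(fit.summary())

# Type-II ANOVA table: tests workbook (equal intercepts) and pretest (zero slope).
print(sm.stats.anova_lm(fit, typ=2))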

Here is the data file in SPSS. The Dummy variables are no longer needed.

In SPSS, to perform ANACOVA you select from the menu: Analyze -> General Linear Model -> Univariate.

This dialog box will appear

You now select:
1. the dependent variable, Y (Final Score);
2. the Fixed Factor (the categorical independent variable – workbook);
3. the covariate (the continuous independent variable – pretest score).

The output: the ANOVA table. Compare this with the previously computed table.

The output: the ANOVA table. This is the sum of squares in the numerator when we attempt to test if the slope is zero (while allowing the intercepts to be different).

Another application of the use of dummy variables: the dependent variable, Y, is linearly related to X, but the slope changes at one or several known values of X (nodes).
[Graph: Y plotted against X, with the slope changing at the nodes]

The model:
[Graph: Y against X, piecewise linear with slopes β1, β2, …, βk on the successive segments separated by the nodes x1, x2, …]

Now define
X1 = X if X ≤ x1, and X1 = x1 if X > x1;
X2 = 0 if X ≤ x1, X2 = X – x1 if x1 < X ≤ x2, and X2 = x2 – x1 if X > x2;
etc.

Then the model can be written
Y = β0 + β1X1 + β2X2 + … + βkXk + ε.

An Example. In this example we are measuring Y at time X. Y is growing linearly with time. At time X = 10, an additive is added to the process, which may change the rate of growth.
The data: [table of Y measured at times X]

Graph

Now define the dummy variables:
X1 = X if X ≤ 10, and X1 = 10 if X > 10;
X2 = 0 if X ≤ 10, and X2 = X – 10 if X > 10.

The data as it appears in SPSS – x1, x2 are the dummy variables

We now regress y on x1 and x2.
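
A brief Python sketch of this fit (added for illustration; since the original Y values are not reproduced here, the data are simulated under the stated setup, with a true slope change at X = 10):

import numpy as np

rng = np.random.default_rng(3)
x = np.arange(1.0, 21.0)                      # times X = 1, ..., 20
y = 5 + 1.2 * np.minimum(x, 10) + 2.0 * np.maximum(x - 10, 0) \
    + rng.normal(0, 0.8, x.size)

# Dummy variables for a single node at X = 10 (note X1 + X2 = X).
x1 = np.minimum(x, 10.0)          # X1 = X if X <= 10, else 10
x2 = np.maximum(x - 10.0, 0.0)    # X2 = 0 if X <= 10, else X - 10

design = np.column_stack([np.ones_like(x), x1, x2])
(b0, b1, b2), *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"intercept = {b0:.2f}, slope before node = {b1:.2f}, after = {b2:.2f}")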

The Output

Graph

Testing for no change in slope. Here we want to test H0: β1 = β2 vs HA: β1 ≠ β2. The reduced model is Y = β0 + β1(X1 + X2) + ε = β0 + β1X + ε.

Fitting the reduced model: we now regress y on x.
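
The comparison of the two fits can be sketched in Python as follows (again illustrative, with the same simulated data as above; F is the extra-sum-of-squares statistic with q = 1):

import numpy as np

rng = np.random.default_rng(3)
x = np.arange(1.0, 21.0)
y = 5 + 1.2 * np.minimum(x, 10) + 2.0 * np.maximum(x - 10, 0) \
    + rng.normal(0, 0.8, x.size)

def rss(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

# Complete model: separate slopes before and after the node at X = 10.
design_c = np.column_stack([np.ones_like(x), np.minimum(x, 10.0),
                            np.maximum(x - 10.0, 0.0)])
# Reduced model under H0 (no change in slope): regress y on x alone.
design_r = np.column_stack([np.ones_like(x), x])

q, df_res = 1, x.size - 3          # 3 parameters in the complete model
F = ((rss(design_r, y) - rss(design_c, y)) / q) / (rss(design_c, y) / df_res)
print(f"F = {F:.2f} on ({q}, {df_res}) df")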

The Output

Graph – fitting a common slope

The test for the equality of the slopes.