
Dummy Variables
A dummy variable is a dichotomous variable (coded 0 or 1) used to represent the separate categories of a nominal-level measure. The term “dummy” appears to refer to the fact that the presence of the trait indicated by the code of 1 stands in for a factor, or collection of factors, that cannot be measured by any better means within the context of the analysis.

Coding of Dummy Variables
Take, for instance, the race of the respondent in a study of voter preferences, with race coded White (0) or Black (1). A whole set of factors is possibly, or even likely, different between voters of different races: income, socialization, experience of racial discrimination, attitudes toward a variety of social issues, feelings of political efficacy, etc. Since we cannot measure all of those differences within the confines of the study, we use a dummy variable to capture their effects.

Multiple Categories
Now picture race coded White (0), Black (1), Hispanic (2), Asian (3) and Native American (4). If we put this variable into a regression equation, the results will be nonsense, because regression treats the codes as numbers and therefore assumes at least ordinal-level data with approximately equal differences between categories. Regression on a nominal variable with three or more categories coded this way yields uninterpretable and meaningless results.

Creating Dummy Variables
The simple case of race is already coded correctly:
Black: coded 1 for Blacks and 0 for Whites
Note that the coding can be reversed; doing so only changes the sign of the coefficient and the direction of interpretation.
The complex nominal version turns into 5 variables:
White: coded 1 for Whites and 0 for non-Whites
Black: coded 1 for Blacks and 0 for non-Blacks
Hispanic: coded 1 for Hispanics and 0 for non-Hispanics
Asian: coded 1 for Asians and 0 for non-Asians
AmInd: coded 1 for Native Americans and 0 for non-Native Americans
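
The recoding described on this slide can be sketched in a few lines of Python. This is only an illustration: the respondent list and the helper name make_dummies are hypothetical, and only the five category labels come from the slide.

```python
# Hypothetical respondents; only the category labels come from the slides.
races = ["White", "Black", "Hispanic", "Asian", "AmInd", "Black", "White"]
categories = ["White", "Black", "Hispanic", "Asian", "AmInd"]


def make_dummies(values, categories):
    """Return {category: [0/1, ...]} with one indicator column per category."""
    return {c: [1 if v == c else 0 for v in values] for c in categories}


dummies = make_dummies(races, categories)
for name in categories:
    print(f"{name:9s}", dummies[name])

# Each respondent belongs to exactly one category, so the five indicators
# in any row add up to 1.
row_sums = [sum(dummies[c][i] for c in categories) for i in range(len(races))]
```

Because every row sums to 1, any one of the five columns can be recovered from the other four, which is why one of them is later left out of the regression.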

Regression with Dummy Variables
The dummy variable is then added to the regression model. Interpretation of the dummy variable is usually quite straightforward. The intercept term represents the intercept for the omitted category (Whites). The slope coefficient for the dummy variable represents the change in the intercept for the category coded 1 (Blacks).

Regression with Only a Dummy
When we regress a variable on only the dummy variable, we obtain estimates of the group means of the dependent variable: a is the mean of Y for Whites, and a + B1 is the mean of Y for Blacks.
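
A small numerical check of this claim, using made-up scores (the data and the helper name simple_ols are assumptions for illustration). It fits Y = a + B1(Black) by ordinary least squares and compares the coefficients with the two group means.

```python
from statistics import mean

# Hypothetical data: black = 0 for Whites, 1 for Blacks; y is any outcome score.
black = [0, 0, 0, 0, 1, 1, 1]
y = [52.0, 48.0, 50.0, 54.0, 40.0, 44.0, 42.0]


def simple_ols(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b1 = simple_ols(black, y)

# Group means computed directly, without any regression.
mean_white = mean(yi for xi, yi in zip(black, y) if xi == 0)
mean_black = mean(yi for xi, yi in zip(black, y) if xi == 1)

print("a      =", round(a, 6), " mean for Whites =", mean_white)
print("a + B1 =", round(a + b1, 6), " mean for Blacks =", mean_black)
```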

Omitting a Category
When we have a single dummy variable, we have information for both categories in the model. Also note that White = 1 - Black, so having both a dummy for White and one for Black is redundant. As a result, we always omit one category, whose intercept is the model’s intercept. This omitted category is called the reference category. In the dichotomous case, the reference category is simply the category coded 0. When we have a series of dummies, the reference category is the one whose dummy is left out of the model.

Suggestions for Selecting the Reference Category
Make it a well-defined group; ‘other’ or an obscure category (low n) is usually a poor choice.
If there is some underlying ordinality in the categories (e.g. blue-collar, white-collar, professional), select the highest or lowest category as the reference.
It should have an ample number of cases. The modal category is also often a good choice.

Multiple Dummy Variables
The model for the full dummy variable scheme for race is:
Ŷ = a + B1(Black) + B2(Hispanic) + B3(Asian) + B4(AmInd)
Note that the dummy for White has been omitted, and the intercept a is the intercept for Whites.
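
As an illustration, the sketch below fits a model with an intercept and four dummies (Black, Hispanic, Asian, AmInd, with White omitted) to made-up scores. The data and the helpers solve and ols are assumptions written for this example; they solve the normal equations directly rather than calling a statistics package. With no other predictors in the model, the intercept should equal the White mean and each coefficient should equal that group's mean minus the White mean.

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: two respondents per category.
race = ["White", "White", "Black", "Black", "Hispanic", "Hispanic",
        "Asian", "Asian", "AmInd", "AmInd"]
y = [50.0, 54.0, 40.0, 44.0, 45.0, 47.0, 55.0, 57.0, 41.0, 43.0]

included = ["Black", "Hispanic", "Asian", "AmInd"]  # White is the reference
X = [[1] + [1 if r == c else 0 for c in included] for r in race]

a, b_black, b_hispanic, b_asian, b_amind = ols(X, y)
print("intercept (mean for Whites):", round(a, 6))
for name, coef in zip(included, [b_black, b_hispanic, b_asian, b_amind]):
    print(f"{name:9s} differs from Whites by {round(coef, 6)}")
```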

Tests of Significance
With dummy variables, the t test on a coefficient tests whether that category differs from the reference category, not whether the category’s own mean is different from 0. Thus if a = 50 and B1 = -45, the mean for Blacks (a + B1 = 5) might not be significantly different from 0, even though Blacks differ significantly from Whites and the mean for Whites is significantly different from 0.
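
The distinction can be made concrete with hypothetical scores chosen so that a = 50 and B1 = -45, as in the slide. The sketch assumes a model with only the Black dummy, so the usual standard error of B1 reduces to the pooled two-group formula; all data values are invented.

```python
from math import sqrt
from statistics import mean

# Hypothetical scores chosen so that a = 50 and B1 = -45, as in the slide.
white_y = [48.0, 52.0, 50.0, 50.0]
black_y = [-5.0, 15.0, 0.0, 10.0]
n_w, n_b = len(white_y), len(black_y)

a = mean(white_y)            # intercept: mean for the reference category
b1 = mean(black_y) - a       # dummy coefficient: difference from the reference

# Residual variance of the dummy-only regression: deviations of each case
# from its own group mean, with n - 2 degrees of freedom.
rss = (sum((v - mean(white_y)) ** 2 for v in white_y)
       + sum((v - mean(black_y)) ** 2 for v in black_y))
s2 = rss / (n_w + n_b - 2)

# t statistic for B1: are Blacks different from Whites?
se_b1 = sqrt(s2 * (1 / n_w + 1 / n_b))
t_b1 = b1 / se_b1

# A different question: is the mean for Blacks different from 0?
var_black = sum((v - mean(black_y)) ** 2 for v in black_y) / (n_b - 1)
t_black_mean = mean(black_y) / sqrt(var_black / n_b)

print("t for B1 (Blacks vs. Whites):", round(t_b1, 2))
print("t for the Black mean vs. 0:  ", round(t_black_mean, 2))
```

With these numbers the first statistic is large in absolute value and the second is small, so the two tests answer different questions.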

Interaction Terms
When the research hypotheses state that different categories may respond differently to other independent variables, we need to use interaction terms. For example, race and income may interact, so that the relationship between income and ideology is different (stronger or weaker) for Whites than for Blacks.

Creating Interaction Terms
Creating an interaction term is easy: multiply the category dummy by the independent variable. The full model is thus:
Ŷ = a + B1(Black) + B2(Income) + B3(Black × Income)
a is the intercept for Whites; (a + B1) is the intercept for Blacks; B2 is the slope for Whites; and (B2 + B3) is the slope for Blacks. The t-tests for B1 and B3 test whether the intercept and slope for Blacks differ from a and B2, the intercept and slope for Whites.
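
A minimal sketch of this construction, assuming a model with an intercept, the Black dummy, income, and their product. The data are invented and noise-free (Whites follow Y = 10 + 2·income, Blacks follow Y = 4 + 5·income) so that the fitted coefficients can be read against known lines; the helpers solve and ols are written for this example only.

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noise-free data.
black = [0, 0, 0, 0, 1, 1, 1, 1]
income = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [12.0, 14.0, 16.0, 18.0, 9.0, 14.0, 19.0, 24.0]

# The interaction term is simply the dummy multiplied by the other predictor.
black_x_income = [d * x for d, x in zip(black, income)]

X = [[1, d, x, dx] for d, x, dx in zip(black, income, black_x_income)]
a, b1, b2, b3 = ols(X, y)

print("Whites: intercept", round(a, 6), "slope", round(b2, 6))
print("Blacks: intercept", round(a + b1, 6), "slope", round(b2 + b3, 6))
```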

Separating Effects
The literature is unclear on how to fully interpret interaction effects. There is multicollinearity between a dummy and its interaction terms, and also with the regular independent variable. It is suggested that you do not use a model with interaction terms and no intercept!
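
The collinearity mentioned here is easy to see numerically. The sketch below uses invented incomes (in thousands) and a hand-written pearson helper, both assumptions for illustration, to correlate the dummy and the income variable with the interaction term built from them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: incomes in thousands for four Whites and four Blacks.
black = [0, 0, 0, 0, 1, 1, 1, 1]
income = [20.0, 30.0, 40.0, 50.0, 25.0, 35.0, 45.0, 55.0]
black_x_income = [d * x for d, x in zip(black, income)]

r_dummy_interaction = pearson(black, black_x_income)
r_income_interaction = pearson(income, black_x_income)

print("corr(Black, Black x Income): ", round(r_dummy_interaction, 3))
print("corr(Income, Black x Income):", round(r_income_interaction, 3))
```

The interaction column is zero for every White respondent and positive for every Black respondent, so it carries much of the same information as the dummy itself.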