
Dummy Variables

Outline
Objective
Why forming dummy variables, so that nominal variables can be used as independent variables in regressions, is important
How to use and interpret dummy variables
Rules of use
Recommended best practices
Interpretation example

Objective
Learn how to use nominal variables as independent variables in regression models. These include variables like:
Continent/region (Africa, Western Europe, etc.)
U.S. party vote (Democrats, Republicans, Other)
Marital status (Married, Single, Widowed, etc.)
Religion (Catholic, Protestant, Muslim, etc.)

Independent Variables in Regressions
When you run an ordinary least squares (OLS) regression analysis, each B coefficient can be interpreted as the predicted change in Y (the dependent variable) associated with a one-unit increase in that independent variable (X), holding the other independent variables constant.
For example, a regression was run to explain differences in countries' life expectancy, using literacy rate as an independent variable. Both variables are interval.

Interpreting Coefficients
The B (unstandardized) coefficient for literacy rate was 0.28. We interpret this coefficient as follows: when literacy increases by one point, the model predicts that life expectancy will increase by 0.28 points, controlling for all other variables.
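The "one-unit increase" reading of B can be checked numerically. The following is a minimal sketch in plain Python. The literacy and life expectancy numbers are hypothetical. They were built to lie exactly on a line with slope 0.28 so the fitted B reproduces the slide's value; real data would scatter around the line. The helper names `ols_line` and `predict` are our own. This toy fit is bivariate, so there are no other variables to control for.

```python
from statistics import mean

# Hypothetical literacy rates (%) for six countries; not real data.
literacy = [40.0, 55.0, 63.0, 78.0, 90.0, 98.0]
# Hypothetical life expectancy built as 45 + 0.28 * literacy with no noise,
# so the fitted slope should reproduce a B of 0.28.
life_exp = [45 + 0.28 * x for x in literacy]

def ols_line(x, y):
    """Bivariate OLS: return (intercept, slope) from the textbook formulas."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = ols_line(literacy, life_exp)

def predict(x):
    """Predicted life expectancy at literacy rate x."""
    return intercept + slope * x

# A one-point rise in literacy moves the prediction by B,
# wherever on the scale you start.
print(f"B = {slope:.2f}")
print(f"70 -> 71: {predict(71) - predict(70):.2f}")
print(f"20 -> 21: {predict(21) - predict(20):.2f}")
```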

What if?
What if another reader suggested that life expectancy in Africa was different from everywhere else in the world, even at the same literacy rate? Fortunately, the dataset includes a variable for continent/region with nine categories: North America, South America, Western Europe, Eastern Europe, Africa, the Middle East, Central & South Asia, East Asia and Oceania.

A problem with nominal variables
The continent/region variable is nominal. This poses a problem in regression analysis: because its values have no order, a coefficient on the raw category codes cannot be interpreted. It would be silly to say "for every one-point increase in region…" or "for every one-point increase from North America…"

Solution for nominal variables
Transform the nominal variable into a set of dichotomous variables, called "dummies."
Dichotomous variables have only two value categories or options, like "yes" and "no."
So, recode the region variable so that all African countries are coded 1 and all other countries are coded 0.
With only two options, a coefficient can be interpreted as the difference from one value category to the other.
The coefficient for a dichotomous variable for African countries would be interpreted as the difference between African and non-African countries.
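The recode itself is a one-line comparison per observation. Below is a minimal sketch in Python. The country list is a hypothetical toy sample, not the slides' dataset, and `make_dummy` is an illustrative helper name of our own.

```python
# Hypothetical toy sample of (country, region) records; not the slides' dataset.
countries = [
    ("Kenya", "Africa"),
    ("France", "Western Europe"),
    ("Ghana", "Africa"),
    ("Japan", "East Asia"),
    ("Peru", "South America"),
]

def make_dummy(values, category):
    """Return a 0/1 list: 1 where the value equals `category`, else 0."""
    return [1 if v == category else 0 for v in values]

regions = [region for _, region in countries]
africa = make_dummy(regions, "Africa")   # Africa = 1, all other regions = 0

for (name, region), d in zip(countries, africa):
    print(f"{name:<8} {region:<15} africa={d}")
```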

Dummy interpretation
IV = Africa (Africa = 1, all others = 0)
DV = Life expectancy
Unstandardized B coefficient = ##
The model predicts that, compared to all other countries, countries in Africa have ## lower/higher life expectancy, controlling for all other variables.
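When the dummy is the only independent variable, its B has a concrete meaning. B is the mean of the group coded 1 minus the mean of the group coded 0, and the intercept is the mean of the group coded 0. The sketch below checks this on hypothetical numbers (not real life expectancy data). With other controls in the model, B becomes the difference between the groups after adjusting for those controls.

```python
from statistics import mean

# Hypothetical data: Africa dummy (1 = Africa, 0 = all others) and life expectancy.
africa   = [1, 1, 1, 0, 0, 0, 0]
life_exp = [58.0, 61.0, 64.0, 72.0, 76.0, 80.0, 84.0]

def ols_line(x, y):
    """Bivariate OLS: return (intercept, slope) from the textbook formulas."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

a, b = ols_line(africa, life_exp)

# Group means computed directly, for comparison with the regression output.
mean_africa = mean(y for d, y in zip(africa, life_exp) if d == 1)
mean_other  = mean(y for d, y in zip(africa, life_exp) if d == 0)

print(f"B = {b:.2f}; Africa mean - others mean = {mean_africa - mean_other:.2f}")
print(f"intercept = {a:.2f}; mean of all other countries = {mean_other:.2f}")
```

Here a negative B would be read as "## lower life expectancy" for African countries.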

Rules for Dummies
All dummy variables must be dichotomous, with only two options or categories.
Continents: Africa = 1, all other regions = 0, AND/OR a separate variable: Western Europe = 1, all other regions = 0.
Party voted for, when there is a Green Party, Tea Party or other third-party candidate: Democrat = 1, all other parties = 0, AND/OR a separate variable: Republican = 1, all other parties = 0.
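These rules can be checked in code before fitting a model. The sketch below tests that each column contains only 0 and 1. It also applies a second check of our own: dummies built from the same nominal variable should never both be 1 for the same respondent. The vote data and helper names are hypothetical.

```python
# Hypothetical party-vote dummies for five respondents.
# Respondents 3 and 5 voted for a third party, so they are 0 on both dummies.
democrat   = [1, 0, 0, 1, 0]
republican = [0, 1, 0, 0, 0]

def is_dichotomous(column):
    """A dummy may only contain the codes 0 and 1."""
    return set(column) <= {0, 1}

def mutually_exclusive(*columns):
    """Dummies from one nominal variable: no row may be 1 on two of them."""
    return all(sum(row) <= 1 for row in zip(*columns))

print(is_dichotomous(democrat), is_dichotomous(republican))
print(mutually_exclusive(democrat, republican))
```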

More rules for dummies
You can use more than one dummy variable as independent variables in a regression equation.
Region/continents example:
Africa = 1, all other regions = 0
Western Europe = 1, all other regions = 0
East Asia = 1, all other regions = 0
Each dummy you add shrinks the omitted category (the group coded 0 on every dummy), so fewer observations fall into it.

Note on adding additional dummies
Each time you create a new dummy variable out of a nominal variable, that category is no longer part of the omitted category (zero). For example, if you have only one dummy variable, Africa = 1, then 0 means all other regions. If you add a dummy for Western Europe = 1, then the omitted category is really "all other regions except Africa and Western Europe." If you add a dummy for East Asia = 1 too, then each of the three dummy coefficients is compared against "all other regions except Africa, Western Europe and East Asia."
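The shrinking baseline can be traced by listing which categories remain coded 0 on every dummy. The sketch below uses the nine regions named in these slides. The helper name `omitted_group` is our own.

```python
# The nine region categories listed in these slides.
REGIONS = [
    "North America", "South America", "Western Europe", "Eastern Europe",
    "Africa", "the Middle East", "Central & South Asia", "East Asia", "Oceania",
]

def omitted_group(all_categories, dummy_categories):
    """Categories coded 0 on every dummy, i.e. the baseline group."""
    return [c for c in all_categories if c not in dummy_categories]

for dummies in (["Africa"],
                ["Africa", "Western Europe"],
                ["Africa", "Western Europe", "East Asia"]):
    base = omitted_group(REGIONS, dummies)
    print(f"{len(dummies)} dummies -> baseline has {len(base)} regions: {base}")
```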

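Maximum number of dummies
The number of dummy variables used must be NO MORE than one less than the total number of value categories in the original nominal variable. For example, the original continent/region variable had NINE value categories: North America, South America, Western Europe, Eastern Europe, Africa, the Middle East, Central & South Asia, East Asia and Oceania. Therefore, one can use up to EIGHT different dummy variables. There must always be at least one region left as a baseline, coded zero on every dummy. The reason is as follows. If all nine dummies were entered alongside the regression constant, they would sum to 1 for every country and exactly duplicate the constant. This perfect multicollinearity, known as the "dummy variable trap," means OLS cannot estimate separate coefficients.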
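The dummy variable trap shows up as a rank-deficient design matrix: it has more columns than independent columns. The sketch below computes the rank exactly with `fractions.Fraction`. The toy sample is hypothetical and contains only three regions, so for this sample the variable has three categories. The helper names `rank` and `design` are our own.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a small matrix via exact Gaussian elimination on Fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column; it depends on earlier ones
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def design(values, categories):
    """Design matrix: a constant column of 1s plus one dummy per category."""
    return [[1] + [1 if v == c else 0 for c in categories] for v in values]

# Hypothetical sample in which only three regions appear.
regions = ["Africa", "Oceania", "East Asia", "Africa", "Oceania", "East Asia"]
cats = ["Africa", "Oceania", "East Asia"]

full = design(regions, cats)       # constant + all 3 dummies
safe = design(regions, cats[1:])   # constant + 2 dummies (Africa omitted)
print("all dummies:  columns", len(full[0]), "rank", rank(full))
print("k-1 dummies:  columns", len(safe[0]), "rank", rank(safe))
```

With all three dummies, there are four columns but only rank 3, so one coefficient cannot be separated from the constant. Dropping one dummy restores full rank.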

What dummy do you exclude?
If you include the maximum number of dummies, it does not matter to your overall model which category you exclude. The fit and the predicted values are the same; only the baseline against which each coefficient is compared changes. However, there are best practices to follow when choosing the excluded category. The excluded category acts as a baseline, so some choices make results easier to understand and interpret. It is best if the excluded category is:
the mode, or most common category;
a category whose observations are relatively similar, or homogeneous.
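The claim that the choice of baseline leaves the model's predictions unchanged can be verified directly. The sketch below fits OLS twice on hypothetical data, once excluding Western Europe and once excluding Africa, and compares the coefficients and fitted values. It solves the normal equations exactly with `fractions.Fraction`. The helpers `ols`, `design` and `predict` are our own minimal versions, not a statistics package's API.

```python
from fractions import Fraction

def ols(X, y):
    """OLS coefficients from (X'X) b = X'y, solved exactly by Gauss-Jordan.
    Assumes X has full column rank."""
    k = len(X[0])
    xtx = [[sum(Fraction(row[i]) * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(Fraction(row[i]) * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = next(i for i in range(col, k) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(k):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][k] for i in range(k)]

def design(values, dummy_categories):
    """Constant column plus one dummy per listed category."""
    return [[1] + [1 if v == c else 0 for c in dummy_categories] for v in values]

def predict(X, b):
    return [sum(x * c for x, c in zip(row, b)) for row in X]

# Hypothetical regions and life expectancy for six countries.
regions  = ["Africa", "Africa", "Western Europe", "Western Europe",
            "East Asia", "East Asia"]
life_exp = [60, 64, 80, 82, 74, 76]

X1 = design(regions, ["Africa", "East Asia"])           # baseline: Western Europe
X2 = design(regions, ["Western Europe", "East Asia"])   # baseline: Africa
b1, b2 = ols(X1, life_exp), ols(X2, life_exp)

print("baseline Western Europe:", [float(c) for c in b1])
print("baseline Africa:        ", [float(c) for c in b2])
print("same predictions:", predict(X1, b1) == predict(X2, b2))
```

The coefficients differ because they are measured against different baselines. The fitted values (the regional means here) are identical.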

Recommended: exclude the mode
Exclude the mode, the most common or best-known category. For example, if your original variable was U.S. vote choice with three categories (Democrats, Republicans or "Other"), exclude the well-known Democrats or Republicans. It will be easier for you and your readers to interpret the coefficient for each dummy variable relative to a well-known group of voters.
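Picking the mode as the excluded category is easy to automate with `collections.Counter`. The sketch below uses hypothetical vote data. If two categories tie, `most_common` returns them in the order first encountered, so you may want to break ties deliberately.

```python
from collections import Counter

# Hypothetical vote choices for ten respondents.
votes = ["Democrat", "Republican", "Democrat", "Other", "Republican",
         "Democrat", "Democrat", "Republican", "Other", "Democrat"]

counts = Counter(votes)
baseline, n = counts.most_common(1)[0]     # the mode becomes the excluded category
dummy_categories = sorted(c for c in counts if c != baseline)

print(f"baseline (excluded): {baseline} ({n} of {len(votes)})")
for cat in dummy_categories:
    column = [1 if v == cat else 0 for v in votes]
    print(f"{cat} dummy: {column}")
```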

Recommended: homogeneous baseline
Since the excluded category provides a baseline, interpretations are easier when the excluded category is relatively homogeneous. In the region example, it may make sense to exclude Western Europe, since almost all of the countries in that region share certain attributes, like high levels of literacy relative to countries in other regions. Such a category may often appear "extreme" relative to the others.

Dummy interpretation: all others?
IVs = Canadian party vote. Canada has five parties represented in Parliament: the Conservatives, the Liberals, the NDP, the Bloc Quebecois, and the Greens. Other parties also run.
Conservatives = 1, all others = 0
NDP = 1, all others = 0
Bloc Quebecois = 1, all others = 0
Greens = 1, all others = 0
Other small parties = 1, all others = 0
What party or parties are included in "all others" at this point?
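One way to answer the question is to code some respondents and see who is 0 on every dummy. The sketch below does this for a hypothetical set of respondents, using the six vote categories from the slide.

```python
PARTIES = ["Conservatives", "Liberals", "NDP", "Bloc Quebecois",
           "Greens", "Other small parties"]
DUMMIES = ["Conservatives", "NDP", "Bloc Quebecois", "Greens",
           "Other small parties"]

# Hypothetical respondents' votes.
votes = ["NDP", "Liberals", "Conservatives", "Greens", "Liberals", "Bloc Quebecois"]

# One row of five 0/1 dummies per respondent.
rows = [[1 if v == d else 0 for d in DUMMIES] for v in votes]
all_zero = sorted({v for v, row in zip(votes, rows) if sum(row) == 0})

print("Dummies used:", len(DUMMIES), "of", len(PARTIES), "categories")
print("Coded 0 on every dummy:", all_zero)
```

Only Liberal voters are left in "all others," which is the maximum number of dummies (six categories minus one).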

Example when the maximum number of dummies is used
Canadian vote example from the previous slide.
Independent variables = vote (all others = the last remaining party = Liberals)
Conservatives = 1, Liberals = 0
NDP = 1, Liberals = 0
Bloc Quebecois = 1, Liberals = 0
Greens = 1, Liberals = 0
Other small parties = 1, Liberals = 0
Dependent variable = feeling towards Prime Minister Harper (Conservative)

Interpretation of example
Independent variable = party vote; dependent variable = feelings towards Prime Minister Harper; unstandardized B coefficient = ##.
The model predicts that, compared to Liberals, NDP voters' opinions are ## lower [or higher], controlling for all other variables.
The model predicts that, compared to Liberals, Conservative voters' opinions are ## higher [or lower], controlling for all other variables.
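The fill-in-the-blank sentences can be generated straight from fitted coefficients. The sketch below uses a hypothetical toy sample with only three parties (so two dummies, with Liberals as the baseline) and made-up 0-100 feeling scores. The model has no other controls, so each B is simply that party's mean score minus the Liberal mean. The `ols` helper is our own minimal exact solver, not a library function.

```python
from fractions import Fraction

def ols(X, y):
    """OLS coefficients from (X'X) b = X'y, solved exactly by Gauss-Jordan.
    Assumes X has full column rank."""
    k = len(X[0])
    xtx = [[sum(Fraction(row[i]) * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(Fraction(row[i]) * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = next(i for i in range(col, k) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(k):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][k] for i in range(k)]

# Hypothetical 0-100 feeling scores toward PM Harper, by vote.
votes  = ["Liberal", "Liberal", "Conservative", "Conservative", "NDP", "NDP"]
scores = [40, 44, 85, 79, 30, 26]
DUMMIES = ["Conservative", "NDP"]   # Liberal voters are the omitted baseline

X = [[1] + [1 if v == d else 0 for d in DUMMIES] for v in votes]
b = ols(X, scores)

print(f"Liberal baseline (intercept): {float(b[0]):.1f}")
for party, coef in zip(DUMMIES, b[1:]):
    if coef == 0:
        print(f"Compared to Liberals, {party} voters' opinions do not differ.")
    else:
        direction = "higher" if coef > 0 else "lower"
        print(f"Compared to Liberals, {party} voters' opinions are "
              f"{abs(float(coef)):.1f} points {direction}.")
```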