Categorical Variables in Regression Let’s categorize students. What’s the purpose of ANOVA? If I wanted to know PSGE 7211
Guiding Questions How are ANOVA and regression related? How do I analyze categorical variables in regression?
Reading Test Scores by Sex (T-test)
Reading Test Scores by Sex (BR)
What’s your birthday? Let’s divide the room according to birthdays. Count: Winter birthday, Spring birthday, Summer birthday, Autumn birthday
Methods of Coding: When is your birthday? 1. Winter 2. Spring 3. Summer 4. Fall vs. Is your b-day in (YES = 1, NO = 0): 1. Winter? 2. Spring? 3. Summer? 4. Fall?
Methods of Coding: What is your religious affiliation? 1. Protestant 2. Catholic 3. Jewish 4. Muslim 5. Other (or none) vs. Are you (YES = 1, NO = 0): 1. Protestant? 2. Catholic? 3. Jewish? 4. Muslim? 5. Other?
Methods of Coding What is your religious affiliation? 1. Protestant 2. Catholic 3. Jewish 4. Muslim 5. Other (or none)
Methods of Coding (DUMMY CODES): Are you (YES = 1, NO = 0): 1. Protestant? 2. Catholic? 3. Jewish? 4. Muslim? 5. Other?
False Memory and Sexual Abuse Bremner, Shobe, & Kihlstrom, 2000
False Memory ANOVA
ANOVA in SPSS: For one-way ANOVA (one factor)
ANOVA in SPSS
What about in Regression? Step 1: Create categorical variables for group membership. We need as many dummy variables as there are categories (g) minus 1: with 3 categories, we need to create 2 dummy variables.
Converting to Dummy Variables
Group                  VARIABLE 1: Abused, PTSD   VARIABLE 2: Abused, Non-PTSD
Abused, PTSD           1                          0
Abused, Non-PTSD       0                          1
Nonabused, Non-PTSD    0                          0
Intercept: the predicted value when the other variables = 0, so the intercept reflects the Nonabused, Non-PTSD group.
VARIABLE 1: examines the effect of Abused, PTSD while controlling for Abused, Non-PTSD.
VARIABLE 2: examines the effect of Abused, Non-PTSD while controlling for Abused, PTSD.
Remember to check your recoding: List var group abusepts no_PTSD.
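The recoding and the check can be sketched outside SPSS as well. The following minimal Python sketch assumes hypothetical group codes (1 = Abused, PTSD; 2 = Abused, Non-PTSD; 3 = Nonabused, Non-PTSD) and made-up cases; the names abusepts and no_ptsd only mirror the slide's variable names, and their mapping to the two groups is an assumption.

```python
# Hypothetical group codes (assumed mapping, not taken from the dataset):
# 1 = Abused, PTSD; 2 = Abused, Non-PTSD; 3 = Nonabused, Non-PTSD
group = [1, 1, 2, 2, 3, 3, 1, 3]

# g - 1 = 2 dummy variables; group 3 is the reference (0 on both)
abusepts = [1 if g == 1 else 0 for g in group]
no_ptsd = [1 if g == 2 else 0 for g in group]

# Check the recoding: every group code should map to exactly one dummy pattern
patterns = {}
for g, d1, d2 in zip(group, abusepts, no_ptsd):
    patterns.setdefault(g, set()).add((d1, d2))

for g in sorted(patterns):
    print(g, sorted(patterns[g]))
```

Each group code should list a single (abusepts, no_ptsd) pair; more than one pair for a code would signal a recoding mistake.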
False Memory with Dummy Vars: Post-hoc info and the means for all three groups are provided
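Why the regression output carries the group means can be seen in a small sketch. It uses made-up scores (not the false-memory data) and a hand-written least-squares helper, ols, which is an illustrative assumption rather than anything from SPSS. With dummy coding, the intercept should equal the reference group's mean and each coefficient the difference between that group's mean and the reference mean.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Made-up scores for illustration only (group means: 5, 8, 2)
scores = {
    "abused_ptsd": [4, 6, 5],
    "abused_no_ptsd": [7, 9, 8],
    "nonabused": [1, 2, 3],  # reference group: 0 on both dummies
}

X, y = [], []
for name, values in scores.items():
    for v in values:
        X.append([1, int(name == "abused_ptsd"), int(name == "abused_no_ptsd")])
        y.append(v)

intercept, b_abusepts, b_no_ptsd = ols(X, y)
print(intercept, b_abusepts, b_no_ptsd)  # approximately 2, 3, 6
```

Adding each coefficient to the intercept recovers the other two group means (2 + 3 = 5 and 2 + 6 = 8).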
Need for g - 1 Dummy Variables. DV: Group Membership. IVs: 2 dummy variables. Note: 100% of variance is accounted for!
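A short sketch of why two dummies already capture everything about three groups, using made-up group codes (1, 2, 3 are assumed labels). The omitted third dummy and the group code itself are exact linear functions of the two dummies, so the residuals are all zero.

```python
group = [1, 2, 3, 3, 2, 1, 1, 3]  # hypothetical group codes
d1 = [int(g == 1) for g in group]
d2 = [int(g == 2) for g in group]
d3 = [int(g == 3) for g in group]  # the dummy we leave out

# The omitted dummy is an exact linear function of the other two...
assert all(c == 1 - a - b for a, b, c in zip(d1, d2, d3))

# ...and so is the group code: group = 3 - 2*d1 - 1*d2, with zero residual,
# which is what "100% of variance accounted for" means here.
predicted = [3 - 2 * a - 1 * b for a, b in zip(d1, d2)]
residuals = [g - p for g, p in zip(group, predicted)]
print(residuals)
```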
Was Regression Necessary? It doesn’t make much sense to run a multiple regression with only categorical IVs (it is much easier to run an ANOVA). But regression is great if you want to include both categorical and continuous variables.
Effect Coding
Group                  EFFECT 1 (Abused, PTSD)   EFFECT 2 (Abused, Non-PTSD)
Abused, PTSD           1                         0
Abused, Non-PTSD       0                         1
Nonabused, Non-PTSD    -1                        -1
Similar to dummy coding, but the group without its own variable (the contrast group) is assigned -1 on each effect variable.
The intercept reflects the grand mean (or overall mean) of all three groups; more precisely, it is the unweighted mean of the group means, which equals the overall mean when the groups are the same size.
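A minimal sketch of effect coding with made-up scores and equal group sizes. The ols helper is a hand-written illustration, not an SPSS routine. The intercept should come out as the grand mean, and each coefficient as that group's deviation from it.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Effect codes per group: (EFFECT 1, EFFECT 2); the contrast group gets -1, -1
codes = {"abused_ptsd": (1, 0), "abused_no_ptsd": (0, 1), "nonabused": (-1, -1)}

# Made-up scores (group means 6, 8, 1; grand mean 5)
scores = {"abused_ptsd": [4, 6, 8], "abused_no_ptsd": [7, 9, 8], "nonabused": [0, 1, 2]}

X, y = [], []
for name, values in scores.items():
    for v in values:
        X.append([1, *codes[name]])
        y.append(v)

intercept, effect1, effect2 = ols(X, y)
grand_mean = sum(y) / len(y)
print(intercept, effect1, effect2)  # approximately 5, 1, 3
```

The contrast group's deviation is not estimated directly; it is minus the sum of the other effects (here -(1 + 3) = -4, giving its mean of 1).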
Criterion Scaling: Do you have a categorical variable with a lot of categories? Instead of creating many dummy-coded variables, you can use criterion scaling, where you form a single variable in which each member of each group is coded with that group’s mean score on the DV.
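Criterion scaling can be sketched in a few lines. The categories and scores below are made up, and the variable names are hypothetical; the point is only that every case receives its own group's mean on the DV.

```python
from statistics import mean

# Made-up data: a categorical variable with several categories and a DV score
category = ["A", "A", "B", "B", "B", "C", "D", "D"]
score = [10, 14, 20, 22, 24, 30, 5, 7]

# Mean DV score within each category
by_group = {}
for c, s in zip(category, score):
    by_group.setdefault(c, []).append(s)
group_mean = {c: mean(v) for c, v in by_group.items()}

# Criterion-scaled variable: each case is coded with its group's mean
crit_scaled = [group_mean[c] for c in category]
print(crit_scaled)
```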
Summary
Recoding in SPSS Step 1
Recoding in SPSS Step 2: Select the variable you want to dummy code (gender), then enter info about the new variable in “output variable” and click “change”. Variable name for new dummy coded variable
Recoding in SPSS Step 3: Select “Old and New Values”
Recoding in SPSS Step 4: Change old values (1, 2) into new ones (0, 1). Press continue, and okay. Done!
Recoding in SPSS (syntax): RECODE gender (MISSING=SYSMIS) (2=0) (1=1) INTO DummyGender. EXECUTE.
HW 7 The purpose of this HW is to get a better understanding of the interrelationship between ANOVA and MR. To this end, run the following analyses on a dataset of your choice:
1. A one-way ANOVA with an IV with at least three categories or factors (e.g.,
2. Repeat this same analysis using Multiple Regression with dummy-coded variables.
3. Demonstrate mathematically that these analyses are essentially the same. Make frequent references to the output (specific stats) to demonstrate equivalence across the ANOVA and your regression analysis.
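The kind of equivalence the assignment has in mind can be sketched on toy numbers. The data are made up, and ols is a hand-written helper (an assumption of this sketch, not an SPSS routine). The one-way ANOVA F should match the regression F, and the regression R² should equal SS between divided by SS total.

```python
from statistics import mean

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Made-up scores for three groups
groups = {"g1": [4, 6, 5], "g2": [7, 9, 8], "g3": [1, 2, 3]}
y = [v for vals in groups.values() for v in vals]
n, g = len(y), len(groups)
grand = mean(y)

# One-way ANOVA
ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
f_anova = (ss_between / (g - 1)) / (ss_within / (n - g))

# Regression on g - 1 dummy variables (last group is the reference)
names = list(groups)
X = [[1] + [int(name == ref) for ref in names[:-1]]
     for name in names for _ in groups[name]]
b = ols(X, y)
y_hat = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_tot = sum((yi - grand) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
k = g - 1
f_reg = (r2 / k) / ((1 - r2) / (n - k - 1))

print(f_anova, f_reg, r2)
```

The residual sum of squares from the regression is the ANOVA's within-groups sum of squares, which is why the two F ratios agree.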
Moderation = Interaction. Interaction effects = moderation: the magnitude of the effect of one variable depends on another. There is also a handout on this topic in the files containing answers to exercises.
Self-esteem, Sex, & Achv’t
Self-esteem on Sex & Achv’t NEWSEX = 1 = FEMALES
Possible Interaction? Example of a cross-over or disordinal interaction. Example of how gender moderates the effect of achievement on self-esteem.
Step 1 - Center Center the CONTINUOUS IV of interest (the one that will be used in the interaction term)
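Centering: Multicollinearity occurs when IVs are highly correlated (r > .8 or .9). Multicollinearity makes regression equations unstable by inflating the standard errors of the coefficients. Strictly, regression assumes only that no IV is a perfect linear combination of the others; high but imperfect correlation among IVs does not violate an assumption, but it makes individual coefficients hard to estimate precisely.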
Centering: To center, you create a new variable by subtracting the mean from the original achievement variable. Compute ACH_CENT=BYTESTS - 51.5758. EXECUTE.
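Centering can be sketched on made-up scores. The values below are hypothetical and the mean is computed from them rather than typed in. After centering, the mean should be zero (within rounding) and the standard deviation unchanged, which is a quick way to check the new variable.

```python
from statistics import mean, stdev

# Made-up achievement scores (not the course dataset)
bytests = [38.2, 45.0, 51.5, 57.3, 66.0]

# Center: subtract the mean from every score
ach_mean = mean(bytests)
ach_cent = [x - ach_mean for x in bytests]

# Check: centered mean is (numerically) zero, spread is unchanged
print(round(mean(ach_cent), 10), stdev(bytests), stdev(ach_cent))
```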
Step 2: Create Cross-Product Term. To create the interaction (or cross-product) term, you multiply the two variables (gender x achievement), using the centered continuous variable. Compute SEX_ACH=Sex * ACH_CENT. EXECUTE.
Self-esteem, Sex, and Achv’t: Centering reduces multicollinearity between the IVs and the cross-product term. Without centering, this r would be higher.
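The effect of centering on the cross-product's correlation with the dummy variable can be sketched with made-up data. The variable names and scores are hypothetical, and pearson is a hand-written helper. The raw product should correlate very highly with sex, and the centered product only weakly.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up data: sex is a 0/1 dummy, ach is a raw achievement score
sex = [0, 0, 0, 0, 1, 1, 1, 1]
ach = [42, 48, 55, 61, 44, 50, 56, 62]

ach_mean = sum(ach) / len(ach)
ach_cent = [a - ach_mean for a in ach]

# Cross-product terms built from the raw and the centered score
raw_product = [s * a for s, a in zip(sex, ach)]
cent_product = [s * a for s, a in zip(sex, ach_cent)]

r_raw = pearson(sex, raw_product)
r_cent = pearson(sex, cent_product)
print(round(r_raw, 3), round(r_cent, 3))
```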
Step 3: Run Regression Analysis. Model 1: SEX, ACH_CENT. Model 2: add SEX_ACH.
Self-esteem, Sex, & Achv’t: No significant R² change, so the interaction is not significant (the regression lines for boys and girls do not differ significantly from parallel).
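The R² change test for the added cross-product can be sketched as follows. The data are made up, and ols and r_squared are hand-written helpers (assumptions of this sketch, not SPSS procedures). The F for the change is the R² gained per added predictor divided by the unexplained variance per residual degree of freedom in the larger model.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def r_squared(X, y):
    """Proportion of variance in y explained by the columns of X."""
    b = ols(X, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2
                 for row, yi in zip(X, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up data: sex dummy, centered achievement, self-esteem
sex = [0, 0, 0, 0, 1, 1, 1, 1]
ach_cent = [-10, -4, 3, 9, -8, -2, 4, 8]
selfest = [2.9, 3.2, 3.6, 3.8, 2.8, 3.3, 3.4, 3.9]
sex_ach = [s * a for s, a in zip(sex, ach_cent)]

X1 = [[1, s, a] for s, a in zip(sex, ach_cent)]   # Model 1: main effects
X2 = [row + [p] for row, p in zip(X1, sex_ach)]   # Model 2: adds cross-product

r2_1, r2_2 = r_squared(X1, selfest), r_squared(X2, selfest)
n, k2, added = len(selfest), 3, 1
f_change = ((r2_2 - r2_1) / added) / ((1 - r2_2) / (n - k2 - 1))
print(r2_1, r2_2, f_change)
```

The resulting F would be compared against an F distribution with 1 and n - 4 degrees of freedom; this sketch stops short of computing a p-value.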
Self-esteem, Sex, & Achv’t: Since the interaction is not significant, concentrate on interpreting coefficients from Model 1. What does the intercept now represent? Note: Interpretation on p. 136
Testing Interactions in MR: The procedure is generally the same for testing interactions with two continuous variables. Interactions are less stable than main effects; replication of interaction effects is somewhat rare. Don’t throw out the model just because you don’t get a statistically significant main effect or interaction! Lack of statistical significance is sometimes equally important.
Ethnic background & Achv’t on Self-esteem: A significant interaction?
Ethnicity x Achievement: Conduct follow-up analyses to see where the statistical significance comes from: Is the regression of SE on Achv’t significant for Whites or Non-Whites?
Self-esteem, Ethnicity & Achv’t (Majority = 1): Since the interaction was statistically significant, interpret coefficients from Model 2. What does the intercept represent?
Recap
1. Center the IV of interest
2. Create cross-products by multiplying centered IV x the dummy variable
3. Regress DV on IVs (use centered IV)
4. Add the cross-products sequentially
5. Is the step statistically significant? If so, graph and conduct follow-up analyses. If not, then interpret the findings without the cross-product
HW 8 The purpose of this homework is to investigate interaction effects, specifically interactions between categorical and continuous variables. Said differently, the purpose of this assignment is to investigate moderator effects. For this assignment, conduct a sequential (or hierarchical) MR that tests for a specific interaction. Remember to center any continuous independent variables. Note that you need not report a statistically significant interaction effect; however, if you do find statistical significance, make sure you follow the procedures for interpreting interactions as outlined in your textbook. For the write up:
1. Tweak the introduction to justify looking at interaction
2. Make sure your research questions/hypotheses indicate examining moderation
3. Write up results (formal)
4. Interpret results (discussion)
SPSS - gender, SE on Achv’t Step 1 - Center the IV of interest (self-efficacy). Run descriptives to determine the mean of self-efficacy: DESCRIPTIVES VARIABLES=efficyw2 /STATISTICS=MEAN STDDEV MIN MAX.
SPSS - gender, SE on Achv’t Step 1 (continued) - Compute the centered IV (you should also center all other IVs in your model), then rerun descriptives to check it: COMPUTE cefficyw2=efficyw2-3.4093. EXECUTE. DESCRIPTIVES VARIABLES=cefficyw2 /STATISTICS=MEAN STDDEV MIN MAX.
SPSS - gender, SE on Achv’t Step 2 - Create cross-products by multiplying centered IV x the dummy variable (gender): COMPUTE efficyXgender = cefficyw2 * gender. EXECUTE.
SPSS - gender, SE on Achv’t Step 3 - Regress DV on IVs (use centered IV). Add the cross-products sequentially.
SPSS - gender, SE on Achv’t Step 4 - Is the step significant? If so, graph and conduct follow-up analyses: graph (plot self-efficacy on Achievement by gender), then split file and run regressions by gender. If not, then interpret the findings without the cross-product.
Graphing
Graphing: Make sure you fit line at subtotal
Split File: Then run your bivariate regression analysis. DV: TotalCoursepts. IV: efficyw2.
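The split-file follow-up amounts to fitting one bivariate regression per group. The sketch below uses made-up values; the variable names only echo TotalCoursepts and efficyw2, and simple_regression is a hand-written helper. Comparing the two slopes shows whether the effect of self-efficacy looks different across gender groups.

```python
def simple_regression(x, y):
    """Return (intercept, slope) for the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up data for illustration only
gender = [0, 0, 0, 0, 1, 1, 1, 1]
efficyw2 = [2, 3, 4, 5, 2, 3, 4, 5]
total_coursepts = [60, 66, 75, 79, 62, 69, 73, 80]

# "Split file": collect the cases for each gender, then fit each group separately
results = {}
for g in sorted(set(gender)):
    x = [e for e, s in zip(efficyw2, gender) if s == g]
    y = [t for t, s in zip(total_coursepts, gender) if s == g]
    results[g] = simple_regression(x, y)

for g, (intercept, slope) in results.items():
    print(g, round(intercept, 2), round(slope, 2))
```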
Split File No gender effects!
Questions and Clarification What is still confusing?