Categorical Variables in Regression


PSGE 7211
Let’s categorize students. What’s the purpose of ANOVA? If I wanted to know

Guiding Questions
How are ANOVA and regression related?
How do I analyze categorical variables in regression?

Reading Test Scores by Sex (T-test)

Reading Test Scores by Sex (BR)

What’s your birthday?
Let’s divide the room according to birthdays and count each group:
Winter birthday
Spring birthday
Summer birthday
Autumn birthday

Methods of Coding
When is your birthday?
1. Winter
2. Spring
3. Summer
4. Fall
vs.
Is your b-day: (YES / NO)
1. Winter? 1
2. Spring?
3. Summer?
4. Fall?

Methods of Coding
What is your religious affiliation?
1. Protestant
2. Catholic
3. Jewish
4. Muslim
5. Other (or none)
vs.
Are you: (YES / NO)
1. Protestant? 1
2. Catholic?
3. Jewish?
4. Muslim?
5. Other?

Methods of Coding
What is your religious affiliation?
1. Protestant
2. Catholic
3. Jewish
4. Muslim
5. Other (or none)

Methods of Coding: DUMMY CODES
Are you: (YES / NO)
1. Protestant? 1
2. Catholic?
3. Jewish?
4. Muslim?
5. Other?

False Memory and Sexual Abuse
Bremner, Shobe, & Kihlstrom, 2000

False Memory ANOVA

ANOVA in SPSS
For one-way ANOVA (one factor)

ANOVA in SPSS

What about in Regression?
Step 1: Create dummy variables for group membership.
You need as many dummy variables as there are categories (g) minus 1.
With 3 categories, we need to create 2 dummy variables.
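The g - 1 rule can be sketched in a few lines of Python. This is only an illustration: the function name `dummy_code`, the group labels, and the five example cases are all made up for this sketch and do not come from the slides or the study.

```python
def dummy_code(values, reference):
    """Build g - 1 dummy variables: one 0/1 list per non-reference category."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in data")
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != reference
    }


# Hypothetical group labels, one per participant.
groups = ["abused_ptsd", "abused_no_ptsd", "nonabused", "abused_ptsd", "nonabused"]
dummies = dummy_code(groups, reference="nonabused")

for name, column in dummies.items():
    print(name, column)
```

The reference category gets no column of its own; its members are the cases that score 0 on every dummy variable.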

Converting to Dummy Variables
Group                  VARIABLE 1: Abused, PTSD   VARIABLE 2: Abused, Non-PTSD
Abused, PTSD           1                          0
Abused, Non-PTSD       0                          1
Nonabused, Non-PTSD    0                          0
Intercept: the predicted value when the other variables = 0, so the intercept reflects the Non-abused, Non-PTSD group.
VARIABLE 1: examines the effect of Abused, PTSD while controlling for Abused, Non-PTSD.
VARIABLE 2: examines the effect of Abused, Non-PTSD while controlling for Abused, PTSD.
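A small worked example may make the role of the intercept concrete. The sketch below fits an ordinary least-squares regression on two dummy variables using a hand-written solver. The scores are invented for illustration (they are not the Bremner et al. data), and `ols` is a hypothetical helper, not an SPSS or library function.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Invented scores for three groups (illustration only).
scores = {
    "abused_ptsd": [10, 12, 14],
    "abused_no_ptsd": [6, 8, 10],
    "nonabused": [3, 5, 7],
}

X, y = [], []
for group, values in scores.items():
    for value in values:
        # Columns: constant, VARIABLE 1 dummy, VARIABLE 2 dummy.
        X.append([1, int(group == "abused_ptsd"), int(group == "abused_no_ptsd")])
        y.append(value)

intercept, b_ptsd, b_no_ptsd = ols(X, y)
print(intercept, b_ptsd, b_no_ptsd)
```

With these numbers the intercept matches the mean of the reference group (nonabused), and each slope is the difference between that group's mean and the reference mean.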

Remember to check your recoding:
List var group abusepts no_PTSD.

False Memory with Dummy Vars
Post-hoc info and the means for all three groups are provided

Need for g - 1 Dummy Variables
DV: Group Membership
IVs: 2 dummy variables
Note: 100% of variance is accounted for!

Was Regression Necessary?
It doesn’t make much sense to run a multiple regression with only categorical IVs (it is much easier to run an ANOVA).
But regression is great if you want to include both categorical and continuous variables.

Effect Coding
Group                  Abused, PTSD (EFFECT 1)   Abused, Non-PTSD (EFFECT 2)
Abused, PTSD           1                         0
Abused, Non-PTSD       0                         1
Nonabused, Non-PTSD    -1                        -1
Similar to dummy coding, but the group that does not get its own variable (the contrast group) is assigned -1 on every variable.
The intercept reflects the grand mean (or overall mean) of all three groups; with unequal group sizes it is the unweighted mean of the group means.
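The effect-coding claim about the intercept can be checked numerically. The sketch below relies on a known property: a model with three groups and three parameters reproduces each group mean exactly, so the coefficients can be read off from the means. The scores and the name `EFFECT_CODES` are invented for illustration.

```python
from statistics import mean

# Effect codes (EFFECT 1, EFFECT 2); the contrast group gets -1 on both.
EFFECT_CODES = {
    "abused_ptsd": (1, 0),
    "abused_no_ptsd": (0, 1),
    "nonabused": (-1, -1),
}

# Invented scores (illustration only).
scores = {
    "abused_ptsd": [10, 12, 14],
    "abused_no_ptsd": [6, 8, 10],
    "nonabused": [3, 5, 7],
}

group_means = {group: mean(values) for group, values in scores.items()}

# Intercept: unweighted mean of the group means.
b0 = mean(list(group_means.values()))
# Each slope: that group's mean minus the intercept.
b1 = group_means["abused_ptsd"] - b0
b2 = group_means["abused_no_ptsd"] - b0

# The fitted equation reproduces every group mean, including the -1/-1 group.
predicted = {
    group: b0 + b1 * e1 + b2 * e2 for group, (e1, e2) in EFFECT_CODES.items()
}
print(b0, b1, b2)
print(predicted)
```

Note how the contrast group's deviation from the grand mean is implied: it equals minus the sum of the other two slopes.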

Criterion Scaling
Do you have a categorical variable with a lot of categories?
Instead of creating many dummy-coded variables, you can use criterion scaling: form a single variable in which each member of a group is coded with that group’s mean score on the DV.
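As a rough illustration of criterion scaling, the sketch below replaces each case's category with the mean outcome of that category. The function name `criterion_scale` and the example data are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def criterion_scale(categories, outcome):
    """Code each case with the mean outcome (criterion) score of its category."""
    by_group = defaultdict(list)
    for category, y in zip(categories, outcome):
        by_group[category].append(y)
    group_mean = {category: mean(ys) for category, ys in by_group.items()}
    return [group_mean[category] for category in categories]


# Hypothetical data: a category label and an outcome score per case.
categories = ["a", "b", "a", "c", "b"]
outcome = [2, 4, 4, 9, 6]
scaled = criterion_scale(categories, outcome)
print(scaled)
```

The single scaled variable then stands in for the full set of dummy variables when it is entered as a predictor.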

Summary

Recoding in SPSS Step 1

Recoding in SPSS
Step 2: Select the variable you want to dummy code (gender), enter the variable name for the new dummy-coded variable under “Output Variable”, and click “Change”.

Recoding in SPSS
Step 3: Select “Old and New Values”

Recoding in SPSS
Step 4: Change old values (1, 2) into new ones (0, 1)
OLD / NEW

Recoding in SPSS
Step 4 (continued): Press Continue, then OK. Done!

Recoding in SPSS
RECODE gender (MISSING=SYSMIS) (2=0) (1=1) INTO DummyGender.
EXECUTE.
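For readers working outside SPSS, the same recode can be sketched in Python. Here `None` stands in for a system-missing value, and the function name `recode_gender` and the sample data are hypothetical. The cross-tabulation at the end mirrors the earlier advice to check your recoding.

```python
from collections import Counter


def recode_gender(value):
    """Mirror the recode: 2 -> 0, 1 -> 1, anything else (incl. missing) -> None."""
    return {1: 1, 2: 0}.get(value)


# Hypothetical raw values; None marks a missing response.
gender = [1, 2, 2, None, 1, 2]
dummy_gender = [recode_gender(value) for value in gender]

# Check the recoding: count each (old, new) pair.
crosstab = Counter(zip(gender, dummy_gender))
for (old, new), count in crosstab.items():
    print(old, "->", new, ":", count)
```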

HW 7
The purpose of this HW is to get a better understanding of the interrelationship between ANOVA and MR. To this end, run the following analyses on a dataset of your choice:
1. A one-way ANOVA with an IV with at least three categories or factors (e.g.,
2. Repeat this same analysis using Multiple Regression with dummy-coded variables
3. Demonstrate mathematically that these analyses are essentially the same. Make frequent references to the output (specific stats) to demonstrate equivalence across the ANOVA and your regression analysis.
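One way to see the equivalence is to compute the ANOVA F ratio from sums of squares and then compute the regression F ratio from R². The sketch below uses invented scores. It does not fit the regression; it assumes the known property that a regression on g - 1 dummy variables predicts each case's group mean, so that R² equals SS-between divided by SS-total.

```python
# Invented scores for three groups (illustration only).
scores = {
    "group_1": [10, 12, 14],
    "group_2": [6, 8, 10],
    "group_3": [3, 5, 7],
}

all_values = [v for values in scores.values() for v in values]
n = len(all_values)
g = len(scores)
grand_mean = sum(all_values) / n

# One-way ANOVA sums of squares.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in scores.values()
)
ss_within = sum(
    (v - sum(values) / len(values)) ** 2
    for values in scores.values()
    for v in values
)
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

f_anova = (ss_between / (g - 1)) / (ss_within / (n - g))

# Regression view: fitted values are the group means, so R^2 = SSB / SST,
# with k = g - 1 predictors and n - k - 1 = n - g residual df.
r_squared = ss_between / ss_total
k = g - 1
f_regression = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print(f_anova, f_regression, r_squared)
```

In SPSS output the corresponding comparison is between the F in the ANOVA table and the overall F for the regression model, along with matching sums of squares.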

Moderation = Interaction
Interaction effects = moderation: the magnitude of the effect of one variable depends on another.
There is also a handout on this topic in the files containing answers to exercises.

Self-esteem, Sex, & Achv’t

Self-esteem on Sex & Achv’t
NEWSEX: 1 = females

Possible Interaction?
Example of a cross-over or disordinal interaction
Example of how gender moderates the effect of achievement on self-esteem

Step 1 - Center
Center the CONTINUOUS IV of interest (the one that will be used in the interaction term)

Centering
Multicollinearity occurs when IVs are highly correlated (r > .8 or .9).
Multicollinearity makes regression equations unstable.
It also strains one of the assumptions of regression: that no IV is a (near-)perfect linear combination of the other IVs.

Centering
To center, create a new variable by subtracting the mean from the original achievement variable:
Compute ACH_CENT=BYTESTS - 51.5758.
EXECUTE.
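The same centering step can be sketched in Python. Computing the mean from the data, rather than typing in a rounded value, avoids small rounding errors. The function name `center` and the scores are made up for illustration.

```python
from statistics import mean


def center(values):
    """Subtract the sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical achievement scores.
achievement = [48, 50, 55, 53]
ach_cent = center(achievement)
print(ach_cent)
```

A quick check after centering is that the new variable has a mean of zero (up to rounding) and the same standard deviation as the original.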

Step 2: Create Cross-Product Term
To create the interaction (or cross-product) term, multiply the two variables (gender x achievement), using the centered continuous variable:
Compute SEX_ACH=Sex * ACH_CENT.
EXECUTE.

Self-esteem, Sex, and Achv’t
Centering reduces multicollinearity
Without centering, this r would be higher
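To illustrate why the correlation drops, the sketch below builds the cross-product term twice, once from the raw achievement scores and once from the centered scores, and correlates each version with the dummy variable. All values are invented, and `pearson` is a hypothetical helper written for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data: sex is a 0/1 dummy, achievement is continuous.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
achievement = [45, 50, 55, 60, 46, 52, 56, 60]

mean_ach = sum(achievement) / len(achievement)
ach_cent = [a - mean_ach for a in achievement]

product_raw = [s * a for s, a in zip(sex, achievement)]
product_centered = [s * a for s, a in zip(sex, ach_cent)]

r_raw = pearson(sex, product_raw)
r_centered = pearson(sex, product_centered)
print(round(r_raw, 3), round(r_centered, 3))
```

In this toy data the uncentered product is almost a copy of the dummy variable, while the centered product is nearly uncorrelated with it.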

Step 3: Run Regression Analysis
Model 1: SEX, ACH_CENT
Model 2: adds SEX_ACH

Self-esteem, Sex, & Achv’t
No significant R² change, so the interaction is not significant (the regression lines for boys and girls do not differ significantly from parallel).
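The test of the R² change between two nested models uses a standard F ratio. The sketch below computes it from the two R² values. The function name `r2_change_f` and the example numbers are assumptions for illustration, not values from the slides, and the p-value lookup is left to SPSS or an F table.

```python
def r2_change_f(r2_reduced, r2_full, k_reduced, k_full, n):
    """F test for the R^2 change between nested models.

    k_reduced / k_full: number of predictors in each model (excluding the
    intercept); n: sample size. Returns (F, df_numerator, df_denominator).
    """
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
    return f_value, df_num, df_den


# Hypothetical example: Model 1 has 2 predictors, Model 2 adds the cross-product.
f_value, df_num, df_den = r2_change_f(0.30, 0.32, k_reduced=2, k_full=3, n=104)
print(round(f_value, 3), df_num, df_den)
```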

Self-esteem, Sex, & Achv’t
Since the interaction is not significant, concentrate on interpreting coefficients from Model 1. What does the intercept now represent?
Note: Interpretation on p. 136

Testing Interactions in MR
The procedure is generally the same for testing interactions with two continuous variables.
Interactions are less stable than main effects; replication of interaction effects is somewhat rare.
Don’t throw out the model just because you don’t get a statistically significant main effect or interaction! Lack of statistical significance is sometimes equally important.

Ethnic background & Achv’t on Self-esteem
A significant interaction?

Ethnicity x Achievement
Conduct a follow-up to see where the statistical significance comes from: Is the regression of SE on Achv’t significant for Whites or Non-Whites?

Self-esteem, Ethnicity & Achv’t
Majority = 1
Since the interaction was statistically significant, interpret coefficients from Model 2. What does the intercept represent?

Recap
1. Center the IV of interest
2. Create cross-products by multiplying centered IV x the dummy variable
3. Regress DV on IVs (use centered IV)
4. Add the cross-products sequentially
5. Is the step statistically significant? If so, graph and conduct follow-up analyses. If not, then interpret the findings without the cross-product.
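The recap steps can be sketched end to end in Python. This is a minimal illustration under stated assumptions: the data are invented, `ols` and `r_squared` are hypothetical helpers written for this example, and the significance test of the R² change is left to SPSS.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def r_squared(X, y, b):
    """Proportion of variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    predicted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Invented data: dummy-coded sex, achievement, and a self-esteem outcome.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
achievement = [45, 50, 55, 60, 46, 52, 56, 60]
self_esteem = [-2.1, 0.6, 2.9, 5.6, -2.5, 2.1, 5.5, 8.5]

# Step 1: center the continuous IV.
mean_ach = sum(achievement) / len(achievement)
ach_cent = [a - mean_ach for a in achievement]

# Step 2: cross-product of the centered IV and the dummy variable.
sex_ach = [s * a for s, a in zip(sex, ach_cent)]

# Steps 3 and 4: Model 1 (main effects), then Model 2 (adds cross-product).
X1 = [[1, s, a] for s, a in zip(sex, ach_cent)]
X2 = [[1, s, a, p] for s, a, p in zip(sex, ach_cent, sex_ach)]
b_model1 = ols(X1, self_esteem)
b_model2 = ols(X2, self_esteem)

r2_model1 = r_squared(X1, self_esteem, b_model1)
r2_model2 = r_squared(X2, self_esteem, b_model2)
print("R2 change:", r2_model2 - r2_model1)
print("Interaction coefficient:", b_model2[3])
```

Step 5 then asks whether that R² change is statistically significant before the interaction coefficient is interpreted.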

HW 8
The purpose of this homework is to investigate interaction effects, specifically interactions between categorical and continuous variables. Said differently, the purpose of this assignment is to investigate moderator effects.
For this assignment, conduct a sequential (or hierarchical) MR that tests for a specific interaction. Remember to center any continuous independent variables. Note that you need not report a statistically significant interaction effect; however, if you do find statistical significance, make sure you follow the procedures for interpreting interactions as outlined in your textbook.
For the write up:
1. Tweak the introduction to justify looking at the interaction
2. Make sure your research questions/hypotheses indicate examining moderation
3. Write up results (formal)
4. Interpret results (discussion)

SPSS - gender, SE on Achv’t
Step 1 - Center the IV of interest (self-efficacy)
Run descriptives to determine the mean of self-efficacy:
DESCRIPTIVES VARIABLES=efficyw2 /STATISTICS=MEAN STDDEV MIN MAX.

SPSS - gender, SE on Achv’t
Step 1 - Center the IV of interest (self-efficacy)
Compute the centered IV (you should also center all other continuous IVs in your model), then rerun descriptives to confirm the new mean is 0:
COMPUTE cefficyw2=efficyw2-3.4093.
EXECUTE.
DESCRIPTIVES VARIABLES=cefficyw2 /STATISTICS=MEAN STDDEV MIN MAX.

SPSS - gender, SE on Achv’t
Step 2 - Create cross-products by multiplying centered IV x the dummy variable (gender):
COMPUTE efficyXgender = cefficyw2 * gender.
EXECUTE.

SPSS - gender, SE on Achv’t
Step 3 - Regress DV on IVs (use centered IV)
Add the cross-products sequentially

SPSS - gender, SE on Achv’t
Step 4 - Is the step significant?
If so, graph and conduct follow-up analyses:
Graph (plot self-efficacy on Achievement by gender)
Split file, run regressions by gender
If not, then interpret the findings without the cross-product

Graphing

Graphing
Make sure you fit line at subtotal

Split File
Then run your bivariate regression analysis:
DV: TotalCoursepts
IV: efficyw2
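The split-file follow-up amounts to fitting the same bivariate regression separately within each gender. A minimal Python sketch is shown below. The variable names echo the slide (efficyw2, TotalCoursepts), but the data values and the helper `simple_regression` are invented for illustration.

```python
def simple_regression(x, y):
    """Return (intercept, slope) for a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


# Invented rows: (gender, efficyw2, TotalCoursepts).
rows = [
    (0, 2, 60), (0, 3, 70), (0, 4, 80),
    (1, 2, 65), (1, 3, 70), (1, 4, 75), (1, 5, 80),
]

# "Split file": group the cases by gender.
by_gender = {}
for gender, efficacy, points in rows:
    xs, ys = by_gender.setdefault(gender, ([], []))
    xs.append(efficacy)
    ys.append(points)

# Run the bivariate regression within each group.
fits = {g: simple_regression(xs, ys) for g, (xs, ys) in by_gender.items()}
for g, (intercept, slope) in fits.items():
    print("gender", g, "intercept", intercept, "slope", slope)
```

Comparing the two slopes shows where an interaction comes from; similar slopes are consistent with the absence of a gender effect on the relationship.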

Split File No gender effects!

Questions and Clarification
What is still confusing?