Categorical Data Analysis School of Nursing “Categorical Data Analysis 2x2 Chi-Square Tests and Beyond (Multiple Categorical Variable Models)” Melinda.


Categorical Data Analysis School of Nursing “Categorical Data Analysis 2x2 Chi-Square Tests and Beyond (Multiple Categorical Variable Models)” Melinda K. Higgins, Ph.D. 6 April 2009

Categorical Data
Categorical data can be distinct groups (such as gender: male, female), or they can come from a “split” of an originally continuous variable (such as BDI-II (Beck Depression Inventory-II) scores: 0-13 not depressed, 14 and above depressed).
Begin with 2 x 2 tables: understanding the basics of the Chi-square test and odds ratios.
Underlying Logit model → more general Log-linear models.
What if you have more than 2 categorical variables? Multiway Frequency Analysis (MFA), or possibly Logistic Regression if one variable is an outcome to predict.

2 x 2 Tables (Crosstabs): Chi-square test
Example from A. Field, “Discovering Statistics Using SPSS”: 200 cats, with the goal to “teach them to line dance.”
2 variables:
Training: food or affection as reward
Dance: did they dance? (yes, no)
2 ways to enter data into SPSS:
Raw data file: 200 rows, 2 columns (training, dance)
Using “weights”: one row per combination of categories, plus a frequency column

2 x 2: Raw Data

2 x 2: Using Weights
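The two data-entry layouts carry the same information. The sketch below is an illustration in Python rather than SPSS, using the cell counts quoted on the results slide (28, 10, 48, 114). The variable names are assumptions made for this example. It expands the weighted layout into the 200-row raw layout and tabulates it back.

```python
from collections import Counter

# Weighted layout: one row per cell plus a frequency column.
# Counts are the cat-training frequencies quoted on the results slide.
weighted = [
    ("food", "yes", 28),
    ("food", "no", 10),
    ("affection", "yes", 48),
    ("affection", "no", 114),
]

# Raw layout: one row per cat, with two columns (training, dance).
raw = [(training, dance) for training, dance, n in weighted for _ in range(n)]

# Tabulating the raw rows recovers the weighted table.
recovered = Counter(raw)

print(len(raw))                     # number of cats
print(recovered[("food", "yes")])   # cats trained with food that danced
```

Either layout should lead to the same crosstab; the weighted form is simply more compact when only the cell counts are available.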

2 x 2: Analysis

2 x 2 Results
First, check that all cell “expected counts” are greater than 5. SPSS gives a warning if any expected count is less than 5. If one is, you may want to consider collapsing categories (assuming you have more than 2).
Review the %’s: a good way to summarize the data.
The Chi-square test tests whether the two variables are independent (is there an association or not?).
H0: the 2 variables are independent [no group differences]
Ha: the variables are not independent (are related) [there are differences between the groups]
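As a rough check on what the Crosstabs procedure computes, the expected counts and the Pearson chi-square statistic can be worked out by hand. The sketch below uses plain Python and the cell counts quoted on the results slide. It computes the uncorrected Pearson statistic only (SPSS also prints other variants). The p-value uses the fact that, for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math

# Observed counts. Rows: food, affection. Columns: danced, did not dance.
observed = [[28, 10], [48, 114]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]
min_expected = min(min(row) for row in expected)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# A 2 x 2 table has (2-1)*(2-1) = 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"smallest expected count = {min_expected:.2f}")
print(f"chi-square = {chi_square:.2f}, p = {p_value:.2g}")
```

Here every expected count is above 5, so the usual assumption check passes for these data.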


2 x 2 Results
Chi-square p-value < 0.001, so we reject H0 and conclude there is a relationship between training and whether the cats danced or not.
Of the cats rewarded with food, 74% danced (28 of 38) and only 26% did not.
Odds:
Odds (dancing after food) = number w/food and did dance / number w/food and did not dance = 28/10 = 2.8
Odds (dancing after affection) = number w/affection and did dance / number w/affection and did not dance = 48/114 = 0.421
Odds ratio = odds of dancing w/food / odds of dancing w/affection = 2.8/0.421 = 6.65
“If a cat was trained with food, the odds of it dancing were 6.65 times higher than if it was trained with affection.”
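The odds arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch using the four cell counts from the slide; the variable names are chosen for this example only.

```python
# Cell counts from the slide.
food_danced, food_not = 28, 10
affection_danced, affection_not = 48, 114

# Odds = number who danced / number who did not, within each training group.
odds_food = food_danced / food_not
odds_affection = affection_danced / affection_not

# Odds ratio compares the two groups.
odds_ratio = odds_food / odds_affection

# Share of food-trained cats that danced (a proportion, not an odds).
share_food_danced = food_danced / (food_danced + food_not)

print(f"odds (food) = {odds_food:.1f}")
print(f"odds (affection) = {odds_affection:.3f}")
print(f"odds ratio = {odds_ratio:.2f}")
print(f"share of food-trained cats that danced = {share_food_danced:.0%}")
```

Note that the odds ratio compares odds, not probabilities, which is why the interpretation is best phrased in terms of odds.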

Logit Model
As in logistic regression, we are interested in predicting the probability of an outcome occurring (rather than predicting the actual value of the outcome).
A “log-likelihood” statistic is used to “assess the fit of the model” [e.g. expected versus observed counts].
So, the “general form” of this 2x2 chi-square test (as a regression model) is:
$\text{Outcome}_i = (\text{model}_i) + \text{error}_i$
$\text{Outcome}_i = (b_0 + b_1 A_i + b_2 B_i + b_3 AB_i) + \varepsilon_i$
$\text{Outcome}_i = (b_0 + b_1 \text{Training}_i + b_2 \text{Dance}_i + b_3 \text{Interaction}_i) + \varepsilon_i$
But we’re really predicting the “probability”, so we take the log:
$\ln(O_i) = (b_0 + b_1 \text{Training}_i + b_2 \text{Dance}_i + b_3 \text{Interaction}_i) + \ln(\varepsilon_i)$
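To make the log model concrete, the sketch below solves for the four coefficients directly from the observed cat counts. It assumes 0/1 dummy coding with affection and “did not dance” as the reference categories; that coding is an assumption made for this illustration, and SPSS may parameterize the model differently. Because the model has as many coefficients as cells, it reproduces the observed counts exactly, and under this coding exp(b3) equals the odds ratio.

```python
import math

# Observed counts keyed by (training, dance).
# Assumed coding: training 1 = food, 0 = affection; dance 1 = yes, 0 = no.
counts = {(0, 0): 114, (0, 1): 48, (1, 0): 10, (1, 1): 28}

# Solve ln(O) = b0 + b1*Training + b2*Dance + b3*Training*Dance cell by cell.
b0 = math.log(counts[(0, 0)])
b1 = math.log(counts[(1, 0)]) - b0
b2 = math.log(counts[(0, 1)]) - b0
b3 = math.log(counts[(1, 1)]) - b0 - b1 - b2

def predicted_count(training, dance):
    """Back-transform the log-linear prediction to a cell frequency."""
    return math.exp(b0 + b1 * training + b2 * dance + b3 * training * dance)

for cell, observed in counts.items():
    print(cell, observed, round(predicted_count(*cell), 6))

print(f"exp(b3) = {math.exp(b3):.2f}")
```

Dropping the interaction term b3 gives the independence model, whose fit to the observed counts is what the chi-square test assesses.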

Multi-way Frequency Analysis [Log-Linear Analysis]
The purpose of multi-way frequency analysis (MFA) is to discover associations among discrete variables [more than 2x2 and more than 2 levels] [Tabachnick & Fidell, 2007].
After preliminary screening for associations, a model is “fit” that includes only the associations necessary to reproduce the observed frequencies (ideally the “simplest” model).
The model’s parameter estimates are used to predict expected frequencies in each “cell.”

“Log-linear/MFA Model” [for 3 variables]
Labelled parts of the model equation:
“natural log of the expected frequency in cell ijk” (the left-hand side)
“intercept”
“main effects” = “first-order effects”
“2-way interaction effects” = “second-order effects”
“3-way interaction effect” = “third-order effects”
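The equation itself did not survive in the transcript. A common textbook way of writing the saturated three-variable log-linear model, which matches the labels listed on this slide, is sketched below. The symbols ($F_{ijk}$ for the expected frequency, $\lambda$ for the effects, and A, B, C for the three variables) are assumed notation and may differ from what the original slide used.

```latex
\ln(F_{ijk}) =
  \underbrace{\lambda}_{\text{intercept}}
  + \underbrace{\lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k}}_{\text{main (first-order) effects}}
  + \underbrace{\lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk}}_{\text{2-way (second-order) effects}}
  + \underbrace{\lambda^{ABC}_{ijk}}_{\text{3-way (third-order) effect}}
```

Simpler models are obtained by setting the higher-order terms to zero and checking whether the expected frequencies still fit the observed ones.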

Another Example
Comparison of Reading Material Preference (Science Fiction vs Spy Novels) by Gender and Profession; 155 subjects.

Multi “Layered” Chi-Squares (2x2 Crosstabs)

Layer = Profession [test gender x reading type]

Layer = Gender [test profession x reading type]

Layer = Reading Type [test gender x profession]
So it appears there is a difference for Gender x Profession within Reading Type.

Some Notes To Remember
If the model contains higher-order effects, then all lower-order effects should be retained.
For example, if a two-way interaction (AB) is significant, then both main effects (A) and (B) should be included.
Likewise, if a third-order effect (ABC) is significant, then all two-way interactions (AB, AC, BC) as well as all main effects (A), (B) and (C) should be included.
As such, these models are sometimes referred to as “hierarchical” or “nested” loglinear models.
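The hierarchy rule can be expressed mechanically: every subset of a retained effect must also be in the model. The sketch below is a small Python illustration. The function name `implied_terms` is hypothetical and is not part of SPSS. It expands a list of highest-order effects into the full set of terms a hierarchical model contains.

```python
from itertools import combinations

def implied_terms(highest_order_effects):
    """Return every effect a hierarchical model must contain.

    Each effect is a tuple of variable names; all of its non-empty
    subsets (lower-order effects) are included as well.
    """
    terms = set()
    for effect in highest_order_effects:
        factors = tuple(sorted(effect))
        for size in range(1, len(factors) + 1):
            terms.update(combinations(factors, size))
    # Sort by order of effect, then alphabetically, for readable output.
    return sorted(terms, key=lambda t: (len(t), t))

# A significant three-way effect ABC pulls in all lower-order effects.
full = implied_terms([("A", "B", "C")])

# A model with Profession x Gender plus a Reading Type main effect.
reduced = implied_terms([("Profession", "Gender"), ("ReadingType",)])

print(full)
print(reduced)
```

The second call corresponds to a model with one two-way interaction and an unrelated main effect, which is the shape of the reduced model examined in this deck.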

Full Model Analysis [SPSS HILOGLINEAR]
HILOGLINEAR Profession(1 3) Gender(1 2) ReadingType(1 2)
  /CWEIGHT=Frequency
  /METHOD=BACKWARD
  /CRITERIA MAXSTEPS(10) P(.05) ITERATION(20) DELTA(.5)
  /PRINT=FREQ RESID ASSOCIATION ESTIM
  /DESIGN.
So, from these results, we can conclude that at least one 2-way effect is significant.

HILOGLINEAR (cont’d)
So, from these results, we can conclude that the profession x gender interaction is important and that reading type is also important. So, let’s look at a reduced model with just these effects.

Reduced Model [Reading Type, Gender, Profession and Profession x Gender]
LOGLINEAR Profession (1 3) Gender (1 2) ReadingType (1 2)
  /PRINT=ESTIM
  /DESIGN profession*gender profession gender readingtype.

Results: SPSS LOGLINEAR
* * * * * * * * * L O G L I N E A R   A N A L Y S I S * * * * * * * * *
Correspondence Between Effects and Columns of Design/Model 1
Starting Column   Ending Column   Effect Name
1                 2               profession * gender
3                 4               profession
5                 5               gender
6                 6               readingtype
*** ML converged at iteration 4.
Maximum difference between successive iterations =
Goodness-of-Fit test statistics
Likelihood Ratio Chi Square =        DF = 5   P = .256
Pearson Chi Square =                 DF = 5   P = .253
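The DF = 5 reported above can be checked by counting: degrees of freedom for the goodness-of-fit test equal the number of cells minus the number of estimated parameters (including the intercept). The sketch below takes the level counts from the LOGLINEAR syntax (profession has 3 levels, gender and reading type have 2 each); the helper name `n_parameters` is hypothetical.

```python
from math import prod

# Number of levels per variable, from the LOGLINEAR variable list.
levels = {"profession": 3, "gender": 2, "readingtype": 2}

# Effects named on the /DESIGN subcommand of the reduced model.
design = [
    ("profession", "gender"),
    ("profession",),
    ("gender",),
    ("readingtype",),
]

def n_parameters(effect):
    """Free parameters for an effect: product of (levels - 1) over its variables."""
    return prod(levels[variable] - 1 for variable in effect)

cells = prod(levels.values())                                # 3 * 2 * 2 cells
parameters = 1 + sum(n_parameters(e) for e in design)        # 1 for the intercept
degrees_of_freedom = cells - parameters

for effect in design:
    print(" * ".join(effect), n_parameters(effect))
print("df =", degrees_of_freedom)
```

The per-effect counts (2, 2, 1, 1) also line up with the six design columns listed in the correspondence table. With P = .256 and P = .253 both above .05, the reduced model is not rejected, meaning its expected frequencies are not significantly different from the observed ones.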

Estimates for Parameters
For each effect (profession * gender, profession, gender, readingtype), the output lists one row per parameter under these columns:
Parameter   Coeff.   Std. Err.   Z-Value   Lower 95 CI   Upper 95 CI

Summary
This is only a quick introduction. I encourage you to work through the exercises in both Andy Field and Tabachnick and Fidell for more thorough explanations.
Explore the additional features within the SPSS/Loglinear Models section.
Screen your data (for more than 2 categorical variables) using “layers” within the SPSS Crosstabs procedure.

References
Field, Andy. “Discovering Statistics Using SPSS,” 2nd edition, SAGE Publications, [Chapter 7 focuses on Logistic Regression; Chapter 16 focuses on Categorical Data.]
Tabachnick, Barbara G.; Fidell, Linda S. “Using Multivariate Statistics,” 5th edition, Pearson Education Inc., [Chapter 15 focuses on Multilevel Linear Modeling.]

VIII. Statistical Resources and Contact Info
SON S:\Shared\Statistics_MKHiggins\website2\index.htm [updates in process]
Working to include tip sheets (for SPSS, SAS, and other software), lectures (PPTs and handouts), datasets, other resources and references.
Statistics At Nursing Website: [website being updated]
And Blackboard Site (in development) for “Organization: Statistics at School of Nursing”
Contact Dr. Melinda Higgins
Office: / Mobile: