Categorical Data
Prof. Andy Field

Aims
- Categorical data: contingency tables, the chi-square test, the likelihood ratio, the odds ratio.
- Loglinear models: theory, assumptions, interpretation.

Categorical Data
Sometimes we have data consisting of the frequency of cases falling into unique categories. Examples:
- Number of people voting for different politicians.
- Number of students who pass or fail their degree in different subject areas.
- Number of patients or waiting-list controls who are 'free from diagnosis' (or not) following a treatment.

An Example: Dancing Cats and Dogs
Analysing two or more categorical variables:
- The mean of a categorical variable is meaningless: the numeric values you attach to different categories are arbitrary, and the mean of those numeric values will depend on how many members each category has.
- Therefore, we analyse frequencies.
An example: can animals be trained to line-dance with different rewards?
- Participants: 200 cats.
- Training: the animal was trained using either food or affection as a reward (not both).
- Dance: the animal either learnt to line-dance or it did not.
- Outcome: the number of animals (frequency) that could dance or not in each reward condition.
We can tabulate these frequencies in a contingency table.

A Contingency Table

Pearson's Chi-Square Test
Used to see whether there is a relationship between two categorical variables: it compares the frequencies you observe in certain categories to the frequencies you might expect to get in those categories by chance.
The equation (reconstructed below): i represents the rows in the contingency table and j represents the columns. The observed data are the frequencies in the contingency table. The 'model' is based on 'expected frequencies', calculated for each of the cells in the contingency table; n is the total number of observations (in this case 200).
Test statistic: checked against a distribution with (r − 1)(c − 1) degrees of freedom. If significant, there is a significant association between the categorical variables in the population.
The test distribution is approximate, so in small samples use Fisher's exact test.
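The slide shows the equation as an image; written out from the description above, it is:

\[
\chi^2 \;=\; \sum_{i,j} \frac{\left(\text{observed}_{ij} - \text{model}_{ij}\right)^2}{\text{model}_{ij}},
\qquad
\text{model}_{ij} = \frac{\text{row total}_i \times \text{column total}_j}{n}
\]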

Pearson’s Chi-Square Test

Likelihood Ratio Statistic
An alternative to Pearson's chi-square, based on maximum-likelihood theory:
- Create a model for which the probability of obtaining the observed set of data is maximized.
- This model is compared to the probability of obtaining those data under the null hypothesis.
- The resulting statistic compares observed frequencies with those predicted by the model (reconstructed below); i and j are the rows and columns of the contingency table and ln is the natural logarithm.
Test statistic: has a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
Preferred to Pearson's chi-square when samples are small.
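Again the equation is an image in the original slide; the standard form of the likelihood ratio statistic it describes is:

\[
L\chi^2 \;=\; 2 \sum_{i,j} \text{observed}_{ij}\, \ln\!\left(\frac{\text{observed}_{ij}}{\text{model}_{ij}}\right)
\]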

Likelihood Ratio Statistic

Interpreting Chi-Square
The test statistic gives an 'overall' result. We can break this result down using standardized residuals. There are two important things about these standardized residuals:
- They have a direct relationship with the test statistic (they are a standardized version of the difference between observed and expected frequencies).
- They are z-scores (e.g. if the value lies outside ±1.96 then it is significant at p < .05, etc.).
Effect size: the odds ratio can be used as an effect size measure.
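For reference (the slide does not reproduce it), the standard formula for the standardized residual of cell ij is:

\[
z_{ij} \;=\; \frac{\text{observed}_{ij} - \text{model}_{ij}}{\sqrt{\text{model}_{ij}}}
\]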

Loglinear Analysis
When? To look for associations between three or more categorical variables.
Example: Dancing Dogs
- Same example as before, but with data from 70 dogs.
- Animal: dog or cat.
- Training: food as reward or affection as reward.
- Dance: did they dance or not?
- Outcome: frequency of animals.

Theory
Our model has three predictors and their associated interactions: Animal, Training, Dance, Animal × Training, Animal × Dance, Dance × Training, Animal × Training × Dance.
Such a linear model can be expressed as a sum of these effects; a loglinear model can be expressed in the same way, but the outcome is a log value (the slide's equations are sketched below).
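The equations themselves are images in the original slides; a sketch of the saturated loglinear model they refer to, with assumed notation (O_ijk is the observed frequency in cell ijk, the b terms are the effects, and ln(ε_ijk) is the error term):

\[
\ln(O_{ijk}) = b_0 + b_1 A_i + b_2 T_j + b_3 D_k + b_4 (AT)_{ij} + b_5 (AD)_{ik} + b_6 (TD)_{jk} + b_7 (ATD)_{ijk} + \ln(\varepsilon_{ijk})
\]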

Backward Elimination
Begins by including all terms: Animal, Training, Dance, Animal × Training, Animal × Dance, Dance × Training, Animal × Training × Dance.
It then removes a term and compares the new model with the one in which the term was present:
- It starts with the highest-order interaction.
- It uses the likelihood ratio to 'compare' models (see the sketch after this list).
- If the new model is no worse than the old, the term is removed and the next highest-order interactions are examined, and so on.
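The slides run this procedure in SPSS; as a rough illustration of one elimination step, here is a minimal sketch in Python using statsmodels (a loglinear model is a Poisson GLM on the cell counts). The cell counts are assumptions, chosen only to be consistent with the statistics reported later in these slides:

```python
# One backward-elimination step: drop the highest-order (three-way) interaction
# and compare the reduced model to the saturated model with a likelihood ratio test.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

cells = pd.DataFrame({
    'animal':   ['cat'] * 4 + ['dog'] * 4,
    'training': ['food', 'food', 'affection', 'affection'] * 2,
    'dance':    ['yes', 'no'] * 4,
    'count':    [28, 10, 48, 114,   # cats (assumed counts)
                 20, 14, 29, 7],    # dogs (assumed counts)
})

# Saturated model: all main effects and all interactions (fits the data perfectly).
saturated = smf.glm('count ~ animal * training * dance', data=cells,
                    family=sm.families.Poisson()).fit()

# Reduced model: main effects and two-way interactions only.
reduced = smf.glm('count ~ (animal + training + dance)**2', data=cells,
                  family=sm.families.Poisson()).fit()

# Change in deviance is chi-square distributed with df equal to the terms removed.
lr = reduced.deviance - saturated.deviance
df = saturated.df_model - reduced.df_model
p = stats.chi2.sf(lr, df)
print(f'Removing the three-way interaction: LR chi-square({df}) = {lr:.2f}, p = {p:.4f}')
# A significant result means the three-way term must be retained in the model.
```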

Important Points
The chi-square test has two important assumptions:
- Independence: each person, item or entity contributes to only one cell of the contingency table.
- The expected frequencies should be greater than 5. In larger contingency tables, up to 20% of expected frequencies can be below 5, but there is a loss of statistical power. Even in larger contingency tables, no expected frequencies should be below 1. If you find yourself in this situation, consider using Fisher's exact test.
Proportionately small differences in cell frequencies can result in statistically significant associations between variables if the sample is large enough, so look at row and column percentages to interpret effects.

General Procedure for analysing categorical outcomes

Chi-Square in SPSS: Weighting Cases
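The SPSS steps (entering the frequencies, weighting cases by the frequency variable, then running Crosstabs) appear as screenshots in the original slides. As a rough equivalent outside SPSS, a minimal sketch with Python's scipy; the 2 × 2 cell counts are assumptions, chosen to reproduce the χ²(1) = 25.36 and odds ratio of 6.65 reported later:

```python
import numpy as np
from scipy.stats import chi2_contingency

#                 danced  did not dance
cats = np.array([[28,     10],     # trained with food (assumed counts)
                 [48,     114]])   # trained with affection (assumed counts)

# Pearson's chi-square without continuity correction, as SPSS reports it.
chi2, p, df, expected = chi2_contingency(cats, correction=False)
print(f'chi-square({df}) = {chi2:.2f}, p = {p:.4f}')
print('expected frequencies:\n', expected.round(2))

# Odds ratio: odds of dancing after food divided by odds of dancing after affection.
odds_food = cats[0, 0] / cats[0, 1]
odds_affection = cats[1, 0] / cats[1, 1]
print(f'odds ratio = {odds_food / odds_affection:.2f}')
```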

Output

Output

The Odds Ratio
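The slide's computation is an image; a worked version using the same assumed cell counts as above (food: 28 danced, 10 did not; affection: 48 danced, 114 did not), which reproduce the value reported on the next slide:

\[
\text{odds}_{\text{dancing after food}} = \frac{28}{10} = 2.8, \qquad
\text{odds}_{\text{dancing after affection}} = \frac{48}{114} \approx 0.421
\]
\[
\text{odds ratio} = \frac{2.8}{0.421} \approx 6.65
\]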

Interpretation
There was a significant association between the type of training and whether or not cats would dance, χ²(1) = 25.36, p < .001. Based on the odds ratio, the odds of cats dancing were 6.65 times higher if they were trained with food than if trained with affection.

Loglinear Models in SPSS

Loglinear Models: Options

Output from a Loglinear Model

Output from a Loglinear Model

Output from a Loglinear Model

Visual Interpretation

Following up with Chi-Square Tests: separate Training × Dance tests for cats and for dogs.

The Odds Ratio for Dogs
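As with the cats, the computation is shown as an image; a worked version with assumed dog counts (food: 20 danced, 14 did not; affection: 29 danced, 7 did not), chosen to be consistent with the statistics reported on the next slide:

\[
\text{odds ratio} = \frac{20/14}{29/7} \approx \frac{1.43}{4.14} \approx 0.35,
\qquad \frac{1}{0.35} \approx 2.90
\]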

Interpretation
Loglinear analysis produced a final model that retained all effects. The Animal × Training × Dance interaction was significant, χ²(1) = 20.31, p < .001. Chi-square tests on the training and dance variables were performed separately for dogs and cats. For cats, there was a significant association between the type of training and whether or not they would dance, χ²(1) = 25.36, p < .001; this was also true for dogs, χ²(1) = 3.93, p = .047. The odds of dancing were 6.65 times higher after food than affection in cats, but only 0.35 in dogs (i.e. in dogs, the odds of dancing were 2.90 times lower if trained with food compared to affection). The analysis reveals that cats are more likely to dance for food than for affection, whereas the opposite is true for dogs.

To Sum Up…
We approach categorical data in much the same way as any other kind of data: we fit a model, we calculate the deviation between our model and the observed data, and we use that to evaluate the model we've fitted. We fit a linear model.
- Two categorical variables: Pearson's chi-square test or the likelihood ratio test.
- Three or more categorical variables: a loglinear model. For every variable we get a main effect, and we also get interactions between all combinations of variables. Loglinear analysis evaluates these effects hierarchically.
- Effect sizes: the odds ratio is a useful measure of the size of effect for categorical data.