Statistical Analysis of Categorical Variables

Presentation transcript:

Cross Tabulation: Statistical Analysis of Categorical Variables

To date… We have examined statistical tests for differences of means and proportions. Those statistics are measured at the interval level.

New Test… Now we wish to examine statistical tests for questions involving nominal and ordinal variables. To do so we introduce the Chi Square Test.

Cross Tabulation We are interested in counting the number of cases in each category of one variable, broken down by the categories of a second variable. Implicitly, we are asking whether the patterns of the counts differ.

Cross Tabulation and Chi Square Test A cross tabulation cross classifies one variable by another variable. Below is a cross classification of occupational groups and wards for the Simon data for 1905. http://www.uwm.edu/~margo/595/simondata.htm

Cross Tabulation and Chi Square Test We count the number of cases in each occupational category for each ward. At the edges of the table we total the rows and columns.

Graphic Illustration of the Counts of Occupational Groups by Ward

The Simplest Example… a 2 by 2 Table
Do the opinions of men and women differ on the War in Iraq?
Do the opinions of men and women differ on the importance of capturing Osama Bin Laden?
Data: September 2006 ABC News Poll on the War on Terror, a sample of about 1000 respondents. https://pantherfile.uwm.edu/margo/public/History595/ABCPollSept112005.sav

Basic Frequencies

Basic Frequencies Broken Down by Gender

Another Graphical Illustration: 4 Bins of Counts

Tabular Data

Some Terms and Assumptions
Cell frequency: the count in a cell in the body of the table.
Marginal total: the total of a row or a column.
Row percent: the cell count as a percentage of its row total.
Column percent: the cell count as a percentage of its column total.
Expected frequency: the number of cases expected in a cell based upon the marginal proportions.
Deviation: the difference between the expected frequency and the actual frequency.

Tabular Data: Cell Frequencies and Marginals
Q12 War worth fighting NET * Q921 GENDER Crosstabulation (Count)
                          Male   Female   Total
Worth fighting NET         213      217     430
Not worth fighting NET     245      307     552
Total                      458      524     982
The four counts in the body of the table are the cell frequencies; the Total row and the Total column are the marginals.

Table Counts and Graph (cell counts shown: 213, 245, 217, 307)

Row Percents

Column Percents

Frequencies, Row and Column Percents
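The row and column percents can be sketched in a few lines of Python. This is only an illustration, not the software used for the slides: it assumes the Iraq War counts from the gender crosstab given earlier (213, 217, 245, 307), and the variable names are made up for the example.

```python
# Observed counts from the crosstab (rows: opinion, columns: gender).
observed = {
    "Worth fighting": {"Male": 213, "Female": 217},
    "Not worth fighting": {"Male": 245, "Female": 307},
}

# Marginal totals for rows and columns.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

# Row percent: cell count as a percentage of its row total.
row_pct = {
    row: {col: 100 * count / row_totals[row] for col, count in cells.items()}
    for row, cells in observed.items()
}

# Column percent: cell count as a percentage of its column total.
col_pct = {
    row: {col: 100 * count / col_totals[col] for col, count in cells.items()}
    for row, cells in observed.items()
}

for row, cells in observed.items():
    for col in cells:
        print(f"{row:20s} {col:7s} "
              f"row%={row_pct[row][col]:5.1f} col%={col_pct[row][col]:5.1f}")
```

Row percents add up to 100 across each row, and column percents add up to 100 down each column.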

New Concept: Expected Frequencies What would the counts in the cells be if gender had no effect on attitudes towards the Iraq War? The marginal proportions alone would define the cell counts.

Expected Frequencies
Expected Frequency = Row Total * Column Total / Grand Total
Or, equivalently:
Row Proportion * Column Total
Column Proportion * Row Total
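As a rough sketch, the expected-frequency rule (row total times column total, divided by the grand total) can be applied to the Iraq War by gender counts shown earlier. The function name `expected_frequencies` is hypothetical and chosen only for this illustration.

```python
def expected_frequencies(table):
    """Return expected counts for a table given as a list of rows.

    Each expected count is row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [r_total * c_total / grand_total for c_total in col_totals]
        for r_total in row_totals
    ]


# Rows: worth fighting, not worth fighting. Columns: male, female.
iraq_observed = [[213, 217], [245, 307]]
iraq_expected = expected_frequencies(iraq_observed)

for obs_row, exp_row in zip(iraq_observed, iraq_expected):
    print(obs_row, [round(value, 2) for value in exp_row])
```

The expected counts keep the same row and column totals as the observed table; only the distribution inside the table changes.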

Another Example: The Importance of Capturing Osama Bin Laden

Frequencies by Gender

Frequencies By Gender

Row Percents

Column Percents

Expected Frequencies

Actual Frequencies, Expected Frequencies, and Deviations (Residual)

Chi Square
Chi Square = Sum of [ (Expected – Observed)^2 / Expected Frequency ], summed over all cells of the table.
Chi Square Table: http://www.itl.nist.gov/div898/handbook//eda/section3/eda3674.htm

Examples of Chi Square Distribution http://davidmlane.com/hyperstat/A100557.html; see also http://www.stat.tamu.edu/~west/applets/chisqdemo.html; http://davidmlane.com/hyperstat/chi_square.html

Degrees of Freedom for Chi Square
Degrees of Freedom = (r-1) * (c-1), where r is the number of rows and c is the number of columns.
So, a 2 by 2 table has 1 degree of freedom.
A 3 by 2 table has (3-1)(2-1) = 2 degrees of freedom.
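A small sketch of the degrees-of-freedom rule follows. The function name is hypothetical, and the 0.05 critical values in the dictionary are an addition for illustration, copied from standard chi-square tables such as the one linked above; they are not part of the original slides.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-square test on an r by c table."""
    return (n_rows - 1) * (n_cols - 1)


# Upper-tail critical values at the 0.05 level, from a standard
# chi-square table (assumption: only the entries used here are listed).
CRITICAL_05 = {1: 3.841, 2: 5.991, 12: 21.026}

for shape in [(2, 2), (3, 2), (5, 4)]:
    df = degrees_of_freedom(*shape)
    print(f"{shape[0]} by {shape[1]} table: df = {df}, "
          f"0.05 critical value = {CRITICAL_05[df]}")
```

A computed chi-square larger than the critical value for its degrees of freedom would lead us to reject the null hypothesis at the 0.05 level.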

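Calculations: Catching Osama bin Laden by Gender
Each term is a squared deviation divided by the expected frequency for that cell. Here the squared deviation is 640.09 (25.3 squared) in every cell.
640.09/198.3 = 3.23
640.09/220.7 = 2.90
640.09/251.7 = 2.54
640.09/280.3 = 2.29
Chi Square (SUM) = 10.96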
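The arithmetic on this slide can be checked with a short sketch. It assumes only the numbers shown above: a squared deviation of 640.09 in every cell and the four expected frequencies. The variable names are illustrative.

```python
# Squared deviation (Expected - Observed)^2, the same in all four cells.
squared_deviation = 640.09

# Expected frequencies for the four cells of the 2 by 2 table.
expected = [198.3, 220.7, 251.7, 280.3]

# One chi-square term per cell, then the sum over all cells.
terms = [squared_deviation / e for e in expected]
chi_square = sum(terms)

for e, term in zip(expected, terms):
    print(f"{squared_deviation} / {e} = {term:.2f}")
print(f"Chi Square = {chi_square:.3f}")
```

Summing the unrounded terms gives a value slightly below the 10.96 on the slide, which adds the terms after rounding each to two decimals.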

Attitudes toward Iraq War by Gender

Chi Square Test For a larger table, calculation is the same, but the number of terms increases. The number of terms is equal to the number of cells.

Concentration of Occupational Groups by Ward

Cross Tabulation Are the occupational patterns different in the four wards? Or….are the patterns a result of chance? (null hypothesis) How would we decide?

Illustration: Frequencies and Marginals

Row and Column Percents

Expected and Actual Frequencies
Actual frequencies: OCC$ (rows) by WARD (columns)
                  14      18      20      22    Total
profcler   |       9      55      30      57 |    151
prop       |      27      33      54      45 |    159
skilled    |      90      16     149     114 |    369
skillpart  |      13      12      40      26 |     91
unskilled  |     175      12      71      46 |    304
Total            314     128     344     288     1074
Expected values: OCC$ (rows) by WARD (columns)
                  14      18      20      22
profcler   |   44.15   18.00   48.36   40.49 |
prop       |   46.49   18.95   50.93   42.64 |
skilled    |  107.88   43.98  118.19   98.95 |
skillpart  |   26.61   10.85   29.15   24.40 |
unskilled  |   88.88   36.23   97.37   81.52 |

Deviates
Deviates: (Observed - Expected), OCC$ (rows) by WARD (columns)
                   14        18        20        22
profcler   |  -35.147    37.004   -18.365    16.508 |
prop       |  -19.486    14.050     3.073     2.363 |
skilled    |  -17.883   -27.978    30.810    15.050 |
skillpart  |  -13.605     1.155    10.853     1.598 |
unskilled  |   86.121   -24.231   -26.371   -35.520 |

Calculations
Case  OCC$         WARD  FREQUENCY  EXPECTED  RESIDUAL  CHITERM
   1  profcler   14.000      9.000    44.147   -35.147   27.982
   2  profcler   18.000     55.000    17.996    37.004   76.087
   3  profcler   20.000     30.000    48.365   -18.365    6.973
   4  profcler   22.000     57.000    40.492    16.508    6.730
   5  prop       14.000     27.000    46.486   -19.486    8.168
   6  prop       18.000     33.000    18.950    14.050   10.418
   7  prop       20.000     54.000    50.927     3.073    0.185
   8  prop       22.000     45.000    42.637     2.363    0.131
   9  skilled    14.000     90.000   107.883   -17.883    2.964
  10  skilled    18.000     16.000    43.978   -27.978   17.799
  11  skilled    20.000    149.000   118.190    30.810    8.032
  12  skilled    22.000    114.000    98.950    15.050    2.289
  13  skillpart  14.000     13.000    26.605   -13.605    6.957
  14  skillpart  18.000     12.000    10.845     1.155    0.123
  15  skillpart  20.000     40.000    29.147    10.853    4.041
  16  skillpart  22.000     26.000    24.402     1.598    0.105
  17  unskilled  14.000    175.000    88.879    86.121   83.449
  18  unskilled  18.000     12.000    36.231   -24.231   16.205
  19  unskilled  20.000     71.000    97.371   -26.371    7.142
  20  unskilled  22.000     46.000    81.520   -35.520   15.477
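The twenty terms above can be reproduced with a short sketch, assuming the observed occupation by ward counts from the earlier frequency table. The function name `chi_square_terms` is hypothetical; this illustrates the calculation and is not the statistical package that produced the listing.

```python
occupations = ["profcler", "prop", "skilled", "skillpart", "unskilled"]
wards = [14, 18, 20, 22]
observed = [
    [9, 55, 30, 57],
    [27, 33, 54, 45],
    [90, 16, 149, 114],
    [13, 12, 40, 26],
    [175, 12, 71, 46],
]


def chi_square_terms(table):
    """Return (expected, residual, chi term) for every cell, row by row."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    results = []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            residual = count - expected
            results.append((expected, residual, residual ** 2 / expected))
    return results


cells = chi_square_terms(observed)
chi_square = sum(term for _, _, term in cells)
df = (len(observed) - 1) * (len(observed[0]) - 1)

labels = [(occ, ward) for occ in occupations for ward in wards]
for (occ, ward), (expected, residual, term) in zip(labels, cells):
    print(f"{occ:10s} {ward:3d} {expected:8.3f} {residual:8.3f} {term:7.3f}")
print(f"Chi Square = {chi_square:.2f} with {df} degrees of freedom")
```

Adding up the CHITERM column gives a chi-square of about 301.26 on 12 degrees of freedom. The largest single contribution comes from unskilled workers in ward 14.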

Review: Terms and Assumptions
Cell frequency: the count in a cell in the body of the table.
Marginal total: the total of a row or a column.
Row percent: the cell count as a percentage of its row total.
Column percent: the cell count as a percentage of its column total.
Expected frequency: the number of cases expected in a cell based upon the marginal proportions.
Deviation: the difference between the expected frequency and the actual frequency.

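Strength of Relationships
Phi: Square root of (Chi Square / n)
Cramer's V: Square root of (Chi Square / (n * min(r-1, c-1)))
Contingency Coefficient: Square root of (Chi Square / (Chi Square + n))
Here n is the total number of cases, r the number of rows, and c the number of columns.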
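As an illustration, the three measures can be computed for the occupation by ward table. The sketch assumes the chi-square value is the sum of the CHITERM column from the calculations slide, with n = 1074 cases in a 5 by 4 table; the function names are hypothetical.

```python
import math

# Chi-square terms for the 20 cells, copied from the CHITERM column.
chi_terms = [
    27.982, 76.087, 6.973, 6.730,
    8.168, 10.418, 0.185, 0.131,
    2.964, 17.799, 8.032, 2.289,
    6.957, 0.123, 4.041, 0.105,
    83.449, 16.205, 7.142, 15.477,
]
chi_square = sum(chi_terms)
n, n_rows, n_cols = 1074, 5, 4


def phi(chi_sq, n_cases):
    """Phi: square root of chi-square divided by the number of cases."""
    return math.sqrt(chi_sq / n_cases)


def cramers_v(chi_sq, n_cases, rows, cols):
    """Cramer's V: rescales by the smaller of (rows - 1) and (cols - 1)."""
    return math.sqrt(chi_sq / (n_cases * min(rows - 1, cols - 1)))


def contingency_coefficient(chi_sq, n_cases):
    """Contingency coefficient: sqrt(chi-square / (chi-square + n))."""
    return math.sqrt(chi_sq / (chi_sq + n_cases))


print(f"Phi                     = {phi(chi_square, n):.3f}")
print(f"Cramer's V              = {cramers_v(chi_square, n, n_rows, n_cols):.3f}")
print(f"Contingency coefficient = {contingency_coefficient(chi_square, n):.3f}")
```

For a 2 by 2 table, min(r-1, c-1) equals 1, so Cramer's V and Phi give the same value.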