Lecture 15: Crosstabulation 1 Sociology 5811 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.

Announcements Final Project Assignment Handed out Proposal due November 15 Final Project due December 13 Today’s class: New Topic: Crosstabulation Also called “crosstabs” Coming Soon: correlation, regression

Crosstabulation: Introduction T-Test and ANOVA look to see if groups differ on a continuous dependent variable Groups are actually a nominal variable Example: Do different ethnic groups vary in wages? Difference in means for two groups indicates a relationship between two variables Null hypothesis (means are the same) suggests that there is no relationship between variables Alternate hypothesis (means differ) is equivalent to saying that there is a relationship.

Crosstabulation: Introduction T-test and ANOVA determine whether there is a statistical relationship between a nominal variable and a continuous variable in your data But, we may be interested in two nominal variables Examples: Class and unemployment; gender and drug use Crosstabulation: used for nominal/ordinal variables Tools to descriptively examine variables Tools to identify whether there is a relationship between two variables.

Crosstabulation: Introduction What is bivariate crosstabulation? Start with two nominal variables in a dataset: Example: gender (Male/Female) and political party (Democrat, Republican) Crosstabulation is simply counting up the number of people in each combined category How many Democratic women? Democratic men? Republican women? Republican men? It is similar to computing frequencies But, for two variables jointly, rather than just one.

Crosstabulation: Introduction Example coding: Female = 1, Democrat = 1 [Data table with columns ID, Gender, Political Party; individual rows not reproduced in this transcript] Question: How many Republican women are in the dataset? Answer: 2

Crosstabulation: Introduction Example: Dataset of 68 people Look through the data and count up the number of people in each combined category Or, determine frequency along the first variable: Frequency: 43 women, 25 men Then break out groups by the second variable Of 43 women, 27 = democrat, 16 = republican Of 25 men, 10 = democrat, 15 = republican.

Crosstabulation: Introduction Crosstab: a table that presents joint frequencies Also called a "joint contingency table"

             Women   Men
Democrat       27     10
Republican     16     15

Each box with a value is a "cell"; the horizontal lines (Democrat, Republican) are table rows and the vertical lines (Women, Men) are table columns
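As an aside, the same joint frequencies can be tallied in software. The sketch below is a minimal illustration in Python, assuming the pandas library is available (the lecture does not prescribe any particular package); the names df, gender, and party are hypothetical, and the toy data simply reproduces the 68-person counts above.

```python
# Minimal sketch: build the joint frequency table ("crosstab") with pandas.
# The toy data below just reproduces the 68-person gender/party counts.
import pandas as pd

rows = (
    [("Woman", "Democrat")] * 27 + [("Woman", "Republican")] * 16 +
    [("Man", "Democrat")] * 10 + [("Man", "Republican")] * 15
)
df = pd.DataFrame(rows, columns=["gender", "party"])

# Count the number of people in each combined category
table = pd.crosstab(df["party"], df["gender"])
print(table)
```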

Crosstabulation: Introduction Tables may also have additional information: Row and column marginals (i.e., totals)

         Women   Men   Total
Dem        27     10     37
Rep        16     15     31
Total      43     25     68

Each row marginal is the sum across its row (27 + 10 = 37, 16 + 15 = 31); 68 is the total N

Crosstabulation: Introduction Tables can also reflect percentages Either of total N, or of row or column marginals This table shows percentage of total N:

         Women    Men     N
Dem      39.7%   14.7%    37
Rep      23.5%   22.1%    31
N          43      25     68

Just divide each cell value by the total N to get a proportion. Multiply by 100 for a percentage: (10/68)(100) = 14.7%

Crosstabulation: Introduction In addition, you can calculate percentages with respect to either row or column marginals Here is an example of column percentages:

         Women    Men     N
Dem      62.8%   40.0%    37
Rep      37.2%   60.0%    31
N          43      25     68

Just divide each cell by the column marginal to get a proportion. Multiply by 100 for a percentage: (10/25)(100) = 40%
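Continuing the illustration (and assuming the same hypothetical df from the earlier sketch), pandas can attach marginals and convert cells to percentages of the total N or of the column totals:

```python
# Minimal sketch: marginals and percentage versions of the crosstab.
import pandas as pd

# Row and column marginals (totals)
with_totals = pd.crosstab(df["party"], df["gender"],
                          margins=True, margins_name="Total")
print(with_totals)

# Percentages of the total N (each cell divided by 68)
pct_of_n = pd.crosstab(df["party"], df["gender"], normalize="all") * 100
print(pct_of_n.round(1))

# Column percentages (each cell divided by its column marginal)
pct_of_col = pd.crosstab(df["party"], df["gender"], normalize="columns") * 100
print(pct_of_col.round(1))
```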

Crosstabulation: Independence Question: How can we tell if there is a relationship between the two variables? Answer: If category on one variable appears to be linked to category on the other:

         Women   Men     N
Dem        43      0     43
Rep         0     25     25
N          43     25     68

Crosstabulation: Independence If there is no relationship between two variables, they are said to be “independent” Neither “depends” on the other If there is a relationship, the variables are said to be “associated” or to “covary” If individuals in one category also consistently fall in another (women=dem, men=rep), you may suspect that there is a relationship between the two variables Just as when the mean of a certain sub-group is much higher or lower than another (in T-test/ANOVA).

Crosstabulation: Independence Relationships aren't always very clearly visible Widely differing numbers of people in categories make comparisons difficult (e.g., if there were 200 men and only 15 women in the sample) And, large tables become more difficult to interpret (Example: Knoke, p. 157) Looking at row or column percentages can make visual interpretation a bit easier Calculate the percentages within the category you think is the "independent" variable If you think that political party affiliation depends on gender (column variable), look at column percentages.

Crosstabulation: Independence Here, column percentages highlight the relationship among variables:

         Women    Men     N
Dem      62.8%   40.0%    37
Rep      37.2%   60.0%    31
N          43      25     68

It appears as though women tend to be more democratic, while men tend to be republican

Chi-square Test of Independence In the sample, women appear to be more democratic, men republican How do we know if this difference is merely due to sampling variability? (Thus, there is no relationship in the population?) Or, is it indicative of a relationship at the population level? Answer: A new kind of statistical test The chi-square (χ²) test Pronunciation: "chi" rhymes with "sky" Chi-square tests: Similar to T-tests, F-tests Another family of distributions with known properties.

Chi-square Test of Independence Chi-Square test is a test of independence Asks “is there a relationship between variables or not?” Independence = no relationship ANOVA, T-Test do this too (same means = independent) Null hypothesis: the two variables are statistically independent H0: Gender and political party are independent There is no relationship between them Alternate hypothesis: the variables are related, not independent of each other H1: Gender and political party are not independent.

Chi-square Test of Independence How does a chi-square test of independence work? It is based on comparing the observed cell values with the values you’d expect if there were no relationship between variables Definitions: Observed values = values in the crosstab cells based on your sample Expected values = crosstab cell values you would expect if your variables were unrelated.

Crosstabs: Notation The value in a cell is referred to as a frequency –Math symbol = f Cells are referred to by row and column numbers –Ex: women republicans = 2nd row, 1st column –In general, rows are numbered from 1 to i, columns are numbered from 1 to j Thus, the value in any cell of any table can be written as: – f_ij

Expected Cell Values If two variables are independent, cell values will depend only on row & column marginals –Marginals reflect frequencies… And, if frequency is high, all cells in that row (or column) should be high The formula for the expected value in a cell is: E_ij = (f_i * f_j) / N, where f_i and f_j are the row and column marginals and N is the total sample size

Expected Cell Values Expected cell values are easy to calculate –Expected = row marginal * column marginal / N

         Women   Men     N
Dem       23.4   13.6    37
Rep       19.6   11.4    31
N           43     25    68

Each cell is RowM * ColM / N; for example, expected Democratic men = (25*37)/68 = 13.6
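A quick way to check these expected values is to build them from the marginals directly. The sketch below assumes numpy is available and uses an outer product of the row and column marginals divided by N; it is only an illustration of the formula, not part of the original lecture.

```python
# Minimal sketch: expected cell values under independence,
# E_ij = (row marginal * column marginal) / N, via an outer product.
import numpy as np

observed = np.array([[27, 10],    # Democrat:   women, men
                     [16, 15]])   # Republican: women, men

row_marginals = observed.sum(axis=1)   # [37, 31]
col_marginals = observed.sum(axis=0)   # [43, 25]
n = observed.sum()                     # 68

expected = np.outer(row_marginals, col_marginals) / n
print(expected.round(1))               # [[23.4 13.6] [19.6 11.4]]
```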

Expected Cell Values Question: What makes these values "expected"? A: They simply reflect percentages of marginals Look at column %'s based on expected values:

         Women   Men     N
Dem       54%    54%     37 (54%)
Rep       46%    46%     31 (46%)
N          43     25     68

Expected Cell Values Expected values are “expected” because they mirror the properties of the sample. If the sample is 63% women, you’d expect: –63% of democrats would be women and –63% of republicans would be women If not, the variables (gender & political view) would not be “independent” of each other

Chi-Square Test of Independence The Chi-square test is a comparison of expected and observed values For each cell, compute: (E – O)² / E Then, sum this up for all cells If cells all deviate a lot from the expected values, then the sum is large Maybe we can reject H0

Chi-square Test of Independence The actual Chi-square formula:

χ² = Σ_{i=1..R} Σ_{j=1..C} (E_ij – O_ij)² / E_ij

R = total number of rows in the table C = total number of columns in the table E_ij = the expected frequency in row i, column j O_ij = the observed frequency in row i, column j Question: Why square E – O?

Chi-square Test of Independence Assumptions required for Chi-square test: Only one: Sample size is large, N > 100 Hypotheses –H0: Variables are statistically independent –H1: Variables are not statistically independent The critical value can be looked up in a Chi-square table –See Knoke, p. 509 –Calculate degrees of freedom: (#Rows-1)(#Col-1)

Chi-square Test of Independence Example: Gender and Political Views –Let's pretend that N of 68 is sufficient

              Women                  Men
Democrat      O_11: 27, E_11: 23.4   O_12: 10, E_12: 13.6
Republican    O_21: 16, E_21: 19.6   O_22: 15, E_22: 11.4

Chi-square Test of Independence Compute (E – O)²/E for each cell

              Women                      Men
Democrat      (23.4 – 27)²/23.4 = .55    (13.6 – 10)²/13.6 = .95
Republican    (19.6 – 16)²/19.6 = .66    (11.4 – 15)²/11.4 = 1.14

Chi-Square Test of Independence Finally, sum up to compute the Chi-square: χ² = .55 + .95 + .66 + 1.14 = 3.30 What is the critical value for α = .05? Degrees of freedom: (R-1)(C-1) = (2-1)(2-1) = 1 According to Knoke, p. 509: Critical value is 3.84 Question: Can we reject H0? No. χ² of 3.30 is less than the critical value We cannot conclude that there is a relationship between gender and political party affiliation.
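The whole calculation can be verified in software. The sketch below, assuming numpy and scipy are available, computes the statistic by hand, looks up the α = .05 critical value instead of using a printed table, and cross-checks against scipy's built-in test (with the Yates continuity correction turned off so it matches the hand calculation).

```python
# Minimal sketch: chi-square test of independence by hand, plus a
# cross-check with scipy.
import numpy as np
from scipy import stats

observed = np.array([[27, 10],
                     [16, 15]])
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Sum of (E - O)^2 / E over all cells
chi_sq = ((expected - observed) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Critical value for alpha = .05, instead of looking it up in a table
critical = stats.chi2.ppf(0.95, dof)
print(round(chi_sq, 2), dof, round(critical, 2))   # ~3.30  1  3.84

# Built-in test; correction=False disables the Yates continuity correction
chi2_stat, p_value, _, _ = stats.chi2_contingency(observed, correction=False)
print(round(chi2_stat, 2), round(p_value, 3))      # same statistic, p ~ .069
```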

Chi-square Test of Independence Weaknesses of chi-square tests: 1. If the sample is very large, we almost always reject H0. Even tiny covariations are statistically significant But, they may not be socially meaningful differences 2. It doesn’t tell us how strong the relationship is It doesn’t tell us if it is a large, meaningful difference or a very small one It is only a test of “independence” vs. “dependence” Measures of Association address this shortcoming.

Measures of Association Separate from the issue of independence, statisticians have created measures of association –They are measures that tell us how strong the relationship is between two variables

Weak Association:
         Women   Men
Dem        51     49
Rep        49     51

Strong Association:
         Women   Men
Dem       100      0
Rep         0    100

Crosstab Association: Yule's Q #1: Yule's Q –Appropriate only for 2x2 tables (2 rows, 2 columns) Label cell frequencies a through d:

a   b
c   d

Recall that extreme values along the "diagonal" (cells a & d) or the "off-diagonal" (b & c) indicate a strong relationship. Yule's Q captures that in a measure: Q = (ad – bc) / (ad + bc) 0 = no association. -1, +1 = strong association

Crosstab Association: Yule's Q Rule of Thumb for interpreting Yule's Q: Bohrnstedt & Knoke, p. 150

Absolute value of Q    Strength of Association
0 to .24               "virtually no relationship"
.25 to .49             "weak relationship"
.50 to .74             "moderate relationship"
.75 to 1.0             "strong relationship"

Crosstab Association: Yule's Q Example: Gender and Political Party Affiliation

         Women      Men
Dem      27 (a)     10 (b)
Rep      16 (c)     15 (d)

Calculate "bc": bc = (10)(16) = 160 Calculate "ad": ad = (27)(15) = 405 Q = (405 – 160) / (405 + 160) = .43 = "weak association", almost "moderate"
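Yule's Q is simple enough to compute in a couple of lines; the sketch below (plain Python, no libraries needed) just plugs the four cell counts into Q = (ad – bc) / (ad + bc) as an illustration.

```python
# Minimal sketch: Yule's Q for a 2x2 table, Q = (a*d - b*c) / (a*d + b*c).
a, b = 27, 10   # Democrat:   women, men
c, d = 16, 15   # Republican: women, men

q = (a * d - b * c) / (a * d + b * c)
print(round(q, 2))   # 0.43 -> "weak" association, close to "moderate"
```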

Association: Other Measures Phi (φ) Very similar to Yule's Q Only for 2x2 tables, ranges from –1 to 1, 0 = no assoc. Gamma (G) Based on a very different method of calculation Not limited to 2x2 tables Requires ordered variables Tau-c (τ_c) and Somers' d (d_yx) Same basic principle as Gamma Several others discussed in Knoke, Norusis.