Published by Brent Page. Modified over 8 years ago.
1
Lecture 15: Crosstabulation 1 Sociology 5811 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission
2
Announcements Final Project Assignment Handed out Proposal due November 15 Final Project due December 13 Today’s class: New Topic: Crosstabulation Also called “crosstabs” Coming Soon: correlation, regression
3
Crosstabulation: Introduction T-Test and ANOVA look to see if groups differ on a continuous dependent variable Groups are actually a nominal variable Example: Do different ethnic groups vary in wages? Difference in means for two groups indicates a relationship between two variables Null hypothesis (means are the same) suggests that there is no relationship between variables Alternate hypothesis (means differ) is equivalent to saying that there is a relationship.
4
Crosstabulation: Introduction T-test and ANOVA determine whether there is a statistical relationship between a nominal variable and a continuous variable in your data But, we may be interested in two nominal variables Examples: Class and unemployment; gender and drug use Crosstabulation: used for nominal/ordinal variables Tools to descriptively examine variables Tools to identify whether there is a relationship between two variables.
5
Crosstabulation: Introduction What is bivariate crosstabulation? Start with two nominal variables in a dataset: Example: gender (Male/Female) and political party (Democrat, Republican) Crosstabulation is simply counting up the number of people in each combined category How many Democratic women? Democratic men? Republican women? Republican men? It is similar to computing frequencies But, for two variables jointly, rather than just one.
6
Crosstabulation: Introduction Example: Female = 1, Democrat = 1
ID   Gender   Political Party
1    0        1
2    1        0
3    1        1
4    0        0
5    0        0
6    1        0
7    1        1
8    0        1
Question: How many Republican women are in the dataset? Answer: 2
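The counting step in this example can be sketched in a few lines of Python. This is only an illustration: the variable names (`records`, `joint_counts`, `republican_women`) are made up here, and the data are the eight cases from the slide, coded Female = 1 and Democrat = 1.

```python
from collections import Counter

# The eight cases from the slide as (id, gender, party) tuples.
# Coding assumed from the slide: Female = 1, Democrat = 1.
records = [
    (1, 0, 1), (2, 1, 0), (3, 1, 1), (4, 0, 0),
    (5, 0, 0), (6, 1, 0), (7, 1, 1), (8, 0, 1),
]

# Count how many people fall into each (gender, party) combination.
joint_counts = Counter((gender, party) for _, gender, party in records)

# Republican women: gender = 1 (female), party = 0 (not Democrat).
republican_women = joint_counts[(1, 0)]
print(republican_women)
```

Each key of `joint_counts` corresponds to one combined category, which is exactly what a crosstab cell holds.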
7
Crosstabulation: Introduction Example: Dataset of 68 people Look and count up the number of people in each combined category Or, determine frequency along the first variable: Frequency: 43 women, 25 men Then break out groups by the second variable Of 43 women, 27 = democrat, 16 = republican Of 25 men, 10 = democrat, 15 = republican.
8
Crosstabulation: Introduction Crosstab: a table that presents joint frequencies Also called a “joint contingency table”
             Women   Men
Democrat     27      10
Republican   16      15
Each box with a value is a “cell” The cells for Democrats (27, 10) form a table row The cells for women (27, 16) form a table column
9
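Crosstabulation: Introduction Tables may also have additional information: Row and column marginals (i.e., totals)
        Women   Men   Total
Dem     27      10    37
Rep     16      15    31
Total   43      25    68
Each marginal is the sum of its cells: 27 + 10 = 37, 16 + 15 = 31, 27 + 16 = 43, 10 + 15 = 25 The bottom-right value (68) is the total N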
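As a rough sketch, the marginals can be computed directly from the cell counts. The nested-dictionary layout and the names `observed`, `row_totals`, `col_totals`, and `total_n` are assumptions chosen for illustration, not part of the lecture.

```python
# Observed counts from the slide: rows are party, columns are gender.
observed = {
    "Dem": {"Women": 27, "Men": 10},
    "Rep": {"Women": 16, "Men": 15},
}

# Row marginals: sum across each row.
row_totals = {party: sum(cells.values()) for party, cells in observed.items()}

# Column marginals: sum down each column.
col_totals = {}
for cells in observed.values():
    for gender, count in cells.items():
        col_totals[gender] = col_totals.get(gender, 0) + count

# Total N: the sum of either set of marginals.
total_n = sum(row_totals.values())

print(row_totals, col_totals, total_n)
```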
10
Crosstabulation: Introduction Tables can also reflect percentages Either of total N, or of row or column marginals This table shows percentage of total N:
        Women   Men     N
Dem     39.7%   14.7%   37
Rep     23.5%   22.1%   31
N       43      25      68
(Underlying cell counts: 27, 10, 16, 15) Just divide each cell value by the total N to get a proportion. Multiply by 100 for a percentage: (10/68)(100) = 14.7
11
Crosstabulation: Introduction In addition, you can calculate percentages with respect to either row or column marginals Here is an example of column percentages
        Women   Men     N
Dem     62.8%   40.0%   37
Rep     37.2%   60.0%   31
N       43      25      68
(Underlying cell counts: 27, 10, 16, 15) Just divide each cell by the column marginal to get a proportion. Multiply by 100 for a percentage: (10/25)(100) = 40%
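Both kinds of percentages come from the same cell counts and differ only in the denominator. The sketch below assumes a plain list-of-lists layout (rows = party, columns = gender); the names `total_pct` and `col_pct` are illustrative.

```python
# Observed counts: rows are Dem, Rep; columns are Women, Men.
observed = [[27, 10],
            [16, 15]]

total_n = sum(sum(row) for row in observed)
col_totals = [sum(col) for col in zip(*observed)]

# Percentage of total N: divide every cell by the grand total.
total_pct = [[round(100 * cell / total_n, 1) for cell in row]
             for row in observed]

# Column percentages: divide every cell by its column marginal.
col_pct = [[round(100 * cell / col_totals[j], 1) for j, cell in enumerate(row)]
           for row in observed]

print(total_pct)
print(col_pct)
```

Row percentages would follow the same pattern, dividing each cell by its row marginal instead.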
12
Crosstabulation: Independence Question: How can we tell if there is a relationship between the two variables? Answer: If category on one variable appears to be linked to category on the other:
        Women   Men   N
Dem     43      0     43
Rep     0       25    25
N       43      25    68
13
Crosstabulation: Independence If there is no relationship between two variables, they are said to be “independent” Neither “depends” on the other If there is a relationship, the variables are said to be “associated” or to “covary” If individuals in one category also consistently fall in another (women=dem, men=rep), you may suspect that there is a relationship between the two variables Just as when the mean of a certain sub-group is much higher or lower than another (in T-test/ANOVA).
14
Crosstabulation: Independence Relationships aren’t always very clearly visible Widely differing numbers of people in categories make comparisons difficult (e.g., if there were 200 men and only 15 women in the sample) And, large tables become more difficult to interpret (Example: Knoke, p. 157) Looking at row or column percentages can make visual interpretation a bit easier Calculate the percentages within the category you think is the “independent” variable If you think that political party affiliation depends on gender (column variable), look at column percentages.
15
Crosstabulation: Independence Here, column percentages highlight the relationship among variables:
        Women   Men     N
Dem     62.8%   40.0%   37
Rep     37.2%   60.0%   31
N       43      25      68
It appears as though women tend to be more democratic, while men tend to be republican
16
Chi-square Test of Independence In the sample, women appear to be more democratic, men republican How do we know if this difference is merely due to sampling variability? (Thus, there is no relationship in the population?) Or, is it indicative of a relationship at the population level? Answer: A new kind of statistical test The chi-square ($\chi^2$) test Pronunciation: “chi” rhymes with “sky” Chi-square tests: Similar to T-tests, F-tests Another family of distributions with known properties.
17
Chi-square Test of Independence Chi-Square test is a test of independence Asks “is there a relationship between variables or not?” Independence = no relationship ANOVA, T-Test do this too (same means = independent) Null hypothesis: the two variables are statistically independent H0: Gender and political party are independent There is no relationship between them Alternate hypothesis: the variables are related, not independent of each other H1: Gender and political party are not independent.
18
Chi-square Test of Independence How does a chi-square test of independence work? It is based on comparing the observed cell values with the values you’d expect if there were no relationship between variables Definitions: Observed values = values in the crosstab cells based on your sample Expected values = crosstab cell values you would expect if your variables were unrelated.
19
Crosstabs: Notation The value in a cell is referred to as a frequency –Math symbol = $f$ Cells are referred to by row and column numbers –Ex: women republicans = 2nd row, 1st column –In general, rows are numbered from 1 to i, columns are numbered from 1 to j Thus, the value in any cell of any table can be written as: $f_{ij}$
20
Expected Cell Values If two variables are independent, cell values will depend only on row & column marginals –Marginals reflect frequencies… And, if frequency is high, all cells in that row (or column) should be high The formula for the expected value in a cell is:
$$E_{ij} = \frac{f_i \, f_j}{N}$$
where $f_i$ and $f_j$ are the row and column marginals and $N$ is the total sample size
21
Expected Cell Values Expected cell values are easy to calculate –Expected = row marginal * column marginal / N
        Women   Men    N
Dem     23.4    13.6   37
Rep     19.6    11.4   31
N       43      25     68
RowM * ColM / N, e.g. (25*37)/68 = 13.6
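The rule "row marginal * column marginal / N" can be written as a small helper. This is a minimal sketch; the function name `expected_counts` and the list-of-lists layout are assumptions made for illustration.

```python
def expected_counts(observed):
    """Return the expected cell counts under independence.

    Each expected value is (row marginal * column marginal) / N.
    `observed` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts: rows are Dem, Rep; columns are Women, Men.
observed = [[27, 10],
            [16, 15]]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

The expected counts keep the same row and column marginals as the observed table; only the way the cases are spread across the cells changes.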
22
Expected Cell Values Question: What makes these values “expected”? A: They simply reflect percentages of marginals Look at column %’s based on expected values:
        Women   Men   N
Dem     54%     54%   37 (54%)
Rep     46%     46%   31 (46%)
N       43      25    68
23
Expected Cell Values Expected values are “expected” because they mirror the properties of the sample. If the sample is 63% women, you’d expect: –63% of democrats would be women and –63% of republicans would be women If not, the variables (gender & political view) would not be “independent” of each other
24
Chi-Square Test of Independence The Chi-square test is a comparison of expected and observed values For each cell, compute:
$$\frac{(E - O)^2}{E}$$
Then, sum this up for all cells If cells all deviate a lot from the expected values, then the sum is large Maybe we can reject H0
25
Chi-square Test of Independence The actual Chi-square formula:
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$$
R = total number of rows in the table C = total number of columns in the table $E_{ij}$ = the expected frequency in row i, column j $O_{ij}$ = the observed frequency in row i, column j Question: Why square E – O ?
26
Chi-square Test of Independence Assumptions required for Chi-square test: Only one: Sample size is large, N > 100 Hypotheses –H0: Variables are statistically independent –H1: Variables are not statistically independent The critical value can be looked up in a Chi-square table –See Knoke, p. 509-510 –Calculate degrees of freedom: (#Rows-1)(#Col-1)
27
Chi-square Test of Independence Example: Gender and Political Views –Let’s pretend that N of 68 is sufficient
             Women                 Men
Democrat     O11: 27, E11: 23.4    O12: 10, E12: 13.6
Republican   O21: 16, E21: 19.6    O22: 15, E22: 11.4
28
Chi-square Test of Independence Compute $(E - O)^2 / E$ for each cell
             Women                        Men
Democrat     (23.4 - 27)^2/23.4 = .55     (13.6 - 10)^2/13.6 = .95
Republican   (19.6 - 16)^2/19.6 = .66     (11.4 - 15)^2/11.4 = 1.14
29
Chi-Square Test of Independence Finally, sum up to compute the Chi-square: $\chi^2$ = .55 + .95 + .66 + 1.14 = 3.30 What is the critical value for $\alpha$ = .05? Degrees of freedom: (R-1)(C-1) = (2-1)(2-1) = 1 According to Knoke, p. 509: Critical value is 3.84 Question: Can we reject H0? No. $\chi^2$ of 3.30 is less than the critical value We cannot conclude that there is a relationship between gender and political party affiliation.
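The whole test can be sketched in plain Python. The function name `chi_square` and the constant `CRITICAL_05_DF1` are illustrative; the critical value 3.84 is the one quoted from Knoke for $\alpha$ = .05 with 1 degree of freedom. Note that each cell's squared deviation is divided by that cell's expected value (11.4 for Republican men), and that working from unrounded expected values gives a statistic of about 3.31, which is still below 3.84, so H0 is not rejected.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a crosstab.

    Sums (E - O)^2 / E over all cells, where E is
    (row marginal * column marginal) / N.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            statistic += (exp - obs) ** 2 / exp

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Observed counts: rows are Dem, Rep; columns are Women, Men.
observed = [[27, 10],
            [16, 15]]

statistic, df = chi_square(observed)

# Critical value for alpha = .05, df = 1, as quoted in the lecture.
CRITICAL_05_DF1 = 3.84
reject_h0 = statistic > CRITICAL_05_DF1

print(round(statistic, 2), df, reject_h0)
```

The function applies no continuity correction, so its result may differ from software that applies one by default for 2x2 tables.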
30
Chi-square Test of Independence Weaknesses of chi-square tests: 1. If the sample is very large, we almost always reject H0. Even tiny covariations are statistically significant But, they may not be socially meaningful differences 2. It doesn’t tell us how strong the relationship is It doesn’t tell us if it is a large, meaningful difference or a very small one It is only a test of “independence” vs. “dependence” Measures of Association address this shortcoming.
31
Measures of Association Separate from the issue of independence, statisticians have created measures of association –They are measures that tell us how strong the relationship is between two variables
Weak Association
        Women   Men
Dem.    51      49
Rep.    49      51
Strong Association
        Women   Men
Dem.    100     0
Rep.    0       100
32
Crosstab Association: Yule’s Q #1: Yule’s Q –Appropriate only for 2x2 tables (2 rows, 2 columns) Label cell frequencies a through d:
a   b
c   d
Recall that extreme values along the “diagonal” (cells a & d) or the “off-diagonal” (b & c) indicate a strong relationship. Yule’s Q captures that in a measure 0 = no association. -1, +1 = strong association
33
Crosstab Association: Yule’s Q Rule of Thumb for interpreting Yule’s Q: Bohrnstedt & Knoke, p. 150
Absolute value of Q   Strength of Association
0 to .24              “virtually no relationship”
.25 to .49            “weak relationship”
.50 to .74            “moderate relationship”
.75 to 1.0            “strong relationship”
34
Crosstab Association: Yule’s Q Example: Gender and Political Party Affiliation
        Women    Men
Dem     a = 27   b = 10
Rep     c = 16   d = 15
Calculate “bc”: bc = (10)(16) = 160 Calculate “ad”: ad = (27)(15) = 405 Q = (bc - ad) / (bc + ad) = (160 - 405) / (160 + 405) = -245/565 = -.43 = “weak association”
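A minimal sketch of the calculation, assuming the form Q = (bc - ad) / (bc + ad), which matches the bc and ad products computed above and the negative sign of the result. The function names `yules_q` and `strength_label` are hypothetical, and the handling of values that fall between the published ranges (for example .245) is an assumption. For this table the calculation gives about -.43, which falls in the “weak relationship” range.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table with cells labelled a b / c d.

    Assumes the form Q = (bc - ad) / (bc + ad).
    """
    bc = b * c
    ad = a * d
    if bc + ad == 0:
        raise ValueError("Q is undefined when both bc and ad are zero")
    return (bc - ad) / (bc + ad)


def strength_label(q):
    """Map |Q| to the rule-of-thumb labels from Bohrnstedt & Knoke.

    Boundary handling between the published ranges is an assumption.
    """
    size = abs(q)
    if size < 0.25:
        return "virtually no relationship"
    if size < 0.50:
        return "weak relationship"
    if size < 0.75:
        return "moderate relationship"
    return "strong relationship"


# Cells from the example: a = 27, b = 10, c = 16, d = 15.
q = yules_q(27, 10, 16, 15)
label = strength_label(q)
print(round(q, 2), label)
```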
35
Association: Other Measures Phi ($\phi$) Very similar to Yule’s Q Only for 2x2 tables, ranges from –1 to 1, 0 = no assoc. Gamma (G) Based on a very different method of calculation Not limited to 2x2 tables Requires ordered variables Tau c ($\tau_c$) and Somers’ d ($d_{yx}$) Same basic principle as Gamma Several Others discussed in Knoke, Norusis.