
Sociology 601 Class12: October 8, 2009 The Chi-Squared Test (8.2) – expected frequencies – calculating Chi-square – finding p When (not) to use Chi-squared.


1 Sociology 601 Class12: October 8, 2009
The Chi-Squared Test (8.2)
– expected frequencies
– calculating Chi-square
– finding p
When (not) to use Chi-squared tests (8.3)
– chi-squared residuals

2 8.2 Chi-squared statistical significance test for contingency tables.

                         support tax reform?
                         Yes      No     Tot
support         Yes      150     100     250
environment?    No       200      50     250
                Tot      350     150     500

“Is the level of support for the environment independent of the level of support for tax reform?”
– If not, these two measures may have some causal link worth investigating.
– Q: which causes which?

3 2x2 table: a z-test for proportions
With a 2x2 table, we can use a z-test for independent-sample proportions (review 7.2).

    . prtesti 250 .6 250 .8

    Two-sample test of proportion                 x: Number of obs = 250
                                                  y: Number of obs = 250
    ------------------------------------------------------------------------------
        Variable |     Mean   Std. Err.      z    P>|z|    [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |       .6   .0309839                     .5392727   .6607273
               y |       .8   .0252982                     .7504164   .8495836
    -------------+----------------------------------------------------------------
            diff |      -.2        .04                    -.2783986  -.1216014
                 | under Ho:  .0409878    -4.88   0.000
    ------------------------------------------------------------------------------
            diff = prop(x) - prop(y)                              z = -4.8795
        Ho: diff = 0
        Ha: diff > 0      Pr(Z > z) = 1.0000
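The z statistic in the output above can be reproduced by hand. The following is a minimal Python sketch (Python rather than the Stata used in the slides; the function name `two_proportion_z` is made up for illustration). It assumes the pooled standard error reported on the "under Ho" line.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for H0: p1 == p2, using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)   # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, se

# 150/250 = .6 and 200/250 = .8 are the row proportions from the 2x2 table.
z, se = two_proportion_z(0.6, 250, 0.8, 250)
two_sided_p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal

print(round(z, 4), round(se, 7), two_sided_p)
```

The printed z and standard error should agree with the -4.8795 and .0409878 in the Stata output, up to rounding.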

4 Moving beyond 2x2 tables:
Comparing conditional probabilities is fine when there are only two comparisons and two possible outcomes for each comparison.
The Chi-Square (χ²) test is a new technique that makes such comparisons more flexible.
The χ² test starts from a null hypothesis that every cell should have the frequency you would expect if the variables were independently distributed.
f_e is the expected count for each cell.
– f_e = (row total * column total) / table total (A&F 254)
– f_e = total N * unconditional row probability * unconditional column probability
– f_e = column N * unconditional row probability
A test for the whole table will combine tests for f_e for every cell.

5 Calculating expected cell counts:
The expected cell count is the count we would expect in a cell if
– environmental support among tax reform advocates and among tax reform opponents were identical, or if
– environmental support among tax reform advocates were the same as environmental support among the whole sample, or if
– tax reform support among environmentalists were the same as among non-environmentalists
50% of sample supports environmental spending, so
– f_e(1,1) = .5 * 350 = 175
– f_e(1,1) = 250 * 350 / 500 = 175 (A&F)
– f_e(1,1) = 500 * (350/500) [taxes] * (250/500) [environment] = 175
f_e(1,2) = 75
f_e(2,1) = 175
f_e(2,2) = 75
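The row-total times column-total rule can be applied to every cell at once. The sketch below is in Python, and the helper name `expected_counts` is hypothetical. It uses the observed tax reform and environment table from the earlier slide.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: support environment (Yes, No); columns: support tax reform (Yes, No).
observed = [[150, 100],
            [200, 50]]
expected = expected_counts(observed)
print(expected)
```

Both rows get the same expected counts here because the two row totals are equal (250 each).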

6 Testing independence of support for tax reform and environmental spending:
New Approach: Chi Squared test for independence of attitudes toward taxes and the environment.
Test statistic:
– χ² = Σ ((f_o – f_e)² / f_e)
– where f_o is the observed count in each cell
– and where f_e is the expected count for each cell, assuming that attitudes toward taxes will be the same for people who support environmental issues as for people who do not support environmental issues.

7 Assumptions and hypothesis for a chi-squared test:
Assumptions:
– two categorical variables (for this course)
– random sample or stratified random sample
– f_e ≥ 5 for all cells
Hypothesis:
H_0: the two variables are statistically independent.
– this means that the distribution of each variable is independent of the score of the other variable

8 Using expected cell counts to calculate a Chi-squared test statistic
The test statistic is analogous to a t-statistic…
– but the form of the equation makes it difficult to see that the χ² statistic is a difference between the observed and expected values, divided by an estimate of the typical variation we would expect from random sampling error.
Test statistic:
– χ² = Σ ((f_o – f_e)² / f_e)
= (150 – 175)²/175 + (100 – 75)²/75 + (200 – 175)²/175 + (50 – 75)²/75
= 3.5714 + 8.3333 + 3.5714 + 8.3333
= 23.81
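The arithmetic above can be checked with a short Python sketch. The function name `chi_squared_terms` is an illustrative choice; it returns each cell's contribution so the four terms can be compared with the hand calculation.

```python
def chi_squared_terms(observed):
    """Per-cell contributions (f_o - f_e)^2 / f_e, in row-major order."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    terms = []
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected count under independence
            terms.append((f_o - f_e) ** 2 / f_e)
    return terms

terms = chi_squared_terms([[150, 100], [200, 50]])
chi2 = sum(terms)
print([round(t, 4) for t in terms])
print(round(chi2, 2))
```

For a 2x2 table the result equals the square of the two-sample z statistic: (-4.8795)² is about 23.81.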

9 Degrees of freedom for a Chi-squared statistic:
We now have a test statistic: χ² = 23.81
How do we assign a p-value to this?
Step 1: calculate the degrees of freedom.
– Given the row and column marginal totals, how many cells need we fill in before we can do the rest automatically?
– Answer: 1 in this case, so df = 1.
– General answer: df = (r-1)*(c-1), where r is the number of rows and c is the number of columns.

10 p-value for a Chi-squared statistic:
Assign a p-value to the statistic: χ² = 23.81, df = 1
Given the degrees of freedom, look up the p-value.
– Go to Table C on page 670.
– Go down to the row for df = 1
– Move across χ² values to the largest tabled value that is smaller than the measured χ²
– Look up the corresponding p-value at the top of the column: p < .001
– The chi-squared test is always a 1-tailed test: we always use the right tail of the distribution.
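Instead of a printed table, the right-tail probability can be computed directly. This is a Python sketch using only the standard library; `chi2_sf` is a hypothetical helper, not a library function. It assumes integer df and uses the closed forms for df = 1 and df = 2 plus the standard recurrence for the upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(chi-squared with integer df > x)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # exact tail for df = 1
    else:
        q, a = math.exp(-half), 1.0              # exact tail for df = 2
    # Step df up by 2 at a time: Q(a+1, h) = Q(a, h) + h^a * e^(-h) / Gamma(a+1)
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q

p = chi2_sf(23.81, 1)
print(p)
```

The value is far below .001, which agrees with the table lookup. As a sanity check, the familiar 5% critical values (3.841 for df = 1, 5.991 for df = 2) should return roughly .05.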

11 Do your own chi-squared test:
You watch 50 beachcombers to see if they are wearing sandals and if they are wearing shorts.

                   wearing shorts?
                   Yes     No      Tot
sandals?   Yes      20     10       30
           No       10     10       20
           Tot      30     20       50

Q: Does a beachcomber’s chance of wearing sandals depend on whether they are wearing shorts?

12 Chi-Squared Tests for tables larger than 2X2
Here is a command to run a chi-squared test on the gender and partyid data from the 1996 GSS (cf. 8.1).

    . tab partyid3 sex, chi2

                |   respondents sex
       partyid3 |      male     female |     Total
    ------------+----------------------+----------
       Democrat |       350        627 |       977
    Independent |       514        557 |     1,071
     Republican |       400        407 |       807
    ------------+----------------------+----------
          Total |     1,264      1,591 |     2,855

    Pearson chi2(2) = 43.4391   Pr = 0.000

13 Add expected cell counts.

    . tab partyid3 sex, chi2 exp

    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    +--------------------+

                |   respondents sex
       partyid3 |      male     female |     Total
    ------------+----------------------+----------
       Democrat |       350        627 |       977
                |     432.5      544.5 |     977.0
    ------------+----------------------+----------
    Independent |       514        557 |     1,071
                |     474.2      596.8 |   1,071.0
    ------------+----------------------+----------
     Republican |       400        407 |       807
                |     357.3      449.7 |     807.0
    ------------+----------------------+----------
          Total |     1,264      1,591 |     2,855
                |   1,264.0    1,591.0 |   2,855.0

    Pearson chi2(2) = 43.4391   Pr = 0.000
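The same expected counts and Pearson statistic can be reproduced outside Stata. This Python sketch uses an illustrative function name, `pearson_chi2`. It also relies on the fact that, for df = 2, the right-tail probability is exactly exp(-χ²/2).

```python
import math

def pearson_chi2(observed):
    """Return (expected counts, chi-squared statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((f_o - f_e) ** 2 / f_e
               for obs_row, exp_row in zip(observed, expected)
               for f_o, f_e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df

# Rows: Democrat, Independent, Republican; columns: male, female.
party_by_sex = [[350, 627], [514, 557], [400, 407]]
expected, stat, df = pearson_chi2(party_by_sex)
p_value = math.exp(-stat / 2)  # exact right-tail probability when df == 2

print([[round(e, 1) for e in row] for row in expected])
print(round(stat, 4), df, p_value)
```

The rounded expected counts and the statistic should match the Stata output above, and the p-value is small enough that Stata displays it as 0.000.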

14 8.3 When not to do a chi-squared test
1.) Do not do a Chi-squared test when the expected value of a cell is less than 5.
The Problem: The total χ² is 6.28, so p < .05, but 4.5 of the total comes from one cell with f_e = 2.
(It is okay to do a Chi-squared test if a cell has an expected value above 5 and an observed value below 5!)

Observed counts, with expected counts in parentheses:

          Party identification
age       Democrat    Indep.    Republican    Total
<65       42 (40)     5 (8)     33 (32)          80
65+        8 (10)     5 (2)      7 (8)           20
total     50          10        40              100
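A quick way to spot this problem is to list each cell's expected count and its share of the total statistic. The Python sketch below does that for the age by party table; the helper name `cell_report` and the cutoff constant are illustrative.

```python
MIN_EXPECTED = 5  # rule of thumb from the slides: f_e >= 5 in every cell

def cell_report(observed):
    """Return a list of (row, col, f_o, f_e, contribution) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    report = []
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n
            report.append((i, j, f_o, f_e, (f_o - f_e) ** 2 / f_e))
    return report

# Rows: under 65, 65 and over; columns: Democrat, Independent, Republican.
report = cell_report([[42, 5, 33], [8, 5, 7]])
total = sum(cell[4] for cell in report)
too_small = [cell for cell in report if cell[3] < MIN_EXPECTED]

print(round(total, 2))
for i, j, f_o, f_e, contrib in too_small:
    print(f"cell ({i + 1},{j + 1}): f_e = {f_e}, contributes {contrib} of {total:.2f}")
```

Only the older Independents cell falls below the cutoff, and it alone supplies 4.5 of the 6.28 total.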

15 A small sample alternative to a chi-squared test
When the sample size is too small for a chi-squared test, you may treat the contingency table as a small sample comparison of two population proportions.
This means you should do a Fisher’s exact test for population proportions.
A Fisher’s exact test will also work okay on large samples, but you sometimes will bog down the computer with lengthy computations. (This is especially likely to happen when the tables are 5X4 or larger).
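To give a sense of what the exact test computes, here is a Python sketch for the 2x2 case only. The function `fisher_exact_2x2` and the example counts are hypothetical and do not come from the course data. It assumes the common two-sided definition: with the margins held fixed, sum the hypergeometric probabilities of every table that is no more likely than the observed one.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible range for the top-left cell
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are counted.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Made-up small sample: 8 cases, all margins equal to 4.
p_exact = fisher_exact_2x2([[3, 1], [1, 3]])
print(p_exact)
```

The enumeration grows quickly with table size and sample size, which is why the exact test can bog down on large tables.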

16 Fisher’s exact test in STATA (not necessary in this case because of large N).

    . tab partyid3 sex, chi exact

    Enumerating sample-space combinations:
    stage 3: enumerations = 1
    stage 2: enumerations = 158
    stage 1: enumerations = 0

                |   respondents sex
       partyid3 |      male     female |     Total
    ------------+----------------------+----------
       Democrat |       350        627 |       977
    Independent |       514        557 |     1,071
     Republican |       400        407 |       807
    ------------+----------------------+----------
          Total |     1,264      1,591 |     2,855

    Pearson chi2(2) = 43.4391   Pr = 0.000
    Fisher's exact =                  0.000

17 When not to do a chi-squared test (#2)
2.) Do not do a Chi-squared test for cell values that are not observed frequencies.
The Problem: If you use percentages, you misstate the sample size as 100.

          Voted in last election?
sex       Yes     No      Total
women     35%     15%      50%
men       20%     30%      50%
total     55%     45%     100%
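The size of the distortion is easy to see numerically. In this Python sketch the true sample size of 1,000 is a made-up assumption for illustration only, since the slide does not give the real N. The percentages keep the same pattern, but the statistic scales with the number of cases.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a table of cell values."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((f_o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(observed, row_totals)
               for f_o, c in zip(row, col_totals))

percentages = [[35, 15], [20, 30]]           # cell percentages from the slide
hypothetical_n = 1000                        # assumed sample size, not from the source
counts = [[pct * hypothetical_n // 100 for pct in row] for row in percentages]

chi2_from_percentages = chi_squared(percentages)  # silently treats N as 100
chi2_from_counts = chi_squared(counts)

print(round(chi2_from_percentages, 2), round(chi2_from_counts, 2))
```

With the same percentages, ten times as many cases gives a statistic ten times as large, so feeding percentages to the test understates (or, for samples under 100, overstates) the evidence.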

18 When not to do a chi-squared test (#3)
3.) Do not do a Chi-squared test to find a difference in population proportions for dependent samples.
The Problem: You want to know if the speech changed people’s opinions. A χ² test would instead tell you whether opinions after the speech depend on opinions before the speech.

Number supporting death penalty:

                    After hearing speech:
Before speech:      Yes     No      Total
Yes                  80     20       100
No                   40     60       100
total               120     80       200

19 Residual Analysis for Chi-Squared Tests
This part of section 8.3 will not be on the exam. I don’t use this stuff, but Agresti covers it, and you should have a reference in case a referee ever asks for it.
The problem: If a Chi-squared test produces a statistically significant result, we only know that somewhere in the table the data depart from what independence predicts.
To find the level of statistical significance associated with a single cell value, we conduct a residual analysis.

20 Residual Analysis for Chi-Squared Tests
Terms:
– residual: (f_o – f_e). The difference between an observed and an expected cell frequency.
– adjusted residual: The standardized difference between an observed and an expected cell frequency. (Like a z-score for cells in independence tests.)

21 Residual Analysis for Chi-Squared Tests
Example: adjusted residual for cell (1,1)
= (279 – 261.4) / sqrt((261.4)(1 – 444/980)(1 – 577/980))
= 2.295 (treat as a z-score, so p = .011)

Observed counts, with expected counts in parentheses:

          Party identification
sex       Democrat       Indep.        Republican     Total
female    279 (261.4)    73 (70.65)    225 (244.9)      577
male      165 (182.6)    47 (49.35)    191 (171.1)      403
total     444            120           416              980
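The same formula can be applied to every cell. This is a Python sketch with an illustrative function name, `adjusted_residuals`. It divides each residual by sqrt(f_e * (1 - row proportion) * (1 - column proportion)), as in the worked example.

```python
import math

def adjusted_residuals(observed):
    """Adjusted residual for each cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    result = []
    for row, r in zip(observed, row_totals):
        out_row = []
        for f_o, c in zip(row, col_totals):
            f_e = r * c / n
            se = math.sqrt(f_e * (1 - r / n) * (1 - c / n))
            out_row.append((f_o - f_e) / se)
        result.append(out_row)
    return result

# Rows: female, male; columns: Democrat, Independent, Republican.
adj = adjusted_residuals([[279, 73, 225], [165, 47, 191]])
upper_tail = 0.5 * math.erfc(adj[0][0] / math.sqrt(2))  # P(Z > z) for cell (1,1)

for row in adj:
    print([round(z, 2) for z in row])
print(round(upper_tail, 3))
```

With the unrounded expected count the residual for cell (1,1) comes out at about 2.29; the slide's 2.295 uses f_e rounded to 261.4. The upper-tail probability for that z-score is about .011. In a table with two rows, the residuals in each column are equal in size and opposite in sign.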

22 Residual Analysis for Chi-Squared Tests
Cautions about the adjusted residual:
1.) The adjusted residual is like a z-score for a two-sided test of the difference between the proportion in the cell and the average proportion for all other cells in the column. (It is not the z-score for f_o – f_e.)
2.) An adjusted residual of “z” = 1.96 for one cell does not mean that the whole Chi-squared test is statistically significant at the .05 level. (A Chi-squared test adjusts for the fact that you are doing df t-tests at the same time.)

