1 Applied Statistics Using SAS and SPSS Topic: Chi-square tests By Prof Kelly Fan, Cal. State Univ., East Bay
2 Outline
ALL variables must be categorical.
Goal one: verify a distribution of Y
  One-sample Chi-square test (SPSS lesson 40; SAS handout)
Goal two: test the independence between two categorical variables
  Chi-square test for two-way contingency table (SPSS lesson 41; SAS section 3.G)
  McNemar’s test for paired data (SPSS lesson 44; SAS section 3.L)
  Measure the dependence (Phi and Kappa coefficients) (SPSS lesson 41, 44; SAS section 3.G, 3.M)
3 Example: Postpartum Depression Study Are women equally likely to show an increase, no change, or a decrease in depression as a function of childbirth? Are the proportions associated with a decrease, no change, and an increase in depression from before to after childbirth the same?
4 Example: Postpartum Depression Study
Depression after birth in comparison with before birth, from a random sample of 60 women:
Category                              Observed frequency   Hypothesized proportion   Expected frequency
Less depressed (-1)                   14                   1/3                       20
Neither less nor more depressed (0)   33                   1/3                       20
More depressed (1)                    13                   1/3                       20
5 One-sample Chi-Square Test Must be a random sample The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories
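6 One-sample Chi-Square Test Test statistic: $\chi^2 = \sum_{i=1}^{k} (O_i - e_i)^2 / e_i$, where $O_i$ = the observed frequency of the i-th category, $e_i$ = the expected frequency of the i-th category, and $k$ = the number of categories. Under Ho, the statistic is approximately chi-square distributed with $k - 1$ degrees of freedom.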
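As a rough check of the test statistic on this slide, the Python sketch below computes it by hand for the postpartum counts from slide 4 (observed 14, 33, 13; hypothesized proportions 1/3 each). The function name `chi_square_gof` is illustrative only and is not part of SAS or SPSS. The p-value line uses the closed form exp(-x/2), which is valid only when there are 2 degrees of freedom.

```python
import math

def chi_square_gof(observed, proportions):
    """Return (statistic, degrees of freedom) for a one-sample chi-square test."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    # Sum of (observed - expected)^2 / expected over all categories
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Counts from the postpartum depression example (60 women)
observed = [14, 33, 13]
stat, df = chi_square_gof(observed, [1 / 3, 1 / 3, 1 / 3])

# Upper-tail chi-square probability; this shortcut holds only for df == 2
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

A p-value below 0.05 here would point toward rejecting Ho, but the SPSS or SAS output remains the reference for the course.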
7 SPSS Output 1. Weight your data by count first. 2. Analyze >> Nonparametric Tests >> Legacy Dialogs >> Chi-square, count as test variable.
8 Conclusion Reject Ho. The proportions associated with a decrease, no change, and an increase in depression from before to after childbirth differ significantly from 1/3, 1/3, 1/3.
9 Example: Postpartum Depression Study Are the proportions associated with a change and no change from before to after childbirth the same?
10 Example: Postpartum Depression Study
Depression after birth in comparison with before birth, from a random sample of 60 women:
Category                         Observed frequency   Hypothesized proportion   Expected frequency
Same amount of depression (0)    33                   1/2                       30
More or less depressed (1)       27                   1/2                       30
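For the two-category version of the example (33 women with no change, 27 with a change, 30 expected in each group), the same statistic can be worked out by hand. This is only a sketch: with 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x/2)), which is what the p-value line relies on.

```python
import math

observed = [33, 27]      # no change, change (counts from this slide)
expected = [30.0, 30.0]  # 60 women * 1/2 for each category

# Sum of (observed - expected)^2 / expected over the two categories
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail chi-square probability; this identity holds only for df == 1
p_value = math.erfc(math.sqrt(stat / 2))
print(f"chi-square = {stat:.2f}, df = 1, p = {p_value:.4f}")
```

A p-value well above 0.05 would mean the data give no evidence that the two proportions differ from 1/2.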
11 SPSS Output
12 Two-way Contingency Tables Report frequencies on two variables Such tables are also called crosstabs.
13 Contingency Tables (Crosstabs) 1991 General Social Survey. Frequency table of Race (rows: White, Black) by Party Identification (columns: Democrat, Independent, Republican).
14 Crosstabs Analysis (Two-way Chi-square test) Chi-square test for testing the independence between two variables. Independence means: 1. For a fixed column, the distribution of frequencies over rows stays the same regardless of the column. 2. For a fixed row, the distribution of frequencies over columns stays the same regardless of the row.
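To make the independence idea concrete, the sketch below builds the expected count for each cell as (row total x column total) / n and sums the squared deviations. The 2x3 counts are hypothetical, chosen only for illustration, and `chi_square_independence` is an illustrative name rather than a SAS or SPSS procedure. The p-value shortcut exp(-x/2) applies only because this table has 2 degrees of freedom.

```python
import math

def chi_square_independence(table):
    """Return (statistic, df) for the two-way chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were independent
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2x3 counts (NOT the survey data from the slides)
table = [[30, 10, 20],
         [20, 10, 10]]
stat, df = chi_square_independence(table)

# Upper-tail chi-square probability; this shortcut holds only for df == 2
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```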
15 Measure of dependence for 2x2 tables The phi coefficient measures the association between two categorical variables. -1 <= phi <= 1. | phi | indicates the strength of the association. If the two variables are both ordinal, then the sign of phi indicates the direction of association.
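One common way to compute phi for a 2x2 table with cells a, b (first row) and c, d (second row) is (ad - bc) divided by the square root of the product of the four marginal totals. The sketch below assumes that definition; the counts are hypothetical and the function name is illustrative.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    # Product of the two row totals and the two column totals
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical counts: most observations fall on the main diagonal
phi_positive = phi_coefficient(30, 10, 10, 30)
# Swapping the columns flips the sign but not the strength
phi_negative = phi_coefficient(10, 30, 30, 10)
print(phi_positive, phi_negative)
```

The two calls illustrate that the magnitude carries the strength of the association while the sign depends only on how the categories are ordered.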
16 SPSS Output P. 332 –
17 SAS Output
Statistic                     DF   Value   Prob
Chi-Square                                 <.0001
Likelihood Ratio Chi-Square                <.0001
Mantel-Haenszel Chi-Square                 <.0001
Phi Coefficient
Contingency Coefficient
Cramer's V
Sample Size = 980
18 Measure of dependence for non-2x2 tables Cramer's V ranges from 0 to 1. V may be viewed as the association between two variables as a percentage of their maximum possible variation. V = phi for 2x2, 2x3 and 3x2 tables.
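Cramer's V is usually defined as sqrt(chi-square / (n x min(r - 1, c - 1))) for an r x c table. The sketch below assumes that definition and uses hypothetical counts; the helper names are illustrative. When the smaller dimension is 2, the divisor is just n, which is why V matches the magnitude of phi for those tables.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(pearson_chi_square(table) / (n * k))

# Hypothetical 3x3 table with perfect association: V reaches its maximum
v_perfect = cramers_v([[10, 0, 0], [0, 10, 0], [0, 0, 10]])
# Hypothetical 2x3 table with weak association
v_weak = cramers_v([[30, 10, 20], [20, 10, 10]])
print(v_perfect, v_weak)
```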
19 Fisher’s Exact Test for Independence The Chi-square tests are ONLY for large samples: the sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories. When this condition fails, use Fisher’s exact test instead.
20 SAS/SPSS Output SAS output: Fisher's Exact Test: Table Probability (P) = 3.823E-22; Pr <= P = 2.787E-20. SPSS output: in the “crosstabs” window, click “exact”, then tick “exact”.
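The two SAS quantities can be illustrated for a 2x2 table with a short sketch. With all margins held fixed, the top-left cell follows a hypergeometric distribution; the "table probability" is the probability of the observed table, and the two-sided p-value adds up every table that is no more likely than the observed one. The counts below are hypothetical, the function name is illustrative, and this is not a claim about how SAS or SPSS computes the result internally.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Return (table probability, two-sided p-value) for [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Add up all tables at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties)
    p_value = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return p_obs, min(p_value, 1.0)

# Hypothetical small-sample table
table_prob, p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"table probability = {table_prob:.4f}, Pr <= P = {p_value:.4f}")
```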
21 Matched-pair Data Comparing categorical responses for two “paired” samples. This applies when either each sample has the same subjects (that is, subjects are measured twice), or a natural pairing exists between each subject in one sample and a subject from the other sample (e.g., twins).
22 Example: Rating for Prime Minister
                 Second Survey
First Survey     Approve   Disapprove
Approve
Disapprove       86        570
23 Marginal Homogeneity The probabilities of “success” for both samples are identical. E.g., the probability of approval is the same at the first and second surveys.
24 McNemar Test (for 2x2 Tables only) SAS: Section 3.L; SPSS: Lesson 44. Ho: marginal homogeneity. Ha: no marginal homogeneity. Exact p-value. Approximate p-value (when $n_{12} + n_{21} > 10$).
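McNemar's test uses only the two discordant cells, n12 and n21. A minimal sketch, assuming the usual statistic (n12 - n21)^2 / (n12 + n21) with 1 degree of freedom for the approximate p-value and a two-sided binomial test with success probability 1/2 for the exact p-value. The counts are hypothetical and the function name is illustrative.

```python
import math

def mcnemar(n12, n21):
    """Return (statistic, approximate p-value, exact p-value)."""
    n = n12 + n21
    stat = (n12 - n21) ** 2 / n
    # Chi-square upper tail with 1 degree of freedom
    p_approx = math.erfc(math.sqrt(stat / 2))
    # Exact: under Ho, n12 ~ Binomial(n, 1/2); double the smaller tail
    tail = sum(math.comb(n, k) for k in range(min(n12, n21) + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return stat, p_approx, p_exact

# Hypothetical discordant counts (NOT the prime minister survey data)
stat, p_approx, p_exact = mcnemar(15, 5)
print(f"S = {stat:.2f}, approximate p = {p_approx:.4f}, exact p = {p_exact:.4f}")
```

The concordant cells never enter the calculation, since pairs that gave the same answer twice carry no information about a change in the marginal proportions.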
25 SAS Output
McNemar's Test
Statistic (S)
DF                      1
Asymptotic Pr > S       <.0001
Exact Pr >= S           3.716E-05
Simple Kappa Coefficient (level of agreement)
Kappa
ASE
% Lower Conf Limit
% Upper Conf Limit
Sample Size = 1600
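The simple kappa coefficient reported by SAS measures agreement beyond chance. A minimal sketch, assuming the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion on the diagonal and p_e is the proportion expected from the margins alone. The counts are hypothetical and the function name is illustrative.

```python
def simple_kappa(table):
    """Kappa for a square table of paired ratings."""
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of pairs on the main diagonal
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the marginal totals
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical paired ratings: 80 of 100 pairs agree
kappa = simple_kappa([[40, 10], [10, 40]])
print(f"kappa = {kappa:.2f}")
```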
26 SPSS Output SPSS: p. 361. In the “two-samples tests” window, tick McNemar, click “exact”, then tick “exact”.