Sociology 601 Class 12: October 8, 2009. The Chi-Squared Test (8.2): expected frequencies, calculating Chi-square, finding p. When (not) to use Chi-squared.

Presentation transcript:

Sociology 601 Class 12: October 8, 2009
The Chi-Squared Test (8.2)
– expected frequencies
– calculating Chi-square
– finding p
When (not) to use Chi-squared tests (8.3)
– chi-squared residuals

8.2 Chi-squared statistical significance test for contingency tables
                      support tax reform?
                      Yes    No    Tot
support        Yes    150   100    250
environment?   No     200    50    250
               Tot    350   150    500
“Is the level of support for the environment independent of the level of support for tax reform?”
– If not, these two measures may have some causal link worth investigating.
– Q: which causes which?

2x2 table: a t-test for proportions
With a 2x2 table, we can use a t-test for independent-sample proportions (review 7.2).
[Stata output of prtesti, a two-sample test of proportion with x: Number of obs = 250, testing Ho: diff = prop(x) - prop(y) = 0; the numeric results are not recoverable.]

Moving beyond 2x2 tables
Comparing conditional probabilities is fine when there are only two comparisons and two possible outcomes for each comparison.
The Chi-Square (χ²) test is a new, more flexible technique for making comparisons.
The χ² test starts from a null hypothesis that every cell should have the frequency you would expect if the variables were independently distributed.
f_e is the expected count for each cell.
– f_e = row total * column total / table total (A&F 254)
– f_e = total N * unconditional row probability * unconditional column probability
– f_e = column N * unconditional row probability
A test for the whole table will combine tests for f_e for every cell.

Calculating expected cell counts
The expected cell count is the count we would expect in a cell if
– environmental support among tax reform advocates and among tax reform opponents were identical, or if
– environmental support among tax reform advocates were the same as environmental support among the whole sample, or if
– tax reform support among environmentalists were the same as among non-environmentalists.
50% of the sample supports environmental spending, so
– f_e(1,1) = .5 * 350 = 175
– f_e(1,1) = 250 * 350 / 500 = 175 (A&F)
– f_e(1,1) = 500 * (350/500) [taxes] * (250/500) [environment] = 175
f_e(1,2) = 75, f_e(2,1) = 175, f_e(2,2) = 75
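The row-total-times-column-total rule can be checked with a short Python sketch. The function name `expected_counts` is made up for illustration, and the observed counts are the ones used in the slides' worked calculation (the 200 in cell (2,1) is inferred from the column total of 350).

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: support environment (yes, no); columns: support tax reform (yes, no).
observed = [[150, 100], [200, 50]]
expected = expected_counts(observed)
print(expected)  # [[175.0, 75.0], [175.0, 75.0]]
```

Both rows have the same expected counts because both row totals are 250.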

Testing independence of support for tax reform and environmental spending
New approach: Chi-squared test for independence of attitudes toward taxes and the environment.
Test statistic:
– χ² = Σ((f_o - f_e)² / f_e)
– where f_o is the observed count in each cell
– and where f_e is the expected count for each cell, assuming that attitudes toward taxes will be the same for people who support environmental issues as for people who do not.

Assumptions and hypothesis for a chi-squared test
Assumptions:
– two categorical variables (for this course)
– random sample or stratified random sample
– f_e ≥ 5 for all cells
Hypothesis:
Ho: the two variables are statistically independent.
– this means that the distribution of each variable is independent of the score of the other variable

Using expected cell counts to calculate a Chi-squared test statistic
The test statistic is analogous to a t-statistic…
– but the form of the equation makes it difficult to see that the χ² statistic is a difference between the observed and expected values, divided by an estimate of the typical variation we would expect from random sampling error.
Test statistic:
– χ² = Σ((f_o - f_e)² / f_e)
  = (150 - 175)²/175 + (100 - 75)²/75 + (200 - 175)²/175 + (50 - 75)²/75
  = 3.57 + 8.33 + 3.57 + 8.33
  = 23.81
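As a check on the arithmetic, here is a minimal Python sketch of the same sum. The helper name `chi_squared` is an assumption for illustration; the counts are those of the tax reform and environment example, with 200 in cell (2,1) inferred from the column total of 350.

```python
def chi_squared(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[150, 100], [200, 50]]
expected = [[175, 75], [175, 75]]
stat = chi_squared(observed, expected)
print(round(stat, 2))  # 23.81
```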

Degrees of freedom for a Chi-squared statistic
We now have a test statistic: χ² = 23.81. How do we assign a p-value to this?
Step 1: calculate the degrees of freedom.
– Given the row and column marginal totals, how many cells do we need to fill in before we can do the rest automatically?
– Answer: 1 in this case, so df = 1.
– General answer: df = (r-1)*(c-1), where r is the number of rows and c is the number of columns.

p-value for a Chi-squared statistic
Assign a p-value to the statistic: χ² = 23.81, df = 1
Given the degrees of freedom, look up the p-value.
– Go to Table C on page 670.
– Go down to the row for df = 1.
– Move across χ² values to the largest tabled value that is smaller than the measured χ².
– Look up the corresponding p-value at the top of the column: p < .001
– The chi-squared test is always a 1-tailed test: we always use the right tail of the distribution.
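The table lookup can also be approximated in code. The sketch below is only an illustration and covers df = 1 alone, where a chi-squared variable is a squared standard normal, so the right-tail area equals erfc(sqrt(χ²/2)). The function names are hypothetical; only Python's standard `math` module is used.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1) * (c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def p_value_df1(chi2):
    """Right-tail area of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

df = degrees_of_freedom(2, 2)
p = p_value_df1(23.81)
print(df, p < 0.001)  # 1 True
```

For larger tables a general chi-squared tail function is needed, which this sketch does not attempt.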

Do your own chi-squared test
You watch 50 beachcombers to see if they are wearing sandals and if they are wearing shorts.
                 wearing shorts?
                 Yes    No    Tot
sandals?   Yes
           No
           Tot
Q: Does a beachcomber’s chance of wearing sandals depend on whether they are wearing shorts?

Chi-Squared Tests for tables larger than 2X2
Here is a command to run a chi-squared test on the gender and partyid data from the 1996 GSS (cf. 8.1).
. tab partyid3 sex, chi2
            |  respondents sex
   partyid3 |    male    female |    Total
------------+-------------------+---------
   Democrat |                   |      977
Independent |                   |    1,071
 Republican |                   |      807
------------+-------------------+---------
      Total |   1,264     1,591 |    2,855
Pearson chi2(2) =          Pr =

Add expected cell counts
. tab partyid3 sex, chi2 exp
| Key                |
| frequency          |
| expected frequency |
            |  respondents sex
   partyid3 |    male    female |    Total
------------+-------------------+---------
   Democrat |                   |      977
            |                   |    977.0
Independent |                   |    1,071
            |                   |  1,071.0
 Republican |                   |      807
            |                   |    807.0
------------+-------------------+---------
      Total |   1,264     1,591 |    2,855
            | 1,264.0   1,591.0 |  2,855.0
Pearson chi2(2) =          Pr =
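The expected frequencies in this table depend only on the marginal totals, so they can be computed from the totals shown in the output. The Python sketch below does that; the variable names are illustrative, and this is not how Stata itself computes them.

```python
# Marginal totals from the 1996 GSS party identification by sex table.
row_totals = {"Democrat": 977, "Independent": 1071, "Republican": 807}
col_totals = {"male": 1264, "female": 1591}
n = sum(row_totals.values())

# Expected count for each cell: row total * column total / table total.
expected = {
    party: {sex: r * c / n for sex, c in col_totals.items()}
    for party, r in row_totals.items()
}

for party, cells in expected.items():
    print(party, {sex: round(value, 1) for sex, value in cells.items()})
```

Every expected count here is far above 5, so the table meets the chi-squared test's cell-size assumption.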

8.3 When not to do a chi-squared test
1.) Do not do a Chi-squared test when the expected value of a cell is less than 5.
The Problem: The total χ² is 6.28, so p < .05, but 4.5 of the total comes from one cell with f_e = 2.
(It is okay to do a Chi-squared test if a cell has an expected value above 5 and an observed value below 5!)
Observed counts, with expected counts in parentheses:
age      Party identification
         Democrat   Indep.   Republican   Total
<65      42 (40)    5 (8)    33 (32)         80
≥65       8 (10)    5 (2)     7 (8)          20
total    50         10       40             100
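To see how much one sparse cell drives the result, this Python sketch lists each cell's contribution to χ² for the age by party table. The observed Democrat count in the second age row (8) is inferred from that row's total of 20; the variable names are illustrative.

```python
# Rows: under 65, 65 and over; columns: Democrat, Independent, Republican.
observed = [[42, 5, 33], [8, 5, 7]]
expected = [[40, 8, 32], [10, 2, 8]]

# Each cell contributes (f_o - f_e)^2 / f_e to the total statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
total = sum(sum(row) for row in contributions)

# Zero-based (row, column) positions of cells that violate f_e >= 5.
small_cells = [
    (i, j)
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]

print(round(total, 2), contributions[1][1], small_cells)  # 6.28 4.5 [(1, 1)]
```

Without that one cell the remaining contributions add up to about 1.78, nowhere near a significant result.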

A small sample alternative to a chi-squared test
When the sample size is too small for a chi-squared test, you may treat the contingency table as a small sample comparison of two population proportions.
– This means you should do a Fisher’s exact test for population proportions.
– A Fisher’s exact test will also work on large samples, but it can bog down the computer with lengthy computations. (This is especially likely when the tables are 5X4 or larger.)
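For intuition about what the exact test computes, here is a minimal Python sketch for the 2x2 case only. It assumes one common two-sided convention: add up the hypergeometric probabilities of every table, with the margins held fixed, that is no more likely than the observed one. The function name and the example counts are hypothetical and do not come from the lecture data.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so ties with the observed table are counted.
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical small sample: 3 of 4 in one group versus 1 of 4 in the other.
p = fisher_exact_2x2(3, 1, 1, 3)
print(round(p, 3))  # 0.486
```

The number of tables to enumerate grows quickly with table size and N, which is why large tables can be slow.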

Fisher’s exact test in STATA (not necessary in this case because of large N)
. tab partyid3 sex, chi exact
Enumerating sample-space combinations:
stage 3: enumerations = 1
stage 2: enumerations = 158
stage 1: enumerations = 0
            |  respondents sex
   partyid3 |    male    female |    Total
------------+-------------------+---------
   Democrat |                   |      977
Independent |                   |    1,071
 Republican |                   |      807
------------+-------------------+---------
      Total |   1,264     1,591 |    2,855
Pearson chi2(2) =          Pr =
Fisher's exact =

When not to do a chi-squared test (#2)
2.) Do not do a Chi-squared test for cell values that are not observed frequencies.
The Problem: If you use percentages, you misstate the sample size as 100.
sex      Voted in last election?
         Yes    No     Total
women    35%    15%     50%
men      20%    30%     50%
total    55%    45%    100%
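A short Python sketch shows why this matters: the same percentages give very different χ² values depending on the real sample size, and feeding the percentages in directly silently assumes N = 100. The sample sizes 20 and 1000 are hypothetical, chosen only to show the scaling.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (cell - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for cell, c in zip(row, col_totals)
    )

# Percentages from the voting table: women (35, 15), men (20, 30).
percentages = [[35, 15], [20, 30]]

for true_n in (20, 100, 1000):  # hypothetical true sample sizes
    counts = [[pct * true_n / 100 for pct in row] for row in percentages]
    print(true_n, round(chi_squared(counts), 2))
```

The statistic is proportional to N, so the test needs the actual counts.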

When not to do a chi-squared test (#3)
3.) Do not do a Chi-squared test to find a difference in population proportions for dependent samples.
The Problem: You want to know if the speech changed people’s opinions. A χ² test would tell you if opinions after the speech depend on opinions before the speech.
Number supporting death penalty:
                  After hearing speech:
Before speech:    Yes    No    Total
Yes
No
total

Residual Analysis for Chi-Squared Tests
This part of section 8.3 will not be on the exam. I don’t use this stuff, but Agresti covers it, and you should have a reference in case a referee ever asks for it.
The problem: If a Chi-squared test produces a statistically significant result, we only know that somewhere in the table the data depart from what independence predicts.
To find the level of statistical significance associated with a single cell value, we conduct a residual analysis.

Residual Analysis for Chi-Squared Tests
Terms:
– residual: (f_o - f_e). The difference between an observed and an expected cell frequency.
– adjusted residual: The standardized difference between an observed and an expected cell frequency. (Like a z-score for cells in independence tests.)

Residual Analysis for Chi-Squared Tests
Example: adjusted residual for cell (1,1)
= (279 - 261.4) / sqrt((261.4)(1 - 444/980)(1 - 577/980)) ≈ 2.3 (treat as a z-score, so p = .011)
Observed counts, with expected counts in parentheses:
sex       Party identification
          Democrat      Indep.       Republican    Total
female    279 (261.4)   73 (70.65)   225 (244.9)     577
male      165 (182.6)   47 (49.35)   191 (171.1)     403
total     444           120          416             980
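The adjusted residual formula in this example can be sketched in Python as follows. The function name is hypothetical; the inputs are the observed count for female Democrats (279), that row's total (577), that column's total (444), and the table total (980).

```python
import math

def adjusted_residual(observed, row_total, col_total, n):
    """(f_o - f_e) / sqrt(f_e * (1 - row proportion) * (1 - column proportion))."""
    expected = row_total * col_total / n
    standard_error = math.sqrt(
        expected * (1 - row_total / n) * (1 - col_total / n)
    )
    return (observed - expected) / standard_error

z = adjusted_residual(observed=279, row_total=577, col_total=444, n=980)
print(round(z, 2))
```

The result differs slightly from a hand calculation that starts from the rounded expected count 261.4, because the code keeps the unrounded value.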

Residual Analysis for Chi-Squared Tests
Cautions about the adjusted residual:
1.) The adjusted residual is like a z-score for a two-sided test of the difference between the proportion in the cell and the average proportion for all other cells in the column (not the z-score for f_o - f_e).
2.) An adjusted residual of “z” = 1.96 for one cell does not mean that the whole Chi-squared test is statistically significant at the .05 level. (A Chi-squared test adjusts for the fact that you are doing df t-tests at the same time.)