Chi-Square (Association between categorical variables)


Chi-Square (Association between categorical variables) Benjamin Kamala kamala8086@gmail.com

Different Scales, Different Measures of Association
Scale of Both Variables     Measure of Association
Nominal scale               Pearson Chi-Square: χ2
Ordinal scale               Spearman’s rho
Interval or ratio scale     Pearson r

Chi-Square (χ2) and Frequency Data
Today the data that we analyze consist of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale.
The test statistic for frequency data is Pearson Chi-Square. The magnitude of Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.

Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic against a tabled/critical value

1. Determine Appropriate Test
Chi Square is used when both variables are measured on a nominal scale.
It can be applied to interval or ratio data that have been categorized into a small number of groups.
It assumes that the observations are randomly sampled from the population.
All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.

2. Establish Level of Significance
α is a predetermined value.
The convention: α = .05, α = .01, α = .001

3. Determine The Hypothesis: Whether There is an Association or Not
Ho: The two variables are independent
Ha: The two variables are associated

When using chi-square, there is some vocabulary we must know:
Hypothesis = a proposed explanation of an observed phenomenon
Observed results = what you can observe during the course of an experiment
Expected results = what you expect to see based on your hypothesis (predictions)

4. Calculating Test Statistics
The test contrasts the observed frequencies in each cell of a contingency table with the expected frequencies.
The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated).
The expected frequency of two unrelated events is the product of the row and column frequencies divided by the number of cases:
$$f_e = \frac{f_r \, f_c}{N}$$
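The expected-frequency rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `expected_frequencies` and the 2 by 2 counts are made up for the example.

```python
def expected_frequencies(observed):
    """Return the expected count for each cell if the variables are unrelated.

    `observed` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # f_e = (row frequency * column frequency) / number of cases
    return [[r * c / n for c in col_totals] for r in row_totals]


# Illustrative counts only (not taken from the presentation).
table = [[20, 30],
         [30, 20]]
expected = expected_frequencies(table)
print(expected)
```

Because both row totals and both column totals are 50 out of 100 cases, every expected cell count here is 50 * 50 / 100 = 25.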

The formula includes:
χ2 = chi-square
(o - e) = observed minus expected [sometimes you may see this represented with a d, which means the difference between the expected and observed results]
e = expected results
o = observed results
Σ = sum of
Our final formula:
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

4. Calculating Test Statistics (Continued)
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where $f_o$ = observed frequencies and $f_e$ = expected frequency.
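As a rough sketch of how the statistic is accumulated cell by cell, the sum can be written as a short Python function. The name `chi_square` and the counts are illustrative assumptions, not material from the slides.

```python
def chi_square(observed, expected):
    """Sum (f_o - f_e)^2 / f_e over every cell of the table."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for f_o, f_e in zip(obs_row, exp_row):
            total += (f_o - f_e) ** 2 / f_e
    return total


# Illustrative counts only (not taken from the presentation).
observed = [[20, 30], [30, 20]]
expected = [[25.0, 25.0], [25.0, 25.0]]
statistic = chi_square(observed, expected)
print(statistic)
```

Each of the four cells differs from its expected count by 5, so each contributes 25 / 25 = 1 to the sum.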

5. Determine Degrees of Freedom
df = (R-1)(C-1)
R = number of levels in row variable
C = number of levels in column variable

6. Compare computed test statistic against a tabled/critical value
The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable.
The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
If the calculated χ2 is greater than the χ2 table value, reject Ho.
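The decision rule can be sketched as a table lookup. The critical values below are the standard tabled values at α = .05 for 1 to 6 degrees of freedom; the function names and the test statistic of 4.0 are hypothetical, chosen only to show that the same statistic can be significant in one table size and not in another.

```python
# Tabled critical values of chi-square at alpha = .05, keyed by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(statistic, n_rows, n_cols):
    """True when the computed chi-square exceeds the tabled value at alpha = .05."""
    df = degrees_of_freedom(n_rows, n_cols)
    return statistic > CRITICAL_05[df]


# Hypothetical statistic of 4.0, judged against two table sizes.
print(reject_null(4.0, 2, 2))  # df = 1, critical value 3.841
print(reject_null(4.0, 2, 3))  # df = 2, critical value 5.991
```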

Example
Suppose a researcher is interested in voting preferences on gun control issues. A questionnaire was developed and sent to a random sample of 90 voters. The researcher also collects information about the political party membership of the sample of 90 respondents.

Bivariate Frequency Table or Contingency Table
            Favor   Neutral   Oppose   f row
D              10        10       30      50
R              15        15       10      40
f column       25        25       40      n = 90
The cell counts are the observed frequencies. The "f row" column holds the row frequencies and the "f column" row holds the column frequencies.

1. Determine Appropriate Test
Party Membership (2 levels), nominal
Voting Preference (3 levels), nominal

2. Establish Level of Significance
Alpha of .05

3. Determine The Hypothesis
Ho: There is no difference between Democrats and Republicans in their opinions on gun control issues.
Ha: There is an association between responses to the gun control survey and party membership in the population.

4. Calculating Test Statistics
             Favor                 Neutral               Oppose                f row
Democrat     fo = 10, fe = 13.9    fo = 10, fe = 13.9    fo = 30, fe = 22.2    50
Republican   fo = 15, fe = 11.1    fo = 15, fe = 11.1    fo = 10, fe = 17.8    40
f column     25                    25                    40                    n = 90
For example, fe for Democrat/Favor = 50*25/90 = 13.9 and fe for Republican/Favor = 40*25/90 = 11.1.

4. Calculating Test Statistics (Continued)
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 11.03$$
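The gun control calculation can be retraced with a short Python sketch. It assumes the cell counts implied by the row and column totals (Democrat: 10, 10, 30; Republican: 15, 15, 10). The variable names are illustrative.

```python
# Observed counts, assuming the cells implied by the row and column totals.
observed = {
    "Democrat":   {"Favor": 10, "Neutral": 10, "Oppose": 30},
    "Republican": {"Favor": 15, "Neutral": 15, "Oppose": 10},
}

opinions = ["Favor", "Neutral", "Oppose"]
row_totals = {party: sum(cells.values()) for party, cells in observed.items()}
col_totals = {o: sum(observed[party][o] for party in observed) for o in opinions}
n = sum(row_totals.values())

chi2 = 0.0
for party in observed:
    for o in opinions:
        f_o = observed[party][o]
        f_e = row_totals[party] * col_totals[o] / n  # e.g. 50 * 25 / 90
        chi2 += (f_o - f_e) ** 2 / f_e

print(n, f"{chi2:.3f}")
```

Under these assumed counts the sum agrees with the reported value of 11.03 to within rounding.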

5. Determine Degrees of Freedom
df = (R-1)(C-1) = (2-1)(3-1) = 2

6. Compare computed test statistic against a tabled/critical value
α = 0.05
df = 2
Critical tabled value = 5.991
The test statistic, 11.03, exceeds the critical value.
The null hypothesis is rejected.
Democrats and Republicans differ significantly in their opinions on gun control issues.
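For the special case df = 2, the chi-square distribution has a simple closed form: the probability of exceeding a value x is exp(-x/2). The sketch below uses that fact to recover the tabled critical value and an approximate p-value for this example. The function names are illustrative, and the shortcut holds only for df = 2.

```python
import math


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2: solve exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square statistic with df = 2."""
    return math.exp(-statistic / 2.0)


critical = critical_value_df2(0.05)
p = p_value_df2(11.03)
print(f"critical = {critical:.3f}, p = {p:.4f}")
```

The p-value falls well below .05, which is consistent with rejecting the null hypothesis.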

SPSS Output for Gun Control Example

Example
You want to look at the association between a certain disease and the sex of the person. You have collected data as shown in the table below.
Complete the table
Calculate the expected values
Calculate the chi square
Calculate the degrees of freedom
Comment on the association at the 95% level of confidence.
Sick Healthy Total
Men 50 150
Women 60 940

The results of the clinical trial, in which the proportions of patients dying who received either treatment A or B were compared, can be presented in a 2 x 2 table as follows:
                 Outcome
Treatment   Died   Survived   Total
A             41        216     257
B             64        180     244
Total        105        396     501

Observed (O)   Expected (E)      O-E   (O-E)2/E
          41          53.86   -12.86       3.07
         216         203.14    12.86       0.81
          64          51.14    12.86       3.23
         180         192.86   -12.86       0.86
         501         501.00     0.00       7.97

Oral Hygiene among 10-year-olds, by type of school
The following data show a sample of 10-year-old children classified according to the state of oral hygiene and type of school attended.
                         Oral Hygiene
Type of School    Good   Fair+   Fair-   Poor   Total
Below average       62     103      57     11     233
Average             50      36      26      7     119
Above average       80      69      18      2     169
Total              192     208     101     20     521

Please calculate the 2 value. Question In the study of the factors affecting the utilization of antenatal clinics found that 64% of the women lived within 10 km of the clinic came for antenatal care, compared to only 47% of those who lived more than 10 km away. This suggests that antenatal care is used more often by women who live close to the clinics. The complete results are presented below: Utilization of Antenatal Clinic by Women Living Far From and Near the Clinics From the table we determine that there seems to be a difference in utilization of antenatal care between those who live close to and those who live far from the clinic. We want to know whether this observable difference is statistically significant. Please calculate the 2 value. Distance from ANC Used ANC Did not use ANC Total Less than 10 km 51 (64%) 29 (36%) 80 (100%) 10 km or more 35 (47%) 40 (53%) 75 (100%) 86 69 155

Additional Information in SPSS Output
Exceptions that might distort χ2 assumptions:
Associations in some but not all categories
Low expected frequency per cell
Extent of association is not the same as statistical significance (demonstrated through an example).
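A check for low expected frequencies can be sketched as follows, using the oral hygiene table from earlier in the deck. The threshold of 5 is a common rule of thumb assumed here for illustration; the slides do not state a cutoff for this check, and the function name is hypothetical.

```python
def low_expected_cells(observed, threshold=5.0):
    """Return (row index, column index, expected count) for cells below the threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            f_e = r * c / n
            if f_e < threshold:
                flagged.append((i, j, f_e))
    return flagged


# Rows: below average, average, above average schools.
# Columns: Good, Fair+, Fair-, Poor oral hygiene.
oral_hygiene = [
    [62, 103, 57, 11],
    [50, 36, 26, 7],
    [80, 69, 18, 2],
]
flagged = low_expected_cells(oral_hygiene)
for i, j, f_e in flagged:
    print(f"row {i}, column {j}: expected {f_e:.2f}")
```

Under this assumed threshold, only the Average/Poor cell is flagged, since its expected count is 119 * 20 / 521.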

Another Example Heparin Lock Placement Time: 1 = 72 hrs 2 = 96 hrs from Polit Text: Table 8-1

Hypotheses in Heparin Lock Placement (Continued)
Ho: There is no association between complication incidence and length of heparin lock placement. (The variables are independent.)
Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related.)

Continued More of SPSS Output

Pearson Chi-Square
Pearson Chi-Square = .250, p = .617
Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
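Two small sketches may help here. The first recovers the reported p-value from the statistic: for df = 1, the upper-tail probability equals erfc(sqrt(χ2 / 2)). The second shows the arithmetic of the usual (Yates) continuity correction, which subtracts 0.5 from each |O - E| before squaring. The function names are illustrative, and Yates' form is assumed to be the correction meant, since the slides do not name it. The clinical trial counts from earlier in the deck are reused only to show the arithmetic, as the heparin lock counts are not given in the transcript.

```python
import math


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square statistic with df = 1."""
    return math.erfc(math.sqrt(statistic / 2.0))


def yates_chi_square(observed):
    """Continuity-corrected chi-square for a 2 by 2 table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i in range(2):
        for j in range(2):
            f_e = row_totals[i] * col_totals[j] / n
            # Shrink each absolute difference by 0.5 before squaring.
            total += (abs(observed[i][j] - f_e) - 0.5) ** 2 / f_e
    return total


p = p_value_df1(0.250)
corrected = yates_chi_square([[41, 216], [64, 180]])
print(f"p = {p:.3f}, corrected chi-square = {corrected:.2f}")
```

The corrected statistic is always somewhat smaller than the uncorrected one, which makes the test more conservative.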

Continued More SPSS Output

Phi Coefficient
Pearson Chi-Square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship.
The phi coefficient is the measure of the strength of the association.
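For a 2 by 2 table, the phi coefficient is commonly computed as the square root of χ2 divided by the number of cases. The sketch below assumes that standard definition, which the transcript does not spell out, and applies it to the clinical trial example (χ2 = 7.97, N = 501). The function name is illustrative.

```python
import math


def phi_coefficient(statistic, n):
    """Phi for a 2 by 2 table: sqrt(chi-square / number of cases)."""
    return math.sqrt(statistic / n)


phi = phi_coefficient(7.97, 501)
print(f"{phi:.3f}")
```

The example shows how a statistically significant χ2 can go with a modest strength of association when the sample is large.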

Cramer’s V
When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V.
If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.
$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
where N = number of cases and k = smallest of number of rows or columns.
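Cramer's V can be sketched from the standard definition V = sqrt(χ2 / (N(k - 1))), where N is the number of cases and k is the smaller of the number of rows and columns. The function name is an illustrative choice, and the example reuses the gun control result (χ2 = 11.03, N = 90, a 2 by 3 table).

```python
import math


def cramers_v(statistic, n, n_rows, n_cols):
    """Cramer's V: sqrt(chi-square / (N * (k - 1))), with k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(statistic / (n * (k - 1)))


v = cramers_v(11.03, 90, 2, 3)
print(f"{v:.3f}")
```

When the smaller dimension is 2, as here, k - 1 equals 1 and V reduces to the same expression as the phi coefficient.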

Take Home Lesson
How to Test Association between Frequency of Two Nominal Variables