Chi-Square (Association between categorical variables)


1 Chi-Square (Association between categorical variables)
Benjamin Kamala

2 Different Scales, Different Measures of Association
Scale of Both Variables      Measure of Association
Nominal scale                Pearson Chi-Square (χ2)
Ordinal scale                Spearman’s rho
Interval or ratio scale      Pearson r

3 Chi-Square (χ2) and Frequency Data
Today the data that we analyze consists of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale. The test statistic for frequency data is Pearson Chi-Square. The magnitude of Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.

4 Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare the computed test statistic against a tabled/critical value

5 1. Determine Appropriate Test
Chi Square is used when both variables are measured on a nominal scale. It can be applied to interval or ratio data that have been categorized into a small number of groups. It assumes that the observations are randomly sampled from the population. All observations are independent (an individual can appear only once in a table and there are no overlapping categories). It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.

6 2. Establish Level of Significance
α is a predetermined value. The conventional levels are α = .05, α = .01 and α = .001.

7 3. Determine The Hypothesis: Whether There is an Association or Not
Ho: The two variables are independent.
Ha: The two variables are associated.

8 When using chi-square, there is some vocabulary we must know:
Hypothesis = a proposed explanation of an observed phenomenon
Observed results = what you can observe during the course of an experiment
Expected results = what you expect to see based on your hypothesis (predictions)

9 4. Calculating Test Statistics
The test contrasts the observed frequency in each cell of a contingency table with the expected frequency. The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e. the nominal variables are unrelated). The expected frequency of a cell for two unrelated variables is the product of its row frequency and column frequency divided by the number of cases:
$f_e = \frac{f_r f_c}{N}$
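As a minimal sketch of the expected-frequency rule above, the following Python computes the expected count for every cell of a contingency table. The function name `expected_frequencies` and the 2 x 2 counts are hypothetical, chosen only for illustration.

```python
def expected_frequencies(table):
    """Return expected cell counts under independence: fe = fr * fc / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]

# Hypothetical observed counts (not taken from the slides).
observed = [[10, 20],
            [30, 40]]
expected = expected_frequencies(observed)
print(expected)
```

Each expected row and column total equals the observed total, which is a quick check that the calculation was done correctly.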

10 The formula includes:
χ2 = chi-square
Σ = sum of
o = observed results
e = expected results
(o - e) = observed minus expected [sometimes you may see this represented with a d, which means the difference between the observed and expected results]
Our final formula: $\chi^2 = \sum \frac{(o - e)^2}{e}$
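The sum over cells of (o - e) squared divided by e can be sketched in a few lines of Python. The helper name `chi_square` and the numbers are hypothetical and serve only to show the arithmetic.

```python
def chi_square(observed, expected):
    """Sum (o - e)^2 / e over all cells, given cell-by-cell lists."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed and expected counts, listed cell by cell.
o = [10, 20, 30, 40]
e = [12.0, 18.0, 28.0, 42.0]
statistic = chi_square(o, e)
print(round(statistic, 3))
```

Because each squared difference is divided by its own expected count, the same absolute discrepancy contributes more in a cell where few cases are expected.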

11 4. Calculating Test Statistics
Continued: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

12 4. Calculating Test Statistics
Continued: in $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, $f_o$ denotes the observed frequencies and $f_e$ the expected frequency of each cell.

13 5. Determine Degrees of Freedom
df = (R-1)(C-1), where R is the number of levels in the row variable and C is the number of levels in the column variable.

14 6. Compare computed test statistic against a tabled/critical value
The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable.
The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
If the calculated χ2 is greater than the χ2 table value, reject Ho.
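The decision rule can be sketched as a table lookup. The dictionary below holds the standard tabled critical values at α = .05 for 1 to 6 degrees of freedom; the names `CRITICAL_05` and `reject_null` and the two input statistics are hypothetical.

```python
# Standard tabled critical values of chi-square at alpha = .05, keyed by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def reject_null(chi2, df, table=CRITICAL_05):
    """Return True when the computed statistic exceeds the tabled critical value."""
    return chi2 > table[df]

# Hypothetical computed statistics.
print(reject_null(6.5, 2))   # 6.5 compared with 5.991
print(reject_null(3.0, 1))   # 3.0 compared with 3.841
```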

15 Example Suppose a researcher is interested in voting preferences on gun control issues. A questionnaire was developed and sent to a random sample of 90 voters. The researcher also collects information about the political party membership of the sample of 90 respondents.

16 Bivariate Frequency Table or Contingency Table
            Favor   Neutral   Oppose   f row
D             10       10       30       50
R             15       15       10       40
f column      25       25       40     n = 90

17 Bivariate Frequency Table or Contingency Table
Observed frequencies: the cell counts in the body of the table (D: 10, 10, 30; R: 15, 15, 10).

18 Bivariate Frequency Table or Contingency Table
Row frequency: the f row totals (D = 50, R = 40).

19 Bivariate Frequency Table or Contingency Table
Column frequency: the f column totals (Favor = 25, Neutral = 25, Oppose = 40; n = 90).

20 1. Determine Appropriate Test
Party Membership (2 levels): nominal
Voting Preference (3 levels): nominal

21 2. Establish Level of Significance
Alpha of .05

22 3. Determine The Hypothesis
Ho: There is no difference between D and R in their opinions on the gun control issue.
Ha: There is an association between responses to the gun control survey and party membership in the population.

23 4. Calculating Test Statistics
            Favor                Neutral              Oppose               f row
D           fo = 10, fe = 13.9   fo = 10, fe = 13.9   fo = 30, fe = 22.2   50
R           fo = 15, fe = 11.1   fo = 15, fe = 11.1   fo = 10, fe = 17.8   40
f column    25                   25                   40                   n = 90

24 4. Calculating Test Statistics
Continued: the expected frequency for the Democrat/Favor cell is fe = 50*25/90 = 13.9.

25 4. Calculating Test Statistics
Continued: the expected frequency for the Republican/Favor cell is fe = 40*25/90 = 11.1.

26 4. Calculating Test Statistics
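Continued:
$\chi^2 = \frac{(10-13.9)^2}{13.9} + \frac{(10-13.9)^2}{13.9} + \frac{(30-22.2)^2}{22.2} + \frac{(15-11.1)^2}{11.1} + \frac{(15-11.1)^2}{11.1} + \frac{(10-17.8)^2}{17.8} = 11.03$
(The value 11.03 is obtained when the expected frequencies are not rounded before squaring.)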
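The whole calculation for the gun control example can be sketched in Python as follows. The cell counts are the ones implied by the row totals, column totals and expected frequencies on the slides; where the transcript omits a cell, the value used here is an inference, not a quotation.

```python
# Cell counts implied by the slide totals and expected frequencies
# (an assumption where the transcript is incomplete).
observed = {
    "Democrat":   [10, 10, 30],   # Favor, Neutral, Oppose
    "Republican": [15, 15, 10],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]            # 50 and 40
col_totals = [sum(c) for c in zip(*rows)]      # 25, 25 and 40
n = sum(row_totals)                            # 90

chi2 = 0.0
for row, fr in zip(rows, row_totals):
    for fo, fc in zip(row, col_totals):
        fe = fr * fc / n                       # expected frequency of the cell
        chi2 += (fo - fe) ** 2 / fe

df = (len(rows) - 1) * (len(col_totals) - 1)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

Keeping the expected frequencies unrounded inside the loop avoids the small drift that appears when 13.9, 11.1, 22.2 and 17.8 are plugged in by hand.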

27 5. Determine Degrees of Freedom
df = (R-1)(C-1) = (2-1)(3-1) = 2

28 6. Compare computed test statistic against a tabled/critical value
α = 0.05
df = 2
Critical tabled value = 5.991
The test statistic, 11.03, exceeds the critical value.
The null hypothesis is rejected.
Democrats and Republicans differ significantly in their opinions on gun control issues.
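For df = 2 specifically, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), so the tabled value and an approximate p value can be checked without a statistics library. This is a sketch that holds only for df = 2; the variable names are hypothetical.

```python
import math

chi2 = 11.03      # test statistic from the slide
alpha = 0.05

# Valid for df = 2 only: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
critical = -2 * math.log(alpha)   # the value x at which exp(-x / 2) equals alpha

print(f"critical value = {critical:.3f}")
print(f"p = {p_value:.4f}")
print("reject Ho" if chi2 > critical else "fail to reject Ho")
```

Comparing the statistic with the critical value and comparing p with α are two views of the same decision.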

29 SPSS Output for Gun Control Example

30 Example
You want to look at the association between a certain disease and the sex of the person. You have collected data as shown in the table below.
Complete the table
Calculate the expected values
Calculate the chi-square
Calculate the degrees of freedom
Comment on the association at the 95% level of confidence.
Sick Healthy Total
Men 50 150
Women 60 940

31 The results of the clinical trial, in which the proportions of patients dying who received either treatment A or B were compared, can be presented in a 2 x 2 table as follows:
                 Outcome
Treatment    Died   Survived   Total
A              41      216       257
B              64      180       244
Total         105      396       501

32
Observed (O)   Expected (E)     O-E     (O-E)2/E
     41            53.86      -12.86      3.07
    216           203.14       12.86      0.81
     64            51.14       12.86      3.23
    180           192.86      -12.86      0.86
    501           501.00        0.00      7.97
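The columns of this working table can be reproduced with a short Python sketch. The variable names are hypothetical. Summing the unrounded terms gives roughly 7.98; the 7.97 on the slide comes from adding the rounded per-cell values.

```python
# Observed counts from the 2 x 2 table: rows are treatments A and B,
# columns are Died and Survived.
observed = [[41, 216],
            [64, 180]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

contributions = []
for row, fr in zip(observed, row_totals):
    for o, fc in zip(row, col_totals):
        e = fr * fc / n                              # expected count
        contributions.append((o, e, o - e, (o - e) ** 2 / e))

for o, e, diff, term in contributions:
    print(f"{o:>4} {e:>8.2f} {diff:>8.2f} {term:>6.2f}")

chi2 = sum(term for _, _, _, term in contributions)
print(f"chi-square = {chi2:.2f}")
```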

33 Oral Hygiene among 10-year-olds, by type of school
The following data show a sample of 10-year-old children classified according to the state of oral hygiene and type of school attended.
                          Oral Hygiene
Type of School    Good   Fair+   Fair-   Poor   Total
Below average       62     103      57     11     233
Average             50      36      26      7     119
Above average       80      69      18      2     169
Total              192     208     101     20     521

34 Question
A study of the factors affecting the utilization of antenatal clinics found that 64% of the women who lived within 10 km of the clinic came for antenatal care, compared to only 47% of those who lived more than 10 km away. This suggests that antenatal care is used more often by women who live close to the clinics. The complete results are presented below:
Utilization of Antenatal Clinic by Women Living Far From and Near the Clinics
Distance from ANC    Used ANC    Did not use ANC    Total
Less than 10 km      51 (64%)    29 (36%)           80 (100%)
10 km or more        35 (47%)    40 (53%)           75 (100%)
Total                86          69                 155
From the table, there seems to be a difference in utilization of antenatal care between those who live close to and those who live far from the clinic. We want to know whether this observed difference is statistically significant. Please calculate the χ2 value.
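One way to approach the calculation, sketched here under the assumption that no continuity correction is wanted, is the 2 x 2 shortcut N(ad - bc)² / (row totals × column totals), which is algebraically the same as summing (O - E)²/E over the four cells. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]], uncorrected."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Counts from the antenatal clinic table: used / did not use, near / far.
chi2 = chi_square_2x2(51, 29, 35, 40)
print(round(chi2, 2))
```

With df = (2-1)(2-1) = 1, the result is compared with the tabled critical value for one degree of freedom (3.841 at α = .05).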

35 Additional Information in SPSS Output
Exceptions that might distort χ2 assumptions:
  Associations in some but not all categories
  Low expected frequency per cell
Extent of association is not the same as statistical significance:
  Demonstrated through an example

36 Another Example Heparin Lock Placement
Time: 1 = 72 hrs, 2 = 96 hrs (from Polit text, Table 8-1)

37 Hypotheses in Heparin Lock Placement
Continued
Ho: There is no association between complication incidence and length of heparin lock placement. (The variables are independent.)
Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related.)

38 Continued More of SPSS Output

39 Pearson Chi-Square
Pearson Chi-Square = .250, p = .617
Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
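A continuity correction for a 2 by 2 table is commonly applied by subtracting 0.5 from each absolute difference |O - E| before squaring (Yates' correction). The sketch below assumes that form of the correction; the function name and the counts are hypothetical and are not the heparin lock data, which the transcript does not include.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square for a 2 x 2 table; optionally subtract 0.5 from each |O - E|."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for row, fr in zip(table, row_totals):
        for o, fc in zip(row, col_totals):
            e = fr * fc / n
            diff = abs(o - e)
            if yates:
                diff = max(diff - 0.5, 0.0)   # never let the correction go negative
            total += diff ** 2 / e
    return total

# Hypothetical counts with two expected frequencies below 10.
table = [[8, 12],
         [5, 15]]
uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(round(uncorrected, 3), round(corrected, 3))
```

The corrected statistic is always the smaller of the two, so the correction makes the test more conservative.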

40 Continued More SPSS Output

41 Phi Coefficient
Pearson Chi-Square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship.
The phi coefficient is a measure of the strength of the association.
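For a 2 by 2 table, phi is commonly computed as the square root of chi-square divided by the number of cases. The sketch below assumes that definition, since the transcript does not show the formula. The function name is hypothetical, and the inputs are the chi-square and total from the clinical trial example (7.97 and 501).

```python
import math

def phi_coefficient(chi2, n):
    """Phi for a 2 x 2 table: sqrt(chi-square / number of cases)."""
    return math.sqrt(chi2 / n)

# Values from the treatment A versus B example: chi-square = 7.97, N = 501.
phi = phi_coefficient(7.97, 501)
print(round(phi, 3))
```

A statistically significant chi-square can still correspond to a small phi when the sample is large, which is why the two are reported separately.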

42 Cramer’s V
When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V. If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.

43 Cramer’s V
Continued: $V = \sqrt{\frac{\chi^2}{N(k-1)}}$, where N is the number of cases and k is the smallest of the number of rows or columns.
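Cramer's V is usually computed as the square root of chi-square divided by N(k - 1), where N is the number of cases and k is the smaller of the number of rows and columns. The sketch below assumes that definition and applies it to the gun control example (chi-square = 11.03, N = 90, a 2 by 3 table); the function name is hypothetical.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V: sqrt(chi-square / (N * (k - 1))), k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))

# Gun control example: 2 party levels by 3 opinion levels, 90 respondents.
v = cramers_v(11.03, 90, 2, 3)
print(round(v, 2))
```

For a 2 by 2 table k - 1 equals 1, so V reduces to the phi coefficient.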

44 Take Home Lesson
How to test the association between the frequencies of two nominal variables

