1 Chi-Square Heibatollah Baghi, and Mastee Badii.

2 Different Scales, Different Measures of Association

Scale of Both Variables     Measure of Association
Nominal scale               Pearson chi-square (χ²)
Ordinal scale               Spearman's rho
Interval or ratio scale     Pearson r

3 Chi-Square (χ²) and Frequency Data Up to this point, inference to the population has been concerned with "scores" on one or more variables, such as CAT scores, mathematics achievement, and hours spent on the computer. We used these scores to make inferences about population means. To be sure, not all research questions involve score data. Today the data we analyze consist of frequencies, that is, the number of individuals falling into each category. In other words, the variables are measured on a nominal scale. The test statistic for frequency data is the Pearson chi-square. Its magnitude reflects the amount of discrepancy between the observed frequencies and the expected frequencies.

4 Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare the computed test statistic against a tabled/critical value

5 1. Determine Appropriate Test Chi-square is used when both variables are measured on a nominal scale. It can also be applied to interval or ratio data that have been categorized into a small number of groups. It assumes that the observations are randomly sampled from the population and that all observations are independent (an individual can appear only once in a table, and there are no overlapping categories). It makes no assumptions about the shape of the distribution or about the homogeneity of variances.

6 2. Establish Level of Significance α is a predetermined value. By convention, α = .05, α = .01, or α = .001.

7 3. Determine the Hypothesis: Whether There Is an Association or Not
H₀: The two variables are independent.
Hₐ: The two variables are associated.

8 4. Calculating Test Statistics The chi-square test contrasts the observed frequencies in each cell of a contingency table with the expected frequencies. The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated). The expected frequency for a cell is the product of its row frequency and its column frequency divided by the number of cases: $f_e = \frac{f_r f_c}{N}$
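The expected-frequency rule can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` and the 2 by 2 table of counts are hypothetical and do not come from the slides.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: f_e = f_r * f_c / N."""
    row_totals = [sum(row) for row in observed]        # f_r for each row
    col_totals = [sum(col) for col in zip(*observed)]  # f_c for each column
    n = sum(row_totals)                                # N, total number of cases
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 table of counts (not from the slides).
table = [[20, 30],
         [30, 20]]
expected = expected_frequencies(table)
print(expected)  # every cell is 50 * 50 / 100 = 25.0
```

Each row and column total is 50 and N is 100, so every expected count is 25.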

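9 4. Calculating Test Statistics $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

10 4. Calculating Test Statistics In $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, $f_o$ is the observed frequency and $f_e$ is the expected frequency of a cell; the sum runs over all cells of the table.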
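As a rough sketch, the Pearson chi-square statistic sums (observed - expected)² / expected over every cell of the table. The function name `pearson_chi_square` and the counts below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def pearson_chi_square(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all cells of the table."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for f_o, f_e in zip(obs_row, exp_row):
            total += (f_o - f_e) ** 2 / f_e
    return total

# Hypothetical observed counts and their expected counts under independence.
observed = [[20, 30], [30, 20]]
expected = [[25.0, 25.0], [25.0, 25.0]]
chi2 = pearson_chi_square(observed, expected)
print(chi2)  # each cell contributes 25 / 25 = 1, so the total is 4.0
```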

11 5. Determine Degrees of Freedom df = (R - 1)(C - 1), where R is the number of levels in the row variable and C is the number of levels in the column variable.

12 6. Compare computed test statistic against a tabled/critical value The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable under the null hypothesis. The critical tabled values are based on the sampling distribution of the Pearson chi-square statistic. If the calculated χ² is greater than the tabled χ² value, reject H₀.
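The comparison step might look like the following sketch. The small lookup table holds standard tabled critical values at α = .05 for df 1 to 4. The names `CRITICAL_05`, `degrees_of_freedom`, and `reject_null`, and the test statistic of 4.0, are illustrative assumptions.

```python
# Standard tabled critical values of chi-square at alpha = .05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def reject_null(chi2, n_rows, n_cols):
    """Reject H0 when the computed statistic exceeds the critical value."""
    df = degrees_of_freedom(n_rows, n_cols)
    return chi2 > CRITICAL_05[df]

# Hypothetical statistic of 4.0 evaluated for two table shapes.
print(reject_null(4.0, 2, 2))  # df = 1: 4.0 > 3.841, so True
print(reject_null(4.0, 2, 3))  # df = 2: 4.0 < 5.991, so False
```

The same statistic can be significant in a 2 by 2 table and not in a 2 by 3 table, because the critical value grows with the degrees of freedom.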

13 Example Suppose a researcher is interested in voting preferences on gun control issues. A questionnaire is developed and sent to a random sample of 90 voters. The researcher also collects information about the political party membership of the 90 respondents.

14 Bivariate Frequency Table or Contingency Table

              Favor   Neutral   Oppose   f row
Democrat        10       10       30       50
Republican      15       15       10       40
f column        25       25       40     n = 90

15 Bivariate Frequency Table or Contingency Table Observed frequencies: the six cell counts in the body of the table (10, 10, 30 for Democrats; 15, 15, 10 for Republicans).

16 Bivariate Frequency Table or Contingency Table Row frequencies: the row totals (50 Democrats, 40 Republicans).

17 Bivariate Frequency Table or Contingency Table Column frequencies: the column totals (25 Favor, 25 Neutral, 40 Oppose).

18 1. Determine Appropriate Test
1. Party membership (2 levels) is nominal.
2. Voting preference (3 levels) is nominal.
Both variables are nominal, so the Pearson chi-square test is appropriate.

19 2. Establish Level of Significance α = .05

20 3. Determine the Hypothesis
H₀: Party membership and opinion on gun control are independent; that is, Democrats and Republicans do not differ in their opinions on the gun control issue.
Hₐ: There is an association between responses to the gun control survey and party membership in the population.

21 4. Calculating Test Statistics

              Favor        Neutral      Oppose       f row
Democrat      f_o = 10     f_o = 10     f_o = 30       50
              f_e = 13.9   f_e = 13.9   f_e = 22.2
Republican    f_o = 15     f_o = 15     f_o = 10       40
              f_e = 11.1   f_e = 11.1   f_e = 17.8
f column      25           25           40           n = 90

22 4. Calculating Test Statistics Expected frequency for a Democrat cell with a column total of 25 (Favor or Neutral): f_e = 50 × 25 / 90 = 13.9

23 4. Calculating Test Statistics Expected frequency for a Republican cell with a column total of 25 (Favor or Neutral): f_e = 40 × 25 / 90 = 11.1
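The expected frequencies for the gun control table can be reproduced with a short sketch. The counts are those of the contingency table in the example; the variable names are illustrative.

```python
# Observed counts from the gun control example (columns: Favor, Neutral, Oppose).
observed = [[10, 10, 30],   # Democrat
            [15, 15, 10]]   # Republican

row_totals = [sum(row) for row in observed]        # [50, 40]
col_totals = [sum(col) for col in zip(*observed)]  # [25, 25, 40]
n = sum(row_totals)                                # 90

# f_e = row total * column total / n, rounded to one decimal as on the slides.
expected = [[round(r * c / n, 1) for c in col_totals] for r in row_totals]
print(expected)  # [[13.9, 13.9, 22.2], [11.1, 11.1, 17.8]]
```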

24 4. Calculating Test Statistics $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 11.03$
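A minimal sketch of this calculation for the gun control table follows. It uses unrounded expected frequencies, so it gives about 11.025, which matches the reported 11.03 up to rounding. Variable names are illustrative.

```python
# Observed counts from the gun control example (columns: Favor, Neutral, Oppose).
observed = [[10, 10, 30],   # Democrat
            [15, 15, 10]]   # Republican

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n   # expected count for this cell
        chi2 += (f_o - f_e) ** 2 / f_e            # this cell's contribution

print(f"{chi2:.3f}")  # 11.025
```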

25 5. Determine Degrees of Freedom df = (R-1)(C-1) = (2-1)(3-1) = 2

26 6. Compare computed test statistic against a tabled/critical value
α = 0.05
df = 2
Critical tabled value = 5.991
The test statistic, 11.03, exceeds the critical value.
The null hypothesis is rejected.
Democrats and Republicans differ significantly in their opinions on gun control issues.
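For df = 2 specifically, the chi-square distribution has a simple closed form: the probability of exceeding a value x is exp(-x / 2). That makes it possible to check the tabled value and the decision without a table. The sketch below relies on that df = 2 property and does not generalize to other degrees of freedom.

```python
import math

chi2 = 11.03   # test statistic reported for the gun control example
alpha = 0.05

# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
critical = -2 * math.log(alpha)   # value whose upper-tail probability equals alpha
p_value = math.exp(-chi2 / 2)     # upper-tail probability of the observed statistic

print(round(critical, 3))  # 5.991
print(p_value < alpha)     # True, so the null hypothesis is rejected
```

The p-value is roughly .004, well below α = .05, which agrees with the comparison against the critical value.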

27 SPSS Output for Gun Control Example

28 Additional Information in SPSS Output
Exceptions that might distort χ² assumptions:
– Associations in some but not all categories
– Low expected frequency per cell
The extent of association is not the same as statistical significance (demonstrated through an example).
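One way to screen for low expected frequencies is sketched below. The cutoff of 5 is a common rule of thumb and an assumption here, not a value given on the slides. The function name `low_expected_cells` and the small table are also hypothetical.

```python
def low_expected_cells(observed, threshold=5.0):
    """Return (row, column, f_e) for cells whose expected count is below threshold.

    threshold=5.0 is an assumed rule-of-thumb cutoff, not taken from the slides.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            f_e = r * c / n
            if f_e < threshold:
                flagged.append((i, j, f_e))
    return flagged

# Gun control table: the smallest expected count is about 11.1, so nothing is flagged.
print(low_expected_cells([[10, 10, 30], [15, 15, 10]]))  # []

# Hypothetical small table: the first column has expected counts of 2.5.
print(low_expected_cells([[2, 8], [3, 7]]))  # [(0, 0, 2.5), (1, 0, 2.5)]
```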

29 Another Example: Heparin Lock Placement (from Polit text, Table 8-1) Time: 1 = 72 hrs, 2 = 96 hrs

30 Hypotheses in Heparin Lock Placement
H₀: There is no association between complication incidence and length of heparin lock placement (the variables are independent).
Hₐ: There is an association between complication incidence and length of heparin lock placement (the variables are related).

31 More of SPSS Output

32 Pearson Chi-Square Pearson chi-square = .250, p = .617. Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time. A continuity correction is used when the expected frequency for any cell in a 2 by 2 table is less than 10.
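The reported p-value can be checked with a short sketch, assuming the heparin table is 2 by 2 so that df = 1. The SPSS output itself is not reproduced in this transcript, so df = 1 is an assumption. With one degree of freedom the chi-square statistic is a squared standard normal variable, so its upper-tail probability is erfc(sqrt(x / 2)). The function name is hypothetical.

```python
import math

def chi_square_p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with df = 1 (assumed)."""
    return math.erfc(math.sqrt(chi2 / 2))

p = chi_square_p_value_df1(0.250)
print(round(p, 3))  # 0.617
```

This agrees with p = .617, so the null hypothesis is not rejected at α = .05.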

33 More SPSS Output

34 Phi Coefficient The Pearson chi-square provides information about the existence of a relationship between two nominal variables, but not about the magnitude of the relationship. The phi coefficient is a measure of the strength of the association.
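For a 2 by 2 table, the phi coefficient is commonly computed as the square root of χ² divided by the number of cases. The sketch below uses hypothetical inputs because the sample size of the heparin example is not given in this transcript. The function name `phi_coefficient` is illustrative.

```python
import math

def phi_coefficient(chi2, n):
    """Phi for a 2 x 2 table: sqrt(chi-square / number of cases)."""
    return math.sqrt(chi2 / n)

# Hypothetical values (not from the slides): chi-square of 4.0 with 100 cases.
phi = phi_coefficient(4.0, 100)
print(round(phi, 2))  # 0.2
```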

35 Cramer’s V When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V. If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.

36 Cramer's V $V = \sqrt{\frac{\chi^2}{N(k - 1)}}$, where N is the number of cases and k is the smaller of the number of rows and the number of columns.
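As an illustration, Cramer's V for the gun control example can be sketched as the square root of χ² divided by N times (k - 1), where k is the smaller of the number of rows and columns. The inputs are the values reported in that example (χ² = 11.03, N = 90, a 2 by 3 table). The function name `cramers_v` is hypothetical.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi-square / (N * (k - 1))), with k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))

v = cramers_v(11.03, 90, 2, 3)
print(round(v, 2))  # 0.35
```

Here k - 1 = 1, so V reduces to the square root of 11.03 / 90.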

37 Take Home Lesson How to Test Association between Frequency of Two Nominal Variables
