Statistical Inference
Chapter 17: Statistical Inference for Frequency Data
I. Three Applications of Pearson's χ²
   Testing goodness of fit
   Testing independence
   Testing equality of proportions
A. Testing Goodness of Fit
   1. Statistical hypotheses
      H0: OPop 1 = EPop 1, . . . , OPop k = EPop k
      H1: OPop j ≠ EPop j for some j
   2. Randomization plan
      One random sample of n elements
      Each element is classified in terms of membership in one of k mutually
      exclusive categories
B. Testing Independence
   1. Statistical hypotheses
      H0: p(A and B) = p(A)p(B)
      H1: p(A and B) ≠ p(A)p(B)
   2. Randomization plan
      One random sample of n elements
      Each element is classified in terms of two variables, denoted by A and B,
      where each variable has two or more categories.
C. Testing Equality of Proportions
   1. Statistical hypotheses
      H0: p1 = p2 = . . . = pc
      H1: pj ≠ pj′ for some j and j′
   2. Randomization plan
      c random samples, where c ≥ 2
      For each sample, elements are classified in terms of membership in one of
      r = 2 mutually exclusive categories
II. Testing Goodness of Fit
A. Chi-Square Distribution
B. Pearson's Chi-Square Statistic
   1. χ² = Σ (Oj − Ej)²/Ej, summed over j = 1, . . . , k, where Oj and Ej
      denote, respectively, the observed and expected frequencies in the jth
      category, and k denotes the number of categories.
   2. The critical value of chi square is χ²(α; ν) with ν = k − 1 degrees of
      freedom.
C. Grade-Distribution Example
   1. Is the distribution of grades for summer-school students in a statistics
      class different from that for the fall and spring semesters?

                Fall and Spring        Summer
      Grade       Proportion       Obs. Frequency
        A            .12                15
        B            .23                21
        C            .47                30
        D            .13                 6
        F            .05                 0
                    1.00                72
   2. The statistical hypotheses are
      H0: OPop 1 = EPop 1, . . . , OPop 5 = EPop 5
      H1: OPop j ≠ EPop j for some j
   3. Pearson's chi-square statistic is χ² = Σ (Oj − Ej)²/Ej = 11.186
      (see Table 1).
   4. The critical value of chi square for α = .05, k = 5 categories, and
      ν = 5 − 1 = 4 degrees of freedom is χ²(.05; 4) = 9.488.
Table 1. Computation of Pearson's Chi-Square for n = 72 Summer-School Students

    (1)     (2)    (3)       (4)          (5)         (6)
   Grade     Oj     pj     npj = Ej     Oj − Ej   (Oj − Ej)²/Ej
     A       15    .12   72(.12) =  8.6    6.4        4.763
     B       21    .23   72(.23) = 16.6    4.4        1.166
     C       30    .47   72(.47) = 33.8   −3.8        0.427
     D        6    .13   72(.13) =  9.4   −3.4        1.230
     F        0    .05   72(.05) =  3.6   −3.6        3.600
             72   1.00              72.0      0   χ² = 11.186*

   *p < .025
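The computations in Table 1 can be reproduced with a short script (a sketch using only the Python standard library; the small discrepancy from the tabled 11.186 arises because the table rounds each Ej to one decimal place before dividing):

```python
# Goodness-of-fit test for the summer-school grade data in Table 1.
observed = [15, 21, 30, 6, 0]            # Oj for grades A, B, C, D, F
proportions = [.12, .23, .47, .13, .05]  # pj from the fall/spring semesters
n = sum(observed)                        # 72 students

expected = [n * p for p in proportions]  # Ej = n * pj
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi2, 2))  # 11.11 with unrounded Ej (the table's 11.186 uses rounded Ej)
```

Either way the statistic exceeds the .05 critical value of 9.488 for 4 degrees of freedom, so the null hypothesis is rejected.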
   5. When e parameters of a theoretical distribution must be estimated from
      the data, the degrees of freedom are ν = k − 1 − e.
D. Practical Significance
   1. Cohen's ŵ = √[ Σ (p̂Oj − p̂Ej)²/p̂Ej ], where p̂Oj and p̂Ej denote,
      respectively, the observed and expected proportions in the jth category.
   2. A simpler equivalent formula for Cohen's w is ŵ = √(χ²/n).
   3. Cohen's guidelines for interpreting w:
      0.1 is a small effect
      0.3 is a medium effect
      0.5 is a large effect
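Applying the simpler formula to the grade-distribution example (χ² ≈ 11.11, n = 72, from the Table 1 computation) gives:

```python
import math

# Cohen's w via the simpler equivalent formula w = sqrt(chi2 / n),
# using the grade-distribution example from Table 1.
chi2 = 11.114
n = 72
w = math.sqrt(chi2 / n)

print(round(w, 2))  # 0.39: between a medium (0.3) and a large (0.5) effect
```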
E. Yates' Correction
   1. When ν = 1, Yates' correction (replacing each Oj − Ej with |Oj − Ej| − .5)
      can be applied to make the sampling distribution of the test statistic,
      which is discrete, better approximate the continuous chi-square
      distribution.
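A minimal sketch of the correction, using hypothetical data not from the slides (60 heads and 40 tails in 100 tosses of a presumed-fair coin, so k = 2 and ν = 1):

```python
# Yates' correction for nu = 1: subtract .5 from each |Oj - Ej| before squaring.
# Hypothetical data: 60 heads, 40 tails in 100 tosses tested against p = .5.
observed = [60, 40]
expected = [50, 50]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_yates = sum((abs(o - e) - .5) ** 2 / e for o, e in zip(observed, expected))

print(chi2, round(chi2_yates, 2))  # 4.0 3.61: the corrected statistic is smaller
```

The correction always shrinks the statistic slightly, making the test a bit more conservative.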
F. Assumptions of the Goodness-of-Fit Test
   1. Every observation is assigned to one and only one category.
   2. The observations are independent.
   3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every
      expected frequency should be at least 5.
III. Testing Independence
A. Statistical Hypotheses
   H0: p(A and B) = p(A)p(B)
   H1: p(A and B) ≠ p(A)p(B)
B. Chi-Square Statistic for an r × c Contingency Table with i = 1, . . . , r
   Rows and j = 1, . . . , c Columns
   χ² = Σi Σj (Oij − Eij)²/Eij
C. Computational Example: Is Success on an Employment-Test Item Independent of
   Gender?

                  Observed               Expected
                b1      b2             b1      b2
               Fail    Pass           Fail    Pass
   a1 Men        84      18    102    88.9    13.1
   a2 Women      93       8    101    88.1    12.9
                177      26    203
D. Computation of Expected Frequencies
   1. A and B are statistically independent if p(ai and bj) = p(ai)p(bj).
   2. The expected frequency for the cell in row i and column j is
      Eij = n p̂(ai) p̂(bj) = (row i total)(column j total)/n
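The expected frequencies and the chi-square statistic for the employment-test table can be sketched as follows (standard library only):

```python
# Independence test for the 2 x 2 employment-test table:
# rows = gender (men, women), columns = item outcome (fail, pass).
observed = [[84, 18],
            [93, 8]]

row_totals = [sum(row) for row in observed]        # [102, 101]
col_totals = [sum(col) for col in zip(*observed)]  # [177, 26]
n = sum(row_totals)                                # 203

# E_ij = (row i total)(column j total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

print(round(expected[0][0], 1))  # 88.9, matching the tabled expected frequency
print(round(chi2, 2))            # 4.3, which exceeds the .05 critical value of 3.841 for 1 df
```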
E. Degrees of Freedom for an r × c Contingency Table
   df = k − 1 − e = rc − 1 − [(r − 1) + (c − 1)]
      = rc − 1 − r + 1 − c + 1
      = rc − r − c + 1
      = (r − 1)(c − 1)
   For the 2 × 2 example: (2 − 1)(2 − 1) = 1
F. Strength of Association and Practical Significance
   1. Cramér's V̂ = √[χ²/(n(s − 1))], where s is the smaller of the number of
      rows and columns.
   3. For an r × c contingency table, an alternative formula for ŵ is
      ŵ = V̂ √(s − 1)
G. Three-By-Three Contingency Table
   1. Motivation and education of conscientious objectors during WWII

                              High      Grade
                   College   School    School    Total
   Coward             12        25        35       72
   Partly Coward      19        23        30       72
   Not Coward         71        56        24      151
   Total             102       104        89      295
   2. Strength of association: with χ² = 36.68 for this table, Cramér's
      V̂ = √[χ²/(n(s − 1))] = √[36.68/(295(3 − 1))] = .25
   3. Practical significance: ŵ = V̂ √(s − 1) = .25 √2 = .35, a medium effect
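The chi-square statistic, Cramér's V̂, and ŵ for the conscientious-objector table can be verified with a short script (standard library only):

```python
import math

# Chi-square, Cramer's V, and Cohen's w for the 3 x 3 conscientious-objector
# table (rows: Coward, Partly Coward, Not Coward; columns: College,
# High School, Grade School).
observed = [[12, 25, 35],
            [19, 23, 30],
            [71, 56, 24]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # 295

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(3) for j in range(3))

s = min(len(observed), len(observed[0]))  # smaller of rows, columns = 3
v = math.sqrt(chi2 / (n * (s - 1)))       # Cramer's V-hat
w = v * math.sqrt(s - 1)                  # equivalent to sqrt(chi2 / n)

print(round(chi2, 2), round(v, 2), round(w, 2))  # 36.68 0.25 0.35
```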
H. Assumptions of the Independence Test
   1. Every observation is assigned to one and only one cell of the contingency
      table.
   2. The observations are independent.
   3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every
      expected frequency should be at least 5.
IV. Testing Equality of c ≥ 2 Proportions
A. Statistical Hypotheses
   H0: p1 = p2 = . . . = pc
   H1: pj ≠ pj′ for some j and j′
   1. Computational example: three samples of n = 100 residents of nursing
      homes were surveyed. Variable A was age heterogeneity in the home;
      variable B was resident satisfaction.
Table 2. Nursing Home Data

                                Age Heterogeneity
                        Low b1      Medium b2      High b3
   Satisfied a1        O = 56       O = 58        O = 38
                       E = 50.67    E = 50.67     E = 50.67
   Not Satisfied a2    O = 44       O = 42        O = 62
                       E = 49.33    E = 49.33     E = 49.33
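The Table 2 analysis can be sketched as follows (standard library only; the dissatisfied count in the high-heterogeneity column is taken as 62 rather than the apparent misprint 52, since 62 makes each sample sum to n = 100 and reproduces the tabled expected frequencies of 50.67 and 49.33):

```python
# Equality-of-proportions test for the Table 2 nursing-home data:
# three samples of n = 100, each resident classified as satisfied or not.
# 62 (not the apparent misprint 52) makes each column sum to 100.
observed = [[56, 58, 38],   # satisfied
            [44, 42, 62]]   # not satisfied

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # 300

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(3))

print(round(expected[0][0], 2))  # 50.67, matching Table 2
print(round(chi2, 2))            # 9.71, which exceeds the .05 critical value of 5.991 for 2 df
```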
B. Assumptions of the Equality of Proportions Test
   1. Every observation is assigned to one and only one cell of the contingency
      table.
   2. The observations are independent.
   3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every
      expected frequency should be at least 5.
C. Test of Homogeneity of Proportions
   1. An extension of the test of equality of proportions when variable A has
      r > 2 rows
   2. Statistical hypotheses for columns j and j′:
      H0: p1j = p1j′, . . . , prj = prj′ for all pairs of columns j and j′
      H1: pij ≠ pij′ for some i and some pair of columns j and j′