Published by Damian Greer. Modified over 8 years ago.
1
Goodness-of-Fit and Contingency Tables Chapter 11
2
Is the nurse, Gilbert, a serial killer? This chapter will include methods for analyzing data in tables, such as Table 11-1.
3
Goodness-of-Fit and Contingency Tables
1. Overview
2. Goodness-of-Fit (Multinomial Experiment)
3. Contingency Tables
4
Sections 11-1 & 11-2: Overview and Multinomial Experiments: Goodness-of-Fit
5
Overview
We focus on the analysis of categorical (qualitative or attribute) data that can be separated into different categories (often called cells). Examples:
M&Ms => color categories: red, orange, yellow, brown, blue, and green
Soft drinks => preference: Coke, Pepsi, 7-Up, Fanta, etc.
Pain following surgery => none, mild, moderate, severe
6
Overview
After finding the frequency count for each category, we might test the claim that the frequencies fit the color distribution claimed by the manufacturer. Our objective is to test the hypothesis that an observed frequency distribution fits some claimed or expected distribution. => A goodness-of-fit test
7
Key Concept: Multinomial Experiments (multinomial distribution)
Use the χ² (chi-square) test statistic (Table A-4). The goodness-of-fit test uses a one-way frequency table (a single row or column). The contingency table test uses a two-way frequency table (two or more rows and columns).
8
Example
A random sample of 150 voters was asked which of 3 universities they preferred (Harvard, Oxford, Konkuk). The results were as follows:
Category    Konkuk   Oxford   Harvard
Frequency   61       53       36
A one-way frequency table is a one-way contingency table (contingency table, data summary table). Multinomial experiment: more than 2 categories.
9
Multinomial Experiment Properties
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.
10
Example (p.588): Last Digits of Weights
Data Set 1 contains the weights of 40 males and 40 females. When people report their weights, they typically round to a whole number, so reported weights tend to have many last digits of 0. In contrast, if people are actually weighed on a scale with precision to the nearest 0.1 lb, the last digits tend to be uniformly distributed over 0, 1, 2, …, 9. So how can researchers verify that weights were obtained through actual measurement instead of by asking subjects?
11
1. Fixed number of trials (last digits): ____
2. The trials are independent.
3. ___ different categories.
4. The 10 digits are equally likely to occur. Each possible digit has a probability of ____.
12
Goodness-of-Fit Test
A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution.
O: the observed frequency of an outcome
E: the expected frequency of an outcome
k: the number of different categories
n: the total number of trials
13
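Expected Frequencies
1. If all expected frequencies are equal: $E = n/k$ (the total number of trials divided by the number of categories).
2. If expected frequencies are not all equal: $E = np$ for each category (the total number of trials times that category's probability).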
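As a rough sketch of the two cases, the hypothetical helper `expected_frequencies` below computes expected counts either from a number of categories or from a list of claimed probabilities. The function name and the unequal probabilities in the second call are illustrative assumptions, not part of the slides; n = 150 with 3 categories comes from the university example.

```python
def expected_frequencies(n, k=None, probs=None):
    """Return expected counts for a goodness-of-fit test.

    Hypothetical helper: pass k for equal expected counts (E = n / k),
    or probs for claimed category probabilities (E = n * p).
    """
    if probs is not None:
        return [n * p for p in probs]
    return [n / k] * k


# Equal case: 150 voters, 3 universities -> 50 expected per category.
equal = expected_frequencies(150, k=3)

# Unequal case: these probabilities are made up purely for illustration.
unequal = expected_frequencies(150, probs=[0.5, 0.3, 0.2])

print(equal)
print(unequal)
```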
14
Example: Observed vs. Expected Frequency
Expected:
Category    Konkuk   Oxford   Harvard
Frequency   50       50       50
Observed:
Category    Konkuk   Oxford   Harvard
Frequency   61       53       36
We compare the observed and expected frequencies. If the differences between them are too large to be explained by sampling variation, we reject the null hypothesis of no difference.
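The comparison of observed and expected counts can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's worked solution: the function name is hypothetical, and the 0.05 significance level is an assumption because the slide does not state one. The critical value 5.991 is the Table A-4 entry for 2 degrees of freedom at that level.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [61, 53, 36]   # Konkuk, Oxford, Harvard (from the slide)
expected = [50, 50, 50]   # 150 voters / 3 categories

statistic = chi_square_gof(observed, expected)
df = len(observed) - 1    # k - 1 degrees of freedom

# Assumption: alpha = 0.05; 5.991 is the Table A-4 value for 2 degrees of freedom.
CRITICAL_VALUE = 5.991
reject_h0 = statistic > CRITICAL_VALUE

print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the statistic is (121 + 9 + 196) / 50 = 6.52, which exceeds 5.991, so the hypothesis of equal preference would be rejected.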
15
Goodness-of-Fit Test
Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Critical values:
1. Found in Table A-4 using k – 1 degrees of freedom, where k = number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
16
Goodness-of-Fit Test
A close agreement between observed and expected values leads to a small value of χ² and a large P-value: do not reject H0. A large disagreement between observed and expected values leads to a large value of χ² and a small P-value: reject H0. Consequently, we reject H0 if χ² > critical value.
17
Figure 11-2: Relationships Among the χ² Test Statistic, P-Value, and Goodness-of-Fit
18
Example 1 (p.588)
Test the claim that the digits in Table 11-2 do not occur with the same frequency.
H0: ________________________
H1: At least one of the probabilities is different from the others.
α = 0.05
k – 1 = 9
$\chi^2_{0.05,\,9} = 16.919$
20
Example 1 (p.588): From Table 11-3, the test statistic is χ² = 11.250. Since this is less than the critical value of 16.919, we do not reject the null hypothesis of equal probabilities.
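The right-tailed decision rule used here can be written as a small function. This is only a sketch with a hypothetical function name; the two numbers passed in are the values quoted on the slides.

```python
def right_tailed_decision(statistic, critical_value):
    """Reject H0 only when the statistic exceeds the critical value."""
    if statistic > critical_value:
        return "reject H0"
    return "fail to reject H0"


# Values quoted on the slides: test statistic 11.250, critical value 16.919
# (alpha = 0.05, 9 degrees of freedom).
decision = right_tailed_decision(11.250, 16.919)
print(decision)  # fail to reject H0
```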
21
Section 11-3 Contingency Tables: Independence and Homogeneity
22
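Contingency Table
The home regions of 120 students taking a statistics class were surveyed. The same 120 students were asked whether they support candidate OO.
Category    Seoul   Daejeon   Other regions
Frequency   40      30        50
Category    Support   Oppose
Frequency   80        40
___________________________________________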
23
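Contingency Table
A survey of whether support for candidate OO differs by home region among 120 students taking a statistics class.
Category        Support   Oppose   Total
Seoul           30        10       40
Daejeon         10        20       30
Other regions   40        10       50
Total           80        40       120
=> Contingency table (two-way frequency table)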
24
Independence & Homogeneity: We present a method for testing the claim that the row and column variables are independent of each other. We use the same method for a test of homogeneity, in which we test the claim that different populations have the same proportion of some characteristic.
25
Test of Independence: A test of independence tests the null hypothesis that there is no association between the row variable and the column variable in a contingency table. For the null hypothesis, we use the statement that “the row and column variables are independent.”
26
Requirements
1. The sample data are randomly selected and are represented as frequency counts in a two-way table.
2. H0: the row and column variables are independent. H1: the row and column variables are dependent.
3. For every cell in the contingency table, the expected frequency E is at least 5.
27
Test of Independence
Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency in a cell and E is the expected frequency in that cell.
Critical values:
1. Found in Table A-4 using degrees of freedom = (r – 1)(c – 1), where r is the number of rows and c is the number of columns.
28
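Expected Frequency
$E = (\text{grand total}) \cdot \frac{\text{row total}}{\text{grand total}} \cdot \frac{\text{column total}}{\text{grand total}}$
which simplifies to
$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$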
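To make the expected-frequency formula concrete, the sketch below applies it to the 120-student survey table shown earlier (rows 30/10, 10/20, 40/10) and then computes the χ² statistic and degrees of freedom. The function names are hypothetical, and the 0.05 significance level is an assumption, since the slides do not test this table themselves.

```python
def expected_counts(table):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a two-way table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: Seoul, Daejeon, other regions; columns: support, oppose.
students = [[30, 10], [10, 20], [40, 10]]

expected = expected_counts(students)
statistic, df = chi_square_independence(students)

# Requirement from the slides: every expected count is at least 5.
assert all(e >= 5 for row in expected for e in row)

# Assumption: alpha = 0.05, so the critical value for 2 degrees of freedom is 5.991.
reject_independence = statistic > 5.991
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_independence}")
```

For this table the statistic works out to 20.25 with 2 degrees of freedom, so under the assumed significance level the data would not support independence between home region and support for the candidate.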
29
Example 2 (p.600): Refer to Table 11-6 and find the expected frequency for the first cell, where the observed frequency is 88. The first cell lies in the first row (with a total frequency of 178) and the first column (with a total frequency of 103). The “grand total” is the sum of all frequencies in the table, which is 207. The expected frequency of the first cell is $E = \frac{(178)(103)}{207} = 88.570$.
30
Example 2: The first cell has an observed frequency of O = 88 and an expected frequency of E = 88.570. We can interpret the expected value by stating that if we assume that getting an infection is independent of the treatment, then we expect to find that 88.570 of the subjects would be given a placebo and would get an infection. There is a discrepancy between O = 88 and E = 88.570.
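The arithmetic for this cell can be checked with a short sketch. The totals are the ones quoted in the example; the variable names are illustrative, and the last line shows how this one cell would feed into the χ² sum.

```python
row_total = 178      # first-row total from the example
column_total = 103   # first-column total
grand_total = 207    # sum of all frequencies in the table
observed = 88        # observed frequency in the first cell

expected = row_total * column_total / grand_total

# This cell's term in the chi-square sum: (O - E)^2 / E.
contribution = (observed - expected) ** 2 / expected

print(f"E = {expected:.3f}")
```

Because O and E are so close for this cell, its term in the χ² sum is very small.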
31
Example 3 (p.601): Common colds are typically caused by a rhinovirus. In a test of the effectiveness of echinacea, some test subjects were treated with echinacea extracted with 20% ethanol, some were treated with echinacea extracted with 60% ethanol, and others were given a placebo. All of the test subjects were then exposed to rhinovirus. Use a 0.05 significance level to test the claim that getting an infection (cold) is independent of the treatment group. What does the result indicate about the effectiveness of echinacea as a treatment for colds?
32
Example 3 (p.601)
Requirements are satisfied: subjects were randomly assigned to treatment groups, the data are frequency counts, and the expected frequencies are all at least 5.
H0: Getting an infection is independent of the treatment.
H1: Getting an infection and the treatment are dependent.
33
Example 3 (p.601): The significance level is α = 0.05. Because the data are in a contingency table, we use the χ² distribution. The critical value is χ² = 5.991 from Table A-4, with the number of degrees of freedom given by (r – 1)(c – 1) = __________________.
34
Example 3 (p.601): Because the test statistic does not fall within the critical region, we fail to reject the null hypothesis of independence between getting an infection and treatment. It appears that getting an infection is independent of the treatment group. This suggests that echinacea is not an effective treatment for colds.
35
Figure 11-6: Relationships Among Key Components in Test of Independence
36
Test of Homogeneity: In a test of homogeneity, we test the claim that different populations have the same proportions of some characteristic.
37
Example 5 (p. 604): Using Table 11-6 with a 0.05 significance level, test the effect of pollster gender on survey responses by men.
38
Example 5 (p. 604)
H0: The proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women.
H1: The proportions are different.