Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 12 Tests with Qualitative Data

Similar presentations


Presentation on theme: "Chapter 12 Tests with Qualitative Data"β€” Presentation transcript:

1 Chapter 12 Tests with Qualitative Data

2 Testing Goodness of Fit
Section 12.1

3 Objectives Find critical values of the chi-square distribution
Perform goodness-of-fit tests

4 Find critical values of the chi-square distribution
Objective 1 Find critical values of the chi-square distribution

5 The Chi-Square Distribution
Hypothesis tests for qualitative data, also called categorical data, are based on the chi-square distribution. The test statistic used for these tests is called the chi-square statistic, denoted πœ’ 2 . There are actually many different chi-square distributions, each with a different number of degrees of freedom. The figure below shows several examples of chi-square distributions. Recall that the chi-square distributions are skewed to the right, and the values of the Ο‡2 statistic are always greater than or equal to 0. They are never negative.

6 Example Find the 𝛼 = 0.05 critical value for the chi-square distribution with 12 degrees of freedom. Solution: The critical value is It is found at the intersection of the row corresponding to 12 degrees of freedom and the column corresponding to 𝛼 = 0.05.

7 Perform goodness-of-fit tests
Objective 2 Perform goodness-of-fit tests

8 Goodness-of-Fit Tests
Imagine that a gambler wants to test a die to determine whether it is fair. The roll of a die has six possible outcomes: 1, 2, 3, 4, 5, and 6; and the die is fair if each of these outcomes is equally likely. The gambler rolls the die 60 times, and counts the number of times each number comes up. The results are presented below. These counts are called the observed frequencies. The null hypothesis for this test says that the die is fair; in other words, it says that each of the six outcomes has probability 1 6 of occurring. Let 𝑝 1 be the probability of rolling a 1, 𝑝 2 be the probability of rolling a 2, and so on. Then the null hypothesis is 𝐻 0 : 𝑝 1 = 𝑝 2 = 𝑝 3 = 𝑝 4 = 𝑝 5 = 𝑝 6 = 1 6 The alternate hypothesis says that the roll of a die does not follow the distribution specified by 𝐻 0 ; in other words, it states that not all of the 𝑝 𝑖 are equal to 1 6 . Outcome 1 2 3 4 5 6 Observed 12 7 14 15 8

9 Expected Frequencies To test the gambler’s null hypothesis, 𝐻 0 , we begin by computing expected frequencies. The expected frequencies are the mean counts that would occur if 𝐻 0 were true. If the probabilities specified by 𝐻 0 are 𝑝 1 , 𝑝 2 , ..., and the total number of trials is 𝑛, the expected frequencies are 𝐸 1 = 𝑛 𝑝 1 , 𝐸 2 = 𝑛 𝑝 2 , and so on

10 Example Compute the expected frequencies for the gambler rolling a die. Solution: The probabilities specified by 𝐻 0 : 𝑝 1 = 𝑝 2 = 𝑝 3 = 𝑝 4 = 𝑝 5 = 𝑝 6 = The total number of trials is 𝑛 = 60. Therefore, the expected frequencies are 𝐸 1 = 𝐸 2 = 𝐸 3 = 𝐸 4 = 𝐸 5 = 𝐸 6 = (60) 1 6 = 10 If a die is tossed 60 and the outcomes on the die is equally likely, we would expect each outcome to occur 10 times. The table below shows the outcomes, observed frequencies and expected frequencies when a die is tossed 60 times by the gambler. If 𝐻 0 is true, the observed and expected frequencies should be fairly close. The larger the differences are between the observed and expected frequencies, the stronger the evidence is against 𝐻 0 . Outcome 1 2 3 4 5 6 Observed 12 7 14 15 8 Expected 10

11 The Chi-Square Statistic
The statistic that measures how large the differences are between the observed and expected frequencies is called the chi-square statistic. When 𝐻 0 is true, the chi-square statistic has approximately a chi-square distribution, provided that all the expected frequencies are 5 or more. Let k be the number of categories, let 𝑂 1 , …, 𝑂 π‘˜ be the observed frequencies, and let 𝐸 1 , …, 𝐸 π‘˜ be the expected frequencies. The chi-square statistic is πœ’ 2 = π‘‚βˆ’πΈ 2 𝐸

12 Performing a Goodness-of-Fit Test
Step 1: State the null and alternate hypotheses. Step 2: Compute the expected frequencies and check to be sure that all of them are 5 or more. If they are, then proceed. Step 3: Choose a significance level Ξ±. Step 4: Compute the test statistic: πœ’ 2 = π‘‚βˆ’πΈ 2 𝐸 Step 5: Find the critical value from Table A.4, using k βˆ’ 1 degrees of freedom, where k is the number of categories. If πœ’ 2 is greater than or equal to the critical value, reject 𝐻 0 . Otherwise, do not reject 𝐻 0 . Step 6: State a conclusion.

13 Example Using the observed frequencies for the die example, can you conclude at the 𝛼 = 0.05 level that the die is not fair? Solution: The null and alternate hypotheses are: 𝐻 0 : 𝑝 1 = 𝑝 2 = 𝑝 3 = 𝑝 4 = 𝑝 5 = 𝑝 6 = 1 6 𝐻 1 : Some or all of the 𝑝 𝑖 differ from 1 6 Next, we compute the expected frequencies: Outcome 1 2 3 4 5 6 Observed 12 7 14 15 8 Outcome 1 2 3 4 5 6 Observed 12 7 14 15 8 Expected 10

14 Example Solution: We compute the value of the test statistic. The table shows the calculations. The test statistic is πœ’ 2 = π‘‚βˆ’πΈ 2 𝐸 =9.4. There are six categories, so there are 6 βˆ’ 1 = 5 degrees of freedom. From Table A.4, we find that the 𝛼 = 0.05 critical value for 5 degrees of freedom is The value of the test statistic is πœ’ 2 = 9.4. Because 9.4 < , we do not reject 𝐻 0 . There is not enough evidence to conclude that the die is not fair. Outcome 1 2 3 4 5 6 Observed (𝑂) 12 7 14 15 8 Expected (𝐸) 10 π‘‚βˆ’πΈ –3 –6 –2 π‘‚βˆ’πΈ 2 9 16 25 36 π‘‚βˆ’πΈ 2 𝐸 0.4 0.9 1.6 2.5 3.6

15 Hypothesis Testing on the TI-84 PLUS
The Ο‡2-GOF-Test command will perform a goodness of fit hypothesis test. This command is accessed by pressing STAT and highlighting the TESTS menu. Enter the location of the observed frequencies in the Observed field and the location of the expected frequencies in the Expected field. Enter the number of degrees of freedom in the df field.

16 Example (TI-84 PLUS) Using the observed frequencies for the die example, can you conclude at the 𝛼 = 0.05 level that the die is not fair? Solution: The null and alternate hypotheses are: 𝐻 0 : 𝑝 1 = 𝑝 2 = 𝑝 3 = 𝑝 4 = 𝑝 5 = 𝑝 6 = 1 6 𝐻 1 : Some or all of the 𝑝 𝑖 differ from 1 6 Next, we compute the expected frequencies: Outcome 1 2 3 4 5 6 Observed 12 7 14 15 8 Outcome 1 2 3 4 5 6 Observed 12 7 14 15 8 Expected 10

17 Example (TI-84 PLUS) Solution: We enter the observed frequencies in L1 and the expected frequencies in L2. Press STAT and highlight the TESTS menu and Ο‡2-GOF-Test . Enter L1 in the Observed field and L2 in the Expected field. Since there are 5 degrees of freedom, enter 5 in the df field. Select Calculate. The P-value is Because P > 0.05, we do not reject 𝐻 0 at the 𝛼 = 0.05 level. There is not enough evidence to conclude that the die is not fair. Outcome 1 2 3 4 5 6 Observed 12 7 14 15 8 Expected 10

18 You Should Know… How to find critical values of the chi-square distribution How to perform goodness-of-fit tests

19 Tests for Independence and Homogeneity
Section 12.2

20 Objectives Interpret contingency tables Perform tests of independence
Perform tests of homogeneity

21 Interpret contingency tables
Objective 1 Interpret contingency tables

22 Contingency Tables Do some college majors require more studying than others? The table below shows the results from the 2009 National Survey of Student Engagement survey where a sample of 1000 college freshmen were asked their major and the average number of hours per week spent studying. The table is called a contingency table. Hours Studying Per Week Major Humanities Social Science Business Engineering 0–10 68 106 131 40 11–20 119 103 127 81 More than 20 70 52 51

23 Contingency Tables A contingency table relates two qualitative variables. One of the variables, called the row variable, has one category for each row of the table. The other variable, called the column variable, has one category for each column of the table. The intersection of a row and a column is called a cell. The total number of individuals in the table, 1000, is called the grand total. Hours Studying Per Week Major Humanities Social Science Business Engineering 0–10 68 106 131 40 11–20 119 103 127 81 More than 20 70 52 51

24 Perform tests of independence
Objective 2 Perform tests of independence

25 Dependent versus Independent
We are interested in determining whether the distribution of one variable differs, depending on the value of the other variable. If so, the variables are dependent. If the distribution of one variable is the same for all values of the other variable, the variables are independent. In the survey about major and hours spent studying, if these variables are independent, then the distribution of hours studied will be the same for all majors, and the distribution of majors will be the same for all categories of hours studied. Hours Studying Per Week Major Humanities Social Science Business Engineering 0–10 68 106 131 40 11–20 119 103 127 81 More than 20 70 52 51

26 Performing a Test of Independence
If we are to conduct a test to determine if the two variables in the previous survey about student majors and hours spent studying are independent, the null and alternate hypotheses are 𝐻 0 : Hours studying and major are independent 𝐻 1 : Hours studying and major are not independent We will use the chi-square statistic to test the null hypothesis that hours studying and major are independent. If we reject 𝐻 0 , we will conclude that the variables are dependent.

27 Expected Frequency The null hypothesis states that hours studying and major are independent. By using the multiplication rule for independent events, we can compute the expected frequencies. For example, 𝑃(Studying 0 – 10 hours and Humanities) = 𝑃(Studying 0 – 10 hours) βˆ™ 𝑃(Humanities major) = βˆ™ We obtain the expected frequency for those who study 0 – 10 hours and are humanities majors by multiplying this probability by the grand total. Expected frequency = 1000βˆ™ βˆ™ = 345βˆ™ = Hours Studying Per Week Major Row Total Humanities Social Science Business Engineering 0–10 68 106 131 40 345 11–20 119 103 127 81 430 More than 20 70 52 51 225 Column Total 257 261 309 173 1000

28 Expected Frequency From the previous example and applying some arithmetic, we can see that each expected frequency, 𝐸, can be obtained by the following: The expected frequency for a cell represents the number of individuals we would expect to find in that cell under the assumption that the two variables are independent. If the differences between the observed and expected frequencies tend to be large, we will reject the null hypothesis of independence. 𝐸= π‘…π‘œπ‘€ π‘‘π‘œπ‘‘π‘Žπ‘™ βˆ™πΆπ‘œπ‘™π‘’π‘šπ‘› π‘‘π‘œπ‘‘π‘Žπ‘™ πΊπ‘Ÿπ‘Žπ‘›π‘‘ π‘‘π‘œπ‘‘π‘Žπ‘™

29 Performing a Test of Independence
Step 1: State the null and alternate hypotheses. Step 2: Compute the row and column totals. Step 3: Compute the expected frequencies. 𝐸= π‘Ÿπ‘œπ‘€ π‘‘π‘œπ‘‘π‘Žπ‘™ βˆ™π‘π‘œπ‘™π‘’π‘šπ‘› π‘‘π‘œπ‘‘π‘Žπ‘™ πΊπ‘Ÿπ‘Žπ‘›π‘‘ π‘‘π‘œπ‘‘π‘Žπ‘™ Check to be sure that all the expected frequencies are at least 5 Step 4: Choose a level of significance 𝛼, and compute the test statistic πœ’ 2 = π‘‚βˆ’πΈ 2 𝐸 Step 5: Find the critical value from Table A.4, using (r – 1)(c – 1) degrees of freedom, where r is the number of rows and c is the number of columns. If πœ’ 2 is greater than or equal to the critical value, reject 𝐻 0 . Otherwise do not reject 𝐻 0 . Step 6: State a conclusion.

30 Example Perform a test of the null hypothesis that major and hours studying are independent. Use 𝛼 = Solution: The null and alternate hypotheses are: 𝐻 0 : Hours studying and major are independent 𝐻 1 : Hours studying and major are not independent Hours Studying Per Week Major Humanities Social Science Business Engineering 0–10 68 106 131 40 11–20 119 103 127 81 More than 20 70 52 51

31 Example Solution (continued): To compute the expected frequencies, we first compute the row totals, column totals, and the grand total. Hours Studying Per Week Major Row Total Humanities Social Science Business Engineering 0–10 68 106 131 40 345 11–20 119 103 127 81 430 More than 20 70 52 51 225 Column Total 257 261 309 173 1000

32 Example The expected frequency is 𝐸= 430βˆ™309 1000 = 132.87
Solution (continued): We compute the expected frequencies. As an example, we compute the expected frequency for the cell corresponding to business major, studying 11–20 hours. The row total is 430, the column total is 309, and the grand total is The expected frequency is 𝐸= 430βˆ™ = The expected frequencies are given on the next slide. Hours Studying Per Week Major Row Total Humanities Social Science Business Engineering 0–10 345 11–20 430 More than 20 225 Column Total 257 261 309 173 1000 The expected frequency is 𝐸= 430βˆ™ =

33 Example Solution (continued): The expected frequencies are given as: All of the expected frequencies are at least 5, so we may proceed. Hours Studying Per Week Major Humanities Social Science Business Engineering 0–10 88.665 90.045 59.685 11–20 74.390 More than 20 57.825 58.725 69.525 38.925

34 Example The following table shows the observed and expected frequencies. To compute the test statistic, we use the observed frequencies and the expected frequencies. πœ’ 2 = 68βˆ’ …+ 52βˆ’ =34.638 Hours Studying Per Week Major Humanities Social Science Business Engineering 0–10 68 (88.665) 106 (90.045) 131 ( ) 40 (59.685) 11–20 119 ( ) 103 ( ) 127 ( ) 81 (74.390) More than 20 70 (57.825) 52 (58.725) 51 (69.525) 52 (38.925)

35 Example There are r = 3 rows and c = 4 columns, so the number of degrees of freedom is (3 βˆ’ 1)(4 βˆ’ 1) = 6. From Table A.4, we find that the critical value corresponding to 6 degrees of freedom and Ξ± = 0.01 is The value of the test statistic is πœ’ 2 = Because > , we reject 𝐻 0 . We conclude that the choice of major and the number of hours spent studying are not independent. The numbers of hours that students study varies among majors. Remember πœ’ 2 =34.638

36 Hypothesis Testing on the TI-84 PLUS
The 𝝌 𝟐 -Test will perform a test for independence. This command is accessed by pressing STAT and highlighting the TESTS menu. This procedure requires that the observed frequencies be entered into a matrix.

37 Example (TI-84 PLUS) Perform a test of the null hypothesis that major and hours studying are independent. Use 𝛼 = Solution: We begin by entering the observed values into matrix [A]. We press STAT and highlight the TESTS menu and select 𝝌 𝟐 -Test . Enter [A] in the Observed field. Select Calculate. The P-value is which is less than the level of significance 𝛼=0.01. Therefore, we reject the null hypothesis and conclude that the choice of major and the number of hours spent studying are not independent. The numbers of hours that students study varies among majors.

38 Perform tests of homogeneity
Objective 3 Perform tests of homogeneity

39 Test of Homogeneity In the contingency table we have seen so far, the individuals in the table were sampled from a single population. For each individual, the values of both the row and column variables were random. In some cases, values of one of the variables (say, the row variable) are assigned by the investigator, and are not random. In these cases, we consider the rows as representing separate populations, and we are interested in testing the hypothesis that the distribution of the column variable is the same for each row. This is known as a test of homogeneity.

40 Test of Homogeneity The drugs telmisartan and ramipril are designed to reduce high blood pressure. In a clinical trial to compare the effectiveness of these drugs in preventing heart attacks, 25,620 patients were divided into three groups. One group took one telmisartan tablet each day, another took one ramipril tablet each day, and the third group took one tablet of each drug each day. The patients were followed for 56 months, and the numbers who suffered fatal and nonfatal heart attacks were counted. In this experiment, the patients were assigned a row category depending on which drug was administered, so only the column variable is random. Fatal Heart Attack Nonfatal Heart Attack No Heart Attack Telmisartan only 598 431 7513 Ramipril only 603 400 7573 Both drugs 620 424 7458

41 Perform a Test of Homogeneity
We are interested in performing a test of homogeneity, to test the hypothesis that the distribution of outcomes is the same for each row. The method for performing a test of homogeneity is identical to the method for performing a test of independence.

42 Example Test the hypothesis that the distribution of outcomes is the same for all three treatment groups in the example about the drugs telmisartan and rampiral. Use the Ξ± = 0.05 level. Solution: The null hypothesis says that the distribution of outcomes (fatal heart attack, nonfatal heart attack, no heart attack) is the same for all the drug treatments. The alternate hypothesis says that the distribution of outcomes is not the same for all the treatments. Fatal Heart Attack Nonfatal Heart Attack No Heart Attack Telmisartan only 598 431 7513 Ramipril only 603 400 7573 Both drugs 620 424 7458

43 Solution We use the row and column totals to compute the expected frequencies. The expected frequencies are: All of the expected frequencies are at least 5, so we may proceed. Fatal Heart Attack Nonfatal Heart Attack No Heart Attack Row Total Telmisartan only 598 431 7513 8542 Ramipril only 603 400 7573 8576 Both drugs 620 424 7458 8502 Column Total 1821 1255 22,554 25,620 Fatal Heart Attack Nonfatal Heart Attack No Heart Attack Telmisartan only 607.14 418.43 Ramipril only 609.56 420.10 Both drugs 604.30 416.47

44 Solution The following table shows the observed and expected frequencies. The test statistic is πœ’ 2 = 598βˆ’ …+ 8502βˆ’ = There are r = 3 rows and c = 3 columns, so the number of degrees of freedom is (3 βˆ’ 1)(3 βˆ’ 1) = 4. From Table A.4, we find that the critical value corresponding to 4 degrees of freedom and Ξ± = 0.05 is Since < 9.488, we do not reject 𝐻 0 . There is not enough evidence to conclude that the distribution of outcomes is different for different drug treatments. Fatal Heart Attack Nonfatal Heart Attack No Heart Attack Telmisartan only 598 (607.14) 431 (418.43) 7513 ( ) Ramipril only 603 (609.56) 400 (420.10) 7573 ( ) Both drugs 620 (604.30) 424 (416.47) 7458 ( )

45 You Should Know… How to interpret contingency tables
How to perform tests of independence How to perform tests of homogeneity


Download ppt "Chapter 12 Tests with Qualitative Data"

Similar presentations


Ads by Google