Chapter 16: Analysis of Categorical Data
LO1: Use the chi-square goodness-of-fit test to analyze probabilities of multinomial distribution trials along a single dimension. LO2: Use the chi-square test of independence to perform contingency analysis. Learning Objectives
In Chapter 5, the binomial distribution was used to analyze experiments or trials that had only two possible outcomes. An extension of this problem is a multinomial distribution, in which more than two possible outcomes can occur. The χ² goodness-of-fit test is used to analyze probabilities of multinomial distribution trials along a single dimension. It compares the expected (theoretical) frequencies of categories from a population distribution with the observed (actual) frequencies to determine whether there is a difference between what was expected and what was observed. χ² Goodness-of-Fit Test LO1
Hypothesize
- Step 1: State the hypotheses
Test
- Step 2: Select the appropriate statistical test for the problem
- Step 3: Set the α value
- Step 4: Determine the degrees of freedom
- Step 5: Determine the expected frequencies
- Step 6: Calculate the observed value of chi-square
Action
- Step 7: Decide whether to reject or fail to reject the null hypothesis
Business Implication
- Use the information to answer the research questions
Formulating Test of Hypothesis LO1
When the expected value of a category is small, a large chi-square value can be obtained erroneously, leading to a Type I error. To control for this potential error, the chi-square goodness-of-fit test should not be used when any of the expected frequencies is less than 5. If the observed data produce expected values of less than 5, it may be possible to combine adjacent categories (when meaningful) to create larger frequencies. Small Expected Values of a Category LO1
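The pooling rule above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `pool_small_categories`, its left-to-right merging strategy, and the sample frequencies are all hypothetical, and the sketch assumes the categories are ordered so that merging neighbours is meaningful.

```python
def pool_small_categories(observed, expected, minimum=5.0):
    """Merge adjacent categories until every expected frequency >= minimum.

    Hypothetical helper: walks left to right, accumulating categories into a
    running group, and closes the group once its expected total reaches
    `minimum`. An undersized remainder at the end is folded into the last
    closed group.
    """
    pooled_obs, pooled_exp = [], []
    run_obs, run_exp = 0.0, 0.0
    for f_o, f_e in zip(observed, expected):
        run_obs += f_o
        run_exp += f_e
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs, run_exp = 0.0, 0.0
    if run_exp > 0:  # leftover tail that never reached the minimum
        if pooled_exp:
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp


# Made-up frequencies: the last two categories have expected values below 5.
obs, exp = pool_small_categories([12, 20, 15, 4, 1], [11.0, 21.5, 14.0, 3.5, 2.0])
print(obs, exp)
```

Pooling reduces the number of categories, so the degrees of freedom should be recomputed from the pooled category count before the statistic is compared with a critical value.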
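The formula used to compute the test statistic for a chi-square goodness-of-fit test is $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, with $df = k - 1 - c$, where $f_o$ is the frequency of observed values, $f_e$ is the frequency of expected values, $k$ is the number of categories, and $c$ is the number of parameters estimated from the sample data. χ² Goodness-of-Fit Test LO1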
Milk Sales Data for Demonstration Problem 16.1 LO1
Month        Litres of Milk
January       1,610
February      1,585
March         1,649
April         1,590
May           1,540
June          1,397
July          1,410
August        1,350
September     1,495
October       1,564
November      1,602
December      1,655
TOTAL        18,447
Hypotheses and Decision Rules for Demonstration Problem 16.1 LO1
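Calculations for Demonstration Problem 16.1 LO1
Under the uniform null hypothesis, f_e = 18,447 / 12 = 1,537.25 for every month.
Month        f_o       f_e         (f_o - f_e)^2 / f_e
January       1,610     1,537.25     3.44
February      1,585     1,537.25     1.48
March         1,649     1,537.25     8.12
April         1,590     1,537.25     1.81
May           1,540     1,537.25     0.00
June          1,397     1,537.25    12.80
July          1,410     1,537.25    10.53
August        1,350     1,537.25    22.81
September     1,495     1,537.25     1.16
October       1,564     1,537.25     0.47
November      1,602     1,537.25     2.73
December      1,655     1,537.25     9.02
Totals       18,447    18,447.00    74.37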
The observed chi-square value of 74.37 is greater than the critical value, so the decision is to reject the null hypothesis. The data provide enough evidence to indicate that the distribution of milk sales is not uniform. Calculations for Demonstration Problem 16.1 LO1
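The calculation for Demonstration Problem 16.1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming a uniform null hypothesis (equal expected sales in every month) and no parameters estimated from the sample, so that df = k - 1. The variable names are illustrative.

```python
# Monthly milk sales (litres) from the Demonstration Problem 16.1 data table.
observed = [1610, 1585, 1649, 1590, 1540, 1397,
            1410, 1350, 1495, 1564, 1602, 1655]

total = sum(observed)        # 18,447 litres
k = len(observed)            # 12 categories (months)
expected = total / k         # uniform null: the same f_e for every month

# chi-square = sum over categories of (f_o - f_e)^2 / f_e
chi_square = sum((f_o - expected) ** 2 / expected for f_o in observed)

# Assumption: nothing is estimated from the sample, so df = k - 1.
df = k - 1

# The unrounded sum is about 74.38; adding per-month terms that were first
# rounded to two decimals gives 74.37 instead.
print(f"f_e = {expected:.2f}, chi-square = {chi_square:.2f}, df = {df}")
```

The statistic is then compared with the chi-square critical value for the chosen α and 11 degrees of freedom.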
Bank Customer Arrival Data for Demonstration Problem 16.2 (table columns: Number of Arrivals, Observed Frequencies) LO1
Hypotheses and Decision Rules for Demonstration Problem 16.2 LO1
Calculations for Demonstration Problem 16.2: Estimating the Mean Arrival Rate (table columns: Number of Arrivals X, Observed Frequencies f, f·X; mean arrival rate λ = Σ(f·X) / Σf) LO1
Calculations for Demonstration Problem 16.2: Poisson Probabilities for λ = 2.3 (table columns: Number of Arrivals X, Expected Probabilities P(X), Expected Frequencies n·P(X)) LO1
χ² Calculations for Demonstration Problem 16.2 (table columns: Number of Arrivals X, Observed Frequencies f_o, Expected Frequencies f_e = n·P(X), (f_o - f_e)^2 / f_e) LO1
The observed chi-square value of 1.74 is less than the critical value, so the decision is not to reject the null hypothesis. The data do not provide enough evidence to indicate that the distribution of bank arrivals is not Poisson. Calculations for Demonstration Problem 16.2 LO1
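The Poisson goodness-of-fit steps (estimate λ, compute expected frequencies, sum the chi-square terms) can be sketched as follows. The observed frequencies here are assumed for illustration and are not taken from the slide's table; they were chosen so that the estimated mean rounds to 2.3. The helper `poisson_pmf` is a local function, not a library call.

```python
import math

# Assumed observed frequencies for 0, 1, 2, 3, 4 and "5 or more" arrivals.
# Illustrative only: chosen so the estimated mean rounds to 2.3.
arrivals = [0, 1, 2, 3, 4, 5]
observed = [7, 18, 25, 17, 12, 5]

n = sum(observed)
# Estimate lambda as sum(f * X) / sum(f), treating "5 or more" as 5,
# then round to one decimal place.
lam = round(sum(f * x for f, x in zip(observed, arrivals)) / n, 1)


def poisson_pmf(x, lam):
    """Poisson probability P(X = x) for mean arrival rate lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Probabilities for 0..4 exactly; the last category takes the remaining tail.
probs = [poisson_pmf(x, lam) for x in arrivals[:-1]]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
chi_square = sum((f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed, expected))

# One parameter (lambda) was estimated from the data, so df = k - 1 - 1.
df = len(observed) - 2

print(f"lambda = {lam}, chi-square = {chi_square:.2f}, df = {df}")
```

With these assumed counts the statistic comes out close to 1.74 and every expected frequency is above 5, so no categories need to be combined.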
The χ² test of independence is used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent. It applies to qualitative variables measured as nominal data. χ² Test of Independence LO2
χ² Test of Independence: Investment Example LO2
Where do you reside? A. Large town  B. Medium town  C. Small town  D. Rural area
Which type of financial investment are you most likely to make today? E. Stocks  F. Bonds  G. Treasury bills
Contingency Table
                        Type of Financial Investment
                        E       F       G
Geographic    A                         O_13    n_A
Region        B                                 n_B
              C                                 n_C
              D                                 n_D
                        n_E     n_F     n_G     N
χ² Test of Independence: Investment Example LO2
Contingency Table
                        Type of Financial Investment
                        E       F       G
Geographic    A                 e_12            n_A
Region        B                                 n_B
              C                                 n_C
              D                                 n_D
                        n_E     n_F     n_G     N
χ² Test of Independence: Formulas LO2
Expected Frequencies: $e_{ij} = \frac{(n_i)(n_j)}{N}$, where $i$ = the row, $j$ = the column, $n_i$ = the total of row $i$, $n_j$ = the total of column $j$, and $N$ = the total of all frequencies.
Calculated χ² (Observed χ²): $\chi^2 = \sum\sum \frac{(f_o - f_e)^2}{f_e}$, where $df = (r - 1)(c - 1)$, $r$ = the number of rows, and $c$ = the number of columns.
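The two formulas above translate directly into code. The sketch below is a minimal, unofficial illustration: the function name `chi_square_independence` and the 4 x 3 table of counts are hypothetical, laid out like the investment example (four regions by three investment types).

```python
def chi_square_independence(table):
    """Return (chi_square, df, expected) for a list-of-rows contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # e_ij = (total of row i)(total of column j) / N
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (f_o - f_e)^2 / f_e over every cell of the table.
    chi_square = sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(table, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected


# Hypothetical counts: 4 regions (rows A-D) by 3 investment types (columns E-G).
counts = [
    [30, 20, 10],
    [25, 25, 10],
    [20, 30, 10],
    [15, 35, 10],
]
chi_square, df, expected = chi_square_independence(counts)
print(round(chi_square, 2), df)
```

The function returns the expected table as well, which makes it easy to check the requirement that every expected frequency be at least 5 before interpreting the statistic.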
χ² Test of Independence: Gasoline Preference Versus Income Category LO2
Contingency Table for the Gas Consumer Example LO2
Gasoline Preference Versus Income Category: Expected Frequencies LO2
                         Type of Gasoline
Income                   Regular    Premium    Extra Premium
Less than $30,000        (66.15)    (24.46)    (16.40)
$30,000 to $49,999       (87.78)    (32.46)    (21.76)
$50,000 to $99,999       (45.13)    (16.69)    (11.19)
At least $100,000        (38.95)    (14.40)     (9.65)
Expected frequencies (in parentheses) are computed as e_ij = (n_i)(n_j) / N.
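The expected frequencies above can be checked against e_ij = (n_i)(n_j) / N. The marginal totals in this sketch are not printed on the slide; they are inferred by summing the expected frequencies across each row and column and rounding to whole counts, so they should be treated as an assumption.

```python
# Marginal totals inferred from the expected frequencies in the table
# (assumption: each row and column sum rounds to a whole-number count).
row_totals = [107, 142, 73, 63]   # four income categories
col_totals = [238, 88, 59]        # regular, premium, extra premium
N = sum(row_totals)               # 385 respondents

# e_ij = (row total)(column total) / N, rounded to two decimals for display
expected = [[round(r * c / N, 2) for c in col_totals] for r in row_totals]
for row in expected:
    print(row)

# Every expected frequency should be at least 5 for the test to be used.
smallest = min(min(row) for row in expected)
print("all expected >= 5:", smallest >= 5)
```

Under these inferred totals the computed values agree with the table to two decimals, and the smallest expected frequency is above 5, so no income or gasoline categories need to be combined.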
Gasoline Preference Versus Income Category: χ² Calculation LO2
The observed chi-square value is greater than the critical value, so the decision is to reject the null hypothesis. The data provide enough evidence to indicate that the type of gasoline preferred is not independent of income. Gasoline Preference Versus Income Category LO2
Gasoline Preference Versus Income Category: Minitab Output LO2
- Chi-square tests indicate whether two distributions are the same or not. They do not tell you in what specific way the distributions differ.
- The chi-square test of independence indicates whether two variables are independent or not. It does not tell you the nature of the relationship between the two variables.
- Chi-square techniques are an outgrowth of the binomial distribution and the inferential techniques for analyzing population proportions.
- Both the chi-square test of independence and the chi-square goodness-of-fit test require that expected values be greater than or equal to 5. If they are not, combine adjacent rows or columns until all expected values are 5 or greater.
Important Points of Interest LO2
COPYRIGHT Copyright © 2014 John Wiley & Sons Canada, Ltd. All rights reserved. Reproduction or translation of this work beyond that permitted by Access Copyright (The Canadian Copyright Licensing Agency) is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser may make back-up copies for his or her own use only and not for distribution or resale. The author and the publisher assume no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.